Train on code your competitors can't access.
Covering every stage of code-data collection and annotation — from pre-training to post-training.
An engineering team of 124, 21+ programming languages, and 24/7 worldwide delivery — from a pilot to 12,000+ datapoints per month.
124 engineers, annotators, and domain experts. Rigorous hiring pipeline — up to 100 structured interviews per week.
Cross-validation across practicing industry experts. Project-specific optimizations speed up labeling by up to 300%.
Compliance with data security and confidentiality standards. Full PII redaction across enterprise data.
Not just a data supplier — we own every stage from sourcing to evaluation. Plug us in at any step, or hand off the whole chain. Every handoff is reproducible.
Non-public repos, real enterprise content, licensed archives — never seen in public sets.
Prompt → response pairs, multi-turn dialogues, code-task authoring at scale.
Human-in-the-loop labeling, rationales, pairwise ranking across 40 criteria.
SWE-Bench, Multi-SWE, Terminal-Bench, RAG eval, Dockerized reproducible environments.
Agent-trajectory scoring, plan/thought eval, safety probes, regression testing.
Non-public code repositories, SWE-Bench benchmarks, alignment sets, and enterprise data.
Production code from real companies — never indexed, never crawled.
Around 3,000 proprietary codebases that have never appeared in public training sets (GitHub, GitLab, HuggingFace). Production-grade repositories from real companies — primarily sourced from a network of outsourcing agencies and startups whose products were discontinued or acquired.
Distribution: JS/TS 35%, PHP 30%, Obj-C/Swift 12%, Java 8%, Python 4%, Other 11%.
Composition: 54% discontinued / 46% active or maintained. Full legal rights to license every repository.
Start with a curated pilot subset to validate quality and fit before scaling.
Fully compatible with the Multi-SWE-Bench framework.
A ready-made non-public Python repository is available as a pilot — fully annotated and bench-ready.
The full alignment stack — fine-tuning, evaluation, safety, and red-teaming — covering both code models and code agents.
Internal enterprise data sourced from real companies — active, acquired, or wound down — each with certified consent to license.
Start with a 10-company pilot at a bulk rate.
Beyond the ready-to-license catalog, our team builds bespoke datasets from scoping to delivery — sourcing, annotation, QC, delivery. Pre-training corpora, evaluation benchmarks, RLHF pairs, agent trajectories, safety probes, RAG sets — whatever your pipeline needs.
A dataset on its own rarely moves the needle. Our team stands up the infrastructure around it — benchmark harnesses, eval pipelines, annotation tooling, embedded experts — so the data is usable on day one and keeps producing signal long after.
Dockerized SWE-Bench, Multi-SWE-Bench, and Terminal-Bench-style harnesses with golden/test patches, install scripts, and Parquet metadata — patch-apply, build, and test runs work identically on our machines and yours.
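To make the harness format concrete, here is a minimal sketch of what a SWE-Bench-style task record looks like and how it can be schema-checked before ingestion. The field names (`golden_patch`, `install_script`, `fail_to_pass`, etc.) are illustrative assumptions, not Fermatix's actual schema, which is defined per engagement.

```python
# Hypothetical field names sketching the shape of a SWE-Bench-style
# task record; the real schema is agreed per project.
task = {
    "instance_id": "example-repo__issue-42",
    "repo": "example-org/example-repo",
    "base_commit": "0000000000000000000000000000000000000000",
    "golden_patch": "diff --git a/app.py b/app.py\n...",
    "test_patch": "diff --git a/tests/test_app.py b/tests/test_app.py\n...",
    "install_script": "pip install -e .",
    "fail_to_pass": ["tests/test_app.py::test_fix"],
    "pass_to_pass": ["tests/test_app.py::test_existing"],
}

REQUIRED = {
    "instance_id", "repo", "base_commit",
    "golden_patch", "test_patch", "install_script",
    "fail_to_pass", "pass_to_pass",
}

def validate_task(record: dict) -> list:
    """Return a list of schema problems; an empty list means the record
    is ready for the patch-apply / build / test loop."""
    problems = [f"missing field: {k}" for k in sorted(REQUIRED - record.keys())]
    if not record.get("fail_to_pass"):
        problems.append("fail_to_pass must name at least one test")
    for field in ("golden_patch", "test_patch"):
        if field in record and not record[field].startswith("diff --git"):
            problems.append(f"{field} is not a unified git diff")
    return problems

print(validate_task(task))  # → []
```

A check like this is what makes "works identically on our machines and yours" possible: any record that survives validation can be applied and tested inside the same Docker image on either side.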
Schema design, format conversion, ingestion adapters, and continuous delivery on your cadence. Datasets land in the shape your pipeline already expects — no glue code on your side.
Agent-trajectory and plan/thought scoring, RAG retrieval accuracy with path:line citations, pairwise ranking across 40 criteria — calibrated against expert baselines.
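As a sketch of how pairwise verdicts aggregate into a model-level signal, the snippet below turns raw per-criterion judgments into win rates. The judgment format and model names are assumptions for illustration; the production pipeline ranks across 40 criteria with per-rater calibration.

```python
from collections import defaultdict

# Hypothetical judgment format: one rater's verdict on one criterion
# for a pair of candidate responses.
judgments = [
    {"criterion": "correctness", "winner": "model_a"},
    {"criterion": "correctness", "winner": "model_a"},
    {"criterion": "readability", "winner": "model_b"},
    {"criterion": "safety",      "winner": "model_a"},
]

def win_rates(judgments: list) -> dict:
    """Aggregate pairwise verdicts into an overall win rate per model."""
    wins = defaultdict(int)
    for j in judgments:
        wins[j["winner"]] += 1
    total = len(judgments)
    return {model: wins[model] / total for model in sorted(wins)}

print(win_rates(judgments))  # → {'model_a': 0.75, 'model_b': 0.25}
```

In practice the aggregation is done per criterion and weighted against expert-baseline agreement, so a rater whose verdicts drift from the calibration set contributes less to the final ranking.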
Argilla and bespoke portals stood up per project: task taxonomy, batch assignment, multi-stage QC gates, and per-rater calibration configured to match your evaluation methodology.
Practicing developers, ML engineers, and domain specialists working as an extension of your team — scoping task taxonomies, defining quality criteria, and resolving edge cases as they surface.
Adversarial multi-step scenarios, MCP tool-access stress tests, dialog safety probes, and regression suites that catch failure modes before deployment.
Most AI vendors do everything. Fermatix does code data — and only code data — at production grade for the labs and startups building frontier code models.
Public code corpora are saturated. The frontier of code-model performance now depends on code that hasn't been seen — proprietary repositories, real enterprise interactions, multimodal corpora with documented provenance. Sourcing it takes a different kind of organization: legal infrastructure, anonymization pipelines, expert-only annotation. That's the only thing Fermatix builds.
Three years working with leading code-AI teams has shaped the catalog. Every dataset in production today exists because a partner asked for something they couldn't get elsewhere. We deliver the data, document its origin, and stay out of the way.
Long-term engagements with frontier code-AI teams. We grow inside our clients' roadmaps, not outside them.
Non-public codebases and enterprise data competitors can't access — continuously sourced, all properly licensed, traced to origin.
Every output reviewed by working engineers with production experience. No crowd-sourced labeling, no untrained annotators.
Expanded and improved version of the agent quality standard
16.04.25: One consistent quality standard, no matter what you code in
24.12.24: Cutting errors by 40% and costs by 60%
Curated subset of non-public repositories and benchmark tasks for hands-on quality validation. Schedule a technical deep dive with our engineering team.
Email: hi@fermatix.ai
Website: Fermatix.AI
AVENIDAS INTELIGENTES, LDA
Lg Alberto Sampaio, 3 A, Sala 10
Linda a Velha, 2795-007
Portugal