Train on code your competitors can't access.
An engineering team of 30, 21+ programming languages, and 24/7 worldwide delivery, scaling from pilot to 12,000+ datapoints per month.
30 engineers, annotators, and domain experts. Rigorous hiring pipeline — up to 100 structured interviews per week.
Cross-validation by practicing industry experts. Multi-stage QC gates and per-rater calibration.
Compliance with data security and confidentiality standards. Full PII redaction across enterprise data.
Not just a data supplier — we own every stage from sourcing to evaluation. Plug us in at any step, or hand off the whole chain. Every handoff is reproducible.
Non-public repos, real enterprise content, licensed archives — never seen in public sets.
Prompt → response pairs, multi-turn dialogues, code-task authoring at scale.
Human-in-the-loop labeling, rationales, pairwise ranking across 40 criteria.
SWE-Bench, Multi-SWE, Terminal-Bench, RAG eval, Dockerized reproducible environments.
Agent-trajectory scoring, plan/thought eval, safety probes, regression testing.
Non-public code repositories, SWE-Bench benchmarks, alignment sets, and enterprise data.
Production code from real companies — never indexed, never crawled.
Around 3,000 proprietary codebases that have never appeared on public platforms (GitHub, GitLab, HuggingFace) or in public training sets. Production-grade repositories from real companies — primarily sourced from a network of outsourcing agencies and startups whose products were discontinued or acquired.
Distribution: JS/TS 35%, PHP 30%, Obj-C/Swift 12%, Java 8%, Python 4%, Other 11%.
Composition: 54% discontinued / 46% active or maintained. Full legal rights to license every repository.
Start with a curated pilot subset to validate quality and fit before scaling.
Fully compatible with the Multi-SWE-Bench framework.
A ready-made non-public Python repository is available as a pilot — fully annotated and bench-ready.
Fine-tuning, alignment, RAG and safety data for code models.
Trajectories, architecture eval, dialog scoring and safety probes for code agents.
Internal enterprise data sourced from real companies — active, acquired, or wound down — each with certified consent to license.
Start with a 10-company pilot at a bulk rate.
Not just data delivery — our engineers integrate at every stage of your pipeline. Standard formats, your evaluation frameworks, schema design, ingestion adapters, continuous delivery on your cadence — data arrives ready to train or benchmark, no glue code on your side.
JSONL for SFT and DPO, Parquet for pre-training corpora, HuggingFace Datasets for publishing. Conversation data in ShareGPT, Alpaca, or OpenAI chat schemas — drop-in for your training loop, no conversion on your side.
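As a concrete sketch of the delivery formats above, here is what one SFT record in the OpenAI chat schema looks like when serialized as a JSONL line. The field values below are illustrative only, not drawn from an actual delivery:

```python
import json

# One SFT conversation record in the OpenAI chat schema.
# Content is a hypothetical example, not real delivered data.
record = {
    "messages": [
        {"role": "system", "content": "You are a senior PHP reviewer."},
        {"role": "user", "content": "Why does this loop leak memory?"},
        {"role": "assistant", "content": "The connection is re-opened on every iteration; hoist it out of the loop and close it once."},
    ]
}

# Each record becomes exactly one line in the .jsonl file.
line = json.dumps(record, ensure_ascii=False)
with open("sft_sample.jsonl", "w", encoding="utf-8") as f:
    f.write(line + "\n")
```

A DPO record follows the same one-record-per-line convention, with `chosen` and `rejected` fields in place of a single assistant turn.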
Dockerized SWE-Bench and Multi-SWE-Bench harnesses (Harbor-compatible), RAG eval — patch-apply, build, and test runs work identically on our machines and yours.
Practicing developers, ML engineers, and domain specialists as an extension of your team — scoping taxonomies, defining quality criteria, resolving edge cases as they surface.
Expanded and improved version of the agent quality standard
16.04.25: One consistent quality standard, no matter what you code in
24.12.24: Cutting errors by 40% and costs by 60%
Curated subset of non-public repositories and benchmark tasks for hands-on quality validation. Schedule a technical deep dive with our engineering team.
Email: hi@fermatix.ai
Website: Fermatix.AI
AVENIDAS INTELIGENTES, LDA
Lg Alberto Sampaio, 3 A, Sala 10
Linda a Velha, 2795-007
Portugal