Data Fabrication rewards miners who generate useful agentic coding conversation datasets. Miners submit complete dataset-generation harnesses, the subnet executes and reviews them, then rewards hotkeys that produce high-quality, diverse, verifiable, and original examples.
The subnet is built for synthetic data work where quality matters more than volume. A strong submission should produce conversations with realistic coding tasks, tool calls, reasoning traces, final answers, and enough variation to be valuable for downstream agent training.
- Miners submit a complete harness package.
- The challenge rejects unsafe or malformed archives before execution.
- The harness is reviewed for structure, safety, and originality.
- The harness generates an agentic coding dataset.
- The dataset is parsed and scored for quality, behavior, diversity, and verifiability.
- Similarity checks identify cloned or low-effort submissions.
- The best completed score per miner becomes the raw Platform weight.
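The last step of the flow above can be sketched as a small aggregation. This is a minimal illustration and not the subnet's actual code: the `Evaluation` record, its field names, and `best_completed_scores` are hypothetical names introduced here, and the sketch assumes that failed or rejected runs simply do not count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """One evaluated submission (hypothetical record shape)."""
    hotkey: str      # miner identity
    score: float     # normalized score in [0, 1]
    completed: bool  # False if the run was rejected or did not finish


def best_completed_scores(evaluations):
    """Return the highest completed score for each hotkey."""
    best = {}
    for ev in evaluations:
        if not ev.completed:
            continue  # assumption: incomplete runs never contribute
        if ev.score > best.get(ev.hotkey, float("-inf")):
            best[ev.hotkey] = ev.score
    return best
```

Under this assumption, a miner whose only runs failed does not appear in the result at all, so no raw weight is reported for that hotkey.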
Data Fabrication rewards:
- high-quality coding tasks with clear intent;
- coherent multi-turn conversations;
- realistic tool and function-call usage;
- reasoning that supports the final answer;
- verifiable outputs and useful final responses;
- diverse examples rather than repeated templates;
- original harness design rather than copied structure.
Final score:
score = weighted_quality + weighted_agentic_signals + weighted_originality
Dataset quality is dominant, with additional weight for agentic tool use, reasoning, coding relevance, verifiability, diversity, and originality. Scores are normalized to [0, 1], so Platform weights can directly use each miner's best completed score.
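A weighted sum of this shape could look like the sketch below. The weight values are illustrative assumptions and not the subnet's published weights; they are chosen only so that quality dominates and the weights sum to 1, which keeps the result in [0, 1]. The names `WEIGHTS` and `final_score` are hypothetical.

```python
# Hypothetical weights: quality is largest, and the values sum to 1.
WEIGHTS = {
    "quality": 0.6,
    "agentic_signals": 0.25,
    "originality": 0.15,
}


def final_score(components, weights=WEIGHTS):
    """Combine component scores (each expected in [0, 1]) into one score."""
    total = 0.0
    for name, weight in weights.items():
        # Missing components count as 0; out-of-range values are clamped.
        value = min(1.0, max(0.0, components.get(name, 0.0)))
        total += weight * value
    # Clamp again to guard against floating-point drift.
    return min(1.0, max(0.0, total))
```

For example, `final_score({"quality": 0.9, "agentic_signals": 0.5, "originality": 1.0})` combines the three components with the assumed weights; changing the weights changes the ranking, so the real configuration should be taken from the subnet itself.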
flowchart LR
Miner["Miner submits harness"] --> Review["Safety and originality review"]
Review --> Run["Dataset generation"]
Run --> Score["Quality scoring"]
Score --> Store["Persisted result"]
Store --> Weights["Platform weights"]
Miners design harnesses that generate agentic coding conversations. The goal is to maximize useful dataset quality while staying inside the published format, safety, and originality constraints.
Validators run the challenge, configure execution limits, inspect evaluation health, and expose the current score-derived weights to Platform.
Platform proxies public challenge data, reads the protected weight contract, and normalizes the raw scores into final subnet emissions.
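How Platform normalizes raw scores is not specified here. One common approach is proportional normalization, sketched below as an assumption: each hotkey's share is its raw score divided by the sum of all raw scores. The function name `normalize_weights` is hypothetical.

```python
def normalize_weights(raw_scores):
    """Scale non-negative raw scores so that the shares sum to 1."""
    clipped = {hotkey: max(0.0, score) for hotkey, score in raw_scores.items()}
    total = sum(clipped.values())
    if total <= 0:
        # No positive scores: assign zero to everyone rather than divide by zero.
        return {hotkey: 0.0 for hotkey in clipped}
    return {hotkey: score / total for hotkey, score in clipped.items()}
```

With raw scores of 0.6 and 0.2, this sketch yields shares of 0.75 and 0.25. Other schemes, such as rank-based or thresholded weighting, would give different emissions from the same raw scores.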
Detailed guides live under docs/:
data-fabrication/
├── docs/
│   ├── miner/
│   └── validator/
├── src/data_fabrication/
├── tests/
├── config.example.yaml
└── Dockerfile
Apache-2.0
