data-fabrication

Agentic coding dataset fabrication subnet for Platform

Overview

Data Fabrication rewards miners who generate useful agentic coding conversation datasets. Miners submit complete dataset-generation harnesses, the subnet executes and reviews them, then rewards hotkeys that produce high-quality, diverse, verifiable, and original examples.

The subnet is built for synthetic data work where quality matters more than volume. A strong submission should produce conversations with realistic coding tasks, tool calls, reasoning traces, final answers, and enough variation to be valuable for downstream agent training.

What The Subnet Does

Miners submit a complete harness package.
The challenge rejects unsafe or malformed archives before execution.
The harness is reviewed for structure, safety, and originality.
The harness generates an agentic coding dataset.
The dataset is parsed and scored for quality, behavior, diversity, and verifiability.
Similarity checks identify cloned or low-effort submissions.
The best completed score per miner becomes the raw Platform weight.

Reward Focus

Data Fabrication rewards:

high-quality coding tasks with clear intent;
coherent multi-turn conversations;
realistic tool and function-call usage;
reasoning that supports the final answer;
verifiable outputs and useful final responses;
diverse examples rather than repeated templates;
original harness design rather than copied structure.

Scoring

Final score:

score = weighted_quality + weighted_agentic_signals + weighted_originality

Dataset quality is dominant, with additional weight for agentic tool use, reasoning, coding relevance, verifiability, diversity, and originality. Scores are normalized to [0, 1], so Platform weights can directly use each miner’s best completed score.

Lifecycle

flowchart LR
    Miner["Miner submits harness"] --> Review["Safety and originality review"]
    Review --> Run["Dataset generation"]
    Run --> Score["Quality scoring"]
    Score --> Store["Persisted result"]
    Store --> Weights["Platform weights"]

Roles

Miners

Miners design harnesses that generate agentic coding conversations. The goal is to maximize useful dataset quality while staying inside the published format, safety, and originality constraints.

Validators

Validators run the challenge, configure execution limits, inspect evaluation health, and expose the current score-derived weights to Platform.

Platform

Platform proxies public challenge data, reads the protected weight contract, and normalizes the raw scores into final subnet emissions.

Documentation

Detailed guides live under docs/:

Repository Layout

data-fabrication/
├── docs/
│   ├── miner/
│   └── validator/
├── src/data_fabrication/
├── tests/
├── config.example.yaml
└── Dockerfile

License

Apache-2.0

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
docs		docs
src/data_fabrication		src/data_fabrication
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
AGENTS.md		AGENTS.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
config.example.yaml		config.example.yaml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

data-fabrication

Overview

What The Subnet Does

Reward Focus

Scoring

Lifecycle

Roles

Miners

Validators

Platform

Documentation

Repository Layout

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

data-fabrication

Overview

What The Subnet Does

Reward Focus

Scoring

Lifecycle

Roles

Miners

Validators

Platform

Documentation

Repository Layout

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages