PlatformNetwork/data-fabrication

data-fabrication

Agentic coding dataset fabrication subnet for Platform

Overview

Data Fabrication rewards miners who generate useful agentic coding conversation datasets. Miners submit complete dataset-generation harnesses, the subnet executes and reviews them, then rewards hotkeys that produce high-quality, diverse, verifiable, and original examples.

The subnet is built for synthetic data work where quality matters more than volume. A strong submission should produce conversations with realistic coding tasks, tool calls, reasoning traces, final answers, and enough variation to be valuable for downstream agent training.
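As a rough illustration of the ingredients listed above, the sketch below models one conversation and reports which parts are missing. The `Turn` and `Conversation` classes, their field names, and `missing_parts` are hypothetical and are not the subnet's published format, which is documented separately.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Turn:
    # Hypothetical fields; the real record format may differ.
    role: str                          # "user", "assistant", or "tool"
    content: str
    reasoning: Optional[str] = None    # reasoning trace, if any
    tool_call: Optional[dict] = None   # e.g. {"name": "run_tests", "args": {}}


@dataclass
class Conversation:
    task: str
    turns: list[Turn] = field(default_factory=list)


def missing_parts(conv: Conversation) -> list[str]:
    """List the expected ingredients that this conversation lacks."""
    missing = []
    if not conv.task.strip():
        missing.append("task")
    if not any(t.tool_call for t in conv.turns):
        missing.append("tool_call")
    if not any(t.reasoning for t in conv.turns):
        missing.append("reasoning")
    # Treat a closing assistant turn with text and no tool call as the final answer.
    last = conv.turns[-1] if conv.turns else None
    if last is None or last.role != "assistant" or last.tool_call or not last.content.strip():
        missing.append("final_answer")
    return missing


example = Conversation(
    task="Fix the failing test in parser.py",
    turns=[
        Turn("user", "The parser test fails on empty input."),
        Turn("assistant", "", reasoning="Run the tests first.",
             tool_call={"name": "run_tests", "args": {}}),
        Turn("tool", "1 failed: test_empty_input"),
        Turn("assistant", "Added an early return for empty input; tests pass."),
    ],
)
```

A check like this only confirms that the pieces are present. It says nothing about whether the task is realistic or the reasoning is sound, which is what the scoring is meant to judge.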

What The Subnet Does

  1. Miners submit a complete harness package.
  2. The challenge rejects unsafe or malformed archives before execution.
  3. The harness is reviewed for structure, safety, and originality.
  4. The harness generates an agentic coding dataset.
  5. The dataset is parsed and scored for quality, behavior, diversity, and verifiability.
  6. Similarity checks identify cloned or low-effort submissions.
  7. The best completed score per miner becomes the raw Platform weight.
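Step 2 can be pictured with a small pre-execution check. This is a minimal sketch that assumes the harness arrives as a zip archive. The limits and the `check_archive` and `make_zip` helpers are hypothetical, and this README does not state the challenge's real checks.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical limits; the real values are not given in this README.
MAX_FILES = 500
MAX_TOTAL_BYTES = 50 * 1024 * 1024


def check_archive(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means the archive passed."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["not a valid zip archive"]

    problems = []
    infos = archive.infolist()
    if len(infos) > MAX_FILES:
        problems.append("too many files")
    # Use declared uncompressed sizes to catch oversized payloads before extracting.
    if sum(info.file_size for info in infos) > MAX_TOTAL_BYTES:
        problems.append("uncompressed size exceeds limit")
    for info in infos:
        path = PurePosixPath(info.filename)
        # Reject absolute paths and parent-directory traversal.
        if path.is_absolute() or ".." in path.parts:
            problems.append(f"unsafe path: {info.filename}")
    return problems


def make_zip(names: list[str]) -> bytes:
    """Build a small in-memory zip for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "print('hello')\n")
    return buffer.getvalue()
```

Running the check on the raw bytes before anything is extracted means a malformed or hostile archive never touches the filesystem.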

Reward Focus

Data Fabrication rewards:

  • high-quality coding tasks with clear intent;
  • coherent multi-turn conversations;
  • realistic tool and function-call usage;
  • reasoning that supports the final answer;
  • verifiable outputs and useful final responses;
  • diverse examples rather than repeated templates;
  • original harness design rather than copied structure.
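To make the "repeated templates" concern concrete, one common way to measure overlap between two generated examples is Jaccard similarity over word n-grams. This is an illustrative technique only. The README does not say which similarity measure the subnet uses, and the function names here are hypothetical.

```python
def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping word n-grams (shingles)."""
    tokens = text.lower().split()
    if len(tokens) < n:
        # Short texts become a single shingle, or none if empty.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, in [0, 1]."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


near_duplicate = jaccard(
    "fix the failing unit test in parser",
    "fix the failing unit test in lexer",
)
```

Two prompts that differ only in a filled-in slot share most of their shingles, so a dataset built from one template tends to show many high pairwise similarities.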

Scoring

Final score:

score = weighted_quality + weighted_agentic_signals + weighted_originality

Dataset quality is dominant, with additional weight for agentic tool use, reasoning, coding relevance, verifiability, diversity, and originality. Scores are normalized to [0, 1], so Platform weights can directly use each miner's best completed score.
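The formula can be sketched as a weighted sum clamped to [0, 1]. The weight values below are placeholders chosen only so that quality dominates. The README does not publish the actual weights, and `WEIGHTS` and `final_score` are hypothetical names.

```python
# Hypothetical weights that sum to 1; the real values are not stated in this README.
WEIGHTS = {"quality": 0.6, "agentic_signals": 0.25, "originality": 0.15}


def clamp(value: float) -> float:
    """Restrict a value to the range [0, 1]."""
    return min(max(value, 0.0), 1.0)


def final_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores; missing components count as 0."""
    total = sum(
        weight * clamp(components.get(name, 0.0))
        for name, weight in WEIGHTS.items()
    )
    return clamp(total)
```

Because each component is clamped and the weights sum to 1, the result stays in [0, 1] without a separate normalization pass.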

Lifecycle

flowchart LR
    Miner["Miner submits harness"] --> Review["Safety and originality review"]
    Review --> Run["Dataset generation"]
    Run --> Score["Quality scoring"]
    Score --> Store["Persisted result"]
    Store --> Weights["Platform weights"]

Roles

Miners

Miners design harnesses that generate agentic coding conversations. The goal is to maximize useful dataset quality while staying inside the published format, safety, and originality constraints.

Validators

Validators run the challenge, configure execution limits, inspect evaluation health, and expose the current score-derived weights to Platform.

Platform

Platform proxies public challenge data, reads the protected weight contract, and normalizes the raw scores into final subnet emissions.
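The hand-off from per-miner scores to emissions can be sketched in two steps: keep each hotkey's best completed score, then normalize across hotkeys. Proportional normalization is an assumption here, since the README does not specify how Platform normalizes raw scores. The function names and the tuple layout are hypothetical.

```python
def best_completed_scores(results: list[tuple[str, float, bool]]) -> dict[str, float]:
    """Keep the highest score per hotkey, ignoring evaluations that did not complete.

    Each result is a (hotkey, score, completed) tuple.
    """
    best: dict[str, float] = {}
    for hotkey, score, completed in results:
        if completed:
            best[hotkey] = max(best.get(hotkey, 0.0), score)
    return best


def normalize(raw: dict[str, float]) -> dict[str, float]:
    """Scale raw scores so they sum to 1 (assumed proportional scheme)."""
    total = sum(raw.values())
    if total == 0:
        return {hotkey: 0.0 for hotkey in raw}
    return {hotkey: score / total for hotkey, score in raw.items()}


results = [
    ("hk_a", 0.4, True),
    ("hk_a", 0.8, True),
    ("hk_b", 0.9, False),  # incomplete run, ignored
    ("hk_b", 0.2, True),
]
raw_weights = best_completed_scores(results)
shares = normalize(raw_weights)
```

Using the best completed score rather than the latest one means a miner is not penalized for a weaker resubmission.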

Documentation

Detailed guides live under docs/, with separate sections for miners (docs/miner/) and validators (docs/validator/).

Repository Layout

data-fabrication/
├── docs/
│   ├── miner/
│   └── validator/
├── src/data_fabrication/
├── tests/
├── config.example.yaml
└── Dockerfile

License

Apache-2.0

About

[🛒] data-fabrication is a challenge project from the Platform subnet, where developers are incentivized to create diverse and high-performance datasets. Datasets are evaluated in isolated environments, rewarded based on quality and utility, and continuously improved through encrypted and competitive collaboration.
