LiberCoders/FeatureBench


FeatureBench is a test-driven data generation and evaluation pipeline for feature-level coding benchmarks. It provides a unified CLI to run inference, evaluation, and dataset generation.

📰 News

🎁 2026.02.06: We now support one-click inference for mainstream agent frameworks, including OpenHands, Claude Code, Codex, Gemini CLI, and mini-swe-agent. All supported agent frameworks can be found here. We have also open-sourced the FeatureBench data pipeline.

🚀 Quickstart

Prerequisites:

  • uv for Python environment management
  • docker for reproducible builds and evaluation

# pypi
pip install featurebench
# or uv add featurebench

# local
git clone https://github.com/LiberCoders/FeatureBench.git
cd FeatureBench
uv sync
source .venv/bin/activate

Configure:

cp config_example.toml config.toml

See docs/config.md for a comprehensive reference (harness, infer, data pipeline) with examples.

Optional: pre-pull images to reduce network variance:

fb pull --mode lite                 # lite split image list (13 images)
fb pull --mode full                 # full split image list (24 images)
fb pull --mode /path/to/images.txt  # one image name per line

# full list: featurebench/resources/constants/full_images.txt
# lite list: featurebench/resources/constants/lite_images.txt
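
For a custom image list, the only format requirement stated above is one image name per line. A minimal Python sketch for producing such a file follows; the helper `write_image_list` and the image names are hypothetical placeholders, not entries from the FeatureBench image lists.

```python
from pathlib import Path
import tempfile


def write_image_list(images, path):
    """Write one image name per line, skipping blanks and duplicates."""
    seen = set()
    lines = []
    for name in images:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            lines.append(name)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Hypothetical image names, for illustration only.
example_images = [
    "example/repo-a:latest",
    "example/repo-b:latest",
    "example/repo-a:latest",  # duplicate, dropped
    "",                       # blank entry, dropped
]

out_path = Path(tempfile.gettempdir()) / "images.txt"
written = write_image_list(example_images, out_path)
print(f"wrote {len(written)} image names to {out_path}")
```

The resulting file path can then be passed to `fb pull --mode` in place of `/path/to/images.txt`.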

Run inference:

fb infer \
    --config-path config.toml \
    --agent mini_swe_agent \
    --model openai/qwen3-coder-480b-a35b-instruct \
    --split lite
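
To run the same inference command over several models, the argument list can be assembled in a script. The sketch below uses only the flags shown above; the helper name `build_infer_cmd` and its defaults are assumptions for illustration, not part of the FeatureBench API.

```python
import shlex


def build_infer_cmd(model, agent="mini_swe_agent", split="lite",
                    config_path="config.toml"):
    """Assemble the `fb infer` argument list from the flags shown above."""
    return [
        "fb", "infer",
        "--config-path", config_path,
        "--agent", agent,
        "--model", model,
        "--split", split,
    ]


# Extend this list to sweep over more models.
models = ["openai/qwen3-coder-480b-a35b-instruct"]

for model in models:
    cmd = build_infer_cmd(model)
    # Print the command for inspection; to launch it, pass `cmd` to
    # subprocess.run(cmd, check=True) in an environment where `fb` is installed.
    print(shlex.join(cmd))
```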

Run evaluation:

fb eval \
    -p runs/<timestamp>/output.jsonl \
    --split lite
    # use -p gold to verify the gold patches
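
Before evaluating, it can help to check the inference output. The sketch below assumes only that `output.jsonl` holds one JSON object per line, as the extension suggests; the field names in the sample records are hypothetical and may not match the real schema.

```python
import json
from collections import Counter


def summarize_jsonl(lines):
    """Count records and top-level keys in an iterable of JSONL lines."""
    n_records = 0
    key_counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        n_records += 1
        if isinstance(record, dict):
            key_counts.update(record.keys())
    return n_records, key_counts


# Hypothetical records for illustration; real output.jsonl fields may differ.
sample = [
    '{"id": "a", "patch": "..."}',
    '',
    '{"id": "b", "patch": "..."}',
]
count, keys = summarize_jsonl(sample)
print(count, dict(keys))
```

For a real run, open the output file under `runs/` and pass the file object to `summarize_jsonl` instead of `sample`.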

🧭 CLI Overview

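fb provides three core commands, one for each task named in the introduction: inference (fb infer), evaluation (fb eval), and dataset generation.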

✍️ Citation

If you find FeatureBench useful, please cite it as:

@article{zhou2026featurebench,
  title={FeatureBench: Benchmarking Agentic Coding for Complex Feature Development},
  author={Zhou, Qixing and Zhang, Jiacheng and Wang, Haiyang and Hao, Rui and Wang, Jiahe and Han, Minghao and Yang, Yuxue and Wu, Shuzhe and Pan, Feiyang and Fan, Lue and others},
  journal={arXiv preprint arXiv:2602.10975},
  year={2026}
}

📧 Contact

If you have any questions, feel free to contact qixingzhou1125@gmail.com or zjcheng2022@gmail.com.

About

[ICLR 2026] Official Implementation of "FeatureBench: Benchmarking Agentic Coding for Complex Feature Development"
