PaperCoder is a multi-agent LLM system that transforms a scientific paper into a code repository.
It follows a three-stage pipeline of planning, analysis, and code generation, with each stage handled by specialized agents.
Our method outperforms strong baselines on both the Paper2Code and PaperBench benchmarks and produces faithful, high-quality implementations.
- ⚡ Quick Start
- 📚 Detailed Setup Instructions
- 📦 Paper2Code Benchmark Datasets
- 📊 Model-based Evaluation of Repositories
- Note: The following command runs the example paper (Attention Is All You Need).
- 💵 Estimated cost for using o3-mini: $0.50–$0.70
```bash
pip install openai
export OPENAI_API_KEY="<OPENAI_API_KEY>"
cd scripts
bash run.sh
```

- If you encounter any issues installing vLLM, please refer to the official vLLM repository.
- The default model is `deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct`.

```bash
pip install vllm
cd scripts
bash run_llm.sh
```

- PaperCoder now supports any LLM provider available through LiteLLM.
- Configure your model settings in a `.env` file in the project root directory (see `.env.example`).
- Supports standard LiteLLM provider syntax, including:
  - AWS Bedrock (`bedrock/model-name`): requires boto3
  - OpenAI (`openai/model-name`): uses o3-mini by default
  - Anthropic (`anthropic/model-name`): direct API access
Choose ONE of the following provider configurations in your `.env` file:

```bash
# AWS Bedrock
AWS_REGION=<your-region>
BEDROCK_MODEL=<model-name>  # e.g., anthropic.claude-3-sonnet-20240229-v1:0
DISABLE_PROMPT_CACHING=0
AWS_SHARED_CREDENTIALS_FILE=~/.aws/credentials
AWS_CONFIG_FILE=~/.aws/config

# OpenAI
OPENAI_API_KEY=<your-openai-api-key>
OPENAI_MODEL=o3-mini  # Default if not specified

# Anthropic
ANTHROPIC_API_KEY=<your-anthropic-api-key>
ANTHROPIC_MODEL=claude-3-sonnet-20240229  # Default if not specified
```
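As a rough illustration of how a `.env` like the one above maps onto LiteLLM's `provider/model-name` syntax, here is a minimal sketch. The `load_env` and `litellm_model` helpers are hypothetical, not PaperCoder's actual code (real projects typically use `python-dotenv` for this):

```python
from pathlib import Path


def load_env(path: str) -> dict:
    """Minimal .env parser: KEY=VALUE lines, '#' starts a comment."""
    cfg = {}
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if "=" in line:
            key, value = line.split("=", 1)
            cfg[key.strip()] = value.strip()
    return cfg


def litellm_model(cfg: dict) -> str:
    """Map the configured provider to a LiteLLM model string."""
    if "BEDROCK_MODEL" in cfg:
        return f"bedrock/{cfg['BEDROCK_MODEL']}"
    if "ANTHROPIC_API_KEY" in cfg:
        return f"anthropic/{cfg.get('ANTHROPIC_MODEL', 'claude-3-sonnet-20240229')}"
    # Fall back to OpenAI with the documented o3-mini default
    return f"openai/{cfg.get('OPENAI_MODEL', 'o3-mini')}"
```

Since you configure exactly ONE provider, the first matching key decides the provider prefix.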
```bash
# Install LiteLLM
pip install litellm

# For provider-specific dependencies:
# - AWS Bedrock requires boto3
pip install boto3

# Copy and modify the example .env file
cp .env.example .env
# Edit the .env file with your provider configuration

# Run the scripts; they will use LiteLLM if configured, or fall back to vLLM
cd scripts
bash run_llm.sh
```

The generated output is organized as follows:

```
outputs
├── Transformer
│   ├── analyzing_artifacts
│   ├── coding_artifacts
│   └── planning_artifacts
└── Transformer_repo   # Final output repository
```

- 💡 To use the `o3-mini` version, make sure you have the latest `openai` package installed.
- 📦 Install only what you need:
  - For OpenAI API: `openai`
  - For open-source models: `vllm` (if you encounter any issues installing vLLM, please refer to the official vLLM repository)
  - For other LLM providers (like AWS Bedrock): `litellm` (check the LiteLLM documentation for supported models and configurations)

```bash
pip install openai
pip install vllm
pip install litellm
```

- Or, if you prefer, you can install all dependencies using `pip`:

```bash
pip install -r requirements.txt
```

The following process describes how to convert a paper PDF into JSON format.
If you have access to the LaTeX source and plan to use it with PaperCoder, you may skip this step and proceed to 🚀 Running PaperCoder.
Note: In our experiments, we converted all paper PDFs to JSON format.
- Clone the `s2orc-doc2json` repository to convert your PDF file into a structured JSON format.
  (For detailed configuration, please refer to the official repository.)

```bash
git clone https://github.com/allenai/s2orc-doc2json.git
```

- Run the PDF processing service.

```bash
cd ./s2orc-doc2json/grobid-0.7.3
./gradlew run
```

- Convert your PDF into JSON format.

```bash
mkdir -p ./s2orc-doc2json/output_dir/paper_coder
python ./s2orc-doc2json/doc2json/grobid2json/process_pdf.py \
    -i ${PDF_PATH} \
    -t ./s2orc-doc2json/temp_dir/ \
    -o ./s2orc-doc2json/output_dir/paper_coder
```

- Note: The following command runs the example paper (Attention Is All You Need).
  If you want to run PaperCoder on your own paper, please modify the environment variables accordingly.
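Before running PaperCoder, it can be worth sanity-checking the converted JSON. The sketch below assumes the s2orc-doc2json output layout as we understand it (a top-level `title` and body paragraphs under `pdf_parse` -> `body_text`, each carrying a `section` name); the `summarize` helper is illustrative only, so adjust the field names if your converter version differs:

```python
import json


def summarize(paper: dict) -> tuple[str, list[str]]:
    """Return the paper title and the ordered, de-duplicated section names."""
    title = paper.get("title", "<missing>")
    sections: list[str] = []
    for para in paper.get("pdf_parse", {}).get("body_text", []):
        name = para.get("section")
        if name and name not in sections:
            sections.append(name)
    return title, sections


if __name__ == "__main__":
    # Placeholder path: point this at the JSON produced by process_pdf.py
    with open("./s2orc-doc2json/output_dir/paper_coder/paper.json") as f:
        print(summarize(json.load(f)))
```

If the title comes back empty or the section list is very short, the GROBID parse likely failed and is worth re-running.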
- 💵 Estimated cost for using o3-mini: $0.50–$0.70
```bash
# Using the PDF-based JSON format of the paper
export OPENAI_API_KEY="<OPENAI_API_KEY>"
cd scripts
bash run.sh
```

```bash
# Using the LaTeX source of the paper
export OPENAI_API_KEY="<OPENAI_API_KEY>"
cd scripts
bash run_latex.sh
```

- The default model is `deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct` (using vLLM).
- For LiteLLM integration, create a `.env` file in the project root with your provider configuration.
```bash
# Using the PDF-based JSON format of the paper
cd scripts
bash run_llm.sh
```

```bash
# Using the LaTeX source of the paper
cd scripts
bash run_latex_llm.sh
```

```bash
# Example .env configuration (AWS Bedrock with Claude)
AWS_REGION=eu-north-1
BEDROCK_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
DISABLE_PROMPT_CACHING=0
AWS_SHARED_CREDENTIALS_FILE=~/.aws/credentials
AWS_CONFIG_FILE=~/.aws/config

# Or for OpenAI
# OPENAI_API_KEY=your-api-key
# OPENAI_MODEL=o3-mini

# Or for Anthropic Direct API
# ANTHROPIC_API_KEY=your-api-key
# ANTHROPIC_MODEL=claude-3-sonnet-20240229
```
- Huggingface dataset: paper2code
- You can find the description of the Paper2Code benchmark dataset in `data/paper2code`.
- For more details, refer to Section 4.1 "Paper2Code Benchmark" in the paper.
- We evaluate repository quality using a model-based approach, supporting both reference-based and reference-free settings.
  The model critiques key implementation components, assigns severity levels, and generates a 1–5 correctness score averaged over 8 samples using o3-mini-high.
- For more details, please refer to Section 4.3.1 (Paper2Code Benchmark) of the paper.
- Note: The following examples evaluate the sample repository (`Transformer_repo`).
  Please modify the relevant paths and arguments if you wish to evaluate a different repository.
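The averaging over sampled judgments can be sketched as follows. The per-sample record format (`score`, `valid`) is an assumption for illustration and does not reflect `eval.py`'s actual internals; it simply mirrors the "Score" and "Valid: n/8" fields in the evaluation summary:

```python
def aggregate_scores(samples: list[dict]) -> tuple[float, int]:
    """Average the 1-5 correctness scores over the valid samples.

    Each sample is assumed (hypothetically) to look like
    {"score": 4, "valid": True}; invalid samples are excluded
    from the mean, as suggested by the "Valid: n/8" counter.
    """
    valid = [s["score"] for s in samples if s.get("valid")]
    if not valid:
        raise ValueError("no valid samples to average")
    return sum(valid) / len(valid), len(valid)
```

With 8 valid samples scoring alternately 5 and 4, this yields the 4.5000 shown in the example summary below.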
```bash
pip install tiktoken
export OPENAI_API_KEY="<OPENAI_API_KEY>"
```

- `target_repo_dir` is the generated repository.
```bash
cd codes/
python eval.py \
    --paper_name Transformer \
    --pdf_json_path ../examples/Transformer_cleaned.json \
    --data_dir ../data \
    --output_dir ../outputs/Transformer \
    --target_repo_dir ../outputs/Transformer_repo \
    --eval_result_dir ../results \
    --eval_type ref_free \
    --generated_n 8 \
    --papercoder
```

- `target_repo_dir` is the generated repository.
- `gold_repo_dir` should point to the official repository (e.g., author-released code).
```bash
cd codes/
python eval.py \
    --paper_name Transformer \
    --pdf_json_path ../examples/Transformer_cleaned.json \
    --data_dir ../data \
    --output_dir ../outputs/Transformer \
    --target_repo_dir ../outputs/Transformer_repo \
    --gold_repo_dir ../examples/Transformer_gold_repo \
    --eval_result_dir ../results \
    --eval_type ref_based \
    --generated_n 8 \
    --papercoder
```

Example output:

```
========================================
🌟 Evaluation Summary 🌟
📄 Paper name: Transformer
🧪 Evaluation type: ref_based
📁 Target repo directory: ../outputs/Transformer_repo
📊 Evaluation result:
📈 Score: 4.5000
✅ Valid: 8/8
========================================
🌟 Usage Summary 🌟
[Evaluation] Transformer - ref_based
🛠️ Model: o3-mini
📥 Input tokens: 44318 (Cost: $0.04874980)
📦 Cached input tokens: 0 (Cost: $0.00000000)
📤 Output tokens: 26310 (Cost: $0.11576400)
💵 Current total cost: $0.16451380
🪙 Accumulated total cost so far: $0.16451380
============================================
```
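The costs in the usage summary are straightforward token-rate arithmetic. The sketch below assumes o3-mini's published pricing of $1.10 per 1M input tokens and $4.40 per 1M output tokens (the rates are an assumption; verify against OpenAI's current price list), which reproduces the figures above:

```python
# Assumed o3-mini rates in dollars per 1M tokens (check OpenAI's price list)
INPUT_RATE_PER_M = 1.10
OUTPUT_RATE_PER_M = 4.40


def token_cost(tokens: int, rate_per_million: float) -> float:
    """Dollar cost of a token count at a per-million-token rate."""
    return tokens * rate_per_million / 1_000_000


# Token counts taken from the example usage summary above
input_cost = token_cost(44318, INPUT_RATE_PER_M)
output_cost = token_cost(26310, OUTPUT_RATE_PER_M)
print(f"${input_cost + output_cost:.8f}")  # matches the $0.16451380 total above
```

Cached input tokens are billed at a discounted rate, which is why they are reported separately in the summary.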