PaperCoder is a multi-agent LLM system that transforms a scientific paper into a code repository.
It follows a three-stage pipeline of planning, analysis, and code generation, with each stage handled by specialized agents.
Our method outperforms strong baselines on both the Paper2Code and PaperBench benchmarks and produces faithful, high-quality implementations.
- ⚡ Quick Start
- 📚 Detailed Setup Instructions
- 📦 Paper2Code Benchmark Datasets
- 📊 Model-based Evaluation of Repositories
- Note: The following command runs the example paper (Attention Is All You Need).
- 💵 Estimated cost for using o3-mini: $0.50–$0.70
```bash
pip install openai
export OPENAI_API_KEY="<OPENAI_API_KEY>"
cd scripts
bash run.sh
```

- If you encounter any issues installing vLLM, please refer to the official vLLM repository.
- The default model is `deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct`.

```bash
pip install vllm
cd scripts
bash run_llm.sh
```

- PaperCoder now supports any LLM provider available through LiteLLM.
- Configure your model settings in a `.env` file in the project root directory (see `.env.example`).
- Supports standard LiteLLM provider syntax, including:
  - AWS Bedrock (`bedrock/model-name`): requires boto3
  - OpenAI (`openai/model-name`): uses o3-mini by default
  - Anthropic (`anthropic/model-name`): direct API access
Choose ONE of the following provider configurations in your `.env` file:

```bash
# AWS Bedrock
AWS_REGION=<your-region>
BEDROCK_MODEL=<model-name>  # e.g., anthropic.claude-3-sonnet-20240229-v1:0
DISABLE_PROMPT_CACHING=0
AWS_SHARED_CREDENTIALS_FILE=~/.aws/credentials
AWS_CONFIG_FILE=~/.aws/config

# OpenAI
OPENAI_API_KEY=<your-openai-api-key>
OPENAI_MODEL=o3-mini  # Default if not specified

# Anthropic
ANTHROPIC_API_KEY=<your-anthropic-api-key>
ANTHROPIC_MODEL=claude-3-sonnet-20240229  # Default if not specified
```
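As a rough illustration of how a `.env` like the one above maps onto LiteLLM's `provider/model-name` syntax, here is a minimal sketch. The `load_env` and `litellm_model` helpers are hypothetical, not PaperCoder's actual code (real projects typically use `python-dotenv` for this):

```python
from pathlib import Path


def load_env(path: str) -> dict:
    """Minimal .env parser: KEY=VALUE lines, '#' starts a comment."""
    cfg = {}
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if "=" in line:
            key, value = line.split("=", 1)
            cfg[key.strip()] = value.strip()
    return cfg


def litellm_model(cfg: dict) -> str:
    """Map the configured provider to a LiteLLM model string."""
    if "BEDROCK_MODEL" in cfg:
        return f"bedrock/{cfg['BEDROCK_MODEL']}"
    if "ANTHROPIC_API_KEY" in cfg:
        return f"anthropic/{cfg.get('ANTHROPIC_MODEL', 'claude-3-sonnet-20240229')}"
    # Fall back to OpenAI with the documented o3-mini default
    return f"openai/{cfg.get('OPENAI_MODEL', 'o3-mini')}"
```

Since you configure exactly ONE provider, the first matching key decides the provider prefix.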
```bash
# Install LiteLLM
pip install litellm

# For provider-specific dependencies:
# - AWS Bedrock requires boto3
pip install boto3

# Copy and modify the example .env file
cp .env.example .env
# Edit the .env file with your provider configuration

# Run the scripts; they will use LiteLLM if configured, or fall back to vLLM
cd scripts
bash run_llm.sh
```

The generated output is organized as follows:

```
outputs
├── Transformer
│   ├── analyzing_artifacts
│   ├── coding_artifacts
│   └── planning_artifacts
└── Transformer_repo   # Final output repository
```

- 💡 To use the `o3-mini` version, make sure you have the latest `openai` package installed.
- 📦 Install only what you need:
  - For OpenAI API: `openai`
  - For open-source models: `vllm` (if you encounter any issues installing vLLM, please refer to the official vLLM repository)
  - For other LLM providers (like AWS Bedrock): `litellm` (check the LiteLLM documentation for supported models and configurations)

```bash
pip install openai
pip install vllm
pip install litellm
```

- Or, if you prefer, you can install all dependencies using `pip`:

```bash
pip install -r requirements.txt
```

The following process describes how to convert a paper PDF into JSON format.
If you have access to the LaTeX source and plan to use it with PaperCoder, you may skip this step and proceed to 🚀 Running PaperCoder.
Note: In our experiments, we converted all paper PDFs to JSON format.
- Clone the `s2orc-doc2json` repository to convert your PDF file into a structured JSON format.
  (For detailed configuration, please refer to the official repository.)

```bash
git clone https://github.com/allenai/s2orc-doc2json.git
```

- Run the PDF processing service.

```bash
cd ./s2orc-doc2json/grobid-0.7.3
./gradlew run
```

- Convert your PDF into JSON format.

```bash
mkdir -p ./s2orc-doc2json/output_dir/paper_coder
python ./s2orc-doc2json/doc2json/grobid2json/process_pdf.py \
    -i ${PDF_PATH} \
    -t ./s2orc-doc2json/temp_dir/ \
    -o ./s2orc-doc2json/output_dir/paper_coder
```

- Note: The following command runs the example paper (Attention Is All You Need).
  If you want to run PaperCoder on your own paper, please modify the environment variables accordingly.
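Before running PaperCoder, it can be worth sanity-checking the converted JSON. The sketch below assumes the s2orc-doc2json output layout as we understand it (a top-level `title` and body paragraphs under `pdf_parse` -> `body_text`, each carrying a `section` name); the `summarize` helper is illustrative only, so adjust the field names if your converter version differs:

```python
import json


def summarize(paper: dict) -> tuple[str, list[str]]:
    """Return the paper title and the ordered, de-duplicated section names."""
    title = paper.get("title", "<missing>")
    sections: list[str] = []
    for para in paper.get("pdf_parse", {}).get("body_text", []):
        name = para.get("section")
        if name and name not in sections:
            sections.append(name)
    return title, sections


if __name__ == "__main__":
    # Placeholder path: point this at the JSON produced by process_pdf.py
    with open("./s2orc-doc2json/output_dir/paper_coder/paper.json") as f:
        print(summarize(json.load(f)))
```

If the title comes back empty or the section list is very short, the GROBID parse likely failed and is worth re-running.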
- 💵 Estimated cost for using o3-mini: $0.50–$0.70
```bash
# Using the PDF-based JSON format of the paper
export OPENAI_API_KEY="<OPENAI_API_KEY>"
cd scripts
bash run.sh
```

```bash
# Using the LaTeX source of the paper
export OPENAI_API_KEY="<OPENAI_API_KEY>"
cd scripts
bash run_latex.sh
```

- The default model is `deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct` (using vLLM).
- For LiteLLM integration, create a `.env` file in the project root with your provider configuration.
```bash
# Using the PDF-based JSON format of the paper
cd scripts
bash run_llm.sh
```

```bash
# Using the LaTeX source of the paper
cd scripts
bash run_latex_llm.sh
```

```bash
# Example .env configuration (AWS Bedrock with Claude)
AWS_REGION=eu-north-1
BEDROCK_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
DISABLE_PROMPT_CACHING=0
AWS_SHARED_CREDENTIALS_FILE=~/.aws/credentials
AWS_CONFIG_FILE=~/.aws/config

# Or for OpenAI
# OPENAI_API_KEY=your-api-key
# OPENAI_MODEL=o3-mini

# Or for Anthropic Direct API
# ANTHROPIC_API_KEY=your-api-key
# ANTHROPIC_MODEL=claude-3-sonnet-20240229
```
- Huggingface dataset: paper2code
- You can find the description of the Paper2Code benchmark dataset in `data/paper2code`.
- For more details, refer to Section 4.1 "Paper2Code Benchmark" in the paper.
- We evaluate repository quality using a model-based approach, supporting both reference-based and reference-free settings.
  The model critiques key implementation components, assigns severity levels, and generates a 1–5 correctness score averaged over 8 samples using o3-mini-high.
- For more details, please refer to Section 4.3.1 (Paper2Code Benchmark) of the paper.
- Note: The following examples evaluate the sample repository (`Transformer_repo`).
  Please modify the relevant paths and arguments if you wish to evaluate a different repository.
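The averaging over sampled judgments can be sketched as follows. The per-sample record format (`score`, `valid`) is an assumption for illustration and does not reflect `eval.py`'s actual internals; it simply mirrors the "Score" and "Valid: n/8" fields in the evaluation summary:

```python
def aggregate_scores(samples: list[dict]) -> tuple[float, int]:
    """Average the 1-5 correctness scores over the valid samples.

    Each sample is assumed (hypothetically) to look like
    {"score": 4, "valid": True}; invalid samples are excluded
    from the mean, as suggested by the "Valid: n/8" counter.
    """
    valid = [s["score"] for s in samples if s.get("valid")]
    if not valid:
        raise ValueError("no valid samples to average")
    return sum(valid) / len(valid), len(valid)
```

With 8 valid samples scoring alternately 5 and 4, this yields the 4.5000 shown in the example summary below.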
```bash
pip install tiktoken
export OPENAI_API_KEY="<OPENAI_API_KEY>"
```

- `target_repo_dir` is the generated repository.
```bash
cd codes/
python eval.py \
    --paper_name Transformer \
    --pdf_json_path ../examples/Transformer_cleaned.json \
    --data_dir ../data \
    --output_dir ../outputs/Transformer \
    --target_repo_dir ../outputs/Transformer_repo \
    --eval_result_dir ../results \
    --eval_type ref_free \
    --generated_n 8 \
    --papercoder
```

- `target_repo_dir` is the generated repository.
- `gold_repo_dir` should point to the official repository (e.g., author-released code).
```bash
cd codes/
python eval.py \
    --paper_name Transformer \
    --pdf_json_path ../examples/Transformer_cleaned.json \
    --data_dir ../data \
    --output_dir ../outputs/Transformer \
    --target_repo_dir ../outputs/Transformer_repo \
    --gold_repo_dir ../examples/Transformer_gold_repo \
    --eval_result_dir ../results \
    --eval_type ref_based \
    --generated_n 8 \
    --papercoder
```

Example output:

```
========================================
🌟 Evaluation Summary 🌟
📄 Paper name: Transformer
🧪 Evaluation type: ref_based
📁 Target repo directory: ../outputs/Transformer_repo
📊 Evaluation result:
📈 Score: 4.5000
✅ Valid: 8/8
========================================
🌟 Usage Summary 🌟
[Evaluation] Transformer - ref_based
🛠️ Model: o3-mini
📥 Input tokens: 44318 (Cost: $0.04874980)
📦 Cached input tokens: 0 (Cost: $0.00000000)
📤 Output tokens: 26310 (Cost: $0.11576400)
💵 Current total cost: $0.16451380
🪙 Accumulated total cost so far: $0.16451380
============================================
```
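The costs in the usage summary are straightforward token-rate arithmetic. The sketch below assumes o3-mini's published pricing of $1.10 per 1M input tokens and $4.40 per 1M output tokens (the rates are an assumption; verify against OpenAI's current price list), which reproduces the figures above:

```python
# Assumed o3-mini rates in dollars per 1M tokens (check OpenAI's price list)
INPUT_RATE_PER_M = 1.10
OUTPUT_RATE_PER_M = 4.40


def token_cost(tokens: int, rate_per_million: float) -> float:
    """Dollar cost of a token count at a per-million-token rate."""
    return tokens * rate_per_million / 1_000_000


# Token counts taken from the example usage summary above
input_cost = token_cost(44318, INPUT_RATE_PER_M)
output_cost = token_cost(26310, OUTPUT_RATE_PER_M)
print(f"${input_cost + output_cost:.8f}")  # matches the $0.16451380 total above
```

Cached input tokens are billed at a discounted rate, which is why they are reported separately in the summary.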