Building Foundation Models for Human Behavior Simulation

We are building human simulators: foundation models that imitate how people think, feel, decide, and act across interactive scenarios. The next foundation model should not just answer humans, but simulate human-side behavior with realistic social grounding.

This repository is the training and evaluation codebase for the OdysSim project. It contains the full pipeline for behavioral foundation models, from midtraining on large-scale human-behavior data to task-specific RL, verbal-feedback post-training, expert distillation, and SOUL evaluation.

Ditto: Reinforcing Human Behavior Simulation via Verbal Feedback
Doc · Paper · Model · Recipe

OdysSim: Building Foundation Models for Human Behavior Simulation
Paper · Models · Data

Benchmark

Features

Behavioral midtraining / SFT: train base models on large-scale human-behavior corpora such as OdysSim, with social grounding in the prompt format.
Multi-turn, multi-agent RL: built on top of verl, with rollout loops for human-simulation training across interacting agents.
Learning from verbal feedback: efficient support for verbal-feedback RL, forward distillation, and reverse/on-policy distillation from LLM-judge critiques.
Unified SOUL evaluation suite: 20+ human-likeness tasks with training environments.
Unified SFT/RL/evaluation framework: midtraining, post-training, and evaluation share one code path.

News

[2026/06/11] We released OdysSim.

[2026/05/20] We released Ditto.

Models

Model	Link
OSim-8B	OdysSim collection
Ditto-8B	sunweiwei/Ditto-8B

Setup

Note: This repo is built on top of verl v0.7.0, with a patch applied that adds multi-agent RL, on-policy distillation, and several model fixes.

Run inside the official verl 0.7.0 image verlai/verl:vllm012.latest.

Code structure

verl/                              Core RL/SFT training infrastructure
agents/                            Agent rollout loops and task environments
sft/                               SFT and midtraining utilities
recipe/ditto/                      Frozen recipe for the Ditto paper
plot/NeurIPS2026_user_sim_phase3/  OdysSim paper source
data/                              Local data directory

run_sft.sh                         Midtraining / SFT entry
run_rl.sh                          Per-task RL entry: GRPO or verbal-feedback RL
recipe/ditto/eval.sh               Eval-only entry across the SOUL suite
train_sft.py                       SFT trainer
train_ppo.py                       PPO/GRPO trainer

Data

OdysSim release data:

Split	Dataset
Midtraining	`cmu-lti/osim-mid-training`
Post-training	`cmu-lti/osim-post-training`

huggingface-cli download cmu-lti/osim-mid-training  --repo-type dataset --local-dir data/osim_mid_training
huggingface-cli download cmu-lti/osim-post-training --repo-type dataset --local-dir data/osim_post_training

Ditto / legacy task data used by the current run_rl.sh and recipe/ditto/eval.sh scripts:

Split	Dataset
RL Train	`sunweiwei/sim-rl-data`
Eval	`sunweiwei/sim-eval-data`

huggingface-cli download sunweiwei/sim-rl-data   --repo-type dataset --local-dir data/sim_rl_data
huggingface-cli download sunweiwei/sim-eval-data --repo-type dataset --local-dir data/sim_eval_data

Each task has its own train and validation parquet files.

Midtraining / SFT

run_sft.sh is the entry point for SFT-style training and OdysSim midtraining. By default it follows the paper setup: Qwen3-8B base, 16K-token prompts, 8K-token responses, batch size 1024, peak LR 1e-5, 4500 training steps, and lazy loading for the full OdysSim corpus.
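
As a rough sanity check on the default budget, the arithmetic below works out the maximum sequence length and an upper bound on tokens seen. It is a sketch under two assumptions that the README does not state: "16K" and "8K" mean 16,384 and 8,192 tokens, and the batch size counts sequences rather than tokens. Real token counts will be lower, since most samples do not fill the full window.

```python
# Defaults quoted above; the K = 1024 reading is an assumption.
PROMPT_TOKENS = 16 * 1024
RESPONSE_TOKENS = 8 * 1024
BATCH_SIZE = 1024        # assumed to count sequences per step
TRAINING_STEPS = 4500

# Longest sequence a single sample can occupy (prompt + response).
max_sequence_tokens = PROMPT_TOKENS + RESPONSE_TOKENS

# Number of sequences consumed over the whole run.
total_sequences = BATCH_SIZE * TRAINING_STEPS

# Upper bound on tokens processed if every sample filled the window.
token_upper_bound = total_sequences * max_sequence_tokens

print(max_sequence_tokens)   # 24576
print(total_sequences)       # 4608000
print(token_upper_bound)     # 113246208000
```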

After downloading cmu-lti/osim-mid-training into data/osim_mid_training, the script auto-detects the train and validation shards. For a custom layout, pass explicit globs through TRAIN_FILES and VAL_FILES.

# Default: DATA_DIR=data/osim_mid_training
bash run_sft.sh

# Explicit shard layout
TRAIN_FILES="data/osim_mid_training/train_shard_*.parquet" \
VAL_FILES="data/osim_mid_training/val_shard_*.parquet" \
bash run_sft.sh
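
Before launching a long run with a custom layout, it can help to confirm that both globs actually match files. The helper below is a minimal sketch using Python's standard glob module; the function names resolve_shards and check_layout are hypothetical and not part of this repo, and the check says nothing about how run_sft.sh itself expands the patterns.

```python
import glob


def resolve_shards(pattern):
    """Expand a glob pattern into a sorted list of matching shard paths."""
    return sorted(glob.glob(pattern))


def check_layout(train_pattern, val_pattern):
    """Return (train, val) shard lists, raising if either pattern matches nothing."""
    train = resolve_shards(train_pattern)
    val = resolve_shards(val_pattern)
    if not train:
        raise FileNotFoundError(f"no training shards match {train_pattern!r}")
    if not val:
        raise FileNotFoundError(f"no validation shards match {val_pattern!r}")
    return train, val
```

Calling check_layout("data/osim_mid_training/train_shard_*.parquet", "data/osim_mid_training/val_shard_*.parquet") before bash run_sft.sh fails fast on a mistyped path instead of partway through job startup.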

Common overrides:

DATA_DIR=data/osim_mid_training \
ACTOR_MODEL_PATH=Qwen/Qwen3-8B \
EXPERIMENT_NAME=osim-8b-mid \
N_GPUS=8 \
TOTAL_TRAINING_STEPS=4500 \
bash run_sft.sh

Optional RL-style generative evaluation during SFT is disabled by default. To enable it, set RL_TEST_FILES and a positive RL_TEST_FREQ.
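
The enabling rule can be written as a small predicate. This is only a sketch of the rule as stated in this README (both variables set, frequency positive); the function name is hypothetical, and the actual check inside run_sft.sh may differ.

```python
import os


def generative_eval_enabled(env=None):
    """Return True when RL_TEST_FILES is non-empty and RL_TEST_FREQ is a positive integer."""
    env = os.environ if env is None else env
    files = env.get("RL_TEST_FILES", "").strip()
    try:
        freq = int(env.get("RL_TEST_FREQ", "0"))
    except ValueError:
        # A non-numeric frequency is treated as "disabled" in this sketch.
        return False
    return bool(files) and freq > 0


if __name__ == "__main__":
    print("generative eval enabled:", generative_eval_enabled())
```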

RL Post-training

Post-training is per task. The agent_version setting in run_rl.sh selects the objective:

default = vanilla GRPO
copy = verbal-feedback RL, as used by Ditto

When verbal-feedback RL is enabled, the training loop calls an OpenAI-compatible judge model for verbal critiques and rewrites, so set the API environment variables first:

export OPENAI_API_KEY=...
export OPENAI_BASE_URL=https://api.openai.com/v1/
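
A missing judge credential is easier to catch before the job starts than in the middle of a rollout. The check below is a minimal sketch that only looks at the two variables named above; the helper name missing_judge_env is hypothetical and not part of this repo.

```python
import os

# The two variables this README asks you to export for the judge model.
REQUIRED_JUDGE_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL")


def missing_judge_env(env=None):
    """Return the names of required judge variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_JUDGE_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_judge_env()
    if missing:
        print("unset judge variables:", ", ".join(missing))
    else:
        print("judge variables are set")
```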

Run one task:

# Top-level script defaults to vanilla GRPO.
bash run_rl.sh sotopia

# Ditto recipe defaults to verbal-feedback RL.
bash recipe/ditto/run_rl.sh sotopia

Supported tasks: sotopia, coser, lifechoices, userllm, mirrorbench, fantom, hitom, paratomi, mistakes, twinvoice, social_r1, behaviorchain, sim_math, sim_doc, humanual_{book,chat,email,news,opinion,politics}, alignx, socsci210, humanllm.
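
The humanual_{...} entry is shorthand for six separate task names, so the list above covers 23 tasks in total. The sketch below expands the shorthand and builds the per-task command shown earlier; the helper names are hypothetical, and it assumes each task is launched as bash run_rl.sh <task> exactly as in the sotopia example.

```python
BASE_TASKS = [
    "sotopia", "coser", "lifechoices", "userllm", "mirrorbench", "fantom",
    "hitom", "paratomi", "mistakes", "twinvoice", "social_r1",
    "behaviorchain", "sim_math", "sim_doc",
]
HUMANUAL_DOMAINS = ["book", "chat", "email", "news", "opinion", "politics"]
TRAILING_TASKS = ["alignx", "socsci210", "humanllm"]


def supported_tasks():
    """Return the full task list with the humanual_{...} shorthand expanded."""
    humanual = [f"humanual_{domain}" for domain in HUMANUAL_DOMAINS]
    return BASE_TASKS + humanual + TRAILING_TASKS


def rl_command(task, script="run_rl.sh"):
    """Build the argv for one RL run, rejecting task names not in the list."""
    if task not in supported_tasks():
        raise ValueError(f"unsupported task: {task!r}")
    return ["bash", script, task]


if __name__ == "__main__":
    # Print the commands rather than running them; each one is a full RL job.
    for name in supported_tasks():
        print(" ".join(rl_command(name)))
```

Passing script="recipe/ditto/run_rl.sh" builds the Ditto-recipe variant of the same command.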

Evaluation

recipe/ditto/eval.sh runs the full 27-task SOUL evaluation suite in two modes: local for a checkpoint or open-source HF model via vLLM, and api for any OpenAI-compatible endpoint.

# Eval the released Ditto-8B checkpoint
bash recipe/ditto/eval.sh local

# Eval your own trained checkpoint
ACTOR_MODEL_PATH=outputs/ditto-rl-sotopia/global_step_200 \
bash recipe/ditto/eval.sh local

# Eval an open-source HF model
ACTOR_MODEL_PATH=Qwen/Qwen3-8B-Instruct \
bash recipe/ditto/eval.sh local

# Eval an API model
OPENAI_AGENT_MODEL=gpt-5.4-mini \
OPENAI_AGENT_BASE_URL=https://api.openai.com/v1/ \
OPENAI_AGENT_API_KEY=$OPENAI_API_KEY \
bash recipe/ditto/eval.sh api

# Eval a local vLLM / SGLang server through an OpenAI-compatible endpoint
OPENAI_AGENT_MODEL=Qwen3-8B-Instruct \
OPENAI_AGENT_BASE_URL=http://localhost:8000/v1/ \
OPENAI_AGENT_API_KEY=EMPTY \
bash recipe/ditto/eval.sh api
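
In api mode, a wrong base URL or model name tends to surface only after the suite has started. One way to check first is to list the models the endpoint serves. The sketch below uses only the standard library and assumes the server implements the OpenAI-style GET /models route, which the OpenAI API and vLLM's OpenAI-compatible server both provide; the function names are hypothetical and not part of eval.sh.

```python
import json
import urllib.request


def models_endpoint(base_url):
    """Build the model-listing URL from a base URL with or without a trailing slash."""
    return base_url.rstrip("/") + "/models"


def served_model_ids(base_url, api_key, timeout=10.0):
    """Query the endpoint and return the ids of the models it reports."""
    request = urllib.request.Request(
        models_endpoint(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-style responses wrap the model list in a "data" array.
    return [item["id"] for item in payload.get("data", [])]
```

For the local server example above, served_model_ids("http://localhost:8000/v1/", "EMPTY") should include the value you pass as OPENAI_AGENT_MODEL.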

Citation

@article{zhou2026odyssim,
  title  = {OdysSim: Building Foundation Models for Human Behavior Simulation},
  author = {Zhou, Xuhui and Sun, Weiwei and Du, Weihua and Liu, Jiarui and Sun, Haojia and Ma, Qianou and Wu, Tongshuang and Yang, Yiming and Sap, Maarten},
  year   = {2026}
}

@article{sun2026ditto,
  title         = {Reinforcing Human Behavior Simulation via Verbal Feedback},
  author        = {Sun, Weiwei and Zhou, Xuhui and Liu, Jiarui and Du, Weihua and Sun, Haojia and Xie, Yiqing and Ma, Qianou and Chen, Sihao and Wan, Mengting and Yang, Longqi and Zhou, Pei and Wu, Sherry and Welleck, Sean and Neubig, Graham and Yang, Yiming and Sap, Maarten},
  year          = {2026},
  eprint        = {2605.20506},
  archivePrefix = {arXiv},
  url           = {http://arxiv.org/abs/2605.20506}
}
