This includes the experiments answering research questions RQ1, RQ2, and RQ3 from the paper:
- RQ1. Model performance comparison on synthetic and manual data: How do the manual and synthetic training data in SWE-Synth influence model performance when the training data is controlled to have either (a) the same total number of variants or (b) the same total number of trajectories?
- RQ2. Synthetic data scaling: How does increasing the number of synthetic training instances affect model performance?
- RQ3. Human study: How well can human subjects distinguish SWE-Synth's results from real-world, manually collected bugs?
Run create_swegym_logs.sh to create the SWE-Gym-logs dataset for the experiments.
bash swesynth/experiments/swegym_comparison/setup/create_swegym_logs.sh
We have released the resulting SWE-Gym logs dataset at the following link: https://huggingface.co/datasets/swesynth/SWE-Gym-logs
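If you prefer to reuse the released dataset rather than regenerate it, a download along the following lines may work. This is a minimal sketch, assuming the `huggingface-cli` tool from the `huggingface_hub` package is installed. The destination directory `data/SWE-Gym-logs` is a hypothetical path, not one the experiment scripts are known to expect, and `DRY_RUN` is a variable introduced here for illustration.

```shell
set -eu

REPO_ID="swesynth/SWE-Gym-logs"
# Hypothetical destination; adjust to wherever the experiment scripts read the logs from.
DEST_DIR="${DEST_DIR:-data/SWE-Gym-logs}"

# Print the download command when DRY_RUN=1 (the default in this sketch),
# or execute it when DRY_RUN=0.
fetch_logs() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "huggingface-cli download ${REPO_ID} --repo-type dataset --local-dir ${DEST_DIR}"
    else
        huggingface-cli download "${REPO_ID}" --repo-type dataset --local-dir "${DEST_DIR}"
    fi
}

fetch_logs
```

Run it with `DRY_RUN=0` to perform the download once the printed command looks right.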
After creating the dataset, install llama-factory and moatless as described in the setup instructions:
bash swesynth/lib/llama_factory/install-llama-factory.sh
bash swesynth/lib/moatless/install-moatless-fork.sh
From the created SWE-Gym logs data, run the following commands to roll out trajectories on the SWE-Gym Lite logs and then cap the instances:
bash swesynth/experiments/swegym_comparison/same_instances/rollout_swegym_lite.sh
python -m swesynth.experiments.swegym_comparison.same_instances.cap_instances
Then run the following command to collect the trajectories from SWE-Synth with the same total number of variants:
python -m swesynth.experiments.swegym_comparison.same_instances.sample_variants
Then modify llama-factory's dataset_info.json so that it registers the datasets produced above, and run the following commands to train the models with different per-instance capping settings, using the config files in the ./same_instances folder:
conda activate swesynth-llama-factory
llamafactory-cli train swesynth/experiments/swegym_comparison/same_instances/swegym-lite-cap1.yaml
llamafactory-cli train swesynth/experiments/swegym_comparison/same_instances/swegym-lite-cap2.yaml
llamafactory-cli train swesynth/experiments/swegym_comparison/same_instances/swegym-lite-cap3.yaml
llamafactory-cli train swesynth/experiments/swegym_comparison/same_instances/swegym-mutant-cap1.yaml
llamafactory-cli train swesynth/experiments/swegym_comparison/same_instances/swegym-mutant-cap2.yaml
llamafactory-cli train swesynth/experiments/swegym_comparison/same_instances/swegym-mutant-cap3.yaml
From the created SWE-Gym logs data, run the following commands to roll out trajectories on the SWE-Gym Lite logs and then cap the trajectories:
bash swesynth/experiments/swegym_comparison/same_trajectories/rollout_swegym_lite.sh
python -m swesynth.experiments.swegym_comparison.same_trajectories.cap_traj
Then run the following command to collect the trajectories from SWE-Synth with the same total number of trajectories:
python -m swesynth.experiments.swegym_comparison.same_trajectories.sample_traj
Then modify llama-factory's dataset_info.json so that it registers the datasets produced above, and run the following commands to train the models with different settings, using the config files in the ./same_trajectories folder:
llamafactory-cli train swesynth/experiments/swegym_comparison/same_trajectories/mutant-1k.yaml
llamafactory-cli train swesynth/experiments/swegym_comparison/same_trajectories/swegym-lite-cap30.yaml
From the created SWE-Gym logs data, run the following command to roll out and collect trajectories on the SWE-Gym Full logs:
bash swesynth/experiments/swegym_comparison/data_scaling/rollout_swegym_full.sh
Follow the same setup as above to train the remaining experiments, using the config files in the ./data_scaling folder:
llamafactory-cli train swesynth/experiments/swegym_comparison/data_scaling/swegym-full-cap1.yaml
llamafactory-cli train swesynth/experiments/swegym_comparison/data_scaling/swegym-full-cap2.yaml
llamafactory-cli train swesynth/experiments/swegym_comparison/data_scaling/swegym-lite-cap14.yaml
Run the following commands to sample the synthetic and manual data for the human study:
python -m swesynth.experiments.swegym_comparison.human_study.random_real_bug
python -m swesynth.experiments.swegym_comparison.human_study.random_synthetic_bug
Readers are encouraged to take our survey (RQ3), available at the following link: https://survey.swesynth.com
The source code for the human study can be found in the human_study/web folder.
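The RQ1 and RQ2 training commands listed in this section can also be batched into one script. The sketch below only collects the config paths already named in this document. The `DRY_RUN` variable and the `run_training` function are hypothetical helpers introduced here, not part of the repository. It assumes the `swesynth-llama-factory` conda environment is already active and that dataset_info.json has been updated beforehand.

```shell
set -eu

BASE="swesynth/experiments/swegym_comparison"

# Config files exactly as listed in this document, one per line.
CONFIGS="
same_instances/swegym-lite-cap1.yaml
same_instances/swegym-lite-cap2.yaml
same_instances/swegym-lite-cap3.yaml
same_instances/swegym-mutant-cap1.yaml
same_instances/swegym-mutant-cap2.yaml
same_instances/swegym-mutant-cap3.yaml
same_trajectories/mutant-1k.yaml
same_trajectories/swegym-lite-cap30.yaml
data_scaling/swegym-full-cap1.yaml
data_scaling/swegym-full-cap2.yaml
data_scaling/swegym-lite-cap14.yaml
"

# Print each training command when DRY_RUN=1 (the default in this sketch),
# or run the trainings one after another when DRY_RUN=0.
run_training() {
    # CONFIGS is intentionally unquoted so the shell splits it on whitespace.
    for config in ${CONFIGS}; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "llamafactory-cli train ${BASE}/${config}"
        else
            llamafactory-cli train "${BASE}/${config}"
        fi
    done
}

run_training
```

Because of `set -e`, a failed training run stops the batch, so later configs do not start from a broken state.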