
Benchmarking imitation

The src/imitation/scripts/config/tuned_hps directory provides the tuned hyperparameter configs for benchmarking imitation. For v0.4.0, these correspond to the hyperparameters used in the paper "imitation: Clean Imitation Learning Implementations".

Configuration files can be loaded either from the CLI or from the Python API.

Single benchmark

To run a single benchmark from the command line:

python -m imitation.scripts.<train_script> <algo> with <algo>_<env>

train_script can be either (1) train_imitation, with algo set to bc or dagger, or (2) train_adversarial, with algo set to gail or airl. env can be one of seals_ant, seals_half_cheetah, seals_hopper, seals_swimmer, or seals_walker. Hyperparameters for other environments have not been tuned yet; you may still get reasonable performance by reusing hyperparameters tuned for a similar environment, or you can tune them yourself with the tuning script (see Tuning Hyperparameters below).
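
For example, to train GAIL on seals_half_cheetah with its tuned hyperparameters (an illustrative instantiation of the template above; any supported algo/env pair works the same way):

python -m imitation.scripts.train_adversarial gail with gail_seals_half_cheetah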

To view the results:

python -m imitation.scripts.analyze analyze_imitation with \
    source_dir_str="output/sacred" table_verbosity=0  \
    csv_output_path=results.csv \
    run_name="<name>"

To run a single benchmark from Python, pass the named config to the corresponding Sacred experiment:

...
from imitation.scripts.<train_script> import <train_ex>
<train_ex>.run(command_name="<algo>", named_configs=["<algo>_<env>"])
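
For instance, instantiating the template for GAIL on seals_half_cheetah (this assumes the Sacred experiment object exported by train_adversarial is named train_adversarial_ex, following the <train_ex> placeholder; check the script for the exact name):

from imitation.scripts.train_adversarial import train_adversarial_ex

# Train GAIL with the hyperparameters tuned for seals_half_cheetah.
run = train_adversarial_ex.run(
    command_name="gail", named_configs=["gail_seals_half_cheetah"]
)
print(run.result)  # Sacred stores the command's return value on the Run object.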

Entire benchmark suite

Running locally

To generate the commands to run the entire benchmarking suite with multiple random seeds:

python experiments/commands.py \
  --name=<name> \
  --cfg_pattern "benchmarking/example_*.json" \
  --seeds 0 1 2 \
  --output_dir=output

To run those commands in parallel:

python experiments/commands.py \
  --name=<name> \
  --cfg_pattern "benchmarking/example_*.json" \
  --seeds 0 1 2 \
  --output_dir=output | parallel -j 8

(On macOS you may need to install GNU parallel first, e.g. with brew install parallel.)

Running on Hofvarpnir

To generate the commands for the Hofvarpnir cluster:

python experiments/commands.py \
  --name=<name> \
  --cfg_pattern "benchmarking/example_*.json" \
  --seeds 0 1 2 \
  --output_dir=/data/output \
  --remote

To run those commands, pipe them into bash:

python experiments/commands.py \
  --name=<name> \
  --cfg_pattern "benchmarking/example_*.json" \
  --seeds 0 1 2 \
  --output_dir=/data/output \
  --remote | bash

Results

To produce a table with all the results:

python -m imitation.scripts.analyze analyze_imitation with \
    source_dir_str="output/sacred" table_verbosity=0  \
    csv_output_path=results.csv \
    run_name="<name>"

To compute a p-value testing whether the differences from the results reported in the paper are statistically significant:

python -m imitation.scripts.compare_to_baseline results.csv

Tuning Hyperparameters

The hyperparameters of any algorithm in imitation can be tuned using src/imitation/scripts/tuning.py. The benchmarking hyperparameter configs above were generated by this script, using the search spaces defined in scripts/config/tuning.py.

The tuning script proceeds in two phases:

  1. Tune the hyperparameters using the search space provided.
  2. Re-evaluate the best config found in the first phase (the one with the maximum mean return) on a separate set of seeds, and report the mean and standard deviation of those trials.
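
Conceptually, the second phase boils down to something like the following sketch (not the actual tuning.py code; the trials structure and evaluate function are hypothetical stand-ins):

import statistics

def reevaluate_best(trials, evaluate, eval_seeds):
    # Phase 1 output: each trial pairs a hyperparameter config with the
    # returns observed while tuning; pick the config with the highest mean return.
    best = max(trials, key=lambda t: statistics.mean(t["returns"]))
    # Phase 2: re-run that config on a separate set of seeds.
    returns = [evaluate(best["config"], seed=s) for s in eval_seeds]
    # Report the mean and standard deviation of these re-evaluation trials.
    return statistics.mean(returns), statistics.stdev(returns)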

To use it with the default search space:

python -m imitation.scripts.tuning with <algo> 'parallel_run_config.base_named_configs=["<env>"]'

In this command:

  • <algo> provides the default search space and settings for the specific algorithm, as defined in scripts/config/tuning.py.
  • <env> sets the environment to tune the algorithm in. Environment named configs are defined in the algo-specific scripts/config/train_[adversarial|imitation|preference_comparisons|rl].py files. For the already tuned environments, use the <algo>_<env> named configs here, as in the example below.
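
For example, to tune GAIL starting from the config already tuned for seals_half_cheetah (illustrative; substitute any supported algorithm and environment):

python -m imitation.scripts.tuning with gail 'parallel_run_config.base_named_configs=["gail_seals_half_cheetah"]'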

See the documentation in scripts/tuning.py and scripts/parallel.py for the many other command-line arguments that control tuning behavior.