Deep Deterministic Policy Gradient (DDPG) with Prioritized Experience Replay

Authors: Luca Iezzi and Giulia Ciabatti.

This is a complete reimplementation of DDPG with Prioritized Experience Replay, adapted to the Pendulum-v1 and MountainCarContinuous-v0 environments from OpenAI Gym.

Agent playing

Usage

This implementation is based on Python 3.8 and PyTorch Lightning. To install all the requirements:

$ pip install -r requirements.txt

Hyperparameters

PER_BUFFER

hyperparameter	value
WARM_POPULATE	10000
ALPHA	0.6
BETA	0.1
PRIORITIZED_REPLAY_ALPHA	0.6
PRIORITIZED_REPLAY_BETA0	0.4
PRIORITIZED_REPLAY_BETA_ITERS	None
PRIORITIZED_REPLAY_EPS	1e-6
BATCH_SIZE	64
Latent size	64
EPISODES	150

ACTOR

hyperparameter	value
OU_NOISE_STD	0.8
OPTIMIZER	Adam
LEARNING RATE	1e-4
GAMMA	0.99
TAU	5e-3

CRITIC

hyperparameter	value
OPTIMIZER	Adam
LEARNING RATE	5e-4
GAMMA	0.99
TAU	5e-3
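
To make the role of some of these hyperparameters more concrete, here is a minimal, framework-free sketch of the standard prioritized-replay formulas and of the soft target update commonly used in DDPG. It is not taken from this repo: the function names are hypothetical, plain Python lists stand in for tensors, and it assumes that the alpha, beta, eps and TAU values play their usual roles (priority exponent, importance-sampling exponent, priority offset, and target-network mixing factor). The table lists both ALPHA/BETA and PRIORITIZED_REPLAY_ALPHA/PRIORITIZED_REPLAY_BETA0; which pair the code actually reads is not stated here, so the defaults below are only illustrative.

```python
# Illustrative sketch only: hypothetical helper names, not the repo's API.

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha,
    with priority p_i = |td_error_i| + eps."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights w_i = (N * P(i))^(-beta),
    normalized by the largest weight so that max(w) == 1."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    max_w = max(weights)
    return [w / max_w for w in weights]


def soft_update(target_params, online_params, tau=5e-3):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]


if __name__ == "__main__":
    td_errors = [0.1, 2.0, 0.5]  # made-up TD errors for three transitions
    probs = per_probabilities(td_errors)
    print(probs)                      # larger |TD error| -> higher probability
    print(importance_weights(probs))  # rarely sampled items get larger weights
    print(soft_update([0.0, 1.0], [1.0, 1.0]))
```

With alpha=0 the sampling becomes uniform, and with beta=1 the weights fully compensate for the non-uniform sampling; a small TAU such as 5e-3 makes the target networks track the online networks slowly.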

Results

Running

The complete pipeline to train and test the agent:

1. Train the agent

In simple_config.py, set ENV to the Gym environment you want to train on, set TRAIN=True, and run:

$ python main.py

2. Test the agent

In main.py, assign the path of one of the checkpoints in ckpt/ to the variable model, then set TRAIN=False and RENDER=True in simple_config.py and run:

$ python main.py
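
As a rough illustration of the two steps above, the relevant settings in simple_config.py might look like the following. This is only a sketch: the names ENV, TRAIN and RENDER come from the instructions above, but the exact layout of the file, the value types, and any other settings it contains are assumptions.

```python
# Hypothetical excerpt of simple_config.py (layout and types are assumed).

# Gym environment id; the two environments mentioned in this README are
# "Pendulum-v1" and "MountainCarContinuous-v0".
ENV = "Pendulum-v1"

# Step 1 (training): TRAIN = True.
# Step 2 (testing):  TRAIN = False and RENDER = True, after pointing the
#                    `model` variable in main.py at a checkpoint from ckpt/.
TRAIN = True
RENDER = False
```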

Credits

Some of the implementations (e.g. SumTree and MinTree) were taken from existing repos, but it has been impossible to trace the original author :( . If you recognize your code there, don't hesitate to drop me an email and I will add your repo to the credits!
