ebennettp/q-learning_taxi-v3

Overview

An implementation of Q-learning with automated hyperparameter optimization. It is built on Gymnasium's Taxi-v3 environment and uses Optuna to find the set of hyperparameters that maximizes reward. Q-table values are updated using the Bellman equation. The project can serve as a template for other environments.
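As a rough illustration of the update mentioned above, the sketch below applies the standard tabular Q-learning rule, Q(s, a) += lr * (reward + gamma * max Q(s', .) - Q(s, a)). It is not taken from the repository: the function name `q_update` and the list-of-lists table layout are assumptions made for this example.

```python
def q_update(q_table, state, action, reward, next_state, terminated, lr, gamma):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Do not bootstrap from the next state once the episode has terminated.
    best_next = 0.0 if terminated else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += lr * (td_target - q_table[state][action])
    return q_table[state][action]


# Taxi-v3 has 500 discrete states and 6 actions.
N_STATES, N_ACTIONS = 500, 6
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

# One update from an all-zero table: 0 + 0.5 * (-1 + 0.9 * 0 - 0) = -0.5
new_value = q_update(
    q_table, state=0, action=1, reward=-1.0, next_state=2,
    terminated=False, lr=0.5, gamma=0.9,
)
```

In a training loop, the same call would run once per environment step, with `state`, `reward`, and `next_state` coming from the environment.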

Scripts (entry points)

optimize.py: Wraps the agent's training to measure reward across multiple trials, each with different hyperparameter values, in order to find the best-performing set. The parameter values that yield the best result are then saved to the local directory. The number of trials and the range for each hyperparameter are specified in the configuration file (see the Configuration section for details).
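The search described above might look roughly like the following sketch. It uses Optuna's `create_study`, `study.optimize`, and `trial.suggest_float`, but everything else is an assumption: `train_and_score` is a synthetic stand-in for real training, and the ranges, the `run_study` helper, and the `best_params.json` file name are hypothetical and do not come from the repository.

```python
import json


def train_and_score(lr, gamma, epsilon_decay):
    """Hypothetical stand-in for training an agent and returning its mean reward.

    A real objective would train on Taxi-v3. This synthetic score peaks at
    lr=0.1, gamma=0.95, epsilon_decay=0.99 so that the sketch runs instantly.
    """
    return -((lr - 0.1) ** 2 + (gamma - 0.95) ** 2 + (epsilon_decay - 0.99) ** 2)


def objective(trial):
    """Sample one hyperparameter set and return the score to maximize."""
    # The ranges below are illustrative, not the repository's configured ranges.
    lr = trial.suggest_float("lr", 1e-3, 1.0, log=True)
    gamma = trial.suggest_float("gamma", 0.8, 0.999)
    epsilon_decay = trial.suggest_float("epsilon_decay", 0.9, 0.9999)
    return train_and_score(lr, gamma, epsilon_decay)


def run_study(n_trials=50, out_path="best_params.json"):
    """Run the search and store the best parameters as JSON."""
    # Imported here so `objective` can be exercised without Optuna installed.
    import optuna

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    with open(out_path, "w") as f:
        json.dump(study.best_params, f, indent=2)
    return study.best_params
```

Calling `run_study(n_trials=20)` would run twenty trials and write the best parameter set to the hypothetical `best_params.json`.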

use.py: Trains and evaluates the Q-learning agent. The training stage reads the stored hyperparameters, trains a new agent, and saves the Q-table. The evaluation stage measures the average reward over multiple episodes. Training and evaluation can be run separately using script arguments.

Configuration

The configuration is loaded from a JSON file, by default ./config.json, which contains the settings for training, evaluation, and hyperparameter optimization. See config.json for an example.
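Loading such a file could be as simple as the sketch below. The `load_config` helper and the `DEFAULT_CONFIG_PATH` constant are assumptions for illustration; the repository's actual loader and the keys inside config.json are not shown here.

```python
import json
from pathlib import Path

# Default location stated in the README; the constant name is hypothetical.
DEFAULT_CONFIG_PATH = "./config.json"


def load_config(path=DEFAULT_CONFIG_PATH):
    """Read the JSON configuration file and return its contents as a dict."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"Configuration file not found: {config_path}")
    with config_path.open() as f:
        return json.load(f)
```

Passing a different path, for example `load_config("experiments/alt.json")`, would override the default location.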

Hyperparameters

lr: Learning rate for the Q-learning algorithm.
gamma: Discount factor for future rewards.
epsilon: Exploration rate for the epsilon-greedy policy.
epsilon_decay: Decay rate for the exploration rate.
epsilon_min: Minimum exploration rate.
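To show how the three exploration parameters interact, here is a minimal epsilon-greedy sketch. It assumes a multiplicative decay applied once per episode, which is a common choice but is not confirmed by the source; the function names `choose_action` and `decay_epsilon` are hypothetical.

```python
import random


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, epsilon_decay, epsilon_min):
    """Multiplicative decay (assumed), clamped so it never drops below epsilon_min."""
    return max(epsilon_min, epsilon * epsilon_decay)


# Start fully exploratory and decay once per episode.
epsilon = 1.0
for episode in range(1000):
    epsilon = decay_epsilon(epsilon, epsilon_decay=0.99, epsilon_min=0.05)
```

With these illustrative values, epsilon reaches the floor of 0.05 well before the last episode, so late training is mostly greedy with a small amount of residual exploration.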
