An implementation of Q-learning with automated hyperparameter optimization. It is built on Gymnasium's Taxi-v3 environment and uses Optuna to find the set of hyperparameters that maximizes reward. The Q-table values are updated using the Bellman equation. The project can be used as a template for other environments.
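The Bellman-based update mentioned above can be sketched as follows. This is a minimal illustration of the standard tabular Q-learning rule, not necessarily the exact code in this repository; the function name `q_update` and the sample transition are assumptions made for the example.

```python
def q_update(q_table, state, action, reward, next_state, done, lr, gamma):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Do not bootstrap from the next state once the episode has terminated.
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += lr * (td_target - q_table[state][action])
    return q_table[state][action]


# Taxi-v3 has 500 discrete states and 6 actions.
n_states, n_actions = 500, 6
q_table = [[0.0] * n_actions for _ in range(n_states)]

# Hypothetical transition: a step penalty of -1 moving from state 0 to state 2.
new_value = q_update(q_table, state=0, action=1, reward=-1.0,
                     next_state=2, done=False, lr=0.1, gamma=0.9)
```

With an all-zero table, the updated entry moves a fraction `lr` of the way toward the reward, so larger learning rates overwrite old estimates faster.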
optimize.py: Wraps the agent's training to measure reward across multiple trials, each with different hyperparameter values, to efficiently find the best set. The values that yield the best results are then stored in the local directory. The number of trials and the ranges for the hyperparameters are specified in the configuration file (for more details, see the Configuration section).
use.py: Trains and evaluates the Q-learning agent. The training stage reads the stored hyperparameters, trains a new agent, and saves the Q-table. The evaluation stage measures the average reward over multiple episodes. Training and evaluation can be run separately using script arguments.
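One way to select stages from the command line is sketched below with `argparse`. The flag names `--train`, `--evaluate`, and `--config`, and the choice to run both stages when neither flag is given, are assumptions for illustration; check the script itself for the actual arguments.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Train and/or evaluate a Q-learning agent.")
    # Hypothetical flags; the real script may name them differently.
    parser.add_argument("--train", action="store_true", help="run the training stage")
    parser.add_argument("--evaluate", action="store_true", help="run the evaluation stage")
    parser.add_argument("--config", default="./config.json", help="path to the JSON configuration file")
    return parser


def select_stages(args):
    """Return the stages to run; with no stage flag given, run both (assumed default)."""
    if not (args.train or args.evaluate):
        return ["train", "evaluate"]
    return [name for name, enabled in (("train", args.train), ("evaluate", args.evaluate)) if enabled]


# Parse an explicit argument list here so the example does not depend on sys.argv.
args = build_parser().parse_args(["--evaluate"])
stages = select_stages(args)
```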
The configuration is loaded from a JSON file, by default ./config.json, which contains the settings for training, evaluation, and hyperparameter optimization. See config.json for an example.
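Loading such a file could be done roughly as follows. The helper name `load_config` and every key in the example dictionary are hypothetical and only show one plausible layout; the real keys are the ones in the repository's config.json.

```python
import json
import tempfile
from pathlib import Path


def load_config(path="./config.json"):
    """Read the JSON configuration file and return it as a dictionary."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"Configuration file not found: {config_path}")
    with config_path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical layout with one block per stage; the key names are assumptions.
example = {
    "optimization": {"n_trials": 50, "lr": [0.01, 1.0]},
    "training": {"episodes": 10000},
    "evaluation": {"episodes": 100},
}

# Write the example to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps(example, indent=2), encoding="utf-8")
    config = load_config(path)
```

Failing early with a clear error when the file is missing tends to be easier to debug than a `KeyError` later in training.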
lr: Learning rate for the Q-learning algorithm.
gamma: Discount factor for future rewards.
epsilon: Exploration rate for the epsilon-greedy policy.
epsilon_decay: Decay rate for the exploration rate.
epsilon_min: Minimum exploration rate.
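To show how the three exploration parameters above typically interact, here is a small sketch of an epsilon-greedy policy with a decaying exploration rate. It assumes a multiplicative decay applied once per episode and clipped at `epsilon_min`; the project may use a different schedule, and the function names are illustrative.

```python
import random


def choose_action(q_values, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def decay_epsilon(epsilon, epsilon_decay, epsilon_min):
    """Multiplicative decay clipped at epsilon_min (an assumed schedule)."""
    return max(epsilon_min, epsilon * epsilon_decay)


rng = random.Random(0)
epsilon = 1.0
for _ in range(1000):  # one decay step per episode
    epsilon = decay_epsilon(epsilon, epsilon_decay=0.99, epsilon_min=0.05)

# With epsilon set to 0 the policy is purely greedy and picks the largest Q-value.
greedy_action = choose_action([0.0, 2.5, 1.0], epsilon=0.0, rng=rng)
```

Under this schedule the agent starts out fully exploratory and settles at the `epsilon_min` floor once the decayed value drops below it.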