This project treats a book recommendation scenario as a multiple-play multi-armed bandit problem:
- There are N books (arms)
- At each round we display 6 books to a random user
- Reward = number of displayed books the user actually purchases (0–6)
- Goal: maximize cumulative (or average) purchases over many interactions
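The setup above can be sketched as a small simulated environment. This is an illustrative sketch only: the number of books, the purchase probabilities, and all names (`step`, `true_probs`, etc.) are assumptions for demonstration, not the project's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

N_BOOKS = 20     # assumed number of arms (books)
SLATE_SIZE = 6   # books displayed per round, as in the problem statement

# Hidden per-book purchase probabilities (unknown to the algorithm).
true_probs = rng.uniform(0.05, 0.35, size=N_BOOKS)

def step(displayed):
    """Simulate one user: return a 0/1 purchase outcome per displayed book."""
    return (rng.random(len(displayed)) < true_probs[displayed]).astype(int)

# One round: display 6 books, observe per-book purchases.
purchases = step(np.arange(SLATE_SIZE))
reward = purchases.sum()  # round reward, an integer in 0..6
```

Returning the full `purchases` vector (rather than only its sum) is what makes per-book credit assignment possible later.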
We implemented a UCB1-style algorithm adapted to select the top 6 books simultaneously, using per-book reward feedback. Key implementation details:
- Modified the environment to return per-book purchase information (essential for correct credit assignment)
- Fixed numerical stability issues in UCB computation (safe handling of unexplored books)
- Used standard UCB1 confidence term with optional tuning of the exploration constant
- Ran both short (10k steps) and long (100k steps) simulations
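The steps above can be sketched end to end. Assumptions are flagged in comments: the arm count, purchase probabilities, and exploration constant `C` are illustrative, and the code shows one standard way to combine UCB1 with top-6 selection and safe handling of unexplored arms, not the project's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
N_BOOKS, SLATE_SIZE = 20, 6   # assumed arm count; slate size from the problem
C = 1.0                       # tunable exploration constant (assumption)

# Hidden per-book purchase probabilities (unknown to the algorithm).
true_probs = rng.uniform(0.05, 0.35, size=N_BOOKS)

counts = np.zeros(N_BOOKS)    # times each book has been displayed
values = np.zeros(N_BOOKS)    # empirical purchase rate per book

T = 10_000                    # the "short" simulation length
total = 0
for t in range(1, T + 1):
    # Safe UCB computation: books never displayed get an infinite score,
    # so each is tried at least once and there is no divide-by-zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        bonus = C * np.sqrt(2 * np.log(t) / counts)
    ucb = np.where(counts == 0, np.inf, values + bonus)

    # Multiple play: display the 6 books with the highest UCB scores.
    slate = np.argsort(ucb)[-SLATE_SIZE:]

    # Per-book feedback: a 0/1 purchase outcome for each displayed book.
    purchases = (rng.random(SLATE_SIZE) < true_probs[slate]).astype(int)
    total += purchases.sum()

    # Credit each displayed book individually (incremental mean update).
    counts[slate] += 1
    values[slate] += (purchases - values[slate]) / counts[slate]

avg_reward = total / T  # average purchases per round, between 0 and 6
```

Updating each displayed book with its own purchase outcome, rather than spreading the slate's total reward evenly, is the credit-assignment step that the environment modification enables.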