theveryhim/UCB-Based-Recommender-System

Book Recommender as a Multiple-Play Multi-Armed Bandit (UCB Solution)

Overview

This project treats a book recommendation scenario as a multiple-play multi-armed bandit problem:

  • There are N books (arms)
  • At each round we display 6 books to a random user
  • Reward = number of displayed books the user actually purchases (0–6)
  • Goal: maximize cumulative (or average) purchases over many interactions

We implemented a UCB1-style algorithm adapted for selecting the top-6 books simultaneously, using individual per-book reward feedback.
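As a minimal sketch of the selection step (function and variable names are illustrative, not from the repository), a multiple-play UCB1 rule scores every book and displays the 6 with the highest upper confidence bounds:

```python
import numpy as np

def ucb_top_k(counts, sums, t, k=6, c=2.0):
    """Pick the k arms (books) with the highest UCB1 scores.

    counts: per-book display counts; sums: per-book cumulative purchases;
    t: current round (1-based); c: exploration constant.
    Books never displayed get an infinite score, so each book is
    explored at least once before the confidence term takes over.
    """
    counts = np.asarray(counts, dtype=float)
    sums = np.asarray(sums, dtype=float)
    # Empirical purchase rate; 0 where a book has never been shown.
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    # UCB1 score: mean + c * sqrt(log(t) / n_i), with inf for unexplored books.
    scores = np.where(
        counts == 0,
        np.inf,
        means + c * np.sqrt(np.log(max(t, 1)) / np.maximum(counts, 1)),
    )
    # Indices of the k highest-scoring books, best first.
    return np.argsort(-scores)[:k]
```

With `k = 1` this reduces to standard UCB1; the multiple-play version simply takes the top-k slice of the same ranking.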

Key Implementation Details

  • Modified the environment to return individual purchase information (essential for proper credit assignment)
  • Fixed numerical stability issues in UCB computation (safe handling of unexplored books)
  • Used standard UCB1 confidence term with optional tuning of the exploration constant
  • Ran both short (10k steps) and long (100k steps) simulations
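The per-book credit assignment mentioned above can be sketched as follows (names are hypothetical): the environment returns a 0/1 purchase indicator for each displayed book, and each book's statistics are updated with its own outcome rather than a share of the summed reward.

```python
def update(counts, sums, shown, purchases):
    """Credit each displayed book with its individual purchase outcome.

    shown: indices of the displayed books; purchases: matching 0/1
    purchase indicators from the environment. Per-book updates keep
    each arm's estimated purchase rate unbiased, which a single summed
    reward (0-6) spread across all six books would not.
    """
    for book, bought in zip(shown, purchases):
        counts[book] += 1
        sums[book] += bought
    return counts, sums
```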

Results

10,000-step run (standard UCB, c = 2)

[Plot: average reward over 10,000 steps]

100,000-step run (tuned UCB, c = 5)

[Plot: average reward over 100,000 steps]
