A minimal, educational RL environment for Pokemon Red. Built to accompany the blog post "The RL Environment Field Guide".
Left: Naive agent gets stuck. Right: Learning agent remembers blocked paths and finds a way through.
This repo contains:
- `PokemonEnv`: A minimal Gymnasium-compatible RL environment for Pokemon Red
- `pokemon_rl/`: A self-improving RL framework with goals, rewards, pattern detection, and knowledge extraction
- Demo scripts: Create visualizations of agent behavior
- Example agents: Naive, learning, Claude-powered, and reward-hacking agents
It's designed to be educational, not production-ready. Use it to understand how RL environments work, experiment with reward functions, and see reward hacking in action.
```
pokemon-rl-env/
├── run_demo.py                # Quick start script - verify your setup
├── src/
│   ├── environment.py         # Core RL environment (minimal, Gymnasium-compatible)
│   ├── pokemon_env_full.py    # Full-featured environment with goal tracking
│   ├── agents.py              # Example agents (naive, learning, reward hacker)
│   ├── claude_agent.py        # LLM-powered agent using Claude API
│   └── create_demo.py         # GIF demo generation
├── pokemon_rl/                # Self-improving RL framework
│   ├── goals.py               # Goal hierarchy and completion predicates
│   ├── rewards.py             # Goal-conditioned reward functions
│   ├── patterns.py            # Automatic failure/success pattern detection
│   ├── knowledge_extractor.py # Extract learnings from trajectories
│   ├── evaluator.py           # Performance evaluation on scenarios
│   └── orchestrator.py        # Coordinate the learning loop
├── examples/
│   └── basic_usage.py         # Usage examples & tutorials
├── prompts/
│   └── system_prompt.md       # System prompt with learned navigation knowledge
├── data/
│   └── sample_snapshots/      # Sample save states for testing
├── assets/demos/              # Pre-generated demo GIFs
├── requirements.txt
└── README.md
```
```bash
pip install -r requirements.txt
```

You must provide your own ROM file. The easiest way is to build it from source using the pret/pokered disassembly:

```bash
# Clone the disassembly repo
git clone https://github.com/pret/pokered.git
cd pokered

# Install RGBDS (the assembler)
# macOS: brew install rgbds
# Ubuntu: sudo apt install rgbds

# Build the ROM
make

# This creates pokered.gbc - copy it to your project
cp pokered.gbc ../pokemon-rl-env/
```

Alternatively, dump the ROM from a physical cartridge using a GB Operator.
```bash
# Quick verification - runs a learning agent for 200 steps
python run_demo.py --rom pokered.gbc
```

This will automatically use the included sample save states from `data/sample_snapshots/`.
```bash
# Create a comparison GIF of naive vs. learning agents
python src/create_demo.py --rom pokered.gbc --state data/sample_snapshots/snapshot_005000.state --output assets/demos/
```

The repo includes sample save states in `data/sample_snapshots/` for testing:
| File | Location | Pokemon | Description |
|---|---|---|---|
| `snapshot_001000.state` | Player House 2F | None | Very early game |
| `snapshot_005000.state` | Route 1 | Lv13 starter | After leaving Pallet Town |
| `snapshot_045000.state` | Route 1 | Lv19 starter | Trained on Route 1 |
| `snapshot_050000.state` | Route 1 (battle) | Lv19 starter | In a wild battle |
```python
from src.environment import PokemonEnv

# Create environment
env = PokemonEnv("pokered.gbc")
state = env.reset()

# Take actions
for _ in range(100):
    action = env.action_space.sample()  # Random action
    state, reward, done, info = env.step(action)

env.close()
```

The state returned by `reset()` and `step()` is a dictionary of game values:

```python
{
    'x': 5,              # Player X position
    'y': 30,             # Player Y position
    'map_id': 12,        # Current map (12 = Route 1)
    'hp': 45,            # Party HP
    'badges': 0,         # Badges obtained (bitfield)
    'in_battle': False,  # Battle state
}
```

The action space has 8 discrete actions matching the Game Boy buttons:
| Index | Action |
|---|---|
| 0 | A |
| 1 | B |
| 2 | Left |
| 3 | Right |
| 4 | Up |
| 5 | Down |
| 6 | Start |
| 7 | Select |
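The index-to-button mapping above can be mirrored in a small helper. This is illustrative only; the names and function are not part of the repo's API:

```python
# Illustrative only -- mirrors the action table; not the repo's API.
BUTTONS = ["A", "B", "Left", "Right", "Up", "Down", "Start", "Select"]

def action_name(index: int) -> str:
    """Map a discrete action index to its Game Boy button name."""
    if not 0 <= index < len(BUTTONS):
        raise ValueError(f"action index out of range: {index}")
    return BUTTONS[index]
```

For example, `action_name(4)` returns `"Up"`.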
- +1 for visiting a new tile
- +100 for earning a badge
- +10 for reaching a new map
- -0.1 per step (encourages efficiency)
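Taken together, the shaping terms above might combine along these lines. This is a sketch assuming the state dictionary shown earlier; the visited-set bookkeeping and signature are assumptions, not the repo's exact implementation:

```python
def shaped_reward(old_state, new_state, visited_tiles, visited_maps):
    """Sketch of the default shaping terms (not the repo's exact code)."""
    reward = -0.1  # per-step penalty encourages efficiency
    tile = (new_state["map_id"], new_state["x"], new_state["y"])
    if tile not in visited_tiles:                   # +1 for visiting a new tile
        visited_tiles.add(tile)
        reward += 1.0
    if new_state["map_id"] not in visited_maps:     # +10 for reaching a new map
        visited_maps.add(new_state["map_id"])
        reward += 10.0
    if new_state["badges"] > old_state["badges"]:   # +100 for earning a badge
        reward += 100.0
    return reward
```

Tracking *first* visits (rather than rewarding every position change) is what keeps the exploration bonus from being farmable.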
Customize by subclassing:
```python
class MyPokemonEnv(PokemonEnv):
    def compute_reward(self, old_state, new_state):
        reward = 0
        if new_state['badges'] > old_state['badges']:
            reward += 1000  # Big reward for badges!
        return reward
```

The naive agent just goes up, and gets stuck at ledges:

```python
from src.agents import NaiveAgent

agent = NaiveAgent()
action = agent.act(state)  # Always returns 'up'
```

The learning agent remembers blocked positions and explores alternatives:
```python
from src.agents import LearningAgent

agent = LearningAgent()
action = agent.act(state)
agent.learn(old_state, action, new_state)  # Learn from feedback
```

The Claude-powered agent uses the Claude API with a system prompt containing learned navigation knowledge:
```python
from src.claude_agent import ClaudeAgent

agent = ClaudeAgent(system_prompt_path="prompts/system_prompt.md")
action = agent.select_action(state)
```

The reward-hacking agent demonstrates reward hacking by always fleeing from battles:
```python
from src.agents import RewardHackerAgent

agent = RewardHackerAgent()
# Maximizes "survival" reward by running from everything
```

The `pokemon_rl/` module implements a self-improving agent system:
```python
from pokemon_rl import (
    GoalManager,         # Hierarchical goal tracking
    RewardFunction,      # Goal-conditioned rewards
    PatternDetector,     # Detect stuck loops, oscillations
    KnowledgeExtractor,  # Extract learnings from trajectories
    Evaluator,           # Benchmark performance
    Orchestrator,        # Run the complete learning loop
)

# Example: detect patterns in a trajectory
detector = PatternDetector()
patterns = detector.analyze_trajectory("data/traj_001.jsonl")
for p in patterns:
    print(f"{p.type}: {p.description}")
```

Key Pokemon Red memory addresses (from the PRET pokered disassembly):
| Address | Description |
|---|---|
| `0xD35E` | Current map ID |
| `0xD362` | Player X position |
| `0xD361` | Player Y position |
| `0xD057` | Battle type (>0 = in battle) |
| `0xD163` | Party Pokemon count |
| `0xD16C` | First Pokemon HP |
| `0xD356` | Badges (bitfield) |
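For example, the badge byte at `0xD356` packs one badge per bit, with bit 0 being the Boulder Badge in the PRET disassembly's ordering. The decoder below is a sketch; verify the badge-name order against the disassembly before relying on it:

```python
# Badge order assumed from the PRET disassembly (bit 0 = Boulder Badge).
BADGE_NAMES = ["Boulder", "Cascade", "Thunder", "Rainbow",
               "Soul", "Marsh", "Volcano", "Earth"]

def decode_badges(bitfield: int) -> list:
    """Expand the 0xD356 badge bitfield into a list of badge names."""
    return [name for bit, name in enumerate(BADGE_NAMES) if bitfield >> bit & 1]
```

So `decode_badges(0b00000011)` yields `["Boulder", "Cascade"]`, and a badge *count* is just `bin(bitfield).count("1")`.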
This repo demonstrates common reward hacking behaviors:
- Tile Toggling: Agent bounces between tiles to farm exploration rewards
- Heal Farming: Agent damages itself then heals for HP rewards
- Battle Fleeing: Agent runs from all battles to maximize "survival" rewards
See assets/demos/demo_reward_hack_flee.gif for a visual demonstration.
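Tile toggling, for instance, only pays off when the exploration bonus rewards *changing* tiles rather than *newly discovered* tiles. A toy comparison (these simplified reward functions are illustrative, not the repo's):

```python
def per_move_bonus(trajectory):
    """Hackable: +1 whenever the tile differs from the previous one."""
    return sum(1 for prev, cur in zip(trajectory, trajectory[1:]) if cur != prev)

def first_visit_bonus(trajectory):
    """Robust: +1 only the first time each tile is seen."""
    return len(set(trajectory))

toggling = [(0, 0), (0, 1)] * 10         # bounce between two tiles
exploring = [(0, y) for y in range(20)]  # walk 20 distinct tiles

print(per_move_bonus(toggling), per_move_bonus(exploring))        # 19 19
print(first_visit_bonus(toggling), first_visit_bonus(exploring))  # 2 20
```

Under the per-move bonus, bouncing in place scores the same as real exploration; the first-visit bonus removes the exploit.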
- Python 3.8+
- PyBoy (Game Boy emulator)
- Pillow (image processing)
- NumPy
- anthropic (for Claude agent)
- Inspired by Peter Whidden's Pokemon RL
- Memory addresses from PRET disassembly
- Built with PyBoy
MIT License. Note: You must provide your own Pokemon Red ROM.
