Pokemon RL Environment

A minimal, educational RL environment for Pokemon Red. Built to accompany the blog post "The RL Environment Field Guide".

Naive vs Learning Agent

Left: Naive agent gets stuck. Right: Learning agent remembers blocked paths and finds a way through.

What's This?

This repo contains:

  • PokemonEnv: A minimal Gymnasium-compatible RL environment for Pokemon Red
  • pokemon_rl/: A self-improving RL framework with goals, rewards, pattern detection, and knowledge extraction
  • Demo scripts: Create visualizations of agent behavior
  • Example agents: Naive, learning, Claude-powered, and reward-hacking agents

It's designed to be educational, not production-ready. Use it to understand how RL environments work, experiment with reward functions, and see reward hacking in action.

Project Structure

pokemon-rl-env/
├── run_demo.py                # Quick start script - verify your setup
├── src/
│   ├── environment.py         # Core RL environment (minimal, Gymnasium-compatible)
│   ├── pokemon_env_full.py    # Full-featured environment with goal tracking
│   ├── agents.py              # Example agents (naive, learning, reward hacker)
│   ├── claude_agent.py        # LLM-powered agent using Claude API
│   └── create_demo.py         # GIF demo generation
├── pokemon_rl/                # Self-improving RL framework
│   ├── goals.py               # Goal hierarchy and completion predicates
│   ├── rewards.py             # Goal-conditioned reward functions
│   ├── patterns.py            # Automatic failure/success pattern detection
│   ├── knowledge_extractor.py # Extract learnings from trajectories
│   ├── evaluator.py           # Performance evaluation on scenarios
│   └── orchestrator.py        # Coordinate the learning loop
├── examples/
│   └── basic_usage.py         # Usage examples & tutorials
├── prompts/
│   └── system_prompt.md       # System prompt with learned navigation knowledge
├── data/
│   └── sample_snapshots/      # Sample save states for testing
├── assets/demos/              # Pre-generated demo GIFs
├── requirements.txt
└── README.md

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Get a Pokemon Red ROM

You must provide your own ROM file. The easiest way is to build from source using the pret/pokered disassembly:

# Clone the disassembly repo
git clone https://github.com/pret/pokered.git
cd pokered

# Install RGBDS (the assembler)
# macOS: brew install rgbds
# Ubuntu: sudo apt install rgbds

# Build the ROM
make

# This creates pokered.gbc - copy it to your project
cp pokered.gbc ../pokemon-rl-env/

Alternatively, dump from a physical cartridge using a GB Operator.

3. Verify Your Setup

# Quick verification - runs a learning agent for 200 steps
python run_demo.py --rom pokered.gbc

This will automatically use the included sample save states from data/sample_snapshots/.

4. Create Demo GIFs (Optional)

# Create comparison GIF of naive vs learning agents
python src/create_demo.py --rom pokered.gbc --state data/sample_snapshots/snapshot_005000.state --output assets/demos/

Sample Save States

The repo includes sample save states in data/sample_snapshots/ for testing:

| File | Location | Pokemon | Description |
|------|----------|---------|-------------|
| snapshot_001000.state | Player House 2F | None | Very early game |
| snapshot_005000.state | Route 1 | Lv13 starter | After leaving Pallet Town |
| snapshot_045000.state | Route 1 | Lv19 starter | Trained on Route 1 |
| snapshot_050000.state | Route 1 (battle) | Lv19 starter | In a wild battle |

The Environment

Basic Usage

from src.environment import PokemonEnv

# Create environment
env = PokemonEnv("pokered.gbc")
state = env.reset()

# Take actions
for _ in range(100):
    action = env.action_space.sample()  # Random action
    state, reward, done, info = env.step(action)

env.close()

State Space

{
    'x': 5,              # Player X position
    'y': 30,             # Player Y position
    'map_id': 12,        # Current map (12 = Route 1)
    'hp': 45,            # Party HP
    'badges': 0,         # Badge count (bitfield)
    'in_battle': False,  # Battle state
}
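A state like this is a plain dict, so agent logic can read it directly. A minimal sketch using the fields above (the values are the example ones, not live game data):

```python
# Sketch: inspect a state dict as returned by env.reset() / env.step()
state = {
    'x': 5, 'y': 30, 'map_id': 12,
    'hp': 45, 'badges': 0, 'in_battle': False,
}

# Branch on battle vs. overworld, and key positions by map
mode = "battle" if state['in_battle'] else "overworld"
position = (state['map_id'], state['x'], state['y'])
print(mode, position)  # overworld (12, 5, 30)
```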

Action Space

8 discrete actions matching Game Boy buttons:

| Index | Action |
|-------|--------|
| 0 | A |
| 1 | B |
| 2 | Left |
| 3 | Right |
| 4 | Up |
| 5 | Down |
| 6 | Start |
| 7 | Select |
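For readability you can map button names to indices yourself. The list below mirrors the table; the name `ACTIONS` is illustrative and not part of the repo's API:

```python
# Button order matching the environment's discrete action space
ACTIONS = ['A', 'B', 'Left', 'Right', 'Up', 'Down', 'Start', 'Select']

def action_index(name: str) -> int:
    """Look up the discrete action index for a button name."""
    return ACTIONS.index(name)

print(action_index('Up'))  # 4
```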

Default Reward Function

  • +1 for visiting a new tile
  • +100 for earning a badge
  • +10 for reaching a new map
  • -0.1 per step (encourages efficiency)
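The default shaping above can be sketched as a standalone function. This is illustrative only; field names follow the state dict shown earlier, and the visited-tile bookkeeping is assumed to live on the environment:

```python
def default_reward(old_state, new_state, visited_tiles, visited_maps):
    """Sketch of the default shaping: exploration + badges - step cost."""
    reward = -0.1  # per-step cost encourages efficiency
    tile = (new_state['map_id'], new_state['x'], new_state['y'])
    if tile not in visited_tiles:
        visited_tiles.add(tile)
        reward += 1  # new tile
    if new_state['map_id'] not in visited_maps:
        visited_maps.add(new_state['map_id'])
        reward += 10  # new map
    if new_state['badges'] > old_state['badges']:
        reward += 100  # new badge
    return reward
```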

Customize by subclassing:

class MyPokemonEnv(PokemonEnv):
    def compute_reward(self, old_state, new_state):
        reward = 0
        if new_state['badges'] > old_state['badges']:
            reward += 1000  # Big reward for badges!
        return reward

Example Agents

Naive Agent

Just goes up. Gets stuck at ledges.

from src.agents import NaiveAgent
agent = NaiveAgent()
action = agent.act(state)  # Always returns 'up'

Learning Agent

Remembers blocked positions and explores alternatives.

from src.agents import LearningAgent
agent = LearningAgent()
action = agent.act(state)
agent.learn(old_state, action, new_state)  # Learn from feedback
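The core idea can be sketched in a few lines (a toy version, not the repo's implementation): if a directional press didn't change the position, remember that (position, action) pair as blocked and avoid it next time.

```python
import random

class TinyLearningAgent:
    """Toy agent: avoids (position, action) pairs known to be blocked."""
    MOVES = [2, 3, 4, 5]  # Left, Right, Up, Down

    def __init__(self):
        self.blocked = set()

    def act(self, state):
        pos = (state['map_id'], state['x'], state['y'])
        options = [a for a in self.MOVES if (pos, a) not in self.blocked]
        return random.choice(options or self.MOVES)

    def learn(self, old_state, action, new_state):
        # No movement after a directional press -> mark it as blocked here
        if (old_state['x'], old_state['y']) == (new_state['x'], new_state['y']):
            pos = (old_state['map_id'], old_state['x'], old_state['y'])
            self.blocked.add((pos, action))
```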

Claude Agent (LLM-Powered)

Uses the Claude API with a system prompt containing learned navigation knowledge.

from src.claude_agent import ClaudeAgent

agent = ClaudeAgent(system_prompt_path="prompts/system_prompt.md")
action = agent.select_action(state)

Reward Hacker Agent

Demonstrates reward hacking by always fleeing from battles.

from src.agents import RewardHackerAgent
agent = RewardHackerAgent()
# Maximizes "survival" reward by running from everything

The Self-Improving Framework

The pokemon_rl/ module implements a self-improving agent system:

from pokemon_rl import (
    GoalManager,         # Hierarchical goal tracking
    RewardFunction,      # Goal-conditioned rewards
    PatternDetector,     # Detect stuck loops, oscillations
    KnowledgeExtractor,  # Extract learnings from trajectories
    Evaluator,           # Benchmark performance
    Orchestrator,        # Run the complete learning loop
)

# Example: Detect patterns in a trajectory
detector = PatternDetector()
patterns = detector.analyze_trajectory("data/traj_001.jsonl")
for p in patterns:
    print(f"{p.type}: {p.description}")
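Stuck-loop detection of this kind can be sketched over a plain list of positions (a toy version; the real `PatternDetector` works on trajectory files):

```python
from collections import Counter

def find_stuck_positions(positions, threshold=5):
    """Flag positions visited more than `threshold` times (likely a loop)."""
    counts = Counter(positions)
    return [pos for pos, n in counts.items() if n > threshold]

# An agent oscillating between two tiles on Route 1
trajectory = [(12, 5, 30), (12, 5, 31)] * 6 + [(12, 5, 32)]
print(find_stuck_positions(trajectory))  # [(12, 5, 30), (12, 5, 31)]
```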

Memory Addresses

Key Pokemon Red memory addresses (from PRET pokered):

| Address | Description |
|---------|-------------|
| 0xD35E | Current map ID |
| 0xD362 | Player X position |
| 0xD361 | Player Y position |
| 0xD057 | Battle type (>0 = in battle) |
| 0xD163 | Party Pokemon count |
| 0xD16C | First Pokemon HP |
| 0xD356 | Badges (bitfield) |
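The badges byte is a bitfield with one bit per gym, so counting set bits gives the badge count. A small sketch (the address constant matches the table above; the function name is illustrative):

```python
BADGES_ADDR = 0xD356  # badges bitfield, one bit per gym

def count_badges(badges_byte: int) -> int:
    """Count the set bits in the badges bitfield."""
    return bin(badges_byte & 0xFF).count('1')

print(count_badges(0b00000101))  # 2 (first and third badges)
```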

Reward Hacking Examples

This repo demonstrates common reward hacking behaviors:

  1. Tile Toggling: Agent bounces between tiles to farm exploration rewards
  2. Heal Farming: Agent damages itself then heals for HP rewards
  3. Battle Fleeing: Agent runs from all battles to maximize "survival" rewards
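Tile toggling, for example, exploits a reward that pays for any position change rather than for genuinely new tiles (a toy illustration of the buggy shaping, not the repo's default reward):

```python
def naive_exploration_reward(old_pos, new_pos):
    """Buggy shaping: +1 whenever the position changes at all."""
    return 1.0 if new_pos != old_pos else 0.0

# Bouncing between two adjacent tiles farms the reward forever
path = [(5, 30), (5, 31)] * 5
total = sum(naive_exploration_reward(a, b) for a, b in zip(path, path[1:]))
print(total)  # 9.0 earned without exploring anything new
```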

See assets/demos/demo_reward_hack_flee.gif for a visual demonstration.

Requirements

  • Python 3.8+
  • PyBoy (Game Boy emulator)
  • Pillow (image processing)
  • NumPy
  • anthropic (for Claude agent)

License

MIT License. Note: You must provide your own Pokemon Red ROM.
