Pokemon RL Environment

A minimal, educational RL environment for Pokemon Red. Built to accompany the blog post "The RL Environment Field Guide".

Naive vs Learning Agent

Left: Naive agent gets stuck. Right: Learning agent remembers blocked paths and finds a way through.

What's This?

This repo contains:

  • PokemonEnv: A minimal Gymnasium-compatible RL environment for Pokemon Red
  • pokemon_rl/: A self-improving RL framework with goals, rewards, pattern detection, and knowledge extraction
  • Demo scripts: Create visualizations of agent behavior
  • Example agents: Naive, learning, Claude-powered, and reward-hacking agents

It's designed to be educational, not production-ready. Use it to understand how RL environments work, experiment with reward functions, and see reward hacking in action.

Project Structure

pokemon-rl-env/
├── run_demo.py                # Quick start script - verify your setup
├── src/
│   ├── environment.py         # Core RL environment (minimal, Gymnasium-compatible)
│   ├── pokemon_env_full.py    # Full-featured environment with goal tracking
│   ├── agents.py              # Example agents (naive, learning, reward hacker)
│   ├── claude_agent.py        # LLM-powered agent using Claude API
│   └── create_demo.py         # GIF demo generation
├── pokemon_rl/                # Self-improving RL framework
│   ├── goals.py               # Goal hierarchy and completion predicates
│   ├── rewards.py             # Goal-conditioned reward functions
│   ├── patterns.py            # Automatic failure/success pattern detection
│   ├── knowledge_extractor.py # Extract learnings from trajectories
│   ├── evaluator.py           # Performance evaluation on scenarios
│   └── orchestrator.py        # Coordinate the learning loop
├── examples/
│   └── basic_usage.py         # Usage examples & tutorials
├── prompts/
│   └── system_prompt.md       # System prompt with learned navigation knowledge
├── data/
│   └── sample_snapshots/      # Sample save states for testing
├── assets/demos/              # Pre-generated demo GIFs
├── requirements.txt
└── README.md

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Get a Pokemon Red ROM

You must provide your own ROM file. The easiest way is to build from source using the pret/pokered disassembly:

# Clone the disassembly repo
git clone https://github.com/pret/pokered.git
cd pokered

# Install RGBDS (the assembler)
# macOS: brew install rgbds
# Ubuntu: sudo apt install rgbds

# Build the ROM
make

# This creates pokered.gbc - copy it to your project
cp pokered.gbc ../pokemon-rl-env/

Alternatively, dump from a physical cartridge using a GB Operator.

3. Verify Your Setup

# Quick verification - runs a learning agent for 200 steps
python run_demo.py --rom pokered.gbc

This will automatically use the included sample save states from data/sample_snapshots/.

4. Create Demo GIFs (Optional)

# Create comparison GIF of naive vs learning agents
python src/create_demo.py --rom pokered.gbc --state data/sample_snapshots/snapshot_005000.state --output assets/demos/

Sample Save States

The repo includes sample save states in data/sample_snapshots/ for testing:

| File | Location | Pokemon | Description |
|------|----------|---------|-------------|
| snapshot_001000.state | Player House 2F | None | Very early game |
| snapshot_005000.state | Route 1 | Lv13 starter | After leaving Pallet Town |
| snapshot_045000.state | Route 1 | Lv19 starter | Trained on Route 1 |
| snapshot_050000.state | Route 1 (battle) | Lv19 starter | In a wild battle |

The Environment

Basic Usage

from src.environment import PokemonEnv

# Create environment
env = PokemonEnv("pokered.gbc")
state = env.reset()

# Take actions
for _ in range(100):
    action = env.action_space.sample()  # Random action
    state, reward, done, info = env.step(action)

env.close()

State Space

{
    'x': 5,              # Player X position
    'y': 30,             # Player Y position
    'map_id': 12,        # Current map (12 = Route 1)
    'hp': 45,            # Party HP
    'badges': 0,         # Badge count (bitfield)
    'in_battle': False,  # Battle state
}
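A state like this is a plain dict, so agent logic can read it directly. A minimal sketch using the fields above (the values are the example ones, not live game data):

```python
# Sketch: inspect a state dict as returned by env.reset() / env.step()
state = {
    'x': 5, 'y': 30, 'map_id': 12,
    'hp': 45, 'badges': 0, 'in_battle': False,
}

# Branch on battle vs. overworld, and key positions by map
mode = "battle" if state['in_battle'] else "overworld"
position = (state['map_id'], state['x'], state['y'])
print(mode, position)  # overworld (12, 5, 30)
```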

Action Space

8 discrete actions matching Game Boy buttons:

| Index | Action |
|-------|--------|
| 0 | A |
| 1 | B |
| 2 | Left |
| 3 | Right |
| 4 | Up |
| 5 | Down |
| 6 | Start |
| 7 | Select |
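For readability you can map button names to indices yourself. The list below mirrors the table; the name `ACTIONS` is illustrative and not part of the repo's API:

```python
# Button order matching the environment's discrete action space
ACTIONS = ['A', 'B', 'Left', 'Right', 'Up', 'Down', 'Start', 'Select']

def action_index(name: str) -> int:
    """Look up the discrete action index for a button name."""
    return ACTIONS.index(name)

print(action_index('Up'))  # 4
```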

Default Reward Function

  • +1 for visiting a new tile
  • +100 for earning a badge
  • +10 for reaching a new map
  • -0.1 per step (encourages efficiency)
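The default shaping above can be sketched as a standalone function. This is illustrative only; field names follow the state dict shown earlier, and the visited-tile bookkeeping is assumed to live on the environment:

```python
def default_reward(old_state, new_state, visited_tiles, visited_maps):
    """Sketch of the default shaping: exploration + badges - step cost."""
    reward = -0.1  # per-step cost encourages efficiency
    tile = (new_state['map_id'], new_state['x'], new_state['y'])
    if tile not in visited_tiles:
        visited_tiles.add(tile)
        reward += 1  # new tile
    if new_state['map_id'] not in visited_maps:
        visited_maps.add(new_state['map_id'])
        reward += 10  # new map
    if new_state['badges'] > old_state['badges']:
        reward += 100  # new badge
    return reward
```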

Customize by subclassing:

class MyPokemonEnv(PokemonEnv):
    def compute_reward(self, old_state, new_state):
        reward = 0
        if new_state['badges'] > old_state['badges']:
            reward += 1000  # Big reward for badges!
        return reward

Example Agents

Naive Agent

Just goes up. Gets stuck at ledges.

from src.agents import NaiveAgent
agent = NaiveAgent()
action = agent.act(state)  # Always returns 'up'

Learning Agent

Remembers blocked positions and explores alternatives.

from src.agents import LearningAgent
agent = LearningAgent()
action = agent.act(state)
agent.learn(old_state, action, new_state)  # Learn from feedback
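The core idea can be sketched in a few lines (a toy version, not the repo's implementation): if a directional press didn't change the position, remember that (position, action) pair as blocked and avoid it next time.

```python
import random

class TinyLearningAgent:
    """Toy agent: avoids (position, action) pairs known to be blocked."""
    MOVES = [2, 3, 4, 5]  # Left, Right, Up, Down

    def __init__(self):
        self.blocked = set()

    def act(self, state):
        pos = (state['map_id'], state['x'], state['y'])
        options = [a for a in self.MOVES if (pos, a) not in self.blocked]
        return random.choice(options or self.MOVES)

    def learn(self, old_state, action, new_state):
        # No movement after a directional press -> mark it as blocked here
        if (old_state['x'], old_state['y']) == (new_state['x'], new_state['y']):
            pos = (old_state['map_id'], old_state['x'], old_state['y'])
            self.blocked.add((pos, action))
```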

Claude Agent (LLM-Powered)

Uses the Claude API with a system prompt containing learned navigation knowledge.

from src.claude_agent import ClaudeAgent

agent = ClaudeAgent(system_prompt_path="prompts/system_prompt.md")
action = agent.select_action(state)

Reward Hacker Agent

Demonstrates reward hacking by always fleeing from battles.

from src.agents import RewardHackerAgent
agent = RewardHackerAgent()
# Maximizes "survival" reward by running from everything

The Self-Improving Framework

The pokemon_rl/ module implements a self-improving agent system:

from pokemon_rl import (
    GoalManager,         # Hierarchical goal tracking
    RewardFunction,      # Goal-conditioned rewards
    PatternDetector,     # Detect stuck loops, oscillations
    KnowledgeExtractor,  # Extract learnings from trajectories
    Evaluator,           # Benchmark performance
    Orchestrator,        # Run the complete learning loop
)

# Example: Detect patterns in a trajectory
detector = PatternDetector()
patterns = detector.analyze_trajectory("data/traj_001.jsonl")
for p in patterns:
    print(f"{p.type}: {p.description}")
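Stuck-loop detection of this kind can be sketched over a plain list of positions (a toy version; the real `PatternDetector` works on trajectory files):

```python
from collections import Counter

def find_stuck_positions(positions, threshold=5):
    """Flag positions visited more than `threshold` times (likely a loop)."""
    counts = Counter(positions)
    return [pos for pos, n in counts.items() if n > threshold]

# An agent oscillating between two tiles on Route 1
trajectory = [(12, 5, 30), (12, 5, 31)] * 6 + [(12, 5, 32)]
print(find_stuck_positions(trajectory))  # [(12, 5, 30), (12, 5, 31)]
```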

Memory Addresses

Key Pokemon Red memory addresses (from PRET pokered):

| Address | Description |
|---------|-------------|
| 0xD35E | Current map ID |
| 0xD362 | Player X position |
| 0xD361 | Player Y position |
| 0xD057 | Battle type (>0 = in battle) |
| 0xD163 | Party Pokemon count |
| 0xD16C | First Pokemon HP |
| 0xD356 | Badges (bitfield) |
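The badges byte is a bitfield with one bit per gym, so counting set bits gives the badge count. A small sketch (the address constant matches the table above; the function name is illustrative):

```python
BADGES_ADDR = 0xD356  # badges bitfield, one bit per gym

def count_badges(badges_byte: int) -> int:
    """Count the set bits in the badges bitfield."""
    return bin(badges_byte & 0xFF).count('1')

print(count_badges(0b00000101))  # 2 (first and third badges)
```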

Reward Hacking Examples

This repo demonstrates common reward hacking behaviors:

  1. Tile Toggling: Agent bounces between tiles to farm exploration rewards
  2. Heal Farming: Agent damages itself then heals for HP rewards
  3. Battle Fleeing: Agent runs from all battles to maximize "survival" rewards
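Tile toggling, for example, exploits a reward that pays for any position change rather than for genuinely new tiles (a toy illustration of the buggy shaping, not the repo's default reward):

```python
def naive_exploration_reward(old_pos, new_pos):
    """Buggy shaping: +1 whenever the position changes at all."""
    return 1.0 if new_pos != old_pos else 0.0

# Bouncing between two adjacent tiles farms the reward forever
path = [(5, 30), (5, 31)] * 5
total = sum(naive_exploration_reward(a, b) for a, b in zip(path, path[1:]))
print(total)  # 9.0 earned without exploring anything new
```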

See assets/demos/demo_reward_hack_flee.gif for a visual demonstration.

Requirements

  • Python 3.8+
  • PyBoy (Game Boy emulator)
  • Pillow (image processing)
  • NumPy
  • anthropic (for Claude agent)

License

MIT License. Note: You must provide your own Pokemon Red ROM.
