An AI system that uses Large Language Models (LLMs) to play point-and-click adventure games on Linux. The system captures game screenshots, analyzes them using various LLMs (local or remote), and performs actions based on the analysis.
- Multi-LLM Support: Works with local Ollama models, OpenAI, Anthropic, and Hugging Face models
- Dynamic Window Selection: Automatically detects and targets game windows
- Live Status Display: Real-time visualization of the AI's analysis and actions
- Context Memory System: Maintains game state and strategy across iterations
- Session Logging: Saves screenshots and LLM responses for analysis
- Grid-Based Navigation: Uses a numbered cell system for precise interaction
- Long-term Strategy Development: Updates game context, map, and objectives every 10 iterations
- Twitch Chat Integration: Allows viewers to provide hints and suggestions through chat
- Real-time Chat Monitoring: Displays chat messages and user suggestions in a dedicated window
- User Command Execution: Processes and executes valid commands from chat users
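The README does not say how the game window is selected, but `xdotool` is listed among the requirements, and it can both let the user pick a window by clicking it and report that window's geometry. The following is a minimal sketch under that assumption, not the project's code. `select_game_window` and `parse_geometry` are hypothetical helpers; the `xdotool selectwindow` and `xdotool getwindowgeometry --shell` subcommands are real.

```python
import subprocess
from typing import Dict


def parse_geometry(shell_output: str) -> Dict[str, int]:
    """Parse `xdotool getwindowgeometry --shell` output (KEY=VALUE lines) into ints."""
    geometry: Dict[str, int] = {}
    for line in shell_output.strip().splitlines():
        key, _, value = line.partition("=")
        # X/Y can be negative on some multi-monitor layouts.
        if value.lstrip("-").isdigit():
            geometry[key] = int(value)
    return geometry


def select_game_window() -> Dict[str, int]:
    """Hypothetical helper: ask the user to click the game window, return its geometry."""
    # `xdotool selectwindow` blocks until a window is clicked and prints its id.
    window_id = subprocess.run(
        ["xdotool", "selectwindow"], capture_output=True, text=True, check=True
    ).stdout.strip()
    output = subprocess.run(
        ["xdotool", "getwindowgeometry", "--shell", window_id],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_geometry(output)
```

The returned `WIDTH` and `HEIGHT` are what a screenshot region or a click-coordinate grid would need to be sized against.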
- Replaced pixel-based coordinates with a numbered cell grid system:
  - Makes it easier for LLMs to understand and interact with the game
  - More accurate than pixel counting for click actions
  - Based on the GridGPT approach
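A numbered grid reduces a click to a single integer that has to be mapped back to a pixel. The README does not give the grid size or numbering order, so this sketch assumes cells numbered from 1, left to right and top to bottom, over a `cols` by `rows` grid covering the game window. `cell_to_pixel` and `pixel_to_cell` are hypothetical names.

```python
from typing import Tuple


def cell_to_pixel(cell: int, width: int, height: int, cols: int, rows: int) -> Tuple[int, int]:
    """Return the pixel at the centre of a 1-based, row-major grid cell (assumed layout)."""
    if not 1 <= cell <= cols * rows:
        raise ValueError(f"cell {cell} outside 1..{cols * rows}")
    row, col = divmod(cell - 1, cols)
    cell_w, cell_h = width / cols, height / rows
    return int((col + 0.5) * cell_w), int((row + 0.5) * cell_h)


def pixel_to_cell(x: int, y: int, width: int, height: int, cols: int, rows: int) -> int:
    """Return the 1-based cell number containing pixel (x, y) (assumed layout)."""
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError(f"pixel ({x}, {y}) outside {width}x{height} window")
    col = x * cols // width
    row = y * rows // height
    return row * cols + col + 1
```

For a 640 by 400 window split into a 16 by 10 grid, cell 1 maps to (20, 20) and cell 17, the first cell of the second row, maps to (20, 60). The resulting window-relative pixel could then be clicked with something like `xdotool mousemove --window <id> X Y click 1`; whether `play.py` clicks this way is not stated.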
Every 10 iterations, the system pauses to update three key components:
- Game Map:
  - Tracks discovered rooms and their connections
  - Maintains a persistent map of the game world
  - Updates based on the last 10 screen descriptions
- Game Objectives:
  - Maintains a prioritized list of goals
  - Tracks completed and active objectives
  - Includes discovered clues and hints
- Game Context:
  - Summarizes recent actions
  - Reduces repetition in future actions
  - Improves action variation
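The periodic refresh of the map, objectives, and context could be structured along these lines. This is a minimal sketch, not the project's implementation: `ContextMemory`, `record`, and the `summarize` callback (which would wrap whichever LLM backend is selected) are hypothetical names, and the prompt format is an assumption. It keeps only the last 10 screen descriptions, as the README describes, and rewrites all three components once every 10 iterations.

```python
from collections import deque
from typing import Callable, Deque, Dict

UPDATE_INTERVAL = 10  # the README states updates happen every 10 iterations


class ContextMemory:
    """Hypothetical container for long-term state refreshed every `interval` iterations."""

    def __init__(self, summarize: Callable[[str, str], str], interval: int = UPDATE_INTERVAL):
        self.summarize = summarize  # (component name, prompt) -> new text, e.g. an LLM call
        self.interval = interval
        self.recent: Deque[str] = deque(maxlen=interval)  # only the last N descriptions
        self.state: Dict[str, str] = {"map": "", "objectives": "", "context": ""}
        self.iteration = 0

    def record(self, screen_description: str) -> bool:
        """Store one screen description; return True if this call triggered a refresh."""
        self.recent.append(screen_description)
        self.iteration += 1
        if self.iteration % self.interval != 0:
            return False
        history = "\n".join(self.recent)
        for key in self.state:
            # Each component is rewritten from its previous value plus recent screens.
            prompt = (f"Previous {key}:\n{self.state[key]}\n\n"
                      f"Last {len(self.recent)} screens:\n{history}")
            self.state[key] = self.summarize(key, prompt)
        return True
```

Passing the previous value back into each prompt is what lets the map and objectives persist across refreshes while the raw screen history stays bounded.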
To ensure that dialogue and other important on-screen text are captured, the system:
- Uses an extended game screen text duration
- Takes snapshots every 3 seconds
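Taking a snapshot every 3 seconds is a fixed-rate loop. The sketch below is an assumption about how such a loop might look, not the project's code: `capture_periodically` is a hypothetical helper, and `capture` stands in for whatever screenshot function the script actually uses. Each deadline is computed from the start time, so a slow capture does not make the schedule drift; `clock` and `sleep` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, List

SNAPSHOT_INTERVAL = 3.0  # seconds, as stated in the README


def capture_periodically(capture: Callable[[], object], count: int,
                         interval: float = SNAPSHOT_INTERVAL,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> List[object]:
    """Call `capture` `count` times, with the i-th call scheduled at start + i * interval."""
    frames: List[object] = []
    start = clock()
    for i in range(count):
        delay = start + i * interval - clock()
        if delay > 0:  # skip sleeping if we are already late
            sleep(delay)
        frames.append(capture())
    return frames
```

If the game keeps each line of text on screen for at least the snapshot interval, every line should appear in at least one snapshot, which is why the two settings work together.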
The system also includes a Twitch chat integration that lets viewers take part in the gameplay:
- Chat Command System:
  - Users can provide hints and suggestions through chat
  - Commands are processed every 5 iterations
  - Supports both cell-based and pixel-based coordinates
  - Valid commands are displayed in real-time
- Chat Monitor Window:
  - Real-time display of chat messages
  - Shows connection status and statistics
  - Lists recent user suggestions
  - Tracks executed commands
- Command Processing:
  - Validates user commands before execution
  - Converts pixel coordinates to cell numbers
  - Maintains a history of executed commands
  - Provides feedback on command execution
- Safety Features:
  - Command validation and sanitization
  - Rate limiting for command execution
  - Error handling for invalid commands
  - Connection status monitoring
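The chat command syntax is not documented here, so the following is a hypothetical sketch of the validation, pixel-to-cell conversion, rate limiting, and batching steps listed above. `ChatCommandQueue`, the `!click` syntax, the 30-second per-user cooldown, and the 1-based row-major cell numbering are all assumptions; only the five-iteration batch interval comes from the README.

```python
import re
import time
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# Hypothetical syntax: "!click 42" (cell number) or "!click 320,240" (pixels).
COMMAND_RE = re.compile(r"^!click\s+(?:(\d+)|(\d+)\s*,\s*(\d+))\s*$", re.IGNORECASE)


class ChatCommandQueue:
    """Validates, rate-limits, and batches viewer commands (illustrative only)."""

    def __init__(self, cols: int, rows: int, width: int, height: int,
                 cooldown: float = 30.0, batch_every: int = 5):
        self.cols, self.rows = cols, rows
        self.width, self.height = width, height
        self.cooldown = cooldown        # assumed seconds between commands per user
        self.batch_every = batch_every  # README: commands processed every 5 iterations
        self.last_seen: Dict[str, float] = {}
        self.pending: Deque[Tuple[str, int]] = deque()
        self.history: List[Tuple[str, int]] = []

    def parse(self, text: str) -> Optional[int]:
        """Return a valid cell number for a chat message, or None if it is invalid."""
        match = COMMAND_RE.match(text.strip())
        if not match:
            return None
        cell_str, x_str, y_str = match.groups()
        if cell_str is not None:
            cell = int(cell_str)
            return cell if 1 <= cell <= self.cols * self.rows else None
        x, y = int(x_str), int(y_str)
        if not (0 <= x < self.width and 0 <= y < self.height):
            return None
        # Convert pixel coordinates to a 1-based, row-major cell number.
        return (y * self.rows // self.height) * self.cols + x * self.cols // self.width + 1

    def submit(self, user: str, text: str, now: Optional[float] = None) -> bool:
        """Queue a command if it is valid and the user is not rate-limited."""
        now = time.monotonic() if now is None else now
        cell = self.parse(text)
        if cell is None:
            return False
        last = self.last_seen.get(user)
        if last is not None and now - last < self.cooldown:
            return False
        self.last_seen[user] = now
        self.pending.append((user, cell))
        return True

    def due(self, iteration: int) -> List[Tuple[str, int]]:
        """On every `batch_every`-th iteration, return queued commands and log them."""
        if iteration % self.batch_every != 0:
            return []
        batch = list(self.pending)
        self.pending.clear()
        self.history.extend(batch)
        return batch
```

Reading messages from Twitch itself (for example over Twitch's IRC interface) is left out; `submit` only needs the sender's name and the message text, which keeps validation testable without a live connection.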
The system has demonstrated the following capabilities:
- Successfully explores game environments
- Discovers hidden passages and items
- Understands game mechanics and puzzles
- Maintains context across different game areas
For example, with GPT-4.1, the system:
- Explored the main hall and office
- Discovered the passage behind the clock
- Found Fred's lab
- Identified the need for a diamond to power the time machine
This system could serve as a benchmark for evaluating LLM performance in adventure games:
- Tests spatial reasoning
- Evaluates puzzle-solving abilities
- Measures context retention
- Assesses strategic planning
Complex games like "Day of the Tentacle" (with multiple timelines) could provide excellent test cases for evaluating LLM capabilities.
- Linux system with X11
- Python 3.8+
- Required Python packages (see requirements.txt)
- X11 tools (xdotool, xprop)
- Optional: Ollama for local LLM support
- Optional: API keys for remote LLM services
- Clone the repository
- Install required packages:
  ```bash
  pip install -r requirements.txt
  ```
- Install X11 tools (example for dnf-based distributions such as Fedora):
  ```bash
  sudo dnf install xdotool xorg-x11-utils
  ```
- (Optional) Set up Ollama for local LLM support
- (Optional) Configure API keys for remote LLM services
- Run the script:
  ```bash
  python play.py
  ```
- Select the game window when prompted
- Choose an LLM model
- Watch the AI play!
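How `play.py` locates API keys is not described here. One plausible approach, sketched below under that assumption, is to check the environment variables that the official OpenAI, Anthropic, and Hugging Face Python clients conventionally read (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `HF_TOKEN`) and to probe Ollama's local HTTP API on its default port 11434. `available_backends` and `ollama_is_running` are hypothetical helpers, not part of the project.

```python
import os
import urllib.request
from typing import Dict, List, Mapping, Optional

# Env vars conventionally read by each provider's SDK; whether play.py
# uses these exact names is an assumption.
API_KEY_VARS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "huggingface": "HF_TOKEN",
}


def ollama_is_running(url: str = "http://localhost:11434/api/tags",
                      timeout: float = 1.0) -> bool:
    """Return True if a local Ollama server answers on its default endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers connection refused, URLError, and timeouts
        return False


def available_backends(env: Optional[Mapping[str, str]] = None,
                       ollama_running: Optional[bool] = None) -> List[str]:
    """List LLM backends that look usable: local Ollama first, then keyed services."""
    env = os.environ if env is None else env
    if ollama_running is None:
        ollama_running = ollama_is_running()
    backends = ["ollama"] if ollama_running else []
    # Empty strings count as "not configured".
    backends += [name for name, var in API_KEY_VARS.items() if env.get(var)]
    return backends
```

A model picker could then offer only the backends this returns, instead of failing later on a missing key.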
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2025 Luis Hernandez @luishg
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.