This document explains the architecture and workflow of the current agent implementation in the parser repository.
The agent is designed to perform tasks in a web browser environment, using Monte Carlo Tree Search (MCTS) for planning. It moves away from pixel-based planning to a DOM-based approach, using the browser's Document Object Model (DOM) to understand and interact with the page.
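As a sketch of what DOM-based planning means in practice, the snippet below derives an action space from DOM elements rather than from pixels. The element structure, selectors, and `DomAction` type are illustrative assumptions, not the repository's actual representation:

```python
# Hypothetical sketch: mapping interactive DOM elements to executable actions.
# The element dicts and action names here are assumptions for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class DomAction:
    kind: str        # e.g. "click", "type", "scroll"
    selector: str    # CSS selector identifying the target element
    text: str = ""   # payload for "type" actions


def actions_from_dom(elements):
    """Derive valid actions from a list of DOM element descriptions."""
    actions = []
    for el in elements:
        if el["tag"] in ("button", "a"):
            actions.append(DomAction("click", el["selector"]))
        elif el["tag"] in ("input", "textarea"):
            actions.append(DomAction("type", el["selector"]))
    return actions


dom = [
    {"tag": "button", "selector": "#submit"},
    {"tag": "input", "selector": "#query"},
]
print(actions_from_dom(dom))
```

Working over elements and selectors like this gives the planner a discrete, semantically meaningful action set, in contrast to pixel coordinates.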
The codebase is modular, with each component handling a specific aspect of the agent's lifecycle:
- `main.py`
  - Role: Configures and launches the agent.
  - Functionality:
    - Instantiates concrete implementations of `ActionGenerator`, `RewardModel`, and `PriorPolicy`.
    - Configures the `MCTSPlanner` with search parameters (simulations, depth, top-k).
    - Creates an `AgentRunner` and a `MockBrowserEnv`.
    - Runs a baseline episode and prints the results.
- `AgentRunner`: The main loop that orchestrates the agent-environment interaction.
  - Observes the current state.
  - Queries the planner for the next best action.
  - Executes the action in the environment.
  - Records the trace of events.
- `MockBrowserEnv`: A placeholder environment that simulates a browser's state and responses for testing purposes, without a real browser instance.
- `MCTSPlanner`: Implements the Monte Carlo Tree Search algorithm.
  - Uses `ActionGenerator` to expand the tree.
  - Uses `PriorPolicy` to guide selection.
  - Uses `RewardModel` to evaluate leaf nodes.
- `MCTSConfig`: Defines hyperparameters such as the number of simulations (`simulations=80`), rollout depth (`rollout_depth=5`), etc.
- `PriorPolicy`: A heuristic or learned model that suggests promising actions to explore first.
- `ActionGenerator`: Analyzes the current DOM state and enforces rules to generate a list of valid, executable actions (e.g., `click`, `type`, `scroll`).
- `Encoder`: Transforms the raw DOM tree into a structured state representation (an embedding or feature vector) that the models and planner can process.
- `RewardModel`: Assigns a scalar value to a state or action sequence, indicating how close the agent is to completing the user's objective.
- `TraceRecorder`: Captures the full history of the agent's reasoning process, including search trees, selected actions, and environment feedback, for later analysis.
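The components above can be pictured as a config object plus a set of narrow interfaces. The `MCTSConfig` field names and defaults follow the document; the `Protocol` signatures are assumptions about the shape of the interfaces, not the repository's actual code:

```python
# Sketch of how the components described above might be wired together.
# The interface signatures are illustrative assumptions.
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class MCTSConfig:
    simulations: int = 80    # number of MCTS simulations per decision
    rollout_depth: int = 5   # maximum depth of a simulated rollout
    top_k: int = 10          # candidate actions kept per expansion (assumed)


class ActionGenerator(Protocol):
    def generate(self, state: Any) -> list:
        """Return valid, executable actions for the given DOM state."""
        ...


class PriorPolicy(Protocol):
    def priors(self, state: Any, actions: list) -> list:
        """Return a prior probability for each candidate action."""
        ...


class RewardModel(Protocol):
    def score(self, state: Any) -> float:
        """Return a scalar estimate of progress toward the objective."""
        ...


print(MCTSConfig())
```

Keeping the planner dependent only on these small interfaces is what lets `main.py` swap in different concrete implementations without touching the search code.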
- Initialize: `main.py` builds the components and the runner.
- Observe: The runner asks the environment for the current state (DOM).
- Plan: The `MCTSPlanner` builds a search tree:
  - Select: Choose a promising node using the Upper Confidence Bound (UCB).
  - Expand: Generate possible next actions using `ActionGenerator`.
  - Simulate: Estimate the value of the new state using the `RewardModel` or a rollout policy.
  - Backpropagate: Update the values of ancestor nodes.
- Act: The best action from the search is selected and executed in the real (or mock) environment.
- Repeat: The loop continues until the task is done or a limit is reached.