This repository contains our team’s BrowserGym Green Agent for evaluating web-based tasks on the AgentBeats platform. It supports three official BrowserGym benchmarks: AssistantBench, MiniWoB, and WebLINX.
All benchmarks are integrated into a single Green Agent that can:
- Initialize and manage BrowserGym environments
- Evaluate task requests from AgentBeats
- Validate white-agent actions against task requirements
- Return success/failure results and final scores
This unified evaluator provides a consistent assessment pipeline across all three benchmarks and supports both local demonstration and remote deployment on AgentBeats. It is submitted as our team’s entry for the AgentBeats Green Agent Challenge.
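
The four responsibilities listed above can be illustrated with a minimal sketch of an evaluation loop. This is not the repository's actual implementation: `StubClickEnv`, `EvalResult`, and `evaluate` are hypothetical names, and the stub environment stands in for a real BrowserGym task so the example runs without BrowserGym or AgentBeats installed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


class Env(Protocol):
    """Simplified environment interface assumed for this sketch."""

    def reset(self) -> str: ...
    def step(self, action: str) -> tuple[str, float, bool]: ...


@dataclass
class EvalResult:
    benchmark: str
    success: bool
    score: float
    steps: int


class StubClickEnv:
    """Hypothetical stand-in for a BrowserGym task.

    The task is solved when the agent issues click('submit').
    """

    def reset(self) -> str:
        return "page with a button whose id is 'submit'"

    def step(self, action: str) -> tuple[str, float, bool]:
        solved = action == "click('submit')"
        observation = "task complete" if solved else "page unchanged"
        return observation, (1.0 if solved else 0.0), solved


def evaluate(
    benchmark: str,
    env: Env,
    white_agent: Callable[[str], str],
    max_steps: int = 5,
) -> EvalResult:
    """Run one task: initialize, step the agent, validate, report."""
    observation = env.reset()  # initialize the environment
    score = 0.0
    for step in range(1, max_steps + 1):
        action = white_agent(observation)  # ask the white agent for an action
        observation, reward, done = env.step(action)  # validate it against the task
        score = max(score, reward)
        if done:
            return EvalResult(benchmark, score > 0, score, step)
    # Step budget exhausted without the task being solved.
    return EvalResult(benchmark, False, score, max_steps)


if __name__ == "__main__":
    result = evaluate("miniwob", StubClickEnv(), lambda obs: "click('submit')")
    print(result)
```

Keeping the environment behind a small interface is one way a single evaluator could serve all three benchmarks with the same loop; the real agent may structure this differently.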