AlibabaResearch/RoTS

Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents

[ICML 2026 Spotlight] Official code for the paper:
"Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents"

Tianpeng Bu*, Xin Liu*, Qihua Chen*, Hao Jiang, Shurui Li, Hongtao Duan, Lu Jiang, Lulu Hu, Bin Yang, Minying Zhang (* equal contribution)
Alibaba Cloud Computing


Overview

GUI agents frequently fail to recover from their own policy-induced errors, limiting real-world deployment. This paper addresses the problem at both the evaluation and data levels:

  • GUI-RobustEval — a benchmark of 1,216 executable test cases that systematically measure error recovery capabilities across 11 error types and 4 error depths.
  • RoTS — a scalable, tree-based online trajectory synthesis framework that produces 800k high-quality training samples covering diverse policy-induced error modes and long-horizon recovery strategies.
  • RoTS-7B / RoTS-32B — models fine-tuned on the RoTS dataset, achieving state-of-the-art performance among open-weight models on OSWorld.
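
Since the sampling pipeline is not yet released, the following is only a toy sketch of the general idea behind harvesting recovery trajectories from a rollout tree. It is not the RoTS implementation: the `Node` structure, the `is_error` and `is_success` flags, the `recovery_paths` helper, and the action names are all hypothetical and chosen purely for illustration. The sketch assumes each tree node is one agent step and keeps every root-to-success path that passes through at least one step marked as an error.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One step in a toy rollout tree (hypothetical structure, not from the paper)."""
    action: str
    is_error: bool = False    # step labelled as a policy-induced error
    is_success: bool = False  # step at which the task was completed
    children: List["Node"] = field(default_factory=list)


def recovery_paths(root: Node) -> List[List[str]]:
    """Return every root-to-success path that contains at least one error step."""
    found: List[List[str]] = []

    def walk(node: Node, path: List[str], saw_error: bool) -> None:
        path = path + [node.action]            # copy, so sibling branches stay independent
        saw_error = saw_error or node.is_error
        if node.is_success and saw_error:
            found.append(path)
        for child in node.children:
            walk(child, path, saw_error)

    walk(root, [], False)
    return found


# Toy tree: one clean branch, one branch that errs and then recovers, one dead end.
tree = Node("open_settings", children=[
    Node("click_display", children=[Node("set_brightness", is_success=True)]),
    Node("click_sound", is_error=True, children=[
        Node("go_back", children=[
            Node("click_display", children=[Node("set_brightness", is_success=True)]),
        ]),
        Node("scroll_down"),  # dead end: never reaches success
    ]),
])

paths = recovery_paths(tree)
for p in paths:
    print(" -> ".join(p))
```

In this toy tree only the branch that wanders into `click_sound` and then backs out is kept; the error-free branch and the dead end are discarded. A real pipeline would additionally need live environment rollouts, error labelling, and quality filtering, none of which are modelled here.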

For full details and results, please refer to the paper.


Releases

[2026/05/27 Update] We are actively cleaning up our code and data for open-source release. Stay tuned — we look forward to sharing everything with the community soon!

Artifact | Status
--- | ---
GUI-RobustEval | 🔜 In preparation
RoTS Dataset | 🔜 In preparation
Sampling Pipeline | 🔜 In preparation
Post-processing Pipeline | 🔜 In preparation

Citation

@inproceedings{bu2026rots,
  title     = {Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust {GUI} Agents},
  author    = {Bu, Tianpeng and Liu, Xin and Chen, Qihua and Jiang, Hao and Li, Shurui and Duan, Hongtao and Jiang, Lu and Hu, Lulu and Yang, Bin and Zhang, Minying},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  series    = {Proceedings of Machine Learning Research},
  volume    = {306},
  year      = {2026},
  publisher = {PMLR},
}

License

This project is licensed under the Apache License 2.0. See LICENSE for details.
