Benchmarking memory for mobile GUI agents
MemGUI-Bench Leaderboard
A memory-centric benchmark for mobile GUI agents in dynamic environments, covering short-term recall, long-term improvement, cross-app workflows, and MemGUI-Eval-based judgment.
Live results
Leaderboard
p@1 measures first-attempt success, p@3 measures best-of-three success, and IRR/MTPR/FRR isolate memory quality and recovery.
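As a rough illustration of how p@1 and p@3 relate, the sketch below computes both from per-task attempt outcomes. It assumes p@k is the fraction of tasks solved in at least one of the first k attempts; the benchmark's exact definition may differ. The function name `pass_at_k` and the task data are hypothetical, not taken from MemGUI-Bench.

```python
from typing import Dict, List


def pass_at_k(attempts: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks with at least one success in the first k attempts.

    Assumed definition for illustration; not the benchmark's official code.
    """
    if not attempts:
        raise ValueError("no tasks given")
    solved = sum(1 for outcomes in attempts.values() if any(outcomes[:k]))
    return solved / len(attempts)


# Hypothetical outcomes: one list of attempt results per task.
results = {
    "task_a": [True, False, True],
    "task_b": [False, False, True],
    "task_c": [False, False, False],
    "task_d": [False, True, True],
}

p1 = pass_at_k(results, 1)  # only task_a succeeds on the first attempt
p3 = pass_at_k(results, 3)  # task_a, task_b, and task_d succeed within three
print(f"p@1 = {p1:.2f}, p@3 = {p3:.2f}")
```

Under this definition p@3 can never be lower than p@1, so the gap between the two shows how much an agent gains from repeated attempts.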
Benchmark
MemGUI-Bench contains 128 memory-intensive mobile GUI tasks across 26 apps and 68 scenarios. Tasks stress cross-step retention, cross-app transfer, and cross-session learning.
Evaluation
MemGUI-Eval uses progressive scrutiny: lightweight triage, trajectory description, semantic judgment, and targeted visual verification when needed.
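The staged idea can be sketched as a chain of checks in which a later, more expensive stage runs only when the earlier ones cannot decide. This is a minimal sketch under stated assumptions, not MemGUI-Eval's implementation: every function name, the context fields, and the string-matching stand-ins for model-based judgment are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]
# A stage returns True/False when it can decide, or None to escalate.
Stage = Callable[[Context], Optional[bool]]


def triage(ctx: Context) -> Optional[bool]:
    # Cheap check: an empty trajectory cannot have completed the task.
    return False if not ctx["steps"] else None


def describe(ctx: Context) -> Optional[bool]:
    # Summarize the trajectory for the next stage; never decides on its own.
    ctx["description"] = "; ".join(ctx["steps"])
    return None


def semantic_judgment(ctx: Context) -> Optional[bool]:
    # Stand-in for a model-based judge: a plain substring match.
    if ctx["goal"] in ctx["description"]:
        return True
    # Escalate only if there is visual evidence left to inspect.
    return None if ctx.get("screenshot_text") else False


def visual_verification(ctx: Context) -> Optional[bool]:
    # Stand-in for inspecting screenshots: match against extracted text.
    return ctx["goal"] in ctx["screenshot_text"]


STAGES: List[Tuple[str, Stage]] = [
    ("triage", triage),
    ("trajectory_description", describe),
    ("semantic_judgment", semantic_judgment),
    ("visual_verification", visual_verification),
]


def evaluate(ctx: Context) -> Tuple[bool, str]:
    """Run stages in order and stop at the first one that decides."""
    for name, stage in STAGES:
        verdict = stage(ctx)
        if verdict is not None:
            return verdict, name
    return False, "undecided"


empty = evaluate({"goal": "saved price 42", "steps": []})
textual = evaluate({"goal": "saved price 42",
                    "steps": ["open notes", "saved price 42"]})
visual = evaluate({"goal": "saved price 42",
                   "steps": ["open notes"],
                   "screenshot_text": "note: saved price 42"})
```

Here `empty` is rejected at triage, `textual` is accepted by the semantic stage, and only `visual` reaches the most expensive check.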
Citation
Use MemGUI-Bench in your work
@article{liu2026memgui,
  title={MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments},
  author={Liu, Guangyi and Zhao, Pengxiang and Liang, Yaozhen and Luo, Qinyi and Tang, Shunye and Chai, Yuxiang and Lin, Weifeng and Xiao, Han and Wang, WenHao and Chen, Siheng and others},
  journal={arXiv preprint arXiv:2602.06075},
  year={2026}
}