MemGUI-Bench Leaderboard

Evaluating GUI Agents' Memory Capabilities in Dynamic Environments

Legend: 🌳 Uses UI Tree · 🧠 Has Long-Term Memory · Workflow = Multi-agent Framework · Model = End-to-End Model
Performance breakdown by cross-application complexity (1 to 4 apps). SR = Success Rate, IRR = Information Retention Rate.
Performance breakdown by task difficulty level. IRR = Information Retention Rate (memory fidelity metric).
Computational efficiency metrics. Step Ratio = actual steps / golden steps (lower is better), Time/Step = seconds per action, Cost/Step = API cost per action.
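As a rough illustration of the efficiency definitions above, the following sketch computes the three per-episode values from a single run. The `EpisodeLog` record, its field names, and the use of US dollars for cost are assumptions made for this example and are not part of the benchmark's tooling. How the leaderboard aggregates these values across tasks is not stated here, so the sketch covers one episode only.

```python
from dataclasses import dataclass


@dataclass
class EpisodeLog:
    """Hypothetical record of one agent run on one task."""
    actual_steps: int    # actions the agent actually took
    golden_steps: int    # actions in the reference (golden) trajectory
    wall_time_s: float   # total wall-clock time for the run, in seconds
    api_cost_usd: float  # total API spend for the run (currency assumed)


def efficiency_metrics(ep: EpisodeLog) -> dict[str, float]:
    """Compute Step Ratio, Time/Step and Cost/Step for one episode."""
    if ep.actual_steps <= 0 or ep.golden_steps <= 0:
        raise ValueError("step counts must be positive")
    return {
        # Step Ratio = actual steps / golden steps (lower is better)
        "step_ratio": ep.actual_steps / ep.golden_steps,
        # Time/Step = seconds per action
        "time_per_step": ep.wall_time_s / ep.actual_steps,
        # Cost/Step = API cost per action
        "cost_per_step": ep.api_cost_usd / ep.actual_steps,
    }


if __name__ == "__main__":
    run = EpisodeLog(actual_steps=32, golden_steps=16,
                     wall_time_s=96.0, api_cost_usd=0.5)
    print(efficiency_metrics(run))
```

A Step Ratio of 1.0 means the agent matched the golden trajectory length; values above 1.0 indicate extra actions.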

Notes