Leaderboard | MemGUI-Bench

Legend: 🌳 Uses UI Tree, 🧠 Has Long-Term Memory, Workflow = Multi-agent Framework, Model = End-to-End Model

Performance breakdown by cross-application complexity (1 to 4 apps). SR = Success Rate, IRR = Information Retention Rate.

Performance breakdown by task difficulty level. IRR = Information Retention Rate (memory fidelity metric).

Computational efficiency metrics. Step Ratio = actual steps / golden (reference) steps (lower is better), Time/Step = seconds per action, Cost/Step = API cost per action ($).
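The three efficiency metrics can be computed from a single episode log. The sketch below assumes a hypothetical per-task record (the field names are illustrative, not MemGUI-Bench's actual schema) and computes only per-episode values; how the leaderboard aggregates them across tasks (mean of ratios vs. ratio of totals) is not stated here, so that step is left out.

```python
from dataclasses import dataclass


@dataclass
class EpisodeLog:
    """Hypothetical per-task record; field names are illustrative only."""
    actual_steps: int        # actions the agent actually executed
    golden_steps: int        # steps in the golden (reference) trajectory
    total_seconds: float     # wall-clock time for the whole episode
    total_cost_usd: float    # summed API cost for the whole episode


def efficiency(log: EpisodeLog) -> dict[str, float]:
    """Per-episode Step Ratio, Time/Step and Cost/Step (assumed definitions)."""
    if log.actual_steps <= 0 or log.golden_steps <= 0:
        raise ValueError("step counts must be positive")
    return {
        # actual steps / golden steps; lower is better
        "step_ratio": log.actual_steps / log.golden_steps,
        # seconds spent per executed action
        "time_per_step": log.total_seconds / log.actual_steps,
        # API cost ($) per executed action
        "cost_per_step": log.total_cost_usd / log.actual_steps,
    }
```

For example, an agent that takes 12 actions on a task whose golden trajectory has 8 steps, over 90 seconds, gets a step ratio of 1.5 and 7.5 seconds per action.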

Notes

p@k denotes pass@k multi-attempt evaluation: p@1 = first attempt, p@3 = best of 3 attempts
Rank: 🥇 Gold (1st), 🥈 Silver (2nd), 🥉 Bronze (3rd), 🔵 Top 10
Tags: 🌳 Uses UI Tree, 🧠 Long-Term Memory
Type: Workflow = Agentic Workflow (multi-agent), Model = End-to-End Model
Highlighting: Bold = best, Underline = second best
Memory Metrics: IRR = Information Retention Rate, MTPR = Memory-Task Proficiency Ratio, FRR = Failure Recovery Rate
Efficiency: Steps = step ratio (lower is better), Time/Step = seconds/action, Cost/Step = $/action
See submission guidelines to add your results
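A minimal sketch of the p@k reading given in the notes, assuming a task counts as solved at p@k if any of its first k attempts succeeds. The input format (one ordered list of success flags per task) and the function name are hypothetical. This is the plain "best of k attempts" rule, not the unbiased sampling estimator of pass@k used in some code-generation benchmarks.

```python
def pass_at_k(attempts: list[list[bool]], k: int) -> float:
    """Fraction of tasks solved in at least one of the first k attempts.

    attempts[i] holds the ordered success flags for task i (hypothetical format).
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if not attempts:
        return 0.0
    # A task is solved at p@k if any of its first k attempts succeeded.
    solved = sum(1 for flags in attempts if any(flags[:k]))
    return solved / len(attempts)
```

With four tasks whose attempt outcomes are [F, T, F], [T, F, F], [F, F, F] and [F, F, T], p@1 is 0.25 (only the second task succeeds on its first try) and p@3 is 0.75.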