🌳 Uses UI Tree
🧠 Has Long-Term Memory
Workflow: Multi-agent Framework
Model: End-to-End Model
Performance breakdown by cross-application complexity (1 to 4 apps).
SR = Success Rate, IRR = Information Retention Rate.
Performance breakdown by task difficulty level.
IRR = Information Retention Rate (memory fidelity metric).
Computational efficiency metrics.
Step Ratio = actual steps / golden steps (lower is better);
Time/Step = seconds per action;
Cost/Step = API cost per action.
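A minimal sketch of the three efficiency metrics for a single run, assuming each run is logged as one record with its step counts, total wall-clock time, and total API cost. The `Episode` record and the `efficiency` helper are hypothetical names. Dividing time and cost by the agent's *actual* step count, and aggregating per episode, are assumptions: the leaderboard does not state how it averages across tasks.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One agent run on one task (hypothetical record layout)."""
    actual_steps: int      # actions the agent actually took
    golden_steps: int      # actions in the reference (golden) trajectory
    total_seconds: float   # wall-clock time for the whole run
    total_cost_usd: float  # API spend for the whole run


def efficiency(ep: Episode) -> dict[str, float]:
    """Per-episode Step Ratio, Time/Step and Cost/Step."""
    if ep.golden_steps <= 0 or ep.actual_steps <= 0:
        raise ValueError("step counts must be positive")
    return {
        # > 1.0 means the agent needed more actions than the golden path
        "step_ratio": ep.actual_steps / ep.golden_steps,
        # assumption: "per action" means per action the agent took
        "time_per_step": ep.total_seconds / ep.actual_steps,
        "cost_per_step": ep.total_cost_usd / ep.actual_steps,
    }


ep = Episode(actual_steps=12, golden_steps=8, total_seconds=90.0, total_cost_usd=0.36)
print(efficiency(ep))
```

Here the agent took 12 actions where the golden trajectory needs 8, giving a step ratio of 1.5 and 7.5 seconds per action.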
Notes
- p@k denotes pass@k multi-attempt evaluation: p@1 = first attempt, p@3 = best of 3 attempts
- Rank: 🥇 Gold (1st), 🥈 Silver (2nd), 🥉 Bronze (3rd), 🔵 Top 10
- Tags: 🌳 Uses UI Tree, 🧠 Long-Term Memory
- Type: Workflow = Agentic Workflow (multi-agent), Model = End-to-End Model
- Highlighting: Bold = best, Underline = second best
- Memory Metrics: IRR = Information Retention Rate, MTPR = Memory-Task Proficiency Ratio, FRR = Failure Recovery Rate
- Efficiency: Steps = step ratio (lower is better), Time/Step = seconds/action, Cost/Step = $/action
- See the submission guidelines to add your results.
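The p@k columns follow the best-of-k reading given in the Notes: a task counts as solved at p@k if any of its first k attempts succeeds. The sketch below implements that reading directly; `pass_at_k` and `pass_rate` are hypothetical helper names, and the per-task list of attempt outcomes is an assumed logging format. This is the plain empirical best-of-k rate, not the combinatorial unbiased pass@k estimator used in some code-generation benchmarks.

```python
def pass_at_k(attempts: list[bool], k: int) -> bool:
    """True if any of the first k attempts succeeded (best-of-k)."""
    if k < 1 or k > len(attempts):
        raise ValueError("need 1 <= k <= number of attempts")
    return any(attempts[:k])


def pass_rate(results: list[list[bool]], k: int) -> float:
    """Fraction of tasks solved within their first k attempts."""
    return sum(pass_at_k(r, k) for r in results) / len(results)


results = [
    [False, True, False],   # solved on the 2nd attempt
    [True, True, True],     # solved on the 1st attempt
    [False, False, False],  # never solved
]
print(pass_rate(results, 1))  # 1 of 3 tasks solved on the first attempt
print(pass_rate(results, 3))  # 2 of 3 tasks solved within three attempts
```

Because p@3 only adds chances, a model's p@3 score can never be lower than its p@1 score under this definition.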