RAB
Real-world Agentic Benchmark
Local LLMs · multi-step tool use · deep-validation scoring
Benchmarks ≠ capability.
Key findings