RAB

Real-world Agentic Benchmark

Local LLMs · multi-step tool use · deep-validation scoring

Benchmarks ≠ capability.

Key findings