What gets measured
- Total primary request body tokens
- System and default developer context
- Declared tools and tool-definition cost
- Bundled skills and skill-definition cost
- MCP and sub-agent declarations
hibench is an open-source benchmark for the hidden default cost of coding agents. It answers one question: how much context does an agent load before it starts useful work?
Benchmark data last updated June 24, 2026 UTC
Isolate
Run each agent in Docker inside a fresh, empty Git repo.
Intercept
Point it at a local recorder with a dummy key — no upstream model call.
Capture
Send the prompt Hi and record the first outbound request body.
Count
Tokenize every field with o200k_base and break it down.
16
agents benchmarked
1050
captured versions
Every agent has a pinned Docker image, version catalog, and one canonical capture per release. Source, parsers, and export tables live on GitHub.