Default cost is real cost
Every tool schema, skill description and system instruction is sent from the very first turn. That baseline is paid again on every request, before any useful work happens.
hibench measures the default context footprint of coding agents: system prompts, tools, skills, MCP servers, and sub-agents loaded before the first user request is answered.
Top 5 + flop 5 token ranking
heaviest and lightest latest versions · total request tokens
We capture the first real outbound request and count it with a single fixed tokenizer so numbers stay comparable across agents, versions and models.
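As a rough illustration of counting one captured request with a fixed tokenizer, here is a minimal Python sketch. It is not hibench's own code: the function names `footprint` and `default_counter`, the choice to break the body down by top-level field, and the way non-string fields are serialized with `json.dumps` are all assumptions made for the example. The tokenizer is passed in as a callable, so the same counter is applied to every field and every agent.

```python
import json
from typing import Any, Callable, Dict, Tuple


def default_counter() -> Callable[[str], int]:
    """Build a counter on tiktoken's o200k_base encoding (third-party package)."""
    import tiktoken  # imported lazily so footprint() works with any counter

    enc = tiktoken.get_encoding("o200k_base")
    # disallowed_special=() treats special-token strings as ordinary text
    return lambda text: len(enc.encode(text, disallowed_special=()))


def footprint(
    body: Dict[str, Any], count: Callable[[str], int]
) -> Tuple[Dict[str, int], int]:
    """Return (tokens per top-level field, total) for one request body.

    Assumption: strings are counted as-is and everything else is counted
    on its JSON serialization. A real harness may slice the body differently.
    """
    breakdown: Dict[str, int] = {}
    for field, value in body.items():
        text = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
        breakdown[field] = count(text)
    return breakdown, sum(breakdown.values())
```

With tiktoken installed, `footprint(body, default_counter())` returns the per-field counts and their sum for a parsed request body. Because the counter is fixed, differences between two captures reflect the request content and not the tokenizer.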
Footprints drift over releases. hibench keeps one canonical capture per version so you can watch context grow (or shrink) over time.
Isolate
Run each agent in Docker inside a fresh, empty Git repo.
Intercept
Point it at a local recorder with a dummy key, so no upstream model call is made.
Capture
Send the prompt "Hi" and record the first outbound request body.
Count
Tokenize every field with o200k_base and break it down.
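The Intercept and Capture steps above could be sketched with a small local HTTP recorder like the one below. This is an assumption-laden illustration using only the Python standard library, not the recorder hibench ships: the `Recorder` and `start_recorder` names are hypothetical, and the stub reply is an empty JSON object, which a real agent may reject or retry against. Only the outbound request matters here, so the first entry in `Recorder.captured` is the one to keep.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Recorder(BaseHTTPRequestHandler):
    """Records POST bodies locally instead of forwarding them upstream."""

    captured = []  # request records, in arrival order

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            body = json.loads(raw)
        except ValueError:
            # Keep non-JSON payloads so nothing is silently dropped
            body = {"_raw": raw.decode("utf-8", "replace")}
        Recorder.captured.append({"path": self.path, "body": body})

        # Stub reply: nothing is sent to a model provider
        payload = b"{}"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # silence per-request logging


def start_recorder(port=0):
    """Serve on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), Recorder)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

An agent would then be configured to use `http://127.0.0.1:<port>` as its API base URL with a dummy key. How that base URL is set differs per agent and is not covered here.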
21,618 default tokens · 26 tools · 98 versions
18,540 default tokens · 33 tools · 63 versions
16,379 default tokens · 15 tools · 3 versions
15,744 default tokens · 25 tools · 70 versions
12,395 default tokens · 17 tools · 55 versions
11,686 default tokens · 27 tools · 6 versions
9,762 default tokens · 13 tools · 101 versions
8,730 default tokens · 11 tools · 114 versions
6,785 default tokens · 9 tools · 102 versions
3,890 default tokens · 25 tools · 22 versions
1,255 default tokens · 4 tools · 100 versions