Know what your agent loads before it writes.

hibench measures the default context footprint of coding agents: system prompts, tools, skills, MCP servers, and sub-agents loaded before the first user request is answered.

Benchmark data last updated July 7, 2026 UTC

View rankings Explore agents

Top 5 and bottom 5 by default footprint

Largest and smallest latest versions · o200k_base total request tokens

1Claude Code2OpenClaw3Cursor CLI4Copilot CLI5Droid12Gemini CLI13OpenCode14Mistral Vibe15Cline16Pi

Goal & philosophy

①

Default cost is real cost

Every tool schema, skill description and system instruction is sent on the very first turn. That baseline is paid on each request, before any useful work happens.

②

Measure, don't guess

We capture the first real outbound request and count it with a single fixed tokenizer so numbers stay comparable across agents, versions and models.

③

Track the evolution

Footprints drift over releases. hibench keeps one canonical capture per version so you can watch context grow (or shrink) over time.

How it works

1
Isolate
Run each agent in Docker inside a fresh, empty Git repo.
2
Intercept
Point it at a local recorder with a dummy key — no upstream model call.
3
Capture
Send the prompt Hi and record the first outbound request body.
4
Count
Tokenize every field with o200k_base and break it down.