Benchmark table

Seed scores before the first repeated run

This table is intentionally marked as a seed dataset. It is useful for launch layout and scoring calibration, but do not treat it as a final buyer recommendation until the live runs have been repeated and linked.

Tool	Coding	Research	Workflow	Cost clarity	Best fit
Codex	96	85	85	90	verified local fixes with a transparent cost signal
Claude Code	92	80	90	50	the leanest, most idiomatic patch — it matched the maintainers' own fix
Gemini CLI	95	95	92	65	the deepest read of a bug, if you can sacrifice some speed