TripleBench

Benchmark table

Seed scores before the first repeated run

This table is intentionally marked as a seed dataset. It is useful for launch layout and scoring calibration, but it should not be treated as a final buyer recommendation until the live runs are repeated and linked.

| Tool | Coding | Research | Workflow | Cost clarity | Best fit |
| --- | --- | --- | --- | --- | --- |
| Codex | 96 | 85 | 85 | 90 | verified local fixes with a transparent cost signal |
| Claude Code | 92 | 80 | 90 | 50 | the leanest, most idiomatic patch; it matched the maintainers' own fix |
| Gemini CLI | 95 | 95 | 92 | 65 | the deepest read of a bug, if you can spare the speed |