lean-bench

6.8. Failure modes and synthesized rows

Each child run is classified as follows. If the child exits 0 with a parseable JSONL row, that row is used as-is. If the killer task fired (i.e. killedRef = true), a killedAtCap row is synthesized regardless of the exit code the OS reports. Any other non-zero exit with no kill gets a synthesized error row. The roundtrip path is exercised by test/; the kill path is exercised by the example benchmarks at the top end of the doubling ladder.
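The classification above can be sketched as a small Lean function. This is an illustration under stated assumptions, not lean-bench's actual code: the type RowSource, the function classify, and its parameters are hypothetical names, and only killedAtCap and the error row come from the text. The sketch assumes the kill flag is checked first, and it treats a zero exit with no parseable row as an error, a case the text does not specify.

```lean
/-- Hypothetical outcome type for one child run.
    Only `killedAtCap` and the notion of an error row come from the text. -/
inductive RowSource where
  | parsed (row : String)       -- child exited 0 and emitted a parseable JSONL row
  | killedAtCap                 -- the killer task fired
  | error (exitCode : UInt32)   -- anything else
  deriving Repr, BEq

/-- Classify one child run. `killed` stands in for the value of `killedRef`.
    Assumption: the kill flag is checked before the exit code. -/
def classify (exitCode : UInt32) (killed : Bool) (parsedRow : Option String) : RowSource :=
  if killed then
    -- The OS-reported exit code is ignored once the killer task has fired.
    .killedAtCap
  else
    match parsedRow with
    | some row => if exitCode == 0 then .parsed row else .error exitCode
    -- Assumption: a missing or unparseable row is an error even on exit 0.
    | none => .error exitCode

#eval classify 0 false (some "{}")
#eval classify 137 true none
#eval classify 1 false none
```

Checking the kill flag first keeps the result independent of whatever exit code the platform reports for a killed process.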

6.8.1. Exit codes

run and compare distinguish three outcomes:

  • 0 — all benchmarks produced verdict-eligible data and any baseline comparison passed.

  • 1 — baseline regression. Some benchmark exceeded the regression threshold against the loaded baseline.

  • 2 — exitNoUsableData. At least one parametric benchmark produced zero verdict-eligible rows, meaning every rung was filtered out (killed at cap, errored, or otherwise unusable). A registration calibrated this badly invalidates the regression check itself, so this code supersedes 1. See issue #47.

The summary line on stderr names the offending benchmarks so CI logs surface the calibration failure without parsing per-result advisories.
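The precedence among the three exit codes can be sketched as follows. This is a minimal illustration, not the library's implementation: BenchSummary, its fields, exitCodeFor, and offenders are hypothetical names, and the list is assumed to contain only parametric benchmarks.

```lean
/-- Hypothetical per-benchmark summary; all field names are illustrative. -/
structure BenchSummary where
  name : String
  eligibleRows : Nat   -- number of verdict-eligible rows
  regressed : Bool     -- exceeded the regression threshold against the baseline

/-- Pick the process exit code. The no-usable-data case (2) is checked first,
    so it supersedes a baseline regression (1). -/
def exitCodeFor (results : List BenchSummary) : UInt8 :=
  if results.any (fun r => r.eligibleRows == 0) then 2
  else if results.any (fun r => r.regressed) then 1
  else 0

/-- Names of benchmarks with no usable data, e.g. for a summary line. -/
def offenders (results : List BenchSummary) : List String :=
  (results.filter (fun r => r.eligibleRows == 0)).map (fun r => r.name)

#eval exitCodeFor [
  { name := "sortSmall", eligibleRows := 4, regressed := true },
  { name := "sortHuge",  eligibleRows := 0, regressed := false }]

#eval offenders [
  { name := "sortSmall", eligibleRows := 4, regressed := true },
  { name := "sortHuge",  eligibleRows := 0, regressed := false }]
```

In the first call one benchmark regressed and another has no eligible rows, so the sketch reports the no-usable-data code, not the regression code.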