1.7. Reading the output
Each row of the report is a per-call wall time at one rung of the
ladder. The C column is the unit-cost ratio
C = perCallNanos / complexity(param). When the declared
complexity matches reality, C is approximately constant across
the ladder; when it doesn't, C drifts. The verdict line is a
log-log slope check on C vs param: |β| ≤ 0.15 reads
"consistent with declared complexity", anything wider reads
"inconclusive". β's sign tells you the direction — positive means
actually slower than declared, negative means actually faster.
The harness self-measures its per-spawn overhead and prints it
under the verdict as per-spawn floor. Any rung whose total
sample is shorter than signalFloorMultiplier × spawn_floor
(default 10×) is dominated by subprocess startup rather than the
function, so its per-call number is meaningless; those rows are
tagged [<floor] and dropped from the verdict.
If too few rungs land — every row is below the floor, every row
hits the cap, the ladder was truncated mid-flight, or fewer than
three rows survive the warmup-trim plus signal-floor filter — the
report can't compute a verdict, and ‼ advisory lines appear
naming concrete knobs to turn next (--max-seconds-per-call,
--param-ceiling, paramSchedule := .custom #[...],
setup_fixed_benchmark, …).
A wide-ladder goodFib run, on stable hardware, gives a
consistent verdict. † flags the warmup-trim region (excluded
from the slope fit); [<floor] flags rungs dominated by spawn
overhead; the verdict line reports cMin, cMax, and β:
$ lake exe fib_benchmark_example run goodFib --param-ceiling 4096 --max-seconds-per-call 1
LeanBench.Examples.Fib.goodFib expected complexity: n [warm cache]
env: lean=4.30.0-rc2, os=linux, arch=x86_64, commit=<HASH>, <DATE>, host=<HOST>
param per-call repeats C
0 46.000 ns ×2^23 C= —
1 107.000 ns ×2^23 C= —
2 82.000 ns ×2^22 C=41.343 †
4 112.000 ns ×2^21 C= — [<floor]
8 200.000 ns ×2^22 C=25.061 †
16 278.000 ns ×2^21 C=17.416
32 459.000 ns ×2^20 C=14.353
64 1.236 µs ×2^19 C=19.324
128 2.334 µs ×2^18 C=18.241
256 3.743 µs ×2^17 C=14.623
512 7.506 µs ×2^16 C=14.662
1_024 17.287 µs ×2^15 C=16.882
2_048 18.792 µs ×2^13 C= — [<floor]
4_096 43.304 µs ×2^13 C=10.572
verdict: consistent with declared complexity (cMin=10.572, cMax=19.324, β=-0.065)
per-spawn floor (harness self-measurement): 33.998 ms
β=-0.065 sits inside the |β| ≤ 0.15 band, so the verdict
reads "consistent". The table tells the same story: per-call time
grows roughly linearly with param once spawn overhead amortises,
and C stabilises near 15 ns. The two [<floor] rungs are
batches that ran shorter than 10× the spawn floor — pure noise,
and correctly dropped from the slope fit.
The opposite case: a tight ladder where the cap fires before any
real ladder data accumulates. The verdict reads "inconclusive"
and the ‼ advisories name the knob to turn:
$ lake exe fib_benchmark_example run goodFib --param-ceiling 16 --max-seconds-per-call 0.5 --signal-floor-multiplier 1.0
LeanBench.Examples.Fib.goodFib expected complexity: n [warm cache]
env: lean=4.30.0-rc2, os=linux, arch=x86_64, commit=<HASH>, <DATE>, host=<HOST>
param per-call repeats C
0 55.000 ns ×2^23 C= —
1 113.000 ns ×2^23 C= —
2 79.000 ns ×2^22 C=39.733
4 140.000 ns ×2^22 C=35.008
8 234.000 ns ×2^22 C=29.356
16 247.000 ns ×2^21 C=15.492
verdict: inconclusive (cMin=15.492, cMax=39.733, β=-0.433, looks faster than declared by ~n^0.433)
per-spawn floor (harness self-measurement): 32.899 ms
β=-0.433 sits well outside the band, so the verdict reads
"inconclusive" with a directional hint ("looks faster than
declared by ~n^0.433"). It's a calibration failure, not a bug:
at small n the per-call setup dominates the linear algorithm,
dragging C down across rungs. Widening the ladder
(--param-ceiling) lets the asymptotic cost win and tips the
verdict back to consistent — exactly what the previous example
shows.