4.6. Warm versus cold reading
Every benchmark run has a cold regime where the per-call overhead
(child startup, JIT-style adaptive code paths in the runtime,
cache fills, branch predictor warmup) dominates the algorithm.
The first few rungs of the ladder are systematically slower than
the asymptotic regime would predict. The harness reports per-call
wall time including this overhead and trims the leading
verdictWarmupFraction (20% by default) of ratios before fitting
the slope (see Quickstart).
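The trimming step can be sketched as follows. This is a minimal Python illustration, not the harness's own code: the function name `trim_warmup`, the choice to round the trim count down, and the sample ratios are all assumptions made for the example. Only the 20% default comes from the text above.

```python
import math

def trim_warmup(ratios, warmup_fraction=0.20):
    """Drop the leading warmup_fraction of the ratios.

    Assumption: the number of trimmed rows is rounded down; the real
    harness may round differently.
    """
    skip = math.floor(len(ratios) * warmup_fraction)
    return ratios[skip:]

# Hypothetical per-rung ratios: the first two are inflated by cold-start overhead.
ratios = [9.0, 4.0, 2.1, 2.0, 1.9, 2.0, 2.1, 2.0, 1.9, 2.0]
kept = trim_warmup(ratios)
print(len(ratios) - len(kept), "rows trimmed;", len(kept), "rows kept")
```

With ten rungs and the default fraction, the two leading rows fall in the trim region and the remaining eight feed the fit.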
What this means in practice:
- Don't read the first few rows as algorithm data. Tombstones (†) on early rungs in the report mark the trim region. The trimmed tail is the source of truth for the verdict; the early rows are kept for context only.
- The per-spawn floor sets a hard noise floor. Every report prints per-spawn floor: X ms. Any data point with total_nanos smaller than ~10× the floor is noise rather than signal. The auto-tuner usually drives inner_repeats up enough that batches hit targetInnerNanos := 500ms, which is well above the floor; but for very fast operations on small param you can see flat or non-monotone per-call times until param grows past where startup dominates.
- A single child process per param means no warm cache between params. Each measurement is a fresh subprocess, by design (so the wall cap is enforceable without external timeout(1)). The flip side is no L1/L2 carry-over and no JIT-style steady state across rungs. The default warm mode does still amortise within a rung: the auto-tuner runs the function many times inside a single spawn, so caches and branch predictors reach steady state for that param. If you want the cold per-call cost (cache refill on every measurement, no internal averaging), use --cache-mode cold and read the Cache modes section of Advanced. The two modes measure different things; either can be appropriate.
- Single-shot per param is fragile near the boundary. The default --outer-trials 1 collects one batch per ladder rung, and a single noisy spawn at the high end can flip a verdict from consistent to inconclusive. Bump --outer-trials to 3 (or higher) to get a per-param median + spread; the verdict then sees the median per param, not a single noisy sample. See Outer trials in Advanced for what the summary block reports and what its limits are. Cost scales linearly with the trial count, so it's a deliberate trade.
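Two of the rules of thumb above, the ~10× floor check and the per-param median over outer trials, can be sketched in Python. This is an illustration under stated assumptions, not the harness's implementation: the function names, the sample timings, and the definition of spread as max minus min are hypothetical. See Outer trials in Advanced for what the harness actually reports.

```python
from statistics import median

def above_floor(total_nanos, floor_ms, factor=10.0):
    """True if a batch's total time clears factor x the per-spawn floor."""
    floor_nanos = floor_ms * 1_000_000  # report prints the floor in ms
    return total_nanos >= factor * floor_nanos

def median_and_spread(trials):
    """Median of the per-call times for one param, plus an assumed
    spread measure (max - min); the harness may define spread differently."""
    return median(trials), max(trials) - min(trials)

# Hypothetical numbers: a 2 ms per-spawn floor, so the threshold is 20 ms.
floor_ms = 2.0
print(above_floor(5_000_000, floor_ms))    # 5 ms batch: below the threshold
print(above_floor(500_000_000, floor_ms))  # 500 ms batch: well above it

# Three outer trials for one rung, per-call nanoseconds; one spawn was noisy.
trials = [410.0, 395.0, 930.0]
print(median_and_spread(trials))
```

The median ignores the one noisy spawn, which is why a verdict built on per-param medians is less likely to flip than one built on a single sample.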
For exponential-complexity benchmarks, the ladder shifts from
doubling to a linear sweep over (lastOk, firstFail) and the
log-x range narrows. The slope fit is rejected for narrow ranges
and the verdict falls back to a multiplicative range check,
cMax / cMin ≤ max(narrowRangeNoiseFloor, exp(slopeTolerance · xRange));
see Advanced.
On those benchmarks the verdict's β line shows "—" in place of a fitted
slope, and you should read cMin/cMax directly.
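The fallback inequality can be evaluated by hand. The Python sketch below transcribes it directly. The function name is made up for the example, the numeric values are hypothetical rather than the harness's defaults, and xRange is assumed to be the width of the log-x range.

```python
import math

def narrow_range_consistent(c_min, c_max, x_range,
                            narrow_range_noise_floor, slope_tolerance):
    """Check cMax / cMin <= max(noise floor, exp(slopeTolerance * xRange))."""
    bound = max(narrow_range_noise_floor,
                math.exp(slope_tolerance * x_range))
    return c_max / c_min <= bound

# Hypothetical values, NOT the harness defaults.
# exp(0.25 * 0.4) is about 1.105, so the 1.5 noise floor sets the bound.
print(narrow_range_consistent(1.0, 1.3, 0.4, 1.5, 0.25))  # ratio 1.3 passes
print(narrow_range_consistent(1.0, 2.0, 0.4, 1.5, 0.25))  # ratio 2.0 fails
```

On a narrow range the exponential term stays close to 1, so in this example the noise floor sets the bound.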