lean-bench

3.2. Reading the verdict🔗

The raw ratios array [(param, perCallNanos / complexity(param)), …] is the source of truth. The verdict is a heuristic decoration.

The decision rule is the log-log slope β of C vs param over the trimmed tail (the leading verdictWarmupFraction of ratios — 20% by default — is dropped first, so the cold regime can't corrupt the fit):

  • "consistent with declared complexity" if |β| ≤ slopeTolerance (0.15 by default)

  • "inconclusive" otherwise

β is printed on the verdict line alongside cMin/cMax. A correct complexity declaration gives β ≈ 0; positive β means the function grows faster than declared by roughly a factor of n^β, negative β means slower. The verdict line surfaces a "looks slower/faster than declared by ~n^β" hint whenever |β| is both outside tolerance and at least 0.05.

An inconclusive verdict is not "the implementation is wrong" — it just means the model + measurement noise doesn't support the strong claim. Things that produce an inconclusive verdict:

  • declared complexity is wrong (e.g. n for an O(n²) algorithm — β comes back near 1)

  • declared complexity is too coarse (e.g. 2^n for an algorithm whose actual cost is φ^n — β comes back negative; the deviation is exponential, not polynomial, so the "~n^β" shorthand is a rough summary rather than an exact exponent)

  • cache effects (smooth at small n, jumps at L1 / L2 boundaries)

  • arbitrary-precision arithmetic on growing results (Lean's Nat is bignum; if f n returns a number with k bits, every operation on it costs O(k))

For exponential complexity, the verdict's β line shows because the slope fit is rejected for narrow log-x ranges (the fit would be ill-conditioned). The cMin/cMax range check takes over: the verdict is consistent iff cMax/cMin ≤ max(narrowRangeNoiseFloor, exp(slopeTolerance · xRange)). Default narrowRangeNoiseFloor = 1.50 admits the 15-25% spread that single-shot near-cap measurements show on noisy hardware. Tighten if you want to discriminate finer model differences and accept the false-negative rate; widen further on chronically noisier hardware.

β tells you the direction; the raw ratios tell you the magnitude. Together they're enough to tell "off by a polynomial factor" from "off by a constant factor but drifting" from "just noisy".