6.5. What we left out🔗

Elaboration-time subprocess sanity check ("does f 0 finish in < 1s?"). Considered and rejected: it makes the elaborator depend on the compiled binary, which is circular and phase-dependent. v0.1 static checks only verify symbol existence, arity, and complexity type. Compiled-code sanity checks live in lean-bench verify, which invokes the registered runner in-process against f 0 and f 1 for each registered benchmark and reports any non-ok outcome.
Strong verdict labels. The verdict fits the log-log slope β of C vs param over the trimmed tail (leading 20% of ratios dropped by default). |β| ≤ 0.15 → "consistent with declared complexity", otherwise "inconclusive". β's sign is reported alongside the verdict so callers can see direction, but the verdict itself stays binary: the raw ratios array is the source of truth and the verdict is decoration. The labels deliberately don't say "matches" or "doesn't match" because the heuristic isn't defended statistically.
Statistical machinery. No outlier rejection, confidence intervals, repeated outer trials. v0.1 trusts that targetInnerNanos := 500ms × auto-tuned inner-repeats is enough signal-to-noise for the orders of magnitude we care about.