4. Lean-specific benchmarking pitfalls
A microbenchmark in Lean 4 measures wall time on compiled code.
That sounds simple, but Lean's runtime — bignum Nat,
reference-counted allocation, structural sharing, ahead-of-time
compilation versus elaborator evaluation — has enough sharp edges
that a benchmark can silently measure something other than the
algorithm you wrote down. This guide walks through the failure
modes we have hit ourselves.
lean-bench papers over some of these (it forces every result,
hashes when it can, and runs in a clean child process for each
measurement) but it cannot save you from a benchmark whose
declared parameter doesn't actually drive the work, or whose
return value's bignum arithmetic dominates the algorithm. Knowing
what the harness does and does not handle is the point of this
document.
For background on the harness mechanics referenced below, see Design. For the user-facing setup, see Quickstart.