4.2. Force the work
Lean is call-by-value at runtime, but the compiler is aggressive. A benchmark that constructs a value and never observes it can be optimised down to nothing — or, worse, partially elided so the "measurement" tracks loop overhead rather than the algorithm. Two ways this happens in practice:
Loop-invariant hoisting. If your inner loop calls a pure function on a loop-invariant argument, the compiler can lift the call out:
-- Naive timing loop. Compiler sees `f param` as loop-invariant and -- hoists it out: you've timed one call plus N empty iterations. for _ in [0:N] do let _ := f param
lean-bench's generated loop body routes every result through
LeanBench.blackBox, a @[noinline] consumer with a real side
effect. The compiler can't prove the call is dead, so the
argument can't be elided. If you're writing your own timing
harness without lean-bench, you need an equivalent: a global
ref, an IO print of the result, or anything the compiler must
assume is observable.
Lazy data structures. Lean's thunks and Lazy values aren't
forced by binding them to a name. A benchmark that returns
Thunk.mk fun _ => expensive and times "the call" measures
allocation of a closure, not expensive. The harness defends
against this by hashing the result when the return type has
Hashable (which forces deep evaluation of any thunk it
traverses), or by calling sizeOf on the result when it doesn't.
Both paths walk enough of the value to defeat thunk-based
laziness for ordinary data; neither is bullet-proof against
bespoke laziness (a custom type that wraps an IO action it
never runs, say).
The take-away: for lean-bench, pick a return type that is
Hashable whenever you can — compare will also be able to use
the hash for cross-implementation agreement checks. If you must
return something non-Hashable, ensure the function's body
actually constructs the value rather than capturing it in a
closure.