4.2. Force the work🔗

Lean is call-by-value at runtime, but the compiler is aggressive. A benchmark that constructs a value and never observes it can be optimised down to nothing — or, worse, partially elided so the "measurement" tracks loop overhead rather than the algorithm. Two ways this happens in practice:

Loop-invariant hoisting. If your inner loop calls a pure function on a loop-invariant argument, the compiler can lift the call out:

-- Naive timing loop. Compiler sees `f param` as loop-invariant and
-- hoists it out: you've timed one call plus N empty iterations.
for _ in [0:N] do
  let _ := f param

lean-bench's generated loop body routes every result through LeanBench.blackBox, a @[noinline] consumer with a real side effect. The compiler can't prove the call is dead, so the argument can't be elided. If you're writing your own timing harness without lean-bench, you need an equivalent: a global ref, an IO print of the result, or anything the compiler must assume is observable.

Lazy data structures. Lean's thunks and Lazy values aren't forced by binding them to a name. A benchmark that returns Thunk.mk fun _ => expensive and times "the call" measures allocation of a closure, not expensive. The harness defends against this by hashing the result when the return type has Hashable (which forces deep evaluation of any thunk it traverses), or by calling sizeOf on the result when it doesn't. Both paths walk enough of the value to defeat thunk-based laziness for ordinary data; neither is bullet-proof against bespoke laziness (a custom type that wraps an IO action it never runs, say).

The take-away: for lean-bench, pick a return type that is Hashable whenever you can — compare will also be able to use the hash for cross-implementation agreement checks. If you must return something non-Hashable, ensure the function's body actually constructs the value rather than capturing it in a closure.