lean-bench

4.8. Summary🔗

The mistakes are concrete, not abstract. If your verdict is inconclusive or your C constant looks suspicious:

  1. Is the return type a bignum, or built from bignums? You may be measuring Nat arithmetic rather than your algorithm.

  2. Is the result actually observed? lean-bench forces every result — via Hashable.hash when the return type has a Hashable instance, via Hashable.hash (sizeOf …) otherwise. result_hash in the JSONL row is null only when the return type is non-Hashable (which disables cross-implementation agreement checking, but does not disable forcing).

  3. Is the result observation itself dominating the timing? Switch to a smaller summary return type if so.

  4. Are sharing effects flipping the algorithm between cheap and expensive paths? Test in isolation.

  5. Are you reading early-rung rows? Look at the trimmed tail.

  6. Is the benchmark fixed-problem? Read median, not mean, and confirm hashes agree across repeats.

When in doubt, the raw ratios array in the JSONL output is the source of truth. The verdict is decoration.