4.8. Summary🔗

The mistakes are concrete, not abstract. If your verdict is inconclusive or your C constant looks suspicious:

Is the return type a bignum, or built from bignums? You may be measuring Nat arithmetic rather than your algorithm.
Is the result actually observed? lean-bench forces every result — via Hashable.hash when the return type has a Hashable instance, via Hashable.hash (sizeOf …) otherwise. result_hash in the JSONL row is null only when the return type is non-Hashable (which disables cross-implementation agreement checking, but does not disable forcing).
Is the result observation itself dominating the timing? Switch to a smaller summary return type if so.
Are sharing effects flipping the algorithm between cheap and expensive paths? Test in isolation.
Are you reading early-rung rows? Look at the trimmed tail.
Is the benchmark fixed-problem? Read median, not mean, and confirm hashes agree across repeats.

When in doubt, the raw ratios array in the JSONL output is the source of truth. The verdict is decoration.