lean-bench

1.8. Auto-fit🔗

setup_benchmark requires you to declare a complexity model. That's the right workflow when you know what you expect — the verdict catches mismatches against the declared model. But it's not the only useful workflow. Sometimes you have a function whose complexity is genuinely unclear and you'd like the tool to suggest one.

Pass --auto-fit on run (or compare) to see this. After the standard verdict block, the report prints a ranked list of catalog models by goodness-of-fit. The catalog is fixed:

Catalog entry

What it represents

1

constant time

n

linear

n * Nat.log2 (n + 1)

linearithmic

n^2

quadratic

n^3

cubic

2^n

exponential

The ranking score is stdLogC — the standard deviation of log C across the verdict-eligible rungs, where C = perCallNanos / model(param). A perfect fit drives C to a constant (stdLogC = 0); the worse a model is, the more C has to vary to explain the data, and the higher its score. Every model is scored against the same rung set, so the column is directly comparable across catalog entries. The table is sorted by score so the runner-ups make it easy to see how decisive the pick was.

The arrow tags the best fit only when the verdict is decisive — ≥ 3 usable rungs survived AND the runner-up's score is at least 1.5× the winner's. On a narrow ladder where adjacent catalog entries can't be separated, no arrow is drawn and a ‼ inconclusive line names the reason.

The catalog labels are exactly the strings you would write after => in setup_benchmark, so adopting the suggestion is mechanical:

def myFibA (_ : Nat) : Nat := 0 -- Before setup_benchmark myFibA unused variable `n` Note: This linter can be disabled with `set_option linter.unusedVariables false`n => 1 def myFibB (_ : Nat) : Nat := 0 -- After (auto-fit suggested `n`) setup_benchmark myFibB n => n

This is a heuristic, not a proof. The catalog is fixed at six entries, so anything not in the catalog gets the closest neighbour; adjacent catalog entries (e.g. n vs n * Nat.log2 (n + 1)) are hard to discriminate on short ladders. Read stdLogC to gauge how good the closest match actually is; values above ~1.0 mean even the winner doesn't fit well.