lean-bench

1.6. Configuring a benchmark🔗

BenchmarkConfig carries the knobs the harness reads at run time: target inner-batch wall time, the per-batch wallclock cap, the doubling-ladder bounds, the warmup-trim count, and the slope-tolerance threshold for the verdict. Defaults are sensible for typical workloads.

Add a where { … } block to setup_benchmark. Anything you don't mention keeps its default:

def slowQuadratic (_ : Nat) : Nat := 0 setup_benchmark slowQuadratic n => n * n where { maxSecondsPerCall := 0.5 paramCeiling := 16384 }

Both run and compare accept the same set of flags. They layer on top of whatever was declared, so you can ship a tight default and loosen or tighten it interactively. The Float flags accept JSON-style numbers (0.5, 1e-3, 3); +1, .5, and 1. are not accepted — write 1, 0.5, and 1.0.

Available flags:

Flag

Type

BenchmarkConfig field

--max-seconds-per-call

Float

maxSecondsPerCall

--target-inner-nanos

Nat

targetInnerNanos

--param-floor

Nat

paramFloor

--param-ceiling

Nat

paramCeiling

--warmup-fraction

Float

verdictWarmupFraction

--slope-tolerance

Float

slopeTolerance

--param-schedule

ParamSchedule

paramSchedule

--cache-mode

CacheMode

cacheMode

--outer-trials

Nat

outerTrials

--auto-fit

(boolean flag)

(heuristic — see below)

--param-schedule accepts auto (default), doubling, or linear. --cache-mode warm|cold selects what is being measured. The default warm is the v0.1 design: the child auto-tunes an inner-repeat count and runs the function many times in a single spawn. cold respawns the child for every ladder rung. --outer-trials N runs N independent measurements per rung and reports per-param median / min / max / spread instead of a single sample.

For workloads that don't fit any built-in ladder — corpus inputs where the natural n values are non-uniform, or a single-rung "this is the size that matters" run — declare a custom ladder:

def cornerCase (_ : Nat) : Nat := 0 setup_benchmark cornerCase n => n where { paramSchedule := .custom #[1, 100, 10_000, 1_000_000] }

.custom skips the doubling probe and the linear-bracket sweep entirely; the harness walks the declared params in order and stops when one hits the wallclock cap. An empty .custom #[] is rejected by config validation.