1.6. Configuring a benchmark
BenchmarkConfig carries the knobs the harness reads at run time:
target inner-batch wall time, the per-batch wallclock cap, the
doubling-ladder bounds, the warmup-trim count, and the
slope-tolerance threshold for the verdict. Defaults are sensible
for typical workloads.
Add a where { … } block to setup_benchmark. Anything you don't
mention keeps its default:
def slowQuadratic (_ : Nat) : Nat := 0
setup_benchmark slowQuadratic n => n * n where {
maxSecondsPerCall := 0.5
paramCeiling := 16384
}
Both run and compare accept the same set of flags. They layer
on top of whatever was declared, so you can ship a tight default
and loosen or tighten it interactively. The Float flags accept
JSON-style numbers (0.5, 1e-3, 3); +1, .5, and 1. are
not accepted — write 1, 0.5, and 1.0.
Available flags:
Flag | Type |
|
|---|---|---|
| Float |
|
| Nat |
|
| Nat |
|
| Nat |
|
| Float |
|
| Float |
|
| ParamSchedule |
|
| CacheMode |
|
| Nat |
|
| (boolean flag) | (heuristic — see below) |
--param-schedule accepts auto (default), doubling, or linear.
--cache-mode warm|cold selects what is being measured. The default
warm is the v0.1 design: the child auto-tunes an inner-repeat count
and runs the function many times in a single spawn. cold respawns
the child for every ladder rung. --outer-trials N runs N
independent measurements per rung and reports per-param median /
min / max / spread instead of a single sample.
For workloads that don't fit any built-in ladder — corpus inputs
where the natural n values are non-uniform, or a single-rung
"this is the size that matters" run — declare a custom ladder:
def cornerCase (_ : Nat) : Nat := 0
setup_benchmark cornerCase n => n where {
paramSchedule := .custom #[1, 100, 10_000, 1_000_000]
}
.custom skips the doubling probe and the linear-bracket sweep
entirely; the harness walks the declared params in order and stops
when one hits the wallclock cap. An empty .custom #[] is rejected
by config validation.