lean-bench

6.1. Subprocess per measurement🔗

Every measurement of f at a chosen param spawns a child process. The child does the timing AND the auto-tuned-repeat loop; the parent spawns the child via IO.Process.spawn, reads stdout, and runs a sibling killer Task that calls IO.Process.Child.kill if the batch exceeds maxSecondsPerCall + killGraceMs.

Why subprocess: Lean 4 has no portable in-process interrupt for arbitrary computation. Task cancellation is cooperative; IO.sleep doesn't help. The only reliable way to enforce a wallclock cap on a single measurement is to give it its own process and kill the process.

The kill path is pure Lean — no dependency on GNU coreutils timeout(1) or any other external utility. IO.Process.Child.kill is implemented for every platform Lean supports, so the harness runs on Linux, macOS, and Windows out of the box.

6.1.1. Killer task: parent / killer coordination🔗

The parent spawns the child and immediately starts a sibling Task whose job is to kill the child if the wall budget runs out. A BaseMutex plus two IO.Ref Bool flags (mainDoneRef, killedRef) coordinate the two:

  1. Parent reads stdout (one JSONL row).

  2. Parent takes the lock and sets mainDoneRef := true, then releases.

  3. Parent calls child.wait (which on POSIX reaps the pid).

  4. Parent inspects killedRef. true → synthesize a killedAtCap row; false → use the parsed JSONL row (or synthesize an error row if the exit code is non-zero).

Meanwhile, the killer:

  1. Polls mainDoneRef while sleeping in 25 ms increments up to the deadline (maxSecondsPerCall + killGraceMs). Polling lets it exit promptly when the parent finishes early — without it, every batch would block for the full deadline.

  2. If the deadline elapses without mainDoneRef becoming true, the killer takes the lock, re-reads mainDoneRef, and if still false sets killedRef := true and calls Child.kill.

The lock guarantees the dangerous race is impossible: the parent sets mainDoneRef := true before calling child.wait, so by the time the kernel can reap the pid, any subsequent killer critical-section will observe mainDoneRef = true and skip the kill. We never call kill(2) on a pid that has already been reaped (which on POSIX could hit a recycled pid and signal an unrelated process).

There is a microsecond-scale logical (not safety) race: if the child happens to exit naturally just as the killer fires, the killer can grab the lock first, see mainDoneRef = false, and mark the row killedAtCap even though the child finished within the budget. The row is dropped from consistentWith… ratio analysis anyway, and the cap was effectively reached, so this is benign.