lean-bench

2.7. Export format🔗

The export format is a separate, user-facing JSON document produced by --export FILE on run and compare. It is distinct from the JSONL wire format described above:

aspect

JSONL wire format

Export format

scope

one row per child invocation

one document per run / compare

audience

internal (child → parent)

external (CI, dashboards, baselines)

structure

newline-delimited JSON objects

single JSON object

version key

schema_version

export_schema_version

source of truth

LeanBench/Schema.lean

LeanBench/Export.lean

The two version counters are independent. Changes to the JSONL wire format do not force a bump of export_schema_version, and vice versa.

2.7.1. Export document structure🔗

{
  "export_schema_version": 1,
  "lean_bench_version": "0.1.0",
  "env": { ... },
  "results": [ ... ],
  "baseline_comparison": [ ... ],
  "budget": { ... }
}
  • export_schema_version (integer, required): current version is 1.

  • lean_bench_version (string): LeanBench.libraryVersion.

  • env (object | null): top-level reproducibility metadata (same schema as the JSONL env field). All results in a single export share the same parent env.

  • results (array): one entry per benchmark result. Each entry has a "kind" discriminator ("parametric" or "fixed") and carries the full result data: points, ratios, verdict, config, trial summaries, etc. Every result also carries "budget_truncated" (boolean): true when the run was a CI-budget run and that benchmark's ladder was cut short between rungs by the deadline.

  • baseline_comparison (array, optional): present when --baseline was used. One entry per matched function with per-param regression / improvement / stable classification.

  • budget (object, optional): present only when --total-seconds was used. See Export budget summary below.

2.7.2. Export budget summary🔗

When the run was launched with --total-seconds N (issue #9), the export document carries a top-level "budget" object summarising which benchmarks ran, which were skipped, and how long the suite actually took:

{
  "budget": {
    "total_seconds": 60.0,
    "elapsed_seconds": 58.91,
    "completed": 7,
    "truncated": 1,
    "skipped": [
      {"function": "MyProject.Sort.runMergeSort",
       "kind": "parametric",
       "status": "budget_skip"},
      {"function": "MyProject.Sort.runInsertion",
       "kind": "parametric",
       "status": "budget_skip"}
    ]
  }
}

Field semantics:

  • total_seconds (number, required): the value of the --total-seconds flag.

  • elapsed_seconds (number, required): wallclock seconds the orchestrator actually spent before stopping.

  • completed (integer, required): number of benchmarks that ran to a result (whether or not their ladders were truncated mid-flight).

  • truncated (integer, required): number of completed benchmarks whose ladder was cut short between rungs by the deadline. Each truncated result also carries "budget_truncated": true in its results entry.

  • skipped (array, required): one entry per benchmark that didn't run at all because the budget was exhausted before it could start. Each entry has function (string), kind ("parametric" or "fixed"), and status (always "budget_skip"). Order matches the order benchmarks would have been scheduled in.

CI consumers should treat skipped and truncated > 0 as actionable information rather than failure: the run completed within the configured budget on purpose.

2.7.3. Export versioning rules🔗

The same breaking/non-breaking rules as the JSONL format apply:

  • Adding a new optional field is non-breaking.

  • Removing or renaming a field, or changing its type, is breaking and bumps export_schema_version.

  • Readers MUST reject documents with export_schema_version outside their supportedExportVersions set.

2.7.4. Baseline comparison via --baseline🔗

--baseline FILE on run loads a previous export file and compares the current run against it at shared parameter values:

  • Parametric: for each shared param, computes the percentage change in per_call_nanos (using the trial-summary median when outer trials > 1). Params where the current run is slower than baseline by more than --regression-threshold (default 10%) are flagged as regressions. Params faster by the same margin are improvements.

  • Fixed: compares median wall time.

  • Exit code: non-zero when any regression is detected, making the feature CI-usable without parsing terminal output.