4.4. Allocation and sharing🔗

Lean uses reference counting with destructive update for unique references. This matters two different ways for benchmarks.

Allocation cost in Nat arithmetic and structural data. Every non-trivial Nat operation allocates (a fresh bignum result), as does every Array.push, List.cons, String concatenation, and record construction with at least one field changed. A loop that looks like arr := arr.push x is O(1) amortised when arr is unshared (Lean re-uses the buffer in place) and O(n) per push when it is shared (Lean copies first). Code that benchmarks fine in isolation can flip between regimes when used in a context that holds a second reference to the same array.

A concrete example:

def pushN (n : Nat) : Array Nat := Id.run do
  let mut a : Array Nat := #[]
  for i in [0:n] do a := a.push i
  return a

-- Looks like O(n). It is, when called as below.
setup_benchmark pushN n => n

-- But this version, which keeps a snapshot every step, is O(n^2):
def pushNShared (n : Nat) : Array Nat := Id.run do
  let mut a : Array Nat := #[]
  let mut snapshots : Array (Array Nat) := #[]
  for i in [0:n] do
    snapshots := snapshots.push a   -- shares `a`!
    a := a.push i                   -- next push must copy
  return a

If you suspect sharing-induced regressions, write a smaller allocation-only variant that exercises just the suspect operation and benchmark it alongside the full algorithm — the relative per-call cost will tell you whether the unique-reference path is being hit.

Allocation churn vs. arithmetic in tight loops. Even when a single operation is cheap, allocating a fresh bignum / array per iteration generates GC-style pressure that doesn't map cleanly onto a textbook complexity. The harness measures wall time including this pressure, so it shows up as "constant factor C is bigger than you expected" rather than as a slope deviation. If you're chasing a 2× constant-factor difference between two algorithms with the same declared complexity, allocation patterns are a strong suspect.

The alloc_bytes and peak_rss_kb schema fields are reserved for a future feature (#6) that will surface this directly. Until then, allocation effects are visible only through the C = perCallNanos / complexity(n) constant.