Disciplined Aggregation & Correction Layers — Probability Lab

Pooling

Why the engine doesn't just average

The obvious aggregate — the weighted average of each world's P(YES | world) — is computed first and shown as the inside-view linear pool. But a long literature on forecast combination shows that averaging probabilities systematically under-extremizes: it drags every estimate toward 50%, even when the evidence genuinely points one way. Averaging a 90% and a 70% should often yield something sharper than 80%, because two semi-independent signals agreeing is itself evidence.

So the working aggregate is a geometric mean of odds — the same combination performed in log-odds space — followed by a modest extremizing step. Crucially, extremizing must be earned: the exponent grows only with genuine mechanism-family diversity (measured by a concentration index across families) and is applied at all only when the set has five or more scenarios. A thin or one-family scenario set gets no sharpening, because its agreement is correlation, not corroboration.

Linear pooling compresses toward 50%; pooling in log-odds space recovers the sharpness the evidence supports. The extremizing exponent d is gated by a diversity index, never a free parameter.

Correction layers

Eight named discounts, each with a trigger

Scenario-based forecasting has known inflation modes: the same bullish mechanism told through three doors, a crowd of optimistic worlds, an elegant story carrying a thin set. The engine guards against each with an explicit, capped correction. Every layer is named, computed from measurable properties of the scenario set, and visible in the construction ledger — including the layers that didn't trigger.

Layer	Triggers when…	Max
Overlap discount	one mechanism family carries more than ~40% of scenario weight — correlated stories counted once, not twice	−5pp
High-YES saturation	more than half the scenarios are judged ≥70% conditional YES — a suspiciously unanimous crowd	−5pp
Weighted YES mass	too much world-weight sits in high-YES paths relative to the spread of the set	−5pp
Structural friction	heavy world-weight carries hard structural NO pressure that the optimistic paths ignore	−4pp
Conjunction stack	high-YES paths require four or more independent preconditions to all go right	−4pp
Time-window friction	the judges' time-feasibility scores say the constructive chains struggle to fit the window	−4pp
Low-count fragility	four or fewer scenarios, with one elegant path dominating the result	−4pp
Meta contamination	adjudication-style scenarios (about the question, not the world) slipped into the set	−5pp

The cap

Total heuristic correction is capped at 15 percentage points. The corrections are a conservatism device, not a second forecast — they discipline the aggregate; they are not allowed to replace it.

The construction waterfall

Every step on the record

The result view renders the entire construction as a waterfall: the linear pool at the top, each signed step labelled with its cause, the defended forecast at the bottom. Nothing moves the number without leaving a line in this chart.

A real construction from a sample run. The two final steps — the outside-view blend and the red-team adjustment — are covered in explainer 05; they act after the corrections, in bounded moves.

This is the feature the product's trust rests on. When the final number is lower than the first-pass aggregate, the ledger says exactly why — “too much of the constructive weight came from overlapping YES paths” is an auditable claim, not a vibe. And when no layer triggers, that is shown too: a diversified, well-spread scenario set earns its sharpness visibly.

Linear pool

Σ weight × P(YES | world) — the naive first-pass aggregate, always shown for reference.

Geometric odds pool

The same combination in log-odds space — resistant to the compression that averaging causes.

Extremizing (d)

A sharpening exponent up to 1.30, earned only by family diversity on sets of 5+ scenarios.

Construction ledger

The named list of every correction — triggered or clear — with the measured property that caused it.