Probability Lab  ·  Method explainer 04

Disciplined Aggregation & Correction Layers

The judged scenarios now have to become one number — and how you combine probabilities matters as much as how you produced them. Probability Lab pools in log-odds space, earns its sharpness through diversity, and then applies eight named discounts for the failure modes that inflate naive forecasts.

Pooling

Why the engine doesn't just average

The obvious aggregate, the weighted average of each world's P(YES | world), is computed first and shown as the inside-view linear pool. But a long literature on forecast combination shows that averaging probabilities systematically under-extremizes: the pooled estimate lands closer to 50% than the combined evidence warrants, even when that evidence genuinely points one way. Averaging a 90% and a 70% should often yield something sharper than 80%, because two semi-independent signals agreeing is itself evidence.

So the working aggregate is a weighted geometric mean of odds (equivalently, a weighted average taken in log-odds space), followed by a modest extremizing step. Crucially, extremizing must be earned: the exponent d grows only with genuine mechanism-family diversity (measured by a concentration index across families) and is applied at all only when the set has five or more scenarios. A thin or one-family scenario set gets no sharpening, because its agreement is correlation, not corroboration.
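
To make the difference concrete, here is a minimal Python sketch that pools the 90% and 70% example three ways. The function names and the equal weights are illustrative assumptions, not the engine's actual code, and the exponent d = 1.30 is passed in by hand here rather than derived from a diversity measure.

```python
import math


def linear_pool(probs, weights):
    """Weighted average of probabilities: the naive first-pass aggregate."""
    total = sum(weights)
    return sum(w * p for p, w in zip(probs, weights)) / total


def log_odds_pool(probs, weights):
    """Weighted geometric mean of odds, computed as a weighted mean of log-odds."""
    total = sum(weights)
    mean_logit = sum(
        w * math.log(p / (1 - p)) for p, w in zip(probs, weights)
    ) / total
    return 1 / (1 + math.exp(-mean_logit))


def extremize(p, d):
    """Raise the odds to the power d; d = 1 leaves p unchanged."""
    odds = (p / (1 - p)) ** d
    return odds / (1 + odds)


# Illustrative inputs: two judged worlds with equal weight (an assumption).
probs = [0.90, 0.70]
weights = [0.5, 0.5]

pooled_linear = linear_pool(probs, weights)
pooled_log_odds = log_odds_pool(probs, weights)
pooled_sharp = extremize(pooled_log_odds, d=1.30)

print(f"linear pool:        {pooled_linear:.3f}")
print(f"log-odds pool:      {pooled_log_odds:.3f}")
print(f"extremized, d=1.30: {pooled_sharp:.3f}")
```

With these inputs the linear pool is 80%, the log-odds pool is roughly 82%, and the extremized value is roughly 88%. When every input sits on the same side of 50%, the log-odds pool is at least as extreme as the linear pool; with inputs on both sides that ordering is not guaranteed.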

[Figure: a 0% to 100% probability axis showing the individual judged worlds, the linear pool (pulled toward the middle), and the log-odds pool with extremizing exponent d = 1.00 → 1.30. Sharpness is earned by family diversity and withheld from thin or concentrated sets.]
Linear pooling compresses toward 50%; pooling in log-odds space recovers the sharpness the evidence supports. The extremizing exponent d is gated by a diversity index, never a free parameter.
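
One way such a gate could work is sketched below. The source specifies only that d ranges from 1.00 to 1.30, depends on a concentration index across families, and is withheld below five scenarios. The Herfindahl-style index and the linear mapping from diversity to d are assumptions made for illustration, as are all function and parameter names.

```python
def concentration_index(family_weights):
    """Herfindahl-style index: sum of squared weight shares (assumed measure).

    Equals 1.0 when one family holds all the weight and falls toward 0
    as weight spreads evenly over many families.
    """
    total = sum(family_weights.values())
    return sum((w / total) ** 2 for w in family_weights.values())


def extremizing_exponent(family_weights, n_scenarios,
                         d_max=1.30, min_scenarios=5):
    """Return d in [1.0, d_max], gated by scenario count and family diversity."""
    # Thin sets get no sharpening at all.
    if n_scenarios < min_scenarios:
        return 1.0
    diversity = 1.0 - concentration_index(family_weights)
    # Hypothetical linear mapping: zero diversity gives 1.0, full diversity d_max.
    return 1.0 + (d_max - 1.0) * diversity


# Five scenarios in one family: agreement is correlation, so d stays at 1.0.
print(extremizing_exponent({"policy": 1.0}, n_scenarios=5))

# Six scenarios spread evenly over five families: d moves toward 1.30.
even = {"policy": 0.2, "market": 0.2, "legal": 0.2, "tech": 0.2, "social": 0.2}
print(round(extremizing_exponent(even, n_scenarios=6), 2))
```

Under this mapping the exponent approaches 1.30 only as weight spreads across many families, so a set split between just two families earns about half of the available sharpening.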

Correction layers

Eight named discounts, each with a trigger

Scenario-based forecasting has known inflation modes: the same bullish mechanism told through three doors, a crowd of optimistic worlds, an elegant story carrying a thin set. The engine guards against each with an explicit, capped correction. Every layer is named, computed from measurable properties of the scenario set, and visible in the construction ledger — including the layers that didn't trigger.

Layer | Triggers when… | Max
Overlap discount | one mechanism family carries more than ~40% of scenario weight (correlated stories counted once, not twice) | −5pp
High-YES saturation | more than half the scenarios are judged ≥70% conditional YES (a suspiciously unanimous crowd) | −5pp
Weighted YES mass | too much world-weight sits in high-YES paths relative to the spread of the set | −5pp
Structural friction | heavy world-weight carries hard structural NO pressure that the optimistic paths ignore | −4pp
Conjunction stack | high-YES paths require four or more independent preconditions to all go right | −4pp
Time-window friction | the judges' time-feasibility scores say the constructive chains struggle to fit the window | −4pp
Low-count fragility | four or fewer scenarios, with one elegant path dominating the result | −4pp
Meta contamination | adjudication-style scenarios (about the question, not the world) slipped into the set | −5pp

The cap

Total heuristic correction is capped at 15 percentage points. The corrections are a conservatism device, not a second forecast — they discipline the aggregate; they are not allowed to replace it.
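
The trigger-and-cap logic can be sketched in a few lines of Python. The per-layer maxima, the ~40% and ≥70% thresholds, and the 15pp total come from the text above. Everything else is an assumption for illustration: the function names, the idea that each layer produces a raw discount size as input, and the choice to scale layers down proportionally when the total cap binds.

```python
# Per-layer maxima in percentage points, taken from the table above.
LAYER_CAPS_PP = {
    "overlap_discount": 5.0,
    "high_yes_saturation": 5.0,
    "weighted_yes_mass": 5.0,
    "structural_friction": 4.0,
    "conjunction_stack": 4.0,
    "time_window_friction": 4.0,
    "low_count_fragility": 4.0,
    "meta_contamination": 5.0,
}
TOTAL_CAP_PP = 15.0


def overlap_triggered(family_weights, threshold=0.40):
    """True when the heaviest mechanism family exceeds ~40% of scenario weight."""
    total = sum(family_weights.values())
    return max(family_weights.values()) / total > threshold


def high_yes_saturation_triggered(conditional_probs, level=0.70):
    """True when more than half the scenarios are judged >= 70% conditional YES."""
    high = sum(1 for p in conditional_probs if p >= level)
    return high > len(conditional_probs) / 2


def apply_caps(raw_discounts_pp):
    """Clip each layer to its own maximum, then enforce the 15pp total.

    raw_discounts_pp maps layer name to a non-negative discount size in pp.
    Layers that did not trigger stay in the result with a value of 0.0.
    Proportional scaling under the total cap is an assumed policy.
    """
    capped = {
        name: min(max(raw_discounts_pp.get(name, 0.0), 0.0), cap)
        for name, cap in LAYER_CAPS_PP.items()
    }
    total = sum(capped.values())
    if total > TOTAL_CAP_PP:
        scale = TOTAL_CAP_PP / total
        capped = {name: value * scale for name, value in capped.items()}
    return capped


# Hypothetical raw sizes: one layer over its cap, one under, six clear.
result = apply_caps({"overlap_discount": 7.0, "structural_friction": 1.7})
print(result["overlap_discount"], result["structural_friction"])
print(round(sum(result.values()), 1))
```

The eight maxima sum to 36pp, so the 15pp total is a real constraint: a set that triggers most layers at full strength still loses at most 15 points.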


The construction waterfall

Every step on the record

The result view renders the entire construction as a waterfall: the linear pool at the top, each signed step labelled with its cause, the defended forecast at the bottom. Nothing moves the number without leaving a line in this chart.

Inside view · linear pool: 31.6%
Log-odds pool & extremizing: −10.4pp
Structural friction: −1.7pp
Conjunction stack discount: −1.2pp
Outside-view blend: +1.5pp
Red-team adjustment: −1.0pp
Final defended forecast: 21.1%
A real construction from a sample run. The two final steps — the outside-view blend and the red-team adjustment — are covered in explainer 05; they act after the corrections, in bounded moves.

This is the feature the product's trust rests on. When the final number is lower than the first-pass aggregate, the ledger says exactly why — “too much of the constructive weight came from overlapping YES paths” is an auditable claim, not a vibe. And when no layer triggers, that is shown too: a diversified, well-spread scenario set earns its sharpness visibly.
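
A ledger of this kind is straightforward to model. The sketch below is a hypothetical data structure, not the product's implementation: the class names, fields, and the sample numbers are all invented for illustration. It records every step, including layers that stayed clear, and derives the final number only from the recorded steps.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    label: str
    delta_pp: float   # signed move in percentage points
    cause: str        # the measured property behind the move
    triggered: bool = True


@dataclass
class Ledger:
    start_label: str
    start_pct: float
    steps: list = field(default_factory=list)

    def record(self, label, delta_pp, cause, triggered=True):
        # A layer that did not trigger is still recorded, with a zero move.
        move = delta_pp if triggered else 0.0
        self.steps.append(Step(label, move, cause, triggered))

    def final_pct(self):
        # The final number is the start plus every recorded move, nothing else.
        return self.start_pct + sum(step.delta_pp for step in self.steps)

    def render(self):
        lines = [f"{self.start_label}: {self.start_pct:.1f}%"]
        running = self.start_pct
        for step in self.steps:
            running += step.delta_pp
            move = f"{step.delta_pp:+.1f}pp" if step.triggered else "clear"
            lines.append(f"{step.label}: {move} ({step.cause}) -> {running:.1f}%")
        lines.append(f"Final defended forecast: {running:.1f}%")
        return "\n".join(lines)


# Illustrative numbers only; they are not taken from any real run.
ledger = Ledger("Inside view, linear pool", 40.0)
ledger.record("Log-odds pool & extremizing", -5.0, "d = 1.12 on 6 scenarios")
ledger.record("Overlap discount", -2.0, "top family holds 46% of weight")
ledger.record("High-YES saturation", 0.0, "2 of 6 scenarios >= 70%", triggered=False)
print(ledger.render())
```

Because the final figure is computed from the steps rather than stored separately, a move that is not in the ledger cannot reach the output.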

Linear pool

Σ weight × P(YES | world) — the naive first-pass aggregate, always shown for reference.

Geometric odds pool

The weighted average taken in log-odds space instead of probability space; resistant to the compression that averaging probabilities causes.

Extremizing (d)

A sharpening exponent up to 1.30, earned only by family diversity on sets of 5+ scenarios.

Construction ledger

The named list of every correction — triggered or clear — with the measured property that caused it.