Why the engine doesn't just average
The obvious aggregate, the weighted average of each world's P(YES | world), is computed first and shown as the inside-view linear pool. But a long literature on forecast combination shows that averaging probabilities systematically under-extremizes: it pulls the pooled estimate toward 50%, even when the evidence genuinely points one way. Combining a 90% and a 70% should often yield something sharper than 80%, because two semi-independent signals agreeing is itself evidence.
So the working aggregate is a geometric mean of odds, which is equivalent to a weighted average taken in log-odds space, followed by a modest extremizing step. Crucially, extremizing must be earned: the exponent grows only with genuine mechanism-family diversity (measured by a concentration index across families), and it is applied only when the set has five or more scenarios. A thin or one-family scenario set gets no sharpening, because its agreement is correlation, not corroboration.
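To make the difference concrete, here is a minimal Python sketch of the three steps described above: the linear pool, the log-odds pool, and a diversity-gated extremizing exponent. The function names are hypothetical, and two details are assumptions rather than the engine's documented behaviour: the concentration index is taken to be a Herfindahl-style sum of squared family shares, and the exponent is assumed to scale linearly with (1 − concentration) up to the 1.30 ceiling.

```python
import math

MAX_EXPONENT = 1.30   # ceiling stated in the text
MIN_SCENARIOS = 5     # extremizing is only applied at or above this count


def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1 to keep the log finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def linear_pool(weights, probs):
    """Weighted average of probabilities: the naive first-pass aggregate."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total


def mean_log_odds(weights, probs):
    """Weighted mean of log-odds, i.e. the log of the geometric mean of odds."""
    total = sum(weights)
    return sum(w * logit(p) for w, p in zip(weights, probs)) / total


def family_concentration(weights, families):
    """ASSUMPTION: Herfindahl-style index, the sum of squared family weight shares."""
    total = sum(weights)
    by_family = {}
    for w, fam in zip(weights, families):
        by_family[fam] = by_family.get(fam, 0.0) + w
    return sum((w / total) ** 2 for w in by_family.values())


def extremizing_exponent(weights, families):
    """ASSUMPTION: exponent rises linearly with (1 - concentration), capped at 1.30."""
    if len(weights) < MIN_SCENARIOS:
        return 1.0  # thin sets get no sharpening
    diversity = 1.0 - family_concentration(weights, families)
    return 1.0 + (MAX_EXPONENT - 1.0) * diversity


def working_aggregate(weights, probs, families):
    """Log-odds pool, then multiply the pooled log-odds by the earned exponent."""
    exponent = extremizing_exponent(weights, families)
    return sigmoid(exponent * mean_log_odds(weights, probs))
```

For the 90% and 70% pair with equal weights, `linear_pool` gives 0.80 while the log-odds pool alone lands a little above 0.82; with only two scenarios the exponent stays at 1.0, so no further sharpening is applied. Multiplying log-odds by an exponent is the same as raising the odds to that power, which is why the step is described as an exponent.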
Eight named discounts, each with a trigger
Scenario-based forecasting has known inflation modes: the same bullish mechanism told through three doors, a crowd of optimistic worlds, an elegant story carrying a thin set. The engine guards against each with an explicit, capped correction. Every layer is named, computed from measurable properties of the scenario set, and visible in the construction ledger — including the layers that didn't trigger.
| Layer | Triggers when… | Max correction |
|---|---|---|
| Overlap discount | one mechanism family carries more than ~40% of scenario weight — correlated stories counted once, not twice | −5pp |
| High-YES saturation | more than half the scenarios are judged ≥70% conditional YES — a suspiciously unanimous crowd | −5pp |
| Weighted YES mass | too much world-weight sits in high-YES paths relative to the spread of the set | −5pp |
| Structural friction | heavy world-weight carries hard structural NO pressure that the optimistic paths ignore | −4pp |
| Conjunction stack | high-YES paths require four or more independent preconditions to all go right | −4pp |
| Time-window friction | the judges' time-feasibility scores say the constructive chains struggle to fit the window | −4pp |
| Low-count fragility | four or fewer scenarios, with one elegant path dominating the result | −4pp |
| Meta contamination | adjudication-style scenarios (about the question, not the world) slipped into the set | −5pp |
Total heuristic correction is capped at 15 percentage points. The corrections are a conservatism device, not a second forecast — they discipline the aggregate; they are not allowed to replace it.
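The layer table and the overall cap can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's actual code: the `severities` input (a 0 to 1 score per layer) is a hypothetical stand-in for whatever measured property drives each trigger, and the 15pp cap is assumed to be applied by clipping the summed correction rather than by rescaling individual layers.

```python
# Per-layer maximum corrections in percentage points, taken from the table above.
LAYER_MAX_PP = {
    "Overlap discount": 5.0,
    "High-YES saturation": 5.0,
    "Weighted YES mass": 5.0,
    "Structural friction": 4.0,
    "Conjunction stack": 4.0,
    "Time-window friction": 4.0,
    "Low-count fragility": 4.0,
    "Meta contamination": 5.0,
}
TOTAL_CAP_PP = 15.0  # total heuristic correction cap stated in the text


def apply_discounts(aggregate_pct, severities):
    """Return (defended_pct, applied_pp, ledger).

    severities: hypothetical mapping of layer name -> score in [0, 1],
    where 0 means the layer did not trigger. Missing layers count as 0.
    """
    ledger = []
    raw_total = 0.0
    for name, max_pp in LAYER_MAX_PP.items():
        severity = min(max(severities.get(name, 0.0), 0.0), 1.0)
        step_pp = -max_pp * severity
        ledger.append((name, step_pp))  # untriggered layers are recorded too
        raw_total += step_pp
    # ASSUMPTION: the cap clips the sum; the source does not say how it is enforced.
    applied_pp = max(raw_total, -TOTAL_CAP_PP)
    defended_pct = max(aggregate_pct + applied_pp, 0.0)
    return defended_pct, applied_pp, ledger
```

With every layer fully triggered the uncapped sum would be 36pp, so the cap does real work: an 80% aggregate is lowered to 65%, not 44%. A single half-triggered overlap discount moves the same aggregate to 77.5%, and the ledger still lists all eight layers.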
Every step on the record
The result view renders the entire construction as a waterfall: the linear pool at the top, each signed step labelled with its cause, the defended forecast at the bottom. Nothing moves the number without leaving a line in this chart.
This is the feature the product's trust rests on. When the final number is lower than the first-pass aggregate, the ledger says exactly why — “too much of the constructive weight came from overlapping YES paths” is an auditable claim, not a vibe. And when no layer triggers, that is shown too: a diversified, well-spread scenario set earns its sharpness visibly.
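As a rough picture of what such a ledger could look like in text form, the sketch below renders a waterfall from a starting linear pool and a list of signed steps. The function name, the tuple layout, and the "clear (not triggered)" wording are all hypothetical; the only properties carried over from the description are that the linear pool sits at the top, every step appears with its sign and cause (including steps that moved nothing), and the defended forecast sits at the bottom.

```python
def render_waterfall(linear_pool_pct, steps):
    """Render a text waterfall.

    steps: list of (label, delta_pp, cause) tuples, in the order applied.
    A step with delta_pp == 0 is still printed, marked as clear.
    """
    lines = [f"{'Linear pool':<28}{linear_pool_pct:>8.1f}%"]
    running = linear_pool_pct
    for label, delta_pp, cause in steps:
        running += delta_pp
        note = cause if delta_pp != 0 else "clear (not triggered)"
        lines.append(f"{label:<28}{delta_pp:>+8.1f}pp  {note}")
    lines.append(f"{'Defended forecast':<28}{running:>8.1f}%")
    return lines
```

Calling `print("\n".join(render_waterfall(80.0, steps)))` with a handful of steps produces one line per step, so a reader can check by eye that the signed steps add up to the gap between the first and last lines.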
Linear pool: Σ weight × P(YES | world), the naive first-pass aggregate, always shown for reference.
Geometric mean of odds: the same combination in log-odds space, resistant to the compression that averaging causes.
Extremizing: a sharpening exponent up to 1.30, earned only by family diversity on sets of 5+ scenarios.
Construction ledger: the named list of every correction, triggered or clear, with the measured property that caused it.