Probability Lab  ·  Method explainer 03

The Adversarial Tribunal

A scenario that has never been argued against is not evidence — it is a pitch. Every world Probability Lab builds is put on trial: an Advocate makes the strongest honest case for YES, a Skeptic makes the strongest honest case for NO, and a calibrated Judge rules with numbers, intervals, and reasons.

Stage V

Three roles, one world on trial

For each scenario, three specialised agents convene. This is deliberately judicial rather than conversational: the roles have opposing mandates, each case has required elements it must contain, and eloquence earns nothing. The structure exists because single-pass estimates inherit whichever framing the model happened to adopt — forcing the strongest case on both sides before any number is assigned is the most reliable de-biasing device available.

A  THE ADVOCATE
the case for YES; must contain:
1  the specific causal mechanism
2  the strongest historical precedent
3  the magnitude required, and why achievable
4  the single most probative current evidence

S  THE SKEPTIC
the case for NO; must contain:
1  the weakest link in the causal chain
2  a count of what must all go right
3  a base-rate objection
4  a historical false-positive analogue

J  THE JUDGE
neutral, calibrated, checklist-bound; rules with probabilities and an interval
Both cases have mandatory elements — no vibes, no rhetoric. The Skeptic's “count of what must all go right” feeds directly into the conjunction checks downstream.

Calibration by construction

The judge's checklist

The Judge is the only role allowed to assign numbers, and is bound to a five-step checklist drawn from the calibration literature. The order matters — the base rate comes first, so the specific story has to move the number rather than set it.

1  Start from the outside-view base rate
   the reference-class prior is the anchor; evidence moves it, stories do not
2  Count independent preconditions, discount multiplicatively
   a path needing five things to go right is rarer than any of its parts
3  Check the time window
   can the causal chain actually complete before the resolution date?
4  Penalize narrative vividness
   by the conjunction rule, a detailed story is less probable than its components, never more
5  State an 80% interval, not just a point
   the interval feeds the Monte Carlo uncertainty propagation downstream
Step 4 targets the conjunction fallacy directly: vivid, specific scenarios feel more probable precisely when they should be judged less probable.
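
The multiplicative discount in step 2 can be made concrete with a small sketch. It assumes the preconditions are independent and that each carries its own judged probability; the function name `chain_probability` and the example values are illustrative and not taken from Probability Lab.

```python
import math


def chain_probability(precondition_probs):
    """Probability that every precondition holds, assuming independence."""
    for p in precondition_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # Under independence, the joint probability is the product of the parts.
    return math.prod(precondition_probs)


# Illustrative values: five preconditions, each 80% likely on its own.
five_links = [0.8] * 5
joint = chain_probability(five_links)
print(round(joint, 3))
```

Even with every link at 80%, the chain as a whole comes out at roughly one in three, which is why a long precondition count pulls the number down. If the preconditions are positively correlated, the product understates the joint probability, so the independence assumption matters.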

The ruling

Two probabilities per world — and the honesty around them

Each ruling separates two questions that casual forecasting conflates: P(world) — how likely is it that this scenario actually occurs — and P(YES | world) — if it does occur, how likely is the outcome to resolve YES. The product of the two, normalised across the set, is the scenario's weighted contribution to the forecast. A thrilling world that is 8% likely moves the number far less than a dull world at 30%.
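
A minimal sketch of that weighting, under one reading of "normalised across the set": each P(world) is divided by the sum of P(world) over all scenarios, then multiplied by P(YES | world), and the aggregate is the sum of those terms. The `Ruling` class, the scenario names, and the P(YES | world) values are hypothetical; only the 8% and 30% figures come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Ruling:
    name: str
    p_world: float      # judged probability the scenario occurs
    p_yes_given: float  # judged P(YES | world)


def contributions(rulings):
    """Each scenario's weighted contribution to the aggregate forecast."""
    total = sum(r.p_world for r in rulings)
    if total <= 0:
        raise ValueError("scenario probabilities must sum to a positive number")
    # Normalise P(world) across the set (assumed reading), then weight.
    return {r.name: (r.p_world / total) * r.p_yes_given for r in rulings}


# Hypothetical set: the P(YES | world) values are made up for illustration.
rulings = [
    Ruling("thrilling", 0.08, 0.90),
    Ruling("dull", 0.30, 0.50),
    Ruling("everything else", 0.62, 0.20),
]
parts = contributions(rulings)
aggregate = sum(parts.values())

for name, value in parts.items():
    print(f"{name}: {value:.3f}")
print(f"aggregate: {aggregate:.3f}")
```

In this made-up set the thrilling world contributes about 0.07 to the aggregate and the dull world about 0.15, even though the thrilling world is far more likely to resolve YES if it occurs.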

RULING: “INSTITUTIONAL FLYWHEEL RE-ACCELERATES”
P(world): 22% (chance this world occurs)
P(YES | world): 62% (80% interval: 45–78%, the judge's stated uncertainty)
Structural checks: 4 preconditions; time feasibility 55%
Weighted contribution = P(world) × P(YES | world), normalised across the set → drives the aggregate

A ruling in full. The interval, precondition count, and time-feasibility score are not commentary: each feeds a specific downstream mechanism (Monte Carlo, conjunction discount, time-window discount).

Independence by design

Tribunals run independently — no judge sees another scenario's numbers, so rulings cannot anchor on each other. Disagreement between Advocate and Skeptic is itself recorded (low / moderate / severe) and displayed, because a scenario the roles fought over deserves more scrutiny than one they agreed about.

P(world)

The judged probability that this scenario actually occurs.

P(YES | world)

The judged probability of the outcome, conditional on the scenario occurring — with an 80% interval.

Preconditions

The count of independent things that must all go right; long chains trigger the conjunction discount.

Time feasibility

Whether the causal chain can complete inside the resolution window; low scores trigger the time discount.
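
How an 80% interval on P(YES | world) might feed a Monte Carlo step can be sketched as follows. This is one possible approach, not a description of Probability Lab's implementation: it assumes the interval's ends are the 10th and 90th percentiles of a normal distribution on the log-odds scale, centred midway between them. The function name `sample_from_interval` is hypothetical; the 45–78% interval is the one from the example ruling.

```python
import math
import random
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_from_interval(low, high, n, seed=0):
    """Draw n probabilities whose 10th/90th percentiles match (low, high).

    Assumption: uncertainty is normal on the log-odds scale.
    """
    if not 0.0 < low < high < 1.0:
        raise ValueError("need 0 < low < high < 1")
    z90 = NormalDist().inv_cdf(0.90)  # standard normal 90th percentile
    lo, hi = logit(low), logit(high)
    mu = (lo + hi) / 2.0
    sigma = (hi - lo) / (2.0 * z90)
    rng = random.Random(seed)
    # Sample in log-odds space, then map back to a probability.
    return [sigmoid(rng.gauss(mu, sigma)) for _ in range(n)]


samples = sample_from_interval(0.45, 0.78, n=20_000)
inside = sum(0.45 <= s <= 0.78 for s in samples) / len(samples)
print(f"share of draws inside the stated interval: {inside:.2f}")
```

By construction, about 80% of the draws should land inside the stated interval. Working on the log-odds scale keeps every draw strictly between 0 and 1, which a normal distribution on the raw probability would not guarantee.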