
Probability Lab

A structured forecasting instrument. Probability Lab does not guess a number — it builds one through reference classes, causal scenarios, adversarial debate, and explicit correction logic, then shows you every step of the construction. This page maps the full method; each feature links to a deeper explainer.

The pipeline

How the engine thinks

Every run moves through seven stages. The first two discipline the question before any story is told; the middle three build and stress-test the inside view; the final two turn machinery into judgment. Nothing is hidden — each stage streams its work into the live analysis view and remains inspectable in the final report.

I. Question gate: resolvable claim, dated framing
II. Outside view: reference-class base rates
III. Factors: ranked causal drivers
IV. Scenarios: distinct world states
V. Tribunals: advocate · skeptic · judge
VI. Synthesis: pooling, corrections, red team
VII. Direct answer: one defended judgment
The seven stages of a run. The tribunal stage (one adversarial debate per scenario) is the analytical core of the product; the synthesis stage is where the raw aggregate is disciplined into a defensible number.

The promise

A number that shows its work

Most AI tools answer a probability question with a single confident paragraph. Probability Lab treats that as malpractice. The final number is the end of a visible construction: it starts from historical base rates, is rebuilt from causally distinct scenarios that each survive an adversarial debate, and is then deliberately corrected for the failure modes that inflate naive forecasts — overlapping stories, elegant narratives, crowded optimism, and paths that cannot complete inside the time window.

Every run ends with an executive layer (the number, its uncertainty band, a direct answer, what would move it) and a construction layer (the correction waterfall, scenario contributions, full debate transcripts, factor crosswalks). Busy readers stop at the first; skeptical readers can audit the second.

Defended forecast

The headline probability with a verdict label — from “remote” to “highly likely”.

Uncertainty band

A Monte Carlo p10–p90 range propagated from the judges' stated intervals, not a decorative ±5%.

Why this number

Why it isn't higher, why it isn't lower, and which assumptions the forecast leans on hardest.

What would move it

Concrete swing factors and observable tripwires, with timing.

Method explainers

The six features, in depth

Each explainer covers one piece of the machinery: what it does, why it exists, and the forecasting science behind it. Read in order for the full method, or jump to the piece you want to audit.

01

The Question Gate & the Outside View

Why vague questions are rejected at the door, and how reference-class base rates anchor every forecast before a single scenario is written — including the coherence probe that asks the question twice, forwards and negated.
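
The coherence probe can be illustrated with a minimal sketch. It assumes the probe compares the probability given for the question with the probability given for its negation, which should sum to 1 for a coherent forecaster. The function names, the tolerance, and the normalization repair are hypothetical and are not taken from the product.

```python
def coherence_gap(p_forward: float, p_negated: float) -> float:
    """Distance of P(A) + P(not A) from 1; zero means the two answers agree."""
    return abs(p_forward + p_negated - 1.0)


def is_coherent(p_forward: float, p_negated: float, tolerance: float = 0.05) -> bool:
    """Flag the pair as coherent when the gap is within an assumed tolerance."""
    return coherence_gap(p_forward, p_negated) <= tolerance


def reconcile(p_forward: float, p_negated: float) -> float:
    """One simple repair (an assumption): rescale so the pair sums to 1."""
    total = p_forward + p_negated
    if total <= 0.0:
        raise ValueError("at least one probability must be positive")
    return p_forward / total
```

For example, answers of 0.70 for the question and 0.40 for its negation leave a gap of about 0.10, which this sketch would flag as incoherent at the assumed 0.05 tolerance.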

02

Factors & Scenario Construction

Decomposing the question into ranked causal drivers — friction included — then building causally distinct world states across the bull–bear spread, grouped into honest mechanism families, with the status-quo world enforced.

03

The Adversarial Tribunal

The core of the product: every scenario is argued by an Advocate, rebutted by a Skeptic, and ruled on by a calibrated Judge who follows a five-step checklist and states an 80% interval — not just a point.

04

Disciplined Aggregation & Correction Layers

From linear pooling to log-odds pooling with earned extremizing, then eight named correction layers — overlap, saturation, conjunction stacks, time-window friction and more — each visible in the waterfall.
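
The difference between linear pooling and log-odds pooling with extremizing can be sketched as follows. This is a generic textbook formulation, not the product's implementation: the weights, the clamping constant, and the `extremize` factor are hypothetical parameters, and this page does not specify the conditions under which extremizing is "earned".

```python
import math

EPS = 1e-6  # assumed clamp so probabilities of exactly 0 or 1 stay finite


def logit(p: float) -> float:
    p = min(1.0 - EPS, max(EPS, p))
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear_pool(probs, weights=None):
    """Weighted arithmetic mean of the probabilities."""
    weights = weights or [1.0] * len(probs)
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)


def log_odds_pool(probs, weights=None, extremize=1.0):
    """Weighted mean in log-odds space, scaled by an extremizing factor.

    extremize = 1.0 leaves the pooled value unchanged; values above 1.0
    push it away from 0.5.
    """
    weights = weights or [1.0] * len(probs)
    mean_logit = sum(w * logit(p) for w, p in zip(weights, probs)) / sum(weights)
    return inv_logit(extremize * mean_logit)
```

With two estimates of 0.8, `log_odds_pool([0.8, 0.8])` stays at 0.8, while an extremizing factor of 1.5 moves the result further from 0.5.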

05

Triangulation & the Red-Team Audit

Three independent lenses — outside view, holistic estimate, scenario machinery — blended in log-odds space, then audited by a final adversarial agent hunting for named biases, with a bounded ±4-point mandate.
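
A minimal sketch of the blend and the bounded audit follows. It assumes the three lenses are combined as a weighted mean in log-odds space and that the "±4-point mandate" means the red-team agent may shift the result by at most four percentage points. Both readings, the equal default weights, and all function names are assumptions rather than the product's documented behavior.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def blend_lenses(outside_view, holistic, machinery, weights=(1.0, 1.0, 1.0)):
    """Weighted mean of the three lens probabilities in log-odds space."""
    lenses = (outside_view, holistic, machinery)
    blended = sum(w * logit(p) for w, p in zip(weights, lenses)) / sum(weights)
    return inv_logit(blended)


def apply_red_team(p, proposed_shift_points, cap_points=4.0):
    """Apply the auditor's proposed shift, clipped to +/- cap_points percentage points."""
    shift = max(-cap_points, min(cap_points, proposed_shift_points))
    return min(1.0, max(0.0, p + shift / 100.0))
```

Under these assumptions, a red-team proposal to raise a 50% forecast by 10 points would be clipped to a 4-point move, giving 54%.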

06

Uncertainty, Sensitivity & Calibration

The forecast as a distribution: Monte Carlo propagation of judge intervals, leave-one-out leverage on every scenario, run-integrity trust states, and a running Brier score as your questions resolve.
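
Two of these ideas can be sketched in a few lines. The Monte Carlo part assumes each judge's 80% interval is the p10 to p90 range of a normal distribution in log-odds space and that draws are pooled by a simple mean of log-odds; the product's actual sampling and pooling rules are not described on this page. The Brier score is the standard mean squared error between forecasts and 0/1 outcomes. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.90)  # about 1.2816; p10 and p90 sit at -/+ Z90 sigma


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample_logit(lo: float, hi: float, rng: random.Random) -> float:
    """Draw one log-odds value, treating [lo, hi] as a p10-p90 interval."""
    mu = (logit(lo) + logit(hi)) / 2.0
    sigma = (logit(hi) - logit(lo)) / (2.0 * Z90)
    return rng.gauss(mu, sigma)


def uncertainty_band(intervals, n=10_000, seed=0):
    """Propagate judge intervals to a p10-p90 band on the pooled forecast."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        logits = [sample_logit(lo, hi, rng) for lo, hi in intervals]
        draws.append(inv_logit(sum(logits) / len(logits)))
    draws.sort()
    return draws[int(0.10 * n)], draws[int(0.90 * n)]


def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and resolved 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
```

As a usage example, `uncertainty_band([(0.3, 0.5), (0.4, 0.6)])` returns a band narrower than either input interval because the pooled mean averages out independent judge noise, and `brier_score([0.8, 0.3], [1, 0])` evaluates to 0.065.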
