The Question Gate & the Outside View — Probability Lab

Stage I

The question gate

Probability Lab refuses to forecast questions that cannot resolve. “Who will win?” names no single outcome to attach a probability to; “What should we do?” is not a claim about the world. The gate enforces the shape that a forecastable question must have: one outcome, one threshold, one resolution date. Anything weaker is rejected, with a reasoned explanation and a suggested rewrite that preserves your intent.

This is not pedantry. Every downstream stage depends on the question being resolvable: the outside view needs a class of comparable historical cases, the tribunal judges need a YES/NO target to argue about, and the time-feasibility checks need a window to measure causal chains against. A vague question silently corrupts all three.

The three components the gate looks for: outcome, threshold, and resolution date. The resolution date matters twice: it defines when the question resolves, and it gives every later stage a time window to measure causal chains against.

Design principle

The gate behaves like expert guidance, not a blocker. A rejected question always comes back with a one-sentence diagnosis and a proposed rewrite — one click applies it.
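The gate's check can be sketched in a few lines. This is a minimal illustration, not Probability Lab's actual implementation: the `Question` and `GateResult` types, the field names, and the wording of the diagnosis are all hypothetical, and the sketch assumes the three components have already been extracted from the question text. Generating the suggested rewrite is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Question:
    """A candidate question with its (possibly missing) components.

    Hypothetical structure: the source does not describe how the
    components are extracted from free text.
    """
    text: str
    outcome: Optional[str] = None
    threshold: Optional[str] = None
    resolves_on: Optional[date] = None


@dataclass
class GateResult:
    accepted: bool
    missing: list[str]
    diagnosis: str  # one-sentence explanation returned to the user


def question_gate(q: Question) -> GateResult:
    """Accept a question only if all three components are present."""
    missing = []
    if not q.outcome:
        missing.append("outcome")
    if not q.threshold:
        missing.append("threshold")
    if q.resolves_on is None:
        missing.append("resolution date")

    if missing:
        diagnosis = "This question cannot resolve: it is missing " + ", ".join(missing) + "."
        return GateResult(False, missing, diagnosis)
    return GateResult(True, [], "This question is resolvable.")


vague = question_gate(Question("Who will win?"))
sharp = question_gate(
    Question(
        "Will candidate A receive more than 50% of the vote by 2026-12-31?",
        outcome="candidate A vote share",
        threshold="more than 50%",
        resolves_on=date(2026, 12, 31),
    )
)
print(vague.diagnosis)
print(sharp.accepted)
```

Returning a diagnosis alongside the accept/reject flag mirrors the design principle above: a rejection is always accompanied by an explanation rather than a bare failure.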

Stage II

The outside view: start from history, not the story

One of the most reliable findings in forecasting research, from Kahneman's planning-fallacy work to Tetlock's superforecasters, is that good forecasters start from the outside view: before reasoning about this case, ask how often cases of this kind have resolved YES. The inside view (the specific story) comes second, and adjusts from that anchor.

Probability Lab makes this mandatory. Before a single factor or scenario exists, the outside-view analyst identifies two to four reference classes the question belongs to, states each class's historical base rate and what that rate rests on, and grades how well the class actually fits. Weak fit earns a wide, humble prior — not a confident one. The classes combine into a single outside-view prior that the rest of the run must argue against.

A worked example. Each class contributes its historical YES frequency; classes with weaker fit are down-weighted. The combined prior is the number the scenario machinery will later have to justify departing from.
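The combination step can be illustrated with a small sketch. It assumes the simplest reading of "applicability-weighted": a weighted average of base rates in probability space, with each class's fit grade as its weight. The source does not say which averaging rule Probability Lab uses, and the class names, base rates, and fit grades below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReferenceClass:
    name: str
    base_rate: float  # historical YES frequency, between 0 and 1
    fit: float        # applicability grade between 0 and 1 (hypothetical scale)


def outside_view_prior(classes: list[ReferenceClass]) -> float:
    """Fit-weighted average of class base rates (assumed combination rule)."""
    total_fit = sum(c.fit for c in classes)
    if total_fit <= 0:
        raise ValueError("at least one reference class needs a positive fit")
    return sum(c.base_rate * c.fit for c in classes) / total_fit


# Illustrative numbers only; not taken from any real run.
classes = [
    ReferenceClass("close analogue", base_rate=0.30, fit=0.8),
    ReferenceClass("broader category", base_rate=0.10, fit=0.4),
    ReferenceClass("loose analogy", base_rate=0.50, fit=0.2),
]
prior = outside_view_prior(classes)
print(round(prior, 3))
```

With these made-up inputs the prior comes out at about 0.27: the well-fitting class dominates, while the loosely fitting 50% class pulls the result up only slightly.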

The prior is not decoration. In the synthesis stage it is blended into the final forecast in log-odds space, with a weight that grows — up to 45% — when the inside view is thin or its scenarios disagree widely. A forecast built on three scenarios leans harder on history than one built on fifty. The blend appears as its own labelled step in the construction waterfall, so you can always see exactly how much of the final number is history and how much is story.
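A log-odds blend of this kind can be sketched as follows. The 45% cap and the use of log-odds space come from the description above; the `prior_weight` rule, which turns scenario count and disagreement into a weight, is a hypothetical stand-in, since the source does not give the actual formula.

```python
import math

MAX_PRIOR_WEIGHT = 0.45  # cap stated in the text


def logit(p: float) -> float:
    """Probability to log-odds, clamped away from 0 and 1 for safety."""
    p = min(max(p, 1e-9), 1 - 1e-9)
    return math.log(p / (1 - p))


def sigmoid(x: float) -> float:
    """Log-odds back to probability."""
    return 1 / (1 + math.exp(-x))


def prior_weight(n_scenarios: int, spread: float) -> float:
    """Hypothetical rule: weight grows when scenarios are few or disagree.

    n_scenarios: number of inside-view scenarios.
    spread: disagreement among scenario probabilities, scaled to 0..1.
    """
    thinness = 1 / math.sqrt(max(n_scenarios, 1))
    disagreement = min(max(spread, 0.0), 1.0)
    return MAX_PRIOR_WEIGHT * min(1.0, max(thinness, disagreement))


def blend(inside: float, prior: float, weight: float) -> float:
    """Weighted average of inside view and prior in log-odds space."""
    return sigmoid((1 - weight) * logit(inside) + weight * logit(prior))


w_thin = prior_weight(n_scenarios=3, spread=0.1)
w_rich = prior_weight(n_scenarios=50, spread=0.1)
print(w_thin > w_rich)                      # thin inside view leans harder on history
print(blend(inside=0.70, prior=0.27, weight=w_thin))
```

Blending in log-odds rather than raw probability keeps the result inside (0, 1) and treats a move from 0.90 to 0.99 as a larger shift than a move from 0.50 to 0.59.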

Built-in self-test

The coherence probe

Language models — like people — give different probabilities depending on how a question is framed. Probability Lab exploits this failure mode as a diagnostic. During the outside-view stage, the analyst produces two independent holistic estimates: P(YES), asked directly, and P(NO), reasoned freshly from the negated framing. The two are never forced to sum to one.

The probe in action. A small gap is a good calibration sign and is reported as such; a material gap means the model's judgment is frame-sensitive on this question, and the final uncertainty band is widened accordingly.
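The probe's arithmetic can be sketched like this. Because the two estimates are not forced to sum to one, the gap is taken here as the distance of P(YES) + P(NO) from one. The materiality threshold and the widening rule are assumptions made for illustration; the source states only that a material gap widens the band.

```python
def coherence_gap(p_yes: float, p_no: float) -> float:
    """How far the two independently elicited estimates are from summing to one."""
    return abs(p_yes + p_no - 1.0)


def widen_band(low: float, high: float, gap: float, material: float = 0.05):
    """Widen the uncertainty band when the gap is material.

    Both the `material` threshold and the rule of adding half the gap
    to each side are hypothetical choices, not documented behaviour.
    """
    if gap <= material:
        return low, high
    extra = gap / 2
    return max(0.0, low - extra), min(1.0, high + extra)


small = coherence_gap(p_yes=0.62, p_no=0.40)   # frames roughly agree
large = coherence_gap(p_yes=0.60, p_no=0.50)   # frame-sensitive judgment
print(widen_band(0.50, 0.70, small))
print(widen_band(0.50, 0.70, large))
```

In the first call the band is returned unchanged; in the second it is widened on both sides, clipped to stay within [0, 1].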

Reference class

A set of historical cases the question plausibly belongs to, with a measurable YES frequency.

Base rate

How often members of that class resolved YES; the starting point from which all evidence must move the forecast.

Outside-view prior

The applicability-weighted combination of class base rates; blended into the final number in log-odds space.

Coherence gap

The disagreement between the direct and negated framings, that is, how far P(YES) + P(NO) departs from one; widens the band when material.