
Robust Reasoning Processes

A small textbook for a new field — reasoning processes that deliver trustworthy conclusions at known cost and resist corruption.

Robust reasoning processes (RRP) is the study and engineering of procedures that turn effort and evidence into trustworthy conclusions — and keep doing so when interested parties try to corrupt them. Its objects are processes, not people or models: peer review, prediction markets, audits, expert panels, LLM judge pipelines, debate protocols. Its central activity is measurement: what does a process deliver, what does it cost, and what does it cost to corrupt? Civilization runs on reasoning processes that were never benchmarked; AI has just made them studyable — an LLM-based process can be re-run, perturbed, versioned, and attacked on purpose — and made the problem urgent, because the cost of producing plausible reasoning has collapsed while the cost of verifying it has not.

Every process gets two headline numbers: the cost of extracting validated information from it, and the cost an adversary pays to bend its output. Plotting the field’s Process Catalogue on that plane is our version of a periodic table — and the most important feature is the quadrant that is still empty.

[Figure: the Process Catalogue plotted on two axes, cost per validated bit (cheap to expensive) and cost to corrupt (easy to hard). The quadrant that is cheap to run and expensive to corrupt is marked "the empty quadrant": nothing lives here yet. Plotted processes: Formal proof, Full replication, Forecaster track records, Real-money market (Kalshi-style), Output-metered market, Tournament (Metaculus-style), Play-money market, Courts, Expert panel, Financial audit, Peer review, Composite indices, Online reviews, Debate protocol, Consistency battery, Multi-model LLM panel, Single LLM judge. Legend: grounded in resolution / track records; institutional judgment; AI-native.]
Positions are version-0 qualitative judgments from the Process Catalogue, drawn to be argued with. Replacing them with measured coordinates is one of the field's central projects.
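
As a rough illustration of what measured coordinates would make possible, here is a minimal sketch of recording the two headline numbers for a process and testing whether it falls in the empty quadrant. Everything in it is an assumption made for illustration: the names `ProcessPoint`, `quadrant`, and `in_empty_quadrant`, the numeric units, the cut-off values, and the two placeholder points, none of which come from the Process Catalogue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessPoint:
    """One process on the two-axis plane. Units are hypothetical."""
    name: str
    cost_per_validated_bit: float  # lower means cheaper to run
    cost_to_corrupt: float         # higher means harder to corrupt


def quadrant(p: ProcessPoint, run_cut: float, corrupt_cut: float) -> str:
    """Label the quadrant a point falls in, given two assumed cut-offs."""
    run = "cheap" if p.cost_per_validated_bit < run_cut else "expensive"
    corrupt = "hard" if p.cost_to_corrupt >= corrupt_cut else "easy"
    return f"{run} to run, {corrupt} to corrupt"


def in_empty_quadrant(p: ProcessPoint, run_cut: float, corrupt_cut: float) -> bool:
    """True when a point is both cheap to run and hard to corrupt."""
    return quadrant(p, run_cut, corrupt_cut) == "cheap to run, hard to corrupt"


# Placeholder points with made-up coordinates, for illustration only.
points = [
    ProcessPoint("process_a", cost_per_validated_bit=1.0, cost_to_corrupt=2.0),
    ProcessPoint("process_b", cost_per_validated_bit=9.0, cost_to_corrupt=8.0),
]
labels = {p.name: quadrant(p, run_cut=5.0, corrupt_cut=5.0) for p in points}
occupied = any(in_empty_quadrant(p, 5.0, 5.0) for p in points)
```

With these placeholder values neither point lands in the target quadrant, so `occupied` is `False`. Where to put the cut-offs is itself a modelling choice that measured data would have to inform.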
  • The Judge — the consumer of reasoning, who cannot check the work object-level. What they need: a utility function over information, a way to price information against it, and the process-conditional weights that say how much each source should move them.
  • The Processes — the procedures themselves: their anatomy, what their verdicts ground out in, their measured properties, and the attack-and-hardening cycle that makes them robust.
  • The Environment — the game the processes live inside: producers, resolvers, accreditors, adversaries; the identity and track-record infrastructure every incentive scheme silently assumes; and the market, legal, and cultural constraints.
| # | Chapter | Status |
|---|---|---|
| 1 | Robust Reasoning Processes | draft |
| 2 | Cruxes | draft |
| I | The Judge | |
| 3 | The Core Model | draft |
| 4 | Epistemic Impact Analysis | draft |
| 5 | Constructing Utility Functions | draft |
| 6 | What Can You Use from an Untrustworthy Source? | draft |
| | Interlude: The Funding Effect | draft |
| II | The Processes | |
| 7 | The Process Catalogue | draft (v0) |
| 8 | What Grounds an Oversight Protocol? | draft |
| 9 | Consistency Evaluations | draft |
| 10 | What Is a Strong Reasoner? | draft |
| III | Hardening | |
| 11 | Hardening: Overview | draft |
| 12 | Calibration | draft |
| 13 | Verifiability Asymmetry | draft |
| 14 | Independence & Decorrelation | draft |
| 15 | Invariance & Low Sensitivity | draft |
| 16 | Incentive-Compatibility | draft |
| 17 | Deterrence | draft |
| 18 | Certification and Gyms | planned |
| IV | The Environment | |
| 19 | Identity and Track-Record Infrastructure | planned |
| 20 | The Market for Truth | planned |
| 21 | Law, Liability, and Private Deliberation | planned |
| 22 | Culture, Consent, and Adoption | planned |
| V | Applications | |
| 23 | The Reliability Ladder | draft |
| 24 | Overseeing Automated Research | draft |
| 25 | LLM Epistemics in Production | draft |
| 26 | Open Problems | aggregated |

A Glossary & Notation page indexes the field’s terms and symbols.

Three norms, applied throughout:

  1. Formalisms are graded. Every equation carries an explicit grade — [exact] (correct by construction), [standard shape] (precedent in an adjacent rigorous field), or [heuristic] (a deliberately simple decomposition, not a law). A field about robust reasoning should grade its own reasoning first. See The Core Model.
  2. Failures are published. The deployment write-ups in LLM Epistemics in Production report false-positive rates, overconfidence, and a miscounted competition alongside the wins. Documented corruption and documented failure are the best data the field has.
  3. Positions are dated and versioned. Every page carries a status line; everything here is a working draft of a field that does not exist yet, stated so it can be argued with and eventually replaced by measurement.
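
The grading norm in item 1 lends itself to a mechanical check. The following is a minimal sketch, assuming equations are stored as simple records. The names `Grade`, `GradedEquation`, `render`, and `parse_grade` are hypothetical, and the sample equation is a placeholder identity, not one taken from the book.

```python
from dataclasses import dataclass
from enum import Enum


class Grade(Enum):
    """The three grades named in the norm above."""
    EXACT = "exact"                    # correct by construction
    STANDARD_SHAPE = "standard shape"  # precedent in an adjacent rigorous field
    HEURISTIC = "heuristic"            # deliberately simple decomposition, not a law


@dataclass(frozen=True)
class GradedEquation:
    latex: str
    grade: Grade  # required field, so an equation cannot be created ungraded


def render(eq: GradedEquation) -> str:
    """Attach the bracketed grade tag to the equation text."""
    return f"{eq.latex}  [{eq.grade.value}]"


def parse_grade(tag: str) -> Grade:
    """Turn a tag such as '[heuristic]' into a Grade; raises ValueError if unknown."""
    return Grade(tag.strip("[] ").lower())


# Placeholder equation for illustration only.
example = GradedEquation(latex="a + b = b + a", grade=Grade.EXACT)
rendered = render(example)
```

Making the grade a required field means an ungraded equation fails at construction time instead of slipping through review.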

This wiki is in an early, exploratory stage. Pages are working notes, not settled positions. It is part of the CAIRN project by QURI.