Open Problems

Every chapter maintains its own open-questions section; this page aggregates them. The five problems at the top are the ones we currently consider load-bearing for the whole field.

  1. Can trustworthy resolution be extended faster than adversaries learn to game it? Retrodiction, selection protocols, and consistency floors all expand the resolvable set; all are gameable. The field is buildable if and only if this race is winnable (Robust Reasoning Processes).
  2. Is “cost per validated bit” well-defined across output types? Probabilities, rulings, models, and rankings need a common currency before the catalogue’s comparisons become measurements (The Process Catalogue).
  3. How much does consistency constrain correctness, especially under optimization pressure? The observed consistency–accuracy correlation comes from models that were not optimizing against the metric, so it may not hold once they are (Consistency Evaluations).
  4. Does composition multiply corruption costs, or is a composed process only as hard to corrupt as its weakest link? The constructive bet of the whole catalogue (cheap filters with randomized escalation to robust processes) is an unmeasured conjecture (The Process Catalogue, What Grounds an Oversight Protocol?).
  5. How fast does the corruption cost curve fall as attacker capability rises? The corruption-capacity curve, per process, is among the most decision-relevant unmeasured quantities in AI oversight (The Core Model).
  • Cruxes — the strategic uncertainties: scaffolding vs. base models, demand, differential safety, lock-in.
  • Epistemic Impact Analysis — consumer-agent adequacy, divergence measures for rich beliefs, efficient profundity, adversarial information, valuing question discovery.
  • Constructing Utility Functions — aggregating structural disagreement, weight drift and its auditors, required elicitation precision under optimization pressure.
  • Untrustworthy Sources — whether the deception conjunction is complete, when reproduction fails to close the gap under correlated error, whether “deception affordance” is the right central object, whether form-level robustness generalizes, composed arguments, Goodhart dynamics once the map is public, and how much design-time advantage survives when the defender cannot credibly commit.
  • The Process Catalogue — the common currency problem, corruption-cost stability over time, intrinsic vs. infrastructure-artifact weaknesses.
  • What Grounds an Oversight Protocol? — head-to-head empirics across groundings, retrodiction’s contamination ceiling, collusion in peer prediction, who controls decomposition, cheap conservative routing out of the deception conjunction.
  • Consistency Evaluations — consistency-vs-correctness, the cost-per-bit portfolio of checks, attacker–defender equilibrium.
  • What Is a Strong Reasoner? — which properties are load-bearing, cross-domain trust transfer, track records across model versions.
  • The Reliability Ladder — whether the tiers are real, what tier 5’s minimal verification standard is, whether lower tiers can bootstrap upper ones.
  • Overseeing Automated Research — provable incentive guarantees, surprise-vs-validation credit splits, valuing exploratory work, minimal meta-loop separation.
  • LLM Epistemics in Production — net-positive false-positive rates, whether grounding predicts evaluator failure, fixing overconfidence at the system level.