Hardening: Deterrence

Family 6 of the Hardening overview. The other families mostly raise corruption cost by spending on verification, which costs information every run. Deterrence raises it off-equilibrium (bonds, clawbacks, tamper-evidence) at near-zero honest-path cost, which is why it is the only route into the cheap-and-incorruptible quadrant. This page covers the lever, its limit, the constructions with their cheapest attacks, and two worked bounds with numbers.

Make corruption expensive only when it happens. A bond posted and forfeited on detection, a clawback at resolution, an append-only log that makes tampering discoverable — each costs the honest participant almost nothing yet imposes a large expected cost on an adversary. Following the classical detection × penalty logic (Becker 1968), deterrence is the unique family that raises the corruption axis without raising the information axis, because the penalty is paid off the equilibrium path. This is why the empty quadrant is reachable through deterrence and not through verification.

Deterrence needs two things the field doesn’t yet have for free: enforceable identity and eventually-real detection. Without identity infrastructure the maximum bond $B$ is effectively zero (an adversary re-keys after forfeiture), and the inspection-game ceiling collapses to the verification regime. Detection must also actually arrive: a tamper-evident log deters nothing if no one ever audits, and a clawback deters nothing on claims that never resolve. And deterrence fails entirely against an adversary who does not internalize the penalty: judgment-proof, one-shot, or indifferent attackers. It is the most powerful family where its preconditions hold and inert where they don’t.

| Construction | Bound / estimate | Defends against | Cheapest attack (≈ cost) | Maturity · source |
| --- | --- | --- | --- | --- |
| Randomized deep-audit lottery | safe stake $S^* = (c+pB)/(1-p)$; $\approx \$52$ in the worked example below [standard shape] | cheap-process corruption up to $S^*$ | keep each attack’s stake just below $S^*$ (bounded by $S^*$) | prototyped · (inspection games) |
| Merkle-logged reasoning traces | post-hoc tamper detection w.p. $\approx 1$ [exact] | silent after-the-fact tampering | be dishonest in the original trace; or rely on no one auditing (≈ free if unaudited) | deployed · RFC 6962 |
| Clawback-with-interest | penalty multiplier on false resolutions [standard shape] | claims resolving just past the horizon | be judgment-proof: take the credit and exit before clawback (≈ cost of an exit) | prototyped · Becker 1968 |
| Provenance-bonded sources | source stake slashed if later shown corrupt [heuristic] | citing sources later revealed biased | stay biased but below the proof threshold (low, via the detection gap) | speculative · (funding effect) |
| Reputation bankroll with ruin | identity-bound standing funds high-stakes claims [heuristic] | sets the enforceable penalty $B$ | Sybil: a fresh identity after ruin (≈ cost of an identity, cheap without infra) | speculative · (identity infra) |
| Confidence-scaled bonds | bond $\propto$ confidence, forfeited on error [heuristic] | overconfidence as a cheap strategy | under-report confidence to dodge the bond (≈ free, but loses persuasive force) | speculative · none |
| Revision rate-limiting | manipulation needs a slow, visible trail [heuristic] | instant reversal as a cheap-attack signature | pre-position the manipulation slowly within the limit (≈ patience) | speculative · none |
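To make the Merkle-logged row concrete, here is a minimal sketch of tamper detection over a reasoning trace. It follows the Merkle Tree Hash shape from RFC 6962 (SHA-256, a `0x00` prefix for leaves, a `0x01` prefix for interior nodes, split at the largest power of two below the entry count). The trace contents and the function names are hypothetical, and a real log would also need signed roots and inclusion and consistency proofs, which are omitted here.

```python
import hashlib
from typing import Sequence


def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 leaf hash: SHA-256 over a 0x00 prefix plus the entry.
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # RFC 6962 interior node hash: SHA-256 over a 0x01 prefix plus both children.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(entries: Sequence[bytes]) -> bytes:
    """Merkle Tree Hash of an ordered list of log entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    # k is the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


if __name__ == "__main__":
    # Hypothetical reasoning trace, one entry per step.
    trace = [b"step 1: read source A", b"step 2: compare with B", b"step 3: conclude X"]
    committed_root = merkle_root(trace)  # published at the time of the run

    # Later, someone silently rewrites step 3.
    tampered = list(trace)
    tampered[2] = b"step 3: conclude Y"

    print("untouched trace verifies:", merkle_root(trace) == committed_root)
    print("tampered trace verifies: ", merkle_root(tampered) == committed_root)
```

The sketch also shows where the cheapest attacks in the table come from: a trace that was dishonest when first committed hashes just as cleanly as an honest one, and the comparison only deters if someone actually recomputes the root.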

The inspection-game frontier. Wrap a cheap process (corruption cost $c$) in an audit that escalates to an expensive robust process with probability $p$, requiring an enforceable bond $B$ forfeited on detection. An adversary’s expected payoff from attacking a stake $S$, namely $(1-p)S - pB - c$, is negative iff

$$S < S^* = \frac{c + pB}{1-p}. \qquad \textbf{[standard shape]}$$

Plug in a \$0.01 LLM-judge call, a 5% audit rate, and a \$1,000 bond: $S^* = (0.01 + 0.05 \cdot 1000)/0.95 \approx \$52$ of safe stake bought by a one-in-twenty audit. Strip the bond ($B = 0$) and the *same* \$52 safe stake would require an audit rate of $p \approx 99.98\%$, which is effectively full verification. Robustness is bounded by the maximum enforceable penalty, not by verification spend. That is the formal reason identity/escrow infrastructure is the field’s most leveraged missing institution, and the cheapest attack is simply to keep each stake below $S^*$.
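The arithmetic above can be checked with a short sketch. The function names are illustrative only; the formulas are the payoff and safe-stake expressions from this section, plus the safe-stake equation solved for $p$, which gives $p = (S - c)/(S + B)$.

```python
def attacker_payoff(S: float, c: float, p: float, B: float) -> float:
    """Expected payoff of attacking a stake S: (1 - p) * S - p * B - c."""
    return (1.0 - p) * S - p * B - c


def safe_stake(c: float, p: float, B: float) -> float:
    """Stake S* below which the attacker's expected payoff is negative."""
    if not 0.0 <= p < 1.0:
        raise ValueError("audit probability p must be in [0, 1)")
    return (c + p * B) / (1.0 - p)


def required_audit_rate(S: float, c: float, B: float) -> float:
    """Audit probability that makes S exactly the safe stake.

    Solving S = (c + p * B) / (1 - p) for p gives p = (S - c) / (S + B).
    """
    if S <= c:
        return 0.0  # corruption already costs more than the stake is worth
    return (S - c) / (S + B)


if __name__ == "__main__":
    c, p, B = 0.01, 0.05, 1000.0  # judge-call cost, audit rate, bond
    s_star = safe_stake(c, p, B)
    print(f"safe stake with bond:      ${s_star:.2f}")
    print(f"attacker payoff at S*:     {attacker_payoff(s_star, c, p, B):.6f}")

    # Same safe stake with no enforceable bond (B = 0).
    p_no_bond = required_audit_rate(s_star, c, B=0.0)
    print(f"audit rate needed, B = 0:  {p_no_bond:.2%}")
```

Run as written, the safe stake evaluates to roughly 52.64 (which the text rounds down to \$52) and the no-bond audit rate to roughly 99.98%. Because $B$ sits in the denominator of the required audit rate, each extra dollar of enforceable bond reduces the auditing needed to protect the same stake.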

Provenance bonds and the funding-effect discount. Make every cited source post a stake, slashed if it is later shown corrupt. This prices the funding-effect likelihood-ratio discount directly into the mechanism: a source whose favorable conclusions are $k$ times more likely regardless of truth carries an attenuated likelihood ratio, and the bond makes that expected attenuation a cost the source internalizes ex ante. It is the deterrence-side counterpart to the identity-masking gap that measures the same bias. Its cheapest defeat is the detection gap: a source biased but never proven corrupt keeps its bond.
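The detection gap can be illustrated with a toy detection × penalty calculation in the spirit of the Becker logic cited earlier. This is a sketch under loud assumptions, not the mechanism itself: `gain` (what the source earns from staying biased), `q_detect` (the probability the bias is ever proven) and `stake` are hypothetical parameters, and the source is assumed risk-neutral.

```python
import math


def expected_bias_payoff(gain: float, stake: float, q_detect: float) -> float:
    """Expected payoff of staying biased: the gain minus the expected slashing."""
    return gain - q_detect * stake


def min_deterring_stake(gain: float, q_detect: float) -> float:
    """Smallest stake at which bias stops paying (expected payoff <= 0)."""
    if q_detect <= 0.0:
        return math.inf  # never proven corrupt: no finite stake deters
    return gain / q_detect


if __name__ == "__main__":
    gain = 100.0  # hypothetical benefit of biased reporting
    for q in (0.5, 0.05, 0.0):
        stake = min_deterring_stake(gain, q)
        print(f"detection prob {q:>4}: stake needed to deter = {stake}")
```

Under these assumptions the required stake grows as $1/q$, so a source that keeps its bias just below the proof threshold pushes the deterring stake toward infinity while keeping the bond it posted.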

  • How much enforceable penalty $B$ is reachable without identity infrastructure, and how much does that infrastructure raise $S^*$ across the Process Catalogue’s rows?
  • What is the minimal viable identity/escrow layer that makes bonds binding for AI producers?
  • How do you deter an adversary who is judgment-proof or playing a one-shot game — is there a deterrence analogue that doesn’t rely on a repeated relationship?