Hardening: Deterrence
Family 6 of the Hardening overview. The other families mostly raise corruption cost by spending on verification, which costs information every run. Deterrence raises it off-equilibrium — bonds, clawbacks, tamper-evidence — at near-zero honest-path cost, which is why it is the only route into the cheap-and-incorruptible quadrant. The lever, its limit, the constructions with their cheapest attacks, and two worked bounds with numbers.
The lever
Make corruption expensive only when it happens. A bond posted and forfeited on detection, a clawback at resolution, an append-only log that makes tampering discoverable: each costs the honest participant almost nothing yet imposes a large expected cost on an adversary. Following the classical detection × penalty logic (Becker 1968), deterrence is the unique family that raises the corruption axis without raising the information axis, because the penalty is paid off the equilibrium path. This is why the empty quadrant is reachable through deterrence and not through verification.
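The detection × penalty logic can be written as a one-line comparison. This is a sketch in assumed notation rather than anything fixed by the text at this point: $p$ stands for the probability that corruption is detected and $B$ for the penalty forfeited on detection.

```latex
\underbrace{\mathbb{E}[\text{penalty} \mid \text{honest}] \approx 0}_{\text{penalty never triggered}}
\qquad
\underbrace{\mathbb{E}[\text{penalty} \mid \text{corrupt}] = p \cdot B}_{\text{paid only off the equilibrium path}}
```

Verification, by contrast, charges its cost on every run whether or not anyone attempts corruption.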
The limit
Deterrence needs two things the field doesn’t yet have for free: enforceable identity and eventually-real detection. Without identity infrastructure the maximum bond is effectively zero (an adversary re-keys after forfeiture), and the inspection-game ceiling collapses to the verification regime. Detection must also actually arrive: a tamper-evident log deters nothing if no one ever audits, and a clawback deters nothing on claims that never resolve. And deterrence fails entirely against an adversary who does not internalize the penalty: judgment-proof, one-shot, or indifferent attackers. It is the most powerful family where its preconditions hold and inert where they don’t.
Constructions
| Construction | Bound / estimate | Defends against | Cheapest attack (≈ cost) | Maturity · source |
|---|---|---|---|---|
| Randomized deep-audit lottery | safe stake $S^* = (c + pB)/(1-p)$; $\approx \$52$ in the worked example below [standard shape] | cheap-process corruption up to stake $S^*$ | keep each attack’s stake just below $S^*$ (bounded by $B$) | prototyped · (inspection games) |
| Merkle-logged reasoning traces | post-hoc tamper detection w.p. $\approx 1$ on audit, up to hash collisions [exact] | silent after-the-fact tampering | be dishonest in the original trace; or rely on no one auditing (≈ free if unaudited) | deployed · RFC 6962 |
| Clawback-with-interest | penalty multiplier on false resolutions [standard shape] | claims resolving just past the horizon | be judgment-proof — take the credit and exit before clawback (≈ cost of an exit) | prototyped · Becker 1968 |
| Provenance-bonded sources | source stake slashed if later shown corrupt [heuristic] | citing sources later revealed biased | stay biased but below the proof threshold (low — detection gap) | speculative · (funding effect) |
| Reputation bankroll with ruin | identity-bound standing funds high-stakes claims [heuristic] | sets the enforceable penalty | Sybil — a fresh identity after ruin (≈ cost of an identity, cheap without infra) | speculative · (identity infra) |
| Confidence-scaled bonds | bond $\propto$ confidence, forfeited on error [heuristic] | overconfidence as a cheap strategy | under-report confidence to dodge the bond (≈ free, but loses persuasive force) | speculative · — |
| Revision rate-limiting | manipulation needs a slow, visible trail [heuristic] | instant reversal as a cheap-attack signature | pre-position the manipulation slowly within the limit (≈ patience) | speculative · — |
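The Merkle-logged row can be illustrated with a minimal sketch of the tree hash in the style of RFC 6962 (leaf prefix `0x00`, interior-node prefix `0x01`, split at the largest power of two below the entry count). The trace contents and the idea of publishing the root at commit time are illustrative assumptions, not a description of any deployed system.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    # Leaf hashes are domain-separated from node hashes with a 0x00 prefix.
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes use a 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(entries: list[bytes]) -> bytes:
    """Merkle tree hash over an ordered list of log entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    k = 1
    while k * 2 < n:  # largest power of two strictly less than n
        k *= 2
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


# Hypothetical reasoning trace; the root is published when the trace is written.
trace = [b"step 1: gather sources", b"step 2: weigh evidence", b"step 3: conclude"]
published_root = merkle_root(trace)

# Any later edit to an entry changes the recomputed root, so an audit catches it.
tampered = list(trace)
tampered[1] = b"step 2: ignore evidence"
print(merkle_root(tampered) == published_root)  # False
```

As the table notes, this only makes after-the-fact edits discoverable. It does nothing about a trace that was dishonest when first written, and it deters nothing if no one recomputes the root.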
Worked bounds
The inspection-game frontier. Wrap a cheap process (corruption cost $c$) in an audit that escalates to an expensive robust process with probability $p$, requiring an enforceable bond $B$ forfeited on detection. An adversary’s expected payoff from corrupting a claim with stake $S$ is negative iff
$$S < S^* = \frac{c + pB}{1 - p}.$$
Plug in a \$0.01 LLM-judge call, a 5% audit rate, and a \$1,000 bond: $S^* = (0.01 + 0.05 \cdot 1000)/0.95 \approx \$52$. Set the bond to zero ($B = 0$) and the *same* \$52 safe stake would require an audit rate of $p = 1 - c/S^* \approx 0.9998$, which is full verification. Robustness is bounded by the maximum enforceable penalty, not by verification spend. That is the formal reason identity/escrow infrastructure is the field’s most leveraged missing institution, and the cheapest attack is simply to keep each stake below $S^*$.
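The arithmetic of this bound can be checked with a short sketch. It assumes the adversary gains the stake when unaudited (probability $1-p$), forfeits the bond when audited, and always pays the corruption cost; that payoff model and the function names are assumptions chosen to match the numbers quoted above.

```python
def expected_payoff(stake: float, c: float, p: float, bond: float) -> float:
    """Adversary's expected payoff from one corruption attempt (assumed model)."""
    return (1 - p) * stake - p * bond - c


def safe_stake(c: float, p: float, bond: float) -> float:
    """Break-even stake S*: below it, corruption has negative expected payoff."""
    if not 0 <= p < 1:
        raise ValueError("audit rate p must satisfy 0 <= p < 1")
    return (c + p * bond) / (1 - p)


def required_audit_rate(stake: float, c: float, bond: float) -> float:
    """Audit rate at which `stake` is exactly the break-even stake."""
    # Solve stake = (c + p * bond) / (1 - p) for p.
    return (stake - c) / (stake + bond)


# Numbers from the worked example: $0.01 judge call, 5% audits, $1,000 bond.
s_star = safe_stake(c=0.01, p=0.05, bond=1000.0)
p_no_bond = required_audit_rate(s_star, c=0.01, bond=0.0)

print(f"S* = {s_star:.2f}")                          # S* = 52.64
print(f"audit rate with no bond = {p_no_bond:.4f}")  # audit rate with no bond = 0.9998
```

The exact break-even is about \$52.64, which the text rounds down to \$52. With the bond removed, holding that safe stake requires auditing essentially every run.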
Provenance bonds and the funding-effect discount. Make every cited source post a stake, slashed if it is later shown corrupt. This prices the funding-effect likelihood-ratio discount directly into the mechanism: a source whose favorable conclusions are some factor $k$ times more likely regardless of truth carries an attenuated likelihood ratio, and the bond makes that expected attenuation a cost the source internalizes ex ante. It is the deterrence-side counterpart to the identity-masking gap that measures the same bias. Its cheapest defeat is the detection gap: a source biased but never proven corrupt keeps its bond.
Open questions
- How much enforceable penalty is reachable without identity infrastructure, and how much does that infrastructure raise it across the Process Catalogue’s rows?
- What is the minimal viable identity/escrow layer that makes bonds binding for AI producers?
- How do you deter an adversary who is judgment-proof or playing a one-shot game — is there a deterrence analogue that doesn’t rely on a repeated relationship?