The Funding Effect

The best-replicated empirical handle the field has on the corruption cost curve: decades of meta-science measuring how much an interested funder shifts a study’s conclusions. Read as a likelihood-ratio discount, it is a real, quantified instance of corruption pulling a process’s epistemic weight toward 1 and, at the limit, below it.

Why this is the field’s cleanest measured corruption

The Core Model claims a process’s epistemic weight is conditional on its incentive environment — that under interested pressure, the likelihood-ratio profile shifts, and a captured process can sit at or below 1 (anti-informative while looking authoritative). That claim is usually hard to measure. The funding effect is the exception: a large, replicated meta-scientific literature has measured exactly how much a known conflict of interest moves a study’s conclusions, across drugs, devices, nutrition, and tobacco. It is the empirical companion to the Untrustworthy Sources spectrum — a source whose honesty credence $\tau$ is known to be compromised — turned into numbers.

The numbers

All effect sizes below are favorable-conclusion (or favorable-result) measures for industry-funded versus independent work, quoted from the cited reviews:

Domain	Outcome	Metric	Estimate (95% CI)	Source
Biomedical (foundational synthesis)	Pro-industry conclusions	OR	3.60 (2.63–4.91)	Bekelman et al. 2003
Drug/device	Favorable results	RR	1.27 (1.17–1.37)	Lundh et al. 2017
Drug/device	Favorable conclusions	RR	1.34 (1.19–1.51)	Lundh et al. 2017
Drug/device	Result–conclusion concordance	RR	0.83 (0.70–0.98)	Lundh et al. 2017
Nutrition (pooled)	Favorable conclusion	RR	1.31 (0.99–1.72, n.s.)	Chartres et al. 2016
Nutrition articles (interventional)	Favorable conclusion	OR	7.6 (1.3–45.7)	Lesser et al. 2007
SSB systematic reviews	“No association” conclusion	Adjusted RR	5.16 (1.30–20.48)	Bes-Rastrollo et al. 2013
Passive-smoking reviews	“Not harmful” conclusion	OR	88 (16–477)	Barnes & Bero 1998

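The honest pattern is itself the finding. In rigorous, high-$n$ meta-epidemiological data the effect is modest and tightly estimated (RR about 1.3 for drug and device studies; the foundational pooled OR is about 3.6). In the single-topic nutrition and tobacco studies it is reported as dramatically larger (ratios from about 5 to 88), but with very wide confidence intervals and smaller samples, and the pooled nutrition estimate (RR 1.31) is not statistically significant. The two metrics are also not interchangeable: an OR above 1 always exceeds the corresponding RR, and the gap widens as favorable conclusions become more common. The result–conclusion concordance finding (RR 0.83) is the sharpest single number: funded studies’ conclusions agree less often with their own results, skewing more favorable than those results warrant. That is corruption acting on the codification stage, not the evidence.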
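The contrast between "tight" and "wide" estimates, and the gap between odds ratios and risk ratios, can be made concrete with a small sketch. The estimates are copied from the table above; the baseline rates passed to `or_to_rr` are hypothetical, since the table does not report how often independent studies reach favorable conclusions. The conversion uses the standard identity RR = OR / (1 - p0 + p0 * OR), where p0 is the favorable-conclusion rate in the comparison group.

```python
# (label, point estimate, lower 95% bound, upper 95% bound), copied from the table
ESTIMATES = [
    ("Bekelman 2003, OR", 3.60, 2.63, 4.91),
    ("Lundh 2017, RR favorable results", 1.27, 1.17, 1.37),
    ("Lundh 2017, RR favorable conclusions", 1.34, 1.19, 1.51),
    ("Chartres 2016, RR", 1.31, 0.99, 1.72),
    ("Lesser 2007, OR", 7.6, 1.3, 45.7),
    ("Bes-Rastrollo 2013, adj. RR", 5.16, 1.30, 20.48),
    ("Barnes & Bero 1998, OR", 88.0, 16.0, 477.0),
]


def ci_fold_width(lower, upper):
    """Ratio of upper to lower bound; values near 1 mean a tight interval."""
    return upper / lower


def or_to_rr(odds_ratio, baseline_rate):
    """Convert an odds ratio to a risk ratio for a given comparison-group rate."""
    return odds_ratio / (1.0 - baseline_rate + baseline_rate * odds_ratio)


for label, est, lower, upper in ESTIMATES:
    print(f"{label:40s} {est:6.2f}  interval spans {ci_fold_width(lower, upper):5.1f}x")

# Hypothetical baseline rates: the same OR implies very different risk ratios.
for p0 in (0.1, 0.3, 0.5):
    print(f"OR 3.60 at baseline rate {p0:.0%} -> RR {or_to_rr(3.60, p0):.2f}")
```

Under these assumed baselines, the pooled OR of 3.60 corresponds to a risk ratio well below 3.60, which is one reason the odds-ratio rows and the risk-ratio rows should not be compared at face value.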

From odds ratios to a likelihood-ratio discount

Recast this in the Core Model’s terms. A study’s favorable conclusion is evidence for the underlying hypothesis $H$ with likelihood ratio $\text{LR} = P(\text{favorable} \mid H)/P(\text{favorable} \mid \neg H)$. If funding makes a favorable conclusion roughly $k$ times more likely regardless of whether $H$ is true (the funding effect operating as a multiplier on the report, not the world), then the funded study’s favorable conclusion carries an attenuated $\text{LR}' < \text{LR}$, with the gap set by $k$. [heuristic]
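One way to make the heuristic concrete is sketched below, under assumptions the source does not state. A multiplier applied identically to both $P(\text{favorable} \mid H)$ and $P(\text{favorable} \mid \neg H)$ would cancel in the ratio, so the sketch instead treats capture as a probability $c$ that the study concludes favorably regardless of its evidence, and otherwise behaves like an independent study. The symbols $c$, $p_1 = P(\text{favorable} \mid H)$, $p_0 = P(\text{favorable} \mid \neg H)$, $k_1$, and $k_0$ are introduced here for illustration only.

$$
\begin{aligned}
\text{LR} &= \frac{p_1}{p_0}, \qquad
\text{LR}' = \frac{c + (1-c)\,p_1}{c + (1-c)\,p_0}, \\
\text{LR} - \text{LR}' &= \frac{c\,(p_1 - p_0)}{p_0\,\bigl(c + (1-c)\,p_0\bigr)} \;\ge\; 0 \quad \text{when } p_1 \ge p_0, \\
\frac{\text{LR}'}{\text{LR}} &= \frac{k_1}{k_0}, \qquad
k_1 = \frac{c + (1-c)\,p_1}{p_1}, \quad
k_0 = \frac{c + (1-c)\,p_0}{p_0}.
\end{aligned}
$$

In this reading the discount comes from capture inflating favorable conclusions proportionally more when $H$ is false ($k_0 > k_1$). At $c = 0$ the funded and independent ratios coincide, and as $c \to 1$ the funded ratio tends to 1.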

Three consequences worth stating precisely:

The weight is discounted, not destroyed. The pharma-scale numbers ($k \approx 1.3$) imply a funded favorable result is worth a substantial but partial fraction of an independent one; the nutrition/tobacco numbers push $\text{LR}' \to 1$, the point at which the result is uninformative.
It can flip sign. In the tobacco limit, a glowing industry-funded result is negative evidence for the favorable claim — $\text{LR}' < 1$ — because the conclusion is better explained by capture than by truth. This is the Core Model’s “captured process sits below 1,” observed. “Information loss” undersells it: the update can reverse. The manufactured-doubt histories (Oreskes & Conway 2010; Proctor 2011) document this limiting case as deliberate engineering.
It is recoverable. The discount is mediated by the other robustness features: preregistration, open data, and independent replication claw most of the weight back. The funding signal’s cost is therefore not intrinsic to the source but a function of how much tamper-evidence and deterrence the process carries — which is the constructive reading.
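The three consequences can be illustrated numerically with a toy model. Everything here is an assumption made for illustration: capture is modeled as a probability that the funder forces a favorable conclusion regardless of the evidence, that probability is allowed to differ depending on whether the hypothesis is true, and safeguards are assumed to act by cutting the capture probability. None of the numbers are estimates from the cited literature.

```python
def funded_lr(p_h, p_not_h, capture_h, capture_not_h):
    """Likelihood ratio of a favorable conclusion from a funded study.

    p_h, p_not_h: chance an independent study concludes favorably when the
        hypothesis is true / false.
    capture_h, capture_not_h: hypothetical chance that the conclusion is forced
        to be favorable regardless of evidence, when H is true / false.
    """
    fav_given_h = capture_h + (1.0 - capture_h) * p_h
    fav_given_not_h = capture_not_h + (1.0 - capture_not_h) * p_not_h
    return fav_given_h / fav_given_not_h


P_H, P_NOT_H = 0.8, 0.2            # illustrative values; independent LR is 4
independent = P_H / P_NOT_H

# 1. Discounted, not destroyed: light capture that ignores the truth.
mild = funded_lr(P_H, P_NOT_H, 0.1, 0.1)

# 2. Sign flip: capture is concentrated where the truth is unfavorable.
flipped = funded_lr(P_H, P_NOT_H, 0.0, 0.9)

# 3. Recoverable: safeguards are assumed to remove 80% of the mild capture.
recovered = funded_lr(P_H, P_NOT_H, 0.1 * 0.2, 0.1 * 0.2)

print(f"independent LR: {independent:.2f}")
print(f"mild capture:   {mild:.2f}")
print(f"asymmetric:     {flipped:.2f}")
print(f"with safeguards:{recovered:.2f}")
```

Note that in this sketch the ratio only drops below 1 when capture is more likely under $\neg H$ than under $H$; capture that is indifferent to the truth attenuates the ratio toward 1 but never reverses it.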

The mechanisms behind the multiplier

The effect is not mysterious sender dishonesty; it is a set of nameable, auditable channels (Sismondo 2008): choice of comparator and dose, selective outcome reporting, publication bias, and ghost-management of the writing itself. Internal documents indicate that 18–40% of articles on some drugs were managed by industry-hired medical-communications firms (Sismondo 2007). This matters for the field because each channel is a distinct point of attack and audit: the corruption enters at design, analysis, reporting, and authorship as separable links. That is exactly the kind of decomposition the loss pipeline anticipates, here with the bias term ($L_{\text{eval}}$ acquiring an optimizer-chosen component) made empirical.
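To see how a single channel could generate a ratio of the drug/device magnitude, here is a minimal sketch of publication bias in isolation. The true favorable rate and the publication probabilities are hypothetical, and the sketch does not claim that publication bias is the dominant channel in practice; it only shows that the channel is separable and quantifiable.

```python
def published_favorable_share(true_favorable_rate, publish_prob_unfavorable,
                              publish_prob_favorable=1.0):
    """Share of published studies that are favorable under selective publication."""
    favorable = true_favorable_rate * publish_prob_favorable
    unfavorable = (1.0 - true_favorable_rate) * publish_prob_unfavorable
    return favorable / (favorable + unfavorable)


TRUE_RATE = 0.5  # hypothetical: half of all conducted studies favor the product

# Independent work is assumed to publish every result.
independent_share = published_favorable_share(TRUE_RATE, 1.0)

for q in (1.0, 0.75, 0.54, 0.25):
    share = published_favorable_share(TRUE_RATE, q)
    ratio = share / independent_share
    print(f"unfavorable results published with prob {q:.2f}: "
          f"favorable share {share:.3f}, ratio vs independent {ratio:.2f}")
```

With these assumed inputs, suppressing a little under half of the unfavorable results is enough to produce a favorable-conclusion ratio near 1.3 without any change to the underlying evidence.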

It is also why design-hierarchy rating systems are insufficient on their own. GRADE and similar systems rate evidence against bias and noise, but industry sponsorship skews conclusions regardless of where a study sits in the design hierarchy. These systems do not model a strategic funder choosing which studies to run and publish, the open problem flagged in Untrustworthy Sources.

Toward a measured version

This case study is also a template for the incentive audit the field needs to standardize. The fundable project: estimate the likelihood-ratio discount $\text{LR}'/\text{LR}$ as a function of conflict-of-interest type, disclosure regime, and the presence of preregistration/open-data/replication — turning one qualitative row of the Catalogue’s corruption column into measured numbers with intervals. The existing meta-science supplies the corpus and the outcome coding; what it lacks is the likelihood-ratio framing and the recovery-factor estimates.
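A first step toward "measured numbers with intervals" might look like the following sketch, which computes a favorable-conclusion risk ratio and a 95% interval per stratum from 2×2 counts. The interval uses the standard log-scale standard error for a risk ratio. The strata and counts are hypothetical, and the output is the funding risk ratio per stratum, not $\text{LR}'/\text{LR}$ itself; mapping one to the other would need an explicit model of how capture depends on the truth.

```python
import math


def risk_ratio_ci(fav_funded, n_funded, fav_indep, n_indep, z=1.96):
    """Risk ratio of favorable conclusions with a log-scale Wald interval."""
    rr = (fav_funded / n_funded) / (fav_indep / n_indep)
    se_log = math.sqrt(1 / fav_funded - 1 / n_funded
                       + 1 / fav_indep - 1 / n_indep)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical strata:
# (label, favorable funded, total funded, favorable independent, total independent)
STRATA = [
    ("no safeguards", 80, 100, 60, 100),
    ("preregistered + open data", 66, 100, 60, 100),
]

for label, a, n1, c, n2 in STRATA:
    rr, lower, upper = risk_ratio_ci(a, n1, c, n2)
    print(f"{label:28s} RR {rr:.2f} ({lower:.2f} to {upper:.2f})")
```

Comparing the per-stratum ratios is one candidate for a recovery-factor estimate, provided the strata are otherwise comparable.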

Open questions

What is the actual functional form of $\text{LR}'/\text{LR}$ versus conflict type, and how much of the discount is clawed back per unit of preregistration, open data, or independent replication?
The drug-data effect is tight (~1.3×) while nutrition/tobacco estimates are large but wide — is the difference real (field capturability) or an artifact of sample size and outcome-coding latitude?
Can an LLM judge replicate human coders’ favorable/unfavorable classification reliably enough to scale this measurement across a large corpus — and would it inherit the same framing biases it is meant to detect?
Does the result–conclusion concordance gap (RR 0.83) generalize as the cleanest, most automatable corruption signal, since it needs no external ground truth — only internal consistency between a study’s data and its stated conclusion?
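The concordance signal in the last question is simple enough to sketch, assuming each study has already been coded (by humans or otherwise) for whether its results and its conclusion are favorable. The `CodedStudy` record and the toy corpus are hypothetical; the sketch only shows that the measure needs no ground truth about the underlying hypothesis.

```python
from dataclasses import dataclass


@dataclass
class CodedStudy:
    industry_funded: bool
    result_favorable: bool       # coded from the reported data
    conclusion_favorable: bool   # coded from the stated conclusion


def concordance_rate(studies):
    """Fraction of studies whose conclusion matches their own results."""
    agree = sum(s.result_favorable == s.conclusion_favorable for s in studies)
    return agree / len(studies)


def concordance_rr(studies):
    """Concordance in funded studies relative to independent studies."""
    funded = [s for s in studies if s.industry_funded]
    independent = [s for s in studies if not s.industry_funded]
    return concordance_rate(funded) / concordance_rate(independent)


# Hypothetical corpus: 10 funded and 10 independent studies.
corpus = (
    [CodedStudy(True, True, True)] * 6
    + [CodedStudy(True, False, False)] * 2
    + [CodedStudy(True, False, True)] * 2    # conclusion outruns the result
    + [CodedStudy(False, True, True)] * 5
    + [CodedStudy(False, False, False)] * 4
    + [CodedStudy(False, False, True)] * 1
)

print(f"concordance RR (funded vs independent): {concordance_rr(corpus):.2f}")
```

A ratio below 1 indicates that funded studies' conclusions depart from their own results more often than independent studies' conclusions do.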