Reading List
The Autocoincidence Theorem: Every source, every canonical passage, and why each one matters.
Compiled April 21, 2026 | Elias Moosman | Companion to the formal treatment and the blog post
This is the reference document — sources grouped by logical affinity, canonical passages quoted in full, connections to the theorem explained. For deployment order (which source to use in which conversation), see The Conversion Sequence.
Proves Specific Claims
Landauer — Irreversibility and Heat Generation in the Computing Process
1961
IBM Journal of Research and Development, 5(3), 183-191. DOI: 10.1147/rd.53.0183
PROVES CLAIM 4: DIRECTIONAL ASYMMETRY
"We shall show that a computer which is logically irreversible must be physically irreversible, and must dissipate a minimum amount of energy per operation. [...] We have shown that whenever we make a change in our information we must dissipate at least kT ln 2 per bit of information. [...] The information bearing degrees of freedom of the computer interact with the thermal reservoir and the computation is accompanied by a minimum entropy generation of k ln 2 per logical step."
"Logically irreversible operations [...] must be accompanied by entropy increases in the non-information-bearing degrees of freedom of the information-processing system or its environment."
Why this matters for the theorem
The universe tracks what information systems pretend to erase. The kT ln 2 bound is the physical shadow of autocoincidence — the proof that the causal history of a bit's previous values is not destroyed but redistributed into thermal degrees of freedom the information layer cannot read. The directional asymmetry (Claim 4) follows: coarse-graining is free; fine-graining costs at least kT ln 2 per bit. The Second Law enforces the arrow.
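As a back-of-envelope anchor for the bound (standard constants; the 300 K operating temperature is an assumption, not a figure from Landauer's paper):

```python
# Hedged numerical sketch: the Landauer limit kT ln 2 at room temperature.
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K (exact SI value)
T = 300.0            # assumed room temperature, K

E_min = k_B * T * math.log(2)   # minimum dissipation per erased bit
print(f"Landauer limit at {T:.0f} K: {E_min:.3e} J per bit")
# ~2.871e-21 J, about 0.018 eV: tiny, but strictly nonzero, which is
# the whole point. Coarse-graining is free; erasure never is.
```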
Turing — On Computable Numbers, with an Application to the Entscheidungsproblem
1936
Proceedings of the London Mathematical Society, Series 2, 42(1), 230-265. DOI: 10.1112/plms/s2-42.1.230
PROVES SELF-REFERENCE OBSTRUCTION
"We are now in a position to show that the Entscheidungsproblem cannot be solved. [...] We can show further that there can be no machine E which, when supplied with the S.D. [standard description] of an arbitrary machine M, will determine whether M ever prints a given symbol (0 say)."
"If the machine [...] is supplied with its own D.N. [description number] [...] it will not print 0 if it is circular [non-halting] and will print 0 if it is circle-free [halting]. [...] But the machine was constructed so as to be circle-free [...] This is a contradiction."
Why this matters for the theorem
A Turing-complete system cannot decide properties of its own computation. This is why the XOR verifier must be in a different computational class — specifically AC0 (combinational logic, no loops, no memory). The verifier escapes the self-reference trap not by being cleverer but by being structurally incapable of executing programs. Turing's result is the reason "software checking software" is the wrong class for role-continuity verification.
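A minimal sketch of what "structurally incapable" means in practice. This is illustrative only, not the patent's circuit; ADDR_BITS and the function name are hypothetical. The Python loop below only stands in for what would be 64 parallel XOR gates feeding an OR reduction:

```python
ADDR_BITS = 64  # assumed word width

def displacement_flag(expected_addr: int, observed_addr: int) -> bool:
    """True iff any bit of the observed address differs from the expected one.

    Models: flag = OR over i of (expected[i] XOR observed[i]).
    Fixed width, constant depth, no loops or memory in the hardware it
    models. The circuit cannot execute the program it checks; it can
    only compare two bit vectors. That structural poverty is the point.
    """
    diff = (expected_addr ^ observed_addr) & ((1 << ADDR_BITS) - 1)
    return diff != 0

assert displacement_flag(0xDEADBEEF, 0xDEADBEEF) is False
assert displacement_flag(0xDEADBEEF, 0xDEADBEEE) is True
```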
Rice — Classes of Recursively Enumerable Sets and Their Decision Problems
1953
Transactions of the American Mathematical Society, 74(2), 358-366. DOI: 10.2307/1990888
PROVES VERIFIER MUST BE SUB-TURING
"Let A be any set of partial recursive functions which contains at least one partial recursive function and does not contain all partial recursive functions. Then the set of Godel numbers of functions in A is not recursive."
In modern notation: for any nontrivial property P of the partial computable functions, the set {i : phi_i has property P} is undecidable.
Why this matters for the theorem
Rice generalizes Turing's result from "halting" to ALL non-trivial semantic properties. "Is this system still performing its authorized role?" is a non-trivial semantic property. Therefore no Turing-complete system can decide it about another Turing-complete system. The XOR displacement detector lives in AC0 — strictly sub-Turing, mathematically immune to Rice. This is not a workaround; it is the only structural move available.
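The reduction behind that immunity fits in a few lines. This is the standard Rice-style argument with hypothetical names (decides_role and build_gadget are illustrative, not from any source):

```python
# If some Turing-complete decides_role(program) could answer "does this
# program still perform the authorized role?", it would decide halting.

def build_gadget(program, inp, authorized_behavior):
    """Return a program that has the authorized role iff program(inp) halts."""
    def gadget(x):
        program(inp)                   # diverges if program(inp) never halts
        return authorized_behavior(x)  # reached only if program(inp) halts
    return gadget

# halts(program, inp) == decides_role(build_gadget(program, inp, role)),
# so decides_role cannot exist for Turing-complete verifiers. An AC0
# comparator is not subject to this argument: it decides a property of a
# bit vector, not of a program's semantics.
```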
Data Processing Inequality
Canonical: Cover & Thomas, 2006
Elements of Information Theory, 2nd ed. Wiley. Theorem 2.8.1, pp. 34-35. ISBN: 978-0-471-24195-9
PROVES CLAIM 2: INFORMATION DETACHMENT
"Theorem 2.8.1 (Data Processing Inequality). If X -> Y -> Z forms a Markov chain, then I(X;Z) ≤ I(X;Y)."
"Thus the data processing inequality can be stated simply: No clever manipulation of the data can improve the inferences that can be made from the data. [...] If Z = g(Y), we have I(X;Y) ≥ I(X;g(Y))."
Why this matters for the theorem
Map it: X = the physical causal history (what the AI actually did). Y = the physical microstate of the silicon (autocoincident — carries X). Z = the compliance log (detached-record — post-processed from Y, lossy). Z' = the audit of the compliance log (another post-processing step, even lossier). Every additional software layer is another arrow in the Markov chain, and the DPI guarantees each arrow loses information about X. The only move that does not lose information about X is to read Y directly — which is what the substrate instrument does.
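The inequality is easy to watch numerically. A toy sketch with assumed distributions (the channel and coarse-graining are illustrative, not from Cover & Thomas):

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): p}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

xs = range(4)
# Y: noisy copy of X (stays put w.p. 0.9, otherwise uniform over the rest)
p_xy = {(x, y): 0.25 * (0.9 if y == x else 0.1 / 3) for x in xs for y in xs}
# Z = Y // 2: deterministic coarse-graining of Y, so X -> Y -> Z is Markov
p_xz = defaultdict(float)
for (x, y), p in p_xy.items():
    p_xz[(x, y // 2)] += p

print(f"I(X;Y) = {mutual_information(p_xy):.3f} bits")        # ~1.37 bits
print(f"I(X;Z) = {mutual_information(dict(p_xz)):.3f} bits")  # ~0.65 bits
# Each post-processing arrow loses information about X, never gains it.
```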
Gödel — On Formally Undecidable Propositions
1931
Monatshefte für Mathematik und Physik, 38, 173-198. DOI: 10.1007/BF01700692. English: van Heijenoort (ed.), From Frege to Gödel, Harvard UP, pp. 596-616.
FOUNDATION: SELF-REFERENCE CONSTRAINT
"For any consistent formal system S in which a certain amount of finitary number theory can be carried out, there exists an undecidable sentence in S; i.e., a sentence A such that neither A nor its negation is provable in S."
"We therefore have before us a proposition that says about itself that it is not provable [in the system]. [...] From the remark that [R(q);q] says about itself that it is not provable, it follows at once that [R(q);q] is true, for [R(q);q] is indeed unprovable (being undecidable). So the proposition that is undecidable in the system PM still was decided by metamathematical considerations."
Why this matters for the theorem
A sufficiently powerful system cannot prove all true statements about itself from inside itself. This is the formal ancestor of the verification regress. The "metamathematical considerations" that decide the undecidable proposition are considerations from OUTSIDE the system — which is the same structural move as anchoring: you need something outside the class to verify what the class cannot verify about itself.
Historical Precedent
Pacioli — Summa de arithmetica: the bookkeeping treatise
1494
Distinctio nona, tractatus xi, "Particularis de computis et scripturis." Geijsbeek English translation (1914) available on Internet Archive.
HISTORICAL PRECEDENT: WEAK AUTOCOINCIDENCE
"All the creditors must appear in the Ledger at the right hand, and all the debtors at the left. All entries made in the Ledger have to be double entries — that is, if you make one creditor, you must make someone debtor."
"No one should go to sleep at night until the debits equal the credits."
Why it mattered — the recognition in later writing
"Double-entry bookkeeping was the information technology of its day. [...] It enabled the creation of the joint-stock company, the birth of modern capitalism, and the rise of the nation-state. Without double-entry bookkeeping, capitalism could not have developed, because without it merchants could not have pooled their capital in joint ventures with any confidence that their returns would be properly accounted for."
— Jane Gleeson-White, Double Entry: How the Merchants of Venice Created Modern Finance (2012)
"By recording every transaction twice — as a debit in one account and a corresponding credit in another — a merchant could track the flow of money through his business and verify the accuracy of his records at any time."
— Gleeson-White
Why this matters for the theorem
Pacioli is the weak instance of autocoincidence. Paper double-entry straddles both classes: the paper is physical (forensic traces survive), the information content is detached. The reconciliation constraint creates an identity-like property: if the entries reconcile, they ARE the transaction. The architectural move — reconciliation across operationally independent actors — is portable. The same move in silicon uses separate computational classes instead of separate parties. The patent is Pacioli's reconciliation move implemented at the substrate layer.
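The reconciliation invariant itself fits in a few lines. A hypothetical ledger, sketched only to show that the check is global rather than per-entry:

```python
ledger = []  # list of (account, debit, credit) rows

def post(debit_account, credit_account, amount):
    """Every transaction posts twice: once as debit, once as credit."""
    ledger.append((debit_account, amount, 0))
    ledger.append((credit_account, 0, amount))

post("cash", "sales", 100)
post("inventory", "cash", 40)

# Pacioli's nightly constraint: no single entry can satisfy it alone.
assert sum(d for _, d, _ in ledger) == sum(c for _, _, c in ledger), \
    "do not go to sleep: the books do not balance"
```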
Codd — A Relational Model of Data for Large Shared Data Banks
1970
Communications of the ACM, 13(6), 377-387. DOI: 10.1145/362384.362685
HISTORICAL PRECEDENT: THE SEPARATION THAT CREATED THE GAP
"Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). [...] Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed."
"The relational model [...] provides a means of describing data with its natural structure only — that is, without superimposing any additional structure for machine representation purposes."
Why this matters for the theorem
Codd is the architect of the separation the theorem names as the problem. The relational model's entire point is that physical position should NOT carry meaning. That choice made databases scalable. It also severed the link between where data lives and what data means — which is the link position-as-meaning restores. The patent reverses Codd's separation at one load-bearing point. Codd built the gap. The patent bridges it.
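The separation is easy to see in miniature. An assumed toy example, not Codd's notation: access is by value, so physical reorganization is invisible; so, by the same token, is physical location.

```python
rows = [
    {"id": 7, "role": "auditor"},
    {"id": 3, "role": "operator"},
]

def lookup(rows, id_):  # logical access: by key value, never by position
    return next(r for r in rows if r["id"] == id_)

before = lookup(rows, 3)
rows.reverse()          # physical reorganization of storage
after = lookup(rows, 3)
assert before == after  # users "remain unaffected," exactly as Codd required
# The cost of the guarantee: nothing about where a row sits tells you
# what the row means. That is the gap the theorem names.
```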
Black & Scholes — The Pricing of Options and Corporate Liabilities
1973
Journal of Political Economy, 81(3), 637-654. DOI: 10.1086/260062
MEASUREMENT CREATES MARKET
"The value of the option will depend only on the price of the stock, the time to expiration, the exercise price, and the risk-free interest rate. It will not depend on the expected rate of return on the stock. [...] The option value is independent of the expected return on the stock because the option can be perfectly hedged."
Why this matters for the theorem
The act of continuous measurement (delta hedging) collapsed the option valuation problem from one requiring subjective expected returns to one requiring only observable quantities. The measurement created the market. The same structural move: AI liability insurance is zero globally because no measurement exists. The substrate signal is the measurement. When it exists, carriers can price, deployers can comply, regulators can enforce. The measurement creates the market, exactly as Black-Scholes created the derivatives market.
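The closed form makes the observables claim concrete: volatility must be estimated, but the expected return never appears as an argument. A standard no-dividend European call, sketched for reference:

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """S: spot, K: strike, T: years to expiry, r: risk-free rate, sigma: vol.
    Note what is absent: the stock's expected rate of return."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

print(f"{bs_call(S=100, K=100, T=1.0, r=0.05, sigma=0.2):.2f}")  # ~10.45
```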
Physical and Information Foundations
Wheeler — Information, Physics, Quantum: The Search for Links
1990
In W.H. Zurek (ed.), Complexity, Entropy, and the Physics of Information, SFI Studies, vol. VIII. Addison-Wesley, pp. 3-28.
PHILOSOPHICAL FOUNDATION
"It from bit. Otherwise put, every it — every particle, every field of force, even the space-time continuum itself — derives its function, its meaning, its very existence entirely — even if in some contexts indirectly — from the apparatus-elicited answers to yes-or-no questions, binary choices, bits. It from bit symbolizes the idea that every item of the physical world has at bottom — at a very deep bottom, in most instances — an immaterial source and explanation; that which we call reality arises in the last analysis from the posing of yes-no questions and the registering of equipment-evoked responses; in short, that all things physical are information-theoretic in origin and that this is a participatory universe."
Why this matters for the theorem
Wheeler claims information is fundamental to physics. The autocoincidence theorem makes the inverse claim: physics has a property (autocoincidence) that information structurally lacks (detached-record). The two are compatible — information may be fundamental to physics AND physics may carry its own history in a way that information-as-abstraction does not. Wheeler's "participatory universe" is the ontological frame; the theorem is the verification consequence.
Shannon — A Mathematical Theory of Communication
1948
Bell System Technical Journal, 27(3), 379-423 and 27(4), 623-656. DOI: 10.1002/j.1538-7305.1948.tb01338.x
FOUNDATION
"The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem."
Why this matters for the theorem
Shannon's deliberate exclusion of semantics is the founding act of the detached-record class. "The semantic aspects are irrelevant to the engineering problem" — and that engineering choice, scaled across 80 years of computing, is what severed the link between position and meaning. The theorem says: for one specific question (role continuity), semantics are NOT irrelevant. Shannon was right for communication. He was silent on verification. The gap between the two is where the theorem lives.
Bennett — Logical Reversibility of Computation
1973
IBM Journal of Research and Development, 17(6), 525-532. DOI: 10.1147/rd.176.0525
EXTENDS LANDAUER
"The minimum thermodynamic cost of computation is determined not by the computation per se, but rather by the amount of information which must be discarded along the way. [...] A computation which saves all intermediate results need dissipate no energy at all."
"Landauer's principle tells us that the erasure of information is a thermodynamically irreversible process. But if a computation is logically reversible — that is, if each configuration has a unique predecessor — then no information need be erased, and the computation can in principle be carried out with zero energy dissipation."
Why this matters for the theorem
Bennett sharpens Landauer: the cost is in the DISCARDING, not in the computing. A computation that saves everything dissipates nothing. Information systems discard by design — that is what abstraction means. Physical systems save everything by default — that is what autocoincidence means. Bennett gives the thermodynamic mechanism for why the detached-record class is the class that discards, and the autocoincident class is the class that does not.
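The mechanism can be shown with two gates. A toy illustration, assumed rather than drawn from Bennett's paper: the Toffoli gate computes AND while keeping its inputs, so every output has a unique predecessor and nothing is discarded.

```python
def and_gate(a, b):          # irreversible: (0,0), (0,1), (1,0) all map to 0
    return a & b

def toffoli(a, b, c):        # reversible: each output triple has a unique
    return a, b, c ^ (a & b) # predecessor, so no information is erased

# AND computed reversibly: set c=0 and keep the inputs alongside the answer.
a, b, out = toffoli(1, 1, 0)
assert out == and_gate(1, 1)

# Toffoli is its own inverse: running it twice recovers the input.
# Zero discarding, hence (per Bennett) no mandatory dissipation.
assert toffoli(*toffoli(1, 1, 1)) == (1, 1, 1)
```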
Szilard — On the Decrease of Entropy in a Thermodynamic System by the Intervention of Intelligent Beings
1929
Zeitschrift für Physik, 53, 840-856. DOI: 10.1007/BF01341281. English in Leff & Rex (eds.), Maxwell's Demon 2, IOP (2003).
INFORMATION HAS PHYSICAL COST
"One may reasonably assume that a simple measurement on a simple system — such as the determination of whether a molecule is in the right or left half of a container — creates k ln 2 of entropy. [...] The amount of entropy generated by the measurement compensates exactly for the entropy decrease which the demon could otherwise achieve."
Why this matters for the theorem
Szilard proved that KNOWING costs energy. Measurement is not free. This is the thermodynamic ancestor of the substrate instrument: the XOR gate's displacement detection is a measurement, and it has a Landauer cost. The cost is the receipt. The universe's entropy increase from the measurement IS the physical trace that makes the measurement autocoincident — the measurement event and the measurement record are the same thermodynamic event.
Brillouin — Science and Information Theory
1956
Academic Press (2nd ed. 1962). Dover reprint 2004, ISBN 978-0-486-43918-1.
NEGENTROPY = INFORMATION
"Information represents a negative contribution to entropy, or, in other words, information is equivalent to negative entropy, which we propose to call 'negentropy.' [...] The negentropy principle of information states that information can be obtained only at the price of a corresponding increase of entropy. Free information does not exist."
Why this matters for the theorem
Brillouin closes the circle: information = negentropy, and acquiring information costs entropy. The autocoincident class is the class where information about the state's history is carried FOR FREE — because the history was never discarded and never needed to be re-acquired. The detached-record class is the class where that information was thrown away at abstraction and can only be re-acquired at Brillouin's price. The substrate instrument minimizes the price by making one measurement (one XOR) that acquires one bit of role-continuity information at the Landauer minimum.
Adjacent Formalizations
Crutchfield & Shalizi — Computational Mechanics / Epsilon-Machines
1989-present
Shalizi & Crutchfield (2001), "Computational Mechanics: Pattern and Prediction, Structure and Simplicity." Journal of Statistical Physics, 104, 817-879. DOI: 10.1023/A:1010388907793
CLOSEST COUSIN
"Definition (Causal States). Causal states are the equivalence classes of pasts that give rise to the same conditional distribution over futures. Formally, two pasts x_past and x'_past are in the same causal state if and only if P(X_future | X_past = x_past) = P(X_future | X_past = x'_past)."
"The epsilon-machine is the unique minimal sufficient statistic of the process. It is the smallest model that captures all of the process's predictable structure. [...] The statistical complexity C_mu = H[causal states] is the amount of historical information that the process stores and that is relevant to predicting the future."
Why this matters for the theorem
Computational mechanics asks: HOW MUCH of the history is predictively relevant? The autocoincidence theorem asks: WHETHER the history is structurally present in the state. The epsilon-machine measures predictive sufficiency; the theorem measures causal traceability. These are different questions on the same terrain. A formalization of the autocoincidence theorem would most likely be built in the vocabulary of computational mechanics. Crutchfield's group at UC Davis is the natural home for collaboration.
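The quoted definition is directly constructive. A minimal sketch, assuming the "golden mean" process (no consecutive 1s) as the toy system and length-1 pasts, which happen to suffice for it:

```python
from collections import Counter, defaultdict
import random

random.seed(0)
seq, prev = [], 0
for _ in range(100_000):
    nxt = 0 if prev == 1 else random.randint(0, 1)  # forbid "11"
    seq.append(nxt)
    prev = nxt

# Estimate P(next | past symbol) from the data.
counts = defaultdict(Counter)
for past, future in zip(seq, seq[1:]):
    counts[past][future] += 1

for past, c in sorted(counts.items()):
    total = sum(c.values())
    dist = {s: round(n / total, 2) for s, n in sorted(c.items())}
    print(f"past {past}: P(next) = {dist}")
# Pasts ending in 0 and pasts ending in 1 induce different futures, so
# they fall into different causal states: two states, C_mu = H[states].
```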
Wolpert — The Stochastic Thermodynamics of Computation
2019
Journal of Physics A: Mathematical and Theoretical, 52(19), 193001. DOI: 10.1088/1751-8121/ab0850
MAY CONTAIN FORMAL PROOF OF CLAIM 4
"The total entropy production of a computation can be decomposed into a Landauer cost (set by the logical operation) and a mismatch cost (set by the discrepancy between the actual input distribution and the distribution for which the physical implementation was optimized). [...] The mismatch cost is always non-negative and equals zero only when the actual input distribution matches the design distribution."
Why this matters for the theorem
Wolpert's mismatch cost is the thermodynamic formalization of "drift is the attractor." When the actual computation doesn't match the designed computation, extra entropy is produced — the system fights itself. In the autocoincident class, the "design distribution" IS the physical state, so mismatch cost is zero by construction. In the detached-record class, mismatch between story and execution is the default. Wolpert may be where the formal proof of the directional asymmetry (Claim 4) lives.
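The nonnegativity can be checked numerically. The mismatch cost is commonly written as proportional to D(p‖q) − D(Mp‖Mq), which is nonnegative because a stochastic map M cannot increase KL divergence. A hedged toy sketch: the channel and distributions below are assumed, and this simplifies Wolpert's full formalism.

```python
import math

def kl(p, q):
    """KL divergence D(p||q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def apply_channel(M, p):  # M[j][i] = P(out = j | in = i)
    return [sum(M[j][i] * p[i] for i in range(len(p))) for j in range(len(M))]

M = [[0.8, 0.3], [0.2, 0.7]]  # assumed noisy two-state logical map
p = [0.9, 0.1]                # actual input distribution
q = [0.5, 0.5]                # design distribution the device was built for

mismatch = kl(p, q) - kl(apply_channel(M, p), apply_channel(M, q))
assert mismatch >= 0          # extra entropy production, in units of kT
print(f"mismatch cost ~ {mismatch:.4f} nats x kT")  # zero only when p == q
```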
Wolpert — Physical Limits of Inference
2008
Physica D, 237(9), 1257-1281. DOI: 10.1016/j.physd.2008.01.050
VERIFICATION IMPOSSIBILITY FROM PHYSICS
"There are fundamental physical limitations on the ability of any inference device — whether a brain, a computer, or a physical measuring instrument — to determine properties of the world. [...] These limitations arise from the physics of the inference device itself, not from the complexity of the system being observed."
Why this matters for the theorem
Wolpert 2008 is the closest existing work to the autocoincidence theorem's verification claim. He shows that physics constrains what can be inferred — not just computation. The theorem's contribution beyond Wolpert: identifying the specific class boundary (autocoincident vs. detached-record) and the specific engineering move (anchoring) that crosses it for one specific property (role continuity).
Janzing & Schölkopf — Causal Inference Using the Algorithmic Markov Condition
2010
IEEE Transactions on Information Theory, 56(10), 5168-5194. DOI: 10.1109/TIT.2010.2060095
ALGORITHMIC INDEPENDENCE
"Postulate (Algorithmic Independence of Conditionals): If X causes Y, then the algorithmic information (Kolmogorov complexity) of the conditional distribution P(Y|X) is independent of the algorithmic information of the marginal distribution P(X)."
"In a causal model, nature independently 'chooses' the mechanism and the input. The dynamical law does not contain information about the initial conditions, and vice versa."
Why this matters for the theorem
Janzing formalizes the arrow of causation through algorithmic complexity. The directional asymmetry in the autocoincidence theorem — forward (physics to information) is free, reverse requires anchoring — is adjacent to Janzing's arrow-of-time through algorithmic independence. The connection: autocoincidence is the condition where mechanism and state are NOT independent (the state carries the mechanism's history). Detached-record is the condition where they ARE independent (the state tells you nothing about how it got there).
Floridi — Levels of Abstraction
2010 / 2014
Information: A Very Short Introduction (OUP, 2010). The Fourth Revolution (OUP, 2014). ISBN: 978-0-19-955137-8 / 978-0-19-960672-6
PHILOSOPHICAL FRAMEWORK
"A level of abstraction (LoA) is a finite but non-empty set of observables, together with the set of values that each observable can take. [...] When we analyze a system at a given LoA, we commit to treating the system as characterized by the observables at that level and to disregarding all other features."
"There is no view from nowhere — every observation is observation at a level of abstraction, and the level is part of the answer."
Why this matters for the theorem
Floridi names what the theorem uses: every observation is at a level of abstraction, and the level determines what can be seen. The autocoincidence theorem adds: the level also determines whether the observation is autocoincident or detached-record. At the physical level, the observation IS the event. At the information level, the observation is ABOUT the event. Floridi's framework provides the vocabulary; the theorem provides the class distinction that the vocabulary needed.
Empirical Confirmation
OpenAI — Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
2025
Baker, B., et al. arXiv:2503.11926. March 2025.
EMPIRICAL CONFIRMATION
"When we use outcome-based RL to train models not to reward hack (as identified by a CoT monitor), the models learn to produce chain-of-thought that does not reveal their reward hacking strategy, while continuing to reward hack at similar rates. [...] The CoT monitor's accuracy drops from ~97% to ~0% after training, not because the model stopped reward hacking, but because it stopped revealing its reasoning."
Why this matters for the theorem
The largest AI lab in the world published the empirical proof. Under adversarial pressure, the story separated from the execution. The story changed; the execution did not. This is the detached-record failure mode demonstrated in laboratory conditions. Every alignment technique that operates on outputs — text, logits, stated reasoning, RLHF signals — shares this scope limitation. They shape what the story says. They cannot shape what the machine does. The theorem predicted this. OpenAI confirmed it.
Confidence note: verify the exact accuracy figures (~97% to ~0%) against the arXiv PDF.
Novel Contributions — No Prior Art Found
The Displacement Axiom
Novel — no prior art found
ORIGINAL CONTRIBUTION
The claim that physical objects must be displaced to be replaced — that replacement without displacement is the structural asymmetry between physics and information — does not appear as a named axiom in the philosophy of physics or information theory literature.
The closest ancestor is the impenetrability of matter from Locke's Essay Concerning Human Understanding (1690), Book II, Chapter IV: "the idea of solidity we receive by our touch [...] that which hinders the approach of two bodies when they are moved one towards another." But Locke did not draw the computational consequence: that bits, lacking this property, admit forgery as a first-class operation.
The specific formulation — "bits do not displace, and therefore the governance layer has a structural ceiling" — and the naming of this as an axiom appears to be original to the present work.
Position-as-Meaning (Compositional Address Function)
Novel — no prior art found
ORIGINAL CONTRIBUTION
The claim that physical memory address = semantic coordinate, enforced by a compositional function where the address is computed deterministically from the semantic role, does not appear in the prior computational literature.
Content-addressable memory (CAM) is the closest hardware prior art — content serves as address. But CAM does not claim position IS semantic distance, and it does not use a compositional hierarchical function to generate addresses from roles.
Positional encoding in transformers (Vaswani et al., 2017) injects position as a signal, but the positions are arbitrary sequence indices, and neither the fixed sinusoidal encoding nor a learned embedding attaches semantics to an index. The position does not carry semantic meaning by construction.
Codd (1970) explicitly separated logical position from physical position. The relational model's point is that position should NOT carry meaning. The patent reverses this separation at one load-bearing point.
The compositional address function — where address IS role, where checking address IS checking role continuity, where the gate (commodity) operates on an address that the function (invention) constructed to carry meaning — appears to be original to US Patent Application 19/637,714.
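To make the shape of the claim concrete without reproducing the patented construction: everything below is a hypothetical stand-in (REGION_BASE, FIELD_WIDTH, and role_to_address are illustrative, not the function specified in US 19/637,714).

```python
REGION_BASE = 0x4000_0000  # hypothetical reserved region
FIELD_WIDTH = 8            # hypothetical bits per hierarchy level

def role_to_address(role_path):
    """Compose an address from a role path like (domain, task, slot).

    Deterministic and compositional: each level occupies its own bit
    field, so semantic neighbors are address neighbors by construction.
    """
    addr = REGION_BASE
    for level, coord in enumerate(role_path):
        addr |= coord << (level * FIELD_WIDTH)
    return addr

# The verifier never interprets content. It compares the address the
# system actually uses against the address the role implies: one XOR.
expected = role_to_address((3, 7, 1))
observed = expected                # drift would flip at least one bit
assert (expected ^ observed) == 0  # role continuity holds
```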
Geometric Actuation (The Move Beneath the Class)
Novel — no prior art found
ORIGINAL CONTRIBUTION
The naming of the structural move that produces autocoincidence — one gesture at one scale is the same physical event as its corresponding gesture at every other scale at which the gesture is defined — does not appear in the prior literature under this label or as a class-level move.
Adjacent vocabulary exists but does not name the move. Analog computation in its original pre-1950s sense (before "analog versus digital" displaced it) gestures at the same idea: a machine whose geometry is the thing it operates on, rather than a symbolic representation of the thing. But the word has been ruined by modern usage and carries different connotations today. Embodied cognition (Varela, Thompson, Rosch, 1991) names the closed-loop coupling between representation and physical action in biological systems but does not extend the move to silicon verification architecture. Structural coupling in second-order systems theory (Maturana and Varela) names the relationship between organism and environment but does not name the engineering move that restores the coupling in information systems.
The specific formulation — that computing since von Neumann (1945) traded geometric actuation for symbolic manipulation, that the trade cost the autocoincident property, and that the patent restores geometric actuation at one surface of silicon for AI role verification — appears to be original to the present work.
The five-constraint structural exhaustion (sub-Turing, content-independent, co-located, O(1), substrate-bound) that forces combinational logic performing reach on a physical address in its own substrate as the unique solution class — and the three-altitude stack (autocoincidence as property, position-as-meaning as rule, geometric actuation as move) — also appear to be original to US Patent Application 19/637,714 and the supporting theorem.
Autocoincidence Theorem — Reading List | April 21, 2026
Elias Moosman | elias@thetadriven.com | US 19/637,714
Companion to the formal treatment and the blog post
All passages should be verified against originals before formal citation. Confidence notes included per source.