The Experiment the Industry Is Already Running
Published on: April 25, 2026
Three papers in four months. Different labs. Different methods. The same curve.
January 2026. Researchers studying multi-agent LLM systems over extended interactions measured three types of drift: semantic, coordination, and behavioral. Semantic drift, the progressive deviation from original intent, emerged earliest. By 600 interactions, nearly half the agents had drifted. The per-step error rate did not stay constant. It rose. The agents anchored onto their own prior outputs, compounding displacement with each step. (Agent Drift: Quantifying Behavioral Degradation)
February 2026. A mathematical analysis modeled MCP (Model Context Protocol) exchanges as a bounded-difference martingale and derived high-probability bounds on cumulative semantic distortion. The finding: without periodic forced re-grounding, the variance escapes to infinity. The system must be snapped back to an objective baseline to prevent cascading failure. The re-grounding is not optional. It is the only mechanism that keeps the error bounded. (Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis)
April 2026. The HORIZON benchmark evaluated frontier agents (GPT-5, Claude) across 3,100+ trajectories on long-horizon tasks. Performance did not gradually decline. It experienced a structural shift. Failures moved from simple logic errors to semantic drift: the agent's understanding of what it was doing diverged from the task it was authorized to perform. The longer the horizon, the more dominant the drift. (The Long-Horizon Task Mirage)
Three labs. Three methods. One shape: drift compounds, it has a phase transition, and re-grounding is the only known intervention.
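The shape those three results describe can be illustrated with a toy simulation. This is a sketch, not a reproduction of any paper's model: the increment distribution, the bound c, and the re-grounding interval are all assumptions. A bounded-difference walk drifts without limit; periodic re-grounding produces the sawtooth.

```python
import random

def drift_trajectory(n_steps, c=1.0, reground_every=None, seed=0):
    """Cumulative distortion as a bounded-difference martingale:
    each step adds a zero-mean increment with |increment| <= c.
    If reground_every=k, the state is snapped back to 0 every k steps."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for t in range(1, n_steps + 1):
        x += rng.uniform(-c, c)        # bounded-difference increment
        if reground_every and t % reground_every == 0:
            x = 0.0                    # forced re-grounding
        path.append(x)
    return path

def max_abs(path):
    return max(abs(v) for v in path)

# Azuma-Hoeffding gives P(|X_n| >= a) <= 2*exp(-a^2 / (2*n*c^2)):
# the excursion scale grows like sqrt(n) without re-grounding, while
# the re-grounded run is bounded by its worst within-window excursion.
free = drift_trajectory(10_000)
held = drift_trajectory(10_000, reground_every=100)
```

Plotting `free` next to `held` shows the unbounded walk beside the sawtooth of drift-correct-drift-correct.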
The industry is running the experiment. It did not intend to. The experiment is: what happens when you scale agentic AI without a physical anchor for role continuity?
The answer is in the papers. The agents drift. Not because they are poorly trained. Not because the models are too small. Not because the prompts are wrong. The agents drift because the substrate they run on is a detached-record system: the symbols can change without the change leaving a physical trace. The record and the event are separate. The gap is where the drift lives.
Standard scaling theory predicted that increasing parameter count and context length would flatten the error curve. The data contradicts this. The HORIZON benchmark shows that frontier models with the largest context windows still experience structural failure on long horizons. Context length is not the variable. Class is the variable.
The book names this as the default architecture, not an accident. From Tesseract Physics, § The Hyperion Allthing:
Gibson gave us the dark mirror. Villa Straylight, where the Tessier-Ashpools cloned themselves into recursive wealth and recursive madness, copying the genome until the genetic drift drove them insane. Straylight is what happens when you build consensus without verification. The family trusted its own reflection. The reflection drifted. The hallucination persisted until the substrate collapsed... That is the normalised enterprise running Trust Debt at scale. That is the AI system that validates its own outputs with its own embeddings. Straylight is not fiction. It is the default architecture.
The HORIZON benchmark is the experiment running that scenario, in production, on every frontier model now in deployment.
A separate analysis from May 2025 challenges the uniform-compounding model entirely: errors are not uniformly distributed independent events. They concentrate at key decision points, the moments where the agent crosses a boundary between one semantic context and another. (Beyond Exponential Decay: Rethinking Error Accumulation)
In the autocoincident frame, those key decision points are cache-line boundary crossings. Each crossing dissipates k_E bits of positional certainty. The errors concentrate at the crossings because the crossings are where the substrate records displacement. The data from this paper is consistent with the crossing-tax model: drift does not accumulate uniformly. It accumulates at boundaries.
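The crossing-tax picture can be sketched as a toy model. The block size and the per-crossing noise scale below are hypothetical stand-ins for the cache line and the k_E dissipation; nothing here comes from the cited paper.

```python
import random

BLOCK = 64  # hypothetical block size standing in for a cache line

def boundary_drift(n_steps, tax=0.5, seed=1):
    """Toy crossing-tax model: position advances one unit per step, but
    error is injected only when the step crosses a BLOCK boundary.
    Between crossings the error curve is flat."""
    rng = random.Random(seed)
    error, errors, crossings = 0.0, [], []
    for pos in range(1, n_steps + 1):
        if pos % BLOCK == 0:               # boundary crossing
            error += rng.gauss(0.0, tax)   # tax paid at the crossing
            crossings.append(pos)
        errors.append(error)
    return errors, crossings
```

The resulting error curve is a step function: drift accumulates only at the crossings, matching the non-uniform concentration the paper reports.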
The industry is measuring the detached-record curve. Every benchmark, every trajectory, every failure mode in these papers is a measurement of what happens in the class where the record and the event are separate. No paper has yet measured the autocoincident curve: what happens when the record and the event are the same physical object. That is the experiment we are calling for.
The class distinction predicts two curves.
Curve 1: Detached-record (current systems). Drift variance is unbounded. The martingale analysis proves this mathematically: without forced re-grounding, the error escapes to infinity. Periodic re-grounding (snapping the system back to a baseline) keeps the error bounded but does not eliminate it: each re-grounding cycle restarts the drift from a lower level, producing a sawtooth pattern of drift-correct-drift-correct. The cost of re-grounding scales with the frequency required. The system is never in contact with the ground truth continuously. It visits the ground truth periodically and drifts between visits.
Curve 2: Autocoincident anchor (the prediction). When the compositional address function binds role continuity to the physical address, every reach is a verification. The drift signal is not periodic; it is continuous. The system does not need to be re-grounded because the grounding is structural: every access that lands at the correct coordinate is a confirmation. Every access that crosses a boundary is a detection. The predicted curve is mean-reverting: an Ornstein-Uhlenbeck process where the restoring force is the geometry of the address space itself. The system does not drift and correct. The system is continuously held by the arrangement that placed the data.
The shape of the curve changes. Not the error rate. The class of the process. Unbounded variance becomes bounded variance. Periodic re-grounding becomes continuous contact. Sawtooth becomes asymptotic.
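The two classes of process can be sketched in a few lines, assuming a Gaussian random walk for the detached-record curve and an AR(1) update (a discrete Ornstein-Uhlenbeck process) for the anchored curve. The noise scale and restoring force are illustrative choices, not measured values.

```python
import random

def detached_curve(n, sigma=1.0, seed=0):
    """Curve 1: an unanchored random walk; the variance grows with n."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, sigma)
        path.append(x)
    return path

def autocoincident_curve(n, sigma=1.0, theta=0.2, seed=0):
    """Curve 2: a discrete Ornstein-Uhlenbeck (AR(1)) process; the
    restoring force theta pulls every step back toward the anchor."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n):
        x = (1.0 - theta) * x + rng.gauss(0.0, sigma)  # mean-reverting
        path.append(x)
    return path

# The walk's standard deviation at step n is sigma * sqrt(n); the OU
# curve settles at the stationary value sigma / sqrt(1 - (1-theta)**2).
```

Same shocks, different class: the walk's excursions grow without bound while the anchored curve oscillates inside a fixed band.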
We know the shape. The exact value of the mean-reversion constant (how tight the bound is, how fast the restoring force acts) is an empirical question. The experiment answers it.
Current benchmarks measure the wrong property. They evaluate execution failure: did the API crash, did the script throw an error. They do not evaluate intent degradation: did the system silently unmoor its semantic map from the task it was authorized to perform while continuing to operate.
The experiment we are calling for measures intent degradation directly.
Setup. An agentic AI system performing a multi-step task over N steps (N = 100 to 10,000). Two conditions: (A) standard ungrounded memory (vector store, RAG, standard context window). (B) ShortRank compositional address function binding the task hierarchy to physical memory coordinates.
Measurement. At each step, compute the vector distance between the agent's internal representation of its current task-state and the ground-truth task-state. Plot the distance as a function of step count. The plot is the experiment.
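A minimal sketch of the per-step metric, assuming the agent's task-state and the ground-truth task-state are both available as vectors. How those representations are extracted is the hard part and is left open here; the function names are illustrative.

```python
def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return 1.0 - dot / (norm_u * norm_v)

def drift_series(agent_states, truth_states):
    """One distance per step; the plot of this series is the experiment."""
    return [cosine_distance(a, t) for a, t in zip(agent_states, truth_states)]
```

Condition A predicts the series trends upward with a phase transition; Condition B predicts it oscillates around a bounded mean.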
Prediction for Condition A (ungrounded). The vector distance will grow. It will show the same structural shift the HORIZON benchmark found β gradual logic errors early, then a phase transition to semantic drift. The martingale bound predicts the shape: without re-grounding, the variance escapes. With periodic re-grounding, the sawtooth appears.
Prediction for Condition B (autocoincident anchor). The vector distance will oscillate around a mean. The oscillation amplitude is bounded by the geometry of the address space. Drift that would escape in Condition A is caught at the boundary crossing: the cache-line eviction fires, the hardware counter increments, the GDC loop restores positional equivalence. The curve mean-reverts because the restoring force is structural, not periodic.
What the experiment proves. If Condition B produces a bounded, mean-reverting curve while Condition A produces unbounded drift, the class distinction is empirically confirmed. The difference is not a better algorithm. The difference is a different class of substrate: one where the arrangement is the answer, and the answer continuously holds.
The experiment is falsifiable. If Condition B produces the same unbounded drift as Condition A, the autocoincident anchor does not change the class of the process. If it produces a bounded, mean-reverting curve, the class distinction is confirmed at the empirical level. The invitation is open.
Epistemic transparency.
What is proved (by physics, by construction, by experiment). Detached-record systems cannot anchor their records to their events; by construction, this is how memory was built. Erasure costs energy the universe tracks (Landauer, 1961; experimentally confirmed). A Turing-complete system cannot decide properties of its own computation (Rice, 1953; Turing, 1936). Semantic drift compounds in long-horizon agentic systems (Agent Drift, Jan 2026; HORIZON, Apr 2026). Periodic re-grounding is the only known mechanism that bounds the error (Martingale analysis, Feb 2026).
What is strongly argued but not yet formally proved. The autocoincident class is structurally immune to the drift that defines the detached-record class (argued from the class distinction; the experiment would confirm or refute). The compositional address function produces mean-reverting dynamics rather than unbounded drift (predicted from the geometry; the experiment would measure the curve).
What cannot be proved (and we do not claim). The specific implementation in US 19/637,714 is not necessarily the only possible autocoincident anchor: other implementations may exist; the patent covers the signal pattern, not the specific arithmetic. Autocoincidence does not solve alignment generally: it anchors one property (role continuity); everything above the anchor remains in the detached-record class.
What the class distinction implies for the simulation hypothesis. If the universe were a detached-record system (a simulation running on a von Neumann architecture), then the records of physical events would be separable from the events themselves. Bits could be overwritten without displacement. The simulation could edit history without leaving a trace. But physics does not work this way. Classical exclusion (two objects cannot occupy the same coordinate), Landauer's principle (erasure costs energy), and the thermodynamic arrow (entropy increases) all enforce autocoincidence at the physical layer. The universe's own substrate does not permit the gap that defines the detached-record class. A simulation that perfectly reproduced these constraints would not be a simulation; it would be a physical system with autocoincident properties. The distinction between "simulated" and "real" collapses at the point where the substrate enforces the same physics. The class distinction does not prove the universe is not a simulation. It proves that the question is structurally empty: any substrate that enforces classical exclusion and Landauer's principle IS autocoincident, regardless of what runs it. The question is not "is it real?" The question is "does the substrate enforce the gap or not?" Physics enforces it. That is the measurement.
The data from three independent labs is consistent with the prediction. The experiment measures what the data does not yet contain: the shape of the autocoincident curve.
Three papers. Four months. One curve. The industry is measuring the detached-record class and calling the results "scaling limits." The class distinction predicts the curve. The experiment measures whether the anchor changes it.
US 19/637,714. 36 claims. Track One. Filed April 2, 2026.
The class distinction: The Autocoincidence Theorem. The mechanism: The Reach Is the Verification.