You're using a chatbot with the object permanence of a goldfish to govern an autonomous agent that holds your balance sheet. There is now a decidable, hardware-signed alternative. It's free. We do something narrower, and far stronger — and you never have to trust us to believe it. That's the entire point.
↑ a verdict being walked across the 144×144 lattice. it does this the same way every single time. your eval could never.
Show me the radio ↓
Because for the first time in computing history, semantics run on the chip: north of 6 million times a second, on silicon, with no model in the loop. Not all semantics, only the decidable kind: the part where meaning has an address (where-is-what), so resolving it halts at chip speed instead of regressing forever. The spec isn't a string we checksum. It's compiled into the same vocabulary the work speaks (named coordinates on a curated lexicon), so "did the work drift from what was asked" collapses to a distance between two positions on one lattice, not a verdict a model renders. The work product of one function lands at an exact coordinate on a 144×144 lattice, and whether it drifted from the coordinate the spec authorized is a recomputable, signed physical event, byte-for-byte identical on your machine and ours, not a second chatbot's opinion. "Does this output sit where the spec said it should?" used to be answerable only by another fallible model. Now it's measured in hardware. Semantic-on-chip wasn't a better tool. It did not exist, and the instant it does, every eval that flips between runs and signs nothing is, by law, below the standard of care.


"Wait — isn't 'does it mean what the spec asked' the exact non-trivial semantic property Rice's theorem says is undecidable? You can't wield Rice to bury the eval and then sell the thing Rice forbids."
↓ not a metaphor — every commit tolerance panel below is a real commit from this repo, each a decidable semantic placement that recomputes byte-for-byte. that's the six-million-a-second, made visible.
Yesterday, not knowing was an excuse.
You just read this. Welcome to the information hazard.
we're genuinely a little sorry. you are now fully conscious that your current tech stack is uninsurable.
It asks a second chatbot if the first one did okay.
Same input, three runs: PASS · FAIL · PASS. That's not a control framework. It's a magic 8-ball with a system prompt — and Brenda from Risk Advisory bills $400/hr to shake it for you. Stochastic governance, dressed up in a slide deck with your logo on it. And you're the one who signs it.
npx thetacog-mcp prove-rice sweeps a payload across the lane boundary: the LLM judge flips on 3 of 6 payloads (no two runs agree near the edge); the on-chip Oracle is 5/5 byte-identical at every step.
scripts/pmu/prove-rice.mjs: run it, watch it happen.
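The property prove-rice is described as checking, that repeated runs on one payload agree to the byte, can be sketched in a few lines. This is a minimal illustration, not the contents of scripts/pmu/prove-rice.mjs: `is_byte_identical`, `make_cycling_judge`, and `fixed_gate` are hypothetical names, and the cycling judge is a stand-in that replays the PASS · FAIL · PASS pattern above instead of calling a real model.

```python
import hashlib
from itertools import cycle
from typing import Callable


def is_byte_identical(judge: Callable[[bytes], bytes], payload: bytes, runs: int = 5) -> bool:
    """Run `judge` on the same payload `runs` times; True only if every output hashes the same."""
    digests = {hashlib.sha256(judge(payload)).hexdigest() for _ in range(runs)}
    return len(digests) == 1


def make_cycling_judge() -> Callable[[bytes], bytes]:
    """Hypothetical stand-in for a run-to-run unstable judge: replays PASS, FAIL, PASS, ..."""
    verdicts = cycle([b"PASS", b"FAIL", b"PASS"])
    return lambda payload: next(verdicts)


def fixed_gate(payload: bytes) -> bytes:
    """Hypothetical stand-in for a deterministic gate: the verdict depends only on the payload."""
    return b"PASS" if len(payload) <= 16 else b"FAIL"


payload = b"example payload"
stable = is_byte_identical(fixed_gate, payload)
unstable = is_byte_identical(make_cycling_judge(), payload)
print(stable, unstable)
```

The check treats the judge as a black box: it never asks whether a verdict is right, only whether a stranger re-running it would get the same bytes.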


Three real commit receipts from this repo. Same instrument, three readings — and every one is byte-identical if you recompute it. That's the opposite of a magic 8-ball you pay $400/hr to shake.
Rice's theorem: no program can decide, for every possible program, any non-trivial property of what that program computes. So you add a checker. Who checks the checker? Another checker. Forever. It's turtles, and you're paying GPU rent on every turtle. You can't fine-tune your way out of a theorem. You change layers, to one that isn't software grading its own homework.
We don't beat Rice. Nobody beats Rice.
We do the one thing the theorem never forbade: move the question off software entirely — onto silicon, where a stranger recomputes the same answer to the same bytes. The theorem still stands, untouched — it's just standing on the incumbent's neck now, not ours. An eval is software judging software's behavior over every possible input (the undecidable thing). Ours is a physical layer underneath both, asking only "does this finished work sit where this spec authorized?" — a finite comparison of two fixed artifacts, not an infinite regress. Same theorem. Opposite side of it.
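The "finite comparison of two fixed artifacts" can be pictured as a bounded distance check between two cells on a 144×144 grid. The sketch below is an illustration under stated assumptions: the text does not say which metric or tolerance the gate uses, so the Chebyshev distance, the `tolerance` parameter, and the names `Coord`, `drift`, and `in_lane` are all hypothetical.

```python
from typing import NamedTuple

LATTICE_SIZE = 144  # from the text: a 144x144 lattice


class Coord(NamedTuple):
    row: int
    col: int


def drift(spec: Coord, work: Coord) -> int:
    """Chebyshev distance between two cells (an assumed metric, not the documented one)."""
    for c in (spec, work):
        if not (0 <= c.row < LATTICE_SIZE and 0 <= c.col < LATTICE_SIZE):
            raise ValueError(f"coordinate off the lattice: {c}")
    return max(abs(spec.row - work.row), abs(spec.col - work.col))


def in_lane(spec: Coord, work: Coord, tolerance: int = 0) -> bool:
    """True if the work landed within `tolerance` cells of where the spec authorized."""
    return drift(spec, work) <= tolerance


spec_cell = Coord(10, 20)
work_cell = Coord(13, 18)
print(drift(spec_cell, work_cell), in_lane(spec_cell, work_cell, tolerance=3))
```

Both inputs are fixed values and the function does a constant amount of arithmetic, so it always terminates. It answers a question about two finished artifacts, not about a program's behavior over all inputs.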
prove-rice.mjs · tests/pmu-simulator/competence-walk-is-real.test.mjs
Ask a model what a word means and it hands you more words. Ask what those mean: more words. To check a checker you need a checker; to ground a symbol you need another symbol already grounded. Software has no floor. That is Rice's staircase seen from below: not "who checks the checker" but "who defines the definer", and the honest answer, in language, is no one, ever.
A flat tool — cosine, embedding, an LLM judge — pretends to end the regress and only defers it: it grounds your text in another pile of ungrounded reference text and calls the nearest one a match. The staircase is still there. It just moved into whatever dictionary the tool was trained on.
We end it by refusing to define with words at all. On the ShortLex lattice a concept's meaning is its address: where it sits is what it is. And the definer-of-definer chain becomes a thing you can watch run: stand on a lit coordinate, walk its row, jump to the column that row lights, walk that row, recurse, each step resolving a meaning by the meaning it is defined in terms of. The staircase is real; the difference is this one has a bottom. The lattice is finite and acyclic, so the walk halts, and it halts on silicon, north of six million times a second. The regress with no floor in language has a floor in physics, and that floor is the placement.
docs/architecture/ballistic-walk-the-definer-of-definer.md
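Why a walk over a finite, acyclic structure must halt can be shown with a toy model. The sketch below is an assumption: it does not reproduce the row-and-column jump rule described above, only the general shape of resolving a cell through the cells it is defined in terms of. The names `walk` and `definitions` and the sample coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def walk(start: Coord, defined_by: Dict[Coord, List[Coord]]) -> List[Coord]:
    """Depth-first walk from `start` through the cells it is defined in terms of.

    Raises ValueError if a cell is reached again on its own path (a cycle),
    which is the one way such a walk could fail to halt.
    """
    visited_order: List[Coord] = []

    def visit(cell: Coord, path: Tuple[Coord, ...]) -> None:
        if cell in path:
            raise ValueError(f"cycle through {cell}")
        visited_order.append(cell)
        for dep in defined_by.get(cell, []):
            visit(dep, path + (cell,))

    visit(start, ())
    return visited_order


# Toy definitions: each cell lists the cells it is defined in terms of.
definitions = {
    (0, 0): [(0, 3), (2, 1)],
    (0, 3): [(3, 3)],
    (2, 1): [(3, 3)],
    (3, 3): [],  # a floor: defined in terms of nothing further
}
steps = walk((0, 0), definitions)
print(steps)
```

Every path through a finite acyclic structure ends at a cell with no further dependencies, so the recursion bottoms out. A definition chain that loops back on itself is the case the cycle check rejects.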
Each independent walk asks one small question (did the real work land where the intent did, more than chance would explain?) and answers with a number: σ, the separation of true alignment from noise. One walk is a whisper. But the walks are independent, so their evidence adds, and the sum is a divergent series: it climbs with no architectural ceiling (the patent's own phrase). Add walks, sharpen the lanes, and the certainty does not approach a limit; it keeps going, past the 5σ particle physics treats as a discovery, past 10σ, past 100σ, toward 600σ.
Now the honest half that makes it real instead of magic: a divergent series buys unbounded precision, never unbounded coverage. We become almost violently certain about the lanes we have carved, and we know nothing about the territory we have not. Perfect meaning-is-position (ρ→1) stays out of reach; we close the gap inside the covered lanes, never over all of meaning. Infinite sharpness on a finite map. That is not the hedge. It is the reason the map is worth insuring: a claim with an edge you can see is the only kind an underwriter can price.
scripts/pmu/walk-significance.mjs
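One standard way to make "independent evidence adds" concrete is Stouffer's method for combining independent z-scores. Whether scripts/pmu/walk-significance.mjs uses this rule is not stated in the text, so treat the following as an assumption; `combined_sigma` is a hypothetical name. Under this rule, n walks that each show the same z combine to z·√n, which grows without bound as walks are added and says nothing about lanes no walk covers.

```python
import math
from typing import Sequence


def combined_sigma(z_scores: Sequence[float]) -> float:
    """Stouffer's combined z for independent walks: sum(z_i) / sqrt(n)."""
    if not z_scores:
        raise ValueError("need at least one walk")
    return sum(z_scores) / math.sqrt(len(z_scores))


# Four independent walks at 2 sigma each combine to 4 sigma; a hundred combine to 20.
few = combined_sigma([2.0] * 4)
many = combined_sigma([2.0] * 100)
print(few, many)
```

The combined figure rises only while the walks are independent and each one carries real signal. Correlated walks, or walks at chance level, would not add up this way.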
Two tugboats lost their barges because they carried no radio to hear the storm warning. The whole industry carried no radios, so they pleaded custom. Judge Learned Hand, in The T.J. Hooper (2d Cir. 1932), was unmoved: "a whole calling may have unduly lagged in the adoption of new and available devices… there are precautions so imperative that even their universal disregard will not excuse their omission."
An LLM eval is the tugboat with no radio. The radio is now on the shelf. And you've seen the shelf.
gzip bridge → 144×144 lattice → ballistic PMU walk → ed25519 receipt

INTENT (what you commanded) · REALITY (what the agent did) · Δ (exactly where they disagree). this exact triptych rides on every commit we make. it's not a render — it's the receipt.
A spec is published. Work is submitted, signed. The gate runs on the chip and signs a verdict binding what was asked, what was delivered, and who delivered it. Either side recomputes it. No blockchain. No clearinghouse. No second model asked to bless the first.
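What it means for a verdict to bind "what was asked, what was delivered, and who delivered it" so that either side can recompute it can be sketched with plain hashing. This is a minimal illustration under assumptions: the field names, the canonical-JSON encoding, and `receipt_digest` are hypothetical, and the ed25519 signature named in the pipeline is left out because Python's standard library does not provide it. The sketch stops at the digest such a signature would cover.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def receipt_digest(spec: bytes, work: bytes, submitter: str, verdict: str) -> str:
    """Digest over a canonical encoding of spec, work, submitter, and verdict."""
    body = {
        "spec_sha256": sha256_hex(spec),
        "work_sha256": sha256_hex(work),
        "submitter": submitter,
        "verdict": verdict,
    }
    # Sorted keys and fixed separators keep the encoding byte-identical across machines.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return sha256_hex(canonical)


issued = receipt_digest(b"spec v1", b"work output", "node-b", "PASS")
recomputed = receipt_digest(b"spec v1", b"work output", "node-b", "PASS")
tampered = receipt_digest(b"spec v1", b"work output (edited)", "node-b", "PASS")
print(issued == recomputed, issued == tampered)
```

Anyone holding the same spec, work, and submitter identity derives the same digest without a clearinghouse. Changing any one of them changes the digest, which is what lets a mismatch be detected.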
.thetacog/pmu/target/release/pmu-onchip: the same daemon digest is stamped into every receipt.
npx thetacog-mcp attest-demo: Node A signs the spec, Node B signs the work, the chip gates it, an underwriter prices it, and an LLM judge (any onboard CLI on your PATH: Claude Code, Cursor, codex, ollama, …) renders a verdict that signs nothing a stranger can recompute. Measured 2026-06-19: a small model (llama3.2:1b) flips on the same spec; a large one (current Claude) holds. The flip is a class of error that rides on model capability, and the chip removes the whole class. (Or drive each pillar by hand: attest publish-reef / submit / gate / verify · scripts/pmu/attest-demo.mjs.)
A bullseye is one shot you hit or miss. This is the other thing: every one of the 144×144 cells is a competence coordinate, lit by where a commit's reality actually fired against where its intent said it should. You don't score a point; you get a picture of where you are as what you meant. Green fired in-lane · amber a few zones out · red too far to absorb. The magenta crosshair is your one located pixel.
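The green, amber, and red reading can be pictured as banding each firing by how far reality landed from intent. The thresholds are not given in the text, so `AMBER_MAX_ZONES`, the Chebyshev distance used as "zones out", and the names `band` and `summarize` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

Coord = Tuple[int, int]

AMBER_MAX_ZONES = 3  # hypothetical cutoff for "a few zones out"


def band(intent: Coord, reality: Coord) -> str:
    """Colour one firing by its distance from the intended cell."""
    zones_out = max(abs(intent[0] - reality[0]), abs(intent[1] - reality[1]))
    if zones_out == 0:
        return "green"  # fired in-lane
    if zones_out <= AMBER_MAX_ZONES:
        return "amber"  # a few zones out
    return "red"        # too far to absorb


def summarize(firings: Iterable[Tuple[Coord, Coord]]) -> Counter:
    """Count how many firings fall in each band."""
    return Counter(band(intent, reality) for intent, reality in firings)


firings = [
    ((10, 10), (10, 10)),
    ((10, 10), (12, 11)),
    ((10, 10), (40, 90)),
    ((50, 50), (50, 50)),
]
counts = summarize(firings)
print(dict(counts))
```

A commit where nothing fires out of lane produces only green counts in this toy version, matching the quiet field described for a tight commit.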


both rode on real commits in this repo — the filename is the sha. left: the work fired where the intent declared it. right: the red gathering in the corner is reality firing where the intent was weak — drift you can see before it ships, not in a post-mortem after.
This is what arrives on every commit we make — not a dashboard someone curated, an artifact the chip emitted. And when a commit is tight and nothing fires out of lane, the field is silent — near-black, just the crosshair. An instrument that stays quiet when nothing happened is the only one you trust when it finally lights up.
The strength isn't one heroic claim — it's ten independent ones. Tap any to see exactly how the Rust runner proves it. We'll wait.
prove-rice.mjs STRONG
chip-cloud-lattice-golden.test.mjs:64 STRONG
forge-test.test.mjs:51 STRONG*
npx thetacog-mcp attest verify STRONG
prove-rice.mjs STRONG
data/pmu/reef/reef-144.json STRONG
.thetacog/sigma-panel-trend.ndjson MEASURED
.thetacog/reef-interventions.ndjson MEASURED
attest publish-reef STRONG
scripts/pmu/attest.mjs STRONG
* T1 holds under host-key pinning; the hardware-attested key (Secure Enclave / tape-out) is what makes pinning unforgeable. We state the bound out loud.
"We deployed a secondary AI to monitor the primary AI.
Then we asked a third one to summarize their argument."
Your agent just made a $10M unrecoverable error. Plaintiff's counsel is holding a physical memory receipt; you are holding a probabilistic text output — and the only other thing on the defense table is a goldfish bowl. There is no second sentence.
Every competitor's deck hides its bounds. We lead with ours — because the fence is what makes the signal underwritable.
Way 1 — run it yourself. Install nothing; it runs straight from npm.
Way 2 — let the AI you already trust be the witness.
Your agent reads the install output into its own context. It tries to confirm the placement by reasoning — and catches itself, because Rice applies to it too (it's a chatbot; it can't grade this any more than the eval-chatbots could). So it does the only honest thing left: it recomputes the receipt (prove-rice --check → exit 0), then turns to you and explains, in its own words, that you're standing in a T.J. Hooper moment — the radio is on the shelf, and you've now seen it. The defense's own expert just took the stand for the plaintiff. The goldfish filed an amicus brief against the goldfish bowl. We didn't argue it into agreeing — the math did, on your machine, in the tool you already use. The skeptic you brought becomes the witness.
Run it on your own outputs. If your AI governance can't survive a stranger — or your own agent — re-running the math on their laptop, it was never governance. It was a vibe with a logo.
Get the package → · Read it again ↑
so… are you out of your pixel? (it's okay. yesterday, everyone was.)