The radio is a free install. npx thetacog-mcp attest-demo
decidable AI verification · semantic, on silicon

Are you out of your pixel?

You're using a chatbot with the object permanence of a goldfish to govern an autonomous agent that holds your balance sheet. There is now a decidable, hardware-signed alternative. It's free. We do something narrower, and far stronger, and you never have to trust us to believe it. That's the entire point.

↑ a verdict being walked across the 144×144 lattice. it does this the same way every single time. your eval could never.

Show me the radio ↓
the part you can't un-read

How do you know the radio exists?

Because for the first time in computing history, semantics run on the chip — north of 6 million times a second, on silicon, with no model in the loop. Not all semantics: the decidable kind — the part where meaning has an address (where-is-what), so resolving it halts at chip speed instead of regressing forever. The spec isn't a string we checksum — it's compiled into the same vocabulary the work speaks (named coordinates on a curated lexicon), so "did the work drift from what was asked" collapses to a distance between two positions on one lattice, not a verdict a model renders. The work product of one function lands at an exact coordinate on a 144×144 lattice, and whether it drifted from the coordinate the spec authorized is a recomputable, signed physical event — byte-for-byte identical on your machine and ours — not a second chatbot's opinion. "Does this output sit where the spec said it should?" used to be answerable only by another fallible model. Now it's measured in hardware. Semantic-on-chip wasn't a better tool. It did not exist — and the instant it does, every eval that flips between runs and signs nothing is, by law, below the standard of care.

blurry · LLM eval · "looks good 👍"
crisp lattice · the oracle · 144×144, exact
the objection a sharp reviewer makes in thirty seconds

"Wait — isn't 'does it mean what the spec asked' the exact non-trivial semantic property Rice's theorem says is undecidable? You can't wield Rice to bury the eval and then sell the thing Rice forbids."

  • Right question — it's the one that kills every imitator. We do run semantics on silicon, millions of times a second — just not all semantics. Rice forbids deciding a semantic property of a program's behavior across its infinite input space; we never touch that. We compute the decidable part: place two fixed, finite artifacts — this submitted work, this published spec — against a compiled lattice. A property of fixed inputs is finite and decidable; the theorem has nothing to say about it. And the comparison needs no interpreter, because the spec is semantic and shares the work's vocabulary: both the spec (from its prose) and the work (its comments + identifier-words, prose extracted from code) are projected onto the same 144 named coordinates by the same compression witness (gzip-NCD), then XOR'd. Drift is the gap between two positions in one shared vocabulary — a difference you compute, never a meaning anyone has to judge. (Rice is the bouncer for an infinite nightclub. We're checking two IDs at the door of a room that seats exactly 144 — and both IDs are printed in the same alphabet.)
  • The mathematical reason it can't reach us: ShortRank lives below the Turing line. Rice's theorem only bites on Turing-complete computation: arbitrary programs, unbounded tape, the halting problem in play. The patent's compositional rank-based address function is not that. It's a finite, total, terminating operation: a fixed 144×144 lattice and a deterministic acyclic walk that halts by construction, with bounded ply, contained blast radius, no unbounded loop, no halting question. A total computation over a finite lattice and fixed inputs can always be run to completion and inspected, so every property of it is decidable; undecidability simply evaporates. We didn't out-argue Rice; we chose a substrate it was never about, and that sub-Turing substrate is the claim. (The undecidable monster needs an infinite playground. We handed it a 144×144 sandbox with a fence and a bedtime.)
  • So we don't exempt ourselves from Rice — we stand outside its domain. We never claim to decide whether your output is good (that is undecidable; it stays with you and your underwriter). We decide only where the work landed and whether that's inside the lane the spec authorized — and we sign the bytes.
  • Meaning gets a decidable placement; worth stays a human judgment. Same line we draw out loud at the honest fence below — no overclaim, nothing hidden. No tool that calls a second LLM to grade the first can say it.
  • And the move is filed. The compositional rank-based address function that turns meaning into a recomputable coordinate is the claimed mechanism of Patent US 19/637,714 — 36 claims, Track One examination, filed April 2, 2026. The sidestep isn't a clever framing you can copy in a weekend; it's prosecuted IP. Playing dumb about the distinction doesn't make it go away — it means you read the claims late.
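A minimal Python sketch of the projection-and-XOR step the bullets above describe, under loud assumptions: the three anchors, their wording, and the 0.9 threshold are hypothetical stand-ins (the real lattice has 144 named coordinates), and zlib's DEFLATE stands in for the gzip witness. It is not the shipped Rust runner; it only illustrates why the comparison is a total, terminating computation over two fixed strings.

```python
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes (DEFLATE via zlib, standing in for gzip)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy, cxy = csize(bx), csize(by), csize(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# HYPOTHETICAL anchors: three made-up coordinates, not the curated lexicon.
ANCHORS = {
    "A1 Strategy.Law": "law legal compliance regulation contract liability statute",
    "B2 Ops.Deploy": "deploy release build pipeline server rollout container",
    "C3 Data.Storage": "database table index query storage schema migration",
}
THRESHOLD = 0.9  # HYPOTHETICAL cut-off for "this coordinate lights"


def project(text: str) -> tuple:
    """One bit per anchor: 1 if the text compresses well against that anchor."""
    return tuple(1 if ncd(text, anchor) < THRESHOLD else 0
                 for anchor in ANCHORS.values())


def drift(spec: str, work: str) -> tuple:
    """XOR of the two projections: 1 marks a coordinate where they disagree."""
    return tuple(a ^ b for a, b in zip(project(spec), project(work)))


spec_text = "add a database index so the query on the storage table is fast"
work_text = "rewrote the release pipeline and the server rollout script"
print(drift(spec_text, work_text))
```

Every step is a fixed-length loop over fixed inputs, so the function always halts and returns the same tuple for the same two strings. Which coordinates actually light depends on the anchors and threshold chosen, which is why both are marked hypothetical here.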

↓ not a metaphor — every commit tolerance panel below is a real commit from this repo, each a decidable semantic placement that recomputes byte-for-byte. that's the six-million-a-second, made visible.

Yesterday, not knowing was an excuse.
You just read this. Welcome to the information hazard.

we're genuinely a little sorry. you are now fully conscious that your current tech stack is uninsurable.

the current standard of care (lol)

How the whole industry verifies AI today

It asks a second chatbot if the first one did okay.

Same input, three runs: PASS · FAIL · PASS. That's not a control framework. It's a magic 8-ball with a system prompt — and Brenda from Risk Advisory bills $400/hr to shake it for you. Stochastic governance, dressed up in a slide deck with your logo on it. And you're the one who signs it.

how do we know it actually flips?
  • npx thetacog-mcp prove-rice sweeps a payload across the lane boundary: the LLM judge flips on 3 of 6 payloads (no two runs agree near the edge); the on-chip Oracle is 5/5 byte-identical at every step.
  • Source: scripts/pmu/prove-rice.mjs — run it, watch it happen.
Tolerance panel for a clean commit: an empty 144-lattice with only the magenta crosshair locating the pixel.
clean commit · nothing lights, the pixel still located
Tolerance panel in-lane: green and amber cells scattered across the lattice, intent and reality agreeing.
in-lane · intent and reality agree
Tolerance panel with drift: a red cluster in the lower band, located by the crosshair.
drift · caught and located, not a coin flip

Three real commit receipts from this repo. Same instrument, three readings — and every one is byte-identical if you recompute it. That's the opposite of a magic 8-ball you pay $400/hr to shake.

it's not a tuning problem

Your eval isn't broken.
It's mathematically impossible.

Rice's theorem: no program can decide a non-trivial property of what arbitrary programs do. So who checks the checker? Another checker. Forever. It's turtles, and you're paying GPU rent on every turtle. You can't fine-tune your way out of a theorem; you change layers, to one that isn't software grading its own homework.

We don't beat Rice. Nobody beats Rice.

We do the one thing the theorem never forbade: move the question off software entirely — onto silicon, where a stranger recomputes the same answer to the same bytes. The theorem still stands, untouched — it's just standing on the incumbent's neck now, not ours. An eval is software judging software's behavior over every possible input (the undecidable thing). Ours is a physical layer underneath both, asking only "does this finished work sit where this spec authorized?" — a finite comparison of two fixed artifacts, not an infinite regress. Same theorem. Opposite side of it.

how do we know it's the theorem, not just our bad day?
  • One command shows both halves: the LLM judge drifts (software watching software), the on-chip walk holds (a layer beneath it). prove-rice.mjs.
  • The walk is the real recursive on-chip lattice walk — guarded against any analytic shortcut by tests/pmu-simulator/competence-walk-is-real.test.mjs.
who defines the definer? — the regress we end, instead of defer

Ask a model what a word means and it hands you more words. Ask what those mean — more words. To check a checker you need a checker; to ground a symbol you need another symbol already grounded. Software has no floor. That is Rice's staircase seen from below: not "who checks the checker" but "who defines the definer" — and the honest answer, in language, is no one, ever.

A flat tool — cosine, embedding, an LLM judge — pretends to end the regress and only defers it: it grounds your text in another pile of ungrounded reference text and calls the nearest one a match. The staircase is still there. It just moved into whatever dictionary the tool was trained on.

We end it by refusing to define with words at all. On the ShortLex lattice a concept's meaning is its address: where it sits is what it is. And the definer-of-definer chain becomes a thing you can watch run: stand on a lit coordinate, walk its row, jump to the column that row lights, walk that row, recurse, each step resolving a meaning by the meaning it is defined in terms of. The staircase is real; the difference is this one has a bottom. The lattice is finite and acyclic, so the walk halts, and it halts on silicon, north of six million times a second. The regress with no floor in language has a floor in physics, and that floor is the placement. docs/architecture/ballistic-walk-the-definer-of-definer.md
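The walk described above can be sketched in a few lines of Python. This is an illustration only: the toy lattice, the `MAX_PLY` bound, and the function name are hypothetical, and the real walk runs in the Rust runner over the 144×144 lattice. The point it shows is narrow: with a finite set of coordinates, a ply bound, and a visited list, the recursion cannot run forever.

```python
from typing import Dict, List, Optional

# HYPOTHETICAL toy lattice: each lit coordinate maps to the columns its row lights.
LATTICE: Dict[int, List[int]] = {
    0: [2, 5],
    2: [7],
    5: [7, 9],
    7: [11],
    9: [],
    11: [],
}
MAX_PLY = 4  # HYPOTHETICAL bound on how many jumps deep the walk may go


def walk(coord: int,
         lattice: Dict[int, List[int]] = LATTICE,
         max_ply: int = MAX_PLY,
         _seen: Optional[List[int]] = None) -> List[int]:
    """Visit coord, then recurse into every column its row lights.

    Termination: each call either hits a coordinate already seen, or adds a
    new one to a finite set, and the remaining ply shrinks on every jump.
    """
    seen = [] if _seen is None else _seen
    if coord in seen:
        return seen
    seen.append(coord)
    if max_ply > 0:
        for nxt in sorted(lattice.get(coord, [])):
            walk(nxt, lattice, max_ply - 1, seen)
    return seen


print(walk(0))
```

Even if someone hands the function a lattice with a cycle in it, the visited list and the ply bound still force it to stop, which is the property the paragraph above leans on.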

the divergent series — why the precision has no ceiling (and the coverage does)

Each independent walk asks one small question — did the real work land where the intent did, more than chance would explain? — and answers with a number: σ, the separation of true alignment from noise. One walk is a whisper. But the walks are independent, so their evidence adds, and the sum is a divergent series — it climbs with no architectural ceiling (the patent's own phrase). Add walks, sharpen the lanes, and the certainty does not approach a limit; it keeps going — past 10σ, past 100, toward the 600σ a physicist would call decided.

Now the honest half that makes it real instead of magic: a divergent series buys unbounded precision, never unbounded coverage. We become almost violently certain about the lanes we have carved — and we know nothing about the territory we have not. Perfect meaning-is-position (ρ→1) stays out of reach; we close the gap inside the covered lanes, never over all of meaning. Infinite sharpness on a finite map. That is not the hedge — it is the reason the map is worth insuring: a claim with an edge you can see is the only kind an underwriter can price. scripts/pmu/walk-significance.mjs
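One textbook way to see how independent evidence keeps climbing is Stouffer's method for combining z-scores; whether the project's own script uses it is an assumption, and both function names below are hypothetical. The second helper shows the standard deflation for correlated measurements, the kind of correction that replaces a naive √N.

```python
import math
from typing import Sequence


def combine_independent(z_scores: Sequence[float]) -> float:
    """Stouffer's method: combined z of n independent z-scores.

    For n equal scores z this is z * sqrt(n), which grows without bound in n.
    """
    return sum(z_scores) / math.sqrt(len(z_scores))


def effective_count(n: int, mean_corr: float) -> float:
    """Effective number of independent walks when the n walks share an
    average pairwise correlation mean_corr (0 = independent, 1 = identical)."""
    return n / (1 + (n - 1) * mean_corr)


print(combine_independent([2.0] * 4))
print(effective_count(100, 0.05))
```

With zero correlation the effective count equals the raw count and the combined score grows like √N. With any fixed positive correlation the effective count levels off near 1 divided by that correlation, which is why the unbounded climb rests on the walks being genuinely independent.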

the t.j. hooper · 1932 · this is the whole legal play

The standard is not care.
It's what's available.

Two tugboats lost their barges because they carried no radio to hear the storm warning. The whole industry carried no radios — so they pleaded custom. Judge Learned Hand was unmoved: "a whole calling may have unduly lagged in the adoption of new and available devices… there are precautions so imperative that even their universal disregard will not excuse their omission."

An LLM eval is the tugboat with no radio. The radio is now on the shelf. And you've seen the shelf.

the available device

Here's the actual radio

gzip bridge → 144×144 lattice → ballistic PMU walk → ed25519 receipt

intent / reality / delta-xor triptych

INTENT (what you commanded) · REALITY (what the agent did) · Δ (exactly where they disagree). this exact triptych rides on every commit we make. it's not a render — it's the receipt.

A spec is published. Work is submitted, signed. The gate runs on the chip and signs a verdict binding what was asked, what was delivered, and who delivered it. Either side recomputes it. No blockchain. No clearinghouse. No second model asked to bless the first.
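A hedged sketch of what "either side recomputes it" can look like in code. The field names and both functions are hypothetical, and a SHA-256 seal stands in for the ed25519 signature because the Python standard library ships no ed25519; the on-chip gate that produces the verdict is out of scope here and is simply passed in as a string.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_receipt(spec: str, work: str, submitter_key_id: str, verdict: str) -> dict:
    """Bind what was asked, what was delivered, and who delivered it."""
    body = {
        "spec_sha256": sha256_hex(spec.encode("utf-8")),
        "work_sha256": sha256_hex(work.encode("utf-8")),
        "submitter": submitter_key_id,
        "verdict": verdict,  # in the real flow this comes from the gate
    }
    # Canonical form: sorted keys, no whitespace, so every party hashes
    # exactly the same bytes.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return {"body": body, "seal": sha256_hex(canonical.encode("utf-8"))}


def verify_receipt(receipt: dict, spec: str, work: str) -> bool:
    """Rebuild the receipt from the artifacts in hand and compare."""
    body = receipt["body"]
    expected = build_receipt(spec, work, body["submitter"], body["verdict"])
    return expected == receipt


receipt = build_receipt("spec text", "work text", "node-b-key", "in-lane")
print(verify_receipt(receipt, "spec text", "work text"))
```

The canonical serialization is the design choice that matters: without a fixed byte layout, two honest parties could hash the same fields and still disagree. A hash seal only detects tampering; binding the receipt to a key holder needs a real signature, which this sketch deliberately leaves out.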

how do we know it's shipped code, not a pitch deck?
  • It drives the real Rust runner at .thetacog/pmu/target/release/pmu-onchip — the same daemon digest is stamped into every receipt.
  • Run the whole transaction in one command: npx thetacog-mcp attest-demo — Node A signs the spec, Node B signs the work, the chip gates it, an underwriter prices it, and an LLM judge (any onboard CLI on your PATH — Claude Code, Cursor, codex, ollama, …) renders a verdict that signs nothing a stranger can recompute. Measured 2026-06-19: a small model (llama3.2:1b) flips on the same spec; a large one (current Claude) holds — the flip is a class of error that rides on model capability, and the chip removes the whole class. (Or drive each pillar by hand: attest publish-reef / submit / gate / verify · scripts/pmu/attest-demo.mjs.)
read a real one · green is in-lane, red is out of your pixel

It's not a bullseye.
It's a tolerance field.

A bullseye is one shot you hit or miss. This is the other thing: every one of the 144×144 cells is a competence coordinate, lit by where a commit's reality actually fired against where its intent said it should. You don't score a point; you get a picture of where you are against what you meant. Green fired in-lane · amber a few zones out · red too far to absorb. The magenta crosshair is your one located pixel.
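To make the three colors concrete, here is a small Python sketch of one way a cell could be classified. Everything numeric in it is an assumption: the page does not say how many zones count as "a few", so `AMBER_MAX_ZONES` is hypothetical, as are the choice of one zone per cell and the use of grid (Chebyshev) distance.

```python
from typing import Tuple

Cell = Tuple[int, int]  # (row, column) on the lattice

IN_LANE_ZONES = 0    # HYPOTHETICAL: reality fired exactly where intent pointed
AMBER_MAX_ZONES = 3  # HYPOTHETICAL reading of "a few zones out"


def zones_apart(intent: Cell, reality: Cell) -> int:
    """Grid distance between where intent pointed and where reality fired."""
    return max(abs(intent[0] - reality[0]), abs(intent[1] - reality[1]))


def classify(intent: Cell, reality: Cell) -> str:
    """Map a distance to the page's three colors."""
    d = zones_apart(intent, reality)
    if d <= IN_LANE_ZONES:
        return "green"
    if d <= AMBER_MAX_ZONES:
        return "amber"
    return "red"


for fired in [(10, 10), (12, 11), (40, 90)]:
    print(fired, classify((10, 10), fired))
```

A field built this way is quiet by construction: a commit with nothing out of lane produces no amber or red cells at all.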

a commit whose work landed in-lane — green and amber, no red gathering
in your pixel · green + amber, no red
a commit with reality firing where intent was weak — red gathering in the corner
out of your pixel · red in the corner

both rode on real commits in this repo — the filename is the sha. left: the work fired where the intent declared it. right: the red gathering in the corner is reality firing where the intent was weak — drift you can see before it ships, not in a post-mortem after.

This is what arrives on every commit we make — not a dashboard someone curated, an artifact the chip emitted. And when a commit is tight and nothing fires out of lane, the field is silent — near-black, just the crosshair. An instrument that stays quiet when nothing happened is the only one you trust when it finally lights up.

ten orthogonal proofs · knock one down, nine stand

Every one runs on your laptop

The strength isn't one heroic claim — it's ten independent ones. Tap any to see exactly how the Rust runner proves it. We'll wait.

01 · DETERMINISM
Same input, same answer, forever
5 gate runs on one payload → one verdict, one σ (18.254248), identical to 6 decimals. The variance that defines an eval is the defect Hand wouldn't excuse. prove-rice.mjs STRONG
02 · THE WELD
Chip = cloud, byte for byte
Max |Δweight| = exactly 0 across 1000+ cells over 144 anchors, on two random seeds. Not "within tolerance." Zero. chip-cloud-lattice-golden.test.mjs:64 STRONG
03 · FORGERY
Seven attacks, zero accepted
Tamper-body, graft-sig, foreign-key forge, missing-sig, nibble-flip, null — all rejected under host-key pinning. Flip one field → caught twice (seal breaks AND re-walk disagrees). forge-test.test.mjs:51 STRONG*
04 · RECOMPUTABLE
A stranger reproduces it
Re-run the gate on your own hardware → same verdict, same σ, trusting nothing about us. The only vendor whose pitch is "please, recompute us." npx thetacog-mcp attest verify STRONG
05 · RICE
The incumbent is impossible
No program can decide a non-trivial property of what arbitrary programs do, but that limit is about Turing-complete software. ShortRank lives below the Turing line (finite lattice, walk halts by construction), so Rice never RSVPs. We show the judge drift and the silicon hold, side by side, in one run. prove-rice.mjs STRONG
06 · MEANING = PHYSICS
S ≡ P ≡ H
Physical distance on the lattice tracks semantic distance at ρ = 0.7671 (clears the 0.70 target). We quote the real number, not the unreachable ideal of 1.0. data/pmu/reef/reef-144.json STRONG
07 · ON SILICON
Signal measured, not modeled
Signal-to-noise on the real Rust daemon sits at ~10.6σ panel / 14.3σ product — deflated honestly for fragment correlation, not inflated by a naive √N. .thetacog/sigma-panel-trend.ndjson MEASURED
08 · SELF-SHARPENING
It improves overnight
An unattended loop proposed, measured, and accepted 7 orthogonalization interventions — and rejected the ones that didn't earn it. A measurement that sharpens itself. .thetacog/reef-interventions.ndjson MEASURED
09 · LEGIBLE
A non-engineer can read it
Spec stated in plain English AND the glossed lattice (🏛️ A · Strategy — long-term direction · ⚖️ A1 · Strategy.Law …). A standard the risk officer can't read isn't one they can be held to. attest publish-reef STRONG
10 · THE TRANSACTION
Two parties who don't trust each other
One publishes a spec; the other submits work signed with their own key; the gate signs a verdict both recompute. The whole product — and it sells today, no future market required. scripts/pmu/attest.mjs STRONG

* T1 holds under host-key pinning; the hardware-attested key (Secure Enclave / tape-out) is what makes pinning unforgeable. We state the bound out loud.

tomorrow morning, in a deposition

"What was your verification protocol?"

"We deployed a secondary AI to monitor the primary AI.
Then we asked a third one to summarize their argument."

Your agent just made a $10M unrecoverable error. Plaintiff's counsel is holding a physical memory receipt; you are holding a probabilistic text output — and the only other thing on the defense table is a goldfish bowl. There is no second sentence.

the honest fence (yes, we brag about our limits)

What we don't claim

Every competitor's deck hides its bounds. We lead with ours — because the fence is what makes the signal underwritable.

the four lines we will not cross
  • We decide placement, not worth. We run the decidable semantics — where your work landed against the spec, signed on the chip, recomputable by anyone. We do not run the undecidable kind — whether the outcome was good in the world. That call stays with you and your underwriter (exactly as §2 says — no contradiction; we just liked it enough to say it twice). For code and artifacts the payload is the deliverable, so placement-in-lane is "spec met." Where the chip never saw the world, we don't pretend it did — that's the one thing the goldfish and we have in common.
  • We measure WHERE it drifted, not WHETHER the meaning held. On the shared vocabulary the chip locates exactly where work and spec diverge — it does not certify that a clever paraphrase preserved the intent. We measured the bound and we'll say the number out loud: a meaning-preserving synonym and a domain-breaking foreign term register as nearly the same-size change (ratio 0.94). The grounding is honest, not magic — physical distance tracks semantic distance at ρ = 0.767, not the unreachable 1.0. WHERE is decidable and ours; WHETHER-the-meaning-survived-a-rewrite is judgment, and we don't fake it.
  • We don't sell you a price yet. A price without calibration is a guess, and a guess sold as a fact is the thing that broke 2008. We earn calibration by running attestations first.
  • We don't quote a throughput number we haven't benchmarked. The walk is fast; you'll get the bench, not a billboard.
recompute the truth · two ways to see it

Don't take our word.
Take the exit code.

Way 1 — run it yourself. Install nothing; it runs straight from npm.

# the chip half — decidable, reproduces byte-for-byte:
$ npx thetacog-mcp prove-rice --check # ← exit 0 = verdict + σ reproduced, identical. a stranger re-runs it offline, same bytes.
$ npx thetacog-mcp attest-demo # the Node A↔B transaction: spec→signed work→on-chip gate→priced. an LLM asked the same thing signs nothing recomputable; the chip holds, byte-for-byte.
$ npx thetacog-mcp hooper # the 7/7 forgery ledger, live
$ npx thetacog-mcp attest verify … # reproduce a verdict, trusting no one

Way 2 — let the AI you already trust be the witness.

# drop it into Claude Code, Cursor, Aider — any LLM CLI (Claude's easiest):
$ npx thetacog-mcp

Your agent reads the install output into its own context. It tries to confirm the placement by reasoning — and catches itself, because Rice applies to it too (it's a chatbot; it can't grade this any more than the eval-chatbots could). So it does the only honest thing left: it recomputes the receipt (prove-rice --check → exit 0), then turns to you and explains, in its own words, that you're standing in a T.J. Hooper moment — the radio is on the shelf, and you've now seen it. The defense's own expert just took the stand for the plaintiff. The goldfish filed an amicus brief against the goldfish bowl. We didn't argue it into agreeing — the math did, on your machine, in the tool you already use. The skeptic you brought becomes the witness.

Run it on your own outputs. If your AI governance can't survive a stranger — or your own agent — re-running the math on their laptop, it was never governance. It was a vibe with a logo.

Get the package → Read it again ↑

so… are you out of your pixel? (it's okay. yesterday, everyone was.)