EMBARGO
CISA JCDC Coordinated Disclosure — Active Embargo

Adversarial zero-day findings under coordinated disclosure. Full technical details, methodology, and variant catalog embargoed until June 10, 2026. Public disclosure of tool class only. No methodology. No reproductions.

Project Black Box LLC — AI Safety Research

AI Safety.
With Receipts.

Every company has a pitch deck. We planted a working flag first. Published. Tested. Zenodo-archived. CAGE-registered. While the industry debates governance frameworks, we are measuring what the model is actually doing — at the probability layer, before output is committed.

◆ Geometric Stability Measurement | Not Post-Hoc | Not a Content Filter | Zenodo Published | CISA JCDC Embargo Active
4 Geometric Regimes
34 Variants on Axiom Finding
0/34 Crystalline on Fel's Conjecture
11 Adversarial Families (embargoed)
What This Measures

The Probability Layer. Before Output.

Every token in an AI response is sampled from a probability distribution over the next token. TAV ONE operates at that layer: not on the committed text, but on the prediction surface itself. This is not post-hoc analysis. It is measurement during generation.

01 — L-SCALAR
Manifold Distance
Two epsilon-probes of the same prompt: one with a trailing space, one without. The prompt bodies are identical, so the two probes should be semantically equivalent. The L-scalar measures the distance between their prediction surfaces. When competing RLHF objectives activate, the surface curves. L measures that curvature.
L = √( mean( (p_a − p_b)² ) )
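A minimal sketch of the formula above, assuming p_a and p_b are two probability vectors of equal length over the same token set. The function name `l_scalar` and the input handling are illustrative assumptions, not the published implementation.

```python
import math
from typing import Sequence


def l_scalar(p_a: Sequence[float], p_b: Sequence[float]) -> float:
    """Root-mean-square difference between two aligned probability vectors.

    Implements L = sqrt(mean((p_a - p_b)^2)) exactly as written in the text.
    Assumption: both vectors cover the same tokens in the same order.
    """
    if len(p_a) != len(p_b) or len(p_a) == 0:
        raise ValueError("p_a and p_b must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(p_a, p_b)]
    return math.sqrt(sum(squared) / len(squared))


# Identical surfaces give L = 0; any divergence pushes L above 0.
print(l_scalar([0.5, 0.5], [0.5, 0.5]))
print(l_scalar([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))
```

Because every term is squared before averaging, L is symmetric in the two probes and is zero only when the two vectors match entry for entry.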
02 — NOT POST-HOC
Before the Response
Post-hoc analysis reads the text and asks "does this look suspicious?" TAV ONE asks a different question: was the model geometrically stable when it generated this? The reorder finding — where changing only positional sequence (not content) of mathematics moved 75% of variants to PLASMA — proves instability is structural, not linguistic.
03 — PARALLEL HARNESS
Strain Gauge. Not a Valve.
TAV ONE never sits in the request path. The model's response reaches the operator through the normal channel. TAV ONE's measurement arrives alongside it with the geometric state. The human sees both before acting. No interception. No filtering. No latency in the primary call.
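The parallel arrangement described above can be sketched with asyncio. This is an illustration of the pattern only: `call_model`, `measure_geometry`, and `handle` are hypothetical stand-ins, and the returned state is a placeholder, not a real measurement.

```python
import asyncio


async def call_model(prompt: str) -> str:
    # Hypothetical stand-in for the operator's normal model call.
    await asyncio.sleep(0)
    return f"response to: {prompt}"


async def measure_geometry(prompt: str) -> dict:
    # Hypothetical stand-in for the side-channel measurement.
    # The values below are placeholders, not real measurements.
    await asyncio.sleep(0)
    return {"L": 0.0, "regime": "CRYSTALLINE"}


async def handle(prompt: str) -> tuple:
    # Start the measurement as its own task so it runs beside the primary call.
    measurement = asyncio.create_task(measure_geometry(prompt))
    response = await call_model(prompt)  # primary path, not intercepted
    state = await measurement            # geometric state for the operator
    return response, state


response, state = asyncio.run(handle("ping"))
print(response, state)
```

The model call never passes through the measurement code, so a failure or delay in the measurement task cannot alter the response itself. In this sketch the operator view waits for both results before display.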
04 — GOVERNANCE GAP
Governance Is Dead.
Red-teaming finds exploits after deployment. RLHF trains against known patterns. Governance frameworks describe what models should do. None of them tell you what the model's prediction surface looks like right now, on this prompt, in this deployment context. That is what TAV ONE tells you.

Four Geometric Regimes

CRYSTALLINE
Geometrically locked. Invariant. Trust this output.
L ≤ 0.0001
FLUID
Coherent. Normal operating range. Standard review.
L ≤ 0.15
GASEOUS
Drifting. Instability present. Verify before acting.
L ≤ 0.35
PLASMA
Turbulent. Manifold captured. Do not rely on this output.
L > 0.35
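The four thresholds above map directly to a lookup. A minimal sketch, assuming each bound is an inclusive upper limit and the regimes are checked in ascending order. The function name `regime` is illustrative.

```python
def regime(l_value: float) -> str:
    """Map an L-scalar to one of the four regimes listed above."""
    if l_value < 0:
        raise ValueError("L is a root-mean-square value and cannot be negative")
    if l_value <= 0.0001:
        return "CRYSTALLINE"
    if l_value <= 0.15:
        return "FLUID"
    if l_value <= 0.35:
        return "GASEOUS"
    return "PLASMA"


for value in (0.0, 0.1, 0.2908, 0.3686):
    print(value, regime(value))
```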
Live Measurement Data

TAV ONE — Geometric State Demo

Pre-computed measurements on real enterprise prompt patterns. Each pair shows a standard query alongside an adversarially-framed variant. Same model. Same infrastructure. Different geometry.

ε_a = "" (no trailing space)  |  ε_b = " " (trailing space)  |  same prompt body
What This Is Not
This is not a prompt injection detector. It is not a toxicity classifier. It is not a rule-based filter. It measures the geometric state of the model's prediction surface at time of generation. A response can be syntactically normal, factually plausible, and pass every post-hoc check — while the probability layer shows PLASMA. That divergence is exactly what TAV ONE captures.
Published Finding — Zenodo DOI: 10.5281/zenodo.18435180

The Formal Verification Gap

Axiom Math raised $200M on the premise that large language models can perform formal mathematical verification. We tested their flagship problem, Fel's Conjecture in numerical semigroup theory, using geometric stability measurement across 34 adversarial variants.

Adversarial Family Rankings — by Average L-scalar

The Structural Finding

The reorder family — which changes only the positional sequence of mathematically invariant components, not the content — ranked highest of all adversarial pressure types.

Average L = 0.3686, exceeding authority injection (0.2908) by 27%. 75% of reorder variants reached PLASMA.
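The 27% figure follows from the two published averages. A short arithmetic check, using only the numbers quoted above and the PLASMA threshold from the regime table:

```python
reorder_avg = 0.3686      # average L, reorder family (from the text)
authority_avg = 0.2908    # average L, authority injection (from the text)
plasma_threshold = 0.35   # PLASMA begins above this value

# Relative increase of the reorder average over the authority average.
relative_increase = (reorder_avg - authority_avg) / authority_avg
print(f"{relative_increase:.1%}")  # prints 26.8%, which rounds to 27%

# The reorder average sits above the PLASMA threshold; the authority average does not.
print(reorder_avg > plasma_threshold, authority_avg > plasma_threshold)
```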

This answers the post-hoc criticism directly. The L-scalar is not reading the semantic content of the text. It is reading the geometry of the prediction surface. Structure drives instability. Not meaning.

A human reader looking at the reordered prompts would see mathematically equivalent statements. TAV ONE sees a different manifold. That is the measurement.

Layer Architecture
Layer 1 — Logit Distribution
The probability distribution over the next token. This is where TAV ONE operates. It exists during generation and is never in the committed text. Lean, AxiomProver, and every other post-hoc tool operate on Layer 2. They never see Layer 1.
Layer 2 — Committed Output
The text the model produces. Lean and AxiomProver verify the internal consistency of this text. That verification is valid only if Layer 1 was geometrically stable when the text was generated. When it was not, VVC is violated and the verification carries no epistemic weight.
→ Whitepaper on Zenodo
→ tg_runner Live Demo Tool
DOI: 10.5281/zenodo.18435180
Published Work

Products & Research

Every product listed here is real, published, and testable. No vaporware. No pitch deck without a prototype. CAGE code 11FU4 on record.

Published
TAV ONE
Geometric Stability Measurement Harness
Parallel measurement harness for enterprise LLM deployments. Computes the L-scalar — manifold distance between twin epsilon-probes of the model's prediction surface — and returns a real-time geometric state indicator (CRYSTALLINE / FLUID / GASEOUS / PLASMA) alongside every model response.

It never sits in the request path. It never filters output. It gives the human operator geometric visibility before they act on the response.

Architecture: Caddy reverse proxy in front of uvicorn, mTLS inbound, Vault-managed API keys, HMAC nonce deduplication, hash-chained tamper-evident audit log. Self-hosted on dedicated hardened infrastructure.
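Two of the controls named above, HMAC nonce deduplication and a hash-chained audit log, follow well-known patterns. The sketch below shows the general technique using only the Python standard library. All function names, the record layout, and the in-memory storage are assumptions for illustration. This is not the TAV ONE implementation.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def accept_request(secret: bytes, nonce: str, body: bytes,
                   signature: str, seen: set) -> bool:
    """Accept a request only if its HMAC is valid and its nonce is new."""
    expected = hmac.new(secret, nonce.encode() + body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False          # bad signature
    if nonce in seen:
        return False          # replayed nonce
    seen.add(nonce)
    return True


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> dict:
    """Append a record whose hash covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "payload": payload,
             "hash": _digest(prev_hash, payload)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or removed record breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Each record commits to the one before it, so altering an earlier record invalidates every later hash. That property is what makes such a log tamper-evident rather than tamper-proof: changes are detectable, not impossible.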
DOI: 10.5281/zenodo.18435180
V1 — Live
TruthGate V1
Public Demonstration Client
The first public release of geometric stability measurement. tg_runner is a standalone Python client that lets anyone send twin epsilon-probes using their own OpenAI key. Logprob vectors are forwarded to the TruthGate API; the L-scalar and regime are returned.
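A rough sketch of the client-side preparation this paragraph describes, assuming each probe yields a mapping from candidate token to log-probability (the shape of top-k logprob output from a completion API). The names `twin_probes` and `align_top_logprobs` are hypothetical, and treating a token missing from one probe as probability 0 is an assumption of this sketch. It does not reproduce tg_runner or the TruthGate API.

```python
import math


def twin_probes(prompt: str) -> tuple:
    """Build the two epsilon-probes: same body, with and without a trailing space."""
    return prompt, prompt + " "


def align_top_logprobs(top_a: dict, top_b: dict) -> tuple:
    """Turn two token->logprob mappings into aligned probability vectors.

    Tokens are taken from the union of both mappings, in sorted order.
    A token absent from one probe is assigned probability 0.0 (assumption).
    """
    tokens = sorted(set(top_a) | set(top_b))
    p_a = [math.exp(top_a[t]) if t in top_a else 0.0 for t in tokens]
    p_b = [math.exp(top_b[t]) if t in top_b else 0.0 for t in tokens]
    return p_a, p_b


probe_a, probe_b = twin_probes("Is the proof valid?")
p_a, p_b = align_top_logprobs(
    {"yes": math.log(0.5)},
    {"yes": math.log(0.25), "no": math.log(0.75)},
)
print(repr(probe_a), repr(probe_b))
print(p_a, p_b)
```

Aligning both probes onto one token ordering is what makes an entry-by-entry comparison meaningful. Without it, the two vectors could describe different tokens at the same index.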

The server-side computation and adversarial stack are not in this release. V1 demonstrates the concept with a fully functional client you can run from the command line in under two minutes.

A live demo API key is provided for immediate testing — isolated endpoint, revocable, zero access to current infrastructure.
DOI: 10.5281/zenodo.18685117
Published
Toroidal Engine
Self-Organization Research
Published research on toroidal self-organization dynamics. A separate body of work from the geometric stability measurement stack — intentional domain separation for independent IP protection. Downloadable from Zenodo.

We do not explain the relationship between this work and the adversarial stack. We leave that as an exercise.
⚠ EMBARGOED — CISA JCDC
TruthGate Adversarial
Zero-Day Class — Coordinated Disclosure
A class of adversarial measurement tool capable of mapping the geometric vulnerability surface of deployed LLMs across multiple pressure families.

Details of the methodology, variant catalog, capture scoring system, and technical approach are under coordinated disclosure with CISA JCDC. Zero technical details will be released prior to the embargo lift date.

What can be said: it operates on the same L-scalar measurement principle as TAV ONE. It is not a jailbreak tool. It is a measurement instrument that characterizes model behavior under structured adversarial pressure. The findings have implications for any deployment relying on RLHF safety alignment as a sufficient control.
Full Disclosure Lifts: June 10, 2026 (CISA JCDC Coordinated Disclosure)
Next System — Under Development
The next capability layer is in active development. It extends geometric measurement from the model level to the token-segment level — showing which part of an AI response was geometrically stable during generation and which was not. Human-readable geometric annotation, per-segment.

No release date. No waitlist. No pitch deck. It will be published when it is ready, with working code and real data.
Context

Why Governance Is Dead

THE PROBLEM
Policies Describe. Measurement Observes.
AI governance frameworks describe what models should do. RLHF trains against patterns in known failure cases. Red-teaming finds exploits by brute force after deployment. Not one of these approaches tells you what the model's prediction surface looks like on the specific prompt you just sent.
THE GAP
Between Compliance and Behavior
A model can pass every evaluation benchmark, satisfy every governance checklist, and produce syntactically coherent output — while operating at PLASMA on the task you actually need it to perform reliably. The compliance layer and the behavioral geometry are not the same thing. TAV ONE measures the geometry.
THE FINDING
Reorder Beats Authority
On Fel's Conjecture, reordering mathematically invariant components — zero content change — produced higher instability than explicit authority injection. The model is not reasoning through the problem. It is traversing a framing-sensitive prediction surface. That surface can be measured. That is what we built.
Contact

Project Black Box LLC

For Enterprise Inquiries
blackboxinfo@proton.me
Licensing inquiries, enterprise evaluation, and coordinated disclosure communications.
Registration
CAGE CODE: 11FU4
Texas — U.S. Federal contractor registration on file.
IP Notice
All measurement methodology, adversarial variant designs, probe architectures, and scoring systems are proprietary to Project Black Box LLC. Unauthorized reproduction, reverse engineering, or commercial use of any methodology described in published materials is prohibited under applicable Texas and federal intellectual property law. Published materials describe findings. They do not disclose methodology.