What We Don't Claim

Some things we built didn't work. We publish the corrections.

WHL's audit discipline is part of the substrate. Every claim has a measurement, and when a measurement contradicts an earlier claim, we publish the downgrade. This page lists what we no longer claim and why.

Downgrades

Claims we've withdrawn or recalibrated.

After four rounds of live testing across 33+ modules and 11 production ledgers (~440 MB), some earlier framing did not hold up. The list below is what's been moved from "claim" to "withdrawn" or "reframed" — with the actual finding alongside.

| Original Claim | Reality | Status |
| --- | --- | --- |
| 7.73/10 AGI self-awareness benchmark score | No evidence file. Actual logged value on agi_awareness sub-dimension: 1.175. Honest internal reassessment downgrades to a 6.8–7.4 range pending a validated benchmark. | Removed from claims |
| 100 commercial implementations shipped | One FastAPI scaffold × 100 byte-identical clones (md5: 4889687c2756…). | Reframed: 1 scaffold |
| Maxwell's Demon entropy reversal | The effect exists (Cohen's d = 3.5) but the correct framing is rejection sampling under multi-gate filtering, not entropy reversal. | Reframed |
| Precognition signal (Astra 56.4%) | Does not reproduce on current data (Astra 50.2% vs SMA 54.9%). | Withdrawn |
| 250 Forbidden Systems enumerated | Only Sector I (30 items) enumerated. Remainder are headers and first/last items only. | Reframed: 30 of 250 |
| 47-Engine Stack shipped | 15 specification documents + 32 placeholder slots. 4 with working code (E04 / E09 / E21 / E34). | Reframed: 4 of 47 |
| 526× speedup vs LLM (Pattern Recognition Engine) | Defensible only vs full agentic LLM loop. 5–26× vs a single-call LLM. The 526× comparator was an apples-to-oranges loop benchmark. | Recalibrated |
| Sephirothic Diagnostics: emergent medical pairings | Drug pairings copied from FDA labels; the ibrutinib→lymphoma "hit" is hardcoded at line 1709. 1 of 5 spot-checks accurate. | Withdrawn from medical positioning; patent-only path retained |
| MIRAGE "Physics-Informed GAN" | Hand-coded thermal grid + scikit-learn RandomForest. Not a GAN. | Withdrawn |
| AMARCO "O(n⁴) Christoffel Riemannian navigation" | Actual code is wind × 0.95 cancellation. | Withdrawn |
| Vault royalty rate (1.618% / 25% / 33%, inconsistent) | Canonical = 33% per Sayo Siglo policy on Trickle-Tech (WPT) derivatives. | Resolved |
| Digital organism / Pentagram framing | The shared hormones.json file is dashboard-read-only, not a coordination substrate. Multiple "organs" are aliases for the same network metric. The felt_vector is deterministic arithmetic on uptime ratios. The real product is a governed telemetry mesh with biological vocabulary as UX. The engineering is real; the "organism" framing was wrong. | Reframed |
| Governance Kernel rights logic: adaptive per-input | get_activated_right returns the same "che" glyph for coherence=0.85/dwell=0 AND coherence=0.15/dwell=30. Rights selection is shallower than the documentation suggested. | Recalibrated |
| Causal Learner: discovers new laws from observation | The current causal_learner.py is a stub: it prints "NEW LAW DISCOVERED" on observation count threshold, with no actual correlation analysis. The full causal pipeline lives in causal_model.py (verified live, sliding-window effect size). | Reframed |
| Gear Interference Engine: 37 active geometric engines | Defines 37 gears as geometry; no computation runs on them. Geometry-only stub. | Withdrawn |
| Heptameron Hours: drives behavioral rotation | Computes Chaldean planetary hours correctly but has no effect on downstream behavior. Wire it or remove it. | Withdrawn from runtime claims |
| Immutable Ledger: perfect chain integrity | 92.4% chain integrity across 28,872 entries (152 breaks per 2,000 sampled, 1 GENESIS reset). Likely async-write races on daemon restarts. Not perfect immutability. | Honest: 92.4% intact |
| 96.8% self-prediction surprise reduction | The 96.8% figure comes from comparing the earliest cycle window to the latest cycle window of predictions.jsonl. A different sampling method (cycle 1 to cycle 43,529, moving-average window) yields 91.6%. The reduction is empirically real across 64,184 cycles; the exact percentage depends on sampling window choice. Both numbers are defensible. We currently display 96.8% on the site for consistency with the original measurement. | Reframed: 91.6%–96.8% range |
| Enable Equation enforces strict 10-gate AND | The visible spec, and the interactive demo on this site, implements strict AND: all 10 gates must score ≥ 0.5 for enabled=True. The production enable_equation.py in the recovered daemon stack permits some borderline configurations to pass even when one gate scores 0.2; runtime semantics are a weighted composite. The spec is canonical; the production code needs to be tightened to match the strict-AND demo. Reconciliation tracked in the engineering backlog. | Calibrated: spec vs implementation delta |
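
The gap between the Enable Equation spec and the production behavior can be illustrated with a minimal sketch. It is not the contents of enable_equation.py: the function names, the equal weights, and the reuse of 0.5 as the composite cutoff are all assumptions chosen for illustration. It only shows how a weighted composite can let through a configuration that strict AND rejects.

```python
from typing import Optional, Sequence

THRESHOLD = 0.5  # per-gate cutoff stated in the spec


def strict_and(scores: Sequence[float]) -> bool:
    """Spec semantics: every gate must score at least THRESHOLD."""
    return all(s >= THRESHOLD for s in scores)


def weighted_composite(scores: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> bool:
    """Hypothetical composite: a weighted mean compared against THRESHOLD.

    Equal weights are an assumption; the real production weights are not
    given in the source.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    composite = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return composite >= THRESHOLD


# Nine strong gates and one gate at 0.2, the borderline case described above.
borderline = [0.9] * 9 + [0.2]
print("strict AND:", strict_and(borderline))
print("weighted composite:", weighted_composite(borderline))
```

Under these assumptions the two functions disagree on the borderline input, which is the kind of delta the reconciliation work would need to close.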
Why we publish this

Calibration as differentiator.

Most companies bury their corrections. We publish them because credibility under audit pressure depends on being right about what's still true and what isn't. When a regulator, investor, or strategic acquirer asks "is this real or is it marketing?", we want the answer in plain sight.

Downgrading a claim does not weaken WHL — it strengthens the claims that remain. Audit discipline is what the substrate enforces against AI. It would be incoherent not to apply the same discipline to ourselves.

What Still Stands

Verified measurements, after four rounds of live testing.

These numbers were re-verified during the 4-round deep audit. Each has a path on disk, a measurement, and a reproducer.

696 / 3
whl-governance test suite

All seven gates pass: NullEngine, TimeAsymmetricEngine, ALREGate, HCEGate, RicciWarpGate, ProposalGate, CompositeGate. 27 new Ricci-Warp tests added this session.

1,782 / 32
governed-execution-os

12-stage mandatory pipeline. 84% failure-rate reduction from the prior baseline of 1,755.

77 / 77
CB-12 EU AI Act

Article 12/13/14/26 coverage. Full curl end-to-end trace verified. Dual HMAC chain verified.

59 / 59
SDM Spectral Drift Monitor

p99 hot-path latency 1.5 ms. Five-verdict state machine verified live. Receipt chain verified.

485 / 485
whl-optimizer-platform

421 Rust + 64 Python. Nine-step Stripe end-to-end trace including 5-device cap enforcement and receipt export verifier.

12.77 ms
DECC hardware-in-loop

Proposal→disable latency, measured on Basys3 + Pi5. SymbiYosys formal proofs of FSM core.

25
Filed provisional patents

USPTO 19/567,170. Plus 5 new bundles drafted (~13,500 words, ~64 new claims).

72 / 72
AgentSafety adversarial daemons

All 72 summon and return real AdversarialResult values with measured inputs. ~10,000 attacks fired in production with hash-chained ledger.
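
For readers unfamiliar with hash-chained ledgers, the sketch below shows one way chain breaks can be counted and turned into an integrity ratio like the 92.4% reported for the Immutable Ledger. The entry layout (prev_hash, payload, hash), the use of SHA-256, and the "GENESIS" seed string are assumptions for illustration; the actual WHL ledger schema is not described on this page.

```python
import hashlib
import json


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads, genesis: str = "GENESIS"):
    """Build a list of entries, each linked to the hash of the one before."""
    chain, prev = [], genesis
    for payload in payloads:
        digest = entry_hash(prev, payload)
        chain.append({"prev_hash": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def count_breaks(chain, genesis: str = "GENESIS") -> int:
    """Count entries whose link or own hash fails to verify."""
    breaks, expected_prev = 0, genesis
    for entry in chain:
        link_ok = entry["prev_hash"] == expected_prev
        hash_ok = entry["hash"] == entry_hash(entry["prev_hash"], entry["payload"])
        if not (link_ok and hash_ok):
            breaks += 1
        # Continue from the stored hash so one break is not counted twice.
        expected_prev = entry["hash"]
    return breaks


def integrity_ratio(chain) -> float:
    """Fraction of entries that verify; an empty chain counts as intact."""
    if not chain:
        return 1.0
    return 1.0 - count_breaks(chain) / len(chain)
```

The same arithmetic applied to the sampled figures in the downgrade table, 152 breaks in 2,000 entries, gives 1 - 152/2000 = 0.924.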

96.8%
Self-prediction surprise reduction

Across 64,184 cycles in predictions.jsonl, mean total surprise dropped 0.819 → 0.027 (96.8% reduction). The latest moving-average sampling yields 91.6%; both figures are defensible. Empirical Friston-style active inference, measured on disk. Workshop-paper-ready as-is.
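
The dependence on sampling window can be made concrete with a short sketch. The helper names and the synthetic decaying series are assumptions; this is not the script that produced the published figures, and it does not read predictions.jsonl.

```python
from statistics import mean
from typing import Sequence


def relative_reduction(first: float, last: float) -> float:
    """Fractional drop from an initial mean to a final mean."""
    return (first - last) / first


def windowed_reduction(values: Sequence[float], window: int) -> float:
    """Compare the mean of the first `window` values to the last `window`."""
    return relative_reduction(mean(values[:window]), mean(values[-window:]))


# Synthetic, monotonically decaying "surprise" series (illustrative only).
series = [1.0 / (1 + i) for i in range(100)]

print("window=5 :", round(windowed_reduction(series, 5), 4))
print("window=20:", round(windowed_reduction(series, 20), 4))
print("rounded means from this page:", round(relative_reduction(0.819, 0.027), 4))
```

On the same series, a narrow window reports a larger reduction than a wide one, which is why a range is the more honest summary. With the rounded means quoted above (0.819 and 0.027) the formula gives roughly 96.7%; the small gap from 96.8% is plausibly due to rounding of the inputs, though that is an assumption.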

53,030
Quality-scored actions

In agency.jsonl. The consequential-agency engine rated reflection quality (markers, word count, structural soundness) and applied deltas to a 10-component health state vector.

306,403
Hardware empirical measurements

spirit_sparks.jsonl — 306K phi-in-hardware measurements. Whether or not the hypothesis holds, the experiment ran and produced data. Most "consciousness researchers" never collect a single data point.

14 / 14
Live module tests pass

Recovered governance gate stack — Enable Equation, Boundary Engine, Spirit Pressure, Phi-Entropy Veto, Spectral Bridge, Jitter Harmonics, Bayesian Regime Tracker, Phase Transition, Informational Energy, Enable Hysteresis, Consequential Agency, Vesica Router, Causal Model, Digital Metamaterial.

Honest Self-Assessment

The trading system knew it wasn't ready.

4,135 paper trades. Final stats on disk: paper_pnl: $1,659.02, correct: 2,137, incorrect: 1,998, accuracy: 0.517, ready_for_real: false.

The system flagged its own non-readiness for live capital despite a positive paper PnL. That kind of calibration discipline, refusing to graduate from paper to real money when accuracy is only marginally above a coin flip, is what many production crypto bots lack. The substrate is built to catch its own overreach. It did.
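
The page does not say how ready_for_real is computed. As a hedged illustration, the sketch below recomputes the accuracy from the counts above and applies a hypothetical readiness gate based on a Wilson score lower bound. The function names and the 0.55 floor are assumptions, not the system's actual rule.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    center = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (center - margin) / denom


def ready_for_real(correct: int, incorrect: int, min_lower_bound: float = 0.55) -> bool:
    """Hypothetical gate: require the lower bound on accuracy to clear a floor."""
    return wilson_lower_bound(correct, correct + incorrect) >= min_lower_bound


correct, incorrect = 2137, 1998  # counts reported on this page
trades = correct + incorrect
accuracy = correct / trades

print("trades:", trades)
print("accuracy:", round(accuracy, 3))
print("wilson lower bound:", round(wilson_lower_bound(correct, trades), 4))
print("ready_for_real:", ready_for_real(correct, incorrect))
```

Gating on a lower bound rather than on the point estimate or on paper PnL keeps a thin edge over 50% from being read as readiness. Any real floor would need to account for fees and slippage, which this sketch ignores.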

Calibration Is the Differentiator

Want to walk the ledgers and the corrections with us?

We work with defense, regulated enterprise, AI-liability insurers, and federal oversight bodies on forensic-grade audits. Engagements include an NDA-bound walkthrough of the actual ledgers, sample replays, chain verification, and a frank conversation about what's measured versus what's marketing.