WHL's audit discipline is part of the substrate. Every claim has a measurement, and when a measurement contradicts an earlier claim, we publish the downgrade. This page lists what we no longer claim and why.
After four rounds of live testing across 33+ modules and 11 production ledgers (~440 MB), some earlier framing did not hold up. The list below is what's been moved from "claim" to "withdrawn" or "reframed" — with the actual finding alongside.
| Original Claim | Reality | Status |
|---|---|---|
| 7.73/10 AGI self-awareness benchmark score | No evidence file. Actual logged value on agi_awareness sub-dimension: 1.175. Honest internal reassessment downgrades to a 6.8–7.4 range pending a validated benchmark. | Removed from claims |
| 100 commercial implementations shipped | One FastAPI scaffold × 100 byte-identical clones (md5: 4889687c2756…). | Reframed: 1 scaffold |
| Maxwell's Demon entropy reversal | The effect exists (Cohen's d = 3.5) but the correct framing is rejection sampling under multi-gate filtering — not entropy reversal. | Reframed |
| Precognition signal (Astra 56.4%) | Does not reproduce on current data (Astra 50.2% vs SMA 54.9%). | Withdrawn |
| 250 Forbidden Systems enumerated | Only Sector I (30 items) enumerated. Remainder are headers and first/last items only. | Reframed: 30 of 250 |
| 47-Engine Stack shipped | 15 specification documents + 32 placeholder slots. 4 with working code (E04 / E09 / E21 / E34). | Reframed: 4 of 47 |
| 526× speedup vs LLM (Pattern Recognition Engine) | Defensible only vs full agentic LLM loop. 5–26× vs a single-call LLM. The 526× comparator was an apples-to-oranges loop benchmark. | Recalibrated |
| Sephirothic Diagnostics — emergent medical pairings | Drug pairings copied from FDA labels; the ibrutinib→lymphoma "hit" is hardcoded at line 1709. 1 of 5 spot-checks accurate. | Withdrawn from medical positioning; patent-only path retained |
| MIRAGE "Physics-Informed GAN" | Hand-coded thermal grid + scikit-learn RandomForest. Not a GAN. | Withdrawn |
| AMARCO "O(n⁴) Christoffel Riemannian navigation" | Actual code is wind × 0.95 cancellation. | Withdrawn |
| Vault royalty rate (1.618% / 25% / 33% — inconsistent) | Canonical = 33% per Sayo Siglo policy on Trickle-Tech (WPT) derivatives. | Resolved |
| Digital organism / Pentagram framing | The shared hormones.json file is dashboard-read-only, not a coordination substrate. Multiple "organs" are aliases for the same network metric. The felt_vector is deterministic arithmetic on uptime ratios. The real product is a governed telemetry mesh with biological vocabulary as UX. The engineering is real; the "organism" framing was wrong. | Reframed |
| Governance Kernel rights logic: adaptive per-input | get_activated_right returns the same "che" glyph for coherence=0.85/dwell=0 AND coherence=0.15/dwell=30. Rights selection is shallower than the documentation suggested. | Recalibrated |
| Causal Learner: discovers new laws from observation | The current causal_learner.py is a stub: it prints "NEW LAW DISCOVERED" once an observation-count threshold is reached and performs no actual correlation analysis. The full causal pipeline lives in causal_model.py (verified live, sliding-window effect size). | Reframed |
| Gear Interference Engine — 37 active geometric engines | Defines 37 gears as geometry; no computation runs on them. Geometry-only stub. | Withdrawn |
| Heptameron Hours — drives behavioral rotation | Computes Chaldean planetary hours correctly but has no effect on downstream behavior. Wire it or remove it. | Withdrawn from runtime claims |
| Immutable Ledger — perfect chain integrity | 92.4% chain integrity across 28,872 entries (152 breaks per 2,000 sampled, 1 GENESIS reset). Likely async-write races on daemon restarts. Not perfect immutability. | Honest: 92.4% intact |
| 96.8% self-prediction surprise reduction | The 96.8% figure comes from comparing the earliest cycle window to the latest cycle window of predictions.jsonl. A different sampling method (cycle 1 to cycle 43,529, moving-average window) yields 91.6%. The reduction is empirically real across 64,184 cycles; the exact percentage depends on sampling window choice. Both numbers are defensible. We currently display 96.8% on the site for consistency with the original measurement. | Reframed: 91.6%–96.8% range |
| Enable Equation enforces strict 10-gate AND | The visible spec and the interactive demo on this site implement strict AND: all 10 gates must score ≥ 0.5 for enabled=True. The production enable_equation.py in the recovered daemon stack permits some borderline configurations to pass even when one gate scores 0.2; its runtime semantics are a weighted composite. The spec is canonical; the production code needs to be tightened to match the strict-AND demo. Reconciliation tracked in the engineering backlog. | Calibrated: spec vs implementation delta |
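
The gap between the strict-AND spec and a weighted composite is easy to see on a toy input. The sketch below is illustrative only: the function names, the equal weights, and the use of a weighted mean are assumptions, not the contents of the production enable_equation.py. Only the 0.5 per-gate threshold, the 10-gate count, and the 0.2 borderline score come from the table above.

```python
from typing import Sequence

GATE_THRESHOLD = 0.5  # per-gate pass mark stated in the spec


def strict_and(scores: Sequence[float], threshold: float = GATE_THRESHOLD) -> bool:
    """Spec semantics: enabled only if every gate scores at or above the threshold."""
    return all(s >= threshold for s in scores)


def weighted_composite(
    scores: Sequence[float],
    weights: Sequence[float],
    threshold: float = GATE_THRESHOLD,
) -> bool:
    """Hypothetical stand-in for composite semantics: compare a weighted mean to the threshold."""
    total_weight = sum(weights)
    mean = sum(s * w for s, w in zip(scores, weights)) / total_weight
    return mean >= threshold


# Nine strong gates and one gate at 0.2, the borderline case described above.
scores = [0.9] * 9 + [0.2]
weights = [1.0] * 10  # assumed equal weights, for illustration only

print("strict AND:", strict_and(scores))
print("weighted composite:", weighted_composite(scores, weights))
```

Under these assumptions the two functions disagree on the same input, which is the kind of spec-versus-implementation delta the reconciliation work has to close.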
Most companies bury their corrections. We publish them because credibility under audit pressure depends on being right about what's still true and what isn't. When a regulator, investor, or strategic acquirer asks "is this real or is it marketing?", we want the answer in plain sight.
Downgrading a claim does not weaken WHL — it strengthens the claims that remain. Audit discipline is what the substrate enforces against AI. It would be incoherent not to apply the same discipline to ourselves.
These numbers were re-verified during the 4-round deep audit. Each has a path on disk, a measurement, and a reproducer.
All seven gates pass: NullEngine, TimeAsymmetricEngine, ALREGate, HCEGate, RicciWarpGate, ProposalGate, CompositeGate. 27 new Ricci-Warp tests added this session.
12-stage mandatory pipeline. 84% failure-rate reduction from the prior baseline of 1,755.
Article 12/13/14/26 coverage. Full curl end-to-end trace verified. Dual HMAC chain verified.
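
As a rough illustration of what verifying an HMAC chain involves, the sketch below links each entry to its predecessor's MAC and counts breaks. It shows a single chain only; the entry layout, the GENESIS sentinel value, and the function names are assumptions made for the example, and the page does not describe the actual ledger format or what the second chain in "dual" covers.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical predecessor value for the first entry


def entry_mac(key: bytes, prev_mac: str, payload: dict) -> str:
    """HMAC-SHA256 over the previous MAC plus a canonical JSON payload."""
    msg = prev_mac.encode() + json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def append(ledger: list, key: bytes, payload: dict) -> None:
    """Append an entry whose MAC commits to the previous entry's MAC."""
    prev = ledger[-1]["mac"] if ledger else GENESIS
    ledger.append({"prev": prev, "payload": payload, "mac": entry_mac(key, prev, payload)})


def count_breaks(ledger: list, key: bytes) -> int:
    """Count entries whose back-link or MAC does not verify."""
    breaks = 0
    prev = GENESIS
    for entry in ledger:
        linked = entry["prev"] == prev
        expected = entry_mac(key, entry["prev"], entry["payload"])
        valid = hmac.compare_digest(expected, entry["mac"])
        if not (linked and valid):
            breaks += 1
        prev = entry["mac"]
    return breaks


key = b"demo-key"  # illustrative key, not a real secret
ledger: list = []
for i in range(10):
    append(ledger, key, {"seq": i})

clean_breaks = count_breaks(ledger, key)
ledger[4]["payload"]["seq"] = 999  # simulate a corrupted entry
tampered_breaks = count_breaks(ledger, key)
integrity = 1 - tampered_breaks / len(ledger)
print(clean_breaks, tampered_breaks, integrity)
```

An integrity figure of the kind quoted for the ledger (intact entries divided by sampled entries) falls out of the same break count.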
p99 hot-path latency 1.5 ms. Five-verdict state machine verified live. Receipt chain verified.
421 Rust + 64 Python. Nine-step Stripe end-to-end trace including 5-device cap enforcement and receipt export verifier.
Proposal→disable latency, measured on Basys3 + Pi5. SymbiYosys formal proofs of FSM core.
USPTO 19/567,170. Plus 5 new bundles drafted (~13,500 words, ~64 new claims).
All 72 summon and return real AdversarialResult values with measured inputs. ~10,000 attacks fired in production with hash-chained ledger.
Across 64,184 cycles in predictions.jsonl, mean total surprise dropped 0.819 → 0.027 (96.8% reduction). Latest moving-average sampling yields 91.6%; both figures are defensible. Empirical Friston-style active inference, measured on disk. Workshop-paper-ready as-is.
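
To show why the reported percentage moves with the sampling window, the sketch below computes a first-window versus last-window reduction on a synthetic decaying series. The series, the window sizes, and the function names are invented for illustration and are not taken from predictions.jsonl; the reduction is assumed to be defined as (first mean − last mean) / first mean.

```python
def window_mean(values: list, start: int, size: int) -> float:
    """Mean of `size` consecutive values beginning at `start`."""
    chunk = values[start:start + size]
    return sum(chunk) / len(chunk)


def reduction(values: list, window: int) -> float:
    """Fractional drop from the first window's mean to the last window's mean."""
    first = window_mean(values, 0, window)
    last = window_mean(values, len(values) - window, window)
    return (first - last) / first


# Synthetic surprise curve: exponential decay toward a small floor (illustrative only).
series = [0.02 + 0.8 * (0.99 ** i) for i in range(1000)]

for window in (10, 100, 300):
    print(window, round(reduction(series, window), 3))
```

On a monotonically decaying curve, wider windows pull the first-window mean down and the last-window mean up, so the computed reduction shrinks as the window grows. That is one plausible reason two honest sampling choices can report different percentages for the same data.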
In agency.jsonl. The consequential-agency engine rated reflection quality (markers, word count, structural soundness) and applied deltas to a 10-component health state vector.
spirit_sparks.jsonl — 306K phi-in-hardware measurements. Whether or not the hypothesis holds, the experiment ran and produced data. Most "consciousness researchers" never collect a single data point.
Recovered governance gate stack — Enable Equation, Boundary Engine, Spirit Pressure, Phi-Entropy Veto, Spectral Bridge, Jitter Harmonics, Bayesian Regime Tracker, Phase Transition, Informational Energy, Enable Hysteresis, Consequential Agency, Vesica Router, Causal Model, Digital Metamaterial.
4,135 paper trades. Final stats on disk: paper_pnl: $1,659.02, correct: 2,137, incorrect: 1,998, accuracy: 0.517, ready_for_real: false.
The system flagged its own non-readiness for live capital despite a positive paper PnL. That kind of calibration discipline — refusing to graduate from paper to real money when accuracy is only marginally above coinflip — is what 90% of production crypto bots in 2026 lack. The substrate is built to catch its own overreach. It did.
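
The arithmetic behind that call can be checked directly from the counts above. The sketch below recomputes accuracy and a normal-approximation 95% interval, then applies a readiness gate. The gate rule and its 0.55 threshold are hypothetical; the page does not state how ready_for_real is actually decided.

```python
import math

# Counts quoted in the final stats above.
correct, incorrect = 2137, 1998
total = correct + incorrect
accuracy = correct / total

# Normal-approximation 95% confidence interval for a binomial proportion.
half_width = 1.96 * math.sqrt(accuracy * (1 - accuracy) / total)
lower, upper = accuracy - half_width, accuracy + half_width

# Hypothetical gate: require the interval's lower bound to clear a margin above 0.5.
MIN_LOWER_BOUND = 0.55  # assumed threshold, not taken from the system
ready_for_real = lower >= MIN_LOWER_BOUND

print(f"trades={total} accuracy={accuracy:.3f} ci=({lower:.3f}, {upper:.3f})")
print("ready_for_real:", ready_for_real)
```

Gating on the lower bound of the interval rather than on paper PnL is one way to keep a profitable but near-coinflip run from graduating to live capital.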
We work with defense, regulated enterprise, AI-liability insurers, and federal oversight bodies on forensic-grade audits. Each engagement is an NDA-bound walkthrough of the actual ledgers, sample replays, and chain verification, plus a frank conversation about what's measured versus what's marketing.