536,264 documents indexed. 22GB SQLite corpus. FTS5 full-text plus semantic vector search using 768-dimensional embeddings. Flask query interface, self-hosted. Deep coverage of physics, mathematics, cryptography, philosophy, signal processing, and governance systems. Powers Cascade routing, GE-OS decision pathways, Safe Agent Runtime reasoning, and internal research. Queryable today.
The substrate stores 536,264 documents in a SQLite corpus with two retrieval paths working in parallel: SQLite FTS5 for exact-term and lexical search, and a 768-dimensional vector store for semantic similarity. Queries hit both indexes. Embeddings currently cover 128,256 of the documents, so semantic results draw from that subset while FTS5 spans the full corpus. Results carry document provenance and an integrity-anchored corpus reference.
SQLite FTS5 over the full document corpus. Exact-term, phrase, and lexical queries. Useful when the researcher knows the term they need and wants the citing documents back, ranked.
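A minimal sketch of what an exact-term and a phrase query look like against an FTS5 table, using Python's built-in `sqlite3` module. The table name `docs`, its `path` and `body` columns, and the sample rows are assumptions made for illustration; the actual corpus schema is not described here.

```python
import sqlite3

# Hypothetical schema and sample rows; the real corpus layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path, body)")
conn.executemany(
    "INSERT INTO docs (path, body) VALUES (?, ?)",
    [
        ("physics/noether.txt", "Noether's theorem links symmetry to conservation laws."),
        ("crypto/sha2.txt", "SHA-256 is a cryptographic hash function in the SHA-2 family."),
        ("governance/audit.txt", "An audit trail records each governance decision."),
    ],
)


def lexical_search(conn, query, limit=10):
    """Return (path, rank) pairs for an FTS5 MATCH query, best match first."""
    # FTS5's hidden "rank" column defaults to bm25; lower values are better matches.
    rows = conn.execute(
        "SELECT path, rank FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


term_hits = lexical_search(conn, "governance")        # single-term query
phrase_hits = lexical_search(conn, '"audit trail"')   # phrase query (double quotes)
```

Each hit comes back with its `path`, so a ranked result can be traced to the file it came from.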
768-dimensional embeddings for concept-level retrieval. Query in natural language and recover documents that share a semantic neighborhood without sharing surface terms.
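To make "semantic neighborhood" concrete, here is a small sketch of nearest-neighbor retrieval by cosine similarity. It uses toy 4-dimensional vectors in place of the 768-dimensional embeddings, and the document paths, vectors, and function names are all hypothetical. The similarity metric the substrate actually uses is not stated; cosine similarity is assumed here as a common choice.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, doc_vecs, k=3):
    """Return the k (path, similarity) pairs closest to query_vec, most similar first."""
    scored = [(path, cosine_similarity(query_vec, vec)) for path, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy stand-ins for stored embeddings (real vectors would have 768 components).
doc_vecs = {
    "physics/entropy.txt": [0.9, 0.1, 0.0, 0.2],
    "signal/fourier.txt": [0.1, 0.8, 0.3, 0.0],
    "philosophy/mind.txt": [0.0, 0.2, 0.9, 0.1],
}
query_vec = [0.8, 0.2, 0.1, 0.1]  # would come from embedding the natural-language query

top = semantic_search(query_vec, doc_vecs, k=2)
```

A brute-force scan like this is fine for a sketch; at corpus scale an approximate nearest-neighbor index would typically replace it.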
Deep coverage across physics, mathematics, cryptography, philosophy, signal processing, and governance systems. Useful for R&D teams running prior-art sweeps, decision warehousing, and concept-cluster analysis.
Every returned document carries its source path within the indexed corpus. Researchers can trace a result back to the originating file. No floating snippets.
The corpus is integrity-anchored via SHA-256. Tamper events are detectable. The substrate is the same artifact every researcher queries, not a moving target.
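As an illustration of how a SHA-256 anchor makes tampering detectable, the sketch below hashes a file in chunks and compares the digest with a previously recorded value. The function names and the idea of hashing the database file as a whole are assumptions; how the anchor is actually computed and stored is not specified here.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a multi-gigabyte corpus never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def corpus_is_intact(path, anchored_hex):
    """True if the file's current digest matches the recorded (anchored) digest."""
    return file_sha256(path) == anchored_hex.lower()
```

Any single-byte change to the file yields a different digest, so a mismatch against the anchor signals that the corpus is no longer the artifact that was recorded.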
Lightweight, self-hosted Flask surface: query box, hybrid result panel, provenance link, and integrity status. Researchers can start working against the substrate within an afternoon of access being provisioned.
The substrate is online and queryable. Internal WHL products integrate with the Knowledge Layer for decision-making, reasoning, and documentation. No long wait for first value.
Physics, mathematics, cryptography, philosophy, signal processing, and governance systems. The substrate is broad enough to support cross-domain queries, which is exactly what Cascade routing and GE-OS governance pathways need.
Cascade, GE-OS, and Safe Agent Runtime all benefit from a shared backing corpus: one integrity-anchored document set in which the research context behind a decision pathway can be audited and reproduced across sessions. The Knowledge Layer serves as that shared reasoning substrate.
FTS5 plus semantic vectors in the same query path. Lexical precision when you know the term, semantic recall when you don't. One result panel, two index types.
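One way to combine a lexical ranking and a semantic ranking into a single result list is reciprocal rank fusion (RRF). How the Knowledge Layer actually merges its two result sets is not stated, so treat this as an assumed technique for illustration; the function name, the constant `k=60`, and the sample file names are hypothetical.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in, so documents
    ranked highly by both retrieval paths rise to the top.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


lexical = ["a.txt", "b.txt", "c.txt"]    # e.g. FTS5 order
semantic = ["b.txt", "d.txt", "a.txt"]   # e.g. vector-similarity order

fused = reciprocal_rank_fusion([lexical, semantic])
# b.txt and a.txt appear in both lists, so they outrank d.txt and c.txt.
```

RRF needs only rank positions, which sidesteps the problem that bm25 scores and cosine similarities live on different scales.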
The corpus carries a SHA-256 anchor. Two decision pathways querying the same knowledge layer get the same document set, which makes audit trails over reasoning workflows possible.
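A sketch of what an audit entry tied to the corpus anchor might look like: each query is recorded together with the corpus hash it ran against, so two records can later be checked for having seen the same document set. The record fields, function names, and sample values are all hypothetical, not a description of the product's audit format.

```python
import hashlib
import json


def audit_record(pathway, query, result_paths, corpus_sha256):
    """Build an audit entry binding a query and its results to a corpus hash."""
    payload = {
        "pathway": pathway,
        "query": query,
        "results": sorted(result_paths),
        "corpus_sha256": corpus_sha256,
    }
    # Canonical JSON (sorted keys, fixed separators) gives a reproducible record id.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    payload["record_id"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return payload


def same_corpus(record_a, record_b):
    """True if both records were produced against the same anchored corpus."""
    return record_a["corpus_sha256"] == record_b["corpus_sha256"]


anchor = hashlib.sha256(b"demo corpus").hexdigest()  # placeholder for the real anchor
rec_a = audit_record("pathway-a", "governance", ["governance/audit.txt"], anchor)
rec_b = audit_record("pathway-b", "audit trail", ["governance/audit.txt"], anchor)
```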
Integrated with Cascade (task routing), GE-OS (governance), and Safe Agent Runtime (continuous loops). Shared infrastructure and one corpus reference, so every product reasons over the same document set.
Health check on the live knowledge layer.
$ python tools/health_check.py
Connecting to ACTIVE_KNOWLEDGE.db (22.0 GB)...
Total documents: 536,264
FTS5 index status: ready
Embedding coverage: 128,256 / 536,264 (semantic)
Sample query "phi": 6,206 hits in 12 ms
Sample query "consciousness": 6,070 hits in 9 ms
Sample query "governance": 4,847 hits in 8 ms
Flask UI status: running
Corpus SHA-256: [anchored, verifiable on request]
Verified live: 536,264 documents indexed, FTS5 + 128K semantic embeddings, self-hosted. Corpus hash anchored. Powers Cascade, GE-OS, and Safe Agent Runtime reasoning.
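For readers curious how figures like the document count and embedding coverage could be derived, here is a minimal sketch. It is not the `tools/health_check.py` script shown above; the `documents` and `embeddings` tables, their columns, and the `health_report` function are hypothetical. For reference, the reported 128,256 of 536,264 documents works out to roughly 23.9% embedding coverage.

```python
import sqlite3


def health_report(conn):
    """Count documents and embedded documents, and compute the coverage ratio."""
    total = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
    embedded = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
    coverage = embedded / total if total else 0.0
    return {
        "total_documents": total,
        "embedded_documents": embedded,
        "embedding_coverage": coverage,
    }


# Tiny in-memory stand-in for the corpus (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, path TEXT)")
conn.execute("CREATE TABLE embeddings (doc_id INTEGER PRIMARY KEY, vector BLOB)")
conn.executemany(
    "INSERT INTO documents (id, path) VALUES (?, ?)",
    [(i, f"doc_{i}.txt") for i in range(1, 5)],
)
conn.execute("INSERT INTO embeddings (doc_id, vector) VALUES (1, x'00')")

report = health_report(conn)  # 4 documents, 1 embedded
```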
The Knowledge Layer is part of the WHL platform stack. For details on how it integrates with Cascade, GE-OS, and Safe Agent Runtime, request an architecture brief.