536,264 documents indexed. 22GB SQLite corpus. FTS5 full-text plus semantic vector search using nomic-embed-text 768-dimensional embeddings. Flask query interface on port 8551. Deep coverage of physics, mathematics, cryptography, philosophy, signal processing, and governance systems. Queryable today; productization path identified as the Canon Concepts Knowledge Graph SKU.
The substrate stores 536,264 documents in a SQLite corpus with two retrieval paths working in parallel: SQLite FTS5 for exact-term and lexical search over the full corpus, and a nomic-embed-text 768-dimensional vector store for semantic similarity, which currently covers 128,256 of those documents. Queries hit both. Results carry document provenance and an integrity-anchored corpus reference.
SQLite FTS5 over the full document corpus. Exact-term, phrase, and lexical queries. Useful when the researcher knows the term they need and wants the citing documents back, ranked.
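As a rough illustration of the lexical path, the sketch below builds a tiny in-memory FTS5 table and runs a term query and a phrase query against it. The table name `docs`, its columns, and the three sample rows are assumptions made for the example; the actual schema of the substrate is not described here. It also assumes the local Python build ships SQLite with FTS5 enabled.

```python
import sqlite3

# Hypothetical schema: the real corpus layout is not documented in this text.
conn = sqlite3.connect(":memory:")
# `path` is stored for provenance but excluded from the full-text index.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, body)")
conn.executemany(
    "INSERT INTO docs (path, body) VALUES (?, ?)",
    [
        ("physics/wave.txt", "The wave equation governs signal propagation."),
        ("crypto/hash.txt", "A hash function maps input to a fixed-size digest."),
        ("math/fourier.txt", "The Fourier transform decomposes a signal into frequencies."),
    ],
)


def lexical_search(conn, query, limit=10):
    """Return (path, score) pairs ranked by FTS5's built-in bm25() function.

    bm25() returns lower values for better matches, so ascending order
    puts the most relevant documents first.
    """
    rows = conn.execute(
        "SELECT path, bm25(docs) AS score FROM docs "
        "WHERE docs MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


hits = lexical_search(conn, "signal")                   # single-term query
phrase_hits = lexical_search(conn, '"hash function"')   # exact phrase query

for path, score in hits:
    print(path, round(score, 4))
```

Each hit comes back with its source path, which is the piece of information a provenance link would need.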
nomic-embed-text 768-dimensional embeddings. Concept-level retrieval: query in natural language, recover documents that share a semantic neighborhood without sharing surface terms.
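A minimal sketch of how concept-level retrieval can work, assuming cosine similarity as the distance measure (the measure the substrate actually uses is not stated). The three-dimensional vectors below are toy stand-ins for 768-dimensional nomic-embed-text embeddings, and the document paths are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, doc_vecs, top_k=2):
    """Rank documents by similarity to the query vector, best first."""
    scored = [(path, cosine_similarity(query_vec, vec)) for path, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumption): real vectors would come from the embedding model.
doc_vecs = {
    "physics/wave.txt": [0.9, 0.1, 0.0],
    "crypto/hash.txt": [0.0, 0.2, 0.9],
    "math/fourier.txt": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]

results = semantic_search(query_vec, doc_vecs)
for path, score in results:
    print(path, round(score, 3))
```

A brute-force scan like this is only practical for small collections; at corpus scale an approximate nearest-neighbor index would normally take its place.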
Deep coverage across physics, mathematics, cryptography, philosophy, signal processing, and governance systems. Useful for R&D teams running prior-art sweeps, decision warehousing, and concept-cluster analysis.
Every returned document carries its source path within the indexed corpus. Researchers can trace a result back to the originating file. No floating snippets.
The corpus is integrity-anchored by a SHA-256 hash with prefix 58c471fc…, so tamper events are detectable. Every researcher queries the same artifact, not a moving target.
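One way such an anchor can be checked is sketched below, assuming the hash is taken over the raw bytes of a single file; how the substrate actually derives its corpus hash is not specified here. The helper names `file_sha256` and `matches_anchor` are hypothetical, and the demo runs against a throwaway temp file rather than the real database.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_anchor(path, anchor):
    """True if the file's digest starts with the published anchor (full hash or prefix)."""
    return file_sha256(path).startswith(anchor.lower())


# Demo on a temporary file standing in for the corpus (assumption).
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"demo corpus bytes")
    demo_path = handle.name

original_digest = file_sha256(demo_path)
unchanged_ok = matches_anchor(demo_path, original_digest)

# Appending a single byte changes the digest, so the check now fails.
with open(demo_path, "ab") as handle:
    handle.write(b"!")
tampered_ok = matches_anchor(demo_path, original_digest)

os.remove(demo_path)
print(unchanged_ok, tampered_ok)
```

Comparing against a short prefix is convenient for display, but comparing the full 64-character digest gives the stronger guarantee.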
Web UI on port 8551. Lightweight Flask surface: query box, hybrid result panel, provenance link, integrity status. Researchers can start querying the substrate within an afternoon of access being provisioned.
The substrate is online and queryable. Pilot access surfaces the Flask UI and a structured-query path suitable for downstream tooling. No long productization wait before first value.
Productization path is identified: the Canon Concepts Knowledge Graph SKU layers concept extraction and graph navigation on top of the indexed substrate. Pilot customers shape the SKU.
Physics, mathematics, cryptography, philosophy, signal processing, and governance systems. The substrate is broad enough to support cross-domain queries — exactly the queries R&D labs and IP firms care about.
Research teams already have vector databases. They already have full-text search. What they often lack is a single integrity-anchored corpus where the document set under query can be verified across sessions and across reviewers.
FTS5 plus semantic vectors in the same query path. Lexical precision when you know the term, semantic recall when you don't. One result panel, two index types.
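How the two result lists are combined into one panel is not described here. As one plausible approach, the sketch below merges a lexical ranking and a semantic ranking with reciprocal rank fusion, a common technique swapped in purely for illustration. The function name, the constant `k=60`, and the sample paths are all assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document paths into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both indexes rise to the top, while a
    document present in only one list still appears in the output.
    """
    scores = {}
    for ranking in rankings:
        for rank, path in enumerate(ranking, start=1):
            scores[path] = scores.get(path, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda path: scores[path], reverse=True)


# Hypothetical outputs of the two retrieval paths for the same query.
lexical = ["math/fourier.txt", "physics/wave.txt"]
semantic = ["physics/wave.txt", "crypto/hash.txt", "math/fourier.txt"]

fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)
```

Rank-based fusion avoids comparing raw FTS5 scores with vector similarities, which live on different scales, and it tolerates documents that appear in only one of the two lists.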
Corpus carries a SHA-256 anchor. Two reviewers running the same query against the same anchor get the same document set. Audit trails over research workflows become possible.
Canon Concepts Knowledge Graph SKU layers concept extraction and navigation on top. Pilot customers shape what gets prioritized — extraction, ontology, graph UI, or export contract.
The Knowledge Substrate is built for organizations whose research output depends on the integrity of the corpus they retrieve from — not just the quality of the model that summarizes it.
Health check on the live knowledge substrate.
$ python tools/health_check.py
Connecting to ACTIVE_KNOWLEDGE.db (22.0 GB)...
Total documents: 536,264
FTS5 index status: ready
Embedding coverage: 128,256 / 536,264 (semantic)
Sample query "phi": 6,206 hits in 12 ms
Sample query "consciousness": 6,070 hits in 9 ms
Flask UI status: running on :8551
Corpus SHA-256: 58c471fc3ed92e4e4bc0e3c19d6242d813c13c87f12fc5ca2d385e2d0aaa8287
Verified live: 536,264 documents indexed, FTS5 + 128K semantic embeddings, queryable on port 8551. Corpus hash anchored.
Demos run live against the substrate on port 8551. Pilot customers shape the Canon Concepts Knowledge Graph SKU and lock in early-access pricing against the productization path.