
536,264 documents indexed. 22GB SQLite corpus. FTS5 full-text plus semantic vector search using 768-dimensional embeddings. Flask query interface, self-hosted. Deep coverage of physics, mathematics, cryptography, philosophy, signal processing, and governance systems. Queryable today; productization path identified as the Canon Concepts Knowledge Graph SKU.

Documents Indexed: 536K
SQLite Corpus: 22GB
Hybrid Retrieval: FTS5 + Semantic
Self-Hosted Flask UI: Local

How the Knowledge Substrate Works

A 22GB indexed research corpus with hybrid full-text and semantic retrieval.

The substrate stores 536,264 documents in a SQLite corpus with two retrieval paths working in parallel: SQLite FTS5 for exact-term and lexical search, and a 768-dimensional vector store for semantic similarity. Every query runs against both indexes. Semantic embeddings currently cover 128,256 of those documents. Results carry document provenance and an integrity-anchored corpus reference.
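To make the two retrieval paths concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `docs_fts`, the toy 3-dimensional vectors, and every function name are assumptions for illustration, not the substrate's actual schema or code, and the sketch requires an SQLite build with FTS5 enabled.

```python
import math
import sqlite3


def build_demo_corpus():
    """Create a tiny in-memory stand-in for the corpus (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # FTS5 virtual table for the lexical path.
    conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(title, body)")
    docs = [
        (1, "Fourier analysis", "Signal processing with the discrete Fourier transform."),
        (2, "Elliptic curves", "Cryptography based on elliptic curve groups."),
        (3, "Wave mechanics", "Physics of waves, spectra, and frequency content."),
    ]
    conn.executemany(
        "INSERT INTO docs_fts(rowid, title, body) VALUES (?, ?, ?)", docs
    )
    # Toy 3-dimensional embeddings; the real store is described as 768-dimensional.
    vectors = {1: [0.9, 0.1, 0.0], 2: [0.0, 0.1, 0.9], 3: [0.8, 0.3, 0.1]}
    return conn, vectors


def lexical_search(conn, term, limit=10):
    """Exact-term path: FTS5 MATCH, best matches first."""
    rows = conn.execute(
        "SELECT rowid FROM docs_fts WHERE docs_fts MATCH ? ORDER BY rank LIMIT ?",
        (term, limit),
    ).fetchall()
    return [row[0] for row in rows]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(vectors, query_vec, limit=10):
    """Semantic path: brute-force cosine ranking over stored embeddings."""
    ranked = sorted(
        vectors.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True
    )
    return [doc_id for doc_id, _ in ranked[:limit]]


conn, vectors = build_demo_corpus()
lexical_hits = lexical_search(conn, "fourier")
semantic_hits = semantic_search(vectors, [1.0, 0.2, 0.0], limit=2)
print(lexical_hits, semantic_hits)
```

The lexical path finds only the document that contains the term, while the semantic path also surfaces a related document that never mentions it. A production system would normally replace the brute-force scan with a vector index.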

Why the Substrate

A corpus you can search, cite, and trust the integrity of.

Research teams already have vector databases. They already have full-text search. What they often lack is a single integrity-anchored corpus in which the document set under query is provably the same across sessions and across reviewers.

Hybrid Retrieval

FTS5 plus semantic vectors in the same query path. Lexical precision when you know the term, semantic recall when you don't. One result panel, two index types.
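The page does not say how the two ranked lists are merged into one result panel. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of the substrate's method; the document IDs and the constant `k=60` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + position) for every list it appears in,
    so documents ranked well by both indexes rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


lexical = [101, 205, 333]   # hypothetical FTS5 ranking
semantic = [205, 410, 101]  # hypothetical vector ranking
merged = reciprocal_rank_fusion([lexical, semantic])
print(merged)  # [205, 101, 410, 333]
```

RRF needs only rank positions, which sidesteps the fact that FTS5 relevance scores and cosine similarities are not on a comparable scale.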

Integrity-Anchored

The corpus carries a SHA-256 anchor. Two reviewers running the same query against the same anchor get the same document set. That makes audit trails over research workflows possible.
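As a rough illustration of the anchoring idea, the sketch below hashes a corpus file with SHA-256 and compares the digest with a recorded anchor. It assumes the anchor is a digest of the database file's bytes, which the page does not confirm; the file name and function names are hypothetical.

```python
import hashlib
import os
import tempfile


def corpus_anchor(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_corpus(path, expected_anchor):
    """True when the file on disk still matches the recorded anchor."""
    return corpus_anchor(path) == expected_anchor


# Demo on a throwaway file standing in for the real corpus.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_corpus.db")
    with open(demo_path, "wb") as fh:
        fh.write(b"abc")
    anchor = corpus_anchor(demo_path)
    unchanged = same_corpus(demo_path, anchor)   # file untouched
    with open(demo_path, "ab") as fh:
        fh.write(b"!")                           # simulate a modification
    changed = same_corpus(demo_path, anchor)     # digest no longer matches

print(anchor, unchanged, changed)
```

Two reviewers who record the same anchor before querying can show afterwards that they searched byte-identical corpora; any change to the file produces a different digest.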

Productization Path Identified

The Canon Concepts Knowledge Graph SKU layers concept extraction and navigation on top of the substrate. Pilot customers shape what gets prioritized: extraction, ontology, graph UI, or export contract.

Pricing

Enterprise license inquiry. Scoped to seat count and corpus footprint.

Target Customers

Research labs, federal R&D, decision warehouses, IP firms.

The Knowledge Substrate is built for organizations whose research output depends on the integrity of the corpus they retrieve from, not just the quality of the model that summarizes it.

Research labs, federal R&D programs, DARPA program offices, decision-warehouse customers, IP and patent firms, national-lab cross-discipline teams, cross-domain R&D groups.

Verified Output

Tests pass. Receipts on disk.

Health check on the live knowledge substrate.

$ python tools/health_check.py
Connecting to ACTIVE_KNOWLEDGE.db (22.0 GB)...
Total documents:      536,264
FTS5 index status:    ready
Embedding coverage:   128,256 / 536,264 (semantic)
Sample query "phi":   6,206 hits in 12 ms
Sample query "consciousness": 6,070 hits in 9 ms
Flask UI status:      running
Corpus SHA-256:       [anchored, verifiable on request]
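
The embedding coverage line reports two counts, which work out to roughly a quarter of the corpus. The helper below is a hypothetical sketch of that arithmetic, not part of `tools/health_check.py`.

```python
def coverage_line(embedded, total):
    """Format embedding coverage as counts plus a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    pct = 100.0 * embedded / total
    return f"Embedding coverage:   {embedded:,} / {total:,} ({pct:.1f}% semantic)"


line = coverage_line(128_256, 536_264)
print(line)  # Embedding coverage:   128,256 / 536,264 (23.9% semantic)
```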

Verified live: 536,264 documents indexed, FTS5 + 128K semantic embeddings, self-hosted. Corpus hash anchored.

Queryable Today, Canon SKU Pilot Window Open

Query a 536K-document integrity-anchored corpus.

Demos run live against the substrate. Pilot customers shape the Canon Concepts Knowledge Graph SKU and lock in early-access pricing ahead of productization.