Architecture

Why your vector store is not your data layer

LAST UPDATED · MAY 9, 2026

Architecture · 9 min

Vector indexes are a cache. Treating them as a system of record is the most common foundation mistake we fix.

Almost every AI build we audit treats the vector store as the data layer. Pinecone, Weaviate, Chroma, Qdrant. Sometimes pgvector, sometimes a homebrewed index. The wiring looks the same: ingest documents, embed, query at agent time, hand the chunks to the model. It feels like a database. It is not.

A vector store is a cache over an embedding function. The source documents and the embedding model are upstream. Change either and the cache is wrong, silently. Drop the index and your system loses memory of facts it had yesterday. Try to enforce a foreign key, a transaction, an audit trail, or a row-level permission, and you discover the index does not have those concepts. It was never designed to.
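One way to make the "cache over an embedding function" framing concrete is to store, next to each vector, the hash of the source text and the identifier of the model that produced it. The sketch below is a minimal illustration under stated assumptions, not any vector database's API: `fake_embed`, `IndexEntry`, `build_entry`, and `is_stale` are hypothetical names, and the embedding is a toy stand-in for a real model call.

```python
import hashlib
from dataclasses import dataclass


def content_hash(text: str) -> str:
    """Stable fingerprint of the source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fake_embed(text: str, model_id: str) -> list[float]:
    # Toy stand-in for a real embedding call (assumption): the output is
    # deterministic and depends on both the text and the model.
    digest = hashlib.sha256(f"{model_id}:{text}".encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]


@dataclass
class IndexEntry:
    record_id: str
    source_hash: str  # fingerprint of the text that was embedded
    model_id: str     # model that produced the vector
    vector: list[float]


def build_entry(record_id: str, text: str, model_id: str) -> IndexEntry:
    return IndexEntry(record_id, content_hash(text), model_id,
                      fake_embed(text, model_id))


def is_stale(entry: IndexEntry, current_text: str, current_model_id: str) -> bool:
    """The cached vector is wrong if either upstream input has changed."""
    return (entry.source_hash != content_hash(current_text)
            or entry.model_id != current_model_id)


entry = build_entry("doc-1", "Refunds within 30 days.", "model-a")
unchanged = is_stale(entry, "Refunds within 30 days.", "model-a")
doc_edited = is_stale(entry, "Refunds within 14 days.", "model-a")
model_swapped = is_stale(entry, "Refunds within 30 days.", "model-b")
```

With a check like this, a changed document or a swapped model stops being a silent failure: the entry is detectably out of date and can be re-embedded from the source.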

We see the symptoms in production every month. Stale answers after a doc is updated. Phantom citations to chunks that no longer exist. Permission leaks where an agent surfaces text it should not have. Slow re-indexes that block every model swap. Each looks like a different bug. They are the same bug: the cache is acting as the source.

The fix is structural. Hold the truth in a system of record the company already trusts: a Postgres warehouse, the CRM, an object store with proper ACLs. Make the vector index a derived view with a clean rebuild path. Treat embedding as a function from records to vectors, not as a one-way upload. When you swap models, you rebuild the view. When a record is deleted, the view drops the row. When a permission changes, the query enforces it before retrieval, not after.
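The shape described above can be sketched in a few lines. This is a hedged illustration, not a production design: `Record`, `DerivedIndex`, `toy_embed`, and `retrieve` are hypothetical names, a plain dict stands in for the system of record, and the embedding is a toy vowel count. The point it shows is that the index is rebuilt from records, deletes propagate to it, and permissions are read from the system of record to narrow the candidate set before any similarity ranking happens.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Record:
    record_id: str
    text: str
    allowed_groups: frozenset[str]


def toy_embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model (assumption): vowel counts.
    lowered = text.lower()
    return [float(lowered.count(c)) for c in "aeiou"]


class DerivedIndex:
    """A disposable view: everything in it can be rebuilt from records."""

    def __init__(self, embed: Callable[[str], list[float]]):
        self._embed = embed
        self._rows: dict[str, list[float]] = {}

    def rebuild(self, records: Iterable[Record]) -> None:
        # Swapping models means constructing a new index and rebuilding.
        self._rows = {r.record_id: self._embed(r.text) for r in records}

    def delete(self, record_id: str) -> None:
        self._rows.pop(record_id, None)

    def search(self, query: str, candidate_ids: set[str], k: int) -> list[str]:
        q = self._embed(query)
        scored = [(sum(x * y for x, y in zip(q, vec)), rid)
                  for rid, vec in self._rows.items() if rid in candidate_ids]
        scored.sort(reverse=True)
        return [rid for _, rid in scored[:k]]


def retrieve(index: DerivedIndex, records: dict[str, Record],
             user_groups: set[str], query: str, k: int = 3) -> list[str]:
    # Permissions come from the system of record at query time, so a
    # permission change takes effect without touching the index.
    visible = {rid for rid, r in records.items()
               if r.allowed_groups & user_groups}
    return index.search(query, visible, k)


records = {
    "a": Record("a", "pricing policy", frozenset({"sales"})),
    "b": Record("b", "salary bands", frozenset({"hr"})),
}
index = DerivedIndex(toy_embed)
index.rebuild(records.values())
sales_hits = retrieve(index, records, {"sales"}, "pricing")
```

Here a sales user can only ever rank record `a`, because record `b` is excluded before scoring rather than filtered out of the results afterwards.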

This is the unglamorous half of the AI Operating System. Layer 1, the context and data layer, sits underneath every agent in the stack. Get it wrong and every layer above inherits the rot. Get it right and the rest of the build is a series of small, swappable components on a foundation that holds.

If your team is debating which vector database to standardize on, you are debating the wrong thing. Pick the system of record first. The index is whatever fits behind it.

More like this in AI OS by Shrey.

Field notes on AI Operating Systems, agent infrastructure, and B2B operations. Posted when there is something to say.

2026-04-12 · Architecture
