Why Your RAG Will Fail Enterprise Security Review
You built the RAG. Parsing works. Chunking looks good. Retrieval returns relevant results. The demo goes well.
Then it hits security review.
- Who can access what documents through this system?
- If someone’s permissions change in SharePoint, how long until the RAG reflects that?
- What stops User A from seeing documents they’re not authorized to see?
And suddenly the project that was two weeks from production is back on the whiteboard.
The gap nobody warns you about
Permissions exist at the source. Your S3 bucket has folder-level access controls. Your SharePoint has groups. Your Confluence has spaces with different visibility. These controls exist for a reason, and your organization’s security team expects them to be respected.
But somewhere between the source and the vector database, those permissions disappear. The chunking step doesn’t know about them. The embedding step definitely doesn’t. By the time your documents are sitting in Pinecone or PgVector, they’re just vectors with maybe a filename attached.
Now you have two bad options: retrofit permissions onto a system that wasn’t designed for them, or rebuild from scratch.
Where it actually breaks
The failure mode isn’t dramatic. It’s subtle.
Someone in engineering asks the RAG about project timelines. The retrieval pulls chunks from a finance document that mentions the same project. The engineer sees revenue projections they shouldn’t have access to. Nobody notices until an audit.
Or permissions change. Someone moves teams. In the source system, their access updates immediately. In the RAG, the old permissions are baked into metadata that was set during ingestion three months ago. The vector database has no idea anything changed.
The security team’s concern isn’t hypothetical. They’ve seen this pattern before with other systems. Data gets copied somewhere, permissions don’t follow, and eventually something leaks.
What practitioners are actually doing
I’ve been following discussions in RAG communities, and the teams running this in production have converged on a few patterns.
The first insight is that authorization can’t happen at the LLM layer. The model should never decide what it’s allowed to see. By the time the LLM receives context, that context should already be filtered to only include authorized content.
The second insight is about where filtering happens. Some teams filter after retrieval. Get the top results, then remove unauthorized ones before sending to the model. This works but has a problem: if you retrieve ten chunks and eight get filtered out, your context window is starved. Your retrieval was tuned for ten chunks of context. Now the model has two, and the answer might have been in chunk eleven, which you never fetched.
The better pattern is pre-filtering. Before the vector search even runs, determine which documents the user can access, then constrain the search to only those documents. You always get a full context window of authorized content.
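The difference between the two approaches fits in a few lines. Here's a minimal sketch of pre-filtered retrieval; the permission map and vector index are hypothetical in-memory stand-ins, and in a real system the allowed-document set would be passed to Pinecone or PgVector as a metadata filter on the query itself.

```python
# Hypothetical permission map: which document IDs each user may read.
# In production this would come from the authorization layer, not a dict.
AUTHORIZED_DOCS = {
    "alice": {"doc-eng-1", "doc-eng-2"},
    "bob": {"doc-fin-1"},
}

# Stand-in vector index: (chunk_text, document_id, relevance_score).
INDEX = [
    ("Q3 timeline slips two weeks", "doc-eng-1", 0.92),
    ("Revenue projection: $4.2M", "doc-fin-1", 0.90),
    ("Sprint retro notes", "doc-eng-2", 0.71),
]

def retrieve(user: str, top_k: int = 2) -> list[str]:
    # Step 1: resolve authorized document IDs BEFORE searching.
    allowed = AUTHORIZED_DOCS.get(user, set())
    # Step 2: constrain the search to authorized documents only, so the
    # full top_k budget is spent on content the user is allowed to see.
    candidates = [(text, score) for text, doc_id, score in INDEX
                  if doc_id in allowed]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in candidates[:top_k]]

print(retrieve("alice"))  # ['Q3 timeline slips two weeks', 'Sprint retro notes']
```

Post-filtering would instead take the global top `top_k` and then drop unauthorized hits, which is exactly how you end up with a starved context window.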
The third insight is architectural. Teams that handle this well keep their vector store “dumb” about permissions. Chunks carry a document ID, nothing more. The actual permission logic lives in a separate layer, often a graph database that models the relationships between users, groups, and documents. When permissions change at the source, you update one edge in the graph. No re-indexing required.
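A toy version of that separation, with all names illustrative: chunks carry only a document ID, and a small graph of user-to-group and group-to-document edges answers the authorization question at query time. A permission change is an edge update, not a re-index.

```python
from collections import defaultdict

class PermissionGraph:
    """Minimal sketch of a permission layer kept outside the vector store."""

    def __init__(self):
        self.member_of = defaultdict(set)   # user -> groups
        self.can_read = defaultdict(set)    # group -> document IDs

    def allowed_docs(self, user: str) -> set[str]:
        # Walk user -> groups -> documents when the query arrives.
        return {doc for group in self.member_of[user]
                    for doc in self.can_read[group]}

graph = PermissionGraph()
graph.member_of["alice"].add("engineering")
graph.can_read["engineering"].update({"doc-eng-1", "doc-eng-2"})
graph.can_read["finance"].add("doc-fin-1")

# Alice moves teams: update her membership edges in the graph.
# No chunk metadata changes, no re-indexing of the vector store.
graph.member_of["alice"] = {"finance"}
print(graph.allowed_docs("alice"))  # {'doc-fin-1'}
```

Teams doing this at scale often reach for a real graph database rather than dicts, but the shape of the solution is the same: the vector store never learns who can read what.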
The preprocessing connection
Here’s the thing that gets missed: this is a preprocessing problem as much as a retrieval problem.
If your chunks don’t carry a stable document ID through every transformation, you can’t link them back to permissions later. If your ingestion pipeline strips source metadata, you’ve lost the information you’d need to reconstruct access controls. If your chunking strategy splits a document without preserving which chunks came from where, you’re building on sand.
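Concretely, "carrying a stable ID" just means every chunk keeps its document ID and source path as metadata through the split. The chunker below is deliberately naive (fixed-width character splits, hypothetical field names); the point is what survives, not how the text is divided.

```python
def chunk_document(doc_id: str, source: str, text: str,
                   chunk_size: int = 40) -> list[dict]:
    """Split text into chunks, each traceable back to its source document."""
    chunks = []
    for i in range(0, len(text), chunk_size):
        chunks.append({
            "document_id": doc_id,          # stable ID, survives every step
            "source": source,               # e.g. the SharePoint/S3 path
            "chunk_index": i // chunk_size, # position within the document
            "text": text[i:i + chunk_size],
        })
    return chunks

chunks = chunk_document("doc-fin-1", "sharepoint://finance/q3.docx",
                        "Revenue projections for Q3 are confidential.")
assert all(c["document_id"] == "doc-fin-1" for c in chunks)
```

If every downstream transformation preserves `document_id`, the permission layer can always answer "may this user see this chunk?" by asking about the document instead.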
The vector database can’t fix what preprocessing broke.
This is the kind of upstream problem I think about in what I’m building at VectorFlow. Not the authorization layer itself; that’s a different system. But making sure the metadata that authorization depends on actually survives the journey from raw document to embedded chunk. You can’t secure what you can’t trace back to its source.
The question to ask earlier
Before your RAG gets anywhere near production, ask: if permissions change tomorrow, what breaks?
If the answer involves re-indexing your entire vector store, you have a design problem. If the answer is “we haven’t thought about that,” you have a project risk.
The security review isn’t trying to kill your project. They’re asking questions that will come up eventually. Better to have answers before the demo than after.