Why I'm Building VectorFlow
Everyone wants RAG. Few know how to build the pipeline that powers it.
The discovery that led here
In my last weeks at Airbyte, I spent a lot of time on discovery for AI use cases. Interviews with users, market research, the usual. A pattern kept emerging: companies had troves of unstructured data they wanted to tap into. The means was almost always RAG. They wanted a chatbot that could actually use their documents.
But they kept hitting roadblocks. And often, they couldn’t even articulate what was blocking them beyond the fact that they were stuck.
Where people get stuck
Some didn’t know where to start. They had unstructured data. They knew they needed a chatbot. Their first instinct was to move data from point A to point B. After that? Blank. Maybe a vector database. That’s it.
Others had a working RAG setup, but it wasn’t giving them the results they wanted. They were iterating, but each iteration was painful. The feedback loop was slow. Time to failure was high.
A subset of these people used tools like n8n to build end-to-end RAG setups, both the processing pipeline and the RAG itself. It made things easier, but with serious tradeoffs. One person I spoke with had built prototypes this way. n8n handled parsing behind the scenes, probably using Unstructured or something similar. They did chunking and embedding themselves. The abstraction got them to a working prototype, but when responses came back wrong, they had no way to debug it. They couldn’t see what was happening in parsing. Moving beyond the prototype stage felt unrealistic. They wanted more control. They wanted ways to manipulate and optimize the pipeline without losing visibility.
The same problem, twenty times over
When I started researching more broadly, the pattern was everywhere. People understood the general steps of a RAG pipeline. They knew the theory. But at some point, it becomes obvious: you can build twenty different RAGs and still experience the same problems. If your processing pipeline isn’t good, your RAG won’t be good.
People had played with complicated tools. Some succeeded, some didn’t. But the time it took to fail and learn from that failure was enormous. The frustration was palpable at every stage: understanding which steps are required, picking the right tools, figuring out what works for their specific data.
A familiar pattern
This space reminded me of something. Half a decade ago, maybe more, data integration looked a lot like this. Everything was manual. Existing players had painful solutions. The modern data stack changed that. Components became commodified. What used to take teams and years could now be done by a few individuals in months, sometimes weeks.
RAG pipelines need the same thing. Something that gives you enough visibility into what you’re doing without a slow time to value. Fail fast, learn quickly, produce value in a short amount of time.
VectorFlow
So I’m building VectorFlow. It’s a no-code tool that turns documents into vectors through conversation. It handles parsing, chunking, metadata extraction, and embedding, then loads everything into your vector database.
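To make those stages concrete, here’s a minimal sketch of what such a pipeline looks like in plain Python. Everything in it is illustrative: the parse, embed, and load functions are stand-in placeholders I wrote for this post, not VectorFlow’s actual API, and the chunking is the crudest possible strategy.

```python
# Illustrative only: the stages below (parse -> chunk -> attach metadata ->
# embed -> load) mirror the steps described above. The parse/embed/load
# functions are placeholders, not VectorFlow's actual implementation.
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    metadata: dict
    embedding: list[float] | None = None


def parse_document(path: str) -> str:
    # Placeholder parser. Real pipelines need PDF/HTML/Office parsing here,
    # and this is exactly the step people told me they couldn't see into.
    with open(path, encoding="utf-8") as f:
        return f.read()


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    # Simplest possible strategy: fixed-size character windows with overlap.
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks


def embed_texts(texts: list[str]) -> list[list[float]]:
    # Placeholder embedding. Swap in a real model, hosted or local.
    return [[float(len(t))] for t in texts]


def load_into_vector_db(chunks: list[Chunk]) -> None:
    # Placeholder loader. In practice this upserts into your vector database.
    print(f"Would upsert {len(chunks)} chunks")


def run_pipeline(path: str) -> list[Chunk]:
    raw_text = parse_document(path)
    pieces = chunk_text(raw_text)
    chunks = [
        Chunk(text=piece, metadata={"source": path, "position": i})
        for i, piece in enumerate(pieces)
    ]
    for chunk, vector in zip(chunks, embed_texts([c.text for c in chunks])):
        chunk.embedding = vector
    load_into_vector_db(chunks)
    return chunks
```

Even in this toy form, every placeholder is a decision point, and each one is a place where the people I spoke with got stuck or lost visibility. That’s the part VectorFlow is meant to take care of.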
It’s scary to put this out there. But it’s a problem I’m passionate about, and I want to see if I can help people who are stuck in this exact situation.