Why I'm Building VectorFlow
Everyone wants RAG. Few know how to build the pipeline that powers it.
The discovery that led here
In my last weeks at Airbyte, I spent a lot of time on discovery for AI use cases. Interviews with users, market research, the usual. A pattern kept emerging: companies had troves of unstructured data they wanted to tap into. The means was almost always RAG. They wanted a chatbot that could actually use their documents.
But they kept hitting roadblocks. And often, they couldn’t even articulate what was blocking them beyond the fact that they were stuck.
Where people get stuck
Some didn’t know where to start. They had unstructured data. They knew they needed a chatbot. Their first instinct was to move data from point A to point B. After that? Blank. Maybe a vector database. That’s it.
Others had a working RAG setup, but it wasn’t giving them the results they wanted. They were iterating, but each iteration was painful. The feedback loop was slow. Time to failure was high.
A subset of these people used tools like n8n to build end-to-end RAG setups, both the processing pipeline and the RAG itself. It made things easier, but with serious tradeoffs. One person I spoke with had built prototypes this way. n8n handled parsing behind the scenes, probably using Unstructured or something similar. They did chunking and embedding themselves. The abstraction got them to a working prototype, but when responses came back wrong, they had no way to debug it. They couldn’t see what was happening in parsing. Moving beyond the prototype stage felt unrealistic. They wanted more control. They wanted ways to manipulate and optimize the pipeline without losing visibility.
The same problem, twenty times over
When I started researching more broadly, the pattern was everywhere. People understood the general steps of a RAG pipeline. They knew the theory. But at some point, it becomes obvious: you can build twenty different RAGs and still experience the same problems. If your processing pipeline isn’t good, your RAG won’t be good.
People had played with complicated tools. Some succeeded, some didn’t. But the time it took to fail and learn from that failure was enormous. The frustration was palpable at every stage: understanding which steps are required, picking the right tools, figuring out what works for their specific data.
A familiar pattern
This space reminded me of something. Half a decade ago, maybe more, data integration looked a lot like this. Everything was manual. Existing players had painful solutions. The modern data stack changed that. Components became commodified. What used to take teams and years could now be done by a few individuals in months, sometimes weeks.
RAG pipelines need the same thing. Something that gives you enough visibility into what you’re doing without a slow time to value. Fail fast, learn quickly, produce value in a short amount of time.
VectorFlow
So I’m building VectorFlow. It’s a no-code tool that turns documents into vectors through conversation. It handles parsing, chunking, metadata extraction, and embedding, then loads everything into your vector database.
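To make those stages concrete, here’s a minimal sketch of what such a pipeline looks like in plain Python. Everything in it is illustrative: the parse, embed, and load functions are stand-in placeholders I wrote for this post, not VectorFlow’s actual API, and the chunking is the crudest possible strategy.

```python
# Illustrative only: the stages below (parse -> chunk -> attach metadata ->
# embed -> load) mirror the steps described above. The parse/embed/load
# functions are placeholders, not VectorFlow's actual implementation.
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    metadata: dict
    embedding: list[float] | None = None


def parse_document(path: str) -> str:
    # Placeholder parser. Real pipelines need PDF/HTML/Office parsing here,
    # and this is exactly the step people told me they couldn't see into.
    with open(path, encoding="utf-8") as f:
        return f.read()


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    # Simplest possible strategy: fixed-size character windows with overlap.
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks


def embed_texts(texts: list[str]) -> list[list[float]]:
    # Placeholder embedding. Swap in a real model, hosted or local.
    return [[float(len(t))] for t in texts]


def load_into_vector_db(chunks: list[Chunk]) -> None:
    # Placeholder loader. In practice this upserts into your vector database.
    print(f"Would upsert {len(chunks)} chunks")


def run_pipeline(path: str) -> list[Chunk]:
    raw_text = parse_document(path)
    pieces = chunk_text(raw_text)
    chunks = [
        Chunk(text=piece, metadata={"source": path, "position": i})
        for i, piece in enumerate(pieces)
    ]
    for chunk, vector in zip(chunks, embed_texts([c.text for c in chunks])):
        chunk.embedding = vector
    load_into_vector_db(chunks)
    return chunks
```

Even in this toy form, every placeholder is a decision point, and each one is a place where the people I spoke with got stuck or lost visibility. That’s the part VectorFlow is meant to take care of.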
It’s scary to put this out there. But it’s a problem I’m passionate about, and I want to see if I can help people who are stuck in this exact situation.