Stitch Blog

Introducing the Stitch Swarm

Stitch
Company · September 30, 2025 · 5 min read

Our data engine: thousands of human-like agents gathering candidate data across the live internet.

Romain Rey

Co-founder & CTO

Everything Stitch does starts with data. Before you can score a candidate or reach out to them, you have to find them, and the people worth hiring are scattered across the internet rather than sitting in any one database. The Stitch Swarm is how we find them.

Imagine one person searching the internet for candidates and gathering data by hand. Now multiply that by thousands: thousands of people browsing the internet the way a human would, each collecting data. That is roughly what the Stitch Swarm does.

What it is

The Stitch Swarm is our data collection engine: a multi-agent swarm of thousands of agents working in parallel, continuously collecting candidate data across the internet. The agents behave like humans rather than like classic scrapers, which is what lets them gather data at this scale and keep it fresh. The result is a deep base of candidate data that is continuously refreshed, so we are never starting from zero.

That base is only the starting point. When you run a search, the swarm does not just return what is already in it. It goes out again, live, and gathers data for your specific role from the sources that fit it. The base gives each search its direction; the live pass means every search is collected fresh, with its own sources. That is why the data lands after about 12 hours rather than instantly, and why no two searches are the same.

Scaling it to work anywhere

We did not start here. The first version of the swarm ran only in Seattle, and a single search took about two weeks to gather its data. Since then we have scaled it to run anywhere, for any role, and to return data in about 12 hours.

That is a hard engineering problem, and it is expensive to run at this scale. We keep it economical through a collaboration with Google, whose infrastructure operates at a scale almost no one else can match. My years at Google, and the relationships I kept there, are a big part of why that collaboration exists.

Where the data lives

There is no single source we primarily rely on. Each candidate's data is gathered from its own unique place, and different candidates are built from different sources. The swarm pulls from wherever a given person's footprint actually lives.

A good example is lawyers. Lawyers often have very sparse profiles on professional networking sites, with little detail about what they have actually worked on. Their firm's own website tends to hold far richer information, so for these candidates the swarm collects from sources like law firm websites instead. Engineers leave their best trail in source code repositories. Different candidates are best served by different sources, and the swarm goes wherever the real data lives. Because we are not dependent on any one source, no single platform change can cut off our view.
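The routing idea above can be sketched in a few lines. This is a minimal illustration, not how the swarm is actually implemented: the table, the source labels, and the `sources_for` helper are all hypothetical names, and only the lawyer and engineer preferences come from the examples in this post. The fallback list is an assumption made so the sketch has something to return for roles it does not know.

```python
# Hypothetical role-to-source preference table (illustrative only).
# The lawyer and engineer rows mirror the examples in the text;
# the source labels themselves are made-up identifiers.
SOURCE_PREFERENCES = {
    "lawyer": ["law_firm_website", "professional_network"],
    "engineer": ["source_code_repository", "professional_network"],
}

# Assumed fallback for roles with no entry in the table.
DEFAULT_SOURCES = ["professional_network"]


def sources_for(role_family: str) -> list[str]:
    """Return an ordered list of source types to try for a role family."""
    key = role_family.strip().lower()
    # Return a copy so callers cannot mutate the shared table.
    return list(SOURCE_PREFERENCES.get(key, DEFAULT_SOURCES))


if __name__ == "__main__":
    for role in ("Lawyer", "Engineer", "Chef"):
        print(role, "->", sources_for(role))
```

A static table is the simplest possible form of this idea. A real system would presumably choose sources per candidate rather than per role family, as the text describes.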

Why it's hard to copy

The idea is easy to describe and very hard to build. Running thousands of autonomous agents at once is a coordination problem. The agents have to be orchestrated so they do not collide or redo each other's work, and each has to be routed to the right source. The information they bring back has to be validated and verified rather than trusted blindly. And all of it has to be resolved to the right unique candidate, since the same person shows up across many sources under slightly different details. Doing all of that while holding quality and staying cheap enough to run on every role is a hard distributed-systems problem. The model is the easy part. Anyone can wrap a language model; almost no one can run a swarm like this at scale.
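To make the last of those problems concrete, here is a deliberately naive sketch of resolving records from different sources to one candidate. It is not Stitch's method: the `Record` type, its fields, and the matching rule (same normalized name and employer) are assumptions chosen for illustration, and a rule this simple would both over-merge and under-merge on real data.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One observation of a person from one source (hypothetical shape)."""
    source: str
    name: str
    employer: str


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    stripped = re.sub(r"[^\w\s]", "", text.casefold())
    return " ".join(stripped.split())


def resolve(records: list[Record]) -> dict[tuple[str, str], list[Record]]:
    """Group records that share a normalized (name, employer) key."""
    groups: dict[tuple[str, str], list[Record]] = defaultdict(list)
    for record in records:
        key = (normalize(record.name), normalize(record.employer))
        groups[key].append(record)
    return dict(groups)


if __name__ == "__main__":
    observed = [
        Record("law_firm_website", "Jane A. Doe", "Smith & Co."),
        Record("professional_network", "jane a doe", "Smith & Co"),
        Record("source_code_repository", "John Roe", "Acme"),
    ]
    for key, group in resolve(observed).items():
        print(key, [r.source for r in group])
```

Exact matching on a normalized key already fails for nicknames, job changes, and common names, which is part of why the text calls resolution hard.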

We have a method for orchestrating the swarm at this scale that we have not seen anywhere else. It comes out of years building large-scale machine learning systems at Microsoft and Google, the same background reflected in the 17 AI patents Alex and I hold between us. That combination, the method and the infrastructure to run it economically, is what makes the swarm hard to copy.

Why it's possible now

This is only possible now because AI has become cheap and efficient enough, and agentic models good enough, to actually go out and do this. The same swarm a few years ago would have been impossible to build and impossible to afford. The technology only just caught up to the idea.

The Swarm is why Stitch searches live instead of querying a pre-built index. We unpack that difference in Stitch vs SeekOut, or you can start a 14-day trial and see what it surfaces for your roles.

See it on your own roles

Start a 14-day trial and see real candidates booked on your calendar before you decide. Most customers only pay on a successful hire.
