Joe Archondis
July 5, 2026 · 9 min read
Business AI Automation
Building an Autonomous B2B Lead Generation Agent
The first version of my B2B lead gen agent had five stages: research, enrichment, qualification, copywriting, and sending — each handled by a separate agent. It processed one prospect in 22 seconds and cost $0.04 per contact. The rewrite had two agents. 8 seconds per prospect. 60% cheaper. The lesson wasn't about AI. It was about where delegation actually helps versus where it just adds failure modes between stages.
Here's the full system I'd build today from scratch, what went wrong in version 1, and the numbers from a real client deployment.
Why Manual B2B Prospecting Fails at Scale
A solid SDR can research and personalize 20 to 30 outreach messages per day. That's not 20 conversations — it's 20 people contacted. Most won't reply. Cold outreach response rates hover around 2 to 5% when targeting and copy are good. To get 2 qualified replies per day, you need 40 to 100 touches. Running that consistently, five days a week, at real personalization quality is a volume problem that humans can't solve economically.
The consistency problem is less obvious. Manual outreach quality varies. It depends on who's doing it that day, how many they've already sent, whether they actually read the LinkedIn profile or skimmed it. An agent applies the same qualification criteria and the same personalization logic to every prospect. The 1,000th message gets the same quality check as the first.
Volume is the multiplier. An agent processing 200 prospects per day, five days a week, produces 4,000 touches per month. At a 3% response rate, that's 120 qualified replies. A single SDR hitting 30 per day gives you 600 touches and 18 replies. That gap compounds.
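The volume math in the last two paragraphs is easy to check in a few lines. This is a minimal sketch. It assumes a four-week month, which is what the 4,000 and 600 figures imply, and it counts each processed prospect as one touch, as the figures above do. The function names are illustrative.

```python
import math


def touches_needed(target_replies: float, response_rate: float) -> int:
    """Outreach messages required to expect `target_replies` at `response_rate`."""
    # Round away float noise (e.g. 40.000000000000004) before taking the ceiling.
    return math.ceil(round(target_replies / response_rate, 6))


def monthly_touches(per_day: int, days_per_week: int = 5, weeks: int = 4) -> int:
    """Touches per month, assuming a four-week month."""
    return per_day * days_per_week * weeks


def expected_replies(touches: int, response_rate: float) -> int:
    """Expected replies, rounded to a whole number."""
    return round(touches * response_rate)


if __name__ == "__main__":
    # 2 replies per day at a 5% and a 2% response rate
    print(touches_needed(2, 0.05), touches_needed(2, 0.02))  # 40 100
    for label, per_day in [("agent", 200), ("SDR", 30)]:
        t = monthly_touches(per_day)
        print(label, t, expected_replies(t, 0.03))  # agent 4000 120 / SDR 600 18
```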
What the Agent Actually Does
The pipeline does three jobs, split across two agents: sourcing, qualification, and outreach generation. Sourcing and qualification run inside Agent 1 (Stage 1 below). Outreach generation is Agent 2 (Stage 2). The split sits at that point because qualification output is structured data that Agent 2 reads, not a freeform summary it has to re-interpret.
Stage 1: Prospect and Qualify
Agent 1 pulls contacts from Apollo.io or LinkedIn Sales Navigator: name, title, company, LinkedIn URL, and email if available. It fills in missing emails via Hunter.io and pulls company-level data from Clearbit, including headcount, industry, and tech stack signals. For ICP matching that depends on what a company uses, BuiltWith data or job post analysis fills the gap.
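Here is a minimal sketch of the enrichment step. It assumes each data source is wrapped in a small adapter function. The adapters, the `Prospect` fields, and the returned keys (`email`, `company`) are hypothetical, and no Hunter.io, Clearbit, or BuiltWith API calls are shown. The sources run in parallel threads. A source that raises is reported back to the caller instead of being dropped silently.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Prospect:
    name: str
    title: str
    company: str
    linkedin_url: str
    email: Optional[str] = None
    company_data: dict = field(default_factory=dict)


# Hypothetical adapter type: takes a prospect, returns whatever it found,
# e.g. {"email": ...} or {"company": {"headcount": ..., "industry": ...}}.
Source = Callable[[Prospect], dict]


def enrich(prospect: Prospect, sources: dict[str, Source]) -> tuple[Prospect, list[str]]:
    """Query all sources in parallel; never overwrite data the prospect already has."""
    results: dict[str, dict] = {}
    failed: list[str] = []
    with ThreadPoolExecutor(max_workers=max(len(sources), 1)) as pool:
        futures = {name: pool.submit(fn, prospect) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                failed.append(name)  # surfaced to the caller, not swallowed

    email = prospect.email
    company_data = dict(prospect.company_data)
    # Earlier sources in `sources` take precedence when two fill the same field.
    for data in results.values():
        if email is None and data.get("email"):
            email = data["email"]
        for key, value in data.get("company", {}).items():
            company_data.setdefault(key, value)
    return replace(prospect, email=email, company_data=company_data), failed


if __name__ == "__main__":
    p = Prospect("Jane Doe", "CTO", "Acme", "https://www.linkedin.com/in/example")
    stubs = {
        "email_finder": lambda _: {"email": "jane@example.com"},
        "company_lookup": lambda _: {"company": {"headcount": 120, "industry": "SaaS"}},
    }
    print(enrich(p, stubs))
```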
Then it runs each contact through a qualification rubric. The rubric is the founder's ICP, encoded: company size range, industry, hiring signals, tech stack presence, recent funding. Agent 1 reads the company website and recent LinkedIn posts, scores the prospect 0 to 1, and writes a one-sentence personalization hook if they pass. Contacts below the threshold get filtered with a reason logged. No message is ever written for a filtered contact.
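In the system described here, Agent 1 (the model) does the scoring. To show what "the founder's ICP, encoded" can look like, here is a deterministic sketch of a weighted rubric with a threshold and a logged filter reason. The criteria, weights, and 0.6 threshold are hypothetical placeholders, not the client's rubric. A rule-based pass like this could also work as a cheap pre-filter before the LLM call, but that is an assumption, not part of the setup above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: int  # integer weights keep the arithmetic exact
    check: Callable[[dict], bool]


@dataclass(frozen=True)
class Qualification:
    icp_score: float  # 0.0 to 1.0
    passed: bool
    filter_reason: Optional[str]  # None when the prospect passes


# Hypothetical rubric: size range, industry, hiring, tech stack, recent funding.
RUBRIC = [
    Criterion("company_size", 3, lambda c: 50 <= c.get("headcount", 0) <= 500),
    Criterion("industry", 2, lambda c: c.get("industry") == "SaaS"),
    Criterion("hiring", 2, lambda c: c.get("open_eng_roles", 0) > 0),
    Criterion("tech_stack", 2, lambda c: "aws" in c.get("tech", ())),
    Criterion("recent_funding", 1, lambda c: c.get("months_since_funding", 99) <= 12),
]


def qualify(company: dict, rubric: list[Criterion] = RUBRIC, threshold: float = 0.6) -> Qualification:
    met = {c.name: c.check(company) for c in rubric}
    total = sum(c.weight for c in rubric)
    score = sum(c.weight for c in rubric if met[c.name]) / total
    if score >= threshold:
        return Qualification(score, True, None)
    # Filtered contacts keep a reason, so no message is written and the cause is logged.
    missed = ", ".join(name for name, ok in met.items() if not ok)
    return Qualification(score, False, f"score {score:.2f} below {threshold}; missed: {missed}")
```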
Typical pass rate: 25 to 35% of sourced contacts. Out of 200 pulled, 50 to 70 make it to Agent 2.
Stage 2: Copywrite
Agent 2 reads Agent 1's structured output — ICP score, personalization hook, company context — and writes the outreach message. The structure is fixed: personalized opener specific to this prospect, a fixed mid-section covering the value prop, and a CTA. The personalized opener is what Agent 1 already identified: a recent post they made, a job opening signaling a relevant pain point, or a specific company milestone. Not "I noticed you're at Acme." Specific.
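The fixed structure can be enforced in code, so the model only supplies the part that varies. This is a minimal sketch. It assumes the founder writes the value-prop and CTA text once, and that the opener arrives as a string. Using Agent 1's hook directly as the opener is a simplification. The names and the bracketed placeholder copy are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Draft:
    first_name: str
    opener: str  # the only prospect-specific part, written from Agent 1's hook


# Hypothetical fixed copy, written once by the founder.
VALUE_PROP = "<fixed value-prop paragraph>"
CTA = "<fixed call to action>"

# Openers that signal no real personalization happened.
GENERIC_OPENERS = ("i noticed you're at", "hope you're well")


def assemble(draft: Draft, value_prop: str = VALUE_PROP, cta: str = CTA) -> str:
    """Opener, fixed mid-section, CTA, in that order."""
    opener = draft.opener.strip()
    if not opener:
        raise ValueError("missing opener: nothing to personalize")
    if opener.lower().startswith(GENERIC_OPENERS):
        raise ValueError("generic opener rejected: " + opener)
    return "\n\n".join([f"Hi {draft.first_name},", opener, value_prop, cta])
```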
Each message, with ICP score and qualification notes, queues in the database for review. The founder sees a Telegram message or a Google Sheet row: the prospect, the draft, the score, one tap to approve or reject.
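The review queue is a small state machine. Every draft starts as pending and is approved or rejected exactly once. This minimal sketch uses Python's built-in sqlite3 so it runs standalone, while the cost table lists PostgreSQL for production. The table and column names are hypothetical, and the Telegram or Google Sheets notification is not shown.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS drafts (
    id        INTEGER PRIMARY KEY,
    prospect  TEXT NOT NULL,
    icp_score REAL NOT NULL,
    notes     TEXT,
    draft     TEXT NOT NULL,
    status    TEXT NOT NULL DEFAULT 'pending'
              CHECK (status IN ('pending', 'approved', 'rejected'))
)
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def enqueue(db: sqlite3.Connection, prospect: str, icp_score: float, notes: str, draft: str) -> int:
    cur = db.execute(
        "INSERT INTO drafts (prospect, icp_score, notes, draft) VALUES (?, ?, ?, ?)",
        (prospect, icp_score, notes, draft),
    )
    db.commit()
    return cur.lastrowid


def pending(db: sqlite3.Connection) -> list:
    # Highest-scoring drafts first, so the best prospects get reviewed soonest.
    return db.execute(
        "SELECT id, prospect, icp_score, draft FROM drafts "
        "WHERE status = 'pending' ORDER BY icp_score DESC"
    ).fetchall()


def decide(db: sqlite3.Connection, draft_id: int, approve: bool) -> bool:
    """One tap: approve or reject. Returns False if the draft was already decided."""
    cur = db.execute(
        "UPDATE drafts SET status = ? WHERE id = ? AND status = 'pending'",
        ("approved" if approve else "rejected", draft_id),
    )
    db.commit()
    return cur.rowcount == 1
```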
Version 1 vs. Version 2: What Actually Changed
The original five-agent architecture looked clean on paper. Research agent, enrichment agent, qualification agent, copywriter agent, sender. In practice it created three compounding problems.
Token cost compounded with each handoff. Every agent received a full system prompt plus the prior agent's output as context. By agent five, I was burning tokens re-explaining what agents one through four had already processed. Error propagation was brutal — incomplete enrichment data led to a weak qualification score, which led to a weak personalization hook, which led to a message that should never have been written. No guardrails between stages meant failures cascaded silently. Latency was just bad: 22 seconds per prospect across five sequential LLM calls.
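Version 1 was missing a check between stages that stops a bad record before it moves on. This is a minimal sketch of such a guard. It assumes enrichment produces a flat dict, and the required field names are hypothetical. Records that fail are set aside with a reason instead of reaching qualification. That way, incomplete enrichment never turns into a weak hook and then a message.

```python
from typing import Callable, Iterable

# Hypothetical: fields qualification cannot score without.
REQUIRED_FOR_QUALIFICATION = ("email", "headcount", "industry")


class IncompleteEnrichment(ValueError):
    pass


def check_enrichment(record: dict, required=REQUIRED_FOR_QUALIFICATION) -> dict:
    missing = [key for key in required if record.get(key) in (None, "")]
    if missing:
        raise IncompleteEnrichment("missing " + ", ".join(missing))
    return record


def run_stage(records: Iterable[dict], next_stage: Callable[[dict], dict]):
    """Pass only complete records downstream; log the rest with a reason."""
    passed, skipped = [], []
    for record in records:
        try:
            checked = check_enrichment(record)
        except IncompleteEnrichment as exc:
            skipped.append((record.get("name", "?"), str(exc)))
            continue
        passed.append(next_stage(checked))
    return passed, skipped
```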
The fix was collapsing five agents into two, with a typed data contract between them:
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Agent 1 returns a typed object, not a freeform summary.
# Agent 2 reads the schema, not prose it has to re-interpret.
QUALIFY_TOOL = {
    "name": "qualify_prospect",
    "description": "Score a prospect against the ICP rubric.",
    "input_schema": {
        "type": "object",
        "properties": {
            "company_fit": {"type": "boolean"},
            "icp_score": {"type": "number", "minimum": 0, "maximum": 1},
            "personalization_hook": {"type": "string"},  # specific to this prospect
            "filter_reason": {"type": "string"},  # only set when the prospect is filtered out
        },
        "required": ["company_fit", "icp_score", "personalization_hook"],
    },
}

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=512,
    system=QUALIFICATION_SYSTEM_PROMPT,  # the encoded ICP rubric, defined elsewhere
    messages=[{"role": "user", "content": prospect_context}],  # enriched prospect data
    tools=[QUALIFY_TOOL],
    tool_choice={"type": "tool", "name": "qualify_prospect"},  # force the structured call
)

# The structured result arrives as the input of a tool_use content block.
result = next(block.input for block in response.content if block.type == "tool_use")
```

Agent 2 reads icp_score and personalization_hook directly from this object. No re-parsing, no re-interpretation. The handoff format is deterministic.
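The "validated at schema boundary" row in the comparison below can be as simple as a parse step between the two agents. This is a minimal sketch. It assumes it receives the input dict from the tool_use block that the forced tool call returns. The schema steers the model's output, but this sketch does not assume the schema is enforced, so it re-checks types and ranges. The cross-field rules are assumptions drawn from the rubric description, not a documented contract: a passing prospect needs a hook, and a filtered one needs a reason.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualifyResult:
    company_fit: bool
    icp_score: float
    personalization_hook: str
    filter_reason: Optional[str] = None


def parse_qualification(data: dict) -> QualifyResult:
    """Validate Agent 1's output; raise ValueError instead of passing bad data to Agent 2."""
    fit = data.get("company_fit")
    score = data.get("icp_score")
    hook = data.get("personalization_hook")
    reason = data.get("filter_reason")
    if not isinstance(fit, bool):
        raise ValueError("company_fit must be a boolean")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        raise ValueError("icp_score must be a number in [0, 1]")
    if not isinstance(hook, str):
        raise ValueError("personalization_hook must be a string")
    if reason is not None and not isinstance(reason, str):
        raise ValueError("filter_reason must be a string if present")
    if fit and not hook.strip():
        raise ValueError("passing prospect has an empty personalization_hook")
    if not fit and not reason:
        raise ValueError("filtered prospect needs a filter_reason")
    return QualifyResult(fit, float(score), hook, reason)
```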
| Metric | Version 1 (5 agents) | Version 2 (2 agents) |
|---|---|---|
| Latency per prospect | 22 seconds | 8 seconds |
| Cost per prospect | ~$0.04 | ~$0.016 |
| Error recovery | None between stages | Validated at schema boundary |
| Token re-use | Context repeated at each stage | Passed once as structured object |
Results From a Real Deployment
We ran this for a B2B software client targeting mid-market SaaS companies in the $5M to $50M ARR range. ICP: VP of Engineering or CTO, company using AWS, recent engineering hires signaling growth. Three weeks of running:
- 1,840 prospects sourced from Apollo.io over 3 weeks
- 520 qualified (28%) — passed the ICP rubric and reached the copywriting stage
- 1,200 messages sent — the founder reviewed and approved in batches of 50
- 41 replies (3.4% response rate) — in line with well-targeted cold outreach
- 8 meetings booked in three weeks from a standing start
The qualification filter is the most important part of those numbers. Writing outreach for 520 of the 1,840 sourced contacts, instead of all 1,840, meant every message had a real reason to land in that person's inbox. Generic volume kills reply rates. Filtered volume improves them.
When This Makes Sense (and When It Doesn't)
This works well when your ICP is clearly defined and outbound is a real channel for you. If you know exactly who your best customers are, an agent qualifies and personalizes at scale without degrading quality. The automation targets volume and consistency — two things that genuinely break with manual prospecting.
It doesn't replace a founder who's still figuring out their ICP. If you're unsure who your ideal customer is, the agent will efficiently process the wrong people. The qualification criteria have to come from somewhere: past deals, customer interviews, pattern-matched from your best existing accounts. The rubric is the hardest part to build — not the engineering.
It also doesn't work for complex enterprise deals where relationship context dominates. Reaching a Fortune 500 CTO requires timing, network context, and internal political awareness that no agent has access to. Use it for mid-market and below, where volume matters and cold outreach is a legitimate channel.
Cost Breakdown
Monthly infrastructure cost for a pipeline processing 200 prospects per day:
| Cost item | Monthly |
|---|---|
| Claude API (qualification + copywriting) | $45–80 |
| Hunter.io (email verification) | $49 (Starter) |
| Apollo.io (contact sourcing) | $99–149 |
| Cloud Run hosting + PostgreSQL | ~$20 |
| Total | ~$215–300 |
Compare that to an SDR fully loaded at $3,000 to $5,000 per month, reaching 600 contacts per month. The agent hits 4,000 contacts at comparable personalization quality, filters out 70% before any message is written, and runs 24 hours a day. Build time is 3 to 4 weeks. The engineering cost is recovered in the first month you don't need to hire a dedicated SDR.
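The totals and the per-contact comparison are easy to re-derive. This short sketch uses the cost table's ranges and the 4,000 contacts per month from the paragraph. It also checks that 4,000 prospects at the roughly $0.016 per-prospect cost from the version comparison falls inside the table's Claude API range. It uses only figures already stated in this article.

```python
# Monthly cost ranges (low, high) in USD, from the cost table.
COSTS = {
    "Claude API": (45, 80),
    "Hunter.io": (49, 49),
    "Apollo.io": (99, 149),
    "Cloud Run + PostgreSQL": (20, 20),
}


def monthly_total(costs: dict) -> tuple:
    """Sum the low and high ends of every line item."""
    return sum(low for low, _ in costs.values()), sum(high for _, high in costs.values())


def per_contact(monthly_usd: float, contacts: int) -> float:
    """Monthly cost divided by contacts reached, rounded to cents."""
    return round(monthly_usd / contacts, 2)


if __name__ == "__main__":
    low, high = monthly_total(COSTS)
    print(low, high)  # 213 298
    print("agent:", per_contact(low, 4000), per_contact(high, 4000))
    print("SDR:  ", per_contact(3000, 600), per_contact(5000, 600))
    print("API spend at ~$0.016/prospect:", round(4000 * 0.016, 2))
```

The sum lands at $213 to $298, which the table rounds to about $215 to $300.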
The rubric iteration takes the most time. Getting the ICP score to match the founder's actual intuition usually takes 2 to 3 tuning rounds. Run a batch of 100, review the filtered contacts, adjust the criteria, run again. It's calibration work, not engineering work.
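One way to make the calibration loop concrete: have the founder mark each contact in a batch as keep or drop, then pick the cutoff on Agent 1's scores that agrees with those calls most often. This is a minimal sketch. The labels, the function names, and the tie-breaking rule (lowest threshold wins) are assumptions. It only tunes the threshold. Changing the rubric criteria themselves is still manual.

```python
from typing import Optional, Sequence


def agreement(scores: Sequence[float], keep: Sequence[bool], threshold: float) -> float:
    """Fraction of contacts where 'score >= threshold' matches the founder's call."""
    if not scores or len(scores) != len(keep):
        raise ValueError("need one founder label per score")
    hits = sum((s >= threshold) == k for s, k in zip(scores, keep))
    return hits / len(scores)


def best_threshold(scores: Sequence[float], keep: Sequence[bool],
                   candidates: Optional[Sequence[float]] = None) -> float:
    # Candidate cutoffs default to the observed scores. Candidates are sorted
    # ascending and max() keeps the first maximum, so the lowest cutoff wins ties.
    candidates = sorted(candidates if candidates is not None else set(scores))
    return max(candidates, key=lambda t: agreement(scores, keep, t))
```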
Frequently Asked Questions
What data sources does a B2B lead gen agent pull from?
The most reliable setup: Apollo.io or LinkedIn Sales Navigator for contact sourcing, Hunter.io for email verification, Clearbit for company enrichment, and BuiltWith for tech stack signals. The agent queries these in parallel during enrichment. Which sources matter depends on your ICP — if you're targeting by tech stack, BuiltWith is essential. If you're going purely by company size and industry, Apollo alone may be enough.
How do you prevent the agent from sending low-quality messages?
Two layers. First, the qualification agent filters hard — only contacts above the ICP score threshold ever reach the copywriting stage. Second, there's a human review step before sending. The founder sees each message, the personalization hook, and the score before approving. Start with 100% review, then loosen it once you trust the output. Most founders stay at 100% review permanently — it takes 30 to 60 seconds per batch of 20 messages and keeps them in the loop on what's going out.
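Loosening review below 100% can be done deterministically, so a given draft always gets the same decision on retries. This is a minimal sketch. It assumes each draft has a stable ID. The function name and the choice to hash the ID into a sampling bucket are illustrative, not something the setup above prescribes.

```python
import hashlib


def needs_review(draft_id: str, review_rate: float) -> bool:
    """True if this draft should go to the founder at the given review rate (0.0 to 1.0)."""
    if not 0.0 <= review_rate <= 1.0:
        raise ValueError("review_rate must be between 0 and 1")
    if review_rate == 1.0:
        return True  # the starting point: review everything
    digest = hashlib.sha256(draft_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the bucket stays in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < review_rate
```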
Does this violate LinkedIn's terms of service?
Scraping LinkedIn directly does. The production setup uses Apollo.io or Sales Navigator official exports for contact sourcing — they handle data licensing. The agent reads public profile content for personalization context but doesn't scrape at scale. This distinction matters: sourcing contacts and reading a public profile are different activities with different legal standing. If you're unsure, use Apollo as the primary source and skip LinkedIn profile reads entirely.
Can a non-technical founder set this up without a developer?
Not fully. Connecting the APIs, building the qualification logic, and setting up the database and cloud hosting all require engineering. A clean first implementation takes 3 to 4 weeks with a developer who knows the stack. After that, day-to-day operation is non-technical: reviewing and approving drafts, adjusting the ICP rubric, and monitoring reply rates. The founder drives the rubric, and the developer builds the plumbing once.
Working on something similar?
I build AI agents and low-latency systems. If you're trying to solve a version of this, let's talk.
Author: Joe Archondis, AI systems engineer and HFT infrastructure builder.
Last updated: 2026-07-05