Live case study · Multi-agent · EU ↔ Egypt

Bridge Sourcing — multi-agent B2B sourcing across the EU tariff lane

Egypt has a bilateral free-trade agreement with the EU. For many categories the import tariff is 0%. Most EU buyers don't know this, and the ones who do can't find suppliers because Egyptian B2B is offline. Bridge Sourcing is a multi-agent pipeline that finds, qualifies, and opens a conversation with the right suppliers — automatically. The hardest part wasn't the agents. It was the qualification rubric.

The problem

A German buyer looking for a textile supplier has few good options. The big directories are dominated by Asian suppliers, and the Egyptian manufacturers who could compete on price and tariff simply don't show up in a search.

Egypt has real manufacturing capacity in textiles, processed foods, chemicals, and construction materials. It has the 0% EU tariff. What it doesn't have is a well-indexed supplier directory. Most Egyptian B2B websites are static HTML from the early 2010s, Arabic-only, and don't list any machine-readable product data.

Bridge Sourcing closes that gap. The buyer describes what they need ("200k knit T-shirts, cotton 180gsm, OEKO-TEX certified, Germany delivery"). The pipeline finds 20–50 candidate suppliers, scores them against a qualification rubric, and drafts an outreach email that sounds like a human wrote it.
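The example brief above can be captured as a structured-intake record. A minimal sketch, assuming a plain dataclass (the production system uses Pydantic models; these field names are illustrative, not the real schema):

```python
from dataclasses import dataclass, field


@dataclass
class BuyerBrief:
    """Hypothetical structured-intake record for a buyer request."""
    product: str
    quantity: int
    material: str
    certifications: list[str] = field(default_factory=list)
    delivery_country: str = ""


# The worked example from the text: 200k knit T-shirts, cotton 180gsm,
# OEKO-TEX certified, Germany delivery.
brief = BuyerBrief(
    product="knit T-shirts",
    quantity=200_000,
    material="cotton 180gsm",
    certifications=["OEKO-TEX"],
    delivery_country="Germany",
)
```

Everything downstream (category match, certifications match, volume match) scores candidates against fields like these rather than against free text.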

Architecture

Three agents, one state machine, one shared memory.

Buyer brief (structured intake) → Discovery agent (Playwright + LLM extract) → Qualification agent (rubric scorer) → Lead state machine (Postgres · JSON) → Outreach agent (email drafter) → Human approval (review queue) → SMTP + DKIM (SendPulse · signed) → Reply parser (sentiment + intent) · Mnemonic (shared agent memory)
Fig 1 · Bridge Sourcing pipeline · discovery → qualification → outreach · with human approval and shared memory
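The lead state machine at the center of the pipeline can be sketched as a small transition table. State names and transitions here are assumptions for illustration, not the production schema (which lives in Postgres):

```python
from enum import Enum


class LeadState(Enum):
    DISCOVERED = "discovered"
    QUALIFIED = "qualified"
    REVIEW = "review"       # 50-75 rubric score: manual review
    APPROVED = "approved"   # human signed off on the draft
    SENT = "sent"
    REPLIED = "replied"
    DISCARDED = "discarded"


# Allowed transitions (hypothetical). Anything not listed is an illegal jump.
TRANSITIONS = {
    LeadState.DISCOVERED: {LeadState.QUALIFIED, LeadState.DISCARDED},
    LeadState.QUALIFIED: {LeadState.REVIEW, LeadState.APPROVED, LeadState.DISCARDED},
    LeadState.REVIEW: {LeadState.APPROVED, LeadState.DISCARDED},
    LeadState.APPROVED: {LeadState.SENT},
    LeadState.SENT: {LeadState.REPLIED, LeadState.DISCARDED},
    LeadState.REPLIED: set(),
    LeadState.DISCARDED: set(),
}


def advance(current: LeadState, target: LeadState) -> LeadState:
    """Move a lead to a new state, rejecting illegal jumps."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

An explicit transition table like this is what lets three independent agents share one lead record without stepping on each other: an agent can only move a lead along edges it owns.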

Discovery agent

Playwright browses candidate directories and supplier sites. The extraction layer uses Claude Sonnet for structured data pulls (company name, product categories, certifications, contact info). Pure regex scraping fails on Arabic-language pages with inconsistent layouts; an LLM handles the layout variance gracefully. Output: a JSON record per supplier with ~20 fields.

Qualification agent — the real moat

This is where the system earns its keep. Every supplier gets scored on a weighted rubric. The LLM doesn't decide the weights — Wael and I did — but it fills in the data:

| Signal | Weight | Source |
| --- | --- | --- |
| Category match | 25% | buyer brief vs supplier product list |
| Certifications match | 20% | OEKO-TEX, ISO, FDA, etc. |
| Response reachability | 15% | valid email, phone, WhatsApp verified |
| Evidence of real production | 15% | factory photos, trade fair presence, LinkedIn signals |
| English/German capability | 10% | site content, rep profiles |
| Volume match | 10% | stated MOQ and capacity vs buyer need |
| Export experience | 5% | mentions of EU customers, tariff awareness |

A supplier scoring < 50 is discarded. 50–75 is flagged for manual review. 75+ goes straight to the outreach queue. The rubric isn't magic — it's the result of Wael and me sitting with 200 hand-scored suppliers and asking "which signals predicted whether they actually responded and delivered." The LLM just applies what we learned.
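The scorer itself is a weighted sum plus the thresholds above. A minimal sketch, assuming upstream extractors normalize each signal to 0–1 (signal keys are illustrative names for the rubric rows):

```python
# Rubric weights from the table above. Keys are illustrative names
# for the seven signals; the weights are the published ones.
WEIGHTS = {
    "category_match": 0.25,
    "certifications_match": 0.20,
    "reachability": 0.15,
    "production_evidence": 0.15,
    "language_capability": 0.10,
    "volume_match": 0.10,
    "export_experience": 0.05,
}


def score(signals: dict[str, float]) -> float:
    """Weighted sum scaled to 0-100. Missing signals count as 0; values are clamped."""
    return 100 * sum(
        WEIGHTS[k] * min(max(signals.get(k, 0.0), 0.0), 1.0) for k in WEIGHTS
    )


def route(total: float) -> str:
    """Thresholds from the text: <50 discard, 50-75 manual review, 75+ outreach."""
    if total < 50:
        return "discard"
    if total < 75:
        return "manual_review"
    return "outreach"
```

The LLM's only job is producing the `signals` dict; the weights and thresholds stay deterministic and human-owned.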

Outreach agent

Drafts first-touch emails in the buyer's language (German, English, French, sometimes Arabic). Every draft goes into a human-approval queue — we never send unreviewed mail. The agent's job is to get 80% of the way there; the reviewer's job is to catch the 20% of drafts that would embarrass the buyer.

The stack

Python · FastAPI · Playwright · Postgres · Claude Sonnet 4.6 · GPT-4o · SMTP + DKIM · Pydantic · Mnemonic (OSS)

By the numbers

82 → 96% · Scrape accuracy
~14s · Per-lead enrichment
DKIM ✓ · Inbox-safe
EU + UK · Buyer coverage

The scrape accuracy jump from 82% to 96% was the single most impactful change. It meant the qualification agent stopped filtering out legitimate suppliers because of bad data in the upstream pipeline. I wrote a separate essay on the 8 specific changes that moved that number.

What I got wrong

Starting with the agents instead of the rubric

I spent a week building the multi-agent orchestration before writing the qualification rubric. The result was a beautiful pipeline that produced garbage scores because the criteria weren't calibrated. I should have started with "score these 100 suppliers by hand, see what patterns emerge, codify them" — and only then built the pipeline. The agent isn't the product. The rubric is.

Sending before DKIM was set up

Early outreach went to spam folders because my mail-from domain wasn't DKIM-signed. I assumed it was a content problem and tried to rewrite the emails. It wasn't. It was an authentication problem. Set up DKIM/SPF/DMARC before you send a single email.
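All three are just DNS TXT records. An illustrative zone-file fragment — the domain, selector, and policy values are placeholders, not Bridge Sourcing's actual records:

```
; SPF: declare which hosts may send mail for the domain
example.com.                IN TXT "v=spf1 include:_spf.mail-provider.example ~all"

; DKIM: public key published under a selector chosen at signing time
s1._domainkey.example.com.  IN TXT "v=DKIM1; k=rsa; p=MIIBIjANBg..."

; DMARC: policy for mail that fails SPF/DKIM alignment, plus a reporting address
_dmarc.example.com.         IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com"
```

Until records like these exist and propagate, receiving servers have no way to verify your mail, and no amount of copy rewriting will move it out of the spam folder.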

Trusting the LLM to parse certifications

Certification names are ambiguous across languages ("ISO 9001" vs "ISO neun tausend eins"). Asking an LLM to normalize them worked 90% of the time, which sounds good until the qualification agent drops 10% of otherwise-good suppliers. I replaced it with a deterministic dictionary lookup and a fallback LLM call only when the deterministic path failed.

The lesson

Multi-agent systems sound clever. In practice, the intelligence lives in the data model and the scoring criteria, not in the agent coordination. Spend 10× more time on the qualification rubric than on the prompts. Spend another 10× on the validation layers that catch LLM mistakes before they cascade downstream. The agents are cheap. The rubric is the moat.

Visit bridgesourcing.co →

Read more