Live case study · Multi-agent · EU ↔ Egypt

Bridge Sourcing — multi-agent B2B sourcing across the EU tariff lane

Egypt has a bilateral free-trade agreement with the EU. For many categories the import tariff is 0%. Most EU buyers don't know this, and the ones who do can't find suppliers because Egyptian B2B is offline. Bridge Sourcing is a multi-agent pipeline that finds, qualifies, and opens a conversation with the right suppliers — automatically. The hardest part wasn't the agents. It was the qualification rubric.

The problem

A German buyer looking for a textile supplier has few good options. The big directories are dominated by Asian suppliers, and the Egyptian manufacturers who could compete on price and tariff simply don't show up in a search.

Egypt has real manufacturing capacity in textiles, processed foods, chemicals, and construction materials. It has the 0% EU tariff. What it doesn't have is a well-indexed supplier directory. Most Egyptian B2B websites are static HTML from the early 2010s, Arabic-only, and don't list any machine-readable product data.

Bridge Sourcing closes that gap. The buyer describes what they need ("200k knit T-shirts, cotton 180gsm, OEKO-TEX certified, Germany delivery"). The pipeline finds 20–50 candidate suppliers, scores them against a qualification rubric, and drafts an outreach email that sounds like a human wrote it.
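The example brief above can be captured as a structured-intake record. A minimal sketch, assuming a plain dataclass (the production system uses Pydantic models; these field names are illustrative, not the real schema):

```python
from dataclasses import dataclass, field


@dataclass
class BuyerBrief:
    """Hypothetical structured-intake record for a buyer request."""
    product: str
    quantity: int
    material: str
    certifications: list[str] = field(default_factory=list)
    delivery_country: str = ""


# The worked example from the text: 200k knit T-shirts, cotton 180gsm,
# OEKO-TEX certified, Germany delivery.
brief = BuyerBrief(
    product="knit T-shirts",
    quantity=200_000,
    material="cotton 180gsm",
    certifications=["OEKO-TEX"],
    delivery_country="Germany",
)
```

Everything downstream (category match, certifications match, volume match) scores candidates against fields like these rather than against free text.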

Architecture

Three agents, one state machine, one shared memory.

Buyer brief (structured intake) → Discovery agent (Playwright + LLM extract) → Qualification agent (rubric scorer) → Lead state machine (Postgres · JSON) → Outreach agent (email drafter) → Human approval (review queue) → SMTP + DKIM (SendPulse · signed) → Reply parser (sentiment + intent) · Mnemonic (shared agent memory)
Fig 1 · Bridge Sourcing pipeline · discovery → qualification → outreach · with human approval and shared memory
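The lead state machine at the center of the pipeline can be sketched as a small transition table. State names and transitions here are assumptions for illustration, not the production schema (which lives in Postgres):

```python
from enum import Enum


class LeadState(Enum):
    DISCOVERED = "discovered"
    QUALIFIED = "qualified"
    REVIEW = "review"       # 50-75 rubric score: manual review
    APPROVED = "approved"   # human signed off on the draft
    SENT = "sent"
    REPLIED = "replied"
    DISCARDED = "discarded"


# Allowed transitions (hypothetical). Anything not listed is an illegal jump.
TRANSITIONS = {
    LeadState.DISCOVERED: {LeadState.QUALIFIED, LeadState.DISCARDED},
    LeadState.QUALIFIED: {LeadState.REVIEW, LeadState.APPROVED, LeadState.DISCARDED},
    LeadState.REVIEW: {LeadState.APPROVED, LeadState.DISCARDED},
    LeadState.APPROVED: {LeadState.SENT},
    LeadState.SENT: {LeadState.REPLIED, LeadState.DISCARDED},
    LeadState.REPLIED: set(),
    LeadState.DISCARDED: set(),
}


def advance(current: LeadState, target: LeadState) -> LeadState:
    """Move a lead to a new state, rejecting illegal jumps."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

An explicit transition table like this is what lets three independent agents share one lead record without stepping on each other: an agent can only move a lead along edges it owns.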

Discovery agent

Playwright browses candidate directories and supplier sites. The extraction layer uses Claude Sonnet for structured data pulls (company name, product categories, certifications, contact info). Pure regex scraping fails on Arabic-language pages with inconsistent layouts; an LLM handles the layout variance gracefully. Output: a JSON record per supplier with ~20 fields.

Qualification agent — the real moat

This is where the system earns its keep. Every supplier gets scored on a weighted rubric. The LLM doesn't decide the weights — Wael and I did — but it fills in the data:

| Signal | Weight | Source |
| --- | --- | --- |
| Category match | 25% | buyer brief vs supplier product list |
| Certifications match | 20% | OEKO-TEX, ISO, FDA, etc. |
| Response reachability | 15% | valid email, phone, WhatsApp verified |
| Evidence of real production | 15% | factory photos, trade fair presence, LinkedIn signals |
| English/German capability | 10% | site content, rep profiles |
| Volume match | 10% | stated MOQ and capacity vs buyer need |
| Export experience | 5% | mentions of EU customers, tariff awareness |

A supplier scoring < 50 is discarded. 50–75 is flagged for manual review. 75+ goes straight to the outreach queue. The rubric isn't magic — it's the result of Wael and me sitting with 200 hand-scored suppliers and asking "which signals predicted whether they actually responded and delivered." The LLM just applies what we learned.
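The scorer itself is a weighted sum plus the thresholds above. A minimal sketch, assuming upstream extractors normalize each signal to 0–1 (signal keys are illustrative names for the rubric rows):

```python
# Rubric weights from the table above. Keys are illustrative names
# for the seven signals; the weights are the published ones.
WEIGHTS = {
    "category_match": 0.25,
    "certifications_match": 0.20,
    "reachability": 0.15,
    "production_evidence": 0.15,
    "language_capability": 0.10,
    "volume_match": 0.10,
    "export_experience": 0.05,
}


def score(signals: dict[str, float]) -> float:
    """Weighted sum scaled to 0-100. Missing signals count as 0; values are clamped."""
    return 100 * sum(
        WEIGHTS[k] * min(max(signals.get(k, 0.0), 0.0), 1.0) for k in WEIGHTS
    )


def route(total: float) -> str:
    """Thresholds from the text: <50 discard, 50-75 manual review, 75+ outreach."""
    if total < 50:
        return "discard"
    if total < 75:
        return "manual_review"
    return "outreach"
```

The LLM's only job is producing the `signals` dict; the weights and thresholds stay deterministic and human-owned.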

Outreach agent

Drafts first-touch emails in the buyer's language (German, English, French, sometimes Arabic). Every draft goes into a human-approval queue — we never send unreviewed mail. The agent's job is to get 80% of the way there; the reviewer's job is to catch the 20% of drafts that would embarrass the buyer.

The stack

Python · FastAPI · Playwright · Postgres · Claude Sonnet 4.6 · GPT-4o · SMTP + DKIM · Pydantic · Mnemonic (OSS)

By the numbers

82 → 96% · Scrape accuracy
~14s · Per-lead enrichment
DKIM ✓ · Inbox-safe
EU + UK · Buyer coverage

The scrape accuracy jump from 82% to 96% was the single most impactful change. It meant the qualification agent stopped filtering out legitimate suppliers because of bad data in the upstream pipeline. I wrote a separate essay on the 8 specific changes that moved that number.

What I got wrong

Starting with the agents instead of the rubric

I spent a week building the multi-agent orchestration before writing the qualification rubric. The result was a beautiful pipeline that produced garbage scores because the criteria weren't calibrated. I should have started with "score these 100 suppliers by hand, see what patterns emerge, codify them" — and only then built the pipeline. The agent isn't the product. The rubric is.

Sending before DKIM was set up

Early outreach went to spam folders because my mail-from domain wasn't DKIM-signed. I assumed it was a content problem and tried to rewrite the emails. It wasn't. It was an authentication problem. Set up DKIM/SPF/DMARC before you send a single email.
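All three are just DNS TXT records. An illustrative zone-file fragment — the domain, selector, and policy values are placeholders, not Bridge Sourcing's actual records:

```
; SPF: declare which hosts may send mail for the domain
example.com.                IN TXT "v=spf1 include:_spf.mail-provider.example ~all"

; DKIM: public key published under a selector chosen at signing time
s1._domainkey.example.com.  IN TXT "v=DKIM1; k=rsa; p=MIIBIjANBg..."

; DMARC: policy for mail that fails SPF/DKIM alignment, plus a reporting address
_dmarc.example.com.         IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com"
```

Until records like these exist and propagate, receiving servers have no way to verify your mail, and no amount of copy rewriting will move it out of the spam folder.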

Trusting the LLM to parse certifications

Certification names are ambiguous across languages ("ISO 9001" vs "ISO neun tausend eins"). Asking an LLM to normalize them worked 90% of the time, which sounds good until the qualification agent drops 10% of otherwise-good suppliers. I replaced it with a deterministic dictionary lookup and a fallback LLM call only when the deterministic path failed.

The lesson

Multi-agent systems sound clever. In practice, the intelligence lives in the data model and the scoring criteria, not in the agent coordination. Spend 10× more time on the qualification rubric than on the prompts. Spend another 10× on the validation layers that catch LLM mistakes before they cascade downstream. The agents are cheap. The rubric is the moat.

Visit bridgesourcing.co →

Read more