
Bridge Sourcing
Bridge Sourcing connects EU buyers with Egyptian suppliers via Egypt’s 0% EU tariff lane. The product is a multi-agent pipeline: discovery, qualification, outreach. The interesting engineering bit is the scrape-accuracy curve: 82% to 96% over three months, driven by eight specific changes that compound.
The problem
Most EU buyers sourcing from MENA hit the same wall: supplier discovery is manual, qualification is gut-feel, and outreach hits dead Hotmail addresses from 2017. Meanwhile Egypt has a free-trade lane to the EU (0% tariff on most categories) that almost no one outside the country can navigate. The opportunity isn’t the AI; it’s the trade-route arbitrage. AI is just what makes the pipeline cheap enough to run continuously.
The pipeline — 3 agents
- Discovery agent. Crawls LinkedIn + Hunter + targeted Google search for suppliers in a category. Outputs a normalised supplier row with company, contact, category, scale signal.
- Qualification agent. Reads the supplier’s site + LinkedIn + shipping records via public manifests. Scores fit-to-buyer-spec on a 0–100 rubric with explainable reasoning.
- Outreach agent. Drafts a personalised first email per supplier in the buyer’s voice, scheduled via Zoho with deliverability checks (DKIM verified Apr 2026). Never auto-sends — the buyer reviews + sends.
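To make the handoff between agents concrete, the discovery agent's "normalised supplier row" can be sketched as a typed record plus a normalisation step. The field names and cleaning rules here are illustrative assumptions, not the production schema:

```typescript
// Hypothetical shape of the discovery agent's normalised supplier row.
// Field names are illustrative, not the production schema.
interface SupplierRow {
  company: string;
  contact: string;     // best-known email or LinkedIn URL
  category: string;    // e.g. "textiles", "food processing"
  scaleSignal: string; // coarse size hint, e.g. "50-200 employees"
}

// Normalise raw extractor output so downstream agents see one canonical form.
function normaliseRow(raw: Record<string, string | undefined>): SupplierRow {
  const clean = (v?: string) => (v ?? "unknown").trim().replace(/\s+/g, " ");
  return {
    company: clean(raw.company),
    contact: clean(raw.contact).toLowerCase(),
    category: clean(raw.category).toLowerCase(),
    scaleSignal: clean(raw.scaleSignal),
  };
}
```

The point of normalising this early is that the qualification and outreach agents never have to reason about missing or messy fields; anything the extractor couldn't produce arrives as an explicit "unknown".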
The 82→96% scrape accuracy curve
Day one, the discovery agent extracted supplier fields at 82% accuracy on a held-out test set. Three months later, 96%. The 14-percentage-point gain came from eight specific changes:
- Switched extraction model from JSON-mode-flagship to JSON-mode-mini with a stricter schema. Smaller model, tighter prompt, +3pp.
- Added a separate validation pass with a different model. The “does this look right?” second opinion catches hallucinations the extractor didn’t flag. +2pp.
- Pre-cleaned HTML before extraction. Stripped nav, footer, ads. Removes 40% of input tokens and 70% of nuisance text. +2pp.
- Per-field confidence scoring. Low-confidence fields get a retry with a different prompt; if the retry still fails, the field is marked “unknown” rather than guessed. +2pp.
- Country-specific date and address parsers. Egyptian addresses don’t match Western patterns. Hand-rolled regex, not the model. +1pp.
- Duplicate detection across discovery batches. The same supplier appearing twice with different field values used to silently corrupt scoring. Now it triggers a merge step. +1pp.
- Field-level golden eval set. Built a 200-row hand-labeled set and ran every prompt change against it before deploying. Stops regressions cold. +2pp.
- Stopped trusting the model on numeric fields. Years founded, employee count, revenue band — all parsed by hand from canonical fields, not inferred from prose. +1pp.
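The duplicate-merge step above is mostly plain code, not AI. A minimal sketch of the assumed logic: key suppliers on a normalised company name, and when two batches disagree, prefer any concrete value over "unknown" instead of silently overwriting:

```typescript
// Sketch of the duplicate-merge step (assumed logic, not the production code).
type Row = Record<string, string>;

// Normalise the company name into a dedupe key: lowercase, alphanumerics only.
const dedupeKey = (company: string) =>
  company.toLowerCase().replace(/[^a-z0-9]/g, "");

// Merge two rows for the same supplier: keep existing values, but let a
// concrete value from the new row fill any "unknown" or missing field.
function mergeRows(a: Row, b: Row): Row {
  const merged: Row = { ...a };
  for (const [field, value] of Object.entries(b)) {
    if (merged[field] === undefined || merged[field] === "unknown") {
      merged[field] = value;
    }
  }
  return merged;
}

function dedupeBatch(rows: Row[]): Row[] {
  const byKey = new Map<string, Row>();
  for (const row of rows) {
    const key = dedupeKey(row.company ?? "");
    const existing = byKey.get(key);
    byKey.set(key, existing ? mergeRows(existing, row) : row);
  }
  return [...byKey.values()];
}
```

Before this step existed, the second occurrence of a supplier could overwrite good fields with bad ones; merging by key makes the corruption impossible by construction.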
What I got wrong
Trusted the flagship model with no validation pass
For the first two months, I had a single GPT-4 extraction call producing the structured output. Hallucinations hit ~6% of rows. The fix wasn’t a better extractor — it was a second opinion. A different model on the same input, comparing outputs, flagging mismatches. That’s the pattern that took accuracy from 88 to 92.
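The second-opinion pattern reduces to a simple comparison step once both models have produced structured output: flag any field where they disagree, then retry or mark it "unknown". The function below is an illustrative sketch of that comparison, not the production code:

```typescript
// Compare extractor output against a second model's output on the same input
// and return the fields where they disagree (candidates for retry/"unknown").
type Extraction = Record<string, string>;

function flagMismatches(extractor: Extraction, validator: Extraction): string[] {
  const norm = (v: string) => v.trim().toLowerCase();
  return Object.keys(extractor).filter(
    (field) => norm(extractor[field] ?? "") !== norm(validator[field] ?? "")
  );
}
```

Normalising before comparing matters: casing and whitespace differences between two models are noise, and flagging them would drown the real mismatches.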
Underbuilt the eval set
I had a 30-row test set for too long. It felt rigorous; it was useless. Statistically, 30 rows can’t distinguish a 90% extractor from an 88% one. Building the 200-row golden set took two days and unlocked all the meaningful prompt iteration that followed.
The actual lesson
Scrape accuracy is not an AI problem. It’s an evals problem, an HTML-cleaning problem, and a domain-specific-parser problem with an LLM in the middle. The AI is the cheapest part. The discipline around the AI is the moat.
// stack
- Next.js
- Multi-agent
- Hunter
- Zoho
- GPT-4o
// next case study
Want this kind of build for your business? Book the Audit Sprint ($1,500) or email omar@neurascale.org.



