Live case study · 2024 → present · Mr. Donut, Egypt

RetailOS — AI-first retail operations for MENA

I built an AI-first retail SaaS with a paying customer in Egypt. Forecasting, inventory, POS sync. The interesting part isn't the AI — it's the boring plumbing that kept the tenant paying month after month when the forecasting was wrong and the POS sync hiccupped.

The problem

Independent retailers across MENA run on a stack that looks like this: a POS terminal from 2014, an Excel file for stock, a WhatsApp group for orders, and a human brain for forecasting. The tooling that exists for them is either (a) built for enterprise with a price tag to match, or (b) a generic POS that doesn't speak to anything else.

Mr. Donut, a retail chain in Egypt, had three specific pains: daily orders were guesses, margin was leaking with no visibility into where, and the POS didn't talk to anything else.

The path to paid was: stop them from guessing. Give them a forecast for the next 7 days per SKU per location. Show them where the margin is leaking. Do it without replacing their POS.

Architecture

The stack had three constraints that drove every decision: realtime sync (the tenant wanted today's data today), cheap LLM usage (MENA margins are thin), and single-operator maintainability (I ship it, I run it, I can't page someone).

[Architecture diagram: POS terminal per location → sync worker (Node · cron) → Postgres (Supabase) → forecast service (FastAPI · cached) → LLM router (Haiku → 4o-mini) → Next.js dashboard (realtime); edge function for auth · tenant; Clarity / GA4 for observability]

Fig 1 · RetailOS data flow · POS → sync worker → Postgres → forecast service → LLM router → dashboard
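The router box in Fig 1 is what kept LLM spend down: try the cheap model first, escalate only when the answer fails a validity check. A minimal sketch of that pattern, where `callModel` and `isValid` are hypothetical stand-ins for a real API client and a real check:

```typescript
// Cheap-model-first routing: try the cheap model, escalate to the
// pricier one only when the cheap answer fails a validity check.
// `ModelCall` and `isValid` are hypothetical stand-ins, not a real SDK.
type ModelCall = (model: string, prompt: string) => Promise<string>;

async function routePrompt(
  prompt: string,
  callModel: ModelCall,
  isValid: (answer: string) => boolean,
): Promise<{ model: string; answer: string }> {
  // Try the cheap model first (Haiku in the Fig 1 router).
  const cheap = await callModel("claude-haiku", prompt);
  if (isValid(cheap)) return { model: "claude-haiku", answer: cheap };
  // Escalate only on a failed check, so most calls stay cheap.
  const answer = await callModel("gpt-4o-mini", prompt);
  return { model: "gpt-4o-mini", answer };
}
```

As long as the validity check passes, a call never touches the pricier model; the escalation path only pays for itself on the hard prompts.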

A few architectural calls mattered more than I expected; the biggest ones are dissected in "What I got wrong" below.

The stack

Next.js 15 · React 19 · TypeScript · Supabase · Postgres · FastAPI · Python · Claude Haiku · GPT-4o-mini · Vercel · Tailwind

By the numbers

1 · Paying tenant (Mr. Donut)
~4% · Forecast MAPE
218ms · POS sync p50
−28% · LLM cost vs baseline

A MAPE (mean absolute percentage error) of ~4% on 7-day forecasts is lower than anything I've seen published for a multi-SKU retail setting of this size. Part of it is that the domain is easier than, say, fashion retail: donuts have a consistent weekly seasonality and a short shelf life that forces fast adjustment. Part of it is a boring baseline: we start with a naive weekly average and only let the model adjust within bounds.
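That bounded baseline is simple enough to sketch: a day-of-week average anchors the forecast, and the model's multiplicative adjustment is clamped to a band around it. The function shape and the ±20% band here are illustrative, not the production values:

```typescript
// Naive day-of-week baseline with a bounded model adjustment.
// `sameWeekdaySales` is past daily sales for one SKU on the same
// weekday; the ±20% band is illustrative, not the production value.
function boundedForecast(
  sameWeekdaySales: number[],
  modelAdjustment: number, // multiplicative factor proposed by the model
  band = 0.2,
): number {
  // Boring baseline: average of past sales on that weekday.
  const baseline =
    sameWeekdaySales.reduce((sum, units) => sum + units, 0) /
    sameWeekdaySales.length;
  // Clamp the adjustment so a bad model day can't swing the order
  // far from the baseline.
  const clamped = Math.min(1 + band, Math.max(1 - band, modelAdjustment));
  return baseline * clamped;
}
```

The clamp is what makes the model safe to ship early: at worst it degrades to the naive average ±20%.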

What I got wrong

Over-engineering the forecast

I spent the first month trying to use a proper time-series model (Prophet, then a custom ARIMA wrapper). Both outperformed the naive weekly average by ~0.3 percentage points of MAPE. For that effort, I should have just shipped the naive baseline, gotten live data flowing, then iterated. The tenant didn't care about MAPE. They cared about one thing: was today's order right?

Trusting the POS driver

The first POS I integrated with had a quirky local API that failed writes silently. My sync worker assumed success unless it got an error. For two weeks, the tenant saw yesterday's numbers instead of today's and didn't tell me, because they thought that's how it worked. I fixed it with an end-to-end checksum on the sync loop: hash the row count and last-updated-at on both sides, alert if they diverge. It should have been there from day one.
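The checksum itself is a few lines. A sketch of the comparison step, assuming each side can report its row count and newest last-updated-at (the type, field names, and `alert` callback are hypothetical):

```typescript
import { createHash } from "node:crypto";

// One side's view of the synced table: row count plus the newest
// last-updated-at timestamp. Field names are illustrative.
interface SyncState {
  rowCount: number;
  lastUpdatedAt: string; // ISO timestamp of the newest row
}

// Reduce the state to one comparable string.
function syncChecksum(state: SyncState): string {
  return createHash("sha256")
    .update(`${state.rowCount}|${state.lastUpdatedAt}`)
    .digest("hex");
}

// Compare the POS side against Postgres; alert on divergence instead
// of trusting the driver's silence.
function verifySync(
  pos: SyncState,
  db: SyncState,
  alert: (msg: string) => void,
): boolean {
  if (syncChecksum(pos) === syncChecksum(db)) return true;
  alert(
    `sync divergence: pos=${pos.rowCount}@${pos.lastUpdatedAt} ` +
      `db=${db.rowCount}@${db.lastUpdatedAt}`,
  );
  return false;
}
```

Hashing isn't strictly required when both sides can compare raw values, but a single checksum string is convenient to log and alert on.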

Realtime everywhere

I built every table with realtime subscriptions because it felt cool. Then I realized some tables (daily summary rollups) don't need realtime — they change once a day. The overhead of maintaining realtime channels for low-write tables was non-trivial. I now default to no realtime unless proven necessary.
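That default is easy to make mechanical: classify each table once and open realtime channels only for the ones that earn it. A sketch of the classification, with illustrative table names, write rates, and threshold:

```typescript
type RefreshMode = "realtime" | "poll";

// Default to polling; a table earns a realtime channel only when its
// write rate justifies the standing connection. The threshold and the
// per-table write rates below are illustrative, not production numbers.
function refreshMode(writesPerDay: number, threshold = 100): RefreshMode {
  return writesPerDay > threshold ? "realtime" : "poll";
}

const tableModes = {
  pos_transactions: refreshMode(5000), // high-write: worth a channel
  daily_rollups: refreshMode(1), // changes once a day: just poll
};
```

Anything classified as `poll` gets fetched on page load or on a timer; only the high-write tables keep an open subscription.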

The lesson

The thing that kept Mr. Donut paying wasn't the forecast. It was the checksum, the realtime dashboard, and the fact that when something broke I had observability to tell me before they did. In SaaS, the features you sell are not the features that retain. The features that retain are the ones the user never consciously notices — until they're missing.

If I'd spent less time tuning the forecast and more time on the boring stuff, I would've shipped two months earlier.

Visit retailos.one →
