omar.nagy
Live · paying customer · case study · published 2026-04-29

RetailOS

RetailOS is a multi-tenant POS, inventory, and ordering SaaS for Egyptian SMBs. Two tenants live: Mr. Donut paid ج.م500/mo for a year (currently non-operational), and Lolies — a kids-clothing shop in the Nile Delta — onboarded Apr 27. The interesting part isn’t the AI. It’s the boring plumbing — checksums, RLS, audit trails — that kept Mr. Donut paying through the months when the forecasting was wrong and the POS sync hiccupped.

  • Tenants live: 2
  • PRs Apr 21–24: 18
  • POS sync p50: 218ms
  • LLM cost vs baseline: −28%

The problem

Independent retailers across MENA run on a stack that looks like this: a POS terminal from 2014, an Excel file for stock, a WhatsApp group for orders, and a human brain for forecasting. The tooling that exists for them is either built for enterprise with a price tag to match, or a generic POS that doesn’t speak to anything else. None of it speaks Arabic right.

Mr. Donut, a donut chain in Egypt, had three concrete pains when they signed:

  • Stockouts and dead stock at the same time. Different SKUs, same week. The team was guessing wrong in both directions.
  • No visibility across locations. HQ didn’t know what was selling where until end-of-month manual reports.
  • Daily decisions needed tomorrow’s data. They knew what happened three weeks late, which is the same as not knowing.

Tenants — the honest version

A case study shouldn’t round numbers up, so here’s the receipt:

  • Mr. Donut. Paid ج.م500/mo (~$10/mo) Apr 2025 → Apr 2026. Manual billing: I invoice them every month, they pay every month, no failed cards because there are no cards. Currently non-operational on their side (their business pause, not a churn). They own a year of data on the platform and a fully functioning POS waiting for them to come back.
  • Lolies. Kids-clothing shop, Nile Delta. Onboarded Apr 27, 2026 on the fashion_retail vertical template — 14% VAT, EGP, 5 kid categories seeded, two owner accounts provisioned. First real test of the new invite-only onboarding flow shipped in the Apr 22 sprint.
  • El-Arish pharmacy chain. 4 branches, kickoff pending on the WhatsApp Cloud API decision (portal OTP gating).

Two live tenants (one with a year of paying history, one in onboarding) and one pilot waiting. Billing stays manual until tenant #4. Paymob/Stripe is a problem for when the volume earns it.

Architecture

Three constraints drove every decision: realtime sync (the tenant wanted today’s data today), cheap LLM usage (MENA margins are thin), and single-operator maintainability (I ship it, I run it, I can’t page someone).

  • Supabase realtime instead of polling. The dashboard feels live because it is live — no “pull every 30s” hack. Edge functions subscribe to row changes and push deltas.
  • Forecast service as a separate FastAPI process. Python’s stats ecosystem is better than Node’s for time-series work. Keeping it out of the main Next.js app meant I could iterate on the model without redeploying the UI.
  • Cache-first LLM usage. The LLM isn’t in the hot path. Summaries, daily digests, and explanations are generated once and cached in Postgres, so users almost never pay latency for a model call. This is the same discipline documented in the −28% cost essay.
  • Multi-tenant from day one even with one tenant. Row-level security on every query, scoped through a business_staff join. Every new table needs business_id + an RLS policy or it doesn’t merge. Cost me two days up front, saved me weeks when Lolies arrived.
  • 9 vertical templates. food_delivery, cafe_bar, fashion_retail, grocery, pharmacy, electronics, beauty_salon, services, generic_retail. Each ships its own default tax rate, default categories, and feature flags. Onboarding a new tenant is a 3-click invite plus a vertical pick.
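
The cache-first LLM bullet above can be sketched roughly as follows. This is a minimal illustration, not the production code: the table name llm_cache, the functions cache_key and get_or_generate, and the use of an in-memory SQLite database as a stand-in for Postgres are all assumptions made so the example runs on its own.

```python
import hashlib
import json
import sqlite3
from typing import Callable


def open_cache() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the Postgres cache table (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE llm_cache ("
        "key TEXT PRIMARY KEY, tenant_id TEXT NOT NULL, "
        "kind TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def cache_key(tenant_id: str, kind: str, payload: dict) -> str:
    """Same tenant + same kind + same input data always yields the same key."""
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    raw = f"{tenant_id}|{kind}|{canonical}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def get_or_generate(
    conn: sqlite3.Connection,
    tenant_id: str,
    kind: str,
    payload: dict,
    generate: Callable[[dict], str],
) -> str:
    """Return the cached text if present; otherwise call the model once and store it."""
    key = cache_key(tenant_id, kind, payload)
    row = conn.execute("SELECT body FROM llm_cache WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return row[0]  # cache hit: no model call, no model latency
    body = generate(payload)  # cache miss: the only place the model is called
    conn.execute(
        "INSERT OR REPLACE INTO llm_cache (key, tenant_id, kind, body) VALUES (?, ?, ?, ?)",
        (key, tenant_id, kind, body),
    )
    conn.commit()
    return body
```

Including the tenant id in the key keeps one tenant's digest from ever being served to another, which matches the multi-tenant rule in the same list. The generate callable is where a real model client would go; here it is left abstract on purpose.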

What kept Mr. Donut paying for a year

The thing that retains a tenant is rarely the thing that sold them. Mr. Donut signed for the forecast and the dashboard. They stayed because:

  • End-to-end checksum on the POS sync loop. Hash row-count + last-updated-at on both sides; alert if they diverge. After two weeks of silent drift in the early days (see “Ignoring the POS driver rabbit hole” below), this became non-negotiable.
  • Bilingual daily-close. Cash drawer reconciliation in Arabic for staff, English for HQ. Same data, two languages, no translation step.
  • Per-tenant health dashboard. /platform/tenants/[slug]/health — 30-day revenue series, per-staff last login, stuck orders. When something felt off, I had observability before they did.
  • Audit trail on every state-changing action. Returns, refunds, tax overrides, gift redemptions — all written through SECURITY DEFINER RPCs that re-fetch prices server-side. Staff disputes resolve from the log, not from memory.
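
The audit-trail idea in the list above (never trust a client-supplied price, always write a log row) can be illustrated outside the database. The real thing lives in SECURITY DEFINER RPCs in Postgres; the plain-Python version below is a swapped-in sketch, and AuditLog, refund_item, and the catalog dict are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal


@dataclass
class AuditLog:
    """Append-only list standing in for an audit table (assumption)."""
    rows: list = field(default_factory=list)

    def write(self, actor: str, action: str, detail: dict) -> None:
        self.rows.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "detail": detail,
        })


def refund_item(
    catalog: dict[str, Decimal],
    audit: AuditLog,
    actor: str,
    sku: str,
    qty: int,
    client_price: Decimal,
) -> Decimal:
    """Refund qty units of sku using the server-side price only."""
    if qty <= 0:
        raise ValueError("qty must be positive")
    server_price = catalog[sku]  # re-fetched server-side; raises KeyError for unknown SKUs
    amount = server_price * qty  # client_price never touches the money math
    audit.write(actor, "refund", {
        "sku": sku,
        "qty": qty,
        "server_price": str(server_price),
        "client_price": str(client_price),  # kept only so disputes can see what was sent
        "amount": str(amount),
    })
    return amount
```

Recording the client-sent price next to the server price is a design choice of this sketch: a mismatch in the log is itself a useful signal when a staff dispute comes up.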

Apr 21–24 sprint — 18 PRs in 4 days

A case study should also show the build cadence. Across four days in late April, I shipped 18 PRs and 17 Postgres migrations through three sprints (audit hardening, Phase 6, Phase 7):

  • Customer portal login. Phone + OTP, per-tenant HttpOnly sessions, WhatsApp Cloud API delivery with a dev-log fallback so I can demo without burning a real OTP.
  • Returns / RMA flow. order_returns + order_return_items + 3 SECURITY DEFINER RPCs. Customer-initiated and staff-initiated paths both wired.
  • Per-product tax override. products.tax_rate nullable, falls through to business default. Audit row written on every order.
  • Driver photo fallback for proof-of-delivery. Storage bucket + RLS + camera button — when the driver’s OTP-keypad doesn’t round-trip on bad internet, the photo holds the proof.
  • Bilingual daily-close + per-tenant health drilldown. The observability layer that quietly retains tenants.
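
The per-product tax override in the list above comes down to one fall-through rule. A minimal sketch, assuming rates are stored as fractions (0.14 for 14% VAT) and that tax is rounded half-up to two decimals; the function names and the rounding rule are assumptions, not taken from the codebase.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def effective_tax_rate(product_rate: Optional[Decimal], business_default: Decimal) -> Decimal:
    """A NULL product rate falls through to the business default.

    The check is `is None`, not truthiness, so an explicit 0 (tax-exempt
    product) is respected instead of silently becoming the default.
    """
    return business_default if product_rate is None else product_rate


def line_tax(
    price: Decimal,
    qty: int,
    product_rate: Optional[Decimal],
    business_default: Decimal,
) -> Decimal:
    """Tax for one order line, rounded half-up to 2 decimals (assumed rounding rule)."""
    rate = effective_tax_rate(product_rate, business_default)
    return (price * qty * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, with a business default of Decimal("0.14"), a product whose tax_rate is None is taxed at 14%, while a product with tax_rate Decimal("0") is taxed at nothing.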

Post-launch hardening, not feature creep. None of this was on the brochure when Mr. Donut signed. All of it is what kept the platform shippable when Lolies arrived.

What I got wrong

Over-engineering the forecast

I spent the first month trying to use a proper time-series model (Prophet, then a custom ARIMA wrapper). Both beat the naive weekly average by only ~0.3 percentage points of MAPE. For that effort, I should have just shipped the naive baseline, gotten live data flowing, then iterated. The tenant didn’t care about MAPE. They cared about whether today’s order was right.
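
For a sense of how small that baseline is, here is a minimal sketch of a naive weekly average and the MAPE used to score it. It assumes "naive weekly average" means the mean of the same weekday over the last few weeks; that reading, the default of four weeks, and the function names are assumptions.

```python
from statistics import mean


def naive_weekly_average(history: list[float], weeks: int = 4) -> float:
    """Forecast the next day as the mean of the same weekday over the last `weeks` weeks.

    `history` is daily units sold, oldest first; the day being forecast
    is the one right after the last entry.
    """
    # history[-7] is the same weekday one week back, history[-14] two weeks back, ...
    same_weekday = history[-7::-7][:weeks]
    if not same_weekday:
        raise ValueError("need at least 7 days of history")
    return mean(same_weekday)


def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent. Days with zero actual sales are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actuals to score against")
    return 100 * mean(abs(a - p) / abs(a) for a, p in pairs)
```

Skipping zero-sales days is one common way to keep MAPE defined; it also hides errors on exactly the days a stockout happened, which is one more reason the metric mattered less than whether the order was right.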

Ignoring the POS driver rabbit hole

The first POS I integrated with had a quirky local API that failed writes silently. My sync worker assumed success unless it got an error. For two weeks the tenant saw yesterday’s numbers instead of today’s and didn’t tell me, because they thought that was how it worked. I fixed it by adding an end-to-end checksum on the sync loop: hash the row count and last-updated-at on both sides, alert if they diverge. Should have been there from day one.
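
The checksum described above fits in a few lines. A minimal sketch, assuming each side can report its row count and latest updated-at timestamp for a synced table; TableState, checksum, and check_sync are hypothetical names, and the alert callable stands in for whatever notification channel is actually used.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass(frozen=True)
class TableState:
    """What each side of the sync reports for one table."""
    row_count: int
    last_updated_at: Optional[datetime]  # None when the table is empty


def checksum(state: TableState) -> str:
    """Hash row count + last-updated-at into one comparable value."""
    stamp = state.last_updated_at.isoformat() if state.last_updated_at else "empty"
    return hashlib.sha256(f"{state.row_count}|{stamp}".encode("utf-8")).hexdigest()


def check_sync(pos: TableState, cloud: TableState, alert: Callable[[str], None]) -> bool:
    """Return True when both sides agree; otherwise fire the alert and return False."""
    if checksum(pos) != checksum(cloud):
        alert(f"POS sync drift: pos={pos} cloud={cloud}")
        return False
    return True
```

Both sides need to report timestamps in the same timezone and precision (for example UTC, truncated to seconds), or the hashes will differ even when the data matches. The check catches dropped or stale writes; it will not catch two different edits that happen to leave the same count and timestamp.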

Realtime everywhere

I built every table with realtime subscriptions because it felt cool. Then I realized some tables (daily summary rollups) don’t need realtime — they change once a day. I now default to no realtime unless proven necessary.

Calling Mr. Donut “Live” without the asterisk

For a long stretch the homepage said “Live — Paying Customer” without surfacing that the tenant was non-operational on their end. Defensible — they were paying on a real contract — but a careful buyer who pulled the storefront found a gap. Now I lead with the receipt: ج.م500/mo for a year, currently paused on their side, Lolies live as the second tenant. Trust over hype.

The lesson

The thing that kept Mr. Donut paying wasn’t the forecast. It was the checksum, the realtime dashboard, and the fact that when something broke I had observability to tell me before they did. In SaaS, the features you sell are not the features that retain. The features that retain are the ones the user never consciously notices — until they’re missing.

What’s next

Unstick Lolies (the owner is logged in but hasn’t added her first product — human onboarding gap, not a code gap), close the WhatsApp Cloud API decision so the El-Arish pharmacy pilot can move, and ship the customer-side returns UI on top of the RPC that already works. Manual billing holds until tenant #4 asks for Paymob.

// stack

  • Next.js 15
  • Supabase
  • FastAPI
  • Claude Haiku
  • GPT-4o-mini


Want this kind of build for your business? Book the Audit Sprint — $1,500 or email omar@neurascale.org.