Case Study: How We Built a Real-Time ERP to Shopify Sync Engine

March 8, 2026

Integration Architecture

A client needed their legacy ERP to talk to Shopify in real time — products, inventory, orders, the lot. Here's how I built a zero-dependency Python sync engine that handles 2,300+ SKUs, matrix variants, and 136 smart collections.


The Brief

A specialty retailer with 2,300+ products across multiple brands was moving to Shopify. Their product catalog, pricing, and inventory lived in a legacy ERP system (cloud-hosted, REST API, bearer-token auth). Shopify would handle the storefront and checkout. Orders needed to flow back into the ERP for fulfillment.

The requirements I was handed boiled down to three real-time sync flows:

Products + Images + Pricing (ERP → Shopify) — full initial load, then incremental via event polling
Inventory (ERP → Shopify) — stock movements reflected within minutes
Orders (Shopify → ERP) — paid orders pushed as sales order requests

2,315 Products Synced · 9,800+ Variants · 136 Smart Collections · 0 External Dependencies

Architecture Decisions

Zero Dependencies, Maximum Portability

I chose Python stdlib only — no pip, no virtualenvs, no package conflicts. The entire codebase uses urllib.request, json, and http.server. This might sound like masochism, but when you're deploying to a client's infrastructure where you can't guarantee anything beyond a Python 3.8+ install, zero dependencies is a feature.

The tradeoff: I wrote my own rate limiter, retry logic, and token refresh. But these are small, self-contained modules that I fully control.
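
As an illustration of how small such a module can be, here is a minimal sketch of retry logic with exponential backoff on top of urllib.request. It is not the project's actual code: the function names, the list of retryable status codes, and the delay values are all assumptions chosen for the example.

```python
import time
import urllib.error
import urllib.request

# Status codes treated as transient here (an assumption, not the project's list).
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def backoff_delays(retries, base=1.0, cap=30.0):
    """Return the wait before each retry: base, 2*base, 4*base, ... capped."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]


def fetch_with_retry(url, headers=None, attempts=4,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """GET a URL, retrying transient failures with exponential backoff."""
    request = urllib.request.Request(url, headers=headers or {})
    delays = backoff_delays(attempts - 1)
    for attempt in range(attempts):
        try:
            with opener(request, timeout=30) as response:
                return response.read()
        except urllib.error.HTTPError as exc:
            # Client errors such as 404 will not improve on retry.
            if exc.code not in RETRYABLE_STATUS or attempt == attempts - 1:
                raise
        except urllib.error.URLError:
            # Network-level failure (DNS failure, refused connection).
            if attempt == attempts - 1:
                raise
        sleep(delays[attempt])
```

Passing the opener and sleep function as parameters keeps the retry loop testable without touching the network.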

Modular Sync Engine

Each sync flow is a separate module with its own entry point. State is persisted in JSON files with atomic writes (write to temp, rename). This means any sync can be interrupted and resumed without corruption.
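
The write-to-temp-then-rename pattern can be sketched as follows. This is a minimal version under my own assumptions: the function names save_state and load_state are hypothetical, and the fsync call is an extra durability step that the description above does not mention.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state as JSON so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename stays
    # on one filesystem, which is what makes the swap atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # Atomically swaps in the new file
    except BaseException:
        # Leave the previous state file untouched and clean up the temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path, default=None):
    """Read the state file, returning a default when it does not exist yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {} if default is None else default
```

If the process dies before os.replace runs, the old state file is still intact and the next run resumes from it.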

The Hard Problems

1. Matrix Products → Shopify Variants

The ERP uses a "matrix" model for product variants. A t-shirt isn't one product with size/colour options — it's a parent product with a matrix of child products defined by coordinate axes. Each intersection of axes (Size=M, Colour=Navy) is a separate product code with its own SKU, barcode, price, and stock level.

Shopify, by contrast, uses a flat variant model: one product, up to three options, and variants are combinations of those options.

The mapping logic I implemented:

ERP CoordAxis 1 → Shopify option1 (e.g., Size)
ERP CoordAxis 2 → Shopify option2 (e.g., Colour)
ERP CoordAxis 3 → Shopify option3 (rare, but supported)
Each matrix entry → one Shopify variant with SKU = MatrixCode
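
The mapping above can be sketched roughly like this. The input shape is an assumption made for illustration: the "Coords" and "Price" keys and the function name are hypothetical, and only MatrixCode comes from the ERP naming used in this article. The output follows the product payload shape of Shopify's REST Admin API.

```python
def matrix_to_shopify(title, axis_names, entries):
    """Flatten an ERP matrix product into a Shopify product payload.

    axis_names: e.g. ["Size", "Colour"], in CoordAxis order.
    entries: dicts with hypothetical keys "MatrixCode", "Coords" and "Price".
    """
    if not 1 <= len(axis_names) <= 3:
        raise ValueError("Shopify supports between one and three options")

    variants = []
    for entry in entries:
        if len(entry["Coords"]) != len(axis_names):
            raise ValueError(f"{entry['MatrixCode']}: wrong number of coordinates")
        variant = {"sku": entry["MatrixCode"], "price": str(entry["Price"])}
        # CoordAxis 1..3 map positionally onto option1..option3.
        for position, value in enumerate(entry["Coords"], start=1):
            variant[f"option{position}"] = value
        variants.append(variant)

    options = [{"name": name} for name in axis_names]
    return {"product": {"title": title, "options": options, "variants": variants}}


payload = matrix_to_shopify(
    "Classic Tee",
    ["Size", "Colour"],
    [
        {"MatrixCode": "TEE-M-NAVY", "Coords": ["M", "Navy"], "Price": "19.95"},
        {"MatrixCode": "TEE-L-NAVY", "Coords": ["L", "Navy"], "Price": "19.95"},
    ],
)
```

A single-axis matrix simply produces variants with option1 only, which Shopify accepts.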

Key insight: The ERP's matrix information endpoint returns stock levels per variant (QtyInStock, ReservedOut). I compute available stock as QtyInStock - ReservedOut and use Shopify's absolute inventory_levels/set endpoint instead of adjust. This eliminates drift — every sync sets the truth, rather than trying to track deltas that can compound errors over time.
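
A minimal sketch of that absolute update, building the request without sending it. The function name and parameters are hypothetical, and how the location and inventory item IDs are looked up is left out. The endpoint path and body fields follow Shopify's REST InventoryLevel resource.

```python
import json
import urllib.request

API_VERSION = "2024-10"


def build_inventory_set_request(shop, token, location_id, inventory_item_id, entry):
    """Build a POST that sets the absolute available quantity for one variant."""
    # Available stock = what is on hand minus what is already reserved.
    available = entry["QtyInStock"] - entry["ReservedOut"]
    body = {
        "location_id": location_id,
        "inventory_item_id": inventory_item_id,
        "available": available,
    }
    url = f"https://{shop}/admin/api/{API_VERSION}/inventory_levels/set.json"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Shopify-Access-Token": token,
        },
        method="POST",
    )
```

Sending the same request twice leaves the level unchanged, which is the property that makes a failed or repeated sync harmless.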

2. Shopify's 500 Errors on Large Products

Products with 30+ high-resolution images would intermittently return HTTP 500 from Shopify's REST API during creation. The product would be partially created — some images uploaded, some not — leaving the store in an inconsistent state.

My solution: deferred image sync. Create the product with basic data first (title, description, variants, prices), save the Shopify product ID, then sync images in a separate pass. If images fail, the product still exists and is sellable. Retry logic handles transient failures.

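# Simplified flow: create the product first, sync images in a separate pass
import logging

log = logging.getLogger("sync")

def sync_product(shopify, state, code, base_data, images):
    product_id = shopify.create_product(base_data)   # No images
    state.save(code, product_id)                     # Persist mapping before images
    for img in images:                               # Separate pass
        try:
            shopify.upload_image(product_id, img)
        except Exception as exc:
            # The product already exists and is sellable; images retry later
            log.warning("Image failed for %s, will retry next sync: %s", code, exc)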

3. 136 Empty Smart Collections

The store had 136 smart collections defined with tag-based rules (e.g., a "Charcoal BBQs" collection auto-includes any product tagged charcoal-bbqs). After the initial product sync, every single collection was empty. Products existed, but they lacked the right tags.

The challenge: the ERP organizes products into a hierarchical tree of 591 web categories. Shopify's smart collections expect flat tags. I needed to map between two completely different taxonomies.

My approach was multi-layered:

Direct slug matching — Slugify each ERP category name and check if it matches a smart collection handle. This caught 49 of 136 collections automatically.
Fuzzy matching with domain synonyms — The ERP calls them "Barbecues", Shopify calls them "BBQs". I built a synonym table and matched another 40+ collections.
Brand-based overrides — Some collections are brand-specific (e.g., "Blues Hog BBQ"). I mapped 24 brand names to their corresponding collection tags.
Ancestor expansion — A product in "Portable Charcoal Barbecues" (child category) should also appear in "Charcoal BBQs" (parent). I walk up the category tree and apply tags from all ancestors.
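
Three of these layers (direct slug matching, synonym substitution, and ancestor expansion) can be sketched as below. This is an illustration under assumptions rather than the production mapping code: the synonym entries, the parent and name dictionaries, and the function names are all hypothetical, and the real fuzzy matching was more involved than a word swap.

```python
import re

# Illustrative synonym entries only; the real table was larger.
SYNONYMS = {"barbecues": "bbqs", "barbecue": "bbq"}


def slugify(name):
    """Lowercase a category name and join its words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", name.lower()))


def match_handle(name, handles):
    """Try a direct slug match first, then a synonym-substituted slug."""
    direct = slugify(name)
    if direct in handles:
        return direct
    swapped = "-".join(SYNONYMS.get(part, part) for part in direct.split("-"))
    return swapped if swapped in handles else None


def tags_for_category(category_id, parents, names, handles):
    """Collect matching tags for a category and every ancestor above it."""
    tags, seen = set(), set()
    current = category_id
    while current is not None and current not in seen:
        seen.add(current)  # Guards against a cycle in the tree data
        handle = match_handle(names[current], handles)
        if handle:
            tags.add(handle)
        current = parents.get(current)  # None once the root is reached
    return tags


names = {1: "Barbecues", 2: "Charcoal Barbecues", 3: "Portable Charcoal Barbecues"}
parents = {3: 2, 2: 1}
handles = {"bbqs", "charcoal-bbqs"}
tags = tags_for_category(3, parents, names, handles)
```

Here the leaf category has no collection of its own, but it inherits the tags of both ancestors.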

The output was a 121-entry category-to-tag mapping table and 24 brand overrides, applied during product sync. A batch retag script then ran over all 2,315 existing products in one pass.

Result: 2,060 products tagged, all 136 smart collections populated. The retag script is idempotent — it uses set union (existing tags + new tags), only writes when there's a diff, and never removes tags. Safe to re-run at any time.
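
The idempotent merge at the heart of such a script can be sketched in a few lines. The function name is hypothetical; the only outside fact relied on is that Shopify's REST API represents a product's tags as one comma-separated string.

```python
def merge_tags(existing, new_tags):
    """Union new tags into an existing comma-separated tag string.

    Returns (merged_string, changed). Tags are never removed, and
    changed is False when there is nothing to write.
    """
    current = {tag.strip() for tag in existing.split(",") if tag.strip()}
    merged = current | set(new_tags)
    return ", ".join(sorted(merged)), merged != current


first, changed_first = merge_tags("grill, outdoor", ["charcoal-bbqs"])
second, changed_second = merge_tags(first, ["charcoal-bbqs"])
```

The second call reports no change, so a re-run skips the write for that product entirely.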

4. HTML Body → Structured Metafields

The ERP stores product descriptions as monolithic HTML blobs. A typical product might have sections for Features, Materials, Size & Fit, and Care Instructions — all mashed together in one <p> tag soup.

Shopify's modern themes expect structured data in metafields. I needed to parse the HTML, identify section boundaries (usually <strong>Section Name:</strong> patterns), extract the content, and map it to the correct metafield namespace and key.

# Section mapping
_SECTION_MAP = {
    "size":              {"namespace": "custom", "key": "size_fit"},
    "size & fit":        {"namespace": "custom", "key": "size_fit"},
    "features":          {"namespace": "custom", "key": "features"},
    "materials":         {"namespace": "custom", "key": "material"},
    "care instructions": {"namespace": "custom", "key": "care_instructions"},
}

The parser handles variations in heading formatting (bold, uppercase, with/without colons), strips the matched sections from the body HTML, and what's left becomes the main product description. Clean separation of concerns, zero manual data entry.
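
A stripped-down sketch of that idea follows. It assumes each section is a single paragraph that opens with a bold heading, which is simpler than the real data, and it uses regular expressions rather than a full HTML parser. The function name and the flattened key table are hypothetical.

```python
import re
from html import unescape

# Heading text (lowercased) -> metafield key in the "custom" namespace.
SECTION_KEYS = {
    "size": "size_fit",
    "size & fit": "size_fit",
    "features": "features",
    "materials": "material",
    "care instructions": "care_instructions",
}

PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)
# A bold heading at the start of a paragraph, with an optional trailing colon.
HEADING = re.compile(
    r"^\s*<(strong|b)>\s*([^<]+?)\s*:?\s*</\1>\s*(.*)$",
    re.IGNORECASE | re.DOTALL,
)


def split_sections(body_html):
    """Return (remaining_body_html, {metafield_key: content})."""
    sections = {}

    def extract(paragraph):
        heading = HEADING.match(paragraph.group(1))
        label = unescape(heading.group(2)).lower() if heading else None
        if label not in SECTION_KEYS:
            return paragraph.group(0)  # Keep unrecognised paragraphs in the body
        sections[SECTION_KEYS[label]] = heading.group(3).strip()
        return ""  # Strip the matched section from the body

    remaining = PARAGRAPH.sub(extract, body_html)
    return remaining.strip(), sections


body, sections = split_sections(
    "<p>Great shirt.</p>"
    "<p><strong>Features:</strong> Soft cotton</p>"
    "<p><strong>SIZE</strong> One Size</p>"
)
```

Lowercasing the heading before the lookup is what lets "SIZE", "Size:" and "Size & Fit" all land in the same metafield.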

Event-Driven Incremental Sync

After the initial full sync, I don't re-sync everything. The ERP exposes a /productevents endpoint that returns timestamped events: CREATE, UPDATE, PRICE, IMAGE, WEBINFO, and MOVEMENT.

My incremental sync polls this endpoint from the last-known timestamp, groups events by product code, and applies the minimum necessary updates:

CREATE or UPDATE → full product re-sync
PRICE → variant price update only
IMAGE → image re-sync only
MOVEMENT → inventory level update only

This keeps API calls to a minimum. A typical incremental run touches 5–20 products instead of 2,300.
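
The grouping step can be sketched as below. The event field names ("code", "type") and the action labels are hypothetical, and two behaviours are my assumptions rather than anything stated above: a full re-sync is treated as covering the narrower updates, and event types without a listed action (such as WEBINFO) are skipped.

```python
from collections import defaultdict

# Event type -> sync action, following the list above.
ACTIONS = {
    "CREATE": "full",
    "UPDATE": "full",
    "PRICE": "price",
    "IMAGE": "images",
    "MOVEMENT": "inventory",
}


def plan_updates(events):
    """Group events by product code and reduce them to a minimal action set."""
    plan = defaultdict(set)
    for event in events:
        action = ACTIONS.get(event["type"])
        if action:  # Unlisted event types are ignored in this sketch
            plan[event["code"]].add(action)
    # Assumption: a full re-sync makes the narrower actions redundant.
    return {
        code: {"full"} if "full" in actions else actions
        for code, actions in plan.items()
    }


events = [
    {"code": "A100", "type": "PRICE"},
    {"code": "A100", "type": "UPDATE"},
    {"code": "B200", "type": "PRICE"},
    {"code": "B200", "type": "MOVEMENT"},
    {"code": "B200", "type": "MOVEMENT"},
    {"code": "C300", "type": "WEBINFO"},
]
plan = plan_updates(events)
```

Because actions are collected in a set, ten MOVEMENT events for one product still result in a single inventory update.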

The Role of AI in Development

I built this entire integration with AI-assisted development, using Claude as a pair programmer. Not as a code generator that produces boilerplate you then debug for hours — but as an architectural collaborator that understands the problem domain.

Where AI Added Real Value

API exploration — The ERP's documentation was sparse. Claude helped me reverse-engineer API responses, identify undocumented fields (like Name vs CategoryName), and build client methods from observed behavior.
Category mapping at scale — Mapping 591 ERP categories to 136 Shopify collection tags manually would have taken me hours. AI handled the slug matching, fuzzy matching, and synonym detection in minutes, producing a mapping table that needed minimal manual review.
Edge case discovery — "What happens when a matrix product has only one axis?" "What if the image CDN URL is missing the CloudFront domain?" AI caught edge cases that would have surfaced as production bugs.
Debugging live data — When products showed "Size: One Size" in the body instead of the metafield, Claude traced the issue to a section heading parser that only matched "Size & Fit" but not plain "Size" — and verified only 1 of 2,300 products was affected before I deployed the fix.

Honest take: AI doesn't eliminate the need to understand your systems deeply. It accelerates the work, but I still validate every mapping, test every edge case, and understand why a Shopify API call returns 500 on the 31st image. The leverage comes from spending less time on boilerplate and more time on the problems that actually matter.

Lessons for CTOs

1. Treat the ERP as Source of Truth — Always

Products, prices, and inventory flow one way: ERP → Shopify. Orders flow the other way. Never let Shopify edits override ERP data. This eliminates an entire class of sync conflicts.

2. Absolute Beats Relative for Inventory

Use inventory_levels/set (absolute) instead of adjust (relative). Relative adjustments compound errors. If a sync fails midway through, absolute values self-correct on the next run. Relative values drift forever.

3. Idempotency Is Not Optional

Every sync operation must be safe to re-run. My retag script processes 2,315 products. If it crashes at product 1,800 (and it did — the ERP API timed out), you just restart it. No duplicate tags, no missing data, no manual cleanup.

4. Separate Concerns, Even in Sync

Don't create a product with images, metafields, and inventory in one API call. Create the product first, then layer on additional data. Each step can fail independently and retry independently. One 500 error on an image upload shouldn't block the entire product from being created.

5. Zero Dependencies Has a Real ROI

When your integration runs on a client's server, every dependency is a liability. Python stdlib covers HTTP, JSON, file I/O, logging, and scheduling. You trade some developer convenience for deployment certainty. For client-facing integrations, that trade is almost always worth it.

The Numbers

2,315 Products Synced · ~9,800 Variants Created · 136 Collections Populated · 121 Category Mappings · 3 Sync Flows · 0 pip install

The integration runs on a scheduled cron job. Incremental syncs complete in under 30 seconds. Full inventory reconciliation runs nightly. Orders sync within minutes of payment confirmation.

From project kickoff to live products in Shopify: under a week. That's the leverage of AI-assisted development combined with clear architectural decisions.

Stack

Language: Python 3.10+ (stdlib only)
ERP API: REST with bearer token auth (3-day expiry, auto-refresh)
Shopify: REST Admin API 2024-10 (products, variants, inventory) + GraphQL Admin API (collections, navigation, publishing)
State: JSON files with atomic writes
Scheduling: cron / n8n
AI: Claude (architecture, API exploration, category mapping, debugging)
