DataFuse — Matthew McGuire

Stack

Frontend Next.js (App Router), TypeScript, Tailwind CSS

Database Neon PostgreSQL, Prisma ORM

Cache Upstash Redis

AI Anthropic Claude API

Ad APIs Meta Marketing API, Google Ads API, TikTok Ads API

Platform Shopify App Bridge, Admin API, Billing API

CRM Klaviyo API

Email Resend

Infrastructure Vercel

My Role

Product Design

Designed the full UI — dashboard layouts, data visualisation components, and the drill-down interaction model across seven analytics panels.

Full-Stack Development

Built the entire application in Next.js with TypeScript — 126 API routes, 44 lib files, from the Shopify OAuth install flow to the real-time P&L dashboard.

Data Pipeline & Fee Engine

Engineered the order reconciliation system, including a multi-processor fee engine covering six payment providers and versioned COGS history for accurate historical margin calculations.

Attribution System

Built the first-party pixel, server-side session stitching, and the multi-touch attribution engine that applies five models across the full conversion path on first-party data.

Ad Platform Integrations

Integrated Meta, Google, and TikTok ad APIs — including timezone normalization for accurate day-level ROAS joins between ad platform data and Shopify order data.

AI Layer

Engineered the daily briefing, CFO scenario modeler, and natural-language concierge on Claude, with prompt caching to keep per-merchant inference costs low at scale.

About This Project

A real-time profit intelligence platform for Shopify stores — because knowing your revenue means nothing if you don't know what you actually kept. DataFuse is a full-stack Shopify SaaS I built from scratch that reconciles ad spend, platform fees, COGS, and refunds against live order data to show merchants their true net profit. The headline metric is Profit ROAS — return on ad spend calculated on margin, not revenue — because a 4x ROAS on a 20% margin product is often a money-losing campaign in disguise.

The Problem

Most Shopify merchants are flying blind on profitability. They log into Shopify and see revenue. They log into Meta and see ROAS. They check their bank account and wonder where the money went. The gap between those three numbers is COGS, transaction fees across multiple payment processors, ad spend, refunds, and shipping — none of which Shopify's native analytics surfaces in one place.

Existing tools that solve this either cost $200–$500/month (Triple Whale, Northbeam) or don't go deep enough on the ad data. DataFuse sits in between: a full profit stack for stores spending $1k–$15k/month on ads, at $29–$149/month.

How It Works

A merchant installs the app, connects their ad accounts (Meta, Google, TikTok), and enters their COGS — either manually per product or via CSV import. From that point forward, DataFuse pulls live Shopify orders, reconciles them against a fee engine that handles Shopify Payments, Stripe, PayPal, Square, Klarna, and Afterpay, then layers in ad spend from the platform APIs to produce a real-time P&L. Every number drills down: revenue by channel, profit by product, ROAS by ad creative.

The attribution layer is first-party. A lightweight pixel on the storefront captures fbclid, gclid, ttclid, and UTM parameters at the session level, stitches them server-side to Shopify orders, and feeds five attribution models — first-touch, last-touch, linear, time-decay, and position-based — so merchants aren't solely relying on Meta and Google's self-reported numbers post-iOS 14.

On top of the data layer sits an AI layer: a daily briefing that summarizes what moved and why, a CFO-style scenario modeler, and a natural-language concierge that answers questions like "which ad creative drove the most profit last week?" against the store's own data.

Under the Hood

The core data challenge was reconciliation correctness at the order level. Orders arrive via Shopify webhooks, get written to Neon PostgreSQL through Prisma, and immediately trigger a fee calculation pass. The fee engine handles six payment processors — each with different rate structures (flat, percentage-plus-fixed, or volume-tiered). Fees are materialized per order at write time rather than computed at query time, so historical P&L stays stable even when rate structures change. COGS are stored with full version history tied to a date range, so margin calculations reflect what the merchant actually paid at the time of sale rather than today's current cost.

Ad data syncs on a scheduled cron: Meta Marketing API at the ad level (campaign → ad set → ad, per day), Google Ads and TikTok Ads at the campaign level. The Meta sync was the hardest to get right — the Insights API returns data in the account's marketing timezone while Shopify orders are in UTC, and the join between them has to normalize both into a common timezone before aggregating, or you get phantom day-boundary mismatches that silently corrupt ROAS numbers.

The first-party attribution pixel captures fbclid, gclid, ttclid, and full UTM parameters on page load and posts them to a lightweight pixel endpoint. Server-side, each hit creates a session record keyed to a browser fingerprint. When a Shopify orders/create webhook fires, the attribution engine queries open sessions for that fingerprint, ranks touchpoints by recency and source priority, and writes a resolved attribution record. A second pass then applies all five multi-touch models, distributing fractional credit across the full conversion path. This matters because Meta's own reported ROAS and pixel-attributed ROAS often diverge by 20–40% post-iOS 14.

The AI layer is built on the Claude API with prompt caching enabled on the system prompt and static grounding context, which cuts per-briefing token costs significantly. Each briefing prompt is constructed dynamically from a fresh query window of the merchant's actual KPI data — not cached summaries — and returns structured JSON: KPI deltas with direction and magnitude, anomaly flags with likely causes, and prioritised action items. The concierge Q&A endpoint translates natural-language questions into parameterised database queries at runtime and grounds Claude's response in the live query result rather than letting it hallucinate figures.

Upstash Redis sits in front of all public-facing endpoints enforcing sliding-window rate limits, and caches hot dashboard aggregations with short TTLs so the UI stays fast on stores with 50k+ orders without hammering Postgres on every page load. The entire stack runs on Vercel serverless with Neon's connection pooling — compute and database both scale to zero when idle, keeping infrastructure cost proportional to actual usage.

Platform Features

P&L & Profitability

Real-time net profit after COGS, ad spend, platform fees, refunds, and shipping
Fee engine covering Shopify Payments, Stripe, PayPal, Square, Klarna, and Afterpay
COGS management via manual entry or CSV import, with full version history
P&L breakdown by product, channel, and order
Revenue heatmap and operations dashboard

Attribution & Ad Analytics

First-party pixel capturing fbclid, gclid, ttclid, and UTM parameters
Five multi-touch attribution models: first-touch, last-touch, linear, time-decay, position-based
Facebook ad sync at the ad level (campaign → ad set → ad, per day)
Google and TikTok sync at the campaign level
Creative-level ROAS matched via utm_content
Creative fatigue detection
Survey-based post-purchase attribution

Customer Intelligence

RFM segmentation: VIP, Loyal, At-Risk, Lost, and New customer buckets
LTV by acquisition channel
Funnel analytics and cart abandonment tracking
Klaviyo integration for segmented email flows
Automated win-back and lifecycle email campaigns

Inventory

Sell-through velocity calculation per variant
Projected stockout date alerts
Dead stock identification

AI Layer

Daily AI briefing — KPI movements, anomaly flags, and prioritised actions
CFO Dashboard with scenario modeling (e.g. "what happens to margin if COGS rise 10%?")
AI Concierge — natural-language Q&A grounded in live store data
Per-panel AI insights surfaced inline across the dashboard
Weekly briefing email delivered to the merchant's inbox

Alerts & Automation

20+ configurable alert types: margin drops, ROAS degradation, stockouts, LTV shifts, and more
Action feed with seven event types surfacing the highest-priority signals
Influencer tracking

Multi-Store

Isolated data pipelines per store under a single merchant account
Up to 25 connected stores on the Agency tier