Enterprise AI Analytics Platform

Internal analytics platform for a Fortune 500 retailer, using GenAI to surface insights from 10TB of sales and inventory data via natural language queries.

  • Analyst time saved: 60%
  • Internal users: 1,200
  • Query response time: <30 sec

Project Overview

A Fortune 500 retailer had 10 years of transaction, inventory, and supplier data spread across 14 internal systems. Analysts spent 60% of their time extracting and wrangling data before they could start any actual analysis. Executives couldn't get answers to strategic questions without waiting 2–3 weeks for an analyst sprint.

The company needed to democratise data access — giving every product manager, merchant, and executive the ability to ask complex business questions in plain English and get accurate, sourced answers instantly.

The Challenge

Text-to-SQL sounds simple. In practice, enterprise data is messy:

  • 14 source systems with inconsistent schemas, naming conventions, and update cadences
  • Business logic encoded in Excel sheets and analyst tribal knowledge (e.g., "GMV excludes marketplace returns before 2022")
  • Hallucinated SQL would silently return wrong numbers — catastrophic in a business context
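The last point is the one that kills naive text-to-SQL deployments: a query that parses and runs can still reference a column the model invented. A minimal sketch of the idea (schema names and the regex approach are illustrative, not the platform's actual validator, which can use full SQL parsing):

```python
import re

# Illustrative schema: in practice this comes from the dbt-documented warehouse.
KNOWN_TABLES = {"fct_sales", "dim_store"}
KNOWN_COLUMNS = {"order_id", "gmv_usd", "order_date", "store_id", "region"}
SQL_KEYWORDS = {"select", "sum", "avg", "count", "from", "where",
                "group", "by", "order", "and", "or", "as", "join", "on"}

def unknown_identifiers(sql: str) -> set:
    """Return identifiers in the SQL that match neither the schema
    nor a SQL keyword -- a crude stand-in for parse-based validation."""
    idents = set(re.findall(r"\b[a-z_][a-z0-9_]*\b", sql.lower()))
    return idents - SQL_KEYWORDS - KNOWN_TABLES - KNOWN_COLUMNS

# A hallucinated column is caught before execution instead of
# silently producing a wrong number:
unknown_identifiers("SELECT SUM(net_revenue) FROM fct_sales")  # {'net_revenue'}
```

Rejecting the query outright (rather than letting the warehouse error or, worse, resolve it to something unintended) is what makes the failure loud instead of silent.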

Architecture

Data Layer: We built a unified semantic layer using dbt that harmonised all 14 data sources into a single, documented schema, with business logic encoded in metric definitions. Every column carried a plain-English description to give the LLM context.
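Those plain-English column descriptions are what gets rendered into the model's prompt. A minimal sketch of that rendering step (the table names, descriptions, and function are hypothetical, though the GMV rule is the real kind of tribal knowledge the layer captures):

```python
# Hypothetical dbt-style documentation: table -> {column: description}.
SCHEMA_DOCS = {
    "fct_sales": {
        "order_id": "Unique identifier for a customer order.",
        "gmv_usd": "Gross merchandise value in USD; excludes marketplace returns before 2022.",
        "order_date": "Calendar date the order was placed.",
    },
    "dim_store": {
        "store_id": "Unique identifier for a retail location.",
        "region": "Sales region the store reports into.",
    },
}

def render_schema_context(docs: dict) -> str:
    """Flatten table/column docs into the plain-English schema block
    that gets prepended to the LLM prompt."""
    lines = []
    for table, columns in docs.items():
        lines.append(f"Table {table}:")
        for col, desc in columns.items():
            lines.append(f"  - {col}: {desc}")
    return "\n".join(lines)

print(render_schema_context(SCHEMA_DOCS))
```

Because the descriptions live in dbt alongside the models, the prompt context and the warehouse documentation can never drift apart.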

Text-to-SQL Engine: Rather than using a generic LLM, we built a structured prompting pipeline:

  1. Table and column retrieval via semantic search over the dbt documentation
  2. SQL generation using GPT-4 with strict schema grounding and few-shot examples
  3. SQL validation — every generated query was validated against the schema before execution
  4. Result interpretation — a separate LLM pass narrated the results in plain English
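The four stages above can be sketched as a single orchestration function. Everything here is a stub with hypothetical names: in production, stage 1 is semantic search over Pinecone, stages 2 and 4 are GPT-4 calls, and the query executes against Snowflake.

```python
from dataclasses import dataclass

# Toy schema standing in for the dbt semantic layer.
SCHEMA = {"fct_sales": ["order_id", "gmv_usd"], "dim_store": ["store_id", "region"]}

@dataclass
class Answer:
    sql: str
    tables: list
    narrative: str

def retrieve_tables(question: str) -> list:
    """Stage 1 (stub): semantic search over dbt docs, here a keyword match."""
    return [t for t in SCHEMA if t.split("_")[1] in question.lower()]

def generate_sql(question: str, tables: list) -> str:
    """Stage 2 (stub): in production, a GPT-4 call with schema grounding
    and few-shot examples."""
    return f"SELECT SUM(gmv_usd) FROM {tables[0]}"

def validate_sql(sql: str) -> bool:
    """Stage 3 (stub): reject any query referencing tables outside the schema."""
    return any(f"FROM {t}" in sql for t in SCHEMA)

def interpret(question: str, rows) -> str:
    """Stage 4 (stub): a second LLM pass narrates the results in plain English."""
    return f"Answer to {question!r}: {rows}"

def answer(question: str) -> Answer:
    tables = retrieve_tables(question)
    sql = generate_sql(question, tables)
    if not validate_sql(sql):
        raise ValueError("Generated SQL failed schema validation")
    rows = [(42_000_000,)]  # stand-in for executing against Snowflake
    return Answer(sql=sql, tables=tables, narrative=interpret(question, rows))
```

The key design choice is that validation sits between generation and execution: a query that fails stage 3 never reaches the warehouse, so a hallucination surfaces as an explicit error rather than a plausible-looking wrong number.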

Trust & Transparency: Every answer displayed the generated SQL, the tables used, and a confidence indicator. Analysts could inspect and override any AI-generated query.
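A minimal sketch of what travels with each answer, plus one possible confidence heuristic. Field names and thresholds are illustrative, not the platform's actual API:

```python
from dataclasses import dataclass

@dataclass
class AnswerCard:
    narrative: str       # plain-English answer shown to the user
    generated_sql: str   # displayed verbatim so analysts can inspect/override it
    tables_used: list    # provenance for the numbers
    confidence: str      # "high" / "medium" / "low" indicator

def confidence_label(retrieval_score: float, validated: bool) -> str:
    """Toy heuristic: a validation failure is always low confidence;
    otherwise, confidence tracks how well retrieval matched the schema."""
    if not validated:
        return "low"
    return "high" if retrieval_score >= 0.8 else "medium"
```

Surfacing the SQL itself, rather than only the narrated answer, is what lets analysts audit and override the model instead of having to trust it blindly.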

Implementation

  • Semantic Layer: dbt Cloud + Snowflake
  • LLM: GPT-4 via Azure OpenAI (VPC deployment)
  • Vector Store: Pinecone for documentation retrieval
  • Frontend: Next.js + Recharts for interactive visualisations
  • Auth: Okta SSO with row-level security inherited from Snowflake

Results

  • 60% reduction in analyst time spent on data extraction
  • Non-technical executives can answer their own questions in under 30 seconds
  • Zero data hallucination incidents in 6 months (validated via weekly analyst audits)
  • Rolled out to 1,200 internal users across 8 business units

Technologies Used

Python · GPT-4 · dbt · Snowflake · Next.js · Pinecone