Enterprise AI Analytics Platform
Internal analytics platform for a Fortune 500 retailer that uses GenAI to surface insights from 10 TB of sales and inventory data through natural-language queries.
Project Overview
A Fortune 500 retailer had 10 years of transaction, inventory, and supplier data spread across 14 internal systems. Analysts spent 60% of their time extracting and wrangling data before they could start any actual analysis. Executives couldn't get answers to strategic questions without waiting 2–3 weeks for an analyst sprint.
The company needed to democratise data access, giving every product manager, merchant, and executive the ability to ask complex business questions in plain English and get accurate, sourced answers in seconds rather than weeks.
The Challenge
Text-to-SQL sounds simple. In practice, enterprise data is messy:
- 14 source systems with inconsistent schemas, naming conventions, and update cadences
- Business logic encoded in Excel sheets and analyst tribal knowledge (e.g., "GMV excludes marketplace returns before 2022")
- Hallucinated SQL could silently return wrong numbers, which is catastrophic in a business context
Architecture
Data Layer: We built a unified semantic layer in dbt that harmonised all 14 data sources into a single, documented schema, with business logic encoded in metric definitions. Every column had a plain-English description that could be included in the LLM's context.
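The semantic layer itself was built in dbt. The Python sketch below only illustrates the underlying idea: a business rule such as "GMV excludes marketplace returns before 2022" is written down once as a metric definition, and documented columns are flattened into text the LLM can read. Every table, column, and metric name here, and the exact GMV expression, is hypothetical; in a real dbt project these descriptions would live in the project's YAML property files rather than in Python.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    name: str
    description: str  # plain-English text written for the LLM


@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    sql: str  # the single governed expression every query should reuse


@dataclass
class Model:
    name: str
    description: str
    columns: list[Column] = field(default_factory=list)


# Hypothetical metric: the "GMV excludes marketplace returns before 2022"
# rule is written down once here instead of living in analysts' heads.
GMV = Metric(
    name="gmv",
    description="Gross merchandise value; marketplace returns before 2022 are excluded.",
    sql=(
        "SUM(CASE WHEN channel = 'marketplace' AND is_return "
        "AND order_date < '2022-01-01' THEN 0 ELSE order_amount END)"
    ),
)

# Hypothetical harmonised model with column-level documentation.
ORDERS = Model(
    name="fct_orders",
    description="One row per order line across all sales channels.",
    columns=[
        Column("order_date", "Date the order was placed."),
        Column("channel", "Sales channel: 'store', 'web' or 'marketplace'."),
        Column("is_return", "True if the line is a return."),
        Column("order_amount", "Line value in USD."),
    ],
)


def render_context(models: list[Model], metrics: list[Metric]) -> str:
    """Flatten documented models and metrics into plain text for an LLM prompt."""
    lines: list[str] = []
    for model in models:
        lines.append(f"Table {model.name}: {model.description}")
        for col in model.columns:
            lines.append(f"  - {col.name}: {col.description}")
    for metric in metrics:
        lines.append(f"Metric {metric.name}: {metric.description}")
        lines.append(f"  SQL: {metric.sql}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_context([ORDERS], [GMV]))
```

Keeping the metric as one expression means a generated query can reuse the governed definition instead of re-deriving the return rule each time.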
Text-to-SQL Engine: Rather than passing questions straight to a general-purpose LLM, we built a structured prompting pipeline:
- Table and column retrieval via semantic search over the dbt documentation
- SQL generation using GPT-4 with strict schema grounding and few-shot examples
- SQL validation: every generated query was checked against the schema before execution
- Result interpretation: a separate LLM pass narrated the results in plain English
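A minimal sketch of the retrieval and validation steps, under loud assumptions. The schema is hypothetical, a bag-of-words cosine similarity stands in for the embedding model and Pinecone index, and an empty in-memory SQLite database stands in for Snowflake when checking that a query compiles. The GPT-4 generation and narration steps are not shown. The idea is that compiling a query with `EXPLAIN` against an empty copy of the documented schema catches invented tables and columns without touching any data.

```python
from __future__ import annotations

import math
import re
import sqlite3
from collections import Counter

# Hypothetical documented schema: table -> {column: plain-English description}.
SCHEMA = {
    "fct_orders": {
        "order_date": "date the order was placed",
        "store_id": "store that fulfilled the order",
        "order_amount": "order line value in usd",
    },
    "dim_store": {
        "store_id": "unique store identifier",
        "region": "sales region the store belongs to",
    },
}

# Crude read-only guard; a real deployment would also rely on database grants.
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|truncate|grant)\b", re.IGNORECASE
)


def _vector(text: str) -> Counter:
    # Stand-in for an embedding model: bag-of-words term counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_tables(question: str, k: int = 2) -> list[str]:
    """Retrieval step: rank tables by similarity of the question to their column docs."""
    q = _vector(question)
    scores = {
        table: _cosine(q, _vector(" ".join(f"{col} {desc}" for col, desc in cols.items())))
        for table, cols in SCHEMA.items()
    }
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


def validate_sql(sql: str) -> str | None:
    """Validation step: return None if the query is a single read-only statement
    that compiles against the schema, otherwise an error message."""
    body = sql.strip().rstrip(";")
    if ";" in body:  # crude: also rejects ';' inside string literals
        return "multiple statements are not allowed"
    if not re.match(r"(select|with)\b", body, re.IGNORECASE) or _WRITE_KEYWORDS.search(body):
        return "only read-only SELECT queries are allowed"
    conn = sqlite3.connect(":memory:")
    try:
        # Recreate the documented schema as empty tables.
        for table, cols in SCHEMA.items():
            conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
        # EXPLAIN compiles the query without running it, so unknown tables
        # or columns raise an error.
        conn.execute("EXPLAIN " + body)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()
```

For example, `validate_sql("SELECT revenue FROM fct_orders")` returns SQLite's "no such column" error, which a pipeline could feed back to the model for a retry. Because SQL dialects differ, a production check would more plausibly compile the query against the warehouse itself (Snowflake supports `EXPLAIN`) rather than against SQLite.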
Trust & Transparency: Every answer displayed the generated SQL, the tables used, and a confidence indicator. Analysts could inspect and override any AI-generated query.
Implementation
- Semantic Layer: dbt Cloud + Snowflake
- LLM: GPT-4 via Azure OpenAI (VPC deployment)
- Vector Store: Pinecone for documentation retrieval
- Frontend: Next.js + Recharts for interactive visualisations
- Auth: Okta SSO with row-level security inherited from Snowflake
Results
- 60% reduction in analyst time spent on data extraction
- Non-technical executives can answer their own questions in under 30 seconds
- Zero incidents of hallucinated figures in 6 months (verified through weekly analyst audits)
- Rolled out to 1,200 internal users across 8 business units