Enterprise AI Analytics Platform

Internal analytics platform for a Fortune 500 retailer, using GenAI to surface insights from 10TB of sales and inventory data via natural language queries.

  • Analyst time saved: 60%
  • Internal users: 1,200
  • Query response time: <30 sec

Project Overview

A Fortune 500 retailer had 10 years of transaction, inventory, and supplier data spread across 14 internal systems. Analysts spent 60% of their time extracting and wrangling data before they could start any actual analysis. Executives couldn't get answers to strategic questions without waiting 2–3 weeks for an analyst sprint.

The company needed to democratise data access — giving every product manager, merchant, and executive the ability to ask complex business questions in plain English and get accurate, sourced answers instantly.

The Challenge

Text-to-SQL sounds simple. In practice, enterprise data is messy:

  • 14 source systems with inconsistent schemas, naming conventions, and update cadences
  • Business logic encoded in Excel sheets and analyst tribal knowledge (e.g., "GMV excludes marketplace returns before 2022")
  • Hallucinated SQL would silently return wrong numbers — catastrophic in a business context
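The last point is the one that kills naive text-to-SQL deployments: a query that parses and runs can still reference a column the model invented. A minimal sketch of the idea (schema names and the regex approach are illustrative, not the platform's actual validator, which can use full SQL parsing):

```python
import re

# Illustrative schema: in practice this comes from the dbt-documented warehouse.
KNOWN_TABLES = {"fct_sales", "dim_store"}
KNOWN_COLUMNS = {"order_id", "gmv_usd", "order_date", "store_id", "region"}
SQL_KEYWORDS = {"select", "sum", "avg", "count", "from", "where",
                "group", "by", "order", "and", "or", "as", "join", "on"}

def unknown_identifiers(sql: str) -> set:
    """Return identifiers in the SQL that match neither the schema
    nor a SQL keyword -- a crude stand-in for parse-based validation."""
    idents = set(re.findall(r"\b[a-z_][a-z0-9_]*\b", sql.lower()))
    return idents - SQL_KEYWORDS - KNOWN_TABLES - KNOWN_COLUMNS

# A hallucinated column is caught before execution instead of
# silently producing a wrong number:
unknown_identifiers("SELECT SUM(net_revenue) FROM fct_sales")  # {'net_revenue'}
```

Rejecting the query outright (rather than letting the warehouse error or, worse, resolve it to something unintended) is what makes the failure loud instead of silent.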

Architecture

Data Layer: We built a unified semantic layer using dbt that harmonised all 14 data sources into a single, documented schema, with business logic encoded in metric definitions. Every column carried a plain-English description to give the LLM context.
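Those plain-English column descriptions are what gets rendered into the model's prompt. A minimal sketch of that rendering step (the table names, descriptions, and function are hypothetical, though the GMV rule is the real kind of tribal knowledge the layer captures):

```python
# Hypothetical dbt-style documentation: table -> {column: description}.
SCHEMA_DOCS = {
    "fct_sales": {
        "order_id": "Unique identifier for a customer order.",
        "gmv_usd": "Gross merchandise value in USD; excludes marketplace returns before 2022.",
        "order_date": "Calendar date the order was placed.",
    },
    "dim_store": {
        "store_id": "Unique identifier for a retail location.",
        "region": "Sales region the store reports into.",
    },
}

def render_schema_context(docs: dict) -> str:
    """Flatten table/column docs into the plain-English schema block
    that gets prepended to the LLM prompt."""
    lines = []
    for table, columns in docs.items():
        lines.append(f"Table {table}:")
        for col, desc in columns.items():
            lines.append(f"  - {col}: {desc}")
    return "\n".join(lines)

print(render_schema_context(SCHEMA_DOCS))
```

Because the descriptions live in dbt alongside the models, the prompt context and the warehouse documentation can never drift apart.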

Text-to-SQL Engine: Rather than using a generic LLM, we built a structured prompting pipeline:

  1. Table and column retrieval via semantic search over the dbt documentation
  2. SQL generation using GPT-4 with strict schema grounding and few-shot examples
  3. SQL validation — every generated query was validated against the schema before execution
  4. Result interpretation — a separate LLM pass narrated the results in plain English
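The four stages above can be sketched as a single orchestration function. Everything here is a stub with hypothetical names: in production, stage 1 is semantic search over Pinecone, stages 2 and 4 are GPT-4 calls, and the query executes against Snowflake.

```python
from dataclasses import dataclass

# Toy schema standing in for the dbt semantic layer.
SCHEMA = {"fct_sales": ["order_id", "gmv_usd"], "dim_store": ["store_id", "region"]}

@dataclass
class Answer:
    sql: str
    tables: list
    narrative: str

def retrieve_tables(question: str) -> list:
    """Stage 1 (stub): semantic search over dbt docs, here a keyword match."""
    return [t for t in SCHEMA if t.split("_")[1] in question.lower()]

def generate_sql(question: str, tables: list) -> str:
    """Stage 2 (stub): in production, a GPT-4 call with schema grounding
    and few-shot examples."""
    return f"SELECT SUM(gmv_usd) FROM {tables[0]}"

def validate_sql(sql: str) -> bool:
    """Stage 3 (stub): reject any query referencing tables outside the schema."""
    return any(f"FROM {t}" in sql for t in SCHEMA)

def interpret(question: str, rows) -> str:
    """Stage 4 (stub): a second LLM pass narrates the results in plain English."""
    return f"Answer to {question!r}: {rows}"

def answer(question: str) -> Answer:
    tables = retrieve_tables(question)
    sql = generate_sql(question, tables)
    if not validate_sql(sql):
        raise ValueError("Generated SQL failed schema validation")
    rows = [(42_000_000,)]  # stand-in for executing against Snowflake
    return Answer(sql=sql, tables=tables, narrative=interpret(question, rows))
```

The key design choice is that validation sits between generation and execution: a query that fails stage 3 never reaches the warehouse, so a hallucination surfaces as an explicit error rather than a plausible-looking wrong number.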

Trust & Transparency: Every answer displayed the generated SQL, the tables used, and a confidence indicator. Analysts could inspect and override any AI-generated query.
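A minimal sketch of what travels with each answer, plus one possible confidence heuristic. Field names and thresholds are illustrative, not the platform's actual API:

```python
from dataclasses import dataclass

@dataclass
class AnswerCard:
    narrative: str       # plain-English answer shown to the user
    generated_sql: str   # displayed verbatim so analysts can inspect/override it
    tables_used: list    # provenance for the numbers
    confidence: str      # "high" / "medium" / "low" indicator

def confidence_label(retrieval_score: float, validated: bool) -> str:
    """Toy heuristic: a validation failure is always low confidence;
    otherwise, confidence tracks how well retrieval matched the schema."""
    if not validated:
        return "low"
    return "high" if retrieval_score >= 0.8 else "medium"
```

Surfacing the SQL itself, rather than only the narrated answer, is what lets analysts audit and override the model instead of having to trust it blindly.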

Implementation

  • Semantic Layer: dbt Cloud + Snowflake
  • LLM: GPT-4 via Azure OpenAI (VPC deployment)
  • Vector Store: Pinecone for documentation retrieval
  • Frontend: Next.js + Recharts for interactive visualisations
  • Auth: Okta SSO with row-level security inherited from Snowflake

Results

  • 60% reduction in analyst time spent on data extraction
  • Non-technical executives can answer their own questions in under 30 seconds
  • Zero data hallucination incidents in 6 months (validated via weekly analyst audits)
  • Rolled out to 1,200 internal users across 8 business units

Technologies Used

Python · GPT-4 · dbt · Snowflake · Next.js · Pinecone