AI Customer Support System

Project Overview

A B2B SaaS company with 120,000 active users was drowning in tier-1 support tickets — password resets, billing queries, onboarding confusion. Response times averaged 18 hours. Customer satisfaction was falling.

They needed an intelligent triage and resolution engine that could handle the majority of requests without human intervention, while routing genuinely complex issues to senior agents with full context.

The Challenge

The core engineering challenge was threefold:

Intent accuracy at scale: A generic LLM would hallucinate or give incorrect account-specific answers. We needed tight retrieval-augmented generation over private documentation, knowledge bases, and live account data.
Zero trust on PII: Customer data could not leave the client's VPC. All model calls had to be proxied through a private inference layer inside that network boundary.
Graceful escalation: The system had to know precisely when it couldn't answer, and hand off to a human agent with a complete, structured context packet.

Architecture

We designed a three-layer resolution pipeline:

Layer 1 — Intent Classification: Fine-tuned a distilled BERT model on 50k labeled historical tickets to classify intent into 38 categories with 96% accuracy.
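
To illustrate what Layer 1 passes to the later layers, the sketch below turns raw classifier scores into an intent label and a confidence value using a softmax. The function name, the three example labels, and the scores are hypothetical; the production model, label set, and serving code are not shown here.

```python
import math

def classify_intent(logits, labels):
    """Return (label, confidence) from raw classifier scores via softmax."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    # Subtract the largest score before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index of the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical labels and scores, for illustration only.
LABELS = ["password_reset", "billing_query", "onboarding_help"]
intent, confidence = classify_intent([4.0, 1.0, 0.5], LABELS)
```

The confidence value is the kind of signal an escalation step can compare against a threshold.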

Layer 2 — RAG Resolution Engine: Built a vector retrieval system over the client's documentation (Confluence, Zendesk macros, API docs) using pgvector and a GPT-4 API proxy. Responses were grounded strictly in retrieved context with citation links.
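
The retrieval-then-ground pattern in Layer 2 can be sketched as follows. This is a minimal in-memory stand-in, assuming toy three-dimensional embeddings and example URLs; in the deployed system the similarity search ran in pgvector and the prompt went to the model proxy. The Chunk class, both function names, and the prompt wording are illustrative, not the production code.

```python
import math
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source_url: str
    embedding: list

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_embedding, chunks, k=2):
    """Return the k chunks most similar to the query embedding."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:k]

def build_grounded_prompt(question, retrieved):
    """Number each chunk so the model can cite it as [n]."""
    context = "\n".join(
        f"[{i}] {c.text} (source: {c.source_url})"
        for i, c in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the numbered context below and cite sources as [n]. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical documents with toy embeddings, for illustration only.
CHUNKS = [
    Chunk("Reset a password from Settings > Security.",
          "https://docs.example.com/reset", [0.9, 0.1, 0.0]),
    Chunk("Invoices are issued on the first of each month.",
          "https://docs.example.com/billing", [0.1, 0.9, 0.0]),
    Chunk("Invite teammates from the Admin panel.",
          "https://docs.example.com/invite", [0.0, 0.2, 0.9]),
]
top = retrieve([1.0, 0.0, 0.0], CHUNKS, k=2)
prompt = build_grounded_prompt("How do I reset my password?", top)
```

Numbering the chunks in the prompt is one simple way to let the response carry citation links back to the source documents.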

Layer 3 — Escalation Router: A confidence-threshold + rule-based system that detects when the AI is uncertain and constructs a structured handoff packet (user history, ticket category, retrieved context, AI's attempted answer) for human agents.
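
A minimal sketch of the Layer 3 decision, assuming a single confidence threshold plus two simple rules. The threshold value, the category names in ALWAYS_ESCALATE, and the reason codes are hypothetical; only the four packet fields (user history, ticket category, retrieved context, attempted answer) come from the description above.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.75  # assumed value, for illustration only
ALWAYS_ESCALATE = {"account_deletion", "legal_request"}  # hypothetical rule set

@dataclass
class HandoffPacket:
    user_history: list
    ticket_category: str
    retrieved_context: list
    attempted_answer: str
    reasons: list = field(default_factory=list)

def route(category, confidence, retrieved_context, attempted_answer, user_history):
    """Return None if the AI answer can be sent, else a HandoffPacket."""
    reasons = []
    if confidence < CONFIDENCE_THRESHOLD:
        reasons.append("low_confidence")
    if category in ALWAYS_ESCALATE:
        reasons.append("category_rule")
    if not retrieved_context:
        # Nothing to ground the answer in, so a human should review it.
        reasons.append("no_supporting_context")
    if not reasons:
        return None
    return HandoffPacket(
        user_history=list(user_history),
        ticket_category=category,
        retrieved_context=list(retrieved_context),
        attempted_answer=attempted_answer,
        reasons=reasons,
    )

# Hypothetical ticket: low confidence and no retrieved context.
packet = route("billing_query", 0.40, [], "Your invoice is due soon.", ["ticket-1"])
```

Recording the reasons alongside the packet gives the human agent an immediate view of why the ticket was handed off.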

Implementation

Inference: Azure OpenAI on a private endpoint, proxied through a FastAPI gateway
Vector Store: PostgreSQL + pgvector, refreshed nightly from Confluence and Zendesk
Ticket Ingestion: Webhooks from Zendesk, normalised into a canonical schema
Agent Interface: Custom React dashboard showing AI decisions, confidence scores, and one-click approvals
Monitoring: Prometheus + Grafana for latency, accuracy, and escalation-rate dashboards
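
The ticket ingestion step can be illustrated with a small normaliser. This is a sketch under stated assumptions: Zendesk webhook bodies are configurable, so the payload shape and field names below (ticket, id, requester_email, subject, description, created_at) are hypothetical, as is the CanonicalTicket schema itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CanonicalTicket:
    ticket_id: str
    requester_email: str
    subject: str
    body: str
    received_at: datetime

def normalise_ticket(payload: dict) -> CanonicalTicket:
    """Map a raw webhook payload (assumed shape) onto the canonical schema."""
    ticket = payload.get("ticket", {})
    missing = [k for k in ("id", "requester_email", "created_at") if not ticket.get(k)]
    if missing:
        raise ValueError(f"payload missing required fields: {missing}")
    # Accept a trailing 'Z' on Python versions where fromisoformat rejects it.
    received = datetime.fromisoformat(str(ticket["created_at"]).replace("Z", "+00:00"))
    if received.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        received = received.replace(tzinfo=timezone.utc)
    return CanonicalTicket(
        ticket_id=str(ticket["id"]),
        requester_email=ticket["requester_email"].strip().lower(),
        subject=(ticket.get("subject") or "").strip(),
        body=(ticket.get("description") or "").strip(),
        received_at=received.astimezone(timezone.utc),
    )

# Hypothetical payload, for illustration only.
example = normalise_ticket({
    "ticket": {
        "id": 42,
        "requester_email": " Ana@Example.com ",
        "subject": "Cannot log in",
        "description": "Password reset email never arrives.",
        "created_at": "2024-01-15T09:30:00Z",
    }
})
```

Validating required fields at the boundary keeps malformed webhooks out of the classification and retrieval layers.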

Results

Within 90 days of deployment:

80% of tier-1 tickets resolved autonomously, with no human touchpoint
Average first-response time dropped from 18 hours to 4 minutes
Support team headcount held flat despite 40% user growth
CSAT improved from 3.2 to 4.7 out of 5