AI Customer Support System
End-to-end AI support platform that handles 80% of tier-1 tickets autonomously, dramatically cutting response times and support costs.
Project Overview
A B2B SaaS company with 120,000 active users was drowning in tier-1 support tickets — password resets, billing queries, onboarding confusion. Response times averaged 18 hours. Customer satisfaction was falling.
They needed an intelligent triage and resolution engine that could handle the majority of requests without human intervention, while routing genuinely complex issues to senior agents with full context.
The Challenge
The core engineering challenge was threefold:
- Intent accuracy at scale — A generic LLM would hallucinate or give incorrect account-specific answers. We needed tight retrieval-augmented generation over private documentation, knowledge bases, and live account data.
- Zero trust on PII — Customer data could not leave the client's VPC. All model calls had to be proxied through an on-prem inference layer.
- Graceful escalation — The system had to know precisely when it couldn't answer, and hand off to a human agent with a complete, structured context packet.
Architecture
We designed a three-layer resolution pipeline:
Layer 1 — Intent Classification: Fine-tuned a distilled BERT model on 50k labeled historical tickets to classify intent into 38 categories with 96% accuracy.
Layer 2 — RAG Resolution Engine: Built a vector retrieval system over the client's documentation (Confluence, Zendesk macros, API docs) using pgvector and a GPT-4 API proxy. Responses were grounded strictly in retrieved context with citation links.
Layer 3 — Escalation Router: A confidence-threshold + rule-based system that detects when the AI is uncertain and constructs a structured handoff packet (user history, ticket category, retrieved context, AI's attempted answer) for human agents.
Implementation
- Inference: Azure OpenAI on a private endpoint, proxied through a FastAPI gateway
- Vector Store: PostgreSQL + pgvector, refreshed nightly from Confluence and Zendesk
- Ticket Ingestion: Webhooks from Zendesk, normalised into a canonical schema
- Agent Interface: Custom React dashboard showing AI decisions, confidence scores, and one-click approvals
- Monitoring: Prometheus + Grafana for latency, accuracy, and escalation-rate dashboards
Results
Within 90 days of deployment:
- 80% of tier-1 tickets resolved autonomously, with no human touchpoint
- Average first-response time dropped from 18 hours to 4 minutes
- Support team headcount held flat despite 40% user growth
- Customer CSAT improved from 3.2 to 4.7 / 5