Fintech · Series C
−64% LLM inference cost on a customer support copilot
The team was burning $190k/month on GPT-4 for 22M monthly support requests. We replaced it with a distilled Llama 3.1 70B fine-tune served on vLLM, added semantic and KV caching, and routed cheap queries through Haiku 3.5. Eval pass rate held within 1.2pp.
−64% $/request
2.1× Throughput
99.99% Uptime
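As a rough illustration of the caching and routing idea in this case study, here is a minimal sketch, not the production system. Everything in it is an assumption made for illustration: the bag-of-words `embed` stands in for a real embedding model, the 0.8 similarity threshold and the word-count routing rule are placeholders, and `cheap_model` / `strong_model` are hypothetical callables rather than real model clients.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: a real system would use an embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Returns a stored answer when a new query is similar enough to an earlier one."""

    def __init__(self, threshold=0.8):  # hypothetical threshold
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer)

    def get(self, query):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query, answer):
        self.entries.append((embed(query), answer))


def answer(query, cache, cheap_model, strong_model, max_cheap_words=12):
    """Check the cache first, then route short queries to the cheaper model."""
    cached = cache.get(query)
    if cached is not None:
        return cached, "cache"
    # Hypothetical routing rule: short queries are treated as cheap.
    if len(query.split()) <= max_cheap_words:
        model, tier = cheap_model, "cheap"
    else:
        model, tier = strong_model, "strong"
    result = model(query)
    cache.put(query, result)
    return result, tier
```

In practice the routing signal would more likely come from a classifier or an eval-derived difficulty score than from query length; the point is only that cache hits and cheap-tier calls never reach the expensive model.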
Healthcare · HIPAA
HIPAA-compliant clinical RAG agent in 9 weeks
Private VPC on AWS Bedrock + OpenSearch vector. Full audit trail, model cards, LLM-as-judge nightly eval against 1,400 gold answers. Passed two third-party security reviews and an internal red-team exercise.
9 wk Time to prod
93% Eval pass rate
0 PII leaks
E-commerce · Public
Real-time personalization with 11ms p99 inference
Replaced batch scoring with online inference using Triton + ONNX on G6 GPUs. Feature store on Tecton with online/offline parity. CTR up by double digits, GPU spend held flat through autoscaling.
11ms p99 latency
+14% CTR lift
flat GPU $
SaaS · Series B
Tier-1 ticket-resolving agent — 38% auto-resolution
LangGraph supervisor pattern. Specialist agents for billing, account, integrations. Tool registry via MCP. Confidence-gated escalation with full trace replay. CSAT moved from 4.2 to 4.6 / 5.
38% Auto-resolved
4.6/5 CSAT
$0.04 Cost / ticket
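Confidence-gated escalation with a replayable trace can be sketched in plain Python. This is an illustrative sketch only: it does not use the LangGraph or MCP APIs, and `AgentResult`, `Trace`, `handle_ticket`, the `classify` callable, and the 0.75 threshold are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class AgentResult:
    agent: str
    answer: str
    confidence: float  # assumed to be a score in [0, 1]


@dataclass
class Trace:
    """Ordered record of every decision, so a ticket can be replayed later."""
    steps: list = field(default_factory=list)

    def record(self, event, **data):
        self.steps.append({"event": event, **data})


def handle_ticket(ticket, specialists, classify, threshold=0.75):
    """Route a ticket to a specialist; hand off to a human when confidence is low.

    Returns (answer, trace). An answer of None means the ticket was escalated.
    """
    trace = Trace()
    topic = classify(ticket)
    trace.record("routed", topic=topic)

    specialist = specialists.get(topic)
    if specialist is None:
        trace.record("escalated", reason="no_specialist")
        return None, trace

    result = specialist(ticket)
    trace.record("answered", agent=result.agent, confidence=result.confidence)

    if result.confidence < threshold:
        trace.record("escalated", reason="low_confidence")
        return None, trace

    trace.record("resolved")
    return result.answer, trace
```

The threshold is the lever between auto-resolution rate and escalation volume: lowering it resolves more tickets automatically at the cost of more low-confidence answers reaching customers.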
Government · Federal
FedRAMP-Moderate inference platform
Air-gapped Llama 3.1 70B on bare-metal H100 cluster. STIG-hardened K8s, Cosign-signed artifacts, PII redaction at ingress and egress. ATO achieved on first audit.
ATO First audit
100% Air-gapped
5 Models in prod
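Redaction at both ingress and egress means the same filter runs on the prompt before it reaches the model and on the response before it leaves. A minimal sketch of that shape follows, assuming regex-based detection; the two patterns and the `guarded_call` wrapper are hypothetical, and a real deployment would cover far more identifier types than emails and US Social Security numbers.

```python
import re

# Hypothetical pattern set for illustration; not an exhaustive PII detector.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each detected identifier with a bracketed type label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def guarded_call(prompt, model):
    """Redact on the way in (ingress) and again on the way out (egress)."""
    return redact(model(redact(prompt)))
```

Running the filter on egress as well covers identifiers that the model produces on its own or retrieves from context rather than copying from the prompt.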
Insurance · Top 10 US
Claims triage agent under EU AI Act high-risk
End-to-end documentation: model cards, system cards, FRIA (Fundamental Rights Impact Assessment), continuous monitoring. Decision-explainability surfaced to claimants and adjusters.
−51% Triage time
+19pp Adjuster CSAT
audit passed Q1'26
Logistics · Series D
Real-time forecasting on 9B-row Iceberg lakehouse
Migrated Spark ETL to Iceberg + Trino + dbt. Forecasting models on Ray, served via SageMaker async. Pipeline cost down 71%, training cycle from 14h to 2.3h.
−71% Pipeline $
6× Training speed
9B rows Daily
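The 6× training-speed figure follows from the cycle times quoted in the summary (14h down to 2.3h). A short check of that arithmetic, with `speedup` and `percent_change` as illustrative helper names:

```python
def speedup(before_hours, after_hours):
    """How many times faster the new cycle is than the old one."""
    if before_hours <= 0 or after_hours <= 0:
        raise ValueError("durations must be positive")
    return before_hours / after_hours


def percent_change(before, after):
    """Signed percentage change from `before` to `after`."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100


print(round(speedup(14.0, 2.3), 1))         # 6.1
print(round(percent_change(14.0, 2.3), 1))  # -83.6
```

A 14h to 2.3h cycle is roughly 6.1× faster, which the metric tile rounds to 6×.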
B2B SaaS · Pre-IPO
SOC 2 Type II for an ML platform — zero ML findings
Hardened the entire ML lifecycle: signed model artifacts, dataset lineage, immutable training records, role-scoped serving credentials. Walked auditors through every control with engineering present.
0 ML findings
100% Controls evidenced
8 wk Cert prep