Case studies: MLOps

Fintech · Series C

−64% LLM inference cost on a customer support copilot

The team was burning $190k/month on GPT-4 for 22M monthly support requests. We replaced it with a distilled Llama 3.1 70B fine-tune served on vLLM, added semantic and KV caching, and routed cheap queries through Haiku 3.5. Eval pass rate held within 1.2pp.

−64% $/request

2.1× Throughput

99.99% Uptime
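
For a sense of scale, the headline figures can be turned into per-request numbers. This is a back-of-the-envelope sketch only: it assumes the $190k covered exactly those 22M requests and that volume stayed constant after the migration, neither of which is stated above. The function name is illustrative.

```python
def cost_per_request(monthly_spend_usd: float, monthly_requests: int) -> float:
    """Average spend per request for one month."""
    if monthly_requests <= 0:
        raise ValueError("monthly_requests must be positive")
    return monthly_spend_usd / monthly_requests


MONTHLY_REQUESTS = 22_000_000   # from the case study
BASELINE_SPEND = 190_000.0      # USD/month, from the case study
REDUCTION = 0.64                # the reported -64% $/request

baseline = cost_per_request(BASELINE_SPEND, MONTHLY_REQUESTS)
after = baseline * (1 - REDUCTION)
# Assumption: request volume is unchanged after the migration.
monthly_after = after * MONTHLY_REQUESTS

print(f"baseline: ${baseline:.4f}/request")
print(f"after:    ${after:.4f}/request")
print(f"monthly:  ${monthly_after:,.0f}")
```

Under those assumptions the spend works out to roughly $0.0086 per request before and $0.0031 after, or about $68k/month at the same volume.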

Healthcare · HIPAA

HIPAA-compliant clinical RAG agent in 9 weeks

Private VPC on AWS Bedrock, with OpenSearch as the vector store. Full audit trail, model cards, and a nightly LLM-as-judge eval against 1,400 gold answers. Passed two third-party security reviews and an internal red-team exercise.

9 wk Time to prod

93% Eval pass rate

0 PII leaks
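
A nightly eval against a gold set can be sketched roughly as below. This is a minimal illustration, not the pipeline used on the engagement: `GoldCase`, `pass_rate`, and the stub judge are hypothetical names, and the judge here is a simple normalized string match standing in for an LLM-as-judge call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class GoldCase:
    question: str
    gold_answer: str


def pass_rate(
    cases: Iterable[GoldCase],
    answer_fn: Callable[[str], str],
    judge_fn: Callable[[str, str, str], bool],
) -> float:
    """Fraction of gold cases whose generated answer the judge accepts."""
    cases = list(cases)
    if not cases:
        raise ValueError("gold set is empty")
    passed = sum(
        1
        for case in cases
        if judge_fn(case.question, case.gold_answer, answer_fn(case.question))
    )
    return passed / len(cases)


def stub_judge(question: str, gold: str, candidate: str) -> bool:
    # Placeholder for an LLM-as-judge call: case-insensitive exact match.
    return gold.strip().lower() == candidate.strip().lower()


gold_set = [
    GoldCase("q1", "Yes"),
    GoldCase("q2", "No"),
    GoldCase("q3", "Unknown"),
    GoldCase("q4", "Yes"),
]
# Hypothetical system under test: always answers "yes".
rate = pass_rate(gold_set, lambda q: "yes", stub_judge)
print(f"pass rate: {rate:.0%}")
```

Keeping the judge behind a plain callable makes it easy to swap the stub for a model-backed grader and to replay the same gold set every night.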

E-commerce · Public

Real-time personalization with 11ms p99 inference

Replaced batch scoring with online inference using Triton + ONNX on G6 GPUs. Feature store on Tecton with online/offline parity. CTR rose 14%, and autoscaling kept GPU spend flat.

11ms p99 latency

+14% CTR lift

flat GPU $

SaaS · Series B

Tier-1 ticket-resolving agent — 38% auto-resolution

LangGraph supervisor pattern. Specialist agents for billing, account, and integrations. Tool registry via MCP. Confidence-gated escalation with full trace replay. CSAT moved from 4.2 to 4.6 out of 5.

38% Auto-resolved

4.6/5 CSAT

$0.04 Cost / ticket
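
Confidence-gated escalation can be illustrated with a small sketch in plain Python. It does not use the LangGraph API, and the `AgentResult` type, the `gate` function, and the 0.8 threshold are all assumptions made for illustration; the case study does not state the actual threshold or how confidence was scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentResult:
    specialist: str    # e.g. "billing", "account", "integrations"
    reply: str
    confidence: float  # assumed to be a score in [0, 1]


def gate(result: AgentResult, threshold: float = 0.8) -> str:
    """Auto-resolve only when the specialist is confident enough."""
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if result.confidence >= threshold:
        return "auto_resolve"
    return "escalate_to_human"


confident = AgentResult("billing", "Refund issued.", 0.93)
unsure = AgentResult("integrations", "Try re-authenticating.", 0.55)
print(gate(confident))
print(gate(unsure))
```

Raising the threshold trades auto-resolution rate for fewer wrong answers reaching customers, so it is the natural knob to tune against CSAT.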

Government · Federal

FedRAMP-Moderate inference platform

Air-gapped Llama 3.1 70B on a bare-metal H100 cluster. STIG-hardened K8s, Cosign-signed artifacts, PII redaction at ingress and egress. ATO achieved on first audit.

ATO First audit

100% Air-gapped

5 Models in prod
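
Redacting at both ingress and egress means scrubbing the prompt before it reaches the model and scrubbing the response before it leaves the platform. The sketch below shows that shape under heavy simplification: it covers only email addresses and US SSN-style numbers with regular expressions, and `redact` and `guarded_call` are hypothetical names. A production system would rely on a much broader detector set.

```python
import re
from typing import Callable

# Illustrative patterns only; real PII detection needs far wider coverage.
_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched span with a bracketed label."""
    for label, pattern in _PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def guarded_call(prompt: str, model_fn: Callable[[str], str]) -> str:
    """Redact on the way in (ingress) and again on the way out (egress)."""
    return redact(model_fn(redact(prompt)))


def fake_model(prompt: str) -> str:
    # Stand-in model that echoes the prompt and leaks an address of its own.
    return f"You said: {prompt} (cc bob@example.org)"


print(guarded_call("Reach me at jane@example.com, SSN 123-45-6789", fake_model))
```

The egress pass matters because a model can emit sensitive strings that were never in the prompt, as the stand-in model does here.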

Insurance · Top 10 US

Claims triage agent under EU AI Act high-risk

End-to-end documentation: model cards, system cards, FRIA (Fundamental Rights Impact Assessment), and continuous monitoring. Decision explainability surfaced to claimants and adjusters.

−51% Triage time

+19pp Adjuster CSAT

Audit passed Q1'26

Logistics · Series D

Real-time forecasting on 9B-row Iceberg lakehouse

Migrated Spark ETL to Iceberg + Trino + dbt. Forecasting models on Ray, served via SageMaker async. Pipeline cost down 71%, training cycle cut from 14h to 2.3h.

−71% Pipeline $

6× Training speed

9B rows Daily

B2B SaaS · Pre-IPO

SOC 2 Type II for an ML platform — zero ML findings

Hardened the entire ML lifecycle: signed model artifacts, dataset lineage, immutable training records, role-scoped serving credentials. Walked auditors through every control with engineering present.

0 ML findings

100% Controls evidenced

8 wk Cert prep

Receipts. Not promises.

The problems are universal.
The constraints aren't.

Want to be the next case study?
