Case studies

Receipts. Not promises.

A selection of engagements we've shipped — across fintech, healthcare, e-commerce, SaaS, and government. Names anonymized where required by NDA.

Fintech · Series C

−64% LLM inference cost on a customer support copilot

The team was burning $190k/month on GPT-4 for 22M monthly support requests. We replaced it with a distilled Llama 3.1 70B fine-tune served on vLLM, added semantic and KV caching, and routed cheap queries through Haiku 3.5. Eval pass rate held within 1.2pp.
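As a rough check on what those numbers mean per request, the arithmetic can be sketched as below. This is a back-of-envelope illustration only: it assumes the reported −64% applies uniformly to cost per request and that volume stays at 22M requests per month, neither of which is stated in the engagement summary.

```python
# Figures taken from the case study text above.
MONTHLY_SPEND_USD = 190_000
MONTHLY_REQUESTS = 22_000_000
COST_REDUCTION = 0.64  # the reported -64% $/request


def cost_per_request(monthly_spend: float, monthly_requests: int) -> float:
    """Average cost of a single request in USD."""
    return monthly_spend / monthly_requests


before = cost_per_request(MONTHLY_SPEND_USD, MONTHLY_REQUESTS)
after = before * (1 - COST_REDUCTION)

# Assumption: request volume is unchanged after the migration.
monthly_after = after * MONTHLY_REQUESTS

print(f"before: ${before:.4f}/request")
print(f"after:  ${after:.4f}/request")
print(f"monthly spend after: ${monthly_after:,.0f}")
```

Under those assumptions the per-request cost falls from roughly $0.0086 to roughly $0.0031, or about $68k/month at the same volume.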

−64% $/request
2.1× Throughput
99.99% Uptime

Healthcare · HIPAA

HIPAA-compliant clinical RAG agent in 9 weeks

Private VPC deployment on AWS Bedrock with OpenSearch vector search. Full audit trail, model cards, and a nightly LLM-as-judge eval against 1,400 gold answers. Passed two third-party security reviews and an internal red-team exercise.
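A nightly eval of this kind reduces to scoring each agent answer against its gold answer and reporting the fraction that pass. The sketch below shows only that shape. The names `GoldExample`, `eval_pass_rate`, `toy_agent`, and `toy_judge` are hypothetical, and the judge here is a trivial string check standing in for a real LLM-as-judge call; it is not the harness used in the engagement.

```python
from typing import Callable, Iterable, NamedTuple


class GoldExample(NamedTuple):
    question: str
    gold_answer: str


def eval_pass_rate(
    examples: Iterable[GoldExample],
    answer_fn: Callable[[str], str],
    judge_fn: Callable[[str, str, str], bool],
) -> float:
    """Return the fraction of examples the judge marks as passing."""
    examples = list(examples)
    if not examples:
        raise ValueError("gold set is empty")
    passed = sum(
        1
        for ex in examples
        if judge_fn(ex.question, ex.gold_answer, answer_fn(ex.question))
    )
    return passed / len(examples)


# Toy stand-ins (assumptions): a real setup would call the RAG agent
# and a judge model here instead.
def toy_agent(question: str) -> str:
    return {"q1": "the answer is alpha", "q2": "not sure"}.get(question, "")


def toy_judge(question: str, gold: str, candidate: str) -> bool:
    return gold.lower() in candidate.lower()


gold_set = [GoldExample("q1", "alpha"), GoldExample("q2", "beta")]
rate = eval_pass_rate(gold_set, toy_agent, toy_judge)
print(f"pass rate: {rate:.0%}")
```

Keeping the agent and the judge as plain callables means the same loop can run against a fixed gold set every night while either side is swapped out.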

9 wk Time to prod
93% Eval pass rate
0 PII leaks

E-commerce · Public

Real-time personalization with 11ms p99 inference

Replaced batch scoring with online inference using Triton + ONNX on G6 GPUs. Feature store on Tecton with online/offline parity. CTR up 14%, with GPU spend held flat through autoscaling.

11ms p99 latency
+14% CTR lift
Flat GPU $

SaaS · Series B

Tier-1 ticket-resolving agent — 38% auto-resolution

LangGraph supervisor pattern. Specialist agents for billing, account, integrations. Tool registry via MCP. Confidence-gated escalation with full trace replay. CSAT moved from 4.2 to 4.6 / 5.
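Confidence-gated escalation can be pictured as a single threshold check on each agent result. The sketch below is a minimal illustration, not the production logic: `AgentResult`, `gate`, and the 0.8 threshold are hypothetical, and how the confidence score is produced is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    ticket_id: str
    answer: str
    confidence: float  # expected range: 0.0 to 1.0


# Hypothetical threshold; a real value would be tuned against eval data.
ESCALATION_THRESHOLD = 0.8


def gate(result: AgentResult, threshold: float = ESCALATION_THRESHOLD) -> str:
    """Auto-resolve confident answers, hand everything else to a human."""
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence}")
    if result.confidence >= threshold:
        return "auto_resolve"
    return "escalate_to_human"


print(gate(AgentResult("T-1", "Refund issued.", 0.93)))
print(gate(AgentResult("T-2", "Possibly a billing sync issue.", 0.41)))
```

Raising the threshold trades auto-resolution rate for fewer wrong answers reaching customers, which is why the gate is kept as one explicit, tunable value.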

38% Auto-resolved
4.6/5 CSAT
$0.04 Cost / ticket

Government · Federal

FedRAMP-Moderate inference platform

Air-gapped Llama 3.1 70B on a bare-metal H100 cluster. STIG-hardened K8s, Cosign-signed artifacts, PII redaction at ingress and egress. ATO achieved on first audit.
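Redaction at ingress and egress means the same filter runs on the prompt before it reaches the model and on the response before it leaves. A minimal sketch of that wrapper follows. The two regex patterns and the names `redact`, `guarded_call`, and `echo_model` are illustrative assumptions; a production redactor would cover far more PII types than an email and an SSN-shaped number.

```python
import re
from typing import Callable

# Illustrative patterns only (assumption): real coverage would be broader.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched PII span with a bracketed type label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def guarded_call(model_fn: Callable[[str], str], prompt: str) -> str:
    """Redact on the way in (ingress) and on the way out (egress)."""
    return redact(model_fn(redact(prompt)))


# Stand-in for the model: echoes the prompt and leaks an email address.
def echo_model(prompt: str) -> str:
    return f"Contact jane.doe@example.com about: {prompt}"


out = guarded_call(echo_model, "SSN 123-45-6789 on file")
print(out)
```

Running the filter on both sides matters because the model can emit PII it never received in the prompt, as the stand-in model does here.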

ATO First audit
100% Air-gapped
5 Models in prod

Insurance · Top 10 US

Claims triage agent under EU AI Act high-risk requirements

End-to-end documentation: model cards, system cards, FRIA (Fundamental Rights Impact Assessment), continuous monitoring. Decision-explainability surfaced to claimants and adjusters.

−51% Triage time
+19pp Adjuster CSAT
Audit passed Q1'26

Logistics · Series D

Real-time forecasting on 9B-row Iceberg lakehouse

Migrated Spark ETL to Iceberg + Trino + dbt. Forecasting models on Ray, served via SageMaker async. Pipeline cost down 71%, training cycle cut from 14h to 2.3h.

−71% Pipeline $
~6× Training speed
9B rows Daily

B2B SaaS · Pre-IPO

SOC 2 Type II for an ML platform — zero ML findings

Hardened the entire ML lifecycle: signed model artifacts, dataset lineage, immutable training records, role-scoped serving credentials. Walked auditors through every control with engineering present.

0 ML findings
100% Controls evidenced
8 wk Cert prep

Industries

The problems are universal.
The constraints aren't.

Compliance, latency, accuracy, and cost trade-offs differ by domain. We've worked across enough of them to bring patterns, not surprises.

Fintech: Banks · Trading · Payments
Healthcare: Payers · Providers · Pharma
E-commerce: Retail · Marketplaces
SaaS: B2B · Vertical · Platform
Government: Federal · State · Defense
Logistics: Routing · Forecasting
Insurance: Underwriting · Claims
EdTech: Learning · Assessment

"MLOPS rebuilt our LLM stack in six weeks. We went from arguing in design docs to shipping features. The cost savings paid for the engagement four times over in the first quarter."
VP Engineering, Series C fintech · 250-person engineering org

Want to be the next case study?

Tell us what you're building. If we can help, we'll come back with a written approach and an estimate within five business days.

Start the conversation