Tech stack: MLOps

Reference architecture

Eight layers. One coherent platform.

L7 · Application & UX

Next.js · Streamlit · Slack apps

L6 · Agent & chain orchestration

LangGraph · CrewAI · DSPy · LlamaIndex · Haystack

L5 · Gateway, routing & governance

LiteLLM · Portkey · Helicone · Kong AI Gateway

L4 · Evals & observability

LangSmith · Arize · Phoenix · Braintrust · Langfuse · W&B Weave · OpenTelemetry

L3 · Serving & inference

vLLM · SGLang · TensorRT-LLM · Triton · Bedrock · Vertex · Modal · Anyscale

L2 · Retrieval & memory

Pinecone · Weaviate · Qdrant · pgvector · Turbopuffer · Neo4j

L1 · Training & experimentation

MLflow · W&B · Ray · Kubeflow · Axolotl · Hugging Face

L0 · Data & foundation

Snowflake · Databricks · Iceberg · Trino · dbt · Tecton · Feast

L1 · Training & experimentation

Where models are made.

Reproducible runs, distributed scaling, and registries that double as your single source of truth for what's in production.

MLflow

Tracking · Registry

Weights & Biases

Experiments · Eval

Ray

Distributed compute

Kubeflow

K8s-native ML

Axolotl

Fine-tuning

Hugging Face

Models · Datasets

Neptune.ai

Run tracking

Determined

Training platform

L3 · Serving & inference

Throughput, latency, and price.

Open-source for control, managed for speed-to-value. We pick by TCO, latency budget, and operational maturity.

vLLM

High-throughput LLM

SGLang

Structured generation

TensorRT-LLM

NVIDIA optimized

Triton

Multi-framework

AWS Bedrock

Managed foundation models

Vertex AI

GCP managed

Modal

Serverless GPU

Anyscale

Managed Ray

SageMaker

AWS endpoint

Fireworks

Fast OSS hosting

Together AI

OSS inference

Groq

Ultra-low latency

L2 · Retrieval & memory

Vector, graph, hybrid — the right tool for the job.

Recall, latency, and cost are a three-way trade-off. We benchmark on your data, not someone else's leaderboard.
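To make the recall side of that trade-off concrete, here is a minimal sketch of recall@k, assuming you have a small labeled set of your own queries, each paired with the IDs of the documents it should retrieve. The function name `recall_at_k` and the sample query and document IDs are hypothetical illustrations, not part of any vector database's API.

```python
from typing import Mapping, Sequence


def recall_at_k(
    retrieved: Mapping[str, Sequence[str]],
    relevant: Mapping[str, set[str]],
    k: int,
) -> float:
    """Mean recall@k: for each query, the fraction of its relevant
    documents that appear in the top-k retrieved results, averaged."""
    if k <= 0:
        raise ValueError("k must be positive")
    scores = []
    for query_id, gold in relevant.items():
        if not gold:
            continue  # recall is undefined with no relevant docs; skip query
        top_k = set(list(retrieved.get(query_id, []))[:k])
        scores.append(len(top_k & gold) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labeled queries and one candidate index's ranked results.
relevant = {"q1": {"d1", "d2"}, "q2": {"d7"}}
retrieved = {"q1": ["d1", "d9", "d2"], "q2": ["d3", "d4", "d7"]}

print(recall_at_k(retrieved, relevant, k=2))  # 0.25
print(recall_at_k(retrieved, relevant, k=3))  # 1.0
```

Running the same labeled queries against each candidate index, and recording latency and cost next to the recall figure, turns the three-way trade-off into numbers you can compare directly.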

Pinecone

Managed vector

Weaviate

Hybrid search

Qdrant

High-perf OSS

pgvector

Postgres-native

Turbopuffer

Serverless · cheap

Neo4j

Knowledge graph

Elastic

Search + vector

OpenSearch

AWS-friendly

L0 · Data & foundation

The substrate. Get it right, everything else gets easier.

Snowflake

Warehouse

Databricks

Lakehouse

Iceberg

Open table format

Delta Lake

ACID lakehouse

Trino

Federated SQL

dbt

Transformations

Tecton

Feature store

Feast

OSS feature store

Kafka

Streaming

Flink

Stream processing

Airflow

Batch orchestration

Prefect

Modern workflows

L6 · Orchestration

From single calls to multi-agent systems.

LangGraph

State machines

DSPy

Compile prompts

CrewAI

Multi-agent crews

LlamaIndex

RAG framework

Haystack

Pipeline framework

MCP

Tool protocol

L4 · Observability & evals

You can't improve what you don't measure.

LangSmith

Trace · eval

Arize

ML observability

Phoenix

OSS LLM tracing

Braintrust

Eval platform

Langfuse

OSS observability

W&B Weave

LLM tracing

OpenTelemetry

Standard traces

Grafana

Dashboards

Prometheus

Metrics

Datadog

APM

L4 · Guardrails & safety

Production AI is defended, not just deployed.

NeMo Guardrails

Programmable rails

LlamaGuard

Safety classifier

Guardrails AI

Validators

Lakera

Injection defense

Presidio

PII redaction

Garak

Red-team scanner

Infrastructure

The boring layer that makes everything else possible.

Kubernetes

Container orchestration

Argo CD

GitOps

Argo Rollouts

Progressive delivery

Flux

GitOps

Karpenter

Node autoscaling

Terraform

IaC

Pulumi

IaC (typed)

Vault

Secrets

Cosign

Artifact signing

Buildkite

CI/CD

GitHub Actions

CI/CD

Sentry

Error tracking

This stack is opinionated.
It's not the only one we know.

Tell us what you're already running. We'll meet you where you are and bring patterns, not migrations, unless a migration is clearly worth it.
