The stack

The 2026 production AI stack we know cold.

Forty-plus tools in eight categories, picked by what works in production, not by what trends on Twitter.

Reference architecture

Eight layers. One coherent platform.

L7 · Application & UX
Next.js, Streamlit, Slack apps
L6 · Agent & chain orchestration
LangGraph, CrewAI, DSPy, LlamaIndex, Haystack
L5 · Gateway · routing · gov.
LiteLLM, Portkey, Helicone, Kong AI Gateway
L4 · Evals & observability
LangSmith, Arize, Phoenix, Braintrust, Langfuse, W&B Weave, OpenTelemetry
L3 · Serving & inference
vLLM, SGLang, TensorRT-LLM, Triton, Bedrock, Vertex, Modal, Anyscale
L2 · Retrieval & memory
Pinecone, Weaviate, Qdrant, pgvector, Turbopuffer, Neo4j
L1 · Training & experimentation
MLflow, W&B, Ray, Kubeflow, Axolotl, Hugging Face
L0 · Data & foundation
Snowflake, Databricks, Iceberg, Trino, dbt, Tecton, Feast
L1 · Training & experimentation

Where models are made.

Reproducible runs, distributed scaling, and registries that double as your single source of truth for what's in production.

MLflow: Tracking · Registry
Weights & Biases: Experiments · Eval
Ray: Distributed compute
Kubeflow: K8s-native ML
Axolotl: Fine-tuning
Hugging Face: Models · Datasets
Neptune.ai: Run tracking
Determined: Training platform
L3 · Serving & inference

Throughput, latency, and price.

Open-source for control, managed for speed-to-value. We pick by TCO, latency budget, and operational maturity.

vLLM: High-throughput LLM
SGLang: Structured serving
TensorRT-LLM: NVIDIA optimized
Triton: Multi-framework
AWS Bedrock: Managed FM
Vertex AI: GCP managed
Modal: Serverless GPU
Anyscale: Ray-managed
SageMaker: AWS endpoint
Fireworks: Fast OSS hosting
Together AI: OSS inference
Groq: Ultra-low latency
L2 · Retrieval & memory

Vector, graph, hybrid — the right tool for the job.

Recall, latency, and cost are a three-way trade-off. We benchmark on your data, not someone else's leaderboard.
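
To make "benchmark on your data" concrete, here is a minimal sketch of the kind of harness that measures two of the three axes, recall and latency, against a labeled query set. The names `recall_at_k`, `benchmark`, and the `search` callable are hypothetical and not tied to any vector store listed here; cost would be tracked separately from your provider's pricing.

```python
import math
import time
from typing import Callable, Sequence


def recall_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant document ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def benchmark(
    search: Callable[[str, int], Sequence[str]],  # hypothetical: (query, k) -> ranked ids
    queries: Sequence[tuple[str, set[str]]],      # (query text, ids judged relevant)
    k: int = 10,
) -> dict[str, float]:
    """Run every labeled query once and report mean recall@k and p95 latency."""
    recalls: list[float] = []
    latencies: list[float] = []
    for query, relevant in queries:
        start = time.perf_counter()
        retrieved = search(query, k)
        latencies.append(time.perf_counter() - start)
        recalls.append(recall_at_k(retrieved, relevant, k))
    latencies.sort()
    # Nearest-rank 95th percentile of the observed latencies.
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1]
    return {"mean_recall": sum(recalls) / len(recalls), "p95_latency_s": p95}
```

Wrapping each candidate store's client in a function with the `search` signature lets the same labeled queries be replayed against every option, so the comparison reflects your corpus rather than a public leaderboard.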

Pinecone: Managed vector
Weaviate: Hybrid search
Qdrant: High-perf OSS
pgvector: Postgres-native
Turbopuffer: Serverless · cheap
Neo4j: Knowledge graph
Elastic: Search + vector
OpenSearch: AWS-friendly
L0 · Data & foundation

The substrate. Get it right, everything else gets easier.

Snowflake: Warehouse
Databricks: Lakehouse
Iceberg: Open table format
Delta Lake: ACID lakehouse
Trino: Federated SQL
dbt: Transformations
Tecton: Feature store
Feast: OSS feature store
Kafka: Streaming
Flink: Stream processing
Airflow: Batch orchestration
Prefect: Modern workflows
L6 · Orchestration

From single calls to multi-agent systems.

LangGraph: State machines
DSPy: Compile prompts
CrewAI: Multi-agent crews
LlamaIndex: RAG framework
Haystack: Pipeline framework
MCP: Tool protocol
L4 · Observability & evals

You can't improve what you don't measure.

LangSmith: Trace · eval
Arize: ML obs
Phoenix: OSS LLM trace
Braintrust: Eval platform
Langfuse: OSS observability
W&B Weave: LLM tracing
OpenTelemetry: Standard traces
Grafana: Dashboards
Prometheus: Metrics
Datadog: APM
L4 · Guardrails & safety

Production AI is defended, not just deployed.

NeMo Guardrails: Programmable
LlamaGuard: Safety classifier
Guardrails AI: Validators
Lakera: Injection defense
Presidio: PII redaction
Garak: Red-team scanner
Infrastructure

The boring layer that makes everything else possible.

Kubernetes: Container orch
Argo CD: GitOps
Argo Rollouts: Progressive delivery
Flux: GitOps
Karpenter: Node autoscaling
Terraform: IaC
Pulumi: IaC (typed)
Vault: Secrets
Cosign: Artifact signing
Buildkite: CI
GitHub Actions: CI
Sentry: Error tracking

This stack is opinionated.
It's not the only one we know.

Tell us what you're already running. We'll meet you where you are and bring patterns rather than migrations, unless a migration is worth it.
