GAIA — Multi-Cloud AI Gateway
Designed and shipped Tanium's internal AI infrastructure layer: a multi-cloud gateway routing across OpenAI, Anthropic, and Azure OpenAI with cost controls, observability, and team-level attribution.
GAIA is Tanium’s internal AI gateway — the foundational layer through which every AI-powered feature in the product flows.
The problem
As AI capabilities proliferated across Tanium’s product, every team was making independent decisions about which models to call, how to handle failures, and how to manage costs. There was no visibility into aggregate spend, no consistency in how models were versioned, and no ability to swap providers without rewriting integrations across the codebase.
What we built
A centralized AI gateway that hides model-provider complexity from product engineers. Key capabilities:
- Multi-provider routing across OpenAI, Anthropic, and Azure OpenAI
- Capability-based API — engineers request a capability type, not a specific model
- Cost attribution — every call tagged by team, feature, and environment
- Rate limiting and fallback — graceful degradation when primary providers are unavailable
- Observability — latency, token usage, and error rates surfaced to engineering teams
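To make the capability-based API, fallback, and cost attribution concrete, here is a minimal Python sketch of how those three ideas can fit together. It is an illustration under stated assumptions, not GAIA's actual code: the names `Gateway`, `Attribution`, `UsageRecord`, and `ProviderUnavailable`, the stub providers, and the team and feature labels are all hypothetical, and real provider SDK calls are replaced by plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


class ProviderUnavailable(Exception):
    """Raised by a provider callable when it cannot serve the request."""


@dataclass(frozen=True)
class Attribution:
    # Hypothetical tag set mirroring "team, feature, and environment".
    team: str
    feature: str
    environment: str


@dataclass
class UsageRecord:
    attribution: Attribution
    capability: str
    provider: str
    tokens: int


# A provider is modelled as a function: prompt -> (response text, tokens used).
Provider = Callable[[str], Tuple[str, int]]


@dataclass
class Gateway:
    # Capability name -> ordered (provider name, callable) pairs; first is primary.
    routes: Dict[str, List[Tuple[str, Provider]]]
    usage_log: List[UsageRecord] = field(default_factory=list)

    def complete(self, capability: str, prompt: str, attribution: Attribution) -> str:
        """Serve a capability, falling back through providers in order."""
        if capability not in self.routes:
            raise KeyError(f"unknown capability: {capability}")
        last_error: Optional[Exception] = None
        for name, call in self.routes[capability]:
            try:
                text, tokens = call(prompt)
            except ProviderUnavailable as exc:
                last_error = exc  # try the next provider in the list
                continue
            # Record which provider served the call and who it is billed to.
            self.usage_log.append(UsageRecord(attribution, capability, name, tokens))
            return text
        raise ProviderUnavailable(f"all providers failed for {capability}") from last_error

    def tokens_by_team(self) -> Dict[str, int]:
        """Aggregate recorded token usage per team."""
        totals: Dict[str, int] = {}
        for record in self.usage_log:
            team = record.attribution.team
            totals[team] = totals.get(team, 0) + record.tokens
        return totals


# Stub providers standing in for real SDK calls (assumptions for illustration).
def failing_primary(prompt: str) -> Tuple[str, int]:
    raise ProviderUnavailable("primary is down")


def stub_secondary(prompt: str) -> Tuple[str, int]:
    return f"summary of: {prompt}", len(prompt.split())


gateway = Gateway(routes={
    "summarize": [("primary", failing_primary), ("secondary", stub_secondary)],
})
tags = Attribution(team="threat-response", feature="alert-summary", environment="staging")
result = gateway.complete("summarize", "three words here", tags)
```

The caller asks for `"summarize"` and never names a model, so swapping or reordering providers is a change to the route table rather than to product code. Because every successful call appends a tagged usage record, per-team spend reports can be derived from the log instead of being reconstructed from provider invoices.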
Impact
GAIA centralized the AI infrastructure behind a product used by thousands of enterprise security teams. The cost attribution model alone drove a measurable reduction in unnecessary API calls within the first quarter after launch.
What I learned
The hardest part was governance: who owns the SLA, who gets paged when a model goes down, and how teams request new capabilities. Settling these questions before incidents happen matters more than architectural perfection.