GAIA — Multi-Cloud AI Gateway

GAIA is Tanium’s internal AI gateway — the foundational layer through which every AI-powered feature in the product flows.

The problem

As AI capabilities proliferated across Tanium’s product, every team was making independent decisions about which models to call, how to handle failures, and how to manage costs. There was no visibility into aggregate spend, no consistency in how models were versioned, and no ability to swap providers without rewriting integrations across the codebase.

What we built

We built a centralized AI gateway that hides model-provider complexity from product engineers. Key capabilities:

Multi-provider routing across OpenAI, Anthropic, and Azure OpenAI
Capability-based API — engineers request a capability type, not a specific model
Cost attribution — every call tagged by team, feature, and environment
Rate limiting and fallback — graceful degradation when primary providers are unavailable
Observability — latency, token usage, and error rates surfaced to engineering teams
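
As a rough illustration of how capability-based routing, fallback, and cost attribution can fit together, here is a minimal Python sketch. It is not GAIA's actual code: the names Gateway, CallTags, UsageRecord, ProviderUnavailable, and the stub providers are all hypothetical, and real provider adapters would wrap the vendor SDKs rather than return canned strings.

```python
from dataclasses import dataclass, field
from typing import Callable


class ProviderUnavailable(Exception):
    """Raised by a provider adapter when the upstream service cannot serve a call."""


@dataclass(frozen=True)
class CallTags:
    # Hypothetical attribution tags: who made the call, for what, and where.
    team: str
    feature: str
    environment: str


@dataclass
class UsageRecord:
    tags: CallTags
    capability: str
    provider: str
    tokens: int


# A provider adapter takes a prompt and returns (text, tokens_used).
Provider = Callable[[str], tuple[str, int]]


@dataclass
class Gateway:
    # Capability name -> ordered (provider name, adapter) pairs; the first is primary.
    routes: dict[str, list[tuple[str, Provider]]]
    usage_log: list[UsageRecord] = field(default_factory=list)

    def complete(self, capability: str, prompt: str, tags: CallTags) -> str:
        if capability not in self.routes:
            raise KeyError(f"unknown capability: {capability}")
        last_error = None
        for name, provider in self.routes[capability]:
            try:
                text, tokens = provider(prompt)
            except ProviderUnavailable as exc:
                last_error = exc  # remember the failure and try the next provider
                continue
            # Record which provider served the call and who should be billed.
            self.usage_log.append(UsageRecord(tags, capability, name, tokens))
            return text
        raise ProviderUnavailable(
            f"all providers failed for {capability}"
        ) from last_error


# Stub adapters standing in for real provider clients (assumptions for the demo).
def flaky_primary(prompt: str) -> tuple[str, int]:
    raise ProviderUnavailable("primary is down")


def stub_fallback(prompt: str) -> tuple[str, int]:
    # Word count is used as a stand-in for a real token count.
    return f"summary of: {prompt}", len(prompt.split())


gateway = Gateway(
    routes={"summarize": [("primary", flaky_primary), ("fallback", stub_fallback)]}
)
tags = CallTags(team="endpoint", feature="alert-triage", environment="staging")
result = gateway.complete("summarize", "three failed logins", tags)
```

The caller names only the capability ("summarize") and its tags. Which provider answers is the gateway's decision, so swapping or reordering providers would not touch product code, and every successful call leaves a usage record that can be rolled up by team, feature, or environment.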

Impact

GAIA is the centralized AI infrastructure for a product used by thousands of enterprise security teams. The cost attribution model alone drove a measurable reduction in unnecessary API calls within the first quarter after launch.

What I learned

The hardest part was governance: who owns the SLA, who gets paged when a model goes down, and how teams request new capabilities. Getting this right before incidents happen matters more than architectural perfection.