What does it take to run Kubernetes at a scale most organizations can barely imagine, and how do those innovations benefit everyone?
In this exclusive interview from KubeCon + CloudNativeCon, Gari Singh, product manager for Google Kubernetes Engine, reveals the technical breakthroughs that enable GKE to support some of the world’s most demanding AI workloads. From massive GPU clusters running training jobs for companies like Anthropic and Magic to the infrastructure improvements that power Google’s cloud platform, Singh provides a rare glimpse into the cutting edge of container orchestration.
But this isn’t just a story about extreme scale. Singh explains how Google has fundamentally changed its approach to enterprise features, making capabilities that were previously locked behind paid tiers available to all GKE users at no additional cost. He also discusses the technical innovations that are reshaping how we think about managing heterogeneous compute environments and offers a provocative vision for the future where AI agents help manage Kubernetes platforms.
What you’ll discover in this interview
Watch this video to learn about the specific technical challenges Google faced in scaling GKE to clusters of 130,000 nodes and how its engineers solved problems around networking, autoscaling, and resource management. Singh discusses the real-world performance implications of running thousands of GPU pods with complex interconnect topologies and explains why improvements made for extreme-scale customers benefit organizations of all sizes.
You’ll gain insight into dynamic resource allocation (DRA), a relatively new Kubernetes capability that’s transforming how platforms handle diverse accelerator types from multiple vendors. Singh explains how this open source innovation enables standardized management of GPUs, TPUs, and specialized networking hardware without hard-coding device types into the core platform.
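To make the idea concrete, here is a minimal sketch of what requesting an accelerator through DRA looks like in upstream Kubernetes. The `gpu.example.com` DeviceClass, the image name, and the claim names are placeholders for illustration, and the exact API group version varies by release (the `resource.k8s.io` API was at v1beta1 in Kubernetes 1.32); this is not GKE-specific code from the interview.

```yaml
# A ResourceClaimTemplate asks for one device from a DeviceClass
# published by a vendor's DRA driver (name is hypothetical here).
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com
---
# The pod references the template; the scheduler allocates a
# matching device and the driver wires it into the container.
apiVersion: v1
kind: Pod
metadata:
  name: training-worker
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest  # placeholder image
    resources:
      claims:
      - name: gpu   # must match an entry in spec.resourceClaims
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
```

The key point is that the device class is just a named reference: the same pod spec can request GPUs, TPUs, or specialized NICs depending on which DRA driver backs the class, which is how DRA avoids hard-coding device types into the core platform.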
The interview covers Google’s strategic decision to integrate GKE Enterprise features into the base platform and what this means for organizations currently evaluating Kubernetes options. Singh provides practical guidance on multi-cloud strategies, explaining how to leverage cloud-native optimizations without creating vendor lock-in, a concern for many organizations adopting managed Kubernetes services.
Key topics explored
- The technical journey to supporting 130,000-node GPU clusters and what comes next
- How scaling work improves autoscaling speed and API response times for all customers
- Why thousands of organizations now run Kubernetes clusters with 1,000-5,000 nodes
- The networking challenges that emerge at extreme scale and how Google addresses them
- Dynamic resource allocation and the future of heterogeneous compute management
- Infrastructure innovations including 400-800 Gbps networking and network offload processors
- The integration of GKE Enterprise features and what’s now available free
- Multi-cloud strategies that leverage cloud-native features without vendor lock-in
- Using AI agents to manage Kubernetes platforms and generate infrastructure code
- Whether Kubernetes has become easier for smaller organizations to adopt
Singh offers candid perspectives on topics ranging from the complexity of managing global-scale network infrastructure to the challenges organizations face when deciding between vanilla Kubernetes and cloud-native enhancements. His insights into Google’s infrastructure evolution and upcoming features provide valuable context for anyone planning Kubernetes deployments at any scale.
Whether you’re running a small development cluster or planning enterprise-scale deployments, this interview offers practical insights into the current state and future direction of Kubernetes. Press play to discover how innovations driven by extreme-scale AI workloads are making Kubernetes better for everyone.