With KubeCon Europe taking place this week, Microsoft has delivered a flurry of Azure Kubernetes announcements. In addition to a new framework for running machine learning workloads, new workload scheduling capabilities, new deployment safeguards, and security and scalability improvements, Microsoft has placed a strong emphasis on developer productivity, working to improve the developer experience and to reduce the risk of errors.
Prior to the event I sat down with Brendan Burns, one of the creators of Kubernetes, and now CVP, Azure Open Source and Cloud-Native at Microsoft. We talked about what Microsoft was announcing at KubeCon Europe, Microsoft’s goals for Kubernetes, and Kubernetes’ importance to Microsoft as both a provider and a user of the container management system. Burns also provided updates on Microsoft’s progress in delivering a long-term support version of Kubernetes.
This is an interesting time for Kubernetes, as it transitions from a bleeding-edge technology to a mature platform. It’s an essential shift that every technology needs to go through, but one that’s harder for an open-source project that’s relied on by many different cloud providers and many more application developers.
Kaito: Deploying AI inference models on Kubernetes
Much of what Microsoft is doing at the moment around its Azure Kubernetes Service (AKS), and the related Azure Container Apps, is focused on delivering that mature, trustworthy platform, with its own long-term support plan that goes beyond the current Kubernetes life cycle. The company is also working on tools that help support the workloads it sees developers building, both inside Microsoft and on its public-facing cloud services.
So it wasn’t surprising to find our conversation quickly turning to AI, and the tools needed to support the resulting massive-scale workloads on AKS.
One of the new tools Burns talked about was the Kubernetes AI Toolchain Operator for AKS. This is a tool for running large workloads across massive Kubernetes clusters. If you’ve been monitoring the Azure GitHub repositories, you’ll recognize this as the open-source Kaito project that Microsoft has been using to manage LLM projects and services, many of which are hosted in Azure Kubernetes instances. It’s designed to work with large open-source inference models.
You start by defining a workspace that includes the GPU requirements of your model. Kaito will then deploy model images from your repositories to provisioned GPU nodes. Because you’re working with preset configurations, Kaito deploys model images that can run without additional tuning. All you need to do is set up an initial node pool configuration using an Azure host SKU with a supported GPU. As part of setting up nodes using Kaito, AKS automatically configures the correct drivers and any other necessary prerequisites.
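To see what that looks like in practice, here’s a minimal workspace sketch based on the examples in the open-source Kaito repository. The preset model name and GPU instance type are illustrative; check the project’s documentation for the currently supported presets and SKUs.

```yaml
# Minimal Kaito workspace (sketch): declares the GPU instance type for the
# node pool and the preset inference model to deploy. Kaito provisions the
# GPU nodes, configures the drivers, and stands up the inference service.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: "Standard_NC12s_v3"   # Azure GPU SKU for the provisioned nodes
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"                 # one of Kaito's preset open-source models
```

Applying the manifest with kubectl is all that’s needed; the operator handles provisioning and serving from there.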
Having Kaito in AKS is an important development for deploying applications based on pre-trained, open-source AI models. And building on top of an existing GitHub-hosted open-source project allows the broader community to help shape its future direction.
Fleet: Managing Kubernetes at massive scale
Managing workloads is a big issue for many organizations that have moved to cloud-native application architectures. As more applications and services move to Kubernetes, the size and number of clusters becomes an issue. Where experiments may have involved managing one or two AKS clusters, organizations now have to work with hundreds or even thousands, and manage those clusters around the globe.
While you can build your own tools to handle this level of orchestration, there are complex workload placement issues that need to be considered. AKS has been developing fleet management tools as a higher-level scheduler above the base Kubernetes services. This allows you to manage workloads using a different set of heuristics, for example, using metrics like the cost of compute or the overall availability of resources in an Azure region.
Azure Kubernetes Fleet Manager is designed to help you get the most out of your Kubernetes resources, allowing clusters to join and leave a fleet as necessary, with a central control plane to support workload orchestration. You can think of Fleet as a way to schedule and orchestrate groups of applications, with Kubernetes handling the applications that make up a workload. Microsoft needs a tool like this as much as any company, as it runs many of its own applications and services on Kubernetes.
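As a sketch of what that orchestration looks like, the open-source fleet project behind the service defines a ClusterResourcePlacement resource that tells the hub’s control plane which resources to propagate and how to pick target clusters. The namespace, label, and weight below are illustrative, not taken from Microsoft’s documentation.

```yaml
# Fleet placement (sketch): take everything in the web-app namespace on the
# hub cluster and schedule it onto two member clusters, preferring clusters
# in a given region. The fleet scheduler picks the target clusters.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: web-app-placement
spec:
  resourceSelectors:
    - group: ""
      version: v1
      kind: Namespace
      name: web-app              # illustrative namespace to propagate
  policy:
    placementType: PickN         # let the scheduler choose N member clusters
    numberOfClusters: 2
    affinity:
      clusterAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 20
            preference:
              labelSelector:
                matchLabels:
                  fleet.azure.com/location: westeurope  # assumed region label
```

It’s this policy block, extended with heuristics such as compute cost or regional capacity, that gives Fleet its role as a higher-level scheduler.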
With Microsoft 365 running in AKS-hosted containers, Microsoft has a strong economic incentive to maximize the value it gets from its infrastructure. Like Kaito, Fleet is built on an open-source project, hosted in one of Azure’s GitHub repositories. This approach has also allowed Microsoft to increase the maximum size of AKS clusters, now up to 5,000 nodes and 100,000 pods.
Burns told me this is the philosophy behind much of what Microsoft is doing with Kubernetes on Azure: “Starting with an open source project, but then bringing it in as a supported part of the Azure Kubernetes service. And then, also obviously, committed to taking this technology and making it easy and available to everybody.”
That point about “making it easy” is at the heart of much of what Microsoft announced at KubeCon Europe, building on existing services and features. As an example, Burns pointed to the support for AKS in Azure Copilot, where instead of using complex tools, you can simply ask questions.
“Using a natural language model, you can also figure out what’s going on in your cluster—you don’t have to dig through a bunch of different screens and a bunch of different YAML files to figure out where a problem is,” Burns said. “The model will tell you and identify problems in the cluster that you have.”
Reducing deployment risk with policy
Another new AKS tool aims to reduce the risks associated with Kubernetes deployments. AKS deployment safeguards build on Microsoft’s experience with running its own and its customers’ Kubernetes applications. These lessons are distilled into a set of best practices that are used to help you avoid common configuration errors.
AKS deployment safeguards scan configuration files before applications are deployed, giving you a choice of “warning” or “enforcement” modes. Warnings provide information about issues but don’t stop a deployment, while enforcement blocks noncompliant deployments outright, reducing the risk of out-of-control code running up significant bills.
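To make that distinction concrete, here’s a hedged example of the kind of manifest a safeguard would flag. The specific checks shown (missing resource limits, a mutable image tag) are illustrative of the best practices involved, not an authoritative list of the rules Microsoft ships.

```yaml
# Deployment fragment (sketch) of the kind deployment safeguards flag. In
# warning mode it deploys with advisory messages; in enforcement mode the
# policy engine rejects it at admission time.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
        - name: web
          image: myregistry.azurecr.io/web:latest  # mutable tag, likely flagged
          # no resources.requests/limits set: a classic safeguards violation
```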
“The Kubernetes service has been around in Azure for seven years at this point,” Burns noted. “And, you know, we’ve seen a lot of mistakes—mistakes you can make that make your application less reliable, but also mistakes you can make that make your application insecure.” The resulting collective knowledge from Azure engineering teams, including field engineers working with customers and engineers in the Azure Kubernetes product group, has been used to build these guard rails. Other inputs have come from the Azure security team.
At the heart of the deployment safeguards is a policy engine installed in managed clusters. It validates configurations, actively rejecting those that don’t follow best practices. Current policies are generic, but future developments may allow you to target policies at specific application types, based on a user’s description of their code.
Burns is optimistic about the future of Kubernetes on Azure, and about its role in supporting current and future generations of AI applications. “We’re continuing to see how we can help lead the Kubernetes community forward with how they think about AI. And I think this kind of project is the beginning of that. But there’s a lot of pieces to how you do AI really well on top of Kubernetes. And I think we’re in a pretty unique position, as both a provider of Kubernetes but also as a heavy user of Kubernetes for AI, to contribute to that discussion.”