Realizing Machine Learning anywhere with Azure Kubernetes Service and Arc-enabled Machine Learning
We are thrilled to announce the general availability of Azure Machine Learning (Azure ML) Kubernetes compute, including support of seamless Azure Kubernetes Service (AKS) integration and Azure Arc-enabled Machine Learning.
With a simple cluster extension deployment on AKS or Azure Arc-enabled Kubernetes (Arc Kubernetes) cluster, Kubernetes cluster is seamlessly supported in Azure ML to run training or inference workload. In addition, Azure ML service capabilities for streamlining full ML lifecycle and automation with MLOps become instantly available to enterprise teams of professionals. Azure ML Kubernetes compute empowers enterprises ML operationalization at scale across different infrastructures and addresses different needs with seamless experience of Azure ML CLI v2, Python SDK v2 (preview), and Studio UI. Here are some of the capabilities that customers can benefit
- Deploy ML workload on customer managed AKS cluster and gain more security and controls to meet compliance requirements.
- Run Azure ML workload on Arc Kubernetes cluster right where data lives and meets data residency, security, and privacy compliance, or harness existing IT investment.
- Use Arc Kubernetes cluster to deploy ML workload or aspect of ML lifecycle across multiple public clouds.
- Fully automated hybrid workload in cloud and on-premises to leverage different infrastructure advantages and IT investments.
How it works
The IT-operations team and data-science team are both integral parts of the broader ML team. By letting the IT-operations team manage Kubernetes compute setup, Azure ML creates a seamless compute experience for data-science team who does not need to learn or use Kubernetes directly. The design for Azure ML Kubernetes compute also helps IT-operations team leverage native Kubernetes concepts such as namespace, node selector, and resource requests/limits for ML compute utilization and optimization. Data-science team now can focus on models and work with productivity tools such as Azure ML CLI v2, Python SDK v2, Studio UI, and Jupyter notebook.
It is easy to enable and use an existing Kubernetes cluster for Azure ML workload with the following simple steps:
IT-operation team. The IT-operation team is responsible for the first 3 steps above: prepare an AKS or Arc Kubernetes cluster, deploy Azure ML cluster extension, and attach Kubernetes cluster to Azure ML workspace. In addition to these essential compute setup steps, IT-operation team also uses familiar tools such as Azure CLI or kubectl to take care of the following tasks for the data-science team:
- Network and security configurations, such as outbound proxy server connection or Azure firewall configuration, Azure ML inference router (azureml-fe) setup, SSL/TLS termination, and no-public IP with VNET.
- Create and manage instance types for different ML workload scenarios and gain efficient compute resource utilization.
- Trouble shooting workload issues related to Kubernetes cluster.
Data-science team. Once the IT-operations team finishes compute setup and compute target(s) creation, data-science team can discover list of available compute targets and instance types in Azure ML workspace to be used for training or inference workload. Data science specifies compute target name and instance type name using their preferred tools or APIs such as Azure ML CLI v2, Python SDK v2, or Studio UI.
Recommended best practices
Separation of responsibilities between the IT-operations team and data-science team. As we mentioned above, managing your own compute and infrastructure for ML workload is a complicated task and it is best to be done by IT-operations team so data-science team can focus on ML models for organizational efficiency.
Create and manage instance types for different ML workload scenarios. Each ML workload uses different amounts of compute resources such as CPU/GPU and memory. Azure ML implements instance type as Kubernetes custom resource definition (CRD) with properties of nodeSelector and resource request/limit. With a carefully curated list of instance types, IT-operations can target ML workload on specific node(s) and manage compute resource utilization efficiently.
Multiple Azure ML workspaces share the same Kubernetes cluster. You can attach Kubernetes cluster multiple times to the same Azure ML workspace or different Azure ML workspaces, creating multiple compute targets in one workspace or multiple workspaces. Since many customers organize data science projects around Azure ML workspace, multiple data science projects can now share the same Kubernetes cluster. This significantly reduces ML infrastructure management overheads as well as IT cost saving.
Team/project workload isolation using Kubernetes namespace. When you attach Kubernetes cluster to Azure ML workspace, you can specify a Kubernetes namespace for the compute target and all workloads run by the compute target will be placed under the specified namespace.
New Azure ML use patterns enabled
Azure Arc-enabled ML enables teams of ML professionals to build, train, and deploy models in any infrastructure on-premises and across multi-cloud using Kubernetes. This opens a variety of new use patterns previously unthinkable in cloud setting environment. Below table provides a summary of the new use patterns enabled by Azure ML Kubernetes compute, including where the training data resides in each use pattern, the motivation driving each use pattern, and how the use pattern is realized using Azure ML and infrastructure setup.
Get started today
To get started with Azure Machine Learning Kubernetes compute, please visit Azure ML documentation and GitHub repo, where you can find detailed instructions to setup Kubernetes cluster for Azure Machine Learning, and train or deploy models with a variety of Azure ML examples. Lastly, visit Azure Hybrid, Multicloud, and Edge Day and watch “Real time insights from edge to cloud” where we announced the GA.
Published on:
Learn moreRelated posts
Azure Migrate Execute
From Manual Testing to AI-Generated Automation: Our Azure DevOps MCP + Playwright Success Story
In today’s fast-paced software development cycles, manual testing often becomes a significant bottleneck. Our team was facing a growing backlo...
Cognitive services and Azure ML for Dataflows will be fully retired by September 15th, 2025
This blog is outlining the depreciation announcement for Azure ML and Cognitive services using dataflows.
Azure Developer CLI: From Dev to Prod with One Click
This post walks through how to implement a “build once, deploy everywhere” pattern using Azure Developer CLI (azd) that provisions...
Azure Migrate assessments
AI Builder – Invoice processing and Invoices document type to begin using Azure
Starting on July 21, 2025, the prebuilt model invoice processing and invoices document type (built on Azure Document Intelligence 4.0) will be...
Dataverse: Learn How to Implement Azure Durable Functions – Payment Scenario
Azure Durable Functions is an extension of Azure Functions that offers specialized capabilities, including statefulness, orchestration, handli...
Build reliable Go applications: Configuring Azure Cosmos DB Go SDK for real-world scenarios
When building applications that interact with databases, developers frequently encounter scenarios where default SDK configurations don’...