
Realizing Machine Learning anywhere with Azure Kubernetes Service and Arc-enabled Machine Learning


 

We are thrilled to announce the general availability of Azure Machine Learning (Azure ML) Kubernetes compute, including seamless Azure Kubernetes Service (AKS) integration and Azure Arc-enabled Machine Learning.

 

With a simple cluster extension deployment on an AKS or Azure Arc-enabled Kubernetes (Arc Kubernetes) cluster, the cluster is seamlessly supported in Azure ML to run training or inference workloads. In addition, Azure ML service capabilities for streamlining the full ML lifecycle and automating it with MLOps become instantly available to enterprise teams of professionals. Azure ML Kubernetes compute enables enterprises to operationalize ML at scale across different infrastructures, and addresses different needs through a seamless experience with Azure ML CLI v2, Python SDK v2 (preview), and Studio UI. Here are some of the capabilities that customers can benefit from:

  • Deploy ML workloads on a customer-managed AKS cluster and gain more security and control to meet compliance requirements.
  • Run Azure ML workloads on an Arc Kubernetes cluster right where the data lives to meet data residency, security, and privacy compliance requirements, or to harness existing IT investments.
  • Use Arc Kubernetes clusters to deploy ML workloads or parts of the ML lifecycle across multiple public clouds.
  • Fully automate hybrid workloads across cloud and on-premises to leverage different infrastructure advantages and IT investments.

 

How it works

 

The IT-operations team and the data-science team are both integral parts of the broader ML team. By letting the IT-operations team manage Kubernetes compute setup, Azure ML creates a seamless compute experience for the data-science team, which does not need to learn or use Kubernetes directly. The design of Azure ML Kubernetes compute also helps the IT-operations team leverage native Kubernetes concepts such as namespaces, node selectors, and resource requests/limits for ML compute utilization and optimization. The data-science team can now focus on models and work with productivity tools such as Azure ML CLI v2, Python SDK v2, Studio UI, and Jupyter notebooks.

 

It is easy to enable and use an existing Kubernetes cluster for Azure ML workloads with the following simple steps:

 

[Figure: easy-k8s-setup.png]

 

IT-operations team. The IT-operations team is responsible for the first three steps above: prepare an AKS or Arc Kubernetes cluster, deploy the Azure ML cluster extension, and attach the Kubernetes cluster to the Azure ML workspace. In addition to these essential compute setup steps, the IT-operations team also uses familiar tools such as Azure CLI or kubectl to take care of the following tasks for the data-science team:

  • Configure networking and security, such as an outbound proxy server connection or Azure Firewall configuration, Azure ML inference router (azureml-fe) setup, SSL/TLS termination, and no public IP with a VNet.
  • Create and manage instance types for different ML workload scenarios to achieve efficient compute resource utilization.
  • Troubleshoot workload issues related to the Kubernetes cluster.
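
The extension deployment step described above can be sketched as a small Python helper that assembles the Azure CLI call without running it. This is a minimal sketch under stated assumptions: the flag names and the `Microsoft.AzureML.Kubernetes` extension type follow the Azure CLI `k8s-extension` documentation as we understand it, the extension name, cluster name, and resource group are hypothetical, and inference deployments typically need further `--config` settings that are omitted here. Check the current documentation before using it.

```python
from typing import List


def build_extension_command(
    cluster_name: str,
    resource_group: str,
    cluster_type: str = "managedClusters",      # AKS; use "connectedClusters" for Arc Kubernetes
    extension_name: str = "azureml-extension",  # hypothetical extension instance name
    enable_training: bool = True,
    enable_inference: bool = True,
) -> List[str]:
    """Assemble (but do not execute) an `az k8s-extension create` call."""
    if cluster_type not in ("managedClusters", "connectedClusters"):
        raise ValueError(f"unsupported cluster type: {cluster_type}")
    return [
        "az", "k8s-extension", "create",
        "--name", extension_name,
        "--extension-type", "Microsoft.AzureML.Kubernetes",
        "--cluster-type", cluster_type,
        "--cluster-name", cluster_name,
        "--resource-group", resource_group,
        "--scope", "cluster",
        # --config takes space-separated key=value pairs
        "--config",
        f"enableTraining={enable_training}",
        f"enableInference={enable_inference}",
    ]


# Hypothetical cluster and resource group names.
command = build_extension_command("my-aks-cluster", "my-resource-group")
print(" ".join(command))
```

Returning the argument list rather than invoking the CLI keeps the sketch easy to review; a list like this could be passed to `subprocess.run` once the values have been checked.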

 

Data-science team. Once the IT-operations team finishes compute setup and creates the compute target(s), the data-science team can discover the list of available compute targets and instance types in the Azure ML workspace to use for training or inference workloads. Data scientists specify the compute target name and instance type name using their preferred tools or APIs, such as Azure ML CLI v2, Python SDK v2, or Studio UI.
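
To illustrate how little a data scientist needs to specify, here is a minimal sketch that builds a training job specification as a Python dictionary. The field names (`compute`, `resources.instance_type`) mirror the Azure ML CLI v2 command job YAML as we understand it; the compute target, instance type, environment, and script names are all hypothetical.

```python
import json
from typing import Any, Dict


def build_command_job(compute_target: str, instance_type: str) -> Dict[str, Any]:
    """Build a job spec that selects a Kubernetes compute target and instance type."""
    return {
        "command": "python train.py",                     # hypothetical training script
        "code": "./src",                                  # hypothetical source folder
        "environment": "azureml:my-training-env@latest",  # hypothetical environment
        # The two values the data scientist picks from the workspace:
        "compute": f"azureml:{compute_target}",
        "resources": {"instance_type": instance_type},
    }


# Hypothetical compute target and instance type names.
job = build_command_job("k8s-compute", "gpu-small")
print(json.dumps(job, indent=2))
```

Nothing in this specification refers to nodes, pods, or namespaces; those details stay with the IT-operations team.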

 

[Figure: k8s-compute list.png]

 

Recommended best practices

 

Separation of responsibilities between the IT-operations team and the data-science team. As we mentioned above, managing your own compute and infrastructure for ML workloads is a complicated task. It is best handled by the IT-operations team so that the data-science team can focus on ML models, which improves organizational efficiency.

 

Create and manage instance types for different ML workload scenarios. Each ML workload uses different amounts of compute resources such as CPU/GPU and memory. Azure ML implements an instance type as a Kubernetes custom resource definition (CRD) with nodeSelector and resource request/limit properties. With a carefully curated list of instance types, the IT-operations team can target ML workloads at specific node(s) and manage compute resource utilization efficiently.
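
As an illustration, the following sketch generates an instance type manifest in Python. The `apiVersion` and `kind` values reflect the instance type CRD as we understand it from the Azure ML documentation and should be verified against the version installed on your cluster; the instance type name, node label, and resource sizes are hypothetical. For simplicity the sketch sets requests equal to limits, which is a choice made here, not a requirement.

```python
import json
from typing import Any, Dict


def build_instance_type(
    name: str,
    node_selector: Dict[str, str],
    cpu: str,
    memory: str,
    gpu: int = 0,
) -> Dict[str, Any]:
    """Build an InstanceType custom resource with a nodeSelector and resource requests/limits."""
    resources: Dict[str, Any] = {"cpu": cpu, "memory": memory}
    if gpu > 0:
        resources["nvidia.com/gpu"] = gpu  # standard Kubernetes resource name for NVIDIA GPUs
    return {
        "apiVersion": "amlarc.azureml.com/v1alpha1",  # assumed CRD version; verify on your cluster
        "kind": "InstanceType",
        "metadata": {"name": name},
        "spec": {
            "nodeSelector": dict(node_selector),
            "resources": {
                "requests": dict(resources),
                "limits": dict(resources),
            },
        },
    }


# Hypothetical instance type pinned to nodes labelled pool=gpu.
manifest = build_instance_type("gpu-small", {"pool": "gpu"}, cpu="4", memory="16Gi", gpu=1)
print(json.dumps(manifest, indent=2))
```

Because JSON is accepted wherever Kubernetes accepts YAML, the printed manifest could be saved to a file and applied with `kubectl apply -f`.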

 

Multiple Azure ML workspaces share the same Kubernetes cluster. You can attach a Kubernetes cluster multiple times to the same Azure ML workspace or to different Azure ML workspaces, creating multiple compute targets in one workspace or across multiple workspaces. Since many customers organize data science projects around Azure ML workspaces, multiple data science projects can now share the same Kubernetes cluster. This significantly reduces ML infrastructure management overhead and IT costs.

 

Team/project workload isolation using Kubernetes namespaces. When you attach a Kubernetes cluster to an Azure ML workspace, you can specify a Kubernetes namespace for the compute target, and all workloads run by the compute target will be placed under the specified namespace.
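
The sketch below shows how one cluster might be attached to two workspaces, each with its own namespace. It only assembles the `az ml compute attach` calls and does not run them. The flag names follow the Azure ML CLI v2 documentation as we understand it; the resource group, workspace, compute target, and namespace names, as well as the cluster resource ID, are hypothetical placeholders.

```python
import shlex
from typing import List


def build_attach_command(
    resource_group: str,
    workspace: str,
    compute_name: str,
    cluster_resource_id: str,
    namespace: str,
) -> List[str]:
    """Assemble (but do not execute) an `az ml compute attach` call for a Kubernetes cluster."""
    return [
        "az", "ml", "compute", "attach",
        "--resource-group", resource_group,
        "--workspace-name", workspace,
        "--type", "Kubernetes",
        "--name", compute_name,
        "--resource-id", cluster_resource_id,
        "--identity-type", "SystemAssigned",
        "--namespace", namespace,  # workloads from this compute target run in this namespace
    ]


# Hypothetical AKS cluster resource ID; the subscription ID is a placeholder.
CLUSTER_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/ml-infra-rg"
    "/providers/Microsoft.ContainerService/managedClusters/shared-aks"
)

# One cluster, two workspaces, one namespace per team (all names hypothetical).
attachments = [
    ("ml-infra-rg", "fraud-ws", "k8s-fraud", "team-fraud"),
    ("ml-infra-rg", "vision-ws", "k8s-vision", "team-vision"),
]
commands = [
    build_attach_command(rg, ws, name, CLUSTER_ID, ns)
    for rg, ws, name, ns in attachments
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each attachment produces a separate compute target, so the two teams share the cluster's capacity while their workloads stay in separate namespaces.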

 

New Azure ML use patterns enabled

 

Azure Arc-enabled ML enables teams of ML professionals to build, train, and deploy models on any infrastructure, on-premises and across multiple clouds, using Kubernetes. This opens up a variety of new use patterns that were previously unthinkable in a cloud-only setting. The table below summarizes the new use patterns enabled by Azure ML Kubernetes compute, including where the training data resides in each use pattern, the motivation driving each use pattern, and how the use pattern is realized using Azure ML and infrastructure setup.

 

[Table image: use patterns.png]

 

Get started today

 

To get started with Azure Machine Learning Kubernetes compute, please visit the Azure ML documentation and GitHub repo, where you can find detailed instructions to set up a Kubernetes cluster for Azure Machine Learning, and to train or deploy models with a variety of Azure ML examples. Lastly, visit Azure Hybrid, Multicloud, and Edge Day and watch “Real time insights from edge to cloud”, where we announced the GA.

 

 
