Optimize Azure Kubernetes Service Node Cost by Combining OnDemand And Spot VMs
While it's possible to run Kubernetes nodes in either on-demand or spot node pools separately, we can optimize application cost without compromising reliability by placing pods unevenly across spot and on-demand VMs using topology spread constraints. With a baseline number of pods deployed in the on-demand node pool for reliability, we can scale out on the spot node pool as load grows, at a lower cost.
[Figure: Kubernetes Topology Spread]
In this post, we will go through a step-by-step approach to deploying an application spread unevenly across spot and on-demand VMs.
Prerequisites
- Azure Subscription with permissions to create the required resources
- Azure CLI
- kubectl CLI
1. Create a Resource Group and an AKS Cluster
Create a resource group in your preferred Azure location using the Azure CLI.
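A minimal sketch of this step. The resource group name rg-aks-spot-demo and the eastus region are hypothetical placeholders, not values from the original post. The small run helper is also an assumption of this sketch: it prints the command by default so it can be reviewed first, and only executes it when RUN=1 is set.

```shell
# Hypothetical name and region for illustration; adjust to your environment.
RESOURCE_GROUP="rg-aks-spot-demo"
LOCATION="eastus"

# Print each command by default; set RUN=1 to execute it against your subscription.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

run az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
```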
Next, let's create an AKS cluster in that resource group.
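One possible way to create the cluster is sketched here. The cluster name aks-spot-demo, the single-node system pool, and the use of availability zones 1 to 3 are assumptions of this sketch (zones are included because step 4 spreads pods across zones, and they require a region that supports them). As before, the hypothetical run helper prints the commands unless RUN=1 is set.

```shell
# Hypothetical names for illustration; adjust to your environment.
RESOURCE_GROUP="rg-aks-spot-demo"
CLUSTER_NAME="aks-spot-demo"

# Print each command by default; set RUN=1 to execute it against your subscription.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Create the cluster with a small default node pool spread over three zones.
run az aks create \
  --resource-group "$RESOURCE_GROUP" \
  --name "$CLUSTER_NAME" \
  --node-count 1 \
  --zones 1 2 3 \
  --generate-ssh-keys

# Merge the cluster credentials into kubeconfig so kubectl can reach it.
run az aks get-credentials --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME"
```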
2. Create two node pools using spot and OnDemand VMs
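A sketch of how the two node pools might be added, assuming the hypothetical resource group and cluster names used for illustration here. Each pool gets a deploy label (ondemand or spot), which is the label the scheduling rules in step 4 rely on. The node counts, autoscaler bounds, and the add_pool and run helpers are assumptions of this sketch; a spot max price of -1 means the pool pays up to the current on-demand price and is not evicted for price reasons.

```shell
# Hypothetical names for illustration; adjust to your environment.
RESOURCE_GROUP="rg-aks-spot-demo"
CLUSTER_NAME="aks-spot-demo"

# Print each command by default; set RUN=1 to execute it against your subscription.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Add a zone-spread node pool labelled deploy=<pool name>; extra flags pass through.
add_pool() {
  pool_name="$1"; shift
  run az aks nodepool add \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name "$pool_name" \
    --zones 1 2 3 \
    --labels "deploy=$pool_name" \
    "$@"
}

# Regular (on-demand) pool that carries the reliable baseline.
add_pool ondemand --node-count 3

# Spot pool that absorbs additional load at a lower price.
add_pool spot --priority Spot --eviction-policy Delete --spot-max-price -1 \
  --enable-cluster-autoscaler --min-count 1 --max-count 3
```

Once the pools exist, kubectl get nodes -L deploy should list each node with its deploy label value.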
3. Deploy a sample application
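The original sample application is not specified here, so the following is only a stand-in: a hypothetical Deployment named sample-app that runs the public nginx image with 9 replicas, the replica count that step 4 refers to. The sketch writes the manifest to sample-app.yaml in the current directory.

```shell
# Write a minimal Deployment manifest (hypothetical app name and image).
cat > sample-app.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  replicas: 9
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80
EOF
```

To deploy it, run kubectl apply -f sample-app.yaml against the cluster, then check placement with kubectl get pods -o wide.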
4. Update the application deployment using topology spread constraints
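A sketch of what the updated manifest could look like, reusing the hypothetical sample-app Deployment with 9 replicas. The deploy label, maxSkew of 3, DoNotSchedule, the zone topology key, and ScheduleAnyway follow the description in this step; the affinity weights (99 and 1), the maxSkew of 1 for zones, and the image are assumptions. AKS taints spot nodes with kubernetes.azure.com/scalesetpriority=spot:NoSchedule, so the sketch also adds a matching toleration, without which no pod could land on the spot pool.

```shell
# Overwrite the manifest with node affinity and topology spread constraints.
cat > sample-app.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  replicas: 9
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      tolerations:                      # allow scheduling onto tainted spot nodes
        - key: kubernetes.azure.com/scalesetpriority
          operator: Equal
          value: spot
          effect: NoSchedule
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:          # only nodes labelled deploy=spot|ondemand
              - matchExpressions:
                  - key: deploy
                    operator: In
                    values:
                      - spot
                      - ondemand
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 99                # assumed weight: strongly prefer spot
              preference:
                matchExpressions:
                  - key: deploy
                    operator: In
                    values:
                      - spot
            - weight: 1                 # assumed weight: on-demand as fallback
              preference:
                matchExpressions:
                  - key: deploy
                    operator: In
                    values:
                      - ondemand
      topologySpreadConstraints:
        - maxSkew: 3                    # spot vs. ondemand may differ by at most 3
          topologyKey: deploy
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: sample-app
        - maxSkew: 1                    # assumed value: best-effort zone balance
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: sample-app
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80
EOF
```

After applying the file with kubectl apply -f sample-app.yaml, kubectl get pods -o wide shows which node, and therefore which pool, each replica landed on.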
With requiredDuringSchedulingIgnoredDuringExecution, we ensure that pods are placed only on nodes that have deploy as the label key and either spot or ondemand as the value. With preferredDuringSchedulingIgnoredDuringExecution, we add weights so that spot nodes are preferred over on-demand nodes for pod placement.
This is followed by topologySpreadConstraints with two label selectors. The first uses the deploy label as the topology key, with maxSkew set to 3 and whenUnsatisfiable set to DoNotSchedule. Because the pod counts in the two topology domains (spot and ondemand) may differ by at most 3, the most uneven split allowed for our 9 replicas is 6 and 3, so neither domain ends up with fewer than 3 instances. As the nodes with spot as the value of the deploy label have the higher weight in node affinity, the scheduler will most likely place more pods on the spot node pool than on the on-demand node pool.
The second uses topology.kubernetes.io/zone as the topology key to distribute the pods evenly across availability zones. Because it uses ScheduleAnyway for whenUnsatisfiable, the scheduler won't enforce this distribution but will attempt it where possible.
Conclusion
The maxSkew setting in topology spread constraints is, as the name suggests, the maximum skew allowed. It is an upper bound rather than a target, so there is no guarantee that the scheduler will place the maximum possible number of pods in a single topology domain: with 9 replicas and a maxSkew of 3, it may settle on 5 and 4 instead of 6 and 3. However, this approach is a good starting point for achieving optimal pod placement in a cluster with multiple node pools.
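The arithmetic behind that bound can be checked with a small sketch. It assumes exactly two topology domains (spot and ondemand), in which case the skew is simply the absolute difference between their pod counts; the feasible_splits function name is hypothetical.

```shell
REPLICAS=9
MAX_SKEW=3

# List every spot/ondemand split whose pod-count difference stays within MAX_SKEW.
feasible_splits() {
  spot=0
  while [ "$spot" -le "$REPLICAS" ]; do
    ondemand=$((REPLICAS - spot))
    skew=$((spot - ondemand))
    if [ "$skew" -lt 0 ]; then skew=$((-skew)); fi
    if [ "$skew" -le "$MAX_SKEW" ]; then
      echo "spot=$spot ondemand=$ondemand"
    fi
    spot=$((spot + 1))
  done
}

feasible_splits
```

The splits it prints range from 3 on spot and 6 on on-demand to 6 on spot and 3 on on-demand; the node affinity weights are what nudge the scheduler toward the spot-heavy end of that range.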