Optimize Azure Kubernetes Service Node Cost by Combining OnDemand And Spot VMs
While it's possible to run Kubernetes nodes in separate on-demand or spot node pools, we can optimize application cost without compromising reliability by placing pods unevenly across spot and OnDemand VMs using topology spread constraints. With a baseline number of pods deployed in the OnDemand node pool for reliability, we can scale out on the spot node pool at a lower cost as the load increases.
In this post, we will walk through a step-by-step approach to deploying an application spread unevenly across spot and OnDemand VMs.
Prerequisites
- Azure Subscription with permissions to create the required resources
- Azure CLI
- kubectl CLI
1. Create a Resource Group and an AKS Cluster
Create a resource group in your preferred Azure location using the Azure CLI as shown below.
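A minimal sketch of the command (the resource group name and location are examples; adjust them to your environment):

```shell
# Create a resource group (name and location are examples)
az group create \
  --name aks-spot-demo-rg \
  --location eastus
```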
Let's create an AKS cluster using one of the following commands.
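A basic cluster create could look like the following sketch (the cluster and resource group names are assumptions carried over from the earlier step):

```shell
# Create an AKS cluster with a small default node pool
az aks create \
  --resource-group aks-spot-demo-rg \
  --name aks-spot-demo \
  --node-count 1 \
  --generate-ssh-keys
```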
OR
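Alternatively, the cluster can be created with nodes spread across availability zones, which the zone-based topology spread constraint later in this post relies on (names are example values):

```shell
# Create the cluster with nodes distributed across availability zones
az aks create \
  --resource-group aks-spot-demo-rg \
  --name aks-spot-demo \
  --node-count 3 \
  --zones 1 2 3 \
  --generate-ssh-keys
```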
2. Create two node pools using spot and OnDemand VMs
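The two node pools can be added as in the sketch below. The pool names and node counts are assumptions; the important parts are the `deploy=ondemand` / `deploy=spot` labels that the affinity rules and topology spread constraints reference later, and `--priority Spot` for the spot pool:

```shell
# On-demand node pool, labeled deploy=ondemand
az aks nodepool add \
  --resource-group aks-spot-demo-rg \
  --cluster-name aks-spot-demo \
  --name ondemand \
  --node-count 3 \
  --zones 1 2 3 \
  --labels deploy=ondemand

# Spot node pool, labeled deploy=spot; a max price of -1 caps it at the on-demand rate
az aks nodepool add \
  --resource-group aks-spot-demo-rg \
  --cluster-name aks-spot-demo \
  --name spotpool \
  --node-count 3 \
  --zones 1 2 3 \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --labels deploy=spot
```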
3. Deploy a sample application
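A minimal sample deployment could look like the following sketch (the application name, image, and resource requests are example values; the 9 replicas match the count discussed in the next step):

```yaml
# sample-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  replicas: 9
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
```

Apply it with `kubectl apply -f sample-app.yaml`.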
4. Update the application deployment using topology spread constraints
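The updated manifest could look like the following sketch. The `deploy` labels, `maxSkew: 3`, `DoNotSchedule`, `ScheduleAnyway`, and the 9 replicas come from the steps in this post; the application name, the exact affinity weights (90/10), and the resource requests are example values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  replicas: 9
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      # AKS spot nodes carry this taint by default
      tolerations:
      - key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot
        effect: NoSchedule
      affinity:
        nodeAffinity:
          # Hard requirement: only nodes labeled deploy=spot or deploy=ondemand
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: deploy
                operator: In
                values: ["spot", "ondemand"]
          # Soft preference: spot nodes get the higher weight
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 90
            preference:
              matchExpressions:
              - key: deploy
                operator: In
                values: ["spot"]
          - weight: 10
            preference:
              matchExpressions:
              - key: deploy
                operator: In
                values: ["ondemand"]
      topologySpreadConstraints:
      # Hard constraint on the deploy label: with 9 replicas and maxSkew 3,
      # each domain (spot, ondemand) receives at least 3 pods
      - maxSkew: 3
        topologyKey: deploy
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: sample-app
      # Soft constraint: spread across availability zones where possible
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: sample-app
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
```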
With `requiredDuringSchedulingIgnoredDuringExecution`, we ensure that pods are placed only on nodes that carry the `deploy` label with the value `spot` or `ondemand`. With `preferredDuringSchedulingIgnoredDuringExecution`, we add weights so that spot nodes are preferred over OnDemand nodes for pod placement.

We also define `topologySpreadConstraints` with two label selectors. The first uses the `deploy` label as the topology key, with `maxSkew` set to 3 and `whenUnsatisfiable` set to `DoNotSchedule`; since we run 9 replicas, this ensures that no fewer than 3 instances land in a single topology domain (in our case, `spot` and `ondemand`). Because nodes labeled `deploy=spot` carry the higher node-affinity weight, the scheduler will most likely place more pods on the spot node pool than on the OnDemand node pool. The second selector uses `topology.kubernetes.io/zone` as the topology key to distribute pods evenly across availability zones; since we use `ScheduleAnyway` for `whenUnsatisfiable`, the scheduler won't enforce this distribution but will attempt it where possible.

Conclusion
The `maxSkew` setting in topology spread constraints is, as the name suggests, only the maximum skew allowed, so it does not guarantee that the maximum number of pods will land in a single topology domain. Still, this approach is a good starting point for achieving optimal placement of pods in a cluster with multiple node pools.