Azure VMware Solution Availability Design Considerations
Overview
A global enterprise wants to migrate thousands of virtual machines (VMs) to Microsoft Azure as part of their application modernization strategy. Their first step is to exit their on-premises data centers and rapidly relocate their legacy application VMs to the Azure VMware Solution as a staging area for the first phase of their modernization strategy. What should the Azure VMware Solution look like?
Azure VMware Solution is a VMware validated first party Azure service from Microsoft that provides private clouds containing VMware vSphere clusters built from dedicated bare-metal Azure infrastructure. It enables customers to leverage their existing investments in VMware skills and tools, allowing them to focus on developing and running their VMware-based workloads on Azure.
In this post, I will introduce the typical customer workload availability requirements, describe the Azure VMware Solution architectural components, and describe the availability design considerations for Azure VMware Solution private clouds.
In the next section, I will introduce the typical availability requirements of a customer’s workload.
Customer Workload Requirements
A typical customer has multiple tiers of applications that have specific Service Level Agreement (SLA) requirements that need to be met. These SLAs are normally named by a tiering system such as Platinum, Gold, Silver, and Bronze or Mission-Critical, Business-Critical, Production, and Test/Dev. Each SLA will have different availability, recoverability, performance, manageability, and security requirements that need to be met.
For the availability design quality, customers will normally have an uptime percentage requirement with an availability zone (AZ) or region requirement that defines each SLA level. For example:
Table 1 – Typical Customer SLA requirements for Availability
A typical legacy business-critical application will have the following application architecture:
- Load Balancer layer: Uses load balancers to distribute traffic across multiple web servers in the web layer to improve application availability.
- Web layer: Uses web servers to process client requests made via the secure Hypertext Transfer Protocol (HTTPS). Receives traffic from the load balancer layer and forwards to the application layer.
- Application layer: Uses application servers to run software that delivers a business application through a communication protocol. Receives traffic from the web layer and uses the database layer to access stored data.
- Database layer: Uses a relational database management service (RDMS) cluster to store data and provide database services to the application layer.
Depending upon the availability requirements for the service, the application components could be many and spread across multiple sites and regions to meet the customer SLA.
Figure 1 – Typical Legacy Business-Critical Application Architecture
In the next section, I will introduce the architectural components of the Azure VMware Solution.
Architectural Components
The diagram below describes the architectural components of the Azure VMware Solution.
Figure 2 – Azure VMware Solution Architectural Components
Each Azure VMware Solution architectural component has the following function:
- Azure Subscription: Used to provide controlled access and quota management for the Azure VMware Solution.
- Azure Region: Physical locations around the world where we group data centers into Availability Zones (AZs) and then group AZs into regions.
- Azure Resource Group: Container used to place Azure services and resources into logical groups.
- Azure VMware Solution Private Cloud: Uses VMware software, including vCenter Server, NSX-T Data Center software-defined networking, vSAN software-defined storage, and Azure bare-metal ESXi hosts to provide compute, networking, and storage resources.
- Azure VMware Solution Resource Cluster: Uses VMware software, including vSAN software-defined storage, and Azure bare-metal ESXi hosts to provide compute, networking, and storage resources for customer workloads by scaling out the Azure VMware Solution private cloud.
- VMware HCX: Provides mobility, migration and network extension services.
- VMware Site Recovery: Provides Disaster Recovery automation and storage replication services with VMware vSphere Replication.
- Azure Virtual Network (vNET): Private network used to connect Azure services and resources together.
- Azure Route Server: Enables network appliances to exchange dynamic route information with Azure networks.
- Azure Virtual Network Gateway: Cross premises gateway for connecting Azure services and resources to other private networks using IPSec VPN, ExpressRoute, and vNET to vNET.
- Azure ExpressRoute: Provides high-speed private connections between Azure data centers and on-premises or colocation infrastructure.
- Azure Virtual WAN (vWAN): Aggregates networking, security, and routing functions together into a single unified Wide Area Network (WAN).
In the next section, I will describe the availability design considerations for the Azure VMware Solution.
Availability Design Considerations
The architectural design process takes the business problem to be solved and the business goals to be achieved and distills these into customer requirements, design constraints and assumptions. Design constraints can be characterized by the following three categories:
- Laws of the Land – data and application sovereignty, governance, compliance, etc.
- Laws of Physics – data and machine gravity, network latency, etc.
- Laws of Economics – owning versus renting, total cost of ownership (TCO), return on investment (ROI), capital expenditure, operational expenditure, etc.
Each design consideration will be a trade-off between the availability, recoverability, performance, manageability, and security design qualities. The desired result is to deliver business value with the minimum of risk by working backwards from the customer problem.
Design Consideration 1 – Azure Region and AZs: Azure VMware Solution is available in 25 Azure Regions around the world. Select the relevant Azure Regions and AZs that meet your geographic requirements. These locations will typically be driven by your design constraints.
Design Consideration 2 – Deployment topology: Select the Azure VMware Solution topology that best matches the uptime and geographic requirements of your SLAs. For very large deployments, it may make sense to have separate private clouds dedicated to each SLA for cost efficiency.
The Azure VMware Solution supports a maximum of 12 clusters per private cloud. Each cluster supports a minimum of 3 hosts and a maximum of 16 hosts per cluster. Each private cloud supports a maximum of 96 hosts.
VMware vSphere HA provides protection against ESXi host failures and VMware vSphere DRS provides distributed resource management. VMware vSphere Fault Tolerance is not supported by the Azure VMware Solution. These features are preconfigured as part of the managed service and cannot be changed by the customer.
VMware vCenter Server, VMware HCX Manager, VMware SRM and VMware vSphere Replication Manager are individual appliances and are protected by vSphere HA.
VMware NSX-T Manager is a cluster of 3 unified appliances that have a VM-VM anti-affinity placement policy to spread them across the hosts of the cluster. The VMware NSX-T Data Center Edge cluster is a pair of appliances that also use a VM-VM anti-affinity placement policy.
Topology 1 – Standard: The Azure VMware Solution standard private cloud is deployed within a single AZ in an Azure Region, which delivers an infrastructure SLA of 99.9%.
Figure 3 – Azure VMware Solution Private Cloud Standard Topology
Topology 2 – Multi-AZ: Azure VMware Solution private clouds in separate AZs per Azure Region. VMware HCX is used to connect private clouds across AZs. Application clustering is required to provide the multi-AZ availability mechanism. The customer is responsible for ensuring their application clustering solution is within the limits of bandwidth and latency between private clouds.
The Azure VMware Solution does not support AZ selection during provisioning. This is mitigated by having separate Azure Subscriptions with quota in each separate AZ.
Figure 4 – Azure VMware Solution Private Cloud Multi-AZ Topology
Topology 3 – Stretched: The Azure VMware Solution stretched clusters private cloud is deployed across dual AZs in an Azure Region, which delivers a 99.99% infrastructure SLA. This also includes a third AZ for the Azure VMware Solution witness site. Stretched clusters support policy-based synchronous replication to deliver a recovery point objective (RPO) of zero. It is possible to use placement policies and storage policies to mix SLA levels within stretched clusters, by pinning lower SLA workloads to a particular AZ, which will experience downtime during an AZ failure.
This feature is in Public Preview and is currently only available in West Europe, UK South and Germany West Central Azure Regions with the AV36 SKU. The stretched cluster private cloud currently supports a single cluster.
Figure 5 – Azure VMware Solution Private Cloud with Stretched Clusters Topology
Topology 4 – Multi-Region: Azure VMware Solution private clouds across Azure regions. VMware HCX is used to connect private clouds across Azure Regions. Application clustering is required to provide the multi-region availability mechanism. The customer is responsible for ensuring their application clustering solution is within the limits of bandwidth and latency between private clouds.
Figure 6 – Azure VMware Solution Private Cloud Multi-Region Topology
Design Decision 3 – Shared Services or Separate Services Model: The management and control plane cluster (Cluster-1) can be shared with customer workload VMs or be a dedicated cluster for management and control, including customer enterprise services, such as Active Directory, DNS, and DHCP. Additional resource clusters can be added to support customer workload demand. This also includes the option of using separate clusters for each customer SLA.
Figure 7 – Azure VMware Solution Shared Services Model
Figure 8 – Azure VMware Solution Separate Services Model
Design Consideration 4 – SKU type: Three SKU types can be selected for provisioning an Azure VMware Solution private cloud. The smaller AV36 SKU can be used to minimize the impact radius of a failed node. The larger AV36P and AV52 SKUs can be used to run more workloads with less nodes which increases the impact radius of a failed node.
The AV36 SKU is widely available in most Azure regions and the AV36P and AV52 SKUs are limited to certain Azure regions. Azure VMware Solution does not support mixing different SKU types within a private cloud.
Design Consideration 5 – Placement Policies: Placement policies are used to increase the availability of a service by separating the VMs in an application availability layer across ESXi hosts. When an ESXi failure occurs, it would only impact one VM of a multi-part application layer, which would then restart on another ESXi host through vSphere HA. Placement policies support VM-VM and VM-Host affinity and anti-affinity rules. The vSphere Distributed Resource Scheduler (DRS) is responsible for migrating VMs to enforce the placement policies.
To increase the availability of an application cluster, a placement policy with VM-VM anti-affinity rules for each of the web, application and database service layers can be used. Alternatively, VM-Host affinity rules can be used to segment the web, application, and database components to dedicated groups of hosts.
Figure 9 – Azure VMware Solution Placement Policies – VM-VM Anti-Affinity
Figure 10 – Azure VMware Solution Placement Policies – VM-Host Affinity
Design Consideration 6 – Storage Policies: Table 2 lists the pre-defined VM Storage Policies available for use with VMware vSAN. The appropriate redundant array of independent disks (RAID) and failures to tolerate (FTT) settings per policy need to be considered to match the customer workload SLAs. Each policy has a trade-off between availability, performance, capacity, and cost that needs to be considered.
The storage policies for stretched clusters include a designation for the dual site (synchronous replication), preferred site and secondary site policies that need to be considered.
To comply with the Azure VMware Solution SLA, the customer is responsible for using an FTT=2 storage policy when the cluster has 6 or more nodes in a standard cluster. The customer must also retain a minimum slack space of 25% for backend vSAN operations.
Table 2 – VMware vSAN Storage Policies
In the following section, I will describe the next steps that would need to be made to progress this high-level design estimate towards a validated detailed design.
Next Steps
The Azure VMware Solution sizing estimate should be assessed using Azure Migrate. With large enterprise solutions for strategic and major customers, an Azure VMware Solution Solutions Architect from Azure, VMware, or a VMware Partner should be engaged to ensure the solution is correctly sized to deliver business value with the minimum of risk.
Summary
In this post, we took a closer look at the typical availability requirements of a customer workload, the architectural building blocks, and the availability design considerations for the Azure VMware Solution. We also discussed the next steps to continue an Azure VMware Solution design.
If you are interested in the Azure VMware Solution, please use these resources to learn more about the service:
- Homepage: Azure VMware Solution | Microsoft Azure
- Documentation: Azure VMware Solution
- SLA: SLA for Azure VMware Solution | Microsoft Azure
- Azure Regions: Azure Products by Region | Microsoft Azure
- Service Limits: Azure VMware Solution subscription limits and quotas
- Stretched Clusters (Public Preview): Deploy vSAN stretched clusters (Preview)
- SKU types: Introduction
- Placement policies: Create placement policy
- Storage policies: Configure storage policy
- GitHub repository: Azure/azure-vmware-solution
- Cloud Adoption Framework: Introduction to the Azure VMware Solution adoption scenario
- Network connectivity scenarios: Enterprise-scale network topology and connectivity for Azure VMware Solution
- Enterprise Scale Landing Zone: Enterprise-scale for Microsoft Azure VMware Solution
- Enterprise Scale GitHub repository: Azure/Enterprise-Scale-for-AVS
Author Bio
René van den Bedem is a Principal Technical Program Manager in the Azure VMware Solution product group at Microsoft. His background is in enterprise architecture with extensive experience across all facets of the enterprise, public cloud & service provider spaces, including digital transformation and the business, enterprise, and technology architecture stacks. In addition to being the first quadruple VMware Certified Design Expert (VCDX), he is also a Dell Technologies Certified Master Enterprise Architect, a Nutanix Platform Expert (NPX) and an NPX Panelist.
Published on:
Learn moreRelated posts
Fabric Mirroring for Azure Cosmos DB: Public Preview Refresh Now Live with New Features
We’re thrilled to announce the latest refresh of Fabric Mirroring for Azure Cosmos DB, now available with several powerful new features that e...
Power Platform – Use Azure Key Vault secrets with environment variables
We are announcing the ability to use Azure Key Vault secrets with environment variables in Power Platform. This feature will reach general ava...
Validating Azure Key Vault Access Securely in Fabric Notebooks
Working with sensitive data in Microsoft Fabric requires careful handling of secrets, especially when collaborating externally. In a recent cu...
Azure Developer CLI (azd) – May 2025
This post announces the May release of the Azure Developer CLI (`azd`). The post Azure Developer CLI (azd) – May 2025 appeared first on ...
Azure Cosmos DB with DiskANN Part 4: Stable Vector Search Recall with Streaming Data
Vector Search with Azure Cosmos DB In Part 1 and Part 2 of this series, we explored vector search with Azure Cosmos DB and best practices for...