VMware HCX Troubleshooting with Azure VMware Solution
Overview
VMware HCX is one of the Azure VMware Solution components that generates a large number of service requests from our customers. The Azure VMware Solution product group has worked to cover the most common troubleshooting considerations that you should know about when using VMware HCX with the Azure VMware Solution.
Azure VMware Solution is a VMware validated first party Azure service from Microsoft that provides private clouds containing VMware vSphere clusters built from dedicated bare-metal Azure infrastructure. It enables customers to leverage their existing investments in VMware skills and tools, allowing them to focus on developing and running their VMware-based workloads on Azure.
VMware HCX is the mobility and migration software used by the Azure VMware Solution to connect remote VMware vSphere environments to the Azure VMware Solution. These remote VMware vSphere environments can be on-premises, co-location or cloud-based instances.
Figure 1 – Azure VMware Solution with VMware HCX Service Mesh
In the next section, I will introduce the architectural components of the Azure VMware Solution.
Architectural Components
The diagram below describes the architectural components of the Azure VMware Solution.
Figure 2 – Azure VMware Solution Architectural Components
Each Azure VMware Solution architectural component has the following function:
- Azure Subscription: Used to provide controlled access, budget and quota management for the Azure VMware Solution.
- Azure Region: Physical locations around the world where we group data centers into Availability Zones (AZs) and then group AZs into regions.
- Azure Resource Group: Container used to place Azure services and resources into logical groups.
- Azure VMware Solution Private Cloud: Uses VMware software, including vCenter Server, NSX software-defined networking, vSAN software-defined storage, and Azure bare-metal ESXi hosts to provide compute, networking, and storage resources. Azure NetApp Files, Azure Elastic SAN, and Pure Cloud Block Store are also supported.
- Azure VMware Solution Resource Cluster: Uses VMware software, including vSAN software-defined storage, and Azure bare-metal ESXi hosts to provide compute, networking, and storage resources for customer workloads by scaling out the Azure VMware Solution private cloud. Azure NetApp Files, Azure Elastic SAN, and Pure Cloud Block Store are also supported.
- VMware HCX: Provides mobility, migration, and network extension services.
- VMware Site Recovery: Provides Disaster Recovery automation, and storage replication services with VMware vSphere Replication. Third party Disaster Recovery solutions Zerto DR and JetStream DR are also supported.
- Dedicated Microsoft Enterprise Edge (D-MSEE): Router that provides connectivity between Azure cloud and the Azure VMware Solution private cloud instance.
- Azure Virtual Network (VNet): Private network used to connect Azure services and resources together.
- Azure Route Server: Enables network appliances to exchange dynamic route information with Azure networks.
- Azure Virtual Network Gateway: Cross premises gateway for connecting Azure services and resources to other private networks using IPSec VPN, ExpressRoute, and VNet to VNet.
- Azure ExpressRoute: Provides high-speed private connections between Azure data centers and on-premises or colocation infrastructure.
- Azure Virtual WAN (vWAN): Aggregates networking, security, and routing functions together into a single unified Wide Area Network (WAN).
In the next section, I will describe the troubleshooting steps you should follow for VMware HCX when used with the Azure VMware Solution.
Troubleshooting Considerations
Before opening a ticket with Microsoft support, please use the following steps as a checklist to ensure you are not impacted by the most common VMware HCX issues.
Troubleshooting Step 1: Download the VMware HCX Connector.
Once VMware HCX is deployed on the Azure VMware Solution side, the VMware HCX Connector OVA can be downloaded from the VMware HCX UI plugin. Under Administration, select Request Download Link. You can then download the OVA locally or copy a download link for it.
Figure 3 – VMware HCX Connector OVA Download
Troubleshooting Step 2: Upgrade to HCX Enterprise.
Azure VMware Solution comes with an Enterprise license key for VMware HCX. If you have a pre-existing on-premises VMware HCX Connector that is licensed for VMware HCX Advanced, be sure to upgrade the connector to the Enterprise edition. To upgrade, browse to the HCX Connector at https://<hcx_connector_fqdn>:9443. Under the Configuration section, select Licensing and Activation, edit the current license, and enter the VMware HCX Enterprise license key obtained from the Azure VMware Solution portal. Verify that the license shows Enterprise.
Figure 4 – VMware HCX Connector License Key
Once you have updated the VMware HCX Connector, be sure to edit the VMware HCX Compute Profile and Service Mesh to include the additional VMware HCX services that you would like to take advantage of, such as Replication Assisted vMotion and OS Assisted Migration. OS Assisted Migration is used for migrating and converting Microsoft Hyper-V and Red Hat KVM workloads into Azure VMware Solution.
Figure 5 – VMware HCX Connector Compute Profile Service Activation
Troubleshooting Step 3: Do not use an IPSec VPN.
If possible, avoid using an IPSec VPN connection to Azure VMware Solution for VMware HCX migrations. Migrating with VMware HCX over VPN is known to cause issues and repeated migration failures. Although using VMware HCX over VPN is supported, it is not the recommended way to migrate virtual machines to Azure VMware Solution. One of the biggest caveats of migrating VMs with VMware HCX over VPN is that a separate uplink network profile is needed on-premises. The management network cannot be used as the uplink profile, because the MTU of the uplink profile needs to be lowered to 1300 to accommodate the IPSec overhead. Note that VMware HCX uses IPSec VPN natively as part of the VMware HCX Service Mesh.
Troubleshooting Step 4: Check MTU size within your Network Profile.
Verify the MTU setting on each Network Profile. Within VMware HCX, navigate to the Interconnect section, select Network Profiles, and confirm that the correct MTU size is being used for each profile. Check this on both ends of the VMware HCX site pair.
Figure 6 – VMware HCX MTU size in Network Profile
Use the recommended MTU sizes in the table below for the Network Profiles when connecting to Azure VMware Solution.
| Connectivity Method | Management | Uplink | Replication | vMotion |
| --- | --- | --- | --- | --- |
| Azure ExpressRoute | 1500 | 1500 | 1500 or 9000 | 1500 or 9000 |
| VMware HCX over IPSec VPN | 1500 | 1300 | 1500 or 9000 | 1500 or 9000 |
Table 1 – VMware HCX Network Profile MTU Sizes
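The values in Table 1 can also be checked programmatically. The following is a minimal Python sketch, not an official tool: it assumes you have copied the MTU values by hand from the Network Profiles page on each side of the site pair. The dictionary keys and the function name `find_mtu_mismatches` are hypothetical names chosen for this illustration.

```python
# Recommended MTU values transcribed from Table 1.
# Each entry maps a network profile type to the set of acceptable MTU sizes.
RECOMMENDED_MTU = {
    "expressroute": {
        "management": {1500},
        "uplink": {1500},
        "replication": {1500, 9000},
        "vmotion": {1500, 9000},
    },
    "ipsec_vpn": {
        "management": {1500},
        "uplink": {1300},
        "replication": {1500, 9000},
        "vmotion": {1500, 9000},
    },
}


def find_mtu_mismatches(connectivity, profiles):
    """Return (profile, configured, allowed) tuples for MTUs outside Table 1."""
    allowed_by_profile = RECOMMENDED_MTU[connectivity]
    mismatches = []
    for profile, configured in profiles.items():
        allowed = allowed_by_profile[profile]
        if configured not in allowed:
            mismatches.append((profile, configured, sorted(allowed)))
    return mismatches


if __name__ == "__main__":
    # Hypothetical on-premises values, typed in by hand for illustration.
    on_prem = {"management": 1500, "uplink": 1500, "replication": 1500, "vmotion": 9000}
    for profile, configured, allowed in find_mtu_mismatches("ipsec_vpn", on_prem):
        print(f"{profile}: MTU {configured} is not one of {allowed}")
```

Run the same check against the values from both ends of the site pair, since a mismatch on either side can affect the Service Mesh.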
Troubleshooting Step 5: Always keep your VMware HCX versions updated (Connectors, Cloud Manager and Service Meshes).
Before you upgrade VMware HCX, check the VMware product interoperability matrix to ensure the integrated versions of on-premises VMware solution software are supported by the new version of VMware HCX you are going to upgrade to.
Updates to VMware HCX are released regularly by VMware. It is the responsibility of the customer to upgrade and maintain VMware HCX on both sides of the Service Mesh (on-premises and Azure VMware Solution). When updating VMware HCX, update the VMware HCX Cloud Managers first. It is recommended to back up the VMware HCX Connector before updating.
The VMware HCX Connector can be backed up through the VMware HCX Manager UI at https://<hcx_connector_fqdn>:9443, using the admin password created when the VMware HCX Connector was deployed. Under the Administration section, open the Backup and Restore section. Backups can be taken on demand or scheduled from here.
Optionally, you can take a vSphere snapshot of the VMware HCX Connector on-premises as well.
Figure 7 – VMware HCX Connector Backup & Restore
Updates for the VMware HCX Cloud Managers can be found in the Administration section. Select your current version and click the ‘Check for Updates’ button. If a new version is available, you will be able to download it and update to the newest version. Backups of the VMware HCX Cloud Manager are taken automatically each day.
Figure 8 – VMware HCX Upgrades
Note that VMware HCX Service Meshes are updated independently of the VMware HCX Cloud Managers and Connectors. Once the VMware HCX Cloud Manager and Connector updates are complete, update the Service Meshes next. The VMware HCX Cloud Managers, Connectors, and Service Meshes should be upgraded in order and together so that you do not end up running mixed versions. Running mixed versions of VMware HCX Cloud Managers, Connectors, and Service Meshes in production is highly discouraged: you can lose certain features, and it often creates issues within the environment.
Figure 9 – VMware HCX Manager Service Mesh Update
During the Service Mesh update process, if Network Extension appliances are deployed, a temporary loss of connectivity will occur while the appliances update. For Network Extension appliances in an HA pair, downtime is approximately a few seconds. Network Extension appliances not in an HA pair will incur downtime of approximately one minute.
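The upgrade order in this step (Cloud Manager first, then Connector, then Service Mesh) and the warning about mixed versions can be expressed as a small check. This is an illustrative Python sketch only. It assumes you record the version of each component by hand. The component keys, the function names, and the version strings are hypothetical placeholders, and nothing here queries VMware HCX.

```python
# Upgrade order described in this step: Cloud Manager, then Connector, then Service Mesh.
UPGRADE_ORDER = ["cloud_manager", "connector", "service_mesh"]


def next_component_to_upgrade(versions, target):
    """Return the first component in UPGRADE_ORDER not yet at `target`, or None."""
    for component in UPGRADE_ORDER:
        if versions[component] != target:
            return component
    return None


def is_mixed_mode(versions):
    """True when the components do not all report the same version."""
    return len(set(versions.values())) > 1


if __name__ == "__main__":
    # Hypothetical versions noted down during a maintenance window.
    current = {"cloud_manager": "4.10.0", "connector": "4.9.0", "service_mesh": "4.9.0"}
    print("Mixed mode:", is_mixed_mode(current))
    print("Upgrade next:", next_component_to_upgrade(current, "4.10.0"))
```

With the sample values, the Cloud Manager is already at the target version, so the sketch reports mixed mode and names the Connector as the next component to upgrade.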
Troubleshooting Step 6: On-Premises Network Connectivity and Firewalls.
For VMware HCX to be activated and receive updates, your on-premises firewalls need to allow outbound traffic to port 443 for the following websites:
- connect.hcx.vmware.com
- hybridity-depot.vmware.com
Your on-premises firewalls will also need to allow outbound traffic to UDP port 4500. Within VMware HCX, UDP port 4500 carries the IPSec VPN traffic between VMware HCX components across environments, so it is essential for communication and data transfer between environments. When configuring VMware HCX, ensure that this port is open between your on-premises VMware HCX Connector uplink network profile and the Azure VMware Solution HCX Cloud Manager uplink network profile.
Another common issue is that the on-premises VMware HCX Connector is unable to reach the VMware HCX activation and entitlement websites. A simple way to verify that your on-premises environment has access is to SSH into the on-premises VMware HCX Connector and run the following curl commands:
- curl -k -v https://connect.hcx.vmware.com
- curl -k -v https://hybridity-depot.vmware.com
A successful connection to the above websites will look like the figure below.
Figure 10 – VMware HCX Connector SSH CURL connectivity test
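If you also want to test the firewall path from another machine on the same network segment, the TCP 443 check can be sketched in Python. This is a rough illustration under stated assumptions, not a replacement for the curl test on the Connector itself: it only confirms that a TCP connection opens, it does not validate TLS or proxy behavior, and it cannot test UDP port 4500. The function name `check_tcp_reachability` and its `connect` parameter are hypothetical names introduced for this sketch.

```python
import socket

# Hostnames taken from the curl commands in this step; 443 is the outbound port named above.
HCX_ENDPOINTS = ["connect.hcx.vmware.com", "hybridity-depot.vmware.com"]


def check_tcp_reachability(hosts, port=443, timeout=5.0, connect=socket.create_connection):
    """Map each host to True if a TCP connection opens, else False."""
    results = {}
    for host in hosts:
        try:
            # DNS failures, refusals, and timeouts all surface as OSError subclasses.
            conn = connect((host, port), timeout)
        except OSError:
            results[host] = False
        else:
            conn.close()
            results[host] = True
    return results
```

Calling `check_tcp_reachability(HCX_ENDPOINTS)` returns a dictionary keyed by hostname. A `False` value suggests that DNS resolution or the outbound firewall rule for that site needs attention. The `connect` parameter exists only so the function can be exercised without network access.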
Troubleshooting Step 7: Diagnostics page on the Service Mesh.
Built into the VMware HCX Service Mesh there is an option to run a diagnostics check on the Service Mesh appliances. This is an effective way to verify the health of your Service Mesh and pinpoint any specific issues the appliances may have.
In the VMware HCX Connector user interface, under the Interconnect section, select the Service Mesh you want to run the diagnostics on. Under the “More” link, select Run Diagnostics to perform a health check on the appliances.
Figure 11 – VMware HCX Service Mesh Run Diagnostics
Once the Diagnostics test is completed, if there are any issues, a red banner will appear under the Service Mesh name. You can drill down to the specific issues by clicking on the red alert (!) icon.
Figure 12 – VMware HCX Service Mesh Alert
Troubleshooting Step 8: Network Extensions are for temporary migration phases, not for permanent use.
At its core VMware HCX is a migration tool. When using Network Extensions in VMware HCX, it is important to understand that these Network Extensions should be a temporary solution used during the migration process to migrate VMs into Azure VMware Solution with no downtime during the migration. It is best practice to remove the network extensions as soon as the migration waves are completed. Leaving network extensions in place for extended periods of time can cause issues and outages in your environment. Use Network Extensions with caution.
Figure 13 – VMware HCX Network Extension
Troubleshooting Step 9: If you are having issues with the source side interface, reboot the VMware HCX Connector.
VMware HCX Connectors may have issues over time. It is recommended to reboot the VMware HCX Connector if it has been up and running for an extended period without a reboot.
On the Azure VMware Solution side, customers can reboot the VMware HCX Cloud Manager through a Run Command in the Azure portal. A Force (hard) reboot of the VMware HCX Cloud Manager is also offered. Please use it with caution, as it does not check for any active migrations or replications that may be occurring.
Figure 14 – Azure VMware Solution Run Command Restart-HCXManager
Troubleshooting Step 10: Logging into the VMware HCX Cloud Manager directly
You can log in to the VMware HCX Cloud Manager directly, which is useful when the VMware HCX plugin in your Azure VMware Solution vSphere Client is unavailable or fails to open. To obtain the IP address of the VMware HCX Cloud Manager, open the Azure VMware Solution resource in the Azure portal. In the Add-ons section, under “Migration using VMware HCX”, the IP address of the VMware HCX Cloud Manager is listed. It is part of the /22 network you provided when deploying Azure VMware Solution, and its last octet is always .9. Access the manager directly at https://<x.x.x.9>:443 or https://hcx.<guid>.<region>.avs.azure.com.
Figure 15 – VMware HCX Cloud Manager Login
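The two facts in this step (the address is inside the /22 you provided and it ends in .9) are enough to sanity-check an IP address before you try to browse to it. The following Python sketch uses the standard `ipaddress` module. The function names and the 10.0.0.0/22 range are hypothetical examples. The sketch does not assume which of the four .9 addresses in the /22 is used, so always confirm the actual address in the Azure portal.

```python
import ipaddress


def candidate_hcx_manager_ips(management_cidr):
    """List every address in the /22 whose last octet is 9."""
    network = ipaddress.ip_network(management_cidr, strict=True)
    if network.version != 4 or network.prefixlen != 22:
        raise ValueError("expected an IPv4 /22 network")
    # A /22 spans four /24 blocks, so exactly four addresses end in .9.
    return [str(addr) for addr in network if int(addr) & 0xFF == 9]


def looks_like_hcx_manager_ip(ip, management_cidr):
    """True when `ip` is inside the /22 and ends in .9."""
    return ip in candidate_hcx_manager_ips(management_cidr)


if __name__ == "__main__":
    # Hypothetical /22 used only for illustration.
    for ip in candidate_hcx_manager_ips("10.0.0.0/22"):
        print(f"https://{ip}:443")
```

If the address you were given fails `looks_like_hcx_manager_ip`, it is probably not the VMware HCX Cloud Manager for that private cloud.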
Troubleshooting Step 11: Only use the key from the Azure VMware Solution private cloud you are connecting to.
When deploying the VMware HCX Connector on-premises, the activation key should come from the Azure VMware Solution private cloud you are migrating to. In the Azure portal, an activation key can be obtained in the Add-ons section. Request an activation key, give it a friendly name, and apply that activation key to the on-premises VMware HCX Connector.
Figure 16 – VMware HCX Connector License Key
Troubleshooting Step 12: If you have Mobility Optimized Networking (MON) enabled, ensure you have the router location set to the correct side.
When configuring MON, verify where the default gateway resides. The default gateway will always be located on the source side of the network extension. Primarily, it will reside in the on-premises data center when connecting to Azure VMware Solution.
Figure 17 – VMware HCX Mobility Optimized Network (MON)
Troubleshooting Step 13: OS Assisted Migration - Sentinel Gateway Appliances.
When using VMware HCX OS Assisted Migration, it is important to maintain and manage the VMware HCX Sentinel Gateway (SGW) appliance at the source site (on-premises). The Sentinel Gateway appliance is responsible for establishing a forwarding connection with the VMware HCX Sentinel Data Receiver (SDR) on the destination site. Managing and maintaining the Sentinel Gateway appliance’s resources (CPU and memory configuration) is the responsibility of the customer.
Next Steps
If this has not resolved the VMware HCX issue in your Azure VMware Solution private cloud, please open a Service Request with Microsoft to continue the resolution process.
Summary
In this post, we described helpful troubleshooting tips when facing some of the most common VMware HCX services issues our customers have with the Azure VMware Solution.
If you are interested in the Azure VMware Solution, please use these resources to learn more about the service:
- Homepage: Azure VMware Solution
- Documentation: Azure VMware Solution
- SLA: SLA for Azure VMware Solution
- Azure Regions: Azure Products by Region
- VMware Ports and Protocols for HCX: VMware HCX - VMware Ports and Protocols
- VMware Interoperability Matrix: Product Interoperability Matrix (vmware.com)
- VMware HCX: Configuration & Best Practices
- Design: Availability Design Considerations
- Design: Recoverability Design Considerations
- Design: Performance Design Considerations
- Design: Security Design Considerations
- GitHub repository: Azure/azure-vmware-solution
- Well-Architected Framework: Azure VMware Solution workloads
- Cloud Adoption Framework: Introduction to the Azure VMware Solution adoption scenario
- Network connectivity scenarios: Enterprise-scale network topology and connectivity for Azure VMware Solution
- Enterprise Scale Landing Zone: Enterprise-scale for Microsoft Azure VMware Solution
- Enterprise Scale GitHub repository: Azure/Enterprise-Scale-for-AVS
- Azure CLI: Azure Command-Line Interface (CLI) Overview
- PowerShell module: Az.VMware Module
- Azure Resource Manager: Microsoft.AVS/privateClouds
- REST API: Azure VMware Solution REST API
- Terraform provider: azurerm_vmware_private_cloud Terraform Registry
Author Bios
Ricky Perez is a Senior Cloud Solution Architect in the international Customer Success Unit (iCSU) at Microsoft. His background is in solution architecture with experience in public cloud and core infrastructure services.
Jason Trammell is a Senior Software Engineer in the Azure VMware Solution engineering group at Microsoft.
Kenyon Hensler is a Principal Technical Program Manager in the Azure VMware Solution product group at Microsoft. His background is in system engineering with experience across all facets of enterprise networking and compute stacks.
René van den Bedem is a Principal Technical Program Manager in the Azure VMware Solution product group at Microsoft. His background is in enterprise architecture with extensive experience across all facets of the enterprise, public cloud & service provider spaces, including digital transformation and the business, enterprise, and technology architecture stacks. René works backwards from the problem to be solved and designs solutions that deliver business value with the minimum of risk. In addition to being the first quadruple VMware Certified Design Expert (VCDX), he is also a Dell Technologies Certified Master Enterprise Architect, a Nutanix Platform Expert (NPX), and a VMware vExpert.