Deploy Multi-Region HPC clusters in Azure with CycleCloud

CycleCloud Multi-Region Cluster Overview

Overview

High Performance Computing (HPC) clusters in Azure are almost always deployed within a single Azure Region (e.g. East US, South Central US, West Europe). Data gravity usually drives this, since your data should sit as close to the compute as possible to reduce latency. When a different Region is needed, the default answer is to create a separate cluster in that Region and manage multiple clusters. This additional management overhead isn't always desired, and many customers ask how to create a single cluster that spans multiple Azure Regions. This blog provides an example of how to create a Multi-Region Slurm cluster using Azure CycleCloud (CC).

What are some drivers for creating a Multi-Region HPC cluster?

Capacity. Sometimes a single Azure Region cannot accommodate the number of compute cores needed. In this case a loosely coupled workload can be split across multiple Regions to get access to all the compute cores requested.
Specialty Compute (e.g. GPU, FPGA, InfiniBand). Your organization created an Azure environment in a specific Region (e.g. East US 2) and later requires specialty compute VMs that are not available in that Region. Examples are high-end GPU VMs (e.g. NDv4), FPGA VMs (e.g. NP) or HPC VMs (e.g. HB & HC).
Public Datasets. Azure hosts numerous public datasets, but they are generally specific to a Region (e.g. West US 2). You may have an HPC cluster and Data Lake configured in South Central US but need a compute queue/partition in West US 2 to make the best use of a public dataset (e.g. Genomics Data Lake).

REQUIREMENTS

The Azure CC UI has filters that restrict resource configuration to a single Region. Creating a Multi-Region cluster with CC therefore requires all "Parameters" typically configured in the UI to be hardcoded in the cluster template file or in a parameters file. The configured template and parameters file are then imported to CC as a cluster instance using the CC CLI.
Network connectivity must exist between the head node (aka "scheduler") in Region1 and the compute nodes in Region2. The easiest way to accomplish this is with VNET Peering between VNET-2 in Region-2 and VNET-1 in Region-1 (refer to the drawing above). The cluster parameters file must specify both the Region and the VNET for each node/nodearray definition (NOTE: these are typically defined once in the template [[node defaults]] section). Example Azure CLI commands for VNET Peering:

    # FOLLOWING EXAMPLE ASSUMES BOTH VNETS IN SAME RESOURCE GROUP
    # CREATE VNET PEERING FROM VNET-1 TO VNET-2
    az network vnet peering create -g MyResourceGroup -n VNET1ToVNET2 \
        --vnet-name VNET-1 --remote-vnet VNET-2 --allow-vnet-access
    # CREATE VNET PEERING FROM VNET-2 TO VNET-1
    az network vnet peering create -g MyResourceGroup -n VNET2ToVNET1 \
        --vnet-name VNET-2 --remote-vnet VNET-1 --allow-vnet-access
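Once both peerings are created, it can be worth confirming that each side reports a Connected state before moving on. The following is a minimal sketch, assuming the resource group, VNET, and peering names from the example above. The helper name peering_connected is hypothetical; it reads the peeringState property returned by az network vnet peering show.

```shell
# Hypothetical helper: succeeds only if the named peering reports "Connected".
# Arguments: resource group, VNET name, peering name.
peering_connected() {
    local rg="$1" vnet="$2" peering="$3" state
    state="$(az network vnet peering show -g "$rg" --vnet-name "$vnet" \
        -n "$peering" --query peeringState -o tsv)" || return 1
    [ "$state" = "Connected" ]
}

# Usage (names taken from the peering example above):
#   peering_connected MyResourceGroup VNET-1 VNET1ToVNET2 && echo "VNET-1 side OK"
#   peering_connected MyResourceGroup VNET-2 VNET2ToVNET1 && echo "VNET-2 side OK"
```

Both directions need to be checked, because a peering only carries traffic once each side has been created.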

Name resolution is another key requirement for a Multi-Region cluster. Traditionally CC provides name resolution by managing the /etc/hosts file on each cluster node, pre-populated with hostnames in the format ip-0A0A0004, which is the hexadecimal encoding of the node IP address (e.g. 0A0A0004 = 10.10.0.4). As of CC v8.2.1 and Slurm Project version 2.5.x this has been updated to use Azure DNS instead, which allows custom hostnames (for Nodes/VMs) and prefixes (for NodeArray/VMSS). For a Multi-Region cluster this must be taken a step further by using an Azure Private DNS Zone linked to both VNET-1 and VNET-2. For example:

    # CREATE PRIVATE DNS ZONE
    az network private-dns zone create -g MyResourceGroup \
        -n private.ccmr.net
    # LINK VNETS TO PRIVATE DNS ZONE
    az network private-dns link vnet create -g MyResourceGroup -n CCMRClusterLink1 \
        -z private.ccmr.net -v VNET-1 -e true
    az network private-dns link vnet create -g MyResourceGroup -n CCMRClusterLink2 \
        -z private.ccmr.net -v VNET-2 -e true
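As a side note on the older /etc/hosts naming scheme mentioned above, the mapping between an IP address and an ip-XXXXXXXX hostname is just a per-octet hexadecimal conversion. The sketch below illustrates it, assuming IPv4 addresses and uppercase hex digits as in the ip-0A0A0004 example. The function names are hypothetical and not part of CycleCloud.

```shell
# Convert a dotted IPv4 address to the ip-XXXXXXXX form.
ip_to_cc_hostname() {
    local a b c d
    # Split the address on dots into its four octets.
    IFS=. read -r a b c d <<EOF
$1
EOF
    printf 'ip-%02X%02X%02X%02X\n' "$a" "$b" "$c" "$d"
}

# Convert an ip-XXXXXXXX hostname back to a dotted IPv4 address.
cc_hostname_to_ip() {
    local hex="${1#ip-}"
    # Split the eight hex digits into four "0xNN" words for printf.
    set -- $(printf '%s' "$hex" | sed 's/../0x& /g')
    printf '%d.%d.%d.%d\n' "$1" "$2" "$3" "$4"
}

# Usage:
#   ip_to_cc_hostname 10.10.0.4
#   cc_hostname_to_ip ip-0A0A0004
```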

IMPLEMENT

The remaining portion assumes you have a working CC environment set up, with the CC CLI installed and VM quota in both Regions.
Acquire the sample template from the GitHub repo. There is no need to clone the entire repo; just download the Slurm Multi-Region template and the accompanying parameters file.
Edit the parameters file (slurm-multiregion-params-min.json) in your editor of choice (e.g. Visual Studio Code, vim):
1. Credentials is the common name of the CC credential in your environment. It can be found in the CC GUI or with the CC CLI command: cyclecloud account list
2. Primary* represents the scheduler and HTC partition, whereas Secondary* represents the HPC partition
3. Update PrimarySubnet, PrimaryRegion, SecondarySubnet & SecondaryRegion
  1. *Subnet is of the format resource-group-name/vnet-name/subnet-name (the template has a placeholder name)
  2. *Region names can be found with the Azure CLI command az account list-locations -o table
4. Update HPCMachineType, MaxHPCExecuteCoreCount, HTCMachineType & MaxHTCExecuteCoreCount as necessary
5. Save your updates and exit
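A malformed *Subnet value is an easy mistake to make when editing the parameters file by hand. The sketch below checks that a value has the three-part resource-group-name/vnet-name/subnet-name shape described above. The function name valid_subnet_param and the subnet name "compute" in the usage line are hypothetical. The check only looks at the string format and does not confirm that the subnet exists in Azure.

```shell
# Succeeds if the value has exactly three non-empty, slash-separated parts:
#   resource-group-name/vnet-name/subnet-name
valid_subnet_param() {
    case "$1" in
        */*/*/*)    return 1 ;;  # more than three parts
        /*|*/|*//*) return 1 ;;  # an empty part
        */*/*)      return 0 ;;  # exactly three parts
        *)          return 1 ;;  # fewer than three parts
    esac
}

# Usage (hypothetical subnet name):
#   valid_subnet_param "MyResourceGroup/VNET-2/compute" || echo "bad SecondarySubnet"
```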
Edit the template file (slurm-multiregion-git.txt) and replace private.ccmr.net with your specific Private DNS Zone name.
Import your modified template file into your CC server as follows:
1. cyclecloud import_cluster slurm-multiregion-cluster -c Slurm -f slurm-multiregion-git.txt -p slurm-multiregion-params-min.json
  1. slurm-multiregion-cluster = a name for the cluster chosen by you (no spaces)
  2. -c Slurm = name of the cluster defined in the template file (e.g. line #6)
  3. -f slurm-multiregion-git.txt = file name of the template to import
  4. -p slurm-multiregion-params-min.json = file name of the parameters file to import
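Before running the import, a quick pre-flight check can catch a missing file or a template that still carries the sample DNS zone name. This is a minimal sketch: the function name preflight_import is hypothetical, and it assumes the sample zone name private.ccmr.net appears literally in the unedited template. If you chose private.ccmr.net as your own zone name, skip the zone check.

```shell
# Hypothetical pre-import check. Arguments: template file, parameters file.
preflight_import() {
    local template="$1" params="$2"
    [ -f "$template" ] || { echo "missing template: $template" >&2; return 1; }
    [ -f "$params" ]   || { echo "missing parameters file: $params" >&2; return 1; }
    # The sample template ships with private.ccmr.net as the zone name.
    if grep -qF 'private.ccmr.net' "$template"; then
        echo "template still references private.ccmr.net" >&2
        return 1
    fi
}

# Usage:
#   preflight_import slurm-multiregion-git.txt slurm-multiregion-params-min.json \
#     && cyclecloud import_cluster slurm-multiregion-cluster -c Slurm \
#          -f slurm-multiregion-git.txt -p slurm-multiregion-params-min.json
```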
Submit a test job to the Region2 VMs (sbatch mpi.sh), where mpi.sh contains:
```
#!/bin/bash
#SBATCH --job-name=mpiMultiRegion
#SBATCH --partition=hpc
#SBATCH -N 2
# 60 MPI processes per node
#SBATCH -n 120
#SBATCH --chdir /tmp
#SBATCH --exclusive
set -x
source /etc/profile.d/modules.sh
module load mpi/hpcx
echo "SLURM_JOB_NODELIST = " $SLURM_JOB_NODELIST
# Assign the number of processors
NPROCS=$SLURM_NTASKS
# Run the job
mpirun -n $NPROCS --report-bindings echo "hello world!"
# Move the Slurm output file from the node-local /tmp to the user home directory
mv slurm-${SLURM_JOB_ID}.out $HOME
```
NOTE: the default Slurm working directory is the path from which the job was submitted, typically the user home directory. As the home directory will likely be in Region1, it's important to explicitly set the working directory to something local to Region2. In the above example I set it to the VM-local /tmp (#SBATCH --chdir /tmp) and added a line at the end to move the Slurm output file to the user home directory.
Review the output file in your home directory (e.g. slurm-2.out for JobID 2)
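One quick way to review the output is to count the greeting lines: with -n 120, the test job should have printed one "hello world!" line per MPI process. The sketch below assumes each process writes its line intact; the function name count_hello_lines is hypothetical. It matches whole lines only, so the set -x trace of the mpirun command (which also contains the greeting text) is not counted.

```shell
# Count lines that consist of exactly "hello world!" in a Slurm output file.
count_hello_lines() {
    grep -cx 'hello world!' "$1"
}

# Usage (JobID 2, 120 tasks requested):
#   [ "$(count_hello_lines "$HOME/slurm-2.out")" -eq 120 ] && echo "all ranks reported"
```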

CONCLUSION

With careful planning and implementation it is possible to create a Multi-Region Slurm cluster with Azure CycleCloud. This blog is not all-inclusive, and additional customization is likely required for a customer-specific environment, such as adding mounts (e.g. datasets) specific to the workflow in Region2.

Published on: April 05, 2022
