A quick start guide to benchmarking AI models in Azure: MLPerf Inferencing v2.0
This blog was authored by Aimee Garcia, Program Manager - AI Benchmarking. Additional contributions by Program Manager Daramfon Akpan, Program Manager Gaurav Uppal, Program Manager Hugo Affaticati.
Microsoft Azure’s publicly available AI inferencing capabilities are led by the NDm A100 v4, ND A100 v4 and NC A100 v4 virtual machines (VMs), powered by the latest NVIDIA A100 Tensor Core GPUs. The MLPerf Inferencing v2.0 results showcase Azure’s commitment to making AI inferencing accessible to all researchers and users while raising the bar for AI inferencing in Azure. The full announcement is available on Azure.com.
Highlights from the results
ND96amsr A100 v4 powered by NVIDIA A100 80G SXM Tensor Core GPU
| Benchmark | Samples/second | Queries/second | Scenarios |
|---|---|---|---|
| bert-99 | 27.5K+ | ~22.5K | Offline and server |
| resnet | 300K+ | ~200K+ | Offline and server |
| 3d-unet | 24.87 | | Offline |
NC96ads A100 v4 powered by NVIDIA A100 80G PCIe Tensor Core GPU
| Benchmark | Samples/second | Queries/second | Scenarios |
|---|---|---|---|
| bert-99.9 | ~6.3K | ~5.3K | Offline and server |
| resnet | 144K | ~119.6K | Offline and server |
| 3d-unet | 11.7 | | Offline |
The results were generated by deploying the environment using the VM offerings and Azure’s Ubuntu 18.04-HPC marketplace image.
Steps to reproduce the results in Azure
Set up and connect to a VM via SSH - decide which VM you want to benchmark
- Image: Ubuntu 18.04-HPC marketplace image
- Availability: Depending on client need (e.g. No redundancy)
- Region: Depending on client need (e.g. South Central US)
Set up the dependencies
- Verify the NVIDIA driver version (shown in the header of the nvidia-smi output):
cd /mnt
nvidia-smi
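- If the driver version is less than 510, install CUDA Toolkit 11.6 (see the CUDA Toolkit 11.6 Downloads page on NVIDIA Developer). The local installer .deb package used in the dpkg step below must be downloaded from that page first: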
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo dpkg -i cuda-repo-ubuntu1804-11-6-local_11.6.1-510.47.03-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-6-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
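The driver check above (is the version below 510?) can be scripted rather than read off the screen. The following is a minimal sketch, not part of the original guide: the helper names driver_major and needs_driver_update are hypothetical, and it assumes the version string has the usual dotted form such as 510.47.03.

```shell
# driver_major VERSION
# Print the major component of a dotted driver version string,
# e.g. "510.47.03" -> "510".
driver_major() {
    printf '%s\n' "${1%%.*}"
}

# needs_driver_update VERSION [MINIMUM]
# Succeed (exit status 0) when the major version is below MINIMUM.
# MINIMUM defaults to 510, the threshold used in this guide.
needs_driver_update() {
    [ "$(driver_major "$1")" -lt "${2:-510}" ]
}

# On the VM, the installed version could be read with:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
```

With the version captured in a variable, `needs_driver_update "$version" && echo "install CUDA 11.6"` would flag VMs that need the update.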
- Update Docker to the latest version:
sudo dpkg -P moby-cli
curl https://get.docker.com | sh && sudo systemctl --now enable docker
sudo chmod 777 /var/run/docker.sock
- To check the Docker version, run:
docker info
You should have version 20.10.12 or newer.
- Reboot the VM:
sudo reboot
- Once the VM is back up, reconnect and verify the driver version again:
nvidia-smi
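The Docker requirement (20.10.12 or newer) is easy to misjudge by eye, because a plain string comparison would rank 20.10.7 above 20.10.12. A small sketch of a version comparison follows; the helper name version_ge is hypothetical, and it assumes GNU sort with the -V (version sort) option, which ships with Ubuntu.

```shell
# version_ge A B
# Succeed when version A is greater than or equal to version B.
# Both versions are sorted with GNU sort -V; if B sorts first (or the
# two are equal), A is at least B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# On the VM, the server version could be read with:
#   docker version --format '{{.Server.Version}}'
```

For example, `version_ge "$(docker version --format '{{.Server.Version}}')" 20.10.12 || echo "Docker is too old"` would report an outdated installation.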
- Create the script that mounts the NVMe disks:
cd /mnt
sudo touch nvme.sh
sudo vi nvme.sh
- The script combines the NVMe disks into a RAID 0 array and mounts it on the machine. Copy and insert the following into your file:
#!/bin/bash
# List the local NVMe namespaces and count them
NVME_DISKS_NAME=$(ls /dev/nvme*n1 2>/dev/null)
NVME_DISKS=$(ls /dev/nvme*n1 2>/dev/null | wc -l)
echo "Number of NVMe Disks: $NVME_DISKS"
if [ "$NVME_DISKS" -eq 0 ]
then
    # No NVMe disks found, so there is nothing to stripe
    exit 0
else
    mkdir -p /mnt/resource_nvme
    # Needed in case something did not unmount as expected. This will delete any data that may be left behind
    mdadm --stop /dev/md*
    # Stripe all NVMe disks into one RAID 0 array, format it as XFS, and mount it
    mdadm --create /dev/md128 -f --run --level 0 --raid-devices $NVME_DISKS $NVME_DISKS_NAME
    mkfs.xfs -f /dev/md128
    mount /dev/md128 /mnt/resource_nvme
fi
# World-writable with the sticky bit set, like /tmp
chmod 1777 /mnt/resource_nvme
- Run the script:
sudo bash nvme.sh
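Before pointing Docker at the new volume, it may be worth confirming that the array really is mounted at /mnt/resource_nvme. A minimal sketch, assuming a hypothetical helper named is_mounted that scans the kernel mount table (/proc/mounts by default, where the second field of each line is the mount point):

```shell
# is_mounted MOUNT_POINT [MOUNTS_FILE]
# Succeed when MOUNT_POINT appears as a mount target in MOUNTS_FILE.
# MOUNTS_FILE defaults to /proc/mounts; the optional argument exists so
# the check can be tried against a sample file.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

On the VM, `is_mounted /mnt/resource_nvme && echo "NVMe RAID is mounted"` would confirm the mount, and `cat /proc/mdstat` shows the state of the md128 array.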
- Update Docker root directory in the docker daemon config file:
sudo vi /etc/docker/daemon.json
Add this line after the first curly bracket:
"data-root": "/mnt/resource_nvme/data",
- Restart Docker so the new data root takes effect, then move to the NVMe mount:
sudo systemctl restart docker
cd /mnt/resource_nvme
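The daemon.json edit described above (adding the data-root line after the first curly bracket) can also be done without an interactive editor. The sketch below is one possible approach, not the method used for the published results: add_data_root is a hypothetical helper that prints the modified file to standard output instead of editing it in place. Like the manual instruction, it adds a trailing comma, so it assumes the file already contains at least one other key.

```shell
# add_data_root FILE DIR
# Print FILE with a "data-root" entry for DIR inserted right after the
# first opening curly bracket. FILE itself is not modified.
add_data_root() {
    awk -v root="$2" '
        !done && /[{]/ {
            sub(/[{]/, "{\n    \"data-root\": \"" root "\",")
            done = 1
        }
        { print }
    ' "$1"
}
```

A cautious way to use it is to write the result to a separate file, for example `add_data_root /etc/docker/daemon.json /mnt/resource_nvme/data | sudo tee /etc/docker/daemon.json.new`, review it, and only then replace the original.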
- Now that your environment is set up, get the repository from the MLCommons GitHub and run the benchmarks:
- When setting up the scratch path, the path should be /mnt/resource_nvme/scratch
export MLPERF_SCRATCH_PATH=/mnt/resource_nvme/scratch
- Run benchmarks by following the steps in the README.md file in the working directory. To open the file:
vi README.md
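The scratch path set above has to exist before the benchmark scripts can use it. The sketch below creates it along with three sub-directories; the helper name prepare_scratch is hypothetical, and the sub-directory names (data, models, preprocessed_data) are an assumption based on the layout commonly used by NVIDIA-based MLPerf submissions, so check the repository's README.md for the layout it actually expects.

```shell
# prepare_scratch DIR
# Create DIR with the sub-directories the benchmark harness is assumed
# to expect (an assumption, see the README.md), then print DIR.
prepare_scratch() {
    mkdir -p "$1/data" "$1/models" "$1/preprocessed_data" && printf '%s\n' "$1"
}

# Example on the VM:
#   export MLPERF_SCRATCH_PATH=$(prepare_scratch /mnt/resource_nvme/scratch)
```

Because the NVMe mount is world-writable (mode 1777), this does not need sudo when run as the benchmarking user.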
The graphs below show the results achieved on the NDm A100 v4, NC A100 v4 and ND A100 v4 VMs. Throughput is reported in samples per second and queries per second.
More about MLPerf
To learn more about MLCommons benchmarks, visit the MLCommons website.