A quick start guide to benchmarking AI models in Azure: MLPerf Inferencing v2.0


This blog was authored by Aimee Garcia, Program Manager - AI Benchmarking, with additional contributions from Program Managers Daramfon Akpan, Gaurav Uppal, and Hugo Affaticati.

 

Microsoft Azure’s publicly available AI inferencing capabilities are led by the NDm A100 v4, ND A100 v4, and NC A100 v4 virtual machines (VMs), powered by the latest NVIDIA A100 Tensor Core GPUs. These results showcase Azure’s commitment to making AI inferencing accessible to all researchers and users while continuing to raise the bar for inferencing performance in Azure. The full announcement is available on Azure.com.

 

Highlights from the results

ND96amsr A100 v4 powered by NVIDIA A100 80GB SXM Tensor Core GPUs

| Benchmark | Samples/second | Queries/second | Scenarios          |
|-----------|----------------|----------------|--------------------|
| bert-99   | 27.5K+         | ~22.5K         | Offline and Server |
| resnet    | 300K+          | ~200K+         | Offline and Server |
| 3d-unet   | 24.87          | –              | Offline            |

 

NC96ads A100 v4 powered by NVIDIA A100 80GB PCIe Tensor Core GPUs

| Benchmark | Samples/second | Queries/second | Scenarios          |
|-----------|----------------|----------------|--------------------|
| bert-99.9 | ~6.3K          | ~5.3K          | Offline and Server |
| resnet    | 144K           | ~119.6K        | Offline and Server |
| 3d-unet   | 11.7           | –              | Offline            |

 

The results were generated by deploying the environment on these VM offerings with Azure’s Ubuntu 18.04-HPC marketplace image.

 

Steps to reproduce the results in Azure

Set up and connect to a VM via SSH – decide which VM size you want to benchmark (an example Azure CLI command follows the list below)

  • Image: Ubuntu 18.04-HPC marketplace image
  • Availability: depending on your needs (e.g., no redundancy)
  • Region: depending on your needs (e.g., South Central US)
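As a minimal sketch (the resource group, VM name, and region below are placeholders, and the image URN for the Ubuntu 18.04-HPC image is an assumption you should verify), the VM can be created and reached with the Azure CLI:

# Placeholder resource group, VM name, and region -- replace with your own values.
az group create --name mlperf-rg --location southcentralus

# Standard_ND96amsr_A100_v4 is one of the benchmarked sizes; the image URN below is assumed to be
# the Ubuntu 18.04-HPC marketplace image (verify with: az vm image list --publisher microsoft-dsvm --all --output table).
az vm create \
  --resource-group mlperf-rg \
  --name mlperf-nd96amsr \
  --size Standard_ND96amsr_A100_v4 \
  --image microsoft-dsvm:ubuntu-hpc:1804:latest \
  --admin-username azureuser \
  --generate-ssh-keys

# Connect over SSH using the public IP printed by the command above.
ssh azureuser@<public-ip>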

Set up the dependencies

  1. Check the NVIDIA driver version:

cd /mnt

nvidia-smi
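The driver version appears in the banner of the nvidia-smi output; if you only want the number, it can also be queried directly:

nvidia-smi --query-gpu=driver_version --format=csv,noheader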

 

  2. If the driver version is lower than 510, install CUDA Toolkit 11.6 (CUDA Toolkit 11.6 Downloads | NVIDIA Developer):

sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin

sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600

sudo wget https://developer.download.nvidia.com/compute/cuda/11.6.1/local_installers/cuda-repo-ubuntu1804-11-6-local_11.6.1-510.47.03-1_amd64.deb

sudo dpkg -i cuda-repo-ubuntu1804-11-6-local_11.6.1-510.47.03-1_amd64.deb

sudo apt-key add /var/cuda-repo-ubuntu1804-11-6-local/7fa2af80.pub

sudo apt-get update

sudo apt-get -y install cuda

 

  3. Update Docker to the latest version:

sudo dpkg -P moby-cli

curl https://get.docker.com | sh && sudo systemctl --now enable docker

sudo chmod 777 /var/run/docker.sock

 

  4. Check the Docker version, then reboot:

docker info

sudo reboot

 

You should see Docker version 20.10.12 or newer.
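If you only want the server version number, the standard Docker CLI can print it directly:

docker version --format '{{.Server.Version}}'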

 

  5. After the reboot, verify the driver version again:

nvidia-smi

 

  6. Create a script to RAID and mount the NVMe disks:

cd /mnt

sudo touch nvme.sh

sudo vi nvme.sh

 

  7. Copy the following into the file to RAID the NVMe disks and mount the array:

#!/bin/bash

NVME_DISKS_NAME=`ls /dev/nvme*n1`
NVME_DISKS=`ls -latr /dev/nvme*n1 | wc -l`

echo "Number of NVMe Disks: $NVME_DISKS"

if [ "$NVME_DISKS" == "0" ]
then
    exit 0
else
    mkdir -p /mnt/resource_nvme
    # Needed in case something did not unmount as expected. This will delete any data that may be left behind.
    mdadm --stop /dev/md*
    # Stripe the NVMe disks into a single RAID 0 array, format it with XFS, and mount it
    mdadm --create /dev/md128 -f --run --level 0 --raid-devices $NVME_DISKS $NVME_DISKS_NAME
    mkfs.xfs -f /dev/md128
    mount /dev/md128 /mnt/resource_nvme
fi

chmod 1777 /mnt/resource_nvme

 

  8. Run the script:

sudo sh nvme.sh
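To confirm that the array was created and mounted, check the RAID status and the mount point:

cat /proc/mdstat              # the md128 array should be listed as active
df -h /mnt/resource_nvme      # shows the capacity of the mounted NVMe array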

 

  9. Update the Docker root directory in the Docker daemon config file:

sudo vi /etc/docker/daemon.json

Add this line after the first curly bracket:

"data-root": "/mnt/resource_nvme/data",

 

  10. Restart Docker and move into the NVMe mount:

sudo systemctl restart docker

cd /mnt/resource_nvme
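You can verify that Docker is now using the NVMe-backed directory:

docker info | grep "Docker Root Dir"    # should print /mnt/resource_nvme/data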

 

  11. Now that your environment is set up, get the repository from the MLCommons GitHub and run the benchmarks (a clone sketch follows the export command below):
    • When setting up the scratch path, the path should be /mnt/resource_nvme/scratch

export MLPERF_SCRATCH_PATH=/mnt/resource_nvme/scratch
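As a sketch, assuming the MLCommons inference_results_v2.0 repository and the Azure submission directory within it (check the repository layout and its README for the exact paths):

cd /mnt/resource_nvme
git clone https://github.com/mlcommons/inference_results_v2.0.git
cd inference_results_v2.0/closed/Azure    # assumed submission directory; adjust to the actual layout
mkdir -p $MLPERF_SCRATCH_PATH             # create the scratch directory exported above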

 

  12. Run the benchmarks by following the steps in the README.md file in the working directory. To open the file:

vi README.md
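For illustration only, NVIDIA-based submission code is typically driven through make targets along these lines; the target names and arguments below are assumptions, so defer to the README.md for the authoritative commands:

make prebuild                                               # build and launch the benchmark container (assumed target)
make download_data BENCHMARKS="resnet50"                    # fetch a dataset into $MLPERF_SCRATCH_PATH (assumed target)
make preprocess_data BENCHMARKS="resnet50"
make build                                                  # build the harness and engines (assumed target)
make run RUN_ARGS="--benchmarks=resnet50 --scenarios=offline,server"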

 

Below are graphs showing the achieved results for the NDm A100 v4, NC A100 v4, and ND A100 v4 VMs. The units are throughput per second (samples/second and queries/second).

 

[Graphs: MLPerf Inference v2.0 throughput results for the NDm A100 v4, NC A100 v4, and ND A100 v4 VMs]

 

More about MLPerf

To learn more about MLCommons benchmarks, visit the MLCommons website.
