General Availability: Azure confidential VMs with NVIDIA H100 Tensor Core GPUs
Today, we are announcing the general availability of Azure confidential virtual machines (VMs) with NVIDIA H100 Tensor Core GPUs. These VMs combine the hardware-based data-in-use protection of confidential VMs built on 4th Gen AMD EPYC™ processors with the performance of NVIDIA H100 Tensor Core GPUs. By enabling confidential computing on GPUs, Azure offers customers more options and flexibility to run their workloads securely and efficiently in the cloud. These VMs are ideal for inferencing, fine-tuning, or training small-to-medium sized models such as Whisper, Stable Diffusion and its variants (SDXL, SSD), and language models such as Zephyr, Falcon, GPT-2, MPT, Llama 2, Wizard, and Xwin.
Azure NCC H100 v5 virtual machines are currently available in the East US 2 and West Europe regions.
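For teams ready to try the GA VMs, provisioning follows the standard confidential VM flow in the Azure CLI. The sketch below is illustrative, not a definitive recipe: the resource group, VM name, and image URN are placeholder assumptions, while `Standard_NCC40ads_H100_v5` is the NCC H100 v5 size and the security flags enable the confidential VM profile.

```shell
# Sketch: create an NCC H100 v5 confidential VM (names and image are placeholders).
# --security-type ConfidentialVM enables the hardware-based TEE on the AMD EPYC host;
# DiskWithVMGuestState encrypts the OS disk together with the VM guest state.
az vm create \
  --resource-group my-ccgpu-rg \
  --name my-ncc-h100-vm \
  --size Standard_NCC40ads_H100_v5 \
  --image Canonical:0001-com-ubuntu-confidential-vm-jammy:22_04-lts-cvm:latest \
  --security-type ConfidentialVM \
  --os-disk-security-encryption-type DiskWithVMGuestState \
  --enable-secure-boot true \
  --enable-vtpm true \
  --admin-username azureuser \
  --generate-ssh-keys
```

Verify region availability and quota for the NCC H100 v5 size in your subscription before deploying.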
Figure 1. Simplified NCC H100 v5 architecture
Hardware partner endorsements
We are grateful to our hardware partners for their support and endorsements.
“The expanding landscape of innovations, particularly generative AI, is creating boundless opportunities for enterprises and developers. NVIDIA’s accelerated computing platform equips pioneers like Azure to boost performance for AI workloads while maintaining robust security through confidential computing.” Daniel Rohrer, VP of software product security, architecture and research, NVIDIA.
"AMD is a pioneer in confidential computing, with a long-standing collaboration with Azure to enable numerous confidential computing services powered by our leading AMD EPYC processors. We are now expanding our confidential computing capabilities into AI workloads with the new Azure confidential VMs with NVIDIA H100 Tensor Core GPUs and 4th Gen AMD EPYC CPUs, the industry's first offering of a confidential AI service. We are excited to expand our confidential computing offerings with Azure to address demands of AI workloads." Ram Peddibhotla, corporate vice president, product management, cloud business, AMD.
Customer use cases and feedback
During the preview, customers experimented with, and are now planning to scale, workloads on Azure NCC H100 v5 GPU virtual machines, including:
- Confidential inference on audio-to-text (Whisper) models
- Video anomaly detection for incident prevention, using confidential computing to meet data privacy requirements
- Stable Diffusion inference and training on privacy-sensitive design data in the automotive industry
- Multi-party clean rooms that run analytical tasks against billions of transactions and terabytes of data from a financial institution and its subsidiaries
“Advancing AI securely is core to our mission, and we were pleased to collaborate with Azure confidential computing to validate and test confidential inference for our audio-to-text Whisper models on NVIDIA GPUs.” Matthew Knight, Head of Security, OpenAI

“F5 can leverage Microsoft Azure confidential VMs with NVIDIA H100 Tensor Core GPUs to develop and deploy GenAI models. While the AI model learns from private data, the underlying information remains encrypted within the trusted execution environments (TEEs). This solution allows us to build advanced AI-powered security solutions while ensuring confidentiality of the data our models are analyzing. This bolsters customer trust and strengthens our position as a leader in secure network protection. Azure confidential computing helps us build a better, more secure, and more innovative digital world.” Arul Elumalai, SVP & GM, Distributed Cloud Platform & Security Services, F5, Inc.

“ServiceNow works closely with Microsoft, NVIDIA, and Opaque to put AI to work for people and deliver great experiences to both customers and employees on the Now Platform. The partnership between Opaque and Microsoft allows us to quickly deploy and leverage the power of Azure confidential VMs with NVIDIA H100 Tensor Core GPUs to deliver confidential AI with verifiable data privacy and security.” Kellie Romack, Chief Digital Information Officer, ServiceNow

“The integration of the Opaque platform with Azure confidential VMs with NVIDIA H100 Tensor Core GPUs to create confidential AI makes AI adoption faster and easier by helping to eliminate data sovereignty and privacy concerns. Confidential AI is the future of AI deployments, and with Opaque, Microsoft Azure, and NVIDIA, we’re making this future a reality today.” Aaron Fulkerson, CEO, Opaque Systems

“Leveraging the power of the preview of the Azure confidential VMs with NVIDIA H100 Tensor Core GPUs, our team has successfully integrated Constellation, a Kubernetes distribution focused on confidential computing, with GPU capabilities. This allows customers to lift and shift even sophisticated AI stacks to Azure confidential computing. With Continuum AI, we’ve created a framework for the end-to-end confidential serving of LLMs that ensures the utmost privacy of data, setting a new standard in AI inference solutions. We are thrilled to partner with Azure confidential computing to uncover the transformative potential of confidential computing, especially in the era of generative AI.” Felix Schuster, CEO and co-founder, Edgeless Systems

“Cyborg is excited to collaborate with Azure in previewing Azure confidential VMs with NVIDIA H100 Tensor Core GPUs. This partnership allows us to leverage GPU acceleration for our Confidential Vector Search algorithm, maintaining the highest degree of security while readying it for the stringent performance requirements of AI applications. We eagerly await the general availability of this VM SKU as we prepare to deploy our production-grade service.” Nicolas Dupont, CEO, Cyborg
“RBC has been working very closely with Microsoft on confidential computing initiatives since the early days of technology availability within Azure,” said Justin Simonelis, Director, Service Engineering and Confidential Computing, RBC. “We’ve leveraged the benefits of confidential computing and integrated it into our own data clean room platform known as Arxis. As we continue to develop our platform capabilities, we fully recognize the importance of privacy-preserving machine learning inference and training to protect sensitive customer data within GPUs, and look forward to leveraging Azure confidential VMs with NVIDIA H100 Tensor Core GPUs.”
Performance insights
Azure confidential VMs with NVIDIA H100 Tensor Core GPUs offer best-in-class performance for inferencing small-to-medium sized models while protecting code and data throughout their lifecycle. We have benchmarked these VMs across a variety of models using vLLM.
The table below shows the test configuration:
| Configuration item | Value |
| --- | --- |
| VM configuration | 40 vCPUs, 1 GPU, 320 GB memory |
| Operating system | Ubuntu 22.04.4 LTS (6.5.0-1023-azure) |
| GPU driver version | 550.90.07 |
| GPU vBIOS version | 96.00.88.00.11 |
The figure above shows the overheads of confidential computing, with and without CUDA Graphs enabled. For most models, the overheads are negligible. For smaller models, the overheads are higher due to the added latency of encrypting PCIe traffic on kernel invocations and data transfers. Increasing the batch size or input token length is a viable strategy to mitigate confidential computing overhead.
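The batching mitigation can be reasoned about with a simple cost model: if encrypted PCIe traffic adds a roughly fixed cost per invocation, that cost is shared by every sequence in the batch, so the per-sequence overhead shrinks as the batch grows. The sketch below illustrates the idea with made-up numbers; it is not derived from the benchmark data above.

```python
def per_sequence_latency_ms(fixed_overhead_ms: float, batch_size: int,
                            compute_ms: float) -> float:
    """Illustrative model: a fixed per-invocation confidential-computing
    cost (e.g. PCIe encryption) amortized across a batch of sequences."""
    return compute_ms + fixed_overhead_ms / batch_size

# Placeholder numbers: 2 ms fixed overhead, 10 ms of compute per sequence.
small_batch = per_sequence_latency_ms(2.0, batch_size=1, compute_ms=10.0)
large_batch = per_sequence_latency_ms(2.0, batch_size=32, compute_ms=10.0)

# Larger batches dilute the fixed overhead per sequence.
assert small_batch > large_batch
print(small_batch, large_batch)  # 12.0 10.0625
```

The same amortization argument applies to longer input token lengths: more useful work per encrypted transfer means a smaller relative overhead.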
Learn more
- Azure confidential GPU Options
- Azure AI Confidential Inferencing Preview
- Azure AI Confidential Inferencing: Technical Deep-Dive