Azure Compute Blog articles

https://techcommunity.microsoft.com/t5/azure-compute-blog/bg-p/AzureCompute

Join Microsoft and NVIDIA experts at this Microsoft GTC Session [WP41730] Watch Party

Join Microsoft and NVIDIA experts at this GTC Session [WP41730] Watch Party: Operationalize Large-Model Training on Azure Machine Learning using NVIDIA’s Multi-Node A100 GPUs. A GTC Session Watch Party is a replay of an original GTC talk. This is an interactive session, and we encourage you to join the discussion with your comments and questions.

This session takes place on Thursday, Sep 22, 2:00 PM - 3:30 PM CEST and is hosted by:

  • Gabrielle Davelaar, AI Technical Specialist, Microsoft
  • Maxim Salnikov, Senior Azure GTM Manager, Microsoft
  • Henk Boelman, Senior Cloud Advocate–AI & Machine Learning, Microsoft
  • Alexander Young, Technical Marketing Engineer, NVIDIA
  • Ulrich Knechtel, Microsoft Partner Manager (EMEA), NVIDIA

Deep learning models have grown in size by several orders of magnitude in recent years, so customers increasingly need large-scale infrastructure with many GPUs and large memory to train and fine-tune them. Azure Machine Learning offers a breakthrough software stack running on the latest multi-node NVIDIA GPUs, along with ready-to-use environments with stable PyTorch for Enterprise, including optimization libraries such as DeepSpeed and ONNX Runtime, so that data scientists can easily train large models.

We'll showcase experiments that use 1,024 A100s to scale the training of a 2T-parameter model, with a streamlined user experience at 1K+ GPU scale. We'll describe the software innovations delivered to customers through Azure Machine Learning (including a fully optimized PyTorch environment), which offer great performance and an easy-to-use interface for large-scale training. Use simple training pipelines on Azure Machine Learning (AzureML) to train large models on Azure using NVIDIA A100 Tensor Core GPUs.
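
To give a rough sense of why a model of this size needs so many GPUs, here is a back-of-the-envelope sketch in Python. It is not taken from the session. It assumes mixed-precision training with an Adam-style optimizer at about 16 bytes of model state per parameter (fp16 weights, fp16 gradients, and fp32 optimizer state), and it assumes those states are sharded evenly across all GPUs, as DeepSpeed's ZeRO partitioning can do. The function names are hypothetical, and activation memory is ignored.

```python
# Rough memory estimate for model states only (no activations, no buffers).
# Assumption: mixed-precision Adam-style training, bytes per parameter:
#   2 (fp16 weights) + 2 (fp16 gradients) + 12 (fp32 master weights,
#   momentum, and variance) = 16 bytes.
BYTES_PER_PARAM = 2 + 2 + 12

NUM_PARAMS = 2 * 10**12  # the 2T-parameter model mentioned above
NUM_GPUS = 1024          # the 1,024 A100s mentioned above


def model_state_bytes(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> int:
    """Total bytes for weights, gradients, and optimizer state (hypothetical helper)."""
    return num_params * bytes_per_param


def per_gpu_gb(num_params: int, num_gpus: int) -> float:
    """Per-GPU share in GB, assuming model states are sharded evenly across GPUs."""
    return model_state_bytes(num_params) / num_gpus / 1e9


if __name__ == "__main__":
    total_tb = model_state_bytes(NUM_PARAMS) / 1e12
    print(f"Total model state: {total_tb:.1f} TB")
    print(f"Per-GPU share across {NUM_GPUS} GPUs: {per_gpu_gb(NUM_PARAMS, NUM_GPUS):.2f} GB")
```

Under these assumptions the model states alone come to about 32 TB, or roughly 31 GB per GPU when spread over 1,024 devices. That is far more than a single A100 (available with 40 GB or 80 GB of memory) can hold, which is why training at this scale has to be distributed across many nodes.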
