Enabling Job accounting for SLURM with Azure Cyclecloud 8.2 and Azure MariaDB database

Overview

SLURM, the Simple Linux Utility for Resource Management, is an open-source cluster resource manager and job scheduler. The package also contains SlurmDBD (the Slurm Database Daemon), which can be used to securely manage the accounting data for several Slurm clusters in a central location. Slurm can be configured to collect accounting information for every job and user, and accounting records can be written to a simple text file or to a database. Information is available about both currently running jobs and jobs that have already terminated. The sacct command reports resource usage for running or terminated jobs, including individual tasks, which can be useful for detecting load imbalance between tasks.

From Azure CycleCloud 8.1.0 onwards, the Slurm template supports enabling SlurmDBD on Slurm 20.11+. This blog post explains how to enable Slurm job accounting with Azure CycleCloud and an Azure MariaDB instance.

Architecture:

vinilv_0-1653300753566.png

Prerequisites:

This blog post assumes that you have access to Azure CycleCloud 8.2 and a managed Azure MariaDB instance for setting up the Slurm cluster and the SlurmDBD configuration.

If you don't, please refer to the Azure CycleCloud documentation and the Azure Database for MariaDB documentation.

 

Solution:

Here are the steps to integrate Slurm job accounting in Azure CycleCloud.

  1. First, you need a managed MariaDB database instance, because CycleCloud expects a database URL for job accounting. Slurm writes the job accounting information to MariaDB.

Please note that SlurmDBD cannot use a password containing the "#" character to access MariaDB, so make sure the MariaDB admin password does not contain "#".

vinilv_1-1653300753568.png

  2. Once the MariaDB instance is up, you have all the information required to enable the job accounting feature in the CycleCloud portal.

vinilv_2-1653300753573.png

 

  3. Add a VNET rule for the scheduler's virtual network in the MariaDB connection security settings so that the scheduler node can reach the database (the slurmdbd daemon runs on the scheduler node and uses the managed Azure MariaDB instance as its database).

vinilv_3-1653300753578.png

 

  4. After setting up MariaDB, add the database information in the Advanced Settings section of the CycleCloud Slurm cluster. Select “Job Accounting”, enter the database information, then save and start the cluster.

vinilv_4-1653300753581.png

  5. Once the cluster is up, run a sample job and check sacct to verify that job accounting works.

vinilv_5-1653300753587.png

You can pass many parameters to sacct to get the accounting information you need from SlurmDBD.

 

Example: finding the start and end times of jobs in a given time period.

vinilv_1-1653400068019.png
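For scripted use, a query like this can be assembled programmatically. The following is a minimal Python sketch, not the exact command from the screenshot: the helper name build_sacct_command and the chosen output fields are assumptions made for illustration, while the flags themselves are standard sacct options.

```python
from typing import List


def build_sacct_command(start: str, end: str) -> List[str]:
    """Return an argv list for sacct limited to jobs in [start, end].

    Hypothetical helper for illustration; adjust the field list to your needs.
    """
    return [
        "sacct",
        "--allusers",                 # report jobs of every user, not only the caller
        "--allocations",              # one line per job, without individual steps
        "--parsable2",                # '|'-delimited output, easy to parse
        "--noheader",                 # omit the header line
        "--starttime", start,         # e.g. "2022-05-01"
        "--endtime", end,             # e.g. "2022-05-31"
        "--format", "JobID,User,Start,End,Elapsed,NNodes,State",
    ]


# Show the command that would be executed on the scheduler node.
print(" ".join(build_sacct_command("2022-05-01", "2022-05-31")))
```

On the scheduler node, the returned list can be passed to subprocess.run(..., capture_output=True, text=True) to collect the output for further processing.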

You can also retrieve job statistics for a specific user or a specific cluster. See the sacct documentation for more examples.
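As a rough illustration of per-user statistics, the Python sketch below parses '|'-delimited sacct output (as produced with --parsable2 --noheader) and counts jobs per user. It assumes the output was requested with --format JobID,User,Cluster,Elapsed,State; the function names and the sample rows are invented for the example and are not real cluster data.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed field order; it must match the --format list passed to sacct.
FIELDS = ["JobID", "User", "Cluster", "Elapsed", "State"]


def parse_sacct_output(text: str) -> List[Dict[str, str]]:
    """Turn '|'-delimited sacct lines into a list of dicts keyed by field name."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        values = line.split("|")
        if len(values) != len(FIELDS):
            continue  # skip lines that do not match the expected layout
        records.append(dict(zip(FIELDS, values)))
    return records


def jobs_per_user(records: List[Dict[str, str]]) -> Dict[str, int]:
    """Count how many jobs each user has in the parsed records."""
    counts: Dict[str, int] = defaultdict(int)
    for record in records:
        counts[record["User"]] += 1
    return dict(counts)


# Made-up sample output, for illustration only.
SAMPLE = """\
2|alice|slurmcluster|00:10:00|COMPLETED
3|bob|slurmcluster|01:00:00|COMPLETED
4|alice|slurmcluster|00:05:30|FAILED
"""

print(jobs_per_user(parse_sacct_output(SAMPLE)))
```

The same records could be filtered on the Cluster field when SlurmDBD holds accounting data for more than one cluster.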

 

Granular Cost Control 

 

An important requirement in many organizations is the ability to calculate consumption cost at a granular level, e.g. per job or per user, because the infrastructure is usually shared between different users. This makes it possible to estimate internal spend for different teams and departments and to forecast expenses.

To provide that functionality, you can combine the accounting information described above, which gives the job duration and SKU type for a specific user, with the Azure Pricing API, which gives the cost of a specific SKU in the region where the cluster is located. This lets you build a custom parser that calculates the cost of cluster usage at a more granular level, as indicated below.

 

vinilv_0-1653399974580.png

An example query to get the hourly price of an HBv2 Spot VM in West Europe:

 

https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Virtual Machines' and meterName eq 'HB120-64rs v2 Low Priority' and armRegionName eq 'westeurope' and productName eq 'Virtual Machines HBSv2 Series'

 

The query returns the following response:

 

{
  "currencyCode": "USD",
  "tierMinimumUnits": 0.0,
  "retailPrice": 0.936,
  "unitPrice": 0.936,
  "armRegionName": "westeurope",
  "location": "EU West",
  "effectiveStartDate": "2022-03-01T00:00:00Z",
  "meterId": "06ccc665-c195-4b78-a4c2-a5d16b9c4768",
  "meterName": "HB120-64rs v2 Low Priority",
  "productId": "DZH318Z0CGR8",
  "skuId": "DZH318Z0CGR8/00FV",
  "productName": "Virtual Machines HBSv2 Series",
  "skuName": "Standard_HB120-64rs_v2 Low Priority",
  "serviceName": "Virtual Machines",
  "serviceId": "DZH313Z7MMC8",
  "serviceFamily": "Compute",
  "unitOfMeasure": "1 Hour",
  "type": "Consumption",
  "isPrimaryMeterRegion": false,
  "armSkuName": "Standard_HB120-64rs_v2"
}
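The query and the price lookup can be scripted along these lines. This is a minimal Python sketch under stated assumptions: the helper names build_price_url and hourly_price are hypothetical, and the code assumes the API wraps price records in an "Items" list, as described in the Azure Retail Prices documentation. The sample payload below reuses a few fields from the response shown above so that the example runs offline.

```python
from typing import Any, Dict
from urllib.parse import quote

BASE_URL = "https://prices.azure.com/api/retail/prices"


def build_price_url(meter_name: str, region: str, product_name: str) -> str:
    """Build the filter query from the example above with URL-encoded values."""
    odata_filter = (
        "serviceName eq 'Virtual Machines'"
        f" and meterName eq '{meter_name}'"
        f" and armRegionName eq '{region}'"
        f" and productName eq '{product_name}'"
    )
    return f"{BASE_URL}?$filter={quote(odata_filter)}"


def hourly_price(payload: Dict[str, Any], meter_name: str) -> float:
    """Return the retail price of the first hourly item matching meter_name.

    Assumes price records are listed under the "Items" key of the payload.
    """
    for item in payload.get("Items", []):
        if item.get("meterName") == meter_name and item.get("unitOfMeasure") == "1 Hour":
            return float(item["retailPrice"])
    raise LookupError(f"no hourly price found for {meter_name!r}")


# Offline sample built from the response shown above (subset of fields).
SAMPLE_PAYLOAD = {
    "Items": [
        {
            "currencyCode": "USD",
            "retailPrice": 0.936,
            "armRegionName": "westeurope",
            "meterName": "HB120-64rs v2 Low Priority",
            "unitOfMeasure": "1 Hour",
            "armSkuName": "Standard_HB120-64rs_v2",
        }
    ]
}

url = build_price_url(
    "HB120-64rs v2 Low Priority", "westeurope", "Virtual Machines HBSv2 Series"
)
print(url)
print(hourly_price(SAMPLE_PAYLOAD, "HB120-64rs v2 Low Priority"))
```

To query the live API, the URL can be fetched with urllib.request.urlopen and the body decoded with json.load before it is passed to hourly_price.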

 

 

Ideas for a parser to calculate the per-job cost:

The job-related information comes from sacct and the price information from the Azure Pricing API.

Slurm accounting - job elapsed time (hours) and number of nodes
Azure Pricing API - price of each instance per hour

Per-job cost = job elapsed time (hours) x number of nodes x price of each instance per hour (normalize to minutes if needed)

You can create a parser based on these ideas and the information collected from Slurm job accounting and the Azure Pricing API.
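The per-job cost formula above could be sketched in Python as follows. The function names are hypothetical, and the code assumes the elapsed time is given as sacct prints it in the Elapsed field, i.e. [DD-]HH:MM:SS (a bare MM:SS value is also accepted). The price of 0.936 per hour is taken from the example response in this post.

```python
def elapsed_to_hours(elapsed: str) -> float:
    """Convert an sacct Elapsed value such as '1-02:30:00' or '01:30:00' to hours."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in elapsed.split(":")]
    while len(parts) < 3:          # pad a bare MM:SS value with zero hours
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return days * 24 + hours + minutes / 60 + seconds / 3600


def job_cost(elapsed: str, nodes: int, price_per_hour: float) -> float:
    """Per-job cost = elapsed time (hours) x number of nodes x hourly instance price."""
    return elapsed_to_hours(elapsed) * nodes * price_per_hour


# A job that ran for 1.5 hours on 2 nodes at 0.936 USD per node-hour.
print(round(job_cost("01:30:00", 2, 0.936), 3))
```

Summing job_cost over all jobs of a user or a cluster gives the per-user or per-cluster totals mentioned above.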

 

Conclusion:

You have now enabled job accounting for Slurm with Azure CycleCloud 8.2 and Azure MariaDB, and learned a couple of sacct commands for reviewing usage patterns and cluster utilization. In combination with the Azure Pricing API, you can build a customized parser to calculate the cost per job, per user, or per cluster and use it for more granular cost control within your organisation.

 

Reference Links:

https://slurm.schedmd.com/sacct.html

https://docs.microsoft.com/en-us/azure/cyclecloud/?view=cyclecloud-8

https://docs.microsoft.com/en-us/rest/api/cost-management/retail-prices/azure-retail-prices

https://docs.microsoft.com/en-us/azure/mariadb/

 

Technical contribution: Vinil Vadakkepurakkal, Łukasz Mirosław (Microsoft)
