Enabling job accounting for SLURM with Azure CycleCloud 8.2 and Azure Database for MariaDB
Overview
SLURM, the Simple Linux Utility for Resource Management, is an open-source cluster resource manager and job scheduler. The package also contains SlurmDBD (Slurm Database Daemon), which can be used to securely manage the accounting data for several Slurm clusters in a central location. Slurm can be configured to collect accounting information for every job and user. Accounting records can be written to a simple text file or to a database. Information is available about both currently executing jobs and jobs that have already terminated. The sacct command can report resource usage for running or terminated jobs, including individual tasks, which can be useful for detecting load imbalance between tasks.
From Azure CycleCloud 8.1.0 onwards, the Slurm template supports enabling SlurmDBD on Slurm 20.11+. This blog post explains how to enable Slurm job accounting with Azure CycleCloud and an Azure MariaDB instance.
Architecture:
Prerequisites:
This blog post assumes that you have access to Azure CycleCloud 8.2 and an Azure managed MariaDB instance for setting up the Slurm cluster and the SlurmDBD configuration.
If you don't, please refer to the Azure CycleCloud documentation and the Azure Database for MariaDB documentation.
Solution:
Here are the steps to integrate Slurm job accounting in Azure CycleCloud.
- First, you need a managed MariaDB database instance for job accounting, because CycleCloud expects a DB URL for job accounting. Slurm uses MariaDB to store the job accounting information.
Note that SlurmDBD cannot use a password containing the "#" character to access MariaDB, so make sure the MariaDB admin password does not contain "#".
- Once the MariaDB instance is up, you have all the information required to enable the job accounting feature in the CycleCloud portal.
- Update the VNET rules in the MariaDB connection security settings to include the scheduler's virtual network, so that the scheduler node can reach MariaDB (the slurmdbd daemon runs on the scheduler node and uses the managed Azure MariaDB as its database).
- After setting up MariaDB, add the DB information in the Advanced Settings section of the CycleCloud Slurm cluster: select "Job Accounting", enter the DB information, then save and start the cluster.
- Once the cluster is up, run a sample job and check sacct to confirm that job accounting works.
You can pass many parameters to sacct to get the required accounting information from SlurmDBD.
Example: finding the start and end times of jobs in a given time period.
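A minimal sketch of such a query, wrapped in Python so the result can be processed further. It assumes it runs on a node where sacct is available (for example the scheduler node); the helper names `build_sacct_command`, `parse_sacct_output` and `query_jobs`, the chosen field list and the dates in the usage note are illustrative assumptions, not part of CycleCloud or Slurm.

```python
import subprocess

# Standard sacct format fields requested for each job (an illustrative selection).
FIELDS = ["JobID", "JobName", "User", "Start", "End", "Elapsed", "NNodes", "State"]


def build_sacct_command(start, end, user=None):
    """Build a sacct command line for the given time window."""
    cmd = [
        "sacct",
        "--allocations",          # one line per job, without individual steps
        "--parsable2",            # '|'-delimited output, no trailing delimiter
        "--noheader",
        f"--starttime={start}",
        f"--endtime={end}",
        "--format=" + ",".join(FIELDS),
    ]
    # Restrict to one user if requested, otherwise report all users.
    cmd.append(f"--user={user}" if user is not None else "--allusers")
    return cmd


def parse_sacct_output(text):
    """Turn '|'-delimited sacct output into a list of dicts keyed by field name."""
    records = []
    for line in text.splitlines():
        if line.strip():
            records.append(dict(zip(FIELDS, line.split("|"))))
    return records


def query_jobs(start, end, user=None):
    """Run sacct and return the parsed job records (requires a working Slurm setup)."""
    result = subprocess.run(
        build_sacct_command(start, end, user),
        capture_output=True, text=True, check=True,
    )
    return parse_sacct_output(result.stdout)
```

For instance, `query_jobs("2021-10-01", "2021-10-31")` would list the jobs in that (hypothetical) window, each with its Start and End fields. Job names containing "|" would confuse this simple parser.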
You can also retrieve job statistics for a specific user or a specific cluster. See the sacct documentation for more examples.
Granular Cost Control
A very important aspect for every organization is the ability to calculate consumption cost in a more granular way, e.g. per job or per user, because the infrastructure is usually shared between different users. This makes it possible to estimate the internal spend of different teams and departments and to forecast expenses.
To provide that functionality, in addition to the accounting information described above, which provides the job duration and SKU type for a specific user, you could use the Azure Pricing API to obtain the cost of a specific SKU in the region where the cluster is located. This can help you build a custom parser that calculates the cost of cluster usage in a more granular way.
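An example query asks for the hourly price of an HBv2 Spot VM in West Europe.
The response is a JSON document whose Items array holds the matching price records, each with fields such as retailPrice and unitOfMeasure.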
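A minimal sketch of how such a query might be built and sent from Python against the public Azure Retail Prices endpoint. The SKU name `Standard_HB120rs_v2`, the region name `westeurope` and the helper names are assumptions made for illustration, and picking out Spot and non-Windows records by looking for "Spot" in `skuName` and "Windows" in `productName` is a heuristic, not a documented contract.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://prices.azure.com/api/retail/prices"


def build_price_query(region, sku):
    """Build a Retail Prices API URL filtered to one VM SKU in one region."""
    odata_filter = " and ".join([
        "serviceName eq 'Virtual Machines'",
        f"armRegionName eq '{region}'",
        f"armSkuName eq '{sku}'",
        "priceType eq 'Consumption'",
    ])
    return API_URL + "?$filter=" + urllib.parse.quote(odata_filter)


def select_spot_linux(items):
    """Keep Spot records that are not Windows-licensed (heuristic, see lead-in)."""
    return [
        item for item in items
        if "Spot" in item.get("skuName", "")
        and "Windows" not in item.get("productName", "")
    ]


def fetch_prices(url):
    """Fetch one page of results and return its Items list (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["Items"]
```

Usage would look like `select_spot_linux(fetch_prices(build_price_query("westeurope", "Standard_HB120rs_v2")))`. The sketch reads only the first page of results and does not follow the `NextPageLink` field.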
Ideas for a parser to calculate the per-job cost:
The job-related information comes from sacct and the price information from the Azure Pricing API.
- Slurm accounting: job elapsed time (hours) and number of nodes
- Azure Pricing API: price of each instance per hour
Per-job cost = job elapsed time (hours) x number of nodes x price of each instance per hour (normalize to minutes if needed)
You could build a parser along these lines from the information collected from Slurm job accounting and the Azure Pricing API.
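As one possible starting point, the formula above can be sketched in Python as follows. The function names are hypothetical, the job records are assumed to be dicts carrying the sacct fields `User`, `Elapsed` and `NNodes`, and a single hourly price is assumed for all nodes of the cluster.

```python
def elapsed_to_hours(elapsed):
    """Convert a sacct Elapsed value ([DD-[HH:]]MM:SS) to hours."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = elapsed.split(":")
    # Pad on the left so that MM:SS is treated as 00:MM:SS.
    parts = ["0"] * (3 - len(parts)) + parts
    hours, minutes, seconds = int(parts[0]), int(parts[1]), float(parts[2])
    return days * 24 + hours + minutes / 60 + seconds / 3600


def job_cost(elapsed, nodes, price_per_hour):
    """Per-job cost = elapsed time (hours) x number of nodes x hourly instance price."""
    return elapsed_to_hours(elapsed) * nodes * price_per_hour


def cost_per_user(jobs, price_per_hour):
    """Sum the per-job cost for each user over a list of job records."""
    totals = {}
    for job in jobs:
        cost = job_cost(job["Elapsed"], int(job["NNodes"]), price_per_hour)
        totals[job["User"]] = totals.get(job["User"], 0.0) + cost
    return totals
```

With a hypothetical price of 0.5 per instance-hour, a two-hour job on four nodes gives `job_cost("02:00:00", 4, 0.5)`, i.e. 2 x 4 x 0.5 = 4.0. A cluster with several VM sizes would need a price lookup per partition or node type rather than a single price.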
Conclusion:
You have successfully enabled job accounting for SLURM with Azure CycleCloud 8.2 and Azure MariaDB, and learned a couple of sacct commands for reviewing usage patterns and cluster utilization. In combination with the Azure Pricing API, you could build a customized parser to calculate the cost per job, per user, or per cluster and use it for more granular cost control within your organisation.
Reference Links:
https://slurm.schedmd.com/sacct.html
https://docs.microsoft.com/en-us/azure/cyclecloud/?view=cyclecloud-8
https://docs.microsoft.com/en-us/rest/api/cost-management/retail-prices/azure-retail-prices
https://docs.microsoft.com/en-us/azure/mariadb/
Technical contribution: Vinil Vadakkepurakkal, Łukasz Mirosław (Microsoft)