
Mount ADLS Gen2 or Blob Storage in Azure Databricks


Scenario:
Azure Databricks offers the same core features as the Databricks platform, such as a web-based workspace for managing Spark clusters, notebooks, and data pipelines, along with Spark-based analytics and machine learning tools. It is fully integrated with Azure cloud services, providing native access to Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, and other Azure services. This blog shows an example of mounting Azure Blob Storage or Azure Data Lake Storage in the Databricks File System (DBFS), with two authentication methods for the mount: Access Key and SAS token.

 

Objective:

To become acquainted with mounting storage in Databricks using the ABFS/WASB drivers and the various authentication methods.

Pre-requisites:

For this example, you will need:

  1. An Azure Databricks Service.
  2. A Databricks Cluster (compute).
  3. A Databricks Notebook.
  4. An Azure Data Lake Storage or Blob Storage account.

 

Steps to mount storage container on Databricks File System (DBFS):

  1. Create storage container and blobs.
  2. Mount with dbutils.fs.mount().
  3. Verify mount point with dbutils.fs.mounts().
  4. List the contents with dbutils.fs.ls().
  5. Unmount with dbutils.fs.unmount().

 

[STEP 1]: Create storage container and blobs

Below is the storage structure used in this example. I have created a container “aaa” containing a virtual folder “bbb”, which holds five PNG files. The storage account “charlesdatabricksadlsno” is a Blob Storage account without hierarchical namespace enabled.

 

[Screenshot: storage account “charlesdatabricksadlsno” showing container “aaa” and folder “bbb” with the PNG files]

 

 

[STEP 2]: Mount with dbutils.fs.mount()

We can use the code snippet below to mount container "aaa" in Azure Databricks.

 

 

 

storageAccountName = "charlesdatabricksadlsno"
storageAccountAccessKey = "<access-key>"
sasToken = "<sas-token>"
blobContainerName = "aaa"
mountPoint = "/mnt/data/"

if not any(mount.mountPoint == mountPoint for mount in dbutils.fs.mounts()):
    try:
        dbutils.fs.mount(
            source = "wasbs://{}@{}.blob.core.windows.net".format(blobContainerName, storageAccountName),
            mount_point = mountPoint,
            # Access Key authentication:
            # extra_configs = {'fs.azure.account.key.' + storageAccountName + '.blob.core.windows.net': storageAccountAccessKey}
            # SAS token authentication:
            extra_configs = {'fs.azure.sas.' + blobContainerName + '.' + storageAccountName + '.blob.core.windows.net': sasToken}
        )
        print("mount succeeded!")
    except Exception as e:
        print("mount exception", e)

 

 

 

Some key points to note:

  1. I provide two authentication methods for the mount: Access Key and SAS token. You may use either, by choosing the first or the second line that starts with "extra_configs" in the snippet above. Instructions for getting the Access Key and SAS token are in the next section.
  2. This mount example does not re-mount an existing mount point. To re-mount, you have to unmount (covered in Step 5) and then mount again.

To get the Access Key, go to your storage account in the Azure portal, open Access keys, and copy either key1 or key2.

 

[Screenshot: Access keys blade of the storage account in the Azure portal]
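Rather than pasting the key directly into a notebook, you can store it in a Databricks secret scope and read it at runtime with dbutils.secrets.get(). A minimal sketch, assuming a secret scope named "storage-secrets" holding a secret named "adls-access-key" (both hypothetical names you would create yourself):

# Hypothetical scope and key names; create them beforehand with the
# Databricks CLI or the Secrets API.
storageAccountAccessKey = dbutils.secrets.get(scope="storage-secrets", key="adls-access-key")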

To get a SAS token, you can generate one in two ways:

  • Generate an account-level SAS with all Allowed resource types enabled.

[Screenshot: generating an account-level SAS in the Azure portal]

 

  • Generate a container-level SAS with read and list permissions. For this example, I generate a SAS for container “aaa”, which I later mount on the Databricks cluster.

[Screenshot: generating a container-level SAS for container “aaa”]
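Alternatively, a container-level SAS can be generated programmatically with the azure-storage-blob Python SDK. Below is a minimal sketch, assuming the account name and Access Key from the mount snippet; the one-hour expiry is an arbitrary value for illustration:

from datetime import datetime, timedelta
from azure.storage.blob import generate_container_sas, ContainerSasPermissions

# Build a SAS for container "aaa" limited to read and list,
# valid for one hour from now (illustrative expiry).
sasToken = generate_container_sas(
    account_name="charlesdatabricksadlsno",
    container_name="aaa",
    account_key=storageAccountAccessKey,
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.utcnow() + timedelta(hours=1),
)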

 

[STEP 3]: Verify mount point (/mnt/data) with dbutils.fs.mounts()

 

 

 

dbutils.fs.mounts()

 

 

[Screenshot: dbutils.fs.mounts() output listing /mnt/data]
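If you prefer a programmatic check over scanning the full output, you can filter the list for the mount point created in Step 2, for example:

# Returns a non-empty list only if the mount point from Step 2 exists
mountPoint = "/mnt/data/"
[m for m in dbutils.fs.mounts() if m.mountPoint == mountPoint]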

 

[STEP 4]: List the contents with dbutils.fs.ls()

 

 

 

dbutils.fs.ls("/mnt/data/bbb")

 

 

[Screenshot: dbutils.fs.ls() output showing the PNG files under /mnt/data/bbb]
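Once mounted, the files are also accessible through regular Spark APIs. As a quick sketch, the PNG blobs can be loaded with the Databricks binaryFile data source (the column selection is illustrative):

# Each row holds the file path, modification time, length, and raw bytes
df = spark.read.format("binaryFile").load("/mnt/data/bbb")
df.select("path", "length").show(truncate=False)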

 

[STEP 5]: Unmount with dbutils.fs.unmount()

 

 

 

dbutils.fs.unmount('/mnt/data')

 

 

[Screenshot: dbutils.fs.unmount() output confirming /mnt/data was unmounted]

 

Others:

  • To use ADLS Gen2 storage as the mount source, just replace the storage account name, Access Key, and SAS token in the mount step. You may reuse the Blob endpoint (blob.core.windows.net).
  • If you want to take advantage of the hierarchical namespace features of ADLS Gen2, such as ACLs on files and folders, you can switch from WASBS (Windows Azure Storage Blob) with the Blob endpoint to ABFS (Azure Blob File System) with the DFS endpoint (dfs.core.windows.net). The mount source would then change from
wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/
to
abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/
(a minimal ABFS sketch follows the snippet below).
  • To prevent mount point authentication errors when an Access Key or SAS token is rotated, you can modify the mount condition so that an existing mount point is first unmounted before mounting again:

 

 

 

storageAccountName = "charlesdatabricksadlsno"
storageAccountAccessKey = "<access-key>"
sasToken = "<sas-token>"
blobContainerName = "aaa"
mountPoint = "/mnt/data/"

if any(mount.mountPoint == mountPoint for mount in dbutils.fs.mounts()):
    dbutils.fs.unmount(mountPoint)

try:
    dbutils.fs.mount(
        source = "wasbs://{}@{}.blob.core.windows.net".format(blobContainerName, storageAccountName),
        mount_point = mountPoint,
        # Access Key authentication:
        # extra_configs = {'fs.azure.account.key.' + storageAccountName + '.blob.core.windows.net': storageAccountAccessKey}
        # SAS token authentication:
        extra_configs = {'fs.azure.sas.' + blobContainerName + '.' + storageAccountName + '.blob.core.windows.net': sasToken}
    )
    print("mount succeeded!")
except Exception as e:
    print("mount exception", e)
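For completeness, here is a minimal sketch of the ABFS variant mentioned above, assuming an ADLS Gen2 account with hierarchical namespace enabled and Access Key authentication (the placeholder account name is hypothetical):

storageAccountName = "<adls-gen2-account-name>"
storageAccountAccessKey = "<access-key>"
containerName = "aaa"
mountPoint = "/mnt/data/"

if not any(mount.mountPoint == mountPoint for mount in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source = "abfss://{}@{}.dfs.core.windows.net/".format(containerName, storageAccountName),
        mount_point = mountPoint,
        # Access Key authentication against the DFS endpoint
        extra_configs = {'fs.azure.account.key.' + storageAccountName + '.dfs.core.windows.net': storageAccountAccessKey}
    )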

 

 

 

 
