Restoring Soft-Deleted Blobs with Multithreading in Azure Storage Using C#

Blob soft delete is an essential feature that safeguards your data against accidental deletions or overwrites. By retaining deleted data for a specified period, it ensures data integrity and availability, even in the event of human error. However, restoring soft-deleted data can be labor-intensive, because the undelete API must be called for each individual deleted blob; currently, there is no option to bulk-undelete all blobs.
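
To make this concrete, here is a minimal sketch of a naive, one-at-a-time restore using the Azure.Storage.Blobs package (the account and container names in the URI are placeholders):

using Azure.Identity;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

var containerClient = new BlobContainerClient(
    new Uri("https://<account>.blob.core.windows.net/<container>"), // placeholder URI
    new DefaultAzureCredential());

// List blobs in the deleted state and undelete them one at a time;
// there is no single bulk-undelete call.
await foreach (BlobItem blob in containerClient.GetBlobsAsync(states: BlobStates.Deleted))
{
    if (blob.Deleted)
    {
        await containerClient.GetBlobClient(blob.Name).UndeleteAsync();
    }
}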

 

In this blog, we provide sample C# code to help you restore soft-deleted data efficiently. The code issues many undelete calls concurrently to expedite the restoration process, which is particularly effective when you have a large number of blobs to restore; the core throttling pattern it relies on is sketched below. Additionally, the program can be configured to undelete blobs within a specific container or directory, rather than scanning the entire storage account.
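
At the heart of that approach is a SemaphoreSlim used as a throttle: each operation acquires a slot before it starts and releases it when it finishes, so at most a fixed number of calls are in flight at once. A simplified sketch of the pattern (RunThrottledAsync and the operations delegate are illustrative names, not part of the actual sample):

// Simplified throttling pattern: at most `concurrency` operations in flight.
static async Task RunThrottledAsync(IEnumerable<Func<Task>> operations, int concurrency)
{
    var throttler = new SemaphoreSlim(initialCount: concurrency);
    var tasks = new List<Task>();
    foreach (var operation in operations)
    {
        await throttler.WaitAsync();          // wait until a slot frees up
        tasks.Add(Task.Run(async () =>
        {
            try { await operation(); }
            finally { throttler.Release(); }  // return the slot, success or failure
        }));
    }
    await Task.WhenAll(tasks);                // drain the remaining work
}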

 

To run this program, follow these steps:

  • Install .NET SDK: Ensure you have the .NET SDK installed on your machine.
  • Connect to Azure Account:

 

Connect-AzAccount
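
The sample program authenticates with DefaultAzureCredential, which also picks up an Azure CLI session, so if you prefer the Azure CLI over Azure PowerShell you can sign in with:

az login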

 

  • Add NuGet Source:

 

dotnet nuget add source https://api.nuget.org/v3/index.json -n nuget.org

 

  • Create a New Console Application:

 

dotnet new console --force

 

  • Add the following code to Program.cs.

 

using Azure.Core;
using Azure.Identity;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;

var StorageAccountName = "xxxx";
var ContainerName = "xxxx";
var DirectoryPath = "";
var Concurrency = 500;
var BatchSize = 500;

static DataLakeServiceClient GetDatalakeClient(string accountName)
{
    DataLakeClientOptions clientOptions = new DataLakeClientOptions()
    {
        Retry =
        {
            Delay = TimeSpan.FromMilliseconds(500),
            MaxRetries = 5,
            Mode = RetryMode.Fixed,
            MaxDelay = TimeSpan.FromSeconds(5),
            NetworkTimeout = TimeSpan.FromSeconds(30)
        },
    };

    // Note: this endpoint format only works for the public Azure cloud.
    DataLakeServiceClient client = new(
        new Uri($"https://{accountName}.blob.core.windows.net"),
        new DefaultAzureCredential(),
        clientOptions);
    return client;
}

Console.WriteLine("Starting the program");

var client = GetDatalakeClient(StorageAccountName);
var throttler = new SemaphoreSlim(initialCount: Concurrency);
List<Task> tasks = new List<Task>();
List<string> containerNames = new List<string>();

// If no container is specified, enumerate every container in the account.
if (string.IsNullOrEmpty(ContainerName))
{
    var containers = client.GetFileSystems();
    foreach (var container in containers)
    {
        containerNames.Add(container.Name);
    }
}
else
{
    containerNames.Add(ContainerName);
}

var totalSuccessCount = 0;
var totalFailedCount = 0;

foreach (var container in containerNames)
{
    Console.WriteLine($"Recovering for container {container}");
    var fileSystem = client.GetFileSystemClient(container);
    var deletedItems = fileSystem.GetDeletedPaths(pathPrefix: DirectoryPath);
    var count = 0;
    var totalSuccessCountForContainer = 0;
    var totalFailedCountForContainer = 0;

    foreach (PathDeletedItem item in deletedItems)
    {
        // Wait for a free concurrency slot before issuing the next undelete.
        await throttler.WaitAsync();
        count++;
        try
        {
            var task = fileSystem.UndeletePathAsync(item.Path, item.DeletionId);
            var continuedTask = task.ContinueWith(t =>
            {
                throttler.Release();
                if (t.IsFaulted)
                {
                    Interlocked.Increment(ref totalFailedCount);
                    Interlocked.Increment(ref totalFailedCountForContainer);
                    Console.WriteLine($"Failed count for container {totalFailedCountForContainer}, total failed count {totalFailedCount}, path {DirectoryPath + item.Path} due to {t.Exception.Message}");
                }
                else
                {
                    Interlocked.Increment(ref totalSuccessCount);
                    Interlocked.Increment(ref totalSuccessCountForContainer);
                    Console.WriteLine($"Success count for container {totalSuccessCountForContainer}, total success count {totalSuccessCount}");
                }
            });
            tasks.Add(continuedTask);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Failed to create task: " + ex.ToString());
        }
        finally
        {
            // Drain the accumulated tasks once per batch so the list stays bounded.
            if (count == Math.Max(Concurrency, BatchSize))
            {
                count = 0;
                await Task.WhenAll(tasks);
                tasks.Clear();
            }
        }
    }

    await Task.WhenAll(tasks);
    Console.WriteLine($"Recover finished for container {container}");
}
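
A note on the design: rather than creating one thread per blob, the program issues asynchronous UndeletePathAsync calls and uses a SemaphoreSlim to cap how many are in flight at once. Each call releases its slot in a continuation, and once per batch (every Math.Max(Concurrency, BatchSize) items) the accumulated tasks are awaited and cleared, so the task list never grows unbounded.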

 

 

Replace xxxx with your specific storage account and container names. If you need to restore a particular directory, provide the directory path; otherwise, leave it empty to scan the entire container. The code is configured to run up to 500 concurrent undelete operations by default, but you can adjust this number according to your needs.
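
For example, a hypothetical configuration (all names below are placeholders) that restores one directory of one container with reduced concurrency might look like this:

var StorageAccountName = "contosostorage";  // placeholder account name
var ContainerName = "backups";              // placeholder; leave "" to scan every container
var DirectoryPath = "archive/2023";         // placeholder; leave "" to scan the whole container
var Concurrency = 200;                      // lower this if the service starts throttling requests
var BatchSize = 200;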

 

  • Add Required Packages:

 

dotnet add package Azure.Identity
dotnet add package Azure.Storage.Files.DataLake

 

  • Build the Project:

 

dotnet build --configuration Release

 

 

  • Run the Program:

 

dotnet <path_to_dll>
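
For example, if the project was created in a folder named BlobRestore (a placeholder name) and targets .NET 8, the build output would typically be invoked as:

dotnet bin/Release/net8.0/BlobRestore.dll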

 

 

Once the application is running, you can monitor the console window to track its progress and identify any potential issues or failures.
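
Based on the Console.WriteLine calls in the program, the output will look roughly like the following (illustrative values):

Starting the program
Recovering for container backups
Success count for container 1, total success count 1
Success count for container 2, total success count 2
...
Recover finished for container backups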
