Azure Data Factory\ Synapse Analytics: Validate File\Folder before using!

In practice, most projects deal with Azure Storage or Azure SQL Database, whether that means blobs, folders/directories, files or tables. Validating the file\folder\table before actually using it becomes a crucial step.

 

A few use cases:

  1. Suppose we have to load a file named SalesData.csv from a folder that gets created every day in the format yyyy/MM/dd. Before we use this file in a Copy Data activity or a Data Flow activity, we first have to validate whether the folder exists.
  2. If the folder exists, we might want to validate the size of the file. This is important because files sometimes arrive with no data, i.e. as a 0 KB file.
  3. Another use case would be to validate a table structure or file structure and make sure it complies with what we expect.
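To make the first use case concrete, here is a minimal Python sketch of how the daily yyyy/MM/dd folder path could be derived for a given run date. The function name `daily_folder_path` is hypothetical and only used for illustration; inside a pipeline the same path would more likely come from an expression such as `formatDateTime(utcnow(), 'yyyy/MM/dd')` passed to a dataset parameter.

```python
from datetime import date


def daily_folder_path(run_date: date, file_name: str = "SalesData.csv") -> str:
    """Build the yyyy/MM/dd folder path plus file name for a run date.

    Hypothetical helper for illustration; it only formats a string and
    does not touch any storage account.
    """
    folder = run_date.strftime("%Y/%m/%d")  # zero-padded year/month/day
    return f"{folder}/{file_name}"


if __name__ == "__main__":
    # The path the validation step would need to check for this run date.
    print(daily_folder_path(date(2024, 4, 6)))
```

The resulting path is what a Validation or Get Metadata activity would be pointed at before the copy or data flow runs.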

In ADF\Synapse, we can use the Validation activity and\or the Get Metadata activity to validate files, folders and tables.

 

The Validation activity settings and a short description follow:

Screenshot 2024-04-06 at 10.30.31 AM.png

In the above image, we have referenced a dataset called DelimitedText2, which would point to either a folder or file in Azure Blob Storage or a table in an Azure SQL DB. The Timeout property holds the time after which the activity times out (note that it won't fail). For instance, if the validation activity is meant to validate the presence of a file\folder\table and it doesn't find the corresponding item, then once the timeout elapses the activity execution stops and times out. Next, we have the Sleep property, which makes the validation activity wait for a certain number of seconds before revalidating or timing out. The Minimum size property specifies the minimum size of a file in bytes (not applicable to table-based datasets).
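The same settings can also be viewed as the JSON behind the activity. The sketch below builds that JSON as a Python dictionary, assuming the documented shape of a Validation activity (`dataset`, `timeout`, `sleep`, `minimumSize` under `typeProperties`). The activity name `ValidateSalesFile` and the chosen timeout, sleep and size values are illustrative assumptions, not values taken from the screenshot.

```python
import json


def build_validation_activity(dataset_name: str,
                              timeout: str = "0.00:10:00",
                              sleep_seconds: int = 30,
                              minimum_size_bytes: int = 1) -> dict:
    """Return a Validation activity definition as a plain dictionary.

    The activity name and default values are hypothetical examples.
    """
    return {
        "name": "ValidateSalesFile",  # hypothetical activity name
        "type": "Validation",
        "typeProperties": {
            "dataset": {
                "referenceName": dataset_name,
                "type": "DatasetReference",
            },
            "timeout": timeout,                 # d.hh:mm:ss
            "sleep": sleep_seconds,             # seconds between checks
            "minimumSize": minimum_size_bytes,  # bytes; file datasets only
        },
    }


if __name__ == "__main__":
    activity = build_validation_activity("DelimitedText2")
    print(json.dumps(activity, indent=2))
```

Setting the minimum size to 1 byte is one way to treat a 0 KB file as not yet valid; for a folder or table dataset that setting would be left out.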

 

When the validation activity completes execution or times out, we can access its output JSON to see the validation results.

 

Let us look at the GetMetadata activity and its settings. 

Screenshot 2024-04-06 at 10.42.55 AM.png

Like the Validation activity, the Get Metadata activity in ADF\Synapse has a few properties that help us validate a file\folder\table. Depending on whether the dataset points to a folder, file or table, the available properties (field list) differ.

 

The above image depicts the field list corresponding to a folder in ADLS. When we make the dataset point to a file, the field list would be as below.

 

Screenshot 2024-04-06 at 10.47.37 AM.png

So, if the dataset points to a file or table, we see a couple of additional properties such as Column count, Size and Structure.
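As a rough illustration of how the field list changes with the dataset, the sketch below builds a Get Metadata activity definition as a Python dictionary, assuming the documented `fieldList` values (`exists`, `itemName`, `itemType`, `lastModified`, `childItems`, `size`, `columnCount`, `structure`). The activity name, the dataset name `SalesFolder`, and the particular selection of fields for each case are assumptions made for the example.

```python
import json

# Illustrative selections; the real choice depends on what needs validating.
FOLDER_FIELDS = ["exists", "itemName", "itemType", "lastModified", "childItems"]
FILE_FIELDS = ["exists", "itemName", "itemType", "lastModified",
               "size", "columnCount", "structure"]


def build_get_metadata_activity(dataset_name: str, points_to_file: bool) -> dict:
    """Return a Get Metadata activity definition as a plain dictionary.

    The activity name is a hypothetical example.
    """
    field_list = FILE_FIELDS if points_to_file else FOLDER_FIELDS
    return {
        "name": "GetSalesMetadata",  # hypothetical activity name
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": {
                "referenceName": dataset_name,
                "type": "DatasetReference",
            },
            "fieldList": list(field_list),  # copy so callers can edit safely
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_get_metadata_activity("DelimitedText2", True), indent=2))
    print(json.dumps(build_get_metadata_activity("SalesFolder", False), indent=2))
```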

 

Having looked at the individual properties\field list of both activities, it's time to compare them and see the similarities and differences.

The table below compares the capabilities of the Validation activity and the Get Metadata activity.

| Capability | Validation Activity | Get Metadata Activity | Property Used |
| Validate File | Yes | Yes | Exists (returns boolean) |
| Validate Folder | Yes | Yes | Exists (returns boolean) |
| Validate File Size | Yes | Yes | Get Metadata activity: use the Size property in the field list. Validation activity: use the Minimum size field in the Settings tab. |
| Validate File Structure | No | Yes | Get Metadata: use Structure in the field list, then use another activity such as If Condition to validate the structure against the expected one. |
| Validate File Column Count | No | Yes | Get Metadata: use Column count in the field list, then use another activity such as If Condition to validate the count against the expected one. |
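The checks in the comparison above can be sketched outside the pipeline as plain Python. The function below takes a dictionary shaped like a Get Metadata output (`exists`, `size`, `columnCount`, and `structure` as a list of name/type pairs) and compares it with what we expect. The function name, the sample column names and types, and the sample output are all hypothetical. Inside a pipeline, the equivalent comparison would typically live in an If Condition expression such as `@equals(activity('Get Metadata1').output.columnCount, 3)`, where `Get Metadata1` is an assumed activity name.

```python
def validate_file_metadata(output: dict,
                           expected_structure: list,
                           min_size_bytes: int = 1) -> list:
    """Return a list of validation problems; an empty list means valid."""
    if not output.get("exists", False):
        return ["file does not exist"]

    problems = []
    if output.get("size", 0) < min_size_bytes:
        problems.append("file is smaller than the minimum size")
    if output.get("columnCount") != len(expected_structure):
        problems.append("column count differs from the expected count")
    if output.get("structure") != expected_structure:
        problems.append("structure differs from the expected structure")
    return problems


# Hypothetical expected layout for SalesData.csv.
EXPECTED = [
    {"name": "OrderId", "type": "String"},
    {"name": "Amount", "type": "String"},
    {"name": "OrderDate", "type": "String"},
]

if __name__ == "__main__":
    sample_output = {  # made-up example of a Get Metadata output
        "exists": True,
        "size": 2048,
        "columnCount": 3,
        "structure": EXPECTED,
    }
    print(validate_file_metadata(sample_output, EXPECTED))
```

Returning a list of problems rather than a single boolean makes it easier to see which check failed, which mirrors chaining several If Condition checks in a pipeline.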

 

In a nutshell, ADF\Synapse comes with a variety of activities, some of which have similar characteristics or capabilities. When it comes to validation, the use case determines whether we use the Validation activity, the Get Metadata activity, or both.

Learn more
Azure Developer Community Blog articles