Microsoft Fabric Machine Learning Tutorial - Part 2 - Data Validation with Great Expectations

In part 2 of this tutorial, Director of Data and AI Barry Smart dives deep into data validation using Microsoft Fabric and Great Expectations, applied to a predictive analytics use case built on the Kaggle Titanic data set. He demonstrates how to use Microsoft Fabric to create a "data contract" that establishes the minimum data quality standards the pipeline needs in order to process the data effectively. The demonstration uses the Great Expectations Python package to establish the data contract, and Microsoft's mssparkutils Python package to pass the Notebook's exit value back to the Pipeline that triggered it.
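The full implementation is shown in the video; as a rough illustration of the idea, the sketch below approximates a "data contract" in plain pandas so the example stays self-contained. The rules mirror what Great Expectations expectations (such as expect_column_values_to_be_not_null and expect_column_values_to_be_between) express, and the Titanic column names and the mssparkutils call are assumptions based on the tutorial's description, not its exact code.

```python
import json
import pandas as pd

# Sample rows shaped like the Kaggle Titanic data set (assumed columns).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, None],   # the null breaches the contract
    "Age": [22.0, 38.0, 26.0],
})

# A minimal "data contract": each rule stands in for a Great Expectations
# expectation (e.g. expect_column_values_to_be_not_null,
# expect_column_values_to_be_between).
contract = {
    "Survived": lambda s: s.notnull().all(),
    "Age": lambda s: s.between(0, 120).all(),
}

results = {col: bool(check(df[col])) for col, check in contract.items()}
success = all(results.values())

# Serialise the outcome so the calling Pipeline can branch on it.
exit_value = json.dumps({"success": success, "results": results})

# In a Fabric Notebook this value would be handed back to the Pipeline:
#   mssparkutils.notebook.exit(exit_value)
print(exit_value)
```

In the tutorial the real validation results come from Great Expectations, but the shape of the exit value is the key point: it is the single channel through which the Notebook tells the Pipeline whether the contract was met.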
By watching the video, you'll learn how to fail elegantly: dropping bad rows and continuing with only the good ones, removing bad records to create clean data sets, using the Teams pipeline activity in Fabric to alert the data stewards about the failed validation, and much more.
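The "fail elegantly" pattern described above can be sketched as follows: build a row-level validity mask, carry the good rows forward, and set the bad rows aside for reporting. This is a simplification under assumed column names and thresholds, not the tutorial's exact rules.

```python
import pandas as pd

df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Age": [22.0, None, 26.0, -5.0],  # one null and one impossible value
})

# Row-level validity mask: a row passes only if it satisfies every rule.
valid = df["Age"].notnull() & df["Age"].between(0, 120)

clean = df[valid].reset_index(drop=True)  # good rows continue downstream
rejected = df[~valid]                     # bad rows go to the data stewards

print(f"kept {len(clean)} rows, rejected {len(rejected)}")
```

Keeping the rejected rows, rather than silently discarding them, is what makes the downstream Teams alert useful: the data stewards can see exactly which records failed and why.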
The demo uses Notebooks, Pipelines, and the Lakehouse to demonstrate the features of the data engineering experience in Fabric. The video begins with an overview of the architecture, then takes a deep dive into applying the DataOps principles of data validation and alerting that form part of an end-to-end demo of Microsoft Fabric. Chapters organise the video into sections to make it easier to follow.
If you're interested in learning how to craft a compelling data narrative from raw data and captivate your audience, this tutorial is a great place to start.
You can view the complete series of Microsoft Fabric demos and other content on the endjin website.
The post Microsoft Fabric Machine Learning Tutorial - Part 2 - Data Validation with Great Expectations originally appeared on Endjin.
Related posts
Microsoft Fabric Machine Learning Tutorial - Part 1 - Overview of the Course
This video provides an overview of a new tutorial series that takes a deep dive into an end-to-end demo of Microsoft Fabric, with a focus...
Demystifying Delta Lake Table Structure in Microsoft Fabric
If you're wondering about the structure of Delta Lake tables in OneLake for the Lakehouse, this article and video are here to demystify it for...
From Descriptive to Predictive Analytics with Microsoft Fabric | Part 1
This article provides a comprehensive overview of an end-to-end demo of Microsoft Fabric's predictive analytics capabilities using the Kaggle ...
OneLake: Microsoft Fabric’s Ultimate Data Lake
Microsoft Fabric's OneLake is the ultimate solution to revolutionizing how your organization manages and analyzes data. Serving as your OneDri...
Delta Lake 101 Part 4: Schema evolution and enforcement
If you're looking to implement lakehouse solutions in Microsoft Fabric, Databricks or other tools that work with Delta Lake, it's essential to...
Delta Lake 101 Part 3: Optimize ZOrdering and File Pruning
If you're looking to enhance the performance of your Lakehouse, then optimizing your ZOrdering and file pruning techniques are integral to ach...
40 Days of Fabric: Day 2 – OneLake
Day 2 of the "40 Days of Fabric" blog series by DataVeld consists of a detailed explanation of OneLake, which acts as a "OneDrive for Data." T...
Delta Lake 101 Part 2: Transaction Log
In this article, we dive deeper into the world of Delta Lake, focusing specifically on the Transaction Log. As you may recall from the previou...
Delta Lake 101 – Part 1: Introduction
If you're interested in Delta Lake and its growing popularity, this post will provide a comprehensive introduction. Delta Lake has gained imme...