Microsoft Fabric Machine Learning Tutorial - Part 2 - Data Validation with Great Expectations

In part 2 of this tutorial, Director of Data and AI Barry Smart dives deep into data validation using Microsoft Fabric and Great Expectations, applied to a predictive analytics use case built on the Kaggle Titanic data set. He demonstrates how to use Microsoft Fabric to create a "data contract" that establishes the minimum data quality standards the pipeline needs in order to process the data effectively. The demonstration uses the Great Expectations Python package to establish the data contract, and Microsoft's mssparkutils Python package to pass the Notebook's exit value back to the Pipeline that triggered it.
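The full implementation is shown in the video; as a rough illustration of the idea, the sketch below approximates a "data contract" in plain pandas so the example stays self-contained. The rules mirror what Great Expectations expectations (such as expect_column_values_to_be_not_null and expect_column_values_to_be_between) express, and the Titanic column names and the mssparkutils call are assumptions based on the tutorial's description, not its exact code.

```python
import json
import pandas as pd

# Sample rows shaped like the Kaggle Titanic data set (assumed columns).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, None],   # the null breaches the contract
    "Age": [22.0, 38.0, 26.0],
})

# A minimal "data contract": each rule stands in for a Great Expectations
# expectation (e.g. expect_column_values_to_be_not_null,
# expect_column_values_to_be_between).
contract = {
    "Survived": lambda s: s.notnull().all(),
    "Age": lambda s: s.between(0, 120).all(),
}

results = {col: bool(check(df[col])) for col, check in contract.items()}
success = all(results.values())

# Serialise the outcome so the calling Pipeline can branch on it.
exit_value = json.dumps({"success": success, "results": results})

# In a Fabric Notebook this value would be handed back to the Pipeline:
#   mssparkutils.notebook.exit(exit_value)
print(exit_value)
```

In the tutorial the real validation results come from Great Expectations, but the shape of the exit value is the key point: it is the single channel through which the Notebook tells the Pipeline whether the contract was met.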
By watching the video, you'll learn how to fail elegantly: dropping bad rows and continuing with only the good ones, removing bad records to create clean data sets, using the Teams pipeline activity in Fabric to alert the data stewards about the failed validation, and much more.
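The "fail elegantly" pattern described above can be sketched as follows: build a row-level validity mask, carry the good rows forward, and set the bad rows aside for reporting. This is a simplification under assumed column names and thresholds, not the tutorial's exact rules.

```python
import pandas as pd

df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Age": [22.0, None, 26.0, -5.0],  # one null and one impossible value
})

# Row-level validity mask: a row passes only if it satisfies every rule.
valid = df["Age"].notnull() & df["Age"].between(0, 120)

clean = df[valid].reset_index(drop=True)  # good rows continue downstream
rejected = df[~valid]                     # bad rows go to the data stewards

print(f"kept {len(clean)} rows, rejected {len(rejected)}")
```

Keeping the rejected rows, rather than silently discarding them, is what makes the downstream Teams alert useful: the data stewards can see exactly which records failed and why.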
The demo uses Notebooks, Pipelines, and the Lakehouse to demonstrate the features of the data engineering experience in Fabric. The video begins with an overview of the architecture, then takes a deep dive into applying the DataOps principles of data validation and alerting that form part of an end-to-end demo of Microsoft Fabric. Chapters organise the video into sections to make it easier to follow.
If you're interested in learning how to craft a compelling data narrative from raw data and captivate your audience, this tutorial is a great place to start.
You can view the complete series of Microsoft Fabric demos and other content on the endjin website.
The post Microsoft Fabric Machine Learning Tutorial - Part 2 - Data Validation with Great Expectations originally appeared on Endjin.
Related posts
Microsoft Fabric Machine Learning Tutorial - Part 1 - Overview of the Course
This video provides an overview of a new tutorial series that takes a deep dive into an end-to-end demo of Microsoft Fabric, with a focus...
Demystifying Delta Lake Table Structure in Microsoft Fabric
If you're wondering about the structure of Delta Lake tables in OneLake for the Lakehouse, this article and video are here to demystify it for...
From Descriptive to Predictive Analytics with Microsoft Fabric | Part 1
This article provides a comprehensive overview of an end-to-end demo of Microsoft Fabric's predictive analytics capabilities using the Kaggle ...
OneLake: Microsoft Fabric’s Ultimate Data Lake
Microsoft Fabric's OneLake is the ultimate solution to revolutionizing how your organization manages and analyzes data. Serving as your OneDri...
Delta Lake 101 Part 4: Schema evolution and enforcement
If you're looking to implement lakehouse solutions in Microsoft Fabric, Databricks or other tools that work with Delta Lake, it's essential to...
Delta Lake 101 Part 3: Optimize ZOrdering and File Pruning
If you're looking to enhance the performance of your Lakehouse, then optimizing your ZOrdering and file pruning techniques are integral to ach...
40 Days of Fabric: Day 2 – OneLake
Day 2 of the "40 Days of Fabric" blog series by DataVeld consists of a detailed explanation of OneLake, which acts as a "OneDrive for Data." T...
Delta Lake 101 Part 2: Transaction Log
In this article, we dive deeper into the world of Delta Lake, focusing specifically on the Transaction Log. As you may recall from the previou...
Delta Lake 101 – Part 1: Introduction
If you're interested in Delta Lake and its growing popularity, this post will provide a comprehensive introduction. Delta Lake has gained imme...