Demystifying data consumption using Azure Synapse Data Explorer
Author:
Introduction
Continuing the series of Demystifying Data Explorer, in this blog, let’s take a look at the various ways of retrieving, consuming and visualizing data from Azure Synapse Data Explorer:
- Querying through KQL / SQL using tools such as Azure Synapse Data Explorer Web UI, Kusto Explorer, Azure Data Studio, and SQL Server Management Studio.
- Visualize data using ADX Dashboards, Power BI, Excel, Grafana, Kibana, Redash.
- Use Apache and Synapse Spark notebooks, Jupyter notebooks to interact with data in Azure Synapse Data Explorer.
- Orchestrate the data in Azure Synapse Data Explorer using Azure Data Factory, Synapse Pipelines, Logic Apps, and Power Automate.
- Export data to external storage (Azure Blob Storage or Azure Data Lake Store Gen 2), or a SQL table using one-time and continuous export.
- Connect SQL supported tools using OBDC/JDBC connectors and TDS endpoint.
- Programmatically retrieve data using REST API and SDKs.
Example business scenario
In the previous blog, Demystifying Data Ingestion into Azure Synapse Data Explorer, ACME Chemicals used batch and streaming enabled sources to ingest logs and sensor data into Azure Synapse Data Explorer. Continuing this business scenario, here are the data retrieval and visualization needs of ACME Chemicals:
1. ACME Chemicals want to visualize their data to prepare a dashboard with the following tiles:
a. Show near real time chemical machinery(vats) telemetry (Ex: Temperature, Mixer speed)
b. Status of critical metrics in relation to time (Ex: Spin rate of a centrifuge, output rate over time)
c. Top 5 products by production volume
2. ACME Chemicals want to send alert notifications when monitored values are breaching thresholds
3. ACME Chemicals want to train machine learning models to predict chemical properties of the output products using data from past batch.
4. ACME Chemicals want to Archive data from product batch productions for regulatory compliance.
Retrieving and consuming data from Azure Synapse Data Explorer
Azure Synapse Data Explorer provides multiple tools and approaches to multiple personas in the modern data ecosystem to consume data.
Persona |
Data Retrieval and Visualization Tools |
Data Engineer |
Query Tools, Data Pipeline Orchestrators, Export |
Analytics Engineer |
Query Tools, Notebook |
Data Scientist |
Notebook (Synapse, Spark, Jupyter, Azure Data Studio) |
BI User |
PowerBI, Excel, Tableau (via ODBC), Qlik (via ODBC) |
End User |
ADX Dashboard, PowerBI, Excel, Grafana, Kibana, Redash |
App Developer |
REST API, SDKs |
SQL Pros |
MS-TDS Endpoint, ODBC, JDBC, SSMS, Azure Data Studio |
Query tools
Kusto Query Language (KQL) is the language for querying data stored in Azure Synapse Data Explorer. KQL was specifically designed to make ad-hoc, interactive analytics simple, powerful, intuitive and easy to read. The following tools are provided to write KQL and interact with the data.
- ADX Web UI – Azure Synapse Data Explorer comes with a publicly available web UI - dataexplorer.azure.com, to navigate through cluster, database, tables, columns, functions, materialized views, ingest data, write queries, visualize charts and create dashboards. This Web UI provides a query editor powered by intellisense enabling you to write KQL statements.
- Kusto Explorer – This windows based desktop application providing all the goodness of ADX web UI but on desktop. You can install the application from - https://aka.ms/ke. The application has easy-to-use user interface and provides advantages like keyboard shortcuts, code refactoring, code analyzer and ready buttons to visualize charts from query results.
- Azure Data Studio - By enabling the native Kusto (KQL) experience in Azure Data Studio, data engineers, data scientists, and data analysts can find trends and anomalies from data stored in Azure Synapse Data Explorer. Azure Data Studio can be enabled with Kusto extension to get a Web UI like experience or use Kql magic in notebooks which brings you the benefit of notebooks, data analysis, and rich Python capabilities.
- SQL Server Management Studio – By using MS-TDS endpoint, ADX cluster can be connected from within SQL Server Management Studio just like a SQL database to write familiar SQL queries on ADX data.
- Kusto CLI - is a command-line utility that is used to send requests to Kusto and display the results. This is primarily provided for automating tasks against Azure Synapse Data Explorer that normally requires writing code like a C# program, or a PowerShell script.
Data Pipeline Orchestrators
To unlock scenarios wherein ADX is an intermediate component of a data pipeline, Azure Synapse Data Explorer has native integration with the following services:
- Azure Data Factory/ Synapse Pipelines - ADX is present as source, sink or in mapping data flows in Azure Data Factory/Synapse pipelines. Supported activities include
- Copy – Copy activity is used to transfer data between data stores. Azure Synapse Data Explorer is supported as a source, where data is copied from Azure Synapse Data Explorer to any supported data store, and a sink, where data is copied from any supported data store to Azure Synapse Data Explorer.
- Lookup - The Lookup activity is used for executing queries on Azure Synapse Data Explorer. The result of the query will be returned as the output of the Lookup activity, and can be used in the next activity in the pipeline
- Command - The Command activity allows the execution of Azure Synapse Data Explorer control commands to modify data or metadata.
- Azure Synapse Data Explorer integration with Azure Data Factory- ADF allows you to create data-driven workflows for orchestrating and automating data movement and data transformation.
- Logic Apps/Power Automate – enables you to run Kusto queries and commands automatically as part of a scheduled or triggered task. Any of these connectors can help users setup a Low Code/No Code pipeline to:
- Send notifications/alerts based on query
- Send regular reports containing records or charts
- Run schedule job to run control commands – Ex: Copy data from one table to another using .set-or-append command
- Export or import data between ADX and other databases.
Tech Tip: Customers can use Logic Apps/Power Automate to enable alerting/notifications on their ADX data – use Power Automate with ADX to send notifications.
Visualization
Azure Synapse Data Explorer integrates with most of the industry leading BI and Data Visualization tools to enable BI Users and End Users build powerful, rich, interactive reports and dashboards.
- Render operator: Using this inbuilt operator, users can instantly visualize their results to find certain patterns/trends in their data, find anomalies, or forecast their business data in the form of a chart.
- ADX Dashboards: Azure Synapse Data Explorer Web UI also provides dashboard feature at no additional cost if the users do not wish to invest in any external tools. ADX Dashboards provide 3 main advantages:
- Natively export KQL queries from Web UI to dashboards
- Explore the data in Web UI
- Optimized dashboard rendering performance
Azure Synapse Data Explorer is also available as data source in a lot of BI tools for visualization
- Power BI - provides Import and Direct Query modes when using ADX as data source. Power BI connector has been optimized to translate M-query, and DAX syntax into KQL without users have to write any KQL. For Power BI users, ADX can work like just another data source with the advantages of M query parameterization, join strategy with dimensions, query results cache at shard level. Users can generate Power BI report from query right from Web UI or Kusto Explorer desktop application.
- Excel - The Azure Synapse Data Explorer Excel native connector offers the ability to import query results to Excel. It also enables importing a KQL query as an Excel data source for additional calculations or visualizations. Once you select ADX to get data, you can work with a Power Query editor to transform data for Excel use.
- Grafana – Grafana is one of the widely used open-source dashboarding and visualization tool. Azure Synapse Data Explorer is a supported data source plugin in Grafana. Users can query and visualize ADX data, create or share dashboards with advantages like inbuilt query builder, and leveraging native Grafana capabilities such as Grafana Alerts.
- Kibana – With K2 (Kibana Kusto) bridge, users that are familiar and conversant with Kibana and are migration from Elastic Search to Azure Synapse Data Explorer use Kibana to search, observe and analyze their ADX data.
- Redash – is a popular open-source tool that supports SQL, NoSQL, Big Data and API data sources. Azure Synapse Data Explorer is a supported data source in Redash enabling users to query data and visualize the results in Redash.
To learn more, read Azure Synapse Data Explorer data visualization.
Data export
ADX can natively export its data using a set of control commands. Using this method, the results of a KQL query are written to an external storage (Azure Blob Storage or Azure Data Lake Store Gen 2), or a SQL table. Export can be a good way to store data for later inspection, archiving or to make data available for downstream platforms such as spark for training Machine Learning models.
Export can be done in 2 ways:
Characteristics |
One-time Export |
Continuous Export |
How? |
||
Frequency |
Manual, on every execution |
Based on defined time period |
Supported formats |
csv, json, parquet |
csv, json, parquet |
Use cases |
One time export of historical data for storage, output of query returning more than 500k rows or 64MB of data or ad-hoc business need |
Regular, schedule-based to support Disaster Recovery scenarios, data backup, supporting downstream applications and platforms that rely on the data stored in Azure Synapse Data Explorer |
Notebook connectivity
Synapse/ Apache Spark - Using kusto-spark connector, users can read from their spark notebooks from Azure Synapse Data Explorer and also write data directly to an ADX table. This capability is useful in supporting machine learning or complex data transformation scenarios that are performed using Spark notebooks.
Additionally, when using Synapse Data Explorer (i.e. Data Explorer pools in Synapse Workspace), there is a native integration between Synapse Spark Notebooks and Data Explorer pools enabling users to right click on an ADX table and select “Write DataFrame to table” to get a prebuilt spark script with table’s data added to a data frame in a Notebook cell.
Kql Magic – kqlmagic is a library that extends the capabilities of the Python kernel in Jupyter Notebooks or Azure Data Studio enabling users to write Kusto Query Language statements within the notebook. For example, users can easily combine Python and Kusto Query Language to query and visualize data using the rich Plot.ly library integrated with render commands.
Connectors
- Using ODBC connector, users can use Tableau, and Qlik to visualize data from Azure Synapse Data Explorer. Sisense is another data analytics platform that can visualize data from ADX using JDBC connector.
- Azure Synapse Data Explorer supports a subset of the Microsoft SQL Server communication protocol (MS-TDS), with a subset of the T-SQL query language. Users can write SQL queries to retrieve data from ADX or using KQL over TDS. .Net SQL Clients, clients that support JDBC/ODBC drivers, PowerShell, Azure Data Studio, SSMS, Power BI, MATLAB support using MS-TDS endpoint over ADX.
API and client libraries:
To programmatically access data from Azure Synapse Data Explorer, users can use REST API or SDK of ADX.
API - Using REST API, we can
- Query Data – Retrieve results based on KQL query
- Ingest Data – Load data into ADX database
- Modify Metadata – Create, alter, drop ADX entities like tables, functions and Materialized Views
- Run control commands – Monitor metadata transactions using commands with. show. Ex:.show ingestion failures | where FailedOn > ago(1d) gives ingestion failures in the last 1 day.
REST API supports the following endpoints and actions:
SDK: Similarly, client libraries are available in python, R, java, .Net, Go, Node js and PowerShell SDK. For more information and to learn more, review Client Libraries.
Now that we have learned tools and approaches to consume data from Azure Synapse Data Explorer, lets solve the ACME Chemicals business scenario.
Example: ACME chemicals data consumption scenario solved
1. Using native ADX Dashboards, ACME Chemicals can create a dashboard
a. Use stat visuals to showcase near real time chemical vats metrics like current temperature, and mixer speed
b. Use time chart to show spin rate, catalyst consumption, output rate over time
c. Use Pie chart to show top 5 chemicals and their distribution by production volume
2. ACME chemicals used Logic Apps to run KQL queries at 30 min intervals to check for conditions and send email alerts.
3. Using Kusto-Spark connector, ACME chemicals read the data from Data Explorer into Synapse spark notebooks and train ML models.
4. ACME Chemicals uses continuous export to export data to Azure Storage account and read this data using external table.
Summary
This blog introduces the various methods and tools to retrieve and visualize data from Azure Synapse Data Explorer. As an analytical platform, Azure Synapse Data Explorer provides a comprehensive, rich ecosystem of options that can suit varied personas with different skill sets and requirements. This ecosystem of consumption and visualization options has been built with the clear intention and purpose of meeting users where they are.
This blog also marks the end of the Demystifying Data Explorer series of blogs that began with explaining what is data explorer (Demystifying Data Explorer), followed up by how to ingest data in data explorer (Demystifying Data Ingestion in Azure Synapse Data Explorer) and concluding with this blog on how to consume and visualize the data from Data Explorer.
Our team publishes blog(s) regularly and you can find all these blogs at https://aka.ms/synapsecseblog. For deeper level of understanding of Synapse implementation best practices, please refer to our Success by Design (SBD) site at https://aka.ms/Synapse-Success-By-Design
Published on:
Learn moreRelated posts
Azure Database for PostgreSQL Flexible Server - Elastic Clusters, faster disks, and AI updates
Increase scalability, optimize performance, and integrate advanced AI features with Azure Database for PostgreSQL Flexible Server. Scale up wi...
Disconnected operations for Azure Local
Introducing the new Linux-based Azure Cosmos DB Emulator (Preview)
We are excited to announce the preview release of the new Linux-based Azure Cosmos DB Emulator! This latest version is built to provide faster...
Azure Cosmos DB Shines at Microsoft Ignite 2024!
Microsoft Ignite 2024 took over the Windy City this week, bringing with it new technological innovation and exciting product announcements apl...