Introduction to Kusto Query Language (KQL)
Author: @SuryaJ is a Program Manager in the Azure Synapse Customer Success Engineering (CSE) team.
Introduction
Kusto Query Language (KQL) is a powerful query language to analyse large volumes of structured, semi structured and unstructured (Free Text) data. It has inbuilt operators and functions that lets you analyse data to find trends, patterns, anomalies, create forecasting, and machine learning. Along with Azure Synapse Data Explorer, other Azure native services like Log Analytics, Azure Monitor and App Insights also use KQL to query data. This blog will be an introduction to KQL and its usage in Azure Synapse Data Explorer.
A KQL query is a read only request that takes a tabular input and produces a tabular output. It follows a simple Unix shell script like structure and uses a Top-Down approach for the query structure. Each operator is separated by a ‘|’ (pipe) delimiter. Similar to relational database systems there are operators available like where, joins, union, and more. In addition, there are also functions available for aggregations, geospatial analytics, Time series analytics and some machine learning as well.
This blog covers the following topics:
- Motivation for a new query language? – Kusto Query Language (KQL)
- Can I write SQL to query data? - SQL to KQL journey
- What entities are present in Azure Synapse Data Explorer? What data types are present to write KQL? – Schema and data types
- How to write a KQL query? – Understanding basic KQL structure
- What happens behind the scenes? – Query evaluation and optimization
- How do I get started? – Some basic and commonly used KQL operators
- Let’s write some queries!!
Kusto Query Language (KQL)
KQL is a Microsoft homegrown query language that is made open source on GitHub. The language was developed with freedom and scale in mind. The queries are easy to read and adopt, with a lot of terms taken from plain English. For Example, if we want to select sample 10 records from a table called ‘NycTaxi’, the query is simply “ NycTaxi | take 10 “.
Here are certain reasons that showcase the importance of creating KQL.
- Start with the Table – In KQL, authoring a query becomes easier as we start with the context of table name first. Because of this, user can concentrate on the logic and not in choosing the right columns, filters and aggregates.
- Easy to understand query structure – Not straying too far from relational query languages, a lot of operators, filters and functions are available in KQL except in an easier to understand query structure. More on the query structure and formation detailed below.
- Productivity boost – with plethora of powerful inbuilt functions, number of lines of KQL code compared to other languages is very less. Ex: A 4–5-line code in python to create timestamp bins for 1 day is done in KQL using bin() function as bin(timestamp,1d)
- IntelliSense at Fingertips - IntelliSense starts helping us right from selecting correct table, columns to filters and aggregates.
SQL to KQL Journey
Azure Synapse Data Explorer also supports a subset of SQL queries. You can simply write a SQL query and execute to display results.
KQL provides an operator called "explain" to translate SQL queries into KQL
While this approach is fine on simple queries and learning KQL, it is recommended to use KQL for Azure Synapse Data Explorer for more complex analytics. SQL to KQL Cheat Sheet is available to help in the journey from SQL to KQL!
Schema and data types
In order to write KQL, let us understand the schema and entities in Azure Synapse Data Explorer.
Similar to relational databases, entities are structured in the following hierarchy.
KQL supports the following datatypes:
- Boolean
- Integer
- Real
- Decimal
- Dates
- Timespan
- String
- Guid
- Dynamic (JSON)
While all other data types are standard ones, dynamic is a proprietary data type of Azure Synapse Data Explorer. It helps to traverse through a Json structure and extract any scalar values from arrays or property bags.
In the following example, we see that ‘Trace’ is dynamic column with Json like data. We are able to extract the key-value pair EventType=Demo using the extend operator(explained further below) and dot(.) notation
More on the dynamic data type here : The dynamic data type - Azure Data Explorer | Microsoft Learn
Understand the basic KQL query structure!
Let’s take an example KQL query and break it down!
- 𝐒𝐭𝐨𝐫𝐦𝐄𝐯𝐞𝐧𝐭𝐬 denotes the table we want to query. This single line is equivalent to ‘select * from StormEvents’ in SQL
- Line 2 and 3 are applied to funnel/filter the data from StormEvents table.
- Line 2 filters ‘StartTime’ datetime column to be between 01 Nov 2007 and 01 Dec 2007
- Line 3 filters ‘State’ column to be Florida
- Finally apply the count() aggregation on the filtered data.
Query evaluation and optimization
Once a KQL query is received to Azure Synapse Data Explorer Engine, it passes the query through certain stages to make it an optimized and distributed query.
Let’s take an example query and see how it is transformed passing through above stages.
The initial RelOp tree for this query looks like this:
And the distributed query looks like this:
- Data nodes and shards with relevant data are selected.
- Filters and aggregates are pushed down to each shard.
- A union of filtered/aggregated shards happen at node level.
- Another round of aggregation happens at node level.
- Another round of union happens on the aggregated nodes data.
- A final aggregation happens on top level.
Basic KQL operators
Now that we have seen how a query is structured and optimized by Azure Synapse Data Explorer Engine, we can start writing some basic KQL.
Most of the KQL queries can be fulfilled by certain common operators listed below:
Operator |
Description |
Example |
where |
Filters source data |
.. | where dept == “IT” |
extend |
Computes and adds a new column to source |
.. | extend col3=col1+col2 |
count |
Counts # of records in a table |
.. | count |
project |
Choose required columns in the output |
.. | project col1, col2 |
take |
Get random sample records |
.. | take 100 |
summarize |
Perform aggregations based on function provided after this |
.. | summarize avg(col1) |
join |
Merges rows of two tables based on matching fields |
tbl1 | join tbl2 on id |
union |
Return rows from 2 or more tables |
tbl1 | union tbl2 |
Note: The list is not indicative of final list of operators but commonly used ones.
Queries!!!
You can run the following queries using the help cluster which is a read only cluster with already ingested data.
Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]
Output:
print_0 |
Hello KQL!! |
Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]
Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]
Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]
Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]
Output:
While this blog introduces KQL query and certain operators, you are encouraged to play around with more complex data and operators by
- Creating a Free cluster, ingesting data and querying.
- Use the help cluster and start querying – Cluster Uri: https://help.kusto.windows.net/
Stay tuned for deep dive into more KQL functions related to Time series, Geo Spatial and Machine Learning and also custom KQL codes(pre-written) in the upcoming blogs.
Our team publishes blog(s) regularly and you can find all these blogs at https://aka.ms/synapsecseblog. For deeper level of understanding of Synapse implementation best practices, please refer to our Success by Design (SBD) site at https://aka.ms/Synapse-Success-By-Design
Published on:
Learn more