Loading...

Understand Synapse dedicated SQL pool (formerly SQL DW) and Serverless SQL pool

Understand Synapse dedicated SQL pool (formerly SQL DW) and Serverless SQL pool

Two recurring questions I frequently get from customers are: "What is the difference between Synapse dedicated SQL pool (formerly SQL DW) and Serverless SQL pool?" and "Which one should I choose for my Business?"

 

This post is intended to explain the basic concepts of dedicated SQL pool and Serverless SQL Pool, help you understand how they work, and how to use them based on your business needs.

 

Dedicated SQL pool and serverless SQL pool are analytics runtimes of Azure Synapse Analytics. Both allow you to work with data using SQL.

 

Dedicated SQL pool

 

One or more dedicated SQL pools can be added to a workspace (for reference, please read Quickstart: Create a dedicated SQL pool using Synapse Studio). If not in use, you can pause your dedicated SQL pool to avoid unnecessary costs. When you want to use it again, you will need to resume your dedicated SQL pool first and it will be billed once the database is online. Different Service levels (DW100c to DW30000c) have different costs. DWUs define the number of resources available. DWU stands for Data Warehousing Units and determine resources provisioned, considering a combination of CPU, memory, and IO. To learn more, please read Pricing and Memory and concurrency limits.

 

Dedicated SQL pools have 60 fixed distributions. When a table is created, data is shared amongst them. 

 

At a high level, the basic query flow is as follows: when a query is executed, it will be sent to the Control Node (Fig 1).  The Control node is the brain of the architecture that will optimize and coordinate the parallel queries. Compute nodes provide the computational power for the execution and the distributions will be mapped to those nodes according to the DW size. Using this concept, the engine will divide the work of executing the query into 60 smaller queries that will run in parallel to obtain data from the distributions. Data distribution patterns will be chosen when the table is created. 

 

Liliam_Leme_0-1660035932319.png

Fig 1

 

These are the types of distribution data patterns that you can define at table creation: 

  • Round RobinA distribution is first chosen at random and then buffers of rows are assigned to distributions sequentially.
  • Hash - Dedicated SQL pool uses a hash function to deterministically assign each row to one distribution based on a defined key (table column).
  • ReplicateA full copy of the replicated table will be cached on each compute node.

To learn how to use the different distribution types with sample scenarios, please read Synapse SQL architecture.

 

Example of query syntax of a table creation using Round Robin: 

 

 

--Round Robin CREATE TABLE [dbo].[TableName] ( [ColumnName] Datatype NOT NULL/NULL, [ColumnName] Datatype NOT NULL/NULL, [ColumnName] Datatype NOT NULL/NULL, ) WITH ( DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX )

 

 

Concurrency

I will provide a brief summary of this topic as delving into details deserves a separate blog post. Within your dedicated SQL pools, you want to organize the priority of the user execution and the amount of resources their queries would be able to get. Concurrency is limited per service level and can be managed and prioritized using workload management. To learn more, please read Workload management. For additional information about concurrency limits, please read Memory and concurrency limits

 

Some scenario examples:

  • Physical data warehouse - as data can be stored, organized, and modeled inside of the database
  • Reports
  • Dashboards that demand a fast sub response
  • A scenario where managing query concurrency is a business need. Dedicated SQL Pools enable you to govern the query execution resources inside of the database.

 

Serverless SQL Pool

 

A Serverless SQL pool endpoint will be available on your Azure Synapse workspace by default. You can choose whether to use or not. As it's billed on demand, there is no cost if it's not in use. The Serverless SQL pool endpoint allows you to create multiple databases and you can use it to query data from your Azure Data Lake (ParquetDelta Lakedelimited text formats), Cosmos DB, or Dataverse.

 

Your Serverless SQL pool is managed and will scale up or down automatically. There's no infrastructure to set up or clusters to maintain. This means the topology changes over time by adding and removing nodes or failovers as needed. For example, the figure below (Fig 2) shows a query using 4 backend nodes. If necessary, Serverless SQL pool would be able to scale up to more nodes.  

 

At a high level, the flow is as follows. A query submitted to the front-end node (Control Node) will create an execution plan and pass it to Distributed Query Processing which will break this execution into "pieces/tasks" and assign them to Backend Nodes (BE) for execution. The BEs will receive assigned tasks to execute the query against the storage files organized in units.

 

Liliam_Leme_0-1660037650896.png

Fig 2

 

Selects are fully supported, as this option will be used to query and organize files from storage but will not store them in a database structure. DML operations such as Insert, Update, and Delete are not supported. Table creation is also not supported. Only external tables are supported in this context. To learn how to use CETAS, please read How to use CETAS on serverless SQL pool to improve performance and automatically recreate it. For more information about T-SQL support, please read T-SQL feature in Synapse SQL pool.

 

Concurrency

Concurrency is not limited as in Dedicated SQL pool. In a Serverless SQL pool, you can have a flexible number of active queries, not a fixed value. To learn how to monitor open connections, please read Monitoring Synapse serverless SQL open connections.

  • The number of active sessions and requests depend on query complexity and amount of data scanned.  
  • As with any SQL, a Serverless SQL pool can handle many different sessions that are executing lightweight queries or complex heavy queries consuming most of the resources while the other queries wait.

Some scenarios:

  • Basic discovery and exploration.
  • Logical data warehouse - As data is organized in files on your storage account. Views and external tables can be used to organize them.
  • Reports.
  • Data transformation.

 

Summary

The table below summarizes how Serverless and Dedicated SQL pools compare (Table 1):

 

Features

Serverless SQL Pool

Dedicated SQL Pool

Select Surface Supported

Yes

Yes

External Tables

Yes

Yes

Delta File Support

Yes

No

Query Spark Tables

Yes

No

Concurrency

Not limited

Limited according to Service Level  (check docs)

Concurrency managed (resource governor)

No

Yes

Create Table

No

Yes

Insert, Update and Delete commands

No

Yes

View

Yes

Yes

Materialized View

No

Yes

Index creation

No

Yes

Custom management of table distribution like Hash, Round Robin, and Replicated.

No

Yes

Works with ADF

Yes

Yes

Works with Power BI

Yes

Yes

CETAS

Yes

Yes

Pay per use (data processed by query)

Yes

No

Pay by minutes online (managed by a pause and resume operations)

No

Yes

 

Table 1

 

Note: I am writing this post in August 2022 and the information included above is currently validated by docs. It is recommended to always confirm and look for updates and changes in Microsoft's official documentation.

 

For reference, please read:

What is dedicated SQL pool (formerly SQL DW)?

Serverless SQL pool

Synapse SQL architecture

Memory and concurrency limits

 

And that's it!

 

I hope it helps,

 

Liliam 

UK Engineer

Published on:

Learn more
Azure Synapse Analytics Blog articles
Azure Synapse Analytics Blog articles

Azure Synapse Analytics Blog articles

Share post:

Related posts

Stay up to date with latest Microsoft Dynamics 365 and Power Platform news!
* Yes, I agree to the privacy policy