Understand Synapse dedicated SQL pool (formerly SQL DW) and Serverless SQL pool
Two recurring questions I frequently get from customers are: "What is the difference between Synapse dedicated SQL pool (formerly SQL DW) and Serverless SQL pool?" and "Which one should I choose for my Business?"
This post is intended to explain the basic concepts of dedicated SQL pool and Serverless SQL Pool, help you understand how they work, and how to use them based on your business needs.
Dedicated SQL pool and serverless SQL pool are analytics runtimes of Azure Synapse Analytics. Both allow you to work with data using SQL.
Dedicated SQL pool
One or more dedicated SQL pools can be added to a workspace (for reference, please read Quickstart: Create a dedicated SQL pool using Synapse Studio). If not in use, you can pause your dedicated SQL pool to avoid unnecessary costs. When you want to use it again, you will need to resume your dedicated SQL pool first and it will be billed once the database is online. Different Service levels (DW100c to DW30000c) have different costs. DWUs define the number of resources available. DWU stands for Data Warehousing Units and determine resources provisioned, considering a combination of CPU, memory, and IO. To learn more, please read Pricing and Memory and concurrency limits.
Dedicated SQL pools have 60 fixed distributions. When a table is created, data is shared amongst them.
At a high level, the basic query flow is as follows: when a query is executed, it will be sent to the Control Node (Fig 1). The Control node is the brain of the architecture that will optimize and coordinate the parallel queries. Compute nodes provide the computational power for the execution and the distributions will be mapped to those nodes according to the DW size. Using this concept, the engine will divide the work of executing the query into 60 smaller queries that will run in parallel to obtain data from the distributions. Data distribution patterns will be chosen when the table is created.
Fig 1
These are the types of distribution data patterns that you can define at table creation:
- Round Robin - A distribution is first chosen at random and then buffers of rows are assigned to distributions sequentially.
- Hash - Dedicated SQL pool uses a hash function to deterministically assign each row to one distribution based on a defined key (table column).
- Replicate - A full copy of the replicated table will be cached on each compute node.
To learn how to use the different distribution types with sample scenarios, please read Synapse SQL architecture.
Example of query syntax of a table creation using Round Robin:
Concurrency
I will provide a brief summary of this topic as delving into details deserves a separate blog post. Within your dedicated SQL pools, you want to organize the priority of the user execution and the amount of resources their queries would be able to get. Concurrency is limited per service level and can be managed and prioritized using workload management. To learn more, please read Workload management. For additional information about concurrency limits, please read Memory and concurrency limits
Some scenario examples:
- Physical data warehouse - as data can be stored, organized, and modeled inside of the database
- Reports
- Dashboards that demand a fast sub response
- A scenario where managing query concurrency is a business need. Dedicated SQL Pools enable you to govern the query execution resources inside of the database.
Serverless SQL Pool
A Serverless SQL pool endpoint will be available on your Azure Synapse workspace by default. You can choose whether to use or not. As it's billed on demand, there is no cost if it's not in use. The Serverless SQL pool endpoint allows you to create multiple databases and you can use it to query data from your Azure Data Lake (Parquet, Delta Lake, delimited text formats), Cosmos DB, or Dataverse.
Your Serverless SQL pool is managed and will scale up or down automatically. There's no infrastructure to set up or clusters to maintain. This means the topology changes over time by adding and removing nodes or failovers as needed. For example, the figure below (Fig 2) shows a query using 4 backend nodes. If necessary, Serverless SQL pool would be able to scale up to more nodes.
At a high level, the flow is as follows. A query submitted to the front-end node (Control Node) will create an execution plan and pass it to Distributed Query Processing which will break this execution into "pieces/tasks" and assign them to Backend Nodes (BE) for execution. The BEs will receive assigned tasks to execute the query against the storage files organized in units.
Fig 2
Selects are fully supported, as this option will be used to query and organize files from storage but will not store them in a database structure. DML operations such as Insert, Update, and Delete are not supported. Table creation is also not supported. Only external tables are supported in this context. To learn how to use CETAS, please read How to use CETAS on serverless SQL pool to improve performance and automatically recreate it. For more information about T-SQL support, please read T-SQL feature in Synapse SQL pool.
Concurrency
Concurrency is not limited as in Dedicated SQL pool. In a Serverless SQL pool, you can have a flexible number of active queries, not a fixed value. To learn how to monitor open connections, please read Monitoring Synapse serverless SQL open connections.
- The number of active sessions and requests depend on query complexity and amount of data scanned.
- As with any SQL, a Serverless SQL pool can handle many different sessions that are executing lightweight queries or complex heavy queries consuming most of the resources while the other queries wait.
Some scenarios:
- Basic discovery and exploration.
- Logical data warehouse - As data is organized in files on your storage account. Views and external tables can be used to organize them.
- Reports.
- Data transformation.
Summary
The table below summarizes how Serverless and Dedicated SQL pools compare (Table 1):
Features |
Serverless SQL Pool |
Dedicated SQL Pool |
Select Surface Supported |
Yes |
Yes |
External Tables |
Yes |
Yes |
Delta File Support |
Yes |
No |
Query Spark Tables |
Yes |
No |
Concurrency |
Not limited |
Limited according to Service Level (check docs) |
Concurrency managed (resource governor) |
No |
Yes |
Create Table |
No |
Yes |
Insert, Update and Delete commands |
No |
Yes |
View |
Yes |
Yes |
Materialized View |
No |
Yes |
Index creation |
No |
Yes |
Custom management of table distribution like Hash, Round Robin, and Replicated. |
No |
Yes |
Works with ADF |
Yes |
Yes |
Works with Power BI |
Yes |
Yes |
CETAS |
Yes |
Yes |
Pay per use (data processed by query) |
Yes |
No |
Pay by minutes online (managed by a pause and resume operations) |
No |
Yes |
Table 1
Note: I am writing this post in August 2022 and the information included above is currently validated by docs. It is recommended to always confirm and look for updates and changes in Microsoft's official documentation.
For reference, please read:
What is dedicated SQL pool (formerly SQL DW)?
And that's it!
I hope it helps,
Liliam
UK Engineer
Published on:
Learn more