
How to use Cosmos DB at extreme scale with large document sizes


Introduction

First, let’s provide an overview of how Cosmos DB is being used in this scenario. As the architectural overview below shows, there is an ecosystem of Customer systems that emit data; this data flows through a pipeline and is then inserted into Cosmos DB via a Batch process.

[Architecture diagram: Customer systems feed an ingestion pipeline, which batch-inserts data into Cosmos DB.]

The original design from 2019 had only a single container, Data, which contained all of the documents ingested into Cosmos DB. As the Data Container grew, query performance degraded and the RU/s cost became prohibitive, so an Index Container was introduced to function as an 'Index' for the Data Container. This design change mandated the following data access pattern:

  1. Query the Index container for the Document ID
  2. Retrieve the Document from the Data Container via a Query using the Document ID
  3. If the Document is >2MB in size, the Document payload will have a Blob Storage Pointer
  4. Retrieve the Document from Blob Storage via GET Operation

At minimum, retrieving a document always takes two Cosmos DB operations, with an additional Blob GET when the document is larger than 2MB; a sketch of this read path follows.
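To make the cost of this pattern concrete, here is a minimal sketch of the read path in Python, using the azure-cosmos and azure-storage-blob SDKs. The account endpoint, container names, and document properties (businessKey, blobPointer) are hypothetical placeholders for the customer's actual schema.

```python
from azure.cosmos import CosmosClient
from azure.storage.blob import BlobServiceClient

# Hypothetical endpoints and names for illustration only.
cosmos = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
db = cosmos.get_database_client("appdb")
index_container = db.get_container_client("Index")
data_container = db.get_container_client("Data")
blobs = BlobServiceClient.from_connection_string("<storage-connection-string>")

def get_document(business_key: str, partition_key: str) -> dict:
    # 1. Query the Index Container for the document ID.
    rows = list(index_container.query_items(
        query="SELECT c.documentId FROM c WHERE c.businessKey = @key",
        parameters=[{"name": "@key", "value": business_key}],
        partition_key=partition_key,
    ))
    doc_id = rows[0]["documentId"]

    # 2. Retrieve the document from the Data Container via a query on its ID.
    docs = list(data_container.query_items(
        query="SELECT * FROM c WHERE c.id = @id",
        parameters=[{"name": "@id", "value": doc_id}],
        partition_key=partition_key,
    ))
    doc = docs[0]

    # 3./4. Documents >2MB hold a Blob Storage pointer instead of the payload.
    if "blobPointer" in doc:
        blob = blobs.get_blob_client(container="documents", blob=doc["blobPointer"])
        doc["payload"] = blob.download_blob().readall()
    return doc
```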

The Index Container then underwent further changes: Composite Indexes were introduced to optimise the RU/s cost of running queries against it as its size increased in step with the Data Container. Several challenges are plain when reviewing this architecture:

  • 2 or 3 operations are needed to read a single document.
  • The Index Container necessarily duplicates data held in the Data Container.
  • The size of the Data Container and therefore the Index Container are inextricably linked: as the Data Container grows, so must the Index Container.
  • A document effectively does not exist until it can be found, so the Index Container write must succeed for data consistency to be achieved.
  • The overall evolvability of this architecture is limited.

It’s clear that this architecture is running out of evolutionary capacity: how many more times can it reasonably change within this design? Not many. Introducing Composite Indexes was the last change that could be made within this architecture with a reasonable performance and RU/s benefit. A series of architectural changes is now needed to move this architecture back to a baseline position and allow for further evolution.


Recommendations

When evaluating what to change, we had the benefit of taking a fresh look at the capabilities of Cosmos DB and at the current and future needs of the customer. Cosmos DB has evolved significantly since 2019 and now has features and companion services that solve the challenges presented.


The changes that were recommended are as follows:

  • Enable Dynamic Autoscaling
    • After enabling this feature, the Customer experienced an immediate and significant cost reduction across their Dev/Test/Pre-Prod and Production Cosmos DB accounts.
    • Dynamic Autoscaling allows partitions to scale independently, improving cost efficiency for non-uniform, large-scale workloads with many partitions (a provisioning sketch follows this list).
  • Smooth out Batch Insert/Update/Delete data processing
    • Cosmos DB is billed by the hour, so it’s important to remember that how you use Cosmos DB influences cost. In this scenario, a Batch process executed against both containers to update data.
    • This, however, set the RU cost between 45% and 55% higher than the typical data access pattern.
    • Smoothing the Batch out, at the expense of update speed, would help mitigate this cost increase (a throttled writer is sketched after this list).
  • Enable Cosmos DB Integrated Cache
    • Using the Cosmos DB Integrated Cache means there is no RU cost for a repeated query.
    • The Integrated Cache requires no administration or management.
    • Each dedicated gateway node can store up to 64GB of independent cache.
    • High-RU, complex queries and point reads larger than 16KB would be expected to benefit from the Integrated Cache (a query sketch follows this list).
  • Enable Hierarchical Partition Keys
    • Hierarchical Partition Keys allow data to be sub-partitioned, up to 3 levels deep, which is beneficial for large, read-heavy documents (see the sketch after this list).
    • Queries are efficiently routed to only the subset of physical partitions that contain the relevant data when either the full or a partial sub-partitioned key path is specified.
  • Introduce a Container per Document Type
    • This provides greater clarity over the composite indexes that can be applied per document type, which is especially useful if documents have a variety of property combinations.
    • It also isolates high-throughput document types that may not benefit enough from Hierarchical Partition Keys.
    • Combined with Dynamic Autoscaling, these high-throughput document-type containers can scale independently.
  • Retire the Index Container
    • Adopting the correct container strategy allows the Index Container to be retired, enabling single-write consistency, mitigating the extra operation costs, and removing 38TB of duplicated data.
  • Implement Cosmos DB Analytical Store
    • Cosmos DB Analytical Store is a fully isolated column store optimised for analytical queries, as opposed to running queries of this type against the Transactional Store and incurring an RU/s cost.
    • Analytical Store queries are served at 0 RU/s.
    • Synapse Link seamlessly replicates data from the Transactional Store to the Analytical Store within 2 minutes.
    • It underpins a more efficient data discovery strategy, as custom partitioning is supported (enabling the Analytical Store is sketched after this list).
  • Replicate data using the Cosmos DB Change Feed
    • If there is a need to replicate data from Cosmos DB to another database, e.g. Azure SQL, use the Cosmos DB Change Feed with Azure Functions (a trigger sketch follows this list).
    • This pattern is capable of 100,000+ requests per second.
  • Use Premium Blob Storage
    • Storing Documents >2MB on Azure Blob Storage is a recommended design pattern for Cosmos DB.
    • Premium Blob Storage offers lower and more consistent latency, which helps when reading these larger documents.
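The sketches below illustrate several of these recommendations in Python, in the order they appear above; all account endpoints, keys, and names are hypothetical placeholders. First, provisioning a container with autoscale throughput using the azure-cosmos SDK. Note that Dynamic Autoscaling (per-partition, per-region scaling) is itself toggled at the account level in the portal or CLI; the SDK is only used here to provision the autoscale maximum that dynamic scaling then works within.

```python
from azure.cosmos import CosmosClient, PartitionKey, ThroughputProperties

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
db = client.create_database_if_not_exists(id="appdb")

# Autoscale throughput: the container scales between 10% of the maximum
# (1,000 RU/s) and the 10,000 RU/s ceiling set here.
container = db.create_container_if_not_exists(
    id="invoices",
    partition_key=PartitionKey(path="/customerId"),
    offer_throughput=ThroughputProperties(auto_scale_max_throughput=10000),
)
```

The same provisioning pattern applies to the container-per-document-type recommendation: each document type gets its own container, and with Dynamic Autoscaling enabled each one scales independently.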
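Next, a minimal sketch of smoothing out batch writes. The idea is simply to pace the write rate so the batch no longer pushes RU consumption 45-55% above the normal profile for the billed hour; the cap value is an assumption to be tuned against the provisioned throughput.

```python
import time
from azure.cosmos import CosmosClient

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("appdb").get_container_client("invoices")

def smoothed_upsert(items, max_ops_per_second: int = 200) -> None:
    # Trade update speed for a flatter RU/s profile by pacing the writes.
    interval = 1.0 / max_ops_per_second
    for item in items:
        container.upsert_item(item)
        time.sleep(interval)
```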
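For the Integrated Cache, the client must connect through the account's dedicated gateway endpoint (note the .sqlx host) and use session or eventual consistency; repeated queries served from the cache then cost 0 RU. The staleness keyword below follows the current azure-cosmos SDK; treat the exact names as an assumption to verify against the SDK version in use.

```python
from azure.cosmos import CosmosClient

client = CosmosClient(
    "https://myaccount.sqlx.cosmos.azure.com/",  # dedicated gateway endpoint
    credential="<key>",
    consistency_level="Eventual",
)
container = client.get_database_client("appdb").get_container_client("invoices")

# Bound how stale a cached result may be before the gateway refreshes it
# from the backend; cache hits within the window are served at 0 RU.
results = list(container.query_items(
    query="SELECT * FROM c WHERE c.customerId = @id",
    parameters=[{"name": "@id", "value": "contoso"}],
    partition_key="contoso",
    max_integrated_cache_staleness_in_ms=5000,
))
```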
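Hierarchical Partition Keys are declared at container creation. The three-level path below (tenantId/documentType/documentId) is illustrative; a point read supplies the full key path, while queries that specify a key prefix are routed only to the physical partitions holding matching data.

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
db = client.create_database_if_not_exists(id="appdb")

# A sub-partitioned container, up to 3 levels deep (kind="MultiHash").
container = db.create_container_if_not_exists(
    id="documents",
    partition_key=PartitionKey(
        path=["/tenantId", "/documentType", "/documentId"],
        kind="MultiHash",
    ),
)

# Point read with the full sub-partitioned key path.
doc = container.read_item(
    item="doc-001",
    partition_key=["contoso", "invoice", "doc-001"],
)
```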
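The Analytical Store is enabled per container; setting analytical_storage_ttl to -1 retains the column-store copy indefinitely (Synapse Link must also be enabled on the account).

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
db = client.create_database_if_not_exists(id="appdb")

# analytical_storage_ttl=-1: replicate to the Analytical Store and keep forever.
container = db.create_container_if_not_exists(
    id="invoices-olap",
    partition_key=PartitionKey(path="/customerId"),
    analytical_storage_ttl=-1,
)
```

From a Synapse Spark notebook (where spark is predefined and CosmosDbLink is an assumed linked service name), the replicated data can then be queried at 0 RU/s:

```python
df = (spark.read.format("cosmos.olap")
      .option("spark.synapse.linkedService", "CosmosDbLink")
      .option("spark.cosmos.container", "invoices-olap")
      .load())
```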
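Finally, a sketch of the Change Feed replication pattern as an Azure Functions Cosmos DB trigger, using the Python v2 programming model. The parameter names follow the 4.x Cosmos DB extension, and the connection setting, database, and container names are placeholders.

```python
import azure.functions as func

app = func.FunctionApp()

@app.cosmos_db_trigger(
    arg_name="documents",
    database_name="appdb",
    container_name="invoices",
    connection="CosmosDbConnection",  # app setting holding the connection string
    lease_container_name="leases",
    create_lease_container_if_not_exists=True,
)
def replicate_changes(documents: func.DocumentList) -> None:
    # Each invocation delivers a batch of changed documents from the feed;
    # forward them to the downstream store, e.g. Azure SQL.
    for doc in documents:
        ...
```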


Conclusion

We’ve covered a lot of ground, but hopefully you now have an understanding of the Cosmos DB features that can be enabled to solve challenging requirements at scale.


When designing an architecture, it’s critical to ensure it remains evolvable. The way to achieve this is to conduct regular reviews of the architecture against ‘current’ best practices, product capabilities and the needs of the business – and refactor/rearchitect where necessary.

Microsoft invests in the Well-Architected Framework, and we can conduct these reviews in partnership with you, or you can complete them independently. Conduct these reviews at least once a year, and ideally every six months.


The more you can adopt the features of the product to solve your requirements, the more evolvable your architecture will be over the longer term. One of the key aims, in this scenario, was to move the Customer to a ‘baseline’ position.


We wanted to reset their Cosmos DB architecture and enable more Cosmos DB features to solve their requirements as opposed to bespoke implementations and workarounds within the incumbent architecture.

