Efficient Identity Management in Azure Chaos Studio for Secure Fault Injection
In my previous blog on chaos engineering, I talk about how to improve application resilience with Azure Chaos Studio. Fault injection impacts resources directly, and it must be done in a controlled and secure way to prevent unforeseen damages to your application. Hence, in this post, I show you how to efficiently manage permissions to perform secure fault injection.
Our customers have been asking for the ability to reuse fault injection permissions across multiple chaos experiments. And we listen! Earlier in August 2023, the Azure Chaos Studio team launched the public preview of User-Assigned Managed Identity and Custom Role Assignment. There are 2 types of Azure managed identities. User assigned managed identity is a standalone Azure resource that can be managed separately from the resource(s), unlike a system assigned managed identity that can only be used by a single azure resource. With user assigned managed identity, the same set of permissions can be used across multiple Azure chaos engineering experiments and other Azure resources. Previously, with system managed identities, the identity is tied to, and lives with the experiment. The additional capability of custom role assignment lets Chaos Studio to automatically create and assign a custom role to your experiment's identity if it requires permissions outside the scope of your selected identity. I will show how both these features can be used with the following example scenarios.
Scenario 1: Reuse of managed identities
Let’s take a scenario where you are testing a Cosmos DB failover. You have access to an existing user assigned identity named “UmiDemo” that has been reviewed by your security team to perform chaos experiments for your project. This identity is already being used for other chaos experiments. Now you want to use this identity for the Cosmos DB failover experiment without needing to create a new identity with same permissions which needs another security review. This is possible with the launch of user assigned identity support in chaos studio which allows more than one experiments to use the same identity.
As shown in the following screenshot, UmiDemo role has the Cosmos DB Operator permissions required for the Cosmos DB failover experiment.
You can directly go to the Chaos Studio’s create an experiment screen to create the failover experiment. On the permissions tab, under managed identities section, make sure to select user assigned identity option and add the Umi demo identity to which has already been verified to have the required permissions.
Scenario 2: Custom role assignment to managed identity
In this scenario, say you have a chaos experiment for shutting down a VM in your application. The experiment uses a managed identity called “vm_chaosexperiment” which has fault injection permissions to shut down a VM. Now, your team wants to add a new step of injecting another VM redeploy fault to the same experiment. You can do so without the burden of creating a new managed identity or a new role. The custom role assignment feature takes care of adding the new set of permissions automatically to the managed identity “vm_chaosexperiment” attached to the experiment.
As shown in the following screenshot you have an experiment with managed identity “vm_chaosexperiment”. You have to make sure the “enable custom role creation and assignment” option under custom role permissions section is enabled in order to add necessary permissions to the managed identity.
When you add VM redeploy fault to the same experiment which has custom role permissions enabled, a custom role has been created with additional permissions and assigned to the attached managed identity.
Note that only if you select the custom role assignment option and you have enough permissions from your security team to create the role, the role assignment happens to the managed identity. This improves security posture as not everyone can add the permissions to inject fault automatically. It's more useful if you have multiple steps and need multiple permissions in the same experiment.
With efficient management of the permissions for injecting faults to your chaos experiments, you can improve the experience of performing chaos engineering with Azure Chaos Studio. Happy chaos engineering and may your team shine with “We were prepared for that!” and “no surprises!” on your game day!!
What’s next?
- Get started by visiting our documentation or get started in the Azure portal. Let the chaos begin!
- If you’d like to learn more about the general principles prescribed by Microsoft, we recommend Microsoft Cloud Adoption Framework for platform and environment-level guidance and Azure Well-Architected Framework. You can also register for an upcoming workshop led by Azure partners on cloud migration and adoption topics and incorporate click-through labs to ensure effective, pragmatic training.
About the Author
Harshitha Putta is a Senior Cloud Solutions Architect in the international Customer Success Unit at Microsoft. As the cloud business continues to experience hyper-growth, she helps the customers to build, grow and enable their cloud.
Published on:
Learn moreRelated posts
November Patches for Azure DevOps Server
Today we are releasing patches that impact our self-hosted product, Azure DevOps Server. We strongly encourage and recommend that all customer...
Configuring Advanced High Availability Features in Azure Cosmos DB SDKs
Azure Cosmos DB is engineered from the ground up to deliver high availability, low latency, throughput, and consistency guarantees for globall...
IntelePeer supercharges its agentic AI platform with Azure Cosmos DB
Reducing latency by 50% and scaling intelligent CX for SMBs This article was co-authored by Sergey Galchenko, Chief Technology Officer, Intele...
From Real-Time Analytics to AI: Your Azure Cosmos DB & DocumentDB Agenda for Microsoft Ignite 2025
Microsoft Ignite 2025 is your opportunity to explore how Azure Cosmos DB, Cosmos DB in Microsoft Fabric, and DocumentDB power the next generat...
Episode 414 – When the Cloud Falls: Understanding the AWS and Azure Outages of October 2025
Welcome to Episode 414 of the Microsoft Cloud IT Pro Podcast.This episode covers the major cloud service disruptions that impacted both AWS an...