Getting rid of credentials in Azure - Part 4 (Kubernetes)
The journey to rid ourselves of credentials in Azure continues, and this time we'll tackle a much in demand service - Azure Kubernetes Service (AKS). Because while Azure App Services, which we have covered in previous parts, are nice and dandy they simply don't serve all your microservice needs.
AKS is kind of a double feature when it comes to going credless:
- The first is what we've been doing up until now. The provisioning and configuration of the cluster is handled by a GitHub Action which is trusted by Azure through using a federated credential.
- The second is that Kubernetes is a system of its own and part of that is having a credential and identity model inside the cluster regardless of Azure. This can in turn be configured to have Azure trust the cluster and create a bridge between the cluster components and Azure resources.
We covered managed identities in the last part so you might be thinking what's different about this. Well, you can use managed identities where you want the cluster resource itself having access to Azure, but this doesn't necessarily extend into the containers running in the cluster. Workload identities allows us to inject Azure AD identities into individual pods to have them perform tasks without dealing with client ids and secrets directly.
For instance it could be that the cluster needs to be able to access a container registry to provision containers - this makes sense to solve with a managed identity belonging to the cluster since that clearly comes before you have the app instances running. However, if an application running in the cluster needs access to a SQL server that is more likely to be specific to the application than cluster-wide. (I'm not accounting for shared database server setups here.)
The concept is explained on the official GitHub page for Azure Workload Identity:
https://azure.github.io/azure-workload-identity/docs/concepts.html
And in case you noticed there's a quick start there I drew inspiration (and some files) from that, but those instructions are geared more towards you working manually from Linux or Mac. I made the necessary adjustments for it to work in a CI/CD pipeline in an automated fashion instead. If you're trying to follow the instructions you might find them being slightly unclear in some areas and I would have to agree there can be a hurdle to get everything right at first try. The docs gets stuff done, but is not at a GA level quality yet. (Since the feature is in a preview this is to be expected, and I would rather have the team working on the code as long as I'm able to get the basics working.)
If you followed along when we looked at EasyAuth you might be thinking this is along the same lines. The lower level details are probably different, but you can understand it as something similar on a high level - "something" outside the application code performs some magic that does something for you so there is less need for the identity specific secrets to be part of the code.
As always - it's better to illustrate with sample code. We employ the same base level as previous posts - we store Bicep code for IaC on GitHub and through federated credentials we allow GitHub Actions to reach into Azure and execute scripts.
Check out part 1 if you don't have the basics configured yet:
https://techcommunity.microsoft.com/t5/azure-developer-community-blog/getting-rid-of-credentials-in-azure-part-1/ba-p/3265205
You're obviously encouraged to check out part 2 & 3 as well if you haven't, but this post has no dependencies on those.
All the code can be found on GitHub:
https://github.com/ahelland/Bicep-Landing-Zones/tree/main/credless-in-azure-samples/part-4
What we want to provision here is an Azure Kubernetes Service cluster, install/configure workload identities, and install a sample app that uses a Kubernetes Service Account to authenticate against MS Graph. Wheh, that's a mouthful :)
It's sort of an elaborate process so I will walk through the Action step by step and explain along the way.
Create Azure AD AKS Admin Group
We need a group in Azure AD containing the admins that are allowed to manage the cluster. Note that this is for control plane access - not data level access.
We don't actually add members to the group; you can do that separately if you like; we just need a group present. As the owner of the sub I'm working with this is not a roadblock as I can still access the cluster resource.
Deploy Azure Kubernetes Service and Azure Container Registry
We need an Azure Container Registry and a cluster plus the network to attach it to, and this is deployed with Bicep:
We're not creating a cluster with a lot of complex configurations, hardened to the nines and things - the main thing is enabling the workload identity feature:
Assign AKS RBAC role to the GitHub Action user
The GitHub Action needs to be able to configure things in the cluster, not just create it, and needs data plane admin access to do so. This is done by assigning the "Azure Kubernetes Service RBAC Cluster Admin" role.
When you've reached this step you should also go into the Azure Portal to give yourself the same permission. Even if you are the subscription admin you will not be able to look at running containers, services, etc. and without that you are not able to retrieve the IP address of the sample app. It takes a couple of minutes for the role assignment to kick in so you might as well do it now while GitHub is still working on the rest. (If you happen to look at the logs live at least; no harm done if you wait for the process to complete and then do this.)
Get OIDCUrl
The tokens issued by Kubernetes are "regular" JWTs and one of the main things is having Azure AD trust the issuer (aka your cluster). We enabled an OIDC metadata endpoint with Bicep, and since we need the url for our later steps we retrieve the url. Note that it requires the aks-preview extension to work; if you don't have this installed you will not get a value in return when querying Azure and appear as you didn't configure this.
Get AKS Credentials
Perhaps not so surprising, but we need the credentials for the cluster to perform any actions at all.
The problem with the credentials we have retrieved are that they are intended for interative use. By default you will see in the live logs that there is a code you need to use to authenticate separately outside the Action, and as you probably understand that's sort of tricky doing in an automated fashion so we need to perform some extra steps to make it work in a pipeline.
I needed some help in figuring out the exact details of this so the next three steps are courtesy of Weinong Wang:
Get kubelogin
There's actually a tool for manipulating the kubeconfig file:
https://github.com/Azure/kubelogin
We install it with Brew in this step.
Convert kubeconfig for non-interactive use
There are several options when it comes to the output format you want/need, and we specify workloadidentity. (Check the kubelogin repo for more details.)
Retrieve id-token and store it
With a more appropriate kubeconfig we can set ourselves up with a token that will allow GitHub to not only talk to Azure in general, but our AKS cluster specifically:
Install Mutating Admission Webhook
Kubernetes can do automagic things when you submit yaml files, and since we neither want to (nor are we easily capable of) preprovisioning the token in our application specification we need to have a component handling the injection of the necessary variables into the running containers. This is handled by a mutating admission webhook - further explained here:
https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/
Create Service Principal for Workload Identity
We need a separate service principal in Azure AD for our workload identity so we provision that accordingly:
Install Service Account
Using the attributes of the service principal we just created we provision a service account in our cluster:
Establish Federated Credential
Once we have the service principal in Azure AD and the service account in Kubernetes we create a federation between the two:
The step is set to continue on error. This is because after the initial run-through the API will throw an error since the credential has already been created. I didn't bother with making this more robust with conditionals.
If you delete the cluster you should also either delete the federated credential or the service principal itself (named sp-credless-aks) or you might run into other issues.
Retrieve name of Container Registry
We want our AKS cluster to pull from our own registry, and that requires us to know the name of the registry. (Since the name needs to be globally unique we have a random element in the name and we need to query Azure to figure out what the resource is actually called.)
Integrate ACR and AKS
Instead of supplying credentials in the deployment yaml files we set up ACR to trust our AKS cluster. This is technically done behind the scenes by creating managed identities and assigning a role, so we could embed it as part of our Bicep, but for our sample it's more flexible doing it with a simple Azure cli command.
Build and push backend container to ACR
This is where things get more exotic if you are used to using a client id and secret to acquire a token from Azure AD in your code.
Actually, let's rewind things one step, to explain what we want to do. We said that we wanted an application calling into the MS Graph deployed as part of this Action. Mainly as a proof things are working more than doing anything useful. For the purposes of this demo we do this API call as a backend task and create a separate C# solution for that. (You can run through the API project wizard in Visual Studio to get something similar to the code here.)
We perform some MSAL trickery using our federated identity to acquire a new token:
This is invoked when the frontend calls into the API, and we subsequently call into the MS Graph to acquire the organization name of the tenant:
This is based on a sample from the workload identity team:
https://github.com/Azure/azure-workload-identity/tree/main/examples/msal-net/akvdotnet
The code for the app is stored in the same repo as the rest of our setup, so what we do is have the Action also package up things into a Docker image and push to our container registry at the same time - no external dependencies here.
Build and push frontend container to ACR
The sample app is clearly not very advanced, so you could ask why we have a frontend and not just a single web app that does it all. There's always the benefit of verifying that the Kubernetes network is up to scratch of course, but on a feature level in AKS there are things that we have not configured - we're not doing SSL, which in turn means we're not able to do login for users, and we haven't set up DNS either ending up accessing the app over plain http with a public IP address. So, while it's not a security mechanism per se it's cleaner to do the auth pieces on the backend where it's less accessible. (Not that I recommend this to be used as-is outside PoC use anyways.)
So, we package up and upload the frontend image the same way as the backend.
Add Permissions
The federated identity is tied to the service principal we created, and using the Graph still requires having permission to do so. Since we perform this as a server side app we need an application permission (as opposed to a delegated permission when accessing things as a user) and I've chosen User.Read.All for this. The thing is that this requires admin consent to actually work and Azure doesn't like one non-interactive user granting access for another non-interactive user so running the script like this will not work out of the box.
There's two main workarounds for this - throw the GitHub Action user into something like the Global Admin role (the permission to read the Graph is not on the subscription level, but on the AAD tenant), or manually login to the portal and click the "OK" button :) (I chose option #2.)
The Action step will still go through without errors even it the permission isn't actually granted.
Deploy workload-identity-app
If everything checked out so far we're at the step where the sample app is deployed. Just replace the necessary variable replacements and off you go.
You can use your cli locally and get credentials for connecting to the cluster, but if you just want to browse the app it's probably quicker to go to the Azure Portal and have a look in the Services and Ingresses blade:
Since we had the manual step with granting permissions you might get lucky and see a response immediately, or it might take a little while before it ends up looking like this:
I don't know about you, but I think this is a pretty neat party trick :)
Closing remarks
Clearly you would want to separate the provisioning of the cluster and packaging & deploying an app if you want to take things further, but either way it shows how useful workload identities can be.
You might also interject that I have tons of things I'm not doing here (that I should be doing) like making it work with a service mesh, or do deployments with Flux, etc. I'd love to do something like that, but that's not a task for today :)
We have just created one workload identity and we haven't really done anything to prevent different apps from re-using the same identity. This could lead to very bad things depending on which permissions you grant to the service principal, so do make sure you have a handle on this before going live.
Are we done yet? With this post - yes. With the series - no :) This was mostly infra details with some magic in the background, but we'll need to work more on the user-facing pieces.
Published on:
Learn moreRelated posts
Azure Developer CLI (azd) – November 2024
This post announces the November release of the Azure Developer CLI (`azd`). The post Azure Developer CLI (azd) – November 2024 appeared...
Microsoft Purview | Information Protection: Auto-labeling for Microsoft Azure Storage and Azure SQL
Microsoft Purview | Information Protection will soon offer Auto-labeling for Microsoft Azure Storage and Azure SQL, providing automatic l...
5 Proven Benefits of Moving Legacy Platforms to Azure Databricks
With evolving data demands, many organizations are finding that legacy platforms like Teradata, Hadoop, and Exadata no longer meet their needs...
November Patches for Azure DevOps Server
Today we are releasing patches that impact our self-hosted product, Azure DevOps Server. We strongly encourage and recommend that all customer...
Elevate Your Skills with Azure Cosmos DB: Must-Attend Sessions at Ignite 2024
Calling all Azure Cosmos DB enthusiasts: Join us at Microsoft Ignite 2024 to learn all about how we’re empowering the next wave of AI innovati...
Query rewriting for RAG in Azure AI Search
Getting Started with Bicep: Simplifying Infrastructure as Code on Azure
Bicep is an Infrastructure as Code (IaC) language that allows you to declaratively define Azure resources, enabling automated and repeatable d...
How Azure AI Search powers RAG in ChatGPT and global scale apps
Millions of people use Azure AI Search every day without knowing it. You can enable your apps with the same search that enables retrieval-augm...