Best practices to harden your AKS environment

Hi,

AKS is taking up more and more space in the Azure landscape, and there are a few best practices you can follow to harden the environment and make it as secure as possible. As a preamble, remember that all containers on a host share the same kernel, which they reach through system calls, so isolation in the container world is not as strong as with virtual machines, let alone physical hosts. Mistakes can quickly lead to security issues.

1. Hardening the application itself

This might sound obvious, but one of the best ways to defend against malicious attacks is to write bulletproof code. You'll never be 100% bulletproof, but a few steps can be taken to maximize robustness:

Use up-to-date libraries in your code (NuGet, npm, etc.) because, as you know, most of your code is actually not yours.
Validate every input and, if you do not use a framework with managed memory, keep every memory allocation under tight control. Many vulnerabilities are memory-related (buffer overflow, use-after-free, etc.).
Rely on well-known security standards and do not invent your own.
Use SAST (static application security testing) tools such as Snyk or Fortify to analyze your code.
Integrate security-related tests into your integration tests.
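To make "validate every input" concrete, here is a minimal sketch of allowlist-style validation, assuming a hypothetical service that receives a username and an order quantity as strings. The pattern and the bounds are illustrative assumptions, not rules from the original post: the idea is to accept only what you expect rather than trying to filter out what you fear.

```python
from __future__ import annotations

import re

# Hypothetical rule for illustration: usernames are 3-32 chars of letters, digits, '_' or '-'.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{3,32}")


def validate_username(value: str) -> str:
    """Return the username if it matches the allowlist pattern, else raise ValueError."""
    # fullmatch rejects any character outside the allowlist, anywhere in the string
    if not isinstance(value, str) or USERNAME_PATTERN.fullmatch(value) is None:
        raise ValueError("invalid username")
    return value


def validate_quantity(raw: str, maximum: int = 100) -> int:
    """Parse a quantity and enforce explicit bounds instead of trusting the client."""
    try:
        quantity = int(raw)
    except ValueError:
        raise ValueError("quantity must be an integer") from None
    if not 1 <= quantity <= maximum:
        raise ValueError(f"quantity must be between 1 and {maximum}")
    return quantity
```

Rejecting early with a generic error message also avoids leaking details about the validation rules to an attacker.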

2. Hardening container images

I've seen countless environments where the Docker image itself is not hardened properly. I wrote a full blog post about this, so feel free to read it: https://techcommunity.microsoft.com/t5/azure-developer-community-blog/hardening-an-asp-net-container-running-on-kubernetes/ba-p/2542224. In a nutshell:

Do not listen on ports below 1024, because binding to them requires an extra capability (NET_BIND_SERVICE).
Specify a user other than root.
Change ownership of the container's file system so that it matches that user.
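The first two rules can be checked automatically in a pipeline. Below is a rough sketch of a Dockerfile linter; the function name `lint_dockerfile` is hypothetical and the parsing is deliberately simplified (it ignores line continuations, port ranges, build arguments and per-stage USER semantics), so treat it as an illustration rather than a replacement for a real linter.

```python
from __future__ import annotations


def lint_dockerfile(text: str) -> list[str]:
    """Return findings for the non-root user and port rules (simplified heuristics)."""
    findings = []
    user = None
    for line in text.splitlines():
        instruction, _, args = line.strip().partition(" ")
        instruction = instruction.upper()
        if instruction == "USER":
            user = args.strip()  # the last USER instruction in the file wins
        elif instruction == "EXPOSE":
            for spec in args.split():
                port = int(spec.split("/")[0])  # drop an optional /tcp or /udp suffix
                if port < 1024:
                    findings.append(f"port {port} is below 1024")
    # No USER instruction means the image runs as root by default
    if user is None or user.split(":")[0] in ("root", "0"):
        findings.append("container runs as root")
    return findings
```

For example, a file containing only `FROM example/base:1.0` and `EXPOSE 80` yields both findings, while one that exposes `8080/tcp` and sets `USER 1000:1000` yields none.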

3. Scanning container images

Most of the time, we use base images to build our own images, and most of the time, these base images have vulnerabilities. Scan them with specialized software such as Snyk; Falco complements scanning with runtime threat detection rather than image scanning. Defender for Containers (more on this later) has a built-in image scanning process that leverages Qualys behind the scenes. Once vulnerabilities are identified, you should:

Stick to the most up-to-date images, as they often include security patches.
Consider a different base image. Lightweight images such as Alpine-based ones are usually a good start because they embed fewer tools and libraries, so they are less likely to have vulnerabilities.
Make a risk assessment of the remaining vulnerabilities and check whether they really apply to your use case. A vulnerability does not automatically mean that you are at risk: other mitigations you have in place might prevent an exploit.

4. Hardening K8s deployments

In the same post (https://techcommunity.microsoft.com/t5/azure-developer-community-blog/hardening-an-asp-net-container-running-on-kubernetes/ba-p/2542224), I also explain how to harden the K8s deployment itself. In a nutshell:

Drop all capabilities and add back only the ones you need, if any.
Do not use privileged containers or allow privilege escalation (set these values explicitly).
Stick to a read-only root file system whenever possible.
Specify a user/group other than root.
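The four rules map directly onto fields of a container's `securityContext`. The sketch below expresses them as a Python dict mirroring the YAML field names, plus a small hypothetical checker `check_security_context` that treats a missing value as a violation, in line with "make values explicit". UID/GID 1000 are arbitrary example values.

```python
from __future__ import annotations

# Container-level securityContext implementing the four rules above.
# UID/GID 1000 are arbitrary example values.
SECURE_CONTEXT = {
    "privileged": False,
    "allowPrivilegeEscalation": False,
    "readOnlyRootFilesystem": True,
    "runAsNonRoot": True,
    "runAsUser": 1000,
    "runAsGroup": 1000,
    "capabilities": {"drop": ["ALL"]},  # add back individual capabilities only if needed
}


def check_security_context(ctx: dict) -> list[str]:
    """Return the rules a container securityContext violates (missing values count as violations)."""
    problems = []
    if ctx.get("privileged") is not False:
        problems.append("privileged must be explicitly false")
    if ctx.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation must be explicitly false")
    if ctx.get("readOnlyRootFilesystem") is not True:
        problems.append("readOnlyRootFilesystem should be true")
    if ctx.get("runAsNonRoot") is not True or ctx.get("runAsUser") == 0:
        problems.append("container must not run as root")
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        problems.append("capabilities must drop ALL")
    return problems
```

With a read-only root file system, an application that needs scratch space typically gets an `emptyDir` volume mounted at a specific path instead.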

5. Requests and limits declaration

Although this might not be seen as a potential security issue, not specifying memory requests and limits can lead to other pods being evicted: a container without a memory limit can consume the node's memory until the kubelet starts evicting pods to reclaim it. Malicious users can take advantage of this to spread chaos within your cluster. So, always declare memory requests and limits. CPU requests/limits are optional and not as important as memory, because CPU is a compressible resource: excess usage is throttled rather than triggering evictions.
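As an illustration, the `resources` block below declares memory requests and limits (plus an optional CPU request), and the hypothetical helper `missing_memory_settings` flags containers that omit them, the kind of check you could run in CI before deploying. The quantities are placeholder values; size them from your own load tests.

```python
from __future__ import annotations

# Example values only: size requests/limits from your own load tests.
RESOURCES = {
    "requests": {"memory": "256Mi", "cpu": "250m"},
    "limits": {"memory": "512Mi"},
}


def missing_memory_settings(container: dict) -> list[str]:
    """List the memory request/limit fields a container spec does not declare."""
    resources = container.get("resources", {})
    return [
        f"resources.{kind}.memory"
        for kind in ("requests", "limits")
        if "memory" not in resources.get(kind, {})
    ]
```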

6. Namespace-level logical isolation

K8s is a world where logical isolation takes precedence over physical isolation (more on this later). So, whatever you do, make sure to adhere to the principle of least privilege through proper RBAC configuration and proper network policies that control traffic within the cluster and, potentially, traffic leaving it (egress). Remember that, by default, K8s is totally open: every pod can talk to any other pod, whatever namespace it is located in.

6.1 RBAC

Role-based access control can be configured for both humans and systems, thanks to Azure AD and K8s RBAC. AKS offers multiple flavors (for example, Azure AD authentication with K8s RBAC authorization, or Azure RBAC for Kubernetes authorization). Whichever one you use, you should:

Define groups and grant them permissions using K8s roles.
Define service accounts and let your applications use them.
Prefer namespace-scoped permissions over cluster-scoped ones.
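A minimal sketch of namespace-scoped permissions: a `Role` (rather than a `ClusterRole`) granting read access to ConfigMaps, bound to an application's service account with a `RoleBinding`. The namespace, service account and role names are hypothetical. For humans, the same binding would use a subject of kind `Group` with the Azure AD group's object ID as its name. The manifests are built as Python dicts and printed as JSON, which `kubectl apply -f -` accepts just like YAML.

```python
import json

NAMESPACE = "orders"  # hypothetical namespace
SERVICE_ACCOUNT = "orders-api"  # hypothetical service account used by the app

role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",  # namespace-scoped, unlike ClusterRole
    "metadata": {"name": "configmap-reader", "namespace": NAMESPACE},
    "rules": [
        {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get", "list", "watch"]}
    ],
}

role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "orders-api-configmap-reader", "namespace": NAMESPACE},
    "subjects": [
        {"kind": "ServiceAccount", "name": SERVICE_ACCOUNT, "namespace": NAMESPACE}
    ],
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "Role",
        "name": role["metadata"]["name"],
    },
}

if __name__ == "__main__":
    # e.g. python rbac.py | kubectl apply -f -
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [role, role_binding]}, indent=2))
```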

6.2 Namespace-scoped & global network policies

Traffic can be controlled using plain K8s network policies or tools such as Calico. Plain K8s network policies are namespace-scoped and control pod-level ingress/egress traffic, while Calico also offers global (cluster-wide) network policies.
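A common pattern is to start from a namespace-wide "default deny" and then reopen only the flows you need. The sketch below builds two plain K8s `NetworkPolicy` objects as Python dicts; the namespaces, labels and port are hypothetical. It relies on the `kubernetes.io/metadata.name` label that recent Kubernetes versions set automatically on every namespace.

```python
import json

NAMESPACE = "orders"  # hypothetical namespace

# 1) Default deny: the empty podSelector selects every pod in the namespace, and
#    listing both policy types with no rules blocks all ingress and egress.
default_deny = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "default-deny-all", "namespace": NAMESPACE},
    "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
}

# 2) Reopen only what is needed: pods labelled app=orders-api accept TCP 8080
#    from any pod in the "frontend" namespace.
allow_frontend = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "allow-frontend-to-api", "namespace": NAMESPACE},
    "spec": {
        "podSelector": {"matchLabels": {"app": "orders-api"}},
        "policyTypes": ["Ingress"],
        "ingress": [
            {
                "from": [
                    {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "frontend"}}}
                ],
                "ports": [{"protocol": "TCP", "port": 8080}],
            }
        ],
    },
}

if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [default_deny, allow_frontend]}, indent=2))
```

Two caveats: denying all egress also blocks DNS, so you will usually need an extra egress rule towards the cluster DNS on port 53, and policies are only enforced if the cluster was created with a network policy engine enabled.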

7. Layer 7 protection

Because defense in depth relies on multiple ways to validate whether an ongoing operation is legitimate, you should also use layer-7 protection, such as a service mesh or Dapr, which has some features that overlap with service meshes. The main difference between Dapr and a true service mesh is that applications using Dapr must be Dapr-aware, while they don't need to know anything about a service mesh. The purpose of layer-7 protection is to enable mTLS and fine-grained authorization, in order to specify who can talk to whom (on top of network policies). When dealing with APIs, most solutions today allow fine-grained authorization down to the operation level.
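To illustrate operation-level authorization, here is a sketch using Istio, chosen only as one example of a service mesh. A `PeerAuthentication` in STRICT mode enforces mTLS, and an `AuthorizationPolicy` lets only the `web` service account from the `frontend` namespace call `GET /api/orders/*` on pods labelled `app=orders-api`. All names are hypothetical; once an ALLOW policy applies to a workload, Istio denies requests that match none of its rules.

```python
# Example with Istio; all namespaces, labels and service accounts are hypothetical.
peer_authentication = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "PeerAuthentication",
    "metadata": {"name": "default", "namespace": "orders"},
    "spec": {"mtls": {"mode": "STRICT"}},  # reject plain-text traffic to this namespace
}

# Principals are mTLS identities: <trust-domain>/ns/<namespace>/sa/<service-account>.
authorization_policy = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "AuthorizationPolicy",
    "metadata": {"name": "orders-api-allow-web", "namespace": "orders"},
    "spec": {
        "selector": {"matchLabels": {"app": "orders-api"}},
        "action": "ALLOW",
        "rules": [
            {
                "from": [{"source": {"principals": ["cluster.local/ns/frontend/sa/web"]}}],
                "to": [{"operation": {"methods": ["GET"], "paths": ["/api/orders/*"]}}],
            }
        ],
    },
}
```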

8. Azure Policy

Azure Policy is the cornerstone of tangible governance in Azure in general, and AKS is no exception. With Azure Policy, you get a continuous assessment of your cluster's configuration as well as a way to control what can be deployed to the cluster. Azure Policy leverages Gatekeeper (OPA) to deny non-compliant deployments. You can start smoothly by setting everything to Audit mode and switch to Deny once ready. Azure Policy also lets you allowlist known registries so that images cannot be pulled from just anywhere.
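The registry allowlist boils down to matching every container image against a regular expression. The sketch below is not Azure Policy's implementation; it only illustrates the logic such a check applies, which is handy for testing a pattern before assigning the policy. The registry name is hypothetical. Note the escaped dots and the trailing slash: without them, an image from `myregistry.azurecr.io.evil.example` would slip through.

```python
from __future__ import annotations

import re

# Hypothetical allowlist: images must come from one private ACR instance.
ALLOWED_IMAGES = re.compile(r"^myregistry\.azurecr\.io/.+$")


def denied_images(pod_spec: dict) -> list[str]:
    """Return images in a pod spec that do not match the allowlist (containers and init containers)."""
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    return [c["image"] for c in containers if not ALLOWED_IMAGES.match(c["image"])]
```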

9. Defender for Containers

Microsoft recently merged Defender for container registries and Defender for Kubernetes into Defender for Containers. There is a bit of overlap with Azure Policy, but Defender also deploys a DaemonSet that detects threats in real time. All incidents are categorized using the popular MITRE ATT&CK framework.

10. Private API server

This one is easy. Make sure to isolate the API server from the internet, which you can do with a private cluster relying on Azure Private Link. If you can't do it for some reason, at least restrict access to authorized IP address ranges.
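AKS enforces authorized IP ranges on the API server itself (they are set on the cluster, for instance with the `--api-server-authorized-ip-ranges` option of `az aks create` or `az aks update`). The sketch below only illustrates how CIDR matching decides whether a client address is allowed, for example to double-check a list before applying it. The ranges are hypothetical documentation prefixes (RFC 5737).

```python
import ipaddress

# Hypothetical authorized ranges, using RFC 5737 documentation prefixes.
AUTHORIZED_RANGES = [ipaddress.ip_network(cidr) for cidr in ("203.0.113.0/24", "198.51.100.17/32")]


def is_authorized(client_ip: str) -> bool:
    """True if client_ip falls inside one of the authorized CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in AUTHORIZED_RANGES)
```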

11. Cluster boundaries

11.1 Ingress

11.2 Egress

12. Keep consistency across clusters and across data centers

Published on: October 29, 2022