AKS Lessons Learned 2 of 2

Even though Azure Kubernetes Service (AKS) is a managed service, building a cluster is not trivial. For help resources, I would start with the webinar “Configure Your AKS cluster with Confidence” from April 2021, which focuses on a set of working best practices (convention over configuration), though obviously not every recommendation suits every use case. For a deeper technical tour, John Savill’s Technical Training channel has good videos (from 2020) on the AKS overview, high availability, and networking. Lastly, there is also an AKS checklist to remind you of the implementation details to consider.

All the references aside, I need to write down some gotchas from my implementation experience over the last two months.

Identity and Access Management

AKS is a special type of Azure resource in the sense that it manages other Azure services on the user’s behalf. Therefore, access management needs to consider several aspects:

Access type: user accesses Kubernetes resources
Mechanisms involved: Azure AD, Azure RBAC and Kubernetes RBAC.
– Azure AD handles authentication
– Azure RBAC handles high-level authorization (e.g. admin, reader, writer)
– Kubernetes RBAC handles low-level authorization (Roles, ClusterRoles, RoleBindings, ClusterRoleBindings)
Example: a user connects to the Kube-API server using kubectl.

Access type: AKS accesses other Azure resources
Mechanisms involved: several identities represent different components of AKS — the AKS cluster, the node agent pool, and each add-on.
– The AKS cluster can be represented by a service principal or a managed identity (system assigned or user assigned)
– The node agent pool can be represented by a managed identity
Examples: the AKS cluster connects to a VNet in a different resource group (requires the cluster’s identity to hold the Network Contributor role on the target network resource group); an AKS node agent pulls images from ACR (requires the node agent pool’s identity to hold the AcrPull role on the target ACR).

Access type: pod accesses other Azure resources
Mechanisms involved: AAD Pod-Managed Identity.
Example: a business workload connects to a managed database service such as PostgreSQL on Azure.

Access type: pod accesses Kubernetes resources
Mechanisms involved: access the Kubernetes API using a Service Account. This is solved entirely by Kubernetes-native mechanisms: Roles and ClusterRoles define permissions; RoleBindings and ClusterRoleBindings associate Service Accounts with those permissions.
Example: a workload accesses a ConfigMap, Secret, etc.
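To illustrate the last case, a minimal Role and RoleBinding granting a service account read access to ConfigMaps and Secrets might look like the sketch below (all names — app-reader, app-sa, the default namespace — are hypothetical):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-reader
  namespace: default
rules:
  - apiGroups: [""]                       # "" = the core API group
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-reader-binding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: app-sa
    namespace: default
roleRef:
  kind: Role
  name: app-reader
  apiGroup: rbac.authorization.k8s.io
```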

An AKS cluster may use a managed identity or a service principal. Azure’s recommendation is managed identity over service principal: a managed identity is a wrapper around a service principal with less management overhead. A managed identity can be system assigned (created at the time of cluster creation) or user assigned (created ahead of time by an Azure administrator and brought into the cluster’s context).
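As a sketch of the two options with the Azure CLI (resource group, cluster, and identity names are hypothetical):

```shell
# System-assigned managed identity, created along with the cluster:
az aks create -g my-rg -n my-aks --enable-managed-identity

# User-assigned managed identity, created ahead of time and then referenced:
az identity create -g my-rg -n my-aks-identity
az aks create -g my-rg -n my-aks --enable-managed-identity \
  --assign-identity /subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my-aks-identity
```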


In Azure, a subnet can span multiple availability zones. Therefore an AKS cluster can put its nodes on a single subnet, with the nodes evenly distributed across three AZs for high availability. The AZ of each node is indicated by a node label and can be displayed with kubectl.
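For example, the zone label can be shown as an extra column (the label key below is the well-known Kubernetes topology label; on older cluster versions it may appear as failure-domain.beta.kubernetes.io/zone instead):

```shell
# -L adds the value of the given label as a column in the node listing
kubectl get nodes -L topology.kubernetes.io/zone
```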

Within a single AZ, a good practice to minimize latency between nodes is to place the nodes in a proximity placement group (PPG). However, only a single PPG can be associated with a node pool. You can’t have three PPGs, one in each AZ, behind a single node pool on a single subnet.
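A sketch of attaching a PPG to a node pool with the Azure CLI (names are hypothetical; the PPG must be in the same region as the cluster):

```shell
# Create a standard proximity placement group, then add a node pool into it
az ppg create -g my-rg -n my-ppg -l eastus2 -t standard
az aks nodepool add -g my-rg --cluster-name my-aks -n ppgpool \
  --ppg /subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.Compute/proximityPlacementGroups/my-ppg
```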

The AKS cluster will have a DNS name. By default, the A record for the cluster’s DNS name is published in a newly created private DNS zone linked to the VNet. We can also tell AKS to publish the A record to a designated DNS zone (BYO DNS zone), which requires the cluster’s identity to have sufficient permission on that zone. The VNet must be able to resolve the cluster’s DNS name so that the Azure cloud API can interact with the cluster during cluster creation. If the VNet uses an external DNS server, that DNS server must be able to forward DNS queries to the DNS zone holding the A record for the AKS cluster.
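A sketch of the BYO-zone case with the Azure CLI (all IDs hypothetical; assumes a pre-created private DNS zone and a user-assigned identity that already holds the required roles on the zone and the VNet):

```shell
az aks create -g my-rg -n my-aks \
  --enable-private-cluster \
  --private-dns-zone /subscriptions/<sub-id>/resourceGroups/dns-rg/providers/Microsoft.Network/privateDnsZones/privatelink.eastus2.azmk8s.io \
  --assign-identity /subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my-aks-identity
```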

Initial Service Account

When a cluster is created, an Azure AD user or group can be assigned as cluster administrator. For a CI/CD pipeline to interact with the newly created cluster, a Kubernetes service account is needed. If we use Terraform to create the AKS cluster, we can create such a service account automatically with the Kubernetes provider. This requires the Terraform execution environment to have network access to the cluster: if the AKS cluster is located in a private network, the agent where the Terraform CLI runs must also be on that network. Alternatively, use Terraform Enterprise hosted in an environment with access to the cluster’s network.
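A minimal sketch of this pattern, assuming the cluster is created by a resource named azurerm_kubernetes_cluster.aks in the same configuration (the service account name and namespace are hypothetical):

```hcl
# Point the Kubernetes provider at the cluster Terraform just created
provider "kubernetes" {
  host                   = azurerm_kubernetes_cluster.aks.kube_config[0].host
  client_certificate     = base64decode(azurerm_kubernetes_cluster.aks.kube_config[0].client_certificate)
  client_key             = base64decode(azurerm_kubernetes_cluster.aks.kube_config[0].client_key)
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.aks.kube_config[0].cluster_ca_certificate)
}

# Service account for the CI/CD pipeline to use
resource "kubernetes_service_account" "cicd" {
  metadata {
    name      = "cicd-deployer"
    namespace = "kube-system"
  }
}
```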

Integration with Azure KeyVault

Azure Key Vault (AKV) can store several types of objects: secrets (key-value pairs), keys, and X.509 certificates. When AKV is integrated with an AKS cluster, Kubernetes workloads can access the secrets as mounted volumes, configured through a CRD named SecretProviderClass. Further, the mounted content can also be presented as a native Kubernetes Secret, synced from the mounted volume.
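A sketch of such a SecretProviderClass (vault name, tenant ID, and object names are illustrative; assumes the Azure provider for the Secrets Store CSI driver is installed on the cluster):

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: my-keyvault-secrets
spec:
  provider: azure
  parameters:
    keyvaultName: "my-keyvault"     # hypothetical vault name
    tenantId: "<tenant-id>"
    objects: |
      array:
        - |
          objectName: db-password
          objectType: secret
  # Optionally mirror the mounted content into a Kubernetes Secret:
  secretObjects:
    - secretName: db-secret
      type: Opaque
      data:
        - objectName: db-password
          key: password
```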