AKS Lessons Learned 2 of 2

Even though Azure Kubernetes Service (AKS) is a managed service, building a cluster is not trivial. For help, I would start with the webinar “Configure Your AKS cluster with Confidence” from April 2021, which focuses on a set of working best practices (convention over configuration), though obviously not every recommendation suits every use case. For a deeper technical tour, John Savill’s Technical Training channel has good videos (from 2020) on AKS overview, high availability and networking. Lastly, there is also an AKS checklist to remind you of the implementation details to consider.

All the references aside, I need to write down some gotchas from my implementation experience in the last two months.

Identity and Access Management

AKS is a special type of Azure resource in the sense that it manages other Azure services on the user’s behalf. Therefore access management needs to consider several aspects:

Access type | Mechanisms involved | Example
User accesses the Kubernetes API | Azure AD, Azure RBAC for Kubernetes, and Kubernetes RBAC: Azure AD handles authentication; the two RBAC mechanisms handle authorization. | A user connects to the Kube-API server using kubectl.
AKS accesses other Azure resources | Several identities represent the different components of AKS, such as the AKS cluster, the node agent pool, and each add-on. The AKS cluster can be represented by a service principal or a managed identity (system assigned or user assigned); the node agent pool is represented by a managed identity. | The AKS cluster connects to a VNet in a different resource group (requiring the cluster’s identity to have the Network Contributor role on the target network resource group); the node agent pulls images from ACR (requiring the node agent pool’s identity to have the AcrPull role on the target ACR).
Pod accesses other Azure resources | AAD Pod-Managed Identity. | A business workload connects to a managed database service such as PostgreSQL on Azure.
Pod accesses the Kubernetes API | A Kubernetes service account, handled entirely by Kubernetes-native mechanisms: Roles and ClusterRoles define permissions; RoleBindings and ClusterRoleBindings associate service accounts with those permissions. | A workload accesses a ConfigMap, Secret, etc. (see the sketch after the table).
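
As a minimal sketch of the last row, here is Kubernetes-native RBAC granting a hypothetical service account read access to ConfigMaps and Secrets in its namespace (all names are examples, not from my setup):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-reader                     # hypothetical service account used by the workload
  namespace: my-app
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: config-reader
  namespace: my-app
rules:
  - apiGroups: [""]                    # core API group
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-reader-binding
  namespace: my-app
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: config-reader
subjects:
  - kind: ServiceAccount
    name: app-reader
    namespace: my-app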

In the first access type, where a user accesses the Kubernetes API, there is an overlap between Azure RBAC and Kubernetes RBAC. Azure RBAC for Kubernetes has four built-in roles, and three of them (reader, writer, admin) are namespaced. When you use the Azure CLI to assign one of those roles, the RoleBinding and ClusterRoleBinding records stored in etcd are updated accordingly.
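
For example, a namespaced role assignment with the Azure CLI looks roughly like this (the object ID, resource names and namespace are placeholders; the namespace-scoped format follows the Azure RBAC for Kubernetes documentation):

AKS_ID=$(az aks show -g my-rg -n my-aks --query id -o tsv)
az role assignment create \
  --assignee <user-or-group-object-id> \
  --role "Azure Kubernetes Service RBAC Writer" \
  --scope $AKS_ID/namespaces/my-namespace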

RBAC mechanism | Use case
Azure RBAC for Kubernetes | Manage RBAC programmatically using the Azure CLI or infrastructure as code.
Kubernetes RBAC | Manage RBAC declaratively with more granularity for all types of Kubernetes resources, including CRDs.

For ease of operation, it is advised to use Kubernetes RBAC whenever possible. Azure RBAC is still used at the level of Azure resources, but not at the level of Kubernetes resources.

In the second access type, the AKS cluster may use a managed identity or a service principal. Azure’s recommendation is managed identity over service principal: a managed identity is a wrapper around a service principal with less overhead. A managed identity can be system assigned (created at the time of cluster creation) or user assigned (created ahead of time by an Azure administrator and imported into the cluster’s context).

The second access type can be further broken down, because several components in AKS use their own identities. The related managed identities are listed below:

Name | Purpose | BYO identity with Terraform
cluster identity | Represents the cluster itself | Specify in the identity block of the kubernetes_cluster resource
agent pool identity | Represents the kubelet running in the agent pool | Specify in the kubelet_identity block
addon: azurepolicy | Represents the Azure Policy add-on, for access to policies | N/A
addon: omsagent | Represents the OMS agent, for access to monitoring etc. | Specify in the oms_agent_identity block
addon: secret | Represents the secret add-on, for access to AKV | Specify in the secret_identity block
addon: ingress gateway | Represents the ingress application gateway | Specify in the ingress_application_gateway_identity block

By default, the system creates a new managed identity for each of the required identities above. To simplify identity management, we may create one managed identity and use it everywhere an identity is needed and a user assigned (BYO) identity is supported.
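
A rough Terraform sketch of that BYO approach, assuming a recent azurerm provider; the identity and resource names are placeholders, and reusing one identity everywhere is a simplification you should weigh for your own setup:

resource "azurerm_user_assigned_identity" "aks" {
  name                = "aks-byo-identity"                 # hypothetical name
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
}

resource "azurerm_kubernetes_cluster" "aks" {
  # ... other required arguments (name, DNS prefix, default node pool, etc.) omitted ...

  identity {                                               # cluster identity
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.aks.id]
  }

  kubelet_identity {                                       # agent pool (kubelet) identity
    client_id                 = azurerm_user_assigned_identity.aks.client_id
    object_id                 = azurerm_user_assigned_identity.aks.principal_id
    user_assigned_identity_id = azurerm_user_assigned_identity.aks.id
  }

  # Note: the cluster identity needs the Managed Identity Operator role over
  # the kubelet identity for the kubelet_identity block to work.
}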

In the output of the “az aks show” command (a JSON document), the identity section (at the root level) reports the cluster identity, and the identityProfile section (also at the root level) reports the agent pool (kubelet) identity. Other identities, such as the omsagent identity, are reported in their own child documents.
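
For instance, the two root-level sections can be pulled out with JMESPath queries (resource group and cluster name are placeholders):

az aks show -g my-rg -n my-aks --query identity          # cluster identity
az aks show -g my-rg -n my-aks --query identityProfile   # agent pool (kubelet) identity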

Node Networking

In Azure, a subnet can span multiple availability zones. Therefore an AKS cluster can put its nodes in a single subnet, with the nodes evenly distributed across three AZs for high availability. The AZ of each node is indicated in a node label and can be displayed with kubectl.
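
For example, assuming a recent Kubernetes version where nodes carry the topology.kubernetes.io/zone label (older clusters use failure-domain.beta.kubernetes.io/zone instead), the distribution can be checked with:

kubectl get nodes -L topology.kubernetes.io/zone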

Within a single AZ, a good practice to minimize latency between nodes is to place them in a proximity placement group (PPG). However, only a single PPG can be associated with a node pool; you can’t have three PPGs, one in each AZ, for a single node pool on that subnet.

Pod Networking

The default Pod networking model is kubenet, which involves an overlay network: Pod-to-Pod traffic across nodes requires Network Address Translation (NAT). To overcome this performance tax, Azure introduced Azure CNI, which gives each Pod a routable IP address from the VNet’s CIDR. This requires more careful IP planning to prevent IP exhaustion. A risk introduced by Azure CNI is that all Pods are exposed on the VNet, so they need to be protected by Network Security Groups and/or an outbound firewall.

DNS

When an AKS cluster integrates with an existing enterprise network, DNS may create weird issues that are hard to troubleshoot. For example, if the VNet uses an external DNS server (which is common for enterprises with a hybrid network pointing to an on-premises DNS server), then cluster creation fails with a timeout and misleading error messages (for example, this comment). This happens because the DNS name of the newly created cluster is not resolvable within the VNet, which points to the on-prem DNS server. The fix to that is:

  1. Use a BYO DNS zone (in Azure) for AKS cluster creation;
  2. The AKS cluster will publish the A-record to the zone. To allow this to happen, the AKS cluster’s managed identity needs to have DNS contributor permission for the zone;
  3. Configure the on-prem DNS for conditional forwarding to the DNS zone

This fix allows AKS to resolve its own name and therefore complete cluster creation. Here is a good blog about the DNS zone options.
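
A rough Azure CLI sketch of steps 1 and 2 (the zone name, resource names and identity IDs are placeholders; a private cluster’s zone name depends on your region and naming scheme):

# step 1: create a BYO private DNS zone for the cluster
az network private-dns zone create -g my-rg -n privatelink.westeurope.azmk8s.io

# step 2: let the cluster identity publish records to the zone
ZONE_ID=$(az network private-dns zone show -g my-rg -n privatelink.westeurope.azmk8s.io --query id -o tsv)
az role assignment create --assignee <cluster-identity-principal-id> \
  --role "Private DNS Zone Contributor" --scope $ZONE_ID

# point the private cluster at the BYO zone (plus the rest of your cluster arguments)
az aks create -g my-rg -n my-aks --enable-private-cluster \
  --private-dns-zone $ZONE_ID --assign-identity <cluster-identity-resource-id>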

Another potential issue introduced by the use of an on-prem DNS server is the resolution of the nodes’ single-label hostnames. This is not just an issue in the context of AKS; it is a generic issue with VMs running on a VNet that points to on-prem DNS, as explained in detail here.

In this situation, we should use fully qualified hostnames instead of single-label hostnames. The DNS suffix in a fully qualified hostname allows the on-prem server to be configured with conditional forwarding. For example, when the DNS suffix is *.internal.cloudapp.net, forward it to Azure’s internal virtual DNS server at 168.63.129.16, which can resolve the hostname.

If only the Pods need to resolve those FQDNs, then we can configure CoreDNS with conditional forwarding, which takes effect only at the cluster level without changing the on-prem DNS. The CoreDNS configuration looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom # this is the name of the configmap you can overwrite with your changes
  namespace: kube-system
data:
  cloudapp.override: | # you may select any name here, but it must end with the .override file extension
    log
    rewrite continue {
      name regex ^(.*[0-9]{7}-vmss[0-9]{6})$ {1}.internal.cloudapp.net
      answer name ^(.*)\.internal\.cloudapp\.net$ {1}
    }
    forward internal.cloudapp.net 168.63.129.16
  cloudapp.server: |
    internal.cloudapp.net:53 {
      errors
      log
      cache 10
      forward . 168.63.129.16
    }
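
The custom ConfigMap only takes effect once CoreDNS picks it up. Assuming the manifest above is saved to a file (here hypothetically coredns-custom.yaml) and that the CoreDNS deployment is named coredns in kube-system, as it is on AKS, one way to force a reload is:

kubectl apply -f coredns-custom.yaml
kubectl -n kube-system rollout restart deployment coredns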

Alternatively, use a Pod DNS policy so that the Pod can use an external DNS server.
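
A minimal sketch of that option, with a hypothetical external DNS server address and search domain:

apiVersion: v1
kind: Pod
metadata:
  name: dns-example                    # hypothetical Pod
spec:
  containers:
    - name: app
      image: nginx                     # placeholder image
  dnsPolicy: "None"                    # ignore the cluster DNS settings entirely
  dnsConfig:
    nameservers:
      - 10.0.0.4                       # hypothetical external DNS server
    searches:
      - internal.cloudapp.net          # helps resolve single-label hostnames
    options:
      - name: ndots
        value: "2"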

Initial Service Account

When a cluster is created, an Azure AD user or group can be assigned as cluster administrator. For a CI/CD pipeline to interact with the newly created cluster, a Kubernetes service account is needed. If we use Terraform to create the AKS cluster, we can create such a service account automatically with the Kubernetes provider. This requires the Terraform execution environment to have network access to the cluster: if the AKS cluster is on a private network, then the agent where the Terraform CLI runs should also be on that network. Alternatively, use Terraform Enterprise hosted in an environment with access to the cluster’s network.
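
A sketch of that idea with the Terraform Kubernetes provider, using hypothetical names and a deliberately broad cluster-admin binding for the pipeline (narrow it down for real use):

resource "kubernetes_service_account" "cicd" {
  metadata {
    name      = "cicd-deployer"                # hypothetical name
    namespace = "kube-system"
  }
}

resource "kubernetes_cluster_role_binding" "cicd" {
  metadata {
    name = "cicd-deployer-admin"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"                # broad on purpose; scope down in production
  }
  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account.cicd.metadata[0].name
    namespace = "kube-system"
  }
}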

Integration with Azure KeyVault

Azure Key Vault (AKV) has three types of entries: keys, certificates and secrets (key-value pairs). When AKV is integrated with an AKS cluster, Kubernetes workloads can access the entries as mounted volumes, using a CRD named SecretProviderClass. Further, they can be presented as Kubernetes Secrets, with the mounted content synced into a Secret once a Pod mounts it. The certificate entry type requires both the key and the certificate to be stored, with an optional certificate chain. In my opinion this is over-design: unless we need Azure to manage the certificate (e.g. rotation), I would simply use a secret entry to store my own X509 key and certificate.
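
A minimal SecretProviderClass sketch of that integration, with hypothetical vault, tenant and object names, including the optional sync into a Kubernetes Secret:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: akv-app-secrets                # hypothetical name
  namespace: my-app
spec:
  provider: azure
  secretObjects:                       # optional: mirror the mounted content into a Kubernetes Secret
    - secretName: app-tls
      type: Opaque
      data:
        - objectName: my-cert          # must match an objectName under parameters.objects
          key: tls.pem
  parameters:
    usePodIdentity: "true"             # or another identity option supported by the Azure provider
    keyvaultName: "my-keyvault"        # hypothetical AKV name
    tenantId: "<tenant-id>"
    objects: |
      array:
        - |
          objectName: my-cert
          objectType: secret           # stored as a plain secret entry, per the preference above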