Landing Zone in Azure – Introduction

I recently renewed my associate administrator certification, and it feels like a good opportunity to brush up on Azure landing zones.

The lame part of this is the semantics: many similar terms appear across cloud service providers (CSPs). In the context of Azure, it makes sense to clarify the terms again for Cloud Adoption Framework (CAF) and Cloud Operating Models.

Cloud Adoption Framework (CAF)

Like AWS, Azure has a Cloud Adoption Framework (CAF), and it means the same thing. This part may feel lofty, but it is in fact foundational. There are a thousand ways to configure a cloud foundation, right or wrong. Adopters need the CAF to navigate the offerings and define what they can achieve.

The CAF documentation is good, although lengthy. The “beefiest” part is the Ready section.

Cloud Operating Models

Every cloud company has some narrative about the cloud operating model. For example, HashiCorp has its own definition, and AWS has a white paper on it. In the context of Azure, the CAF documentation gives some guidance on developing your own operating model in alignment with the CAF. It also gives a few example cloud operating models:

  • Decentralized operations
  • Centralized operations
  • Enterprise operations
  • Distributed operations

There is a comparison table that highlights their differences, as well as an accountability chart proposing team divisions. Another insightful table lists the implementation starting point and the typical path of iterations for each operating model. That table also suggests that Azure Landing Zone has two implementation options: starting small and CAF enterprise-scale.

Landing Zone at High Level

After the cloud operating model comes the design and implementation of the Azure Landing Zone. There are currently eight design areas:

  • Billing and Active Directory tenant: including Azure AD tenant
  • Identity and Access Management: including hybrid identity
  • Network Topology and Connectivity
  • Resource Organization: different levels of resource containers
  • Security
  • Management
  • Governance
  • Platform automation and DevOps

Of these design areas, IAM and networking are where I fall short, so I’ll discuss them in more detail in the following sections. As for resource organization, apart from resource groups and subscriptions, it is also important to understand management groups.

Most cloud engineers work with subscriptions and resource groups, which is where most of the action is. For enterprises, however, Azure has to support top-down enforcement. A management group provides a governance scope above subscriptions, provided that all subscriptions trust the same Azure AD tenant. Management groups can form a hierarchy of up to six levels to help you configure policies and access, so that all the subscriptions under each management group share a unified policy and access configuration. At the very top is the root management group. Any assignment of user access or policy on the root management group applies to all resources within the directory. Because of this, all customers should evaluate the need to have items defined at this scope.

We can apply policy guardrails (e.g. Azure Policy) at the management group level so that the policies take effect across subscriptions. Azure Policy can also address operational compliance considerations by monitoring configuration drift.
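To make the inheritance idea concrete, here is a minimal Python sketch of a management group hierarchy. It is only a toy model, not the Azure SDK: the `Scope` class, the `add_management_group` helper and the policy names are all made up for illustration. It assumes that policies assigned at a scope apply to everything beneath it and that at most six management group levels sit below the root.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

MAX_MG_DEPTH = 6  # management group levels allowed below the root (assumed limit)


@dataclass
class Scope:
    """A management group or subscription in a simplified hierarchy."""
    name: str
    parent: Optional["Scope"] = None
    policies: Set[str] = field(default_factory=set)

    def depth(self) -> int:
        """Number of ancestors above this scope; the root has depth 0."""
        return 0 if self.parent is None else self.parent.depth() + 1

    def effective_policies(self) -> Set[str]:
        """Policies assigned here plus everything inherited from ancestors."""
        inherited = self.parent.effective_policies() if self.parent else set()
        return inherited | self.policies


def add_management_group(name: str, parent: Scope, policies=()) -> Scope:
    """Create a child management group, enforcing the depth limit."""
    if parent.depth() + 1 > MAX_MG_DEPTH:
        raise ValueError(f"{name!r} would exceed {MAX_MG_DEPTH} levels below the root")
    return Scope(name, parent, set(policies))


# Hypothetical hierarchy: root -> platform -> one subscription (a leaf scope).
root = Scope("root", policies={"allowed-locations"})
platform = add_management_group("platform", root, {"require-tags"})
subscription = Scope("sub-connectivity", parent=platform)

print(sorted(subscription.effective_policies()))
```

The subscription ends up with both the root policy and the platform policy even though nothing was assigned to it directly, which is why an assignment at the root deserves extra scrutiny.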

Identity and Access Management

First, we really need to distinguish AD DS on Windows Server, Azure AD and Azure AD DS. In an old post, I discussed what a Windows domain is, the key role of a domain controller (to manage user identity as well as computer identity), and the fact that Active Directory is a complete redesign of the Windows domain system, introduced with Windows 2000. So we can start with AD DS on Windows Server:

  • AD DS on Windows Server: In the good old days, common network administration activities included configuring Active Directory (including the X.500-compatible database, the OUs, domains and forests) on Windows Servers, joining computers to the company’s domain, configuring group policy, configuring LDAP and Kerberos, upgrading domain controllers, etc. Over the years, Microsoft moved these activities to the cloud and now offers them as a managed service, known as Azure AD DS.
  • Azure Active Directory Domain Services (Azure AD DS): allows you to use managed domain services (e.g. Windows domain join, group policy, LDAP, Kerberos authentication) without having to deploy, manage or patch domain controllers. It is a managed offering that runs your domain controllers in the cloud, with a pay-as-you-go model. The counterpart in AWS is “AWS Directory Service”, which lets you run Microsoft Active Directory (AD) as an AWS managed service.

In summary, both AD DS on Windows Server (self-hosted) and Azure AD DS (managed service) are identity stores that operate on Windows domains. Even though the latter is a managed service, it supports LDAP and Kerberos as integration protocols for third-party applications (usually on-premises) to use. Both LDAP and Kerberos predate the cloud era and are not optimized for cloud connectivity. For example, insecure bind (on port 389) in LDAP is still prevalent, and Kerberos is fairly complex to configure. However, they are not being phased out right away, because of their established presence as well as the domain’s ability to authenticate devices.

Many organizations have to keep their domain service when they move to the cloud, so they still use Active Directory as their identity store. For this, Azure has Azure AD Connect. On the AWS side, there is also AD Connector, which allows on-prem users to log in to AWS applications and services. With AD Connector you can also join EC2 instances to an existing AD domain.

Now let’s examine Azure AD.

  • Azure AD: is an IAM solution. It contains an identity store (with users and groups in a flat directory structure), but more importantly it integrates with external identity stores (including domain services, self-hosted or managed), which gives it hybrid-identity capability. A company can even sync its own on-prem identity store to Azure AD using Azure AD Connect. As an IAM solution, Azure AD also allows a company to tie its identity store to applications using modern protocols such as SAML and OAuth. Azure AD treats applications as objects, and they can represent either Microsoft applications (Office 365, Dynamics 365, Azure) or third-party ones (Slack, Salesforce), as long as they use a supported protocol for SSO. The closest AWS counterpart of Azure AD is (arguably) Amazon Cognito, even though their capabilities are not identical in every aspect.

Compared to a domain service, Azure AD alone doesn’t have the concept of a domain. Therefore you cannot join a server or PC to a domain and configure group policy. Azure AD’s native identity store is a flat directory structure without OUs or forests. Azure AD is NOT a replacement for a domain service, either self-hosted or managed.

Now coming back to the Azure landing zone literature, the document lays out the key decision to make about identity:

A critical design decision for enterprise organizations adopting Azure is whether to extend current on-premises identity domains into Azure or to create new identity domains.

Azure Active Directory (Azure AD) and hybrid identity

The document even includes a comprehensive identity decision guide. After this decision, we’ll know which identity store to use. Then we can address the problem of platform access vs workload access. In other words, IAM for management traffic vs business traffic, which opens up topics such as RBAC, service principals and managed identities.

Networking

Back in 2017, Azure published a white paper about V-Net that focuses on mesh and hub-and-spoke networks. Back then, Azure customers ran multiple lines of business (LOB) on different V-Nets. The V-Net peering feature allowed early cloud adopters to organize all their V-Nets in a mesh topology, ensuring all peers have access to all other peers, or in a hub-and-spoke topology, aggregating shared resources in hubs so they can be shared by the spokes in the network.

When setting up a landing zone, network topology is a big decision. In the landing zone document today, clients need to consider the following:

  • Traditional Azure networking topologies, including:
    • large flat V-Net
    • multiple V-Nets connected with multiple Azure ExpressRoute circuits/connections
    • hub-and-spoke
    • full mesh
    • hybrid
  • Microsoft managed networking topology (on top of Virtual WAN)

According to the 2017 white paper, most organizations at that time solved their need for network isolation and connectivity by creating a mesh architecture among their V-Nets. All nodes in the network are interconnected, so network traffic is fast and can be easily redirected. However, a mesh topology has a significant disadvantage: the number of connections grows quickly as the footprint expands, which makes it costly to operate and quick to hit the limit on the number of peering links. It is not scalable. The white paper advocates the hub-and-spoke topology, which I will discuss in the next section.
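To put rough numbers on that scaling concern, here is a back-of-the-envelope Python sketch. It only counts logical V-Net-to-V-Net connections, and it assumes a full mesh connects every pair while a hub-and-spoke layout uses one hub with every other V-Net as a spoke. The function names are mine, not from the white paper.

```python
def mesh_links(vnets: int) -> int:
    """Connections needed so that every V-Net is linked to every other one."""
    return vnets * (vnets - 1) // 2


def hub_and_spoke_links(vnets: int) -> int:
    """Connections needed when one V-Net is the hub and the rest are spokes."""
    return max(vnets - 1, 0)


for n in (5, 10, 50):
    print(f"{n:>3} V-Nets: mesh={mesh_links(n):>5}, hub-and-spoke={hub_and_spoke_links(n):>3}")
```

The mesh count grows quadratically while the hub-and-spoke count grows linearly, so the gap widens quickly as V-Nets are added.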

It is worth noting that today (Jan 2023) one can create both topologies with Azure Virtual Network Manager. It is currently a preview service, but I expect it will eventually be integrated with landing zones.

No matter which topology you pick, another issue to address is connectivity to the on-prem network and to Azure PaaS services. If the traffic is light, we can use a VPN gateway to configure an IPsec tunnel that carries encrypted traffic over the public internet. It is simple to configure and offers good aggregate bandwidth. This connection requires a VPN device on premises as well. A faster alternative is Azure ExpressRoute, which runs a private connection through a third-party connectivity provider. ExpressRoute is more complex and expensive to set up, but it supports much higher bandwidth, with direct access and a better SLA. In practice, many clients configure ExpressRoute with VPN failover for connectivity to the on-prem network. For connectivity to PaaS services, the options are service endpoints and private endpoints (Private Link).

Hub-and-spoke topology

Among the traditional topologies, hub-and-spoke is popular because the hub network provides a central point of management. It also overcomes subscription limits and institutes a separation of concerns. The Azure documentation recommends the hub-and-spoke architecture for larger cloud adoption efforts. If the footprint is larger still, we can extend the model to a cluster of hubs and spokes.

A cluster of multiple hub-and-spoke

We can connect multiple hubs using:

  • V-Net peering
  • Azure ExpressRoute
  • Azure Virtual WAN
  • Site-to-site VPN

Within a single hub-and-spoke model, the hub V-Net hosts shared services and acts as the central point of connectivity for many spoke V-Nets. The hub V-Net often contains Azure Bastion, Azure Firewall and a VPN gateway or ExpressRoute gateway. The spoke V-Nets (in the same or different subscriptions) isolate and manage workloads in prod, non-prod, etc. Since a single V-Net cannot traverse subscription boundaries, you have to connect V-Nets using V-Net peering (preferred), an ExpressRoute circuit, or VPN gateways. V-Net peering works across regions and across Azure AD tenants. It is low-latency but isn’t transitive.
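The non-transitive part is easy to overlook, so here is a small Python sketch of what it means. This is a toy model, not an Azure API: the V-Net names and helper functions are hypothetical, and peerings are simply stored as unordered pairs. Two spokes peered with the same hub are not peered with each other. As I understand it, spoke-to-spoke traffic then needs either its own peering or something in the hub (such as a firewall or NVA) to route through.

```python
from typing import FrozenSet, Set

Peering = FrozenSet[str]


def peering(a: str, b: str) -> Peering:
    """An unordered V-Net pair representing one peering relationship."""
    return frozenset((a, b))


def directly_peered(a: str, b: str, peerings: Set[Peering]) -> bool:
    """True only if a and b have their own peering; nothing is inferred."""
    return peering(a, b) in peerings


def shared_neighbours(a: str, b: str, peerings: Set[Peering]) -> Set[str]:
    """V-Nets peered with both a and b, i.e. candidates to route through."""
    def neighbours(vnet: str) -> Set[str]:
        return {other for p in peerings if vnet in p for other in p if other != vnet}
    return neighbours(a) & neighbours(b)


# Hypothetical layout: one hub and two spokes.
peerings = {peering("hub", "spoke-prod"), peering("hub", "spoke-nonprod")}

print(directly_peered("spoke-prod", "hub", peerings))            # True
print(directly_peered("spoke-prod", "spoke-nonprod", peerings))  # False
print(shared_neighbours("spoke-prod", "spoke-nonprod", peerings))  # {'hub'}
```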

In some cases we also configure perimeter networks (aka DMZs) in the hub-and-spoke architecture to handle external traffic. Perimeter networks host services such as external load balancers, Azure Firewall, Azure Web Application Firewall (on Azure Application Gateway or on Azure Front Door), network virtual appliances (NVAs), IDS, IPS, and other security appliances. Incoming packets flow through the security appliances before reaching back-end servers. Internet-bound packets from workloads must also flow through security appliances in the perimeter network before they can leave the network. The document gives an example of a DMZ hub V-Net with two perimeter networks.

Virtual WANs

This page discusses what WAN and SD-WAN are. A WAN connects multiple LANs in different geographic areas and is common in companies with multiple offices in different regions. WAN infrastructure may be privately owned or leased as a service from a third-party service provider (hybrid WAN). Companies may use IPsec VPN, SSL VPN or direct connections to build their WANs. Software-defined WAN (SD-WAN) leverages virtualization technologies, network overlays, on-site SD-WAN devices and software platforms to build hybrid WANs.

Azure Virtual WAN (similar to AWS Cloud WAN) is a managed service for building a virtual WAN, with a single operational interface that brings many networking, security and routing functionalities together. It simplifies end-to-end network connectivity (within Azure, and between Azure and on-prem) by creating a hub-and-spoke architecture.

Virtual WAN

Virtual WAN is essentially an integrated connectivity solution (in hub-and-spoke form) with a global transit network architecture. The configuration (including spoke setup) is automated, and troubleshooting is more intuitive. A global transit network configures multiple Virtual WAN hubs with hub-to-hub connectivity, which ultimately enables any-to-any connectivity, with different paths discussed here.
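As a rough mental model of any-to-any connectivity, the Python sketch below traces the hops between two endpoints. It is a simplification under stated assumptions: every endpoint (V-Net or branch) attaches to exactly one hub, every hub can reach every other hub directly, and all names are hypothetical. It is not how Virtual WAN computes routes internally.

```python
from typing import Dict, List


def transit_path(src: str, dst: str, attachments: Dict[str, str]) -> List[str]:
    """Hops from src to dst, assuming every hub reaches every other hub directly."""
    src_hub, dst_hub = attachments[src], attachments[dst]
    if src_hub == dst_hub:
        return [src, src_hub, dst]        # both endpoints hang off the same hub
    return [src, src_hub, dst_hub, dst]   # one hub-to-hub hop in the middle


# Hypothetical endpoints and the hub each one is attached to.
attachments = {
    "vnet-eastus": "hub-eastus",
    "branch-nyc": "hub-eastus",
    "vnet-westeurope": "hub-westeurope",
}

print(" -> ".join(transit_path("branch-nyc", "vnet-westeurope", attachments)))
print(" -> ".join(transit_path("branch-nyc", "vnet-eastus", attachments)))
```

In this model no endpoint ever needs its own link to another endpoint, which is the appeal compared with peering V-Nets by hand.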

The landing zone document recommends Virtual WAN for new large or global network deployments in Azure where you need global transit connectivity across Azure regions and on-premises locations.

Summary

Landing zone configuration involves many components, and there is no way to discuss everything thoroughly. In this post I put down my notes from reading the Azure landing zone documentation. Overall, working on landing zones requires learning a variety of the CSP’s services.