Virtualization 4 of 4 – Networking

Virtual LAN (VLAN)

Although VLAN predates virtualization and is technically not part of the virtualization topic, I'd like to start here as a refresher. Suppose we have computers from the finance department and computers from the sales department all connected to a single layer-2 switch. There are at least three problems: 1) too many devices on the same broadcast domain cause traffic congestion; 2) security can be compromised; and 3) each department might span several physical locations. We introduce a multi-layer switch to address these, with two main features: 1) the VLAN feature maps ports to logical networks, so that all hosts are physically connected to a single switch but logically belong to their own network (VLAN); 2) the SVI (switch virtual interface) feature allows inter-VLAN routing at layer 3. Such a multi-layer switch is sometimes referred to as a layer-3 switch.

A VLAN is local to a switch, and a tag is required in the Ethernet frame in order to pass VLAN information across switches. The link between switches is called a trunk. IEEE 802.1Q (aka dot1q) is the networking standard for VLANs; it standardizes how traffic between switches is tagged to indicate which traffic belongs to which VLAN. The dot1q trunk (aka dot1q link) carries VLAN IDs for frames traversing between switches. A trunk can be configured between two switches, or between a switch and a router. Trunking is the process of carrying traffic from different VLANs over the trunk. The ports on each switch need to be configured to enable trunking. Cisco calls such ports trunk ports; others call them tagged ports. Their function is to add the VLAN tag to the Ethernet frame. In contrast, regular ports that send and receive frames without a VLAN tag are called access ports or untagged ports. A trunk port carries traffic for multiple VLANs, whereas an access port carries traffic for a single VLAN. A network device connected to an access port has no idea which VLAN it belongs to. VLAN creation and management are the responsibility of the switch. Common trunking protocols include VTP (VLAN Trunking Protocol) and DTP (Dynamic Trunking Protocol).
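As a concrete illustration of what a tagged (trunk) port does, here is a minimal Python sketch (not production code) that inserts the 4-byte 802.1Q tag, i.e. the TPID 0x8100 followed by the priority bits and the 12-bit VLAN ID, into an Ethernet frame. The function name, MAC addresses and payload are made up for illustration.

    import struct

    def tag_frame(dst_mac: bytes, src_mac: bytes, ethertype: int,
                  payload: bytes, vlan_id: int, pcp: int = 0) -> bytes:
        """Insert an 802.1Q tag between the source MAC and the EtherType."""
        tpid = 0x8100                            # 802.1Q Tag Protocol Identifier
        tci = (pcp << 13) | (vlan_id & 0x0FFF)   # 3-bit priority, DEI=0, 12-bit VLAN ID
        return (dst_mac + src_mac
                + struct.pack("!HH", tpid, tci)  # the 4-byte dot1q tag
                + struct.pack("!H", ethertype)
                + payload)

    # A frame leaving a trunk (tagged) port for VLAN 10:
    frame = tag_frame(bytes.fromhex("ffffffffffff"),   # broadcast destination
                      bytes.fromhex("020000000001"),   # made-up source MAC
                      0x0800,                          # IPv4 payload follows
                      b"...payload...",
                      vlan_id=10)
    print(frame[:18].hex())                            # header including the dot1q tag

An access (untagged) port strips this tag before delivering the frame to the attached device, which is why that device has no notion of its VLAN membership.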

This video and this video both have good explanations of VLAN.

Virtual Extensible LAN (VXLAN)

VXLAN is an overlay protocol. Recall that in the standard TCP/IP stack, you normally encapsulate a layer-3 IP datagram in a layer-2 Ethernet frame. With the VXLAN encapsulation technique, however, layer-2 frames can be encapsulated within layer-4 UDP packets.

VXLAN allows you to stretch a layer-2 connection over an intervening layer-3 network. VXLAN tunnel endpoints (VTEPs) are the endpoint devices that terminate VXLAN tunnels; they can be either virtual or physical switch ports. A VTEP encapsulates traffic entering the VXLAN tunnel and de-encapsulates the traffic when it leaves the tunnel.

The VXLAN encapsulation includes the following:

  • Outer Ethernet Header (source and dest MAC for underlay VTEPs)
  • Outer IP header (source and dest IP on underlay network)
  • Outer UDP header (source and dest ports; destination port 4789 by default)
  • VXLAN Header (including VNI)
  • Inner Ethernet Frame (with source and dest MAC for overlay interfaces)

The VNI (VXLAN network identifier, aka VNID) included in the VXLAN header is 24 bits long. It is conceptually similar to the VLAN ID in VLAN, which is only 12 bits long.

VXLAN Encapsulation
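To make the header layout above more tangible, here is a minimal Python sketch that builds only the 8-byte VXLAN header defined in RFC 7348 (an 8-bit flags field with the I bit set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits) in front of an inner Ethernet frame. The outer UDP/IP/Ethernet headers that a real VTEP would add are omitted, and the VNI value and inner frame are placeholders.

    import struct

    def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
        """Prepend the 8-byte VXLAN header to an inner layer-2 frame."""
        assert 0 <= vni < 2**24        # 24-bit VNI: ~16 million possible segments
        flags = 0x08                   # "I" flag set: the VNI field is valid
        header = struct.pack("!I", flags << 24) + struct.pack("!I", vni << 8)
        return header + inner_frame    # outer UDP (dst port 4789), IP and Ethernet
                                       # headers would wrap this on the underlay

    vxlan_payload = vxlan_encapsulate(b"\x00" * 60, vni=5000)
    print(vxlan_payload[:8].hex())     # '0800000000138800': flags 0x08, VNI 0x001388 (5000)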

The VXLAN protocol is documented in RFC 7348. Its specification was originally created by VMware, Cisco and Arista. As it became more common in network virtualization (with data centre virtualization and application containerization), several other players joined the list of contributors, and they manufacture switches that support VXLAN. This is a section on VXLAN from the documentation of the Huawei CloudEngine 5800 switch.

Open vSwitch is an example of a software-based virtual network switch that supports VXLAN overlay networks.

In summary, the main benefits of VXLAN over VLAN are:

  • VXLAN scales up to 16 million logical networks, thanks to the 24-bit length of the VNI
  • VXLAN supports layer-2 adjacency across IP networks. A VM belonging to an existing layer-2 domain can be created in a different data centre (where more computing resources are available), without being constrained by layer-2 boundaries or being forced to create geographically stretched layer-2 domains (stretched VLANs).

Virtual Machine Networking

For VMs to connect to each other, within or across hosts, we need not only a vNIC on each VM, but also a vSwitch to connect the vNICs. A vSwitch (aka bridge) is a logically defined layer-2 device that passes frames between vNICs. On the same host, vNICs are directly connected to the vSwitch, which is then connected to the physical NIC. Each vSwitch forms a broadcast domain. When we set up a vNIC, there are three modes:

Bridged networking: The VM connects to the outside network using the host's physical NIC, which acts as a bridge between the vNIC and the outside network. The VM is a full participant in the network, as if it were a physical computer on the network; i.e. it obtains IP addressing information from a DHCP server on the outside (physical) network. The VM's IP address is also visible to and directly accessible by other computers on the network. Bridged networking is common when the VMs act as servers.

NAT networking: The VM relies on the host to act as a NAT device to make outgoing network connections. The IP address of the VM is assigned by a virtual DHCP server on the host. The guest VMs form a private network, and computers on the outside network are external to it. The host translates the private IP address into the host's IP address on the way out, and listens for returning traffic. The outside network sees traffic from the VM guest as if it came from the host. This mode is common when the VMs are mainly used as client workstations.
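As a rough mental model (not how any particular hypervisor implements it), the host's role in NAT mode can be sketched as a translation table that maps each guest connection to a port on the host's own outside address. The class name and the addresses below are invented for illustration.

    import itertools

    class NatTable:
        """Toy model of the host translating guest connections in NAT mode."""

        def __init__(self, host_ip: str):
            self.host_ip = host_ip
            self._ports = itertools.count(49152)   # ephemeral ports on the host side
            self._out = {}                         # (guest_ip, guest_port) -> host_port
            self._in = {}                          # host_port -> (guest_ip, guest_port)

        def outbound(self, guest_ip: str, guest_port: int):
            """Rewrite the guest's source address to the host's address on the way out."""
            key = (guest_ip, guest_port)
            if key not in self._out:
                host_port = next(self._ports)
                self._out[key] = host_port
                self._in[host_port] = key
            return self.host_ip, self._out[key]

        def inbound(self, host_port: int):
            """Map returning traffic back to the guest that opened the connection."""
            return self._in.get(host_port)

    nat = NatTable("203.0.113.10")                 # host's address on the outside network
    print(nat.outbound("192.168.56.5", 40000))     # ('203.0.113.10', 49152)
    print(nat.inbound(49152))                      # ('192.168.56.5', 40000)

The outside network only ever sees the host's address (203.0.113.10 in this sketch), which is why NAT mode hides the guests while still letting them reach external services.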

Host-only networking: creates a network that is completely contained within the host computer. The vSwitch is the hub of the private network, and the physical NIC on the host is not involved. The VM will not have access to the outside network. This mode is useful when the VMs need to be isolated from the outside network and only need to communicate with peers on the same host.

The difference between NAT networking and host-only networking is the exposure of the VM guest to the external network. All of these networking modes are available in VMware Fusion, for example.

Advanced virtualization platforms such as vSphere usually support multiple hosts. Multiple hosts can also be configured to form a distributed vSwitch, such as the vSphere Distributed Switch, in addition to standard switches.

Docker Networking

I previously wrote a brief on Docker networking covering three modes. Of the three, the single-host bridge network is the equivalent of host-only networking. The MACVLAN driver is similar to bridged networking, in the sense that a container may connect to the external network using the host NIC as a bridge. However, the external network is still bound to a physical location. This is where overlay networks come in handy.
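For a hands-on flavour, here is a small sketch using the Docker SDK for Python to create a single-host bridge network and a multi-host overlay network. The network and image names are arbitrary, and the overlay driver assumes the host is already part of a swarm (e.g. after running docker swarm init).

    import docker                      # Docker SDK for Python (pip install docker)

    client = docker.from_env()

    # Single-host bridge network: containers on this host can reach each other.
    bridge_net = client.networks.create("demo_bridge", driver="bridge")

    # Overlay network: spans multiple Docker hosts over the underlay IP network
    # (VXLAN under the hood); requires swarm mode, and attachable=True lets
    # standalone containers join it.
    overlay_net = client.networks.create("demo_overlay", driver="overlay",
                                         attachable=True)

    # Attach a container to the bridge network.
    c = client.containers.run("alpine", "sleep 300", detach=True,
                              network="demo_bridge")
    c.reload()                         # refresh attrs after the container starts
    print(c.name, list(c.attrs["NetworkSettings"]["Networks"]))

Docker's overlay driver is itself a VXLAN user: each overlay network gets its own VNI, and the Docker hosts act as VTEPs.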

CNM and CNI

In Docker networking, a container needs to map its ports to the host, and this port mapping is implemented with iptables, which limits the scale and performance of the solution. Also, those networking modes do not address the problem of multi-host networking. As multi-host networking became a real need for containers, the industry started looking into different solutions. Container projects favour a model where networking is decoupled from the container runtime. This also greatly improves application mobility. In this model, networking is handled by a 'plugin' or 'driver' that manages the network interface and how the containers are connected to the network. The plugin also assigns the IP address to the container's network interfaces. In order for this model to succeed, there needs to be a well-defined interface or API between the container runtime and the network plugins.

Docker, the company behind the Docker container runtime, came up with the Container Network Model (CNM). Around the same time, CoreOS, the company responsible for creating the rkt container runtime, came up with the Container Network Interface (CNI). Kubernetes originally sought to use CNM for its plugins, but eventually decided to go with CNI. The primary reason was that CNM was still seen as something designed with the Docker container runtime in mind and was hard to decouple from it. After this decision, several other open source projects also turned to CNI for their container runtimes.
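To make that runtime-to-plugin interface concrete, below is a rough Python skeleton of the contract a CNI plugin implements (real plugins are usually written in Go). The runtime executes the plugin binary, passes the operation via the CNI_COMMAND environment variable (plus CNI_CONTAINERID, CNI_NETNS and CNI_IFNAME), feeds the network configuration as JSON on stdin, and reads a JSON result from stdout. The version strings and the hard-coded address below are placeholders.

    #!/usr/bin/env python3
    import json
    import os
    import sys

    def main() -> int:
        command = os.environ.get("CNI_COMMAND", "")

        if command == "VERSION":
            json.dump({"cniVersion": "1.0.0",
                       "supportedVersions": ["0.4.0", "1.0.0"]}, sys.stdout)
            return 0

        config = json.load(sys.stdin)        # network configuration from the runtime

        if command == "ADD":
            # A real plugin would create a veth pair, move one end into the
            # namespace named by CNI_NETNS, and ask its IPAM plugin for an
            # address; the address below is a hard-coded placeholder.
            json.dump({"cniVersion": config.get("cniVersion", "1.0.0"),
                       "interfaces": [{"name": os.environ.get("CNI_IFNAME", "eth0")}],
                       "ips": [{"address": "10.22.0.5/16"}]}, sys.stdout)
            return 0

        if command in ("DEL", "CHECK"):
            return 0                         # nothing to tear down in this sketch

        return 1                             # unknown command

    if __name__ == "__main__":
        sys.exit(main())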

This article expands further on the differences between CNM and CNI.