Kubernetes with Multiple CPU Architectures 1 of 2 – Container Image

While working on a multi-arch Kubernetes cluster, I came across quite a few issues with image logistics and decided to cover them in a separate post. The goal is to supply images with multi-arch support in a standard way. It is therefore not advisable to rely only on arbitrary image tags to distinguish CPU architectures. If that were the case, each team might use different tag keys and values, and image consumers would have to build custom logic to pull the right image. The standard way is to use a Manifest List (Docker’s term), or Image Index (OCI’s term), on the image registry to announce the multi-platform support of the images in the repository, so that a container runtime can pick the index entry that matches the local node’s architecture and platform.

Introducing Image Index

For image index/manifest, there have been three standards, with two currently active:

  • Docker Image Manifest V2, Schema 1: deprecated in 2019
  • Docker Image Manifest V2, Schema 2: supports a platform attribute for platform-specific images
  • OCI Image Specification: supports a platform attribute for platform-specific images

In 2016, Docker contributed the Docker V2 Image specification as the basis of the OCI image specification. As a result, the two currently active formats (V2.2 and OCI) still look alike today: both are JSON documents and share many attribute names. However, I take OCI as the standard since it is adopted in the CNCF ecosystem (e.g. containerd, quay.io). Most registries (e.g. ECR, ACR and even Docker Hub) support both formats, and you can tell the format by requesting the manifest document and checking its mediaType. When working with Kubernetes we should expect an OCI-format index, even though a Docker Manifest List v2.2 will most likely be compatible.

When examining an image manifest we should look for the following structure:

Manifest list and Manifest, source: https://www.opensourcerers.org/2020/11/16/container-images-multi-architecture-manifests-ids-digests-whats-behind/

In Docker lingo, a “fat manifest” is simply a list of manifests, equivalent to an image index in OCI terms. A manifest document contains several attributes whose values are sha256 digests, as the diagram shows. There are four types of digests and they should not be mixed up:

  • index-sha256: for the index or fat manifest
  • manifest-sha256: for the manifest of a single container image
  • config-sha256: for the config section
  • layer-sha256: for the image layer files

If two images share a layer, their respective manifests should each list a layer with the same sha256 digest. Likewise, if an index (fat manifest) references other manifests, it does so by their sha256 digests. When you reference a multi-arch image by digest, you should use the sha256 digest of the entire index or manifest list.
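Each of these digests is the SHA-256 hash of the exact bytes of the document or blob it names. The function below is a minimal sketch of that idea; the name digest_of and the file name manifest.json are hypothetical, and it assumes that either sha256sum or shasum is installed.

```shell
#!/bin/sh
# Print a registry-style digest ("sha256:<hex>") for the bytes read from stdin.
# Assumption: sha256sum (GNU coreutils) or shasum (e.g. on macOS) is available.
digest_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | awk '{ print "sha256:" $1 }'
  else
    shasum -a 256 | awk '{ print "sha256:" $1 }'
  fi
}

# Example (hypothetical file): a manifest saved byte-for-byte from the registry.
# digest_of < manifest.json
```

The hash only matches the registry’s digest if the bytes are untouched; pretty-printing or re-serializing the JSON first will produce a different value.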

To view an image index, “docker inspect” is insufficient. It runs within the Docker daemon, against the image that was pulled for the CPU architecture of the daemon’s host. So let’s look at some tools that check the index on a remote registry.

Tools to view image index

The most popular tool is Docker’s experimental manifest inspect command. However, it is still not mature. For example, we have to use Docker CLI version 23.0.0 or later, which fixes an OCI compatibility issue. Prior to v23.0.0 (Feb 2023), the Docker CLI was not able to correctly display a list of OCI-format manifests, and simply said “no such manifest”. Even after 23.0.0, I still find it clunky. For example, it does not display the sha256 digest of the OCI image index itself (the digest on the far left of the diagram above).

To find out why, I borrowed some ideas from this post and wrote the shell script below to check the index for my colorapp image:

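#!/bin/sh
# Fetch the manifest (or index) of a Docker Hub image and print the response headers.
# Requires jq and a recent curl that supports %{header_json}.

ref="${1:-digihunch/colorapp:v0.1}"
# Split an optional @sha256:... digest off the reference.
sha="${ref#*@}"
if [ "$sha" = "$ref" ]; then
  sha=""
fi
wosha="${ref%%@*}"
# Split repository and tag; default to "latest" when no tag is given.
repo="${wosha%:*}"
tag="${wosha##*:}"
if [ "$tag" = "$wosha" ]; then
  tag="latest"
fi
# Official images such as "ubuntu" live under the "library/" namespace on Docker Hub.
case "$repo" in
  */*) ;;
  *) repo="library/$repo" ;;
esac
# Media types to accept: OCI index, OCI manifest, Docker manifest, Docker manifest list.
apio="application/vnd.oci.image.index.v1+json"
apiol="application/vnd.oci.image.manifest.v1+json"
apid="application/vnd.docker.distribution.manifest.v2+json"
apidl="application/vnd.docker.distribution.manifest.list.v2+json"
# Request an anonymous pull token for the repository.
token=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${repo}:pull" \
        | jq -r '.token')
# Request the manifest by digest if one was given, otherwise by tag.
curl -H "Accept: ${apio}" -H "Accept: ${apiol}" -H "Accept: ${apid}" -H "Accept: ${apidl}" \
     -H "Authorization: Bearer $token" \
     -w '\nResponseCode:%{http_code}\nResponseHeader:\n%{header_json}\n' \
     -s "https://registry-1.docker.io/v2/${repo}/manifests/${sha:-$tag}"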

It turns out that the sha256 digest of the OCI image index itself is provided in a response header rather than in the response payload. So the docker manifest tool misses it because it does not surface the header!
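Registries that implement the Docker Registry HTTP API V2 return this value in the Docker-Content-Digest response header. As a rough sketch, the helper below (content_digest is a hypothetical name) pulls that header out of raw HTTP response headers read from stdin, matching the header name case-insensitively.

```shell
#!/bin/sh
# Print the value of the Docker-Content-Digest header found in the HTTP
# response headers on stdin. Prints nothing if the header is absent.
content_digest() {
  tr -d '\r' | awk '
    {
      i = index($0, ":")                    # position of the first colon
      if (i == 0) next                      # status line or blank line
      name = tolower(substr($0, 1, i - 1))  # header names are case-insensitive
      value = substr($0, i + 1)
      sub(/^[ \t]+/, "", value)             # drop whitespace after the colon
      if (name == "docker-content-digest") { print value; exit }
    }'
}

# Example, reusing the variables from the script above (-I sends a HEAD request):
# curl -sI -H "Accept: ${apio}" -H "Authorization: Bearer $token" \
#   "https://registry-1.docker.io/v2/${repo}/manifests/${tag}" | content_digest
```

Splitting on the first colon only matters here, because the digest value itself contains a colon.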

Luckily, there are some alternatives, such as skopeo and manifest-tool. I wasn’t able to get the former to work with an OCI index. The latter displays my OCI index in a readable format, and with it I was able to view a few other indexes that docker manifest could not show, so I have a good impression of it. In summary, of the four ways to view an image index, my assessment is:

  • Docker manifest: still glitchy as of 2023, but widely available
  • manifest-tool: pretty solid, with nicely formatted output; requires manual install
  • skopeo: easy to install, but not straightforward to use
  • Self-scripting in shell: only for troubleshooting

For the rest of this post, however, I’ll continue to use docker manifest, given its popularity and availability. When it fails to display an index, I’ll try manifest-tool.

Inspecting Image Index

With the docker manifest command (v23.0.1), let’s take a peek at three images:

docker manifest inspect --verbose osimis/orthanc:22.12.2
docker manifest inspect --verbose public.ecr.aws/amazonlinux/amazonlinux:2.0.20230207.0
docker manifest inspect --verbose ubuntu:23.04

In the JSON document returned by the first command (osimis/orthanc image), we see the following structure:

{
  "Ref": "docker.io/osimis/orthanc:22.12.2",
  "Descriptor": {
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "digest": "sha256:20413096878fb56bf8d09af08cf4055993dbcf507526f0561b26fc4d0ed7affc",
    "size": 11227,
    "platform": {
      "architecture": "amd64",
      "os": "linux"
    }
  },
  "Raw": "......",
  "SchemaV2Manifest": {
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "schemaVersion": 2,
    "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "digest": "sha256:9de20d8a006c6377b85dba9f817d47048982bd0f15fac7daacb64f42060d4b6d",
      "size": 16518
    },
    "layers":[
      {
        "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
        "digest": "sha256:025c56f98b679f70b7a54241917e56da7b59ab9d2defecc6ebdb0bf2750484bb",
        "size": 31412852
      },
      ......
      ......
      ......
      {
        "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
        "digest": "sha256:a24cf4c71e6977b692bbf517eba7bf6f454d41ceab24c1b2694c3303bc718a1c",
        "size": 174739
      }
    ]
  }
}

We can see that the response is a single manifest. The platform section under the Descriptor key indicates that the image is built for the amd64 architecture. The mediaType, along with the SchemaV2Manifest and schemaVersion keys, indicates that it is in Docker Manifest (v2.2) format. Other attributes are summarized here. In the response from the second command (public.ecr.aws/amazonlinux/amazonlinux), we see the structure below:

[
  {
    "Ref": "public.ecr.aws/amazonlinux/amazonlinux:2.0.20230207.0@sha256:260907696498cbf078abc2f3428bf8d19faf77cded5d5459900997a1bc29903d",
    "Descriptor": {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:260907696498cbf078abc2f3428bf8d19faf77cded5d5459900997a1bc29903d",
      "size": 529,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    "Raw": "......",
    "SchemaV2Manifest": {
      "schemaVersion": 2,
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": 1478,
        "digest": "sha256:d27c2e45784db13b0b2bc89a52be6661aa1d53bd25c070b41626768c9c563c3d"
      },
      "layers": [
        {
          "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
          "size": 62386320,
          "digest": "sha256:d78505e615251c4f4af6eaa9507b67917d263d23551dcc5a1eed3c012d32a54d"
        }
      ]
    }
  },
  {
    "Ref": "public.ecr.aws/amazonlinux/amazonlinux:2.0.20230207.0@sha256:7fb3183b38e1a9859374a343e72dc43731aeccaf26507da94ebc310067f39fed",
    "Descriptor": {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:7fb3183b38e1a9859374a343e72dc43731aeccaf26507da94ebc310067f39fed",
      "size": 529,
      "platform": {
        "architecture": "arm64",
        "os": "linux",
        "variant": "v8"
      }
    },
    "Raw": "......",
    "SchemaV2Manifest": {
      "schemaVersion": 2,
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": 1493,
        "digest": "sha256:a1ea533a0632c6501d7848c7ed481e8fb0398c3277c0d9fddf0b0fdcd5731c09"
      },
      "layers": [
        {
          "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
          "size": 64003805,
          "digest": "sha256:71343c2791199c6e2c19c308cff6493497a02f57e225c11405e1934dc7428b3c"
        }
      ]
    }
  }
]

Comparing this structure with the first response, we notice that it is a list of two manifests, each with its own platform architecture. So the amazonlinux image has multi-architecture capability and can serve as the base image for custom images on both architectures. On the “image tags” tab of the registry page, we can see a list of published tags. Each is tied to either a single image manifest or a manifest list:

Each image tag is associated with a single manifest or a manifest list (aka “fat manifest”)

When you click on “image manifest”, you can see both the “Image manifest media type” and “Artifact media type” values. When you click on “manifest list”, you see only the “Image manifest media type” value, because the “fat manifest” does not point to a single artifact.

Now, let’s review the third command (ubuntu:23.04) response:

[
  {
    "Ref": "docker.io/library/ubuntu:23.04@sha256:52293638ba652a2e8f9e1c1cfcc905839b1f2a9e671ddcc9bf77909b6bf527d0",
    "Descriptor": {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:52293638ba652a2e8f9e1c1cfcc905839b1f2a9e671ddcc9bf77909b6bf527d0",
      "size": 424,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    "Raw": "......",
    "OCIManifest": {
      "schemaVersion": 2,
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "size": 2299,
        "digest": "sha256:beb2152822b716b4deac2996f16bc84db0a14b7cbc549579635590438f9c0e1d"
      },
      "layers": [
        {
          "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
          "size": 26638886,
          "digest": "sha256:db781b8aed497363312ef32499cbfac28821e0494db7f0cadc4e716853e02a12"
        }
      ]
    }
  },
  {
    "Ref": "docker.io/library/ubuntu:23.04@sha256:0c8e3367a3fe9b703c759e1c148c5809df1a2734f8f37529bd11fbcfd34b1d1c",
    "Descriptor": {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:0c8e3367a3fe9b703c759e1c148c5809df1a2734f8f37529bd11fbcfd34b1d1c",
      "size": 424,
      "platform": {
        "architecture": "arm64",
        "os": "linux",
        "variant": "v8"
      }
    },
    "Raw": "......",
    "OCIManifest": {
      "schemaVersion": 2,
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "size": 2316,
        "digest": "sha256:eb2d2fb228861107934403e776544a3f516bc7123a1275d52f1992bada8e94d6"
      },
      "layers": [
        {
          "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
          "size": 25802344,
          "digest": "sha256:29d183ded65aecf549f39ef891c21feb9034b5b10f341533b4af297bb5c60bb8"
        }
      ]
    }
  },
  ......,
  ......,
  ......
]

I trimmed the response document for simplicity. Notice that the document is also a “fat manifest”, except that the mediaType, OCIManifest and schemaVersion keys indicate that it is in OCI format. The image also supports more platforms (combinations of OS and CPU architecture). The attributes of the OCI index are documented here.
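The mediaType values seen in the three responses, together with the Accept headers used in the earlier script, are enough to tell the formats apart. A small sketch, using a hypothetical function name manifest_kind:

```shell
#!/bin/sh
# Map a manifest mediaType string to a human-readable format name.
manifest_kind() {
  case "$1" in
    application/vnd.oci.image.index.v1+json)
      echo "OCI image index" ;;
    application/vnd.oci.image.manifest.v1+json)
      echo "OCI image manifest" ;;
    application/vnd.docker.distribution.manifest.list.v2+json)
      echo "Docker manifest list (v2.2)" ;;
    application/vnd.docker.distribution.manifest.v2+json)
      echo "Docker image manifest (v2.2)" ;;
    *)
      echo "unknown" ;;
  esac
}

manifest_kind "application/vnd.oci.image.manifest.v1+json"
```

Only the index and manifest list types announce multi-platform support; the other two describe a single platform-specific image.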

Single-platform image build

At the beginning of the supply chain, we build platform-specific images with CI/CD jobs. Traditionally, the docker build process can only produce images for the platform where the build command runs. To cover multiple platforms, we’d need multiple build agents, one per platform. Each agent runs the build process and pushes its artifact to the registry (with “docker push”). At the end we create a fat manifest that combines the images for all platforms, with the “docker manifest create” command. A blog post on Docker from April 2020 refers to this as the “hard way”.

Since then, Docker has introduced Docker Buildx, a CLI plugin that extends the docker command with full support for the features provided by the BuildKit builder toolkit. One such feature is the ability to produce multi-platform images in a single command run. There is also a push switch that generates the manifest behind the scenes. In my own testing, I was able to build images for both amd64 and arm64 on macOS (M1).
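To make the “hard way” concrete, the sketch below only prints the commands involved; it does not run them. It assumes each build agent has already pushed its image under a per-architecture tag of the form image-arch, which is a hypothetical naming scheme, and fat_manifest_commands is likewise a made-up name. The annotate step is included for explicitness and may not be needed when the platform is already recorded in each image.

```shell
#!/bin/sh
# Print (not run) the commands that combine per-architecture tags into one
# manifest list. Usage: fat_manifest_commands IMAGE ARCH [ARCH...]
# Assumption: "<IMAGE>-<ARCH>" was pushed by each agent (hypothetical scheme).
fat_manifest_commands() {
  image="$1"
  shift
  members=""
  for arch in "$@"; do
    members="$members ${image}-${arch}"
  done
  echo "docker manifest create ${image}${members}"
  for arch in "$@"; do
    echo "docker manifest annotate --arch ${arch} ${image} ${image}-${arch}"
  done
  echo "docker manifest push ${image}"
}

fat_manifest_commands digihunch/colorapp:v0.2 amd64 arm64
```

Printing the commands first makes it easy to review them in a CI log before piping the output to sh.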

Multi-platform image build

Taking the colorapp Python application as a simple example, I followed this guide to produce a cross-CPU-architecture build. First, we can check whether we already have a build instance that supports our desired platforms and whether it is already selected:

$ docker buildx ls
NAME/NODE          DRIVER/ENDPOINT             STATUS  BUILDKIT PLATFORMS
vibrant_hypatia *  docker-container
  vibrant_hypatia0 unix:///var/run/docker.sock running v0.11.3  linux/arm64, linux/amd64, linux/amd64/v2, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
default            docker
  default          default                     running 20.10.22 linux/arm64, linux/amd64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
desktop-linux      docker
  desktop-linux    desktop-linux               running 20.10.22 linux/arm64, linux/amd64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6

In the example above, I’ve already got one named vibrant_hypatia that supports linux/arm64 and linux/amd64, and it is selected (marked with an asterisk). If that’s not the case, we can create a new build instance and tell Docker to use it:

$ docker buildx create --use
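If you want to script the platform check instead of reading the PLATFORMS column by eye, something along these lines could work. It is only a sketch: supports_platforms is a hypothetical helper, and it expects the comma-separated platform list to be passed in as a string rather than parsing the docker buildx ls output itself.

```shell
#!/bin/sh
# Succeed only if every requested platform appears in a comma-separated
# platform list. Usage: supports_platforms "LIST" PLATFORM [PLATFORM...]
supports_platforms() {
  available=$(printf '%s' "$1" | tr -d ' ')   # "a, b" -> "a,b"
  shift
  for wanted in "$@"; do
    case ",${available}," in
      *",${wanted},"*) ;;                      # exact entry found
      *) echo "missing: ${wanted}" >&2; return 1 ;;
    esac
  done
}

supports_platforms "linux/arm64, linux/amd64, linux/amd64/v2" linux/amd64 linux/arm64 \
  && echo "builder covers both platforms"
```

Wrapping the list in commas makes the match exact, so linux/amd64/v2 alone does not count as linux/amd64.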

Then we can run the actual build command. In the build command we specify the platforms and tags. We also specify --push so that the images and their manifest are pushed in the same run.

$ docker buildx build --platform linux/amd64,linux/arm64 --push -t digihunch/colorapp:v0.2 ./colorapp

Note that with this single command, the images for both platforms are built and pushed with the correct OCI-compliant manifest. There is no need to build them separately and then assemble the manifest by hand. We can use the manifest command from the last section to verify the image manifest.

Defaulting to an OCI-compliant image index is a change introduced in buildx 0.10 in January 2023. With --provenance=false, we can control the manifest format with oci-mediatypes=true. In buildx 0.10, the default value for provenance changed from false to true, which always makes OCI the image manifest format. This change of default, along with the glitch in “docker manifest” prior to v23.0.0, had a pretty big impact and caused confusion (see this bug report from Ubuntu, and the issue in this post). However, since I take the OCI-compliant format as the standard, I do not have a problem with this change. We can verify the image manifest with:

$ docker manifest inspect --verbose digihunch/colorapp:v0.1   # requires Docker CLI v23.0.0 or later

Platform-specific image

Our Dockerfile looks like this, with a single base image:

FROM public.ecr.aws/amazonlinux/amazonlinux:2
RUN yum update -y && yum install -y python3 && yum clean all && rm -rf /var/cache/yum
COPY serve.py ./
RUN chmod +x ./serve.py
CMD ["python3", "-u", "./serve.py"]

To build platform-specific images, we need the base image (amazonlinux) to support multi-arch too, so that the build process picks the correct platform-specific image as the base. We can verify this by looking at the sha256 digest of the first layer of each colorapp variant. They are d78505 for the amd64 image and 71343c for the arm64 image. We can find the same layer digests in the amazonlinux manifests.

We now have multi-architecture images, as well as the index. Docker Hub shows both digests with their architecture, although unlike ECR it does not show whether the tag is a manifest list or a manifest.

In real life, our supply chain usually has multiple levels of base images, or multiple base images. It is important to ensure that platform-specific images are available at every level by examining their manifests.
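The same comparison can be done mechanically. The sketch below assumes the two manifests were saved to files (for example by redirecting the output of docker manifest inspect); shared_digests is a hypothetical helper that lists every sha256 digest appearing in both files.

```shell
#!/bin/sh
# Print the sha256 digests that appear in both files.
# Usage: shared_digests FILE_A FILE_B
shared_digests() {
  a=$(mktemp) && b=$(mktemp) || return 1
  # Extract every full-length digest, de-duplicated and sorted for comm.
  grep -o 'sha256:[0-9a-f]\{64\}' "$1" | sort -u > "$a"
  grep -o 'sha256:[0-9a-f]\{64\}' "$2" | sort -u > "$b"
  comm -12 "$a" "$b"          # lines common to both lists
  rm -f "$a" "$b"
}

# Example (hypothetical file names):
# shared_digests colorapp-amd64.json amazonlinux-amd64.json
```

This is a text-level match, so it reports any digest the two documents have in common, not only layers; for a base image and its derivative the overlap should be the base layers.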

In the Dockerfile, we consider it a best practice to reference the base image by digest instead of by tag. We just need to make sure the digest actually points to an image index (manifest list), with each entry pointing to a platform-specific image.
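A quick way to spot base images that are still referenced by tag is to scan the FROM lines. The following is a rough sketch with a hypothetical helper name, unpinned_bases; it reads a Dockerfile on stdin and prints every FROM line that lacks an @sha256 digest. It is deliberately naive, so it will also flag FROM scratch and references to earlier build stages.

```shell
#!/bin/sh
# Print FROM lines that do not pin the base image by digest.
# Reads a Dockerfile on stdin; exits non-zero when nothing is printed.
unpinned_bases() {
  grep -i '^[[:space:]]*FROM[[:space:]]' | grep -v '@sha256:[0-9a-f]\{64\}'
}

# Example:
# unpinned_bases < Dockerfile
```

It does not check what the digest points to; confirming that it is an index rather than a single-platform manifest still requires inspecting the manifest as shown earlier.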

Summary

Now we’ve built the first part of our supply chain with arm64 capability. We have built platform-specific images that use their own platform-specific base images. We’ve also created an OCI-compliant image index to announce that the image supports multiple architectures. Next, we will examine how containerd on different CPU architectures consumes the platform-specific image.