Kubernetes Operator

Kubernetes has a number of tools to automate the deployment of a single workload. In previous posts, we covered Helm and Kustomize. What remains unresolved is how to maintain the state of a workload after the deployment completes. In this post, I will give an introduction to the Kubernetes Operator. Compared with Helm (a templating approach) and Kustomize (a patching approach), the Kubernetes Operator follows the operator pattern. Operators are usually provided by the developer of the application.

Operator Pattern

In Kubernetes, a controller takes care of routine tasks to ensure that the desired state expressed by Kubernetes resources matches the current state. For example, the Deployment controller ensures that the number of running Pods matches the count specified in the replicas field. Controllers are the key to managing resources through declarative manifests.
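To make this concrete, below is a minimal Deployment sketch (the name and image are hypothetical, for illustration only). The Deployment controller continuously drives the number of running Pods toward the replicas value:

# Minimal Deployment sketch: the controller reconciles running Pods toward spec.replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-nginx            # hypothetical name
spec:
  replicas: 3                 # desired state; the controller adds or removes Pods to match it
  selector:
    matchLabels:
      app: demo-nginx
  template:
    metadata:
      labels:
        app: demo-nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.21   # hypothetical image tag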

Kubernetes makes use of the controller pattern throughout its own design. One of its key components, the Controller Manager, is a collection of many controllers. Each controller is in charge of a control loop, responsible for watching the objects it manages. Another component, the kube-scheduler, is also a special type of controller. It watches for unscheduled Pods and the health of nodes, determines the best Node to schedule each new Pod to, and then writes the decision to the etcd store (via the API server) for the kubelet to execute.

This controller pattern is fairly successful at what it does, and we can extend its use. Beyond the built-in resource types, we can create our own custom resource definitions (CRDs) and write controllers that watch the manifests declaring custom resources (CRs). The controller ensures that each resource's status matches its specification. This is also known as reconciliation, which is implemented as a control loop. The operator pattern can be illustrated in the diagram below:

Operator Design Pattern

Technically, an operator is just a controller. What makes an Operator (used to install a workload) different from a native Kubernetes controller are two things. First, an Operator usually needs CRDs, because the built-in resource types are insufficient. Second, the Operator encodes the domain knowledge needed to keep the target workload running. For example, stateful workloads such as databases need their operational steps executed in a specific order.

CNCF published a whitepaper with a deeper review of the operator pattern. This whitepaper is the best reference for a thorough understanding of the pattern.

Custom Resource Definition

The built-in controllers work with built-in objects (pre-defined APIs). Custom operators usually need their own APIs to function. To extend the Kubernetes API, we define the schema of these APIs in the form of CRDs (custom resource definitions) using the OpenAPI v3 standard. Then we can declare custom resources (CRs) that comply with the schema. The OpenAPI v3 schema in the CRD tells the admission control machinery how to validate a CR when we send it to the API server.
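As a rough sketch, a minimal CRD with an OpenAPI v3 schema might look like the following. The group, kind and property anticipate the WordPress example described in the next paragraph; treat them as illustrative rather than the project's exact files:

# Illustrative CRD sketch: the openAPIV3Schema section defines how CRs are validated
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: wordpresses.wordpress.digihunch.com
spec:
  group: wordpress.digihunch.com
  scope: Namespaced
  names:
    kind: WordPress
    plural: wordpresses
    singular: wordpress
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                sqlRootPassword:
                  type: string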

When we work with third-party operators, they usually provide CRDs along with the operator implementation. For example, in my operator example project, we have a minimalist WordPress CRD with one property, sqlRootPassword, and we can declare a CR as in this example. For a more realistic use case, we can take a look at the Kiali CRD. In the next section, we'll use it along with the Kiali operator to install Kiali.
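A CR conforming to that schema could then look like the sketch below. The apiVersion is inferred from the group, version and domain used in the scaffolding commands later in this post, and the password value is a placeholder:

# Illustrative CR sketch matching the WordPress CRD above
apiVersion: wordpress.digihunch.com/v1
kind: WordPress
metadata:
  name: wordpress-sample      # hypothetical name
spec:
  sqlRootPassword: changeit   # placeholder value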

Operator Usage

Like Artifact Hub for Helm, OperatorHub is a public registry of the most widely used Kubernetes Operators. In this section, we will walk through an example of using Operators. We will install Kiali as an add-on to Istio using the Kiali CR and operator, which also requires Prometheus to be installed first using the Prometheus Operator. Note that the Kiali installation outlined in this section is not the quick-start install manifest from Istio's samples directory. For Kiali on a production system, we have to customize the installation.

Suppose we have installed Istio. We can then install the Prometheus operator using Helm; the Prometheus operator will install Prometheus. Then we use Helm again to install the Kiali operator. The Kiali operator watches for the creation of a Kiali CR and deploys the services accordingly:

$ helm install -f prometheus-values.yaml --namespace istio-system --repo https://prometheus-community.github.io/helm-charts --version 13.6.0 istio-prometheus prometheus --insecure-skip-tls-verify
$ helm install -f kiali-operator-values.yaml --namespace kiali-operator --repo https://kiali.org/helm-charts --version 1.45.0 kiali-op kiali-operator --create-namespace
$ kubectl apply -f kiali-cr.yaml

I include example content for each file in the commands above in a GitHub gist (prometheus-values.yaml, kiali-operator-values.yaml and kiali-cr.yaml). For more options for installing Kiali, refer to its documentation.
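For orientation, a minimal Kiali CR might look like the sketch below. The actual kiali-cr.yaml in the gist may set more options, so treat this as an illustration only:

# Illustrative Kiali CR sketch; the real gist content may differ
apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  namespace: istio-system
spec:
  auth:
    strategy: anonymous       # assumption: anonymous access for a lab setup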

This example involves two Operators: the Prometheus Operator and the Kiali Operator. The Prometheus Operator is one of the first Kubernetes Operators ever written. As soon as the operator is deployed, it starts to deploy the service it manages. For the Kiali operator, we need to create the Kiali CR after the Kiali Operator has been deployed. Both are valid patterns.

Operator Development

Operators are powerful. However, authoring an Operator is not a trivial effort. One usually starts with a framework. A framework generates a body of boilerplate code that has the pattern implemented, and allows developers to fill in the functionality following the pattern. The whitepaper introduces the following frameworks:

  • CNCF Operator Framework – aims at Operator developers with an SDK, a scaffolding tool and a test harness. It currently supports three project types: Golang, Helm and Ansible. The CNCF Operator Framework consists of the Operator SDK and OLM (Operator Lifecycle Manager).
  • Kopf (Kubernetes Operator Pythonic Framework) – an easy-to-use framework in Python that abstracts away most of the low-level Kubernetes API communication hassle.
  • kubebuilder – helps build a Manager similar to the native kube-controller-manager. For differences with the Operator SDK, read here.
  • Metacontroller – lightweight Kubernetes controllers as a service.

In the CNCF Operator Framework, the Operator SDK supports development using Ansible, Helm and Golang. I make a general comparison as follows:

Type    | Best use case      | Underlying technology       | Amount of effort
Helm    | Stateless workload | Helm charts                 | Medium
Ansible | Stateless workload | Ansible roles and playbooks | Medium
Golang  | Stateful workload  | Code developed in Golang    | High

The aforementioned Kiali operator is an example of an Operator developed in Ansible. The Prometheus operator is developed in Golang, as the workload can be stateful depending on configuration. One needs to know how to develop an operator in Golang in order to tackle the most complicated situations, and this requires some serious development effort. The documentation, with a quick-start section, is available here, though even that is not very straightforward. Red Hat, the maintainer of the CNCF Operator Framework, has a good blog post on how to develop an Operator in Golang.
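As a side note on how an Ansible-based operator is wired together, such a project carries a watches.yaml file that maps each watched group/version/kind to an Ansible role or playbook. Below is a hedged sketch in the shape of the Kiali CR; the role name is hypothetical and not taken from the actual Kiali operator source:

# Illustrative watches.yaml sketch for an Ansible-based operator
- group: kiali.io
  version: v1alpha1
  kind: Kiali
  role: kiali-deploy          # hypothetical role name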

The example in that blog post requires some development knowledge to go through. On my macOS (Intel) machine, I had to configure the following prerequisites:

  1. Install gcc, using the command: xcode-select --install
  2. Install the right version of golang. You can find the required version here. macOS ships with a version of golang already installed, so I had to install version 1.17 and link to it: brew install go@1.17 && brew link --force go@1.17
  3. Install operator-sdk with Homebrew: brew install operator-sdk
  4. When you run "operator-sdk version", ensure the result shows a golang version that matches your installation.
  5. If you need to push Docker images, also log in to your Docker registry by running: docker login

Then we can create our working directory, initialize the repository and create boilerplate code (scaffolding) with these commands:

$ mkdir wordpress-operator && cd wordpress-operator
$ operator-sdk init --domain digihunch.com --repo github.com/digihunch/wordpress-operator
$ operator-sdk create api --group wordpress --version v1 --kind WordPress --resource --controller

With the repo initialized, we can go to the sections “Defining the API” and “Implementing the Controller”. The blog post does not cover every code edit needed to bring up WordPress. You are supposed to go to the author’s repository and fit the changes into your own repo. The author’s repo has a few more controllers, such as common.go and mysql.go.

At the end of the lab, you should be able to run the controller and bring up WordPress. I used my own repository for this lab and made the code changes in a couple of commits. To test locally with the code:

$ git clone git@github.com:digihunch/wordpress-operator.git
$ cd wordpress-operator
$ make install run

Then we can validate the WordPress installation from a new terminal, as the instructions show:

$ kubectl create -f config/samples/wordpress_v1_wordpress.yaml
$ minikube service wordpress --url

For developers who require more details, Red Hat has an eBook on Kubernetes Operators, supplementing the documentation. As a DevOps professional, I’m mainly concerned with understanding how Operators work and using them correctly.

Too Many Tools?

Now we seem to have too many choices of tools when it comes to deploying workloads on Kubernetes. Kustomize and Helm can deploy simple workloads. An Operator can deploy stateful workloads, as well as keep the workload state in check. Further, we have FluxCD and ArgoCD, based on the GitOps workflow.

When assessing a tool, we should think about the complexity of the workload being deployed. If it is a single stateless workload, Kustomize or Helm should be sufficient. If it is not very simple but still stateless, we can consider using Helm charts developed by the community. For multiple workloads, we can build our own top-level chart that combines existing sub-charts created by the community, as sketched below.
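As an illustration, a parent chart can simply declare community charts as dependencies in its Chart.yaml. The sketch below reuses the chart repositories and versions from the Kiali example earlier; the parent chart name is hypothetical:

# Illustrative parent Chart.yaml combining community sub-charts
apiVersion: v2
name: my-platform             # hypothetical chart name
version: 0.1.0
dependencies:
  - name: prometheus
    version: "13.6.0"
    repository: https://prometheus-community.github.io/helm-charts
  - name: kiali-operator
    version: "1.45.0"
    repository: https://kiali.org/helm-charts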

Helm is essentially a package manager. It does not follow the controller pattern and therefore will not monitor the current status of a deployment. Helm has other limitations compared to an Operator. For example, as a templating scheme, it reaches its limits when dealing with complex logic, even with the help of its helper functions. It is also hard to reason through template code when we have to troubleshoot a deployment. Refer to this blog post for the author’s experience with Helm.

If we want our deployment to be fully declarative and continuous, then we should follow the operator pattern by using a Kubernetes Operator. When we have many workloads of different levels of complexity, we can combine them with a GitOps tool. The Operator is one of the underlying technologies behind GitOps.

Workload profile          | Installation only              | Installation and status maintenance
Single stateless workload | Helm or Kustomize              | Operator (using Ansible or Helm)
Single stateful workload  | Helm or Kustomize              | Operator (using Golang)
Multiple workloads        | Helm (e.g. build parent chart) | GitOps in combination with Operator, Helm and Kustomize

The table above helps refine deployment requirements. It is not a recommendation, but rather a model for analyzing deployment requirements.