Kubernetes Operator

Kubernetes has a number of tools to automate the deployment of a single workload. In previous posts, we covered Helm and Kustomize. What remains unresolved is how to maintain the status of a workload after its deployment is completed. In this post, I will give an introduction to the Kubernetes Operator. Compared with Helm (templating approach) and Kustomize (patching approach), a Kubernetes Operator follows the operator pattern. Operators are usually provided by the developer of the application.

Operator Pattern

In Kubernetes, a controller takes care of routine tasks to ensure that the current state matches the desired state expressed by Kubernetes resources. For example, the Deployment controller ensures that the number of pods running matches the number specified in the replicas field. Controllers are the key to ensuring that resources can be managed through declarative manifests.

This controller pattern is fairly successful at what it does, and we can extend its use. Beyond the built-in resource types, we can create our own custom resource definitions (CRDs), and create controllers that watch for manifests declaring custom resources (CRs). The controller ensures that the status of each resource matches its specification. This is also known as reconciliation, which is implemented as a control loop. The Operator pattern can be illustrated in the diagram below:

Figure: Operator Pattern
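
To make the reconciliation idea more concrete, here is a minimal sketch of a control loop in plain Python. It is not how Kubernetes itself is implemented: the Cluster class is a hypothetical in-memory stand-in for the current state, and real controllers are driven by watch events and requeues rather than a simple for loop. The point is only that each pass compares desired state with current state and takes one corrective step.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for the cluster's current state."""
    pods: List[str] = field(default_factory=list)
    created: int = 0  # running count, used only to name new pods

    def create_pod(self) -> None:
        self.created += 1
        self.pods.append(f"pod-{self.created}")

    def delete_pod(self) -> None:
        self.pods.pop()


def reconcile(desired_replicas: int, cluster: Cluster) -> bool:
    """Take one step toward the desired state; return True once converged."""
    current = len(cluster.pods)
    if current < desired_replicas:
        cluster.create_pod()
    elif current > desired_replicas:
        cluster.delete_pod()
    else:
        return True
    return False


def control_loop(desired_replicas: int, cluster: Cluster, max_iterations: int = 100) -> int:
    """Call reconcile repeatedly; return how many passes it took to converge."""
    for iteration in range(1, max_iterations + 1):
        if reconcile(desired_replicas, cluster):
            return iteration
    raise RuntimeError("state did not converge")
```

Calling control_loop(3, Cluster()) creates three pods one pass at a time, and calling it again with a smaller number on the same cluster removes the surplus. The manifest only states the target; the loop works out the steps.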

Technically, there is no difference between a controller and an operator. Two things set an Operator (used to install a workload) apart from a native Kubernetes controller. First, an Operator usually needs CRDs because the built-in resource types are insufficient. Second, the Operator encodes the domain knowledge needed to keep the target workload running. For example, stateful workloads such as databases need their operational steps executed in a certain order.
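
As a rough illustration of what "domain knowledge" can look like in code, the sketch below runs a list of operational steps strictly in order and stops at the first failure, so that later steps never run against a half-prepared workload. The step names in the usage note are hypothetical and not taken from any real operator.

```python
from typing import Callable, List, Tuple

# A step is a name plus an action that reports success (True) or failure (False).
Step = Tuple[str, Callable[[], bool]]


def run_in_order(steps: List[Step]) -> List[str]:
    """Run steps in sequence; stop at the first failure and return the names completed."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            break  # do not attempt later steps once one has failed
        completed.append(name)
    return completed
```

For instance, a database upgrade could be expressed as the steps "backup", "upgrade-primary" and "upgrade-replicas". If "upgrade-primary" fails, the function returns only ["backup"] and the replicas are left untouched.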

CNCF has published a white paper that reviews the Operator Pattern in more depth. It is the best reference for a good understanding of the pattern.

Operator Usage

Just as Artifact Hub is to Helm, OperatorHub is a public registry of the most widely used Kubernetes Operators. In this section, we will walk through an example of using Operators. We will install Kiali as an add-on to Istio using the Kiali CR and operator. Kiali depends on Prometheus, which has to be installed first using the Prometheus Operator. Note that the Kiali installation outlined in this section is not the quick-start install manifest from Istio’s sample directory. For Kiali on a production system, we have to customize the installation.

Suppose we have installed Istio. We can then install the Prometheus operator using Helm, and the Prometheus operator will install Prometheus. Then we use Helm again to install the Kiali operator. The Kiali operator watches for the creation of a Kiali CR and then deploys the services:

$ helm install -f prometheus-values.yaml --namespace istio-system --repo https://prometheus-community.github.io/helm-charts --version 13.6.0 istio-prometheus prometheus --insecure-skip-tls-verify
$ helm install -f kiali-operator-values.yaml --namespace kiali-operator --repo https://kiali.org/helm-charts --version 1.45.0 kiali-op kiali-operator --create-namespace
$ kubectl apply -f kiali-cr.yaml

I include example content for each file in the commands above in GitHub gists (prometheus-values.yaml, kiali-operator-values.yaml and kiali-cr.yaml). For more options for installing Kiali, refer to their documentation.

This example installs Kiali and involves two Operators, the Prometheus Operator and the Kiali Operator. The Prometheus Operator is one of the first Kubernetes Operators ever written. As soon as the operator is deployed, it starts to deploy the service it manages. For the Kiali operator, we need to deploy the Kiali CR after the Kiali Operator has been deployed. Both are valid patterns.
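
The second pattern, where the operator does nothing until a CR shows up, can be sketched as an event handler. The event shape (a type of ADDED, MODIFIED or DELETED plus the object) follows the Kubernetes watch API, but everything else here is an assumption for illustration: the deployed dictionary stands in for real workloads, and the spec content in the usage note is made up rather than copied from the Kiali CR schema.

```python
from typing import Dict


def handle_event(event: dict, deployed: Dict[str, dict]) -> None:
    """Apply one watch event for a custom resource to the set of deployed workloads."""
    name = event["object"]["metadata"]["name"]
    if event["type"] in ("ADDED", "MODIFIED"):
        # A CR was created or changed: (re)deploy the workload with its spec.
        deployed[name] = event["object"]["spec"]
    elif event["type"] == "DELETED":
        # The CR is gone: tear the workload down if it exists.
        deployed.pop(name, None)
```

Feeding it an ADDED event for a CR named kiali with a hypothetical spec such as {"version": "v1.45"} records the workload as deployed, and a later DELETED event for the same name removes it. Until the first event arrives, the operator sits idle, which matches the behaviour of applying kiali-cr.yaml after the operator is installed.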

Operator Development

Operators are powerful. However, authoring an Operator is not a trivial effort. One usually starts with a framework. A framework generates a body of boilerplate code that implements the pattern and allows developers to enrich the functions following the pattern. The white paper introduces the following frameworks:

  • CNCF Operator Framework – aimed at Operator developers, with an SDK, a scaffolding tool and a test harness. It currently supports three project types: Golang, Helm and Ansible. The framework consists of the Operator SDK and the Operator Lifecycle Manager (OLM).
  • Kopf (Kubernetes Operator Pythonic Framework) – an easy-to-use framework in Python that abstracts away most of the low-level Kubernetes API communication hassle.
  • kubebuilder – helps build a Manager similar to the native kube-controller-manager. For how it differs from the Operator SDK, read here.
  • Metacontroller – lightweight Kubernetes Controller as a Service.

In the CNCF Operator Framework, the Operator SDK supports development using Ansible, Helm and Golang. I make a general comparison as follows:

Type    | Best use case      | Underlying technology       | Amount of effort
Helm    | Stateless workload | Helm Charts                 | Medium
Ansible | Stateless workload | Ansible Roles and Playbooks | Medium
Golang  | Stateful workload  | Code developed in Golang    | High

The aforementioned Kiali operator is an example of an Operator developed in Ansible. The Prometheus operator is developed in Golang, as the workload can be stateful depending on configuration. One needs to know how to develop an operator in Golang in order to tackle the most complicated situations. This requires some serious development effort. The documentation with a quick start section is available here. Even that is not very straightforward. Red Hat, the maintainer of the CNCF Operator Framework, has a good blog post on how to develop an Operator in Golang.

The example requires some development knowledge to go through. On my macOS (Intel) machine, I had to configure the following prerequisites:

  1. Install gcc, using the command: xcode-select --install
  2. Install the right version of Golang. You can find the version here. My macOS already had a version of Golang installed, so I had to install version 1.17 and link to it: brew install go@1.17 && brew link --force go@1.17
  3. Install operator-sdk with Homebrew: brew install operator-sdk
  4. When you run “operator-sdk version”, ensure the result shows a Golang version that matches your installation.
  5. If you need to push a Docker image, also connect to the Docker registry by running: docker login

Then we can create our working directory, initialize the repository and create boilerplate code (scaffolding) with these commands:

$ mkdir wordpress-operator && cd wordpress-operator
$ operator-sdk init --domain digihunch.com --repo github.com/digihunch/wordpress-operator
$ operator-sdk create api --group wordpress --version v1 --kind WordPress --resource --controller

With the repo initialized, we can go to the sections “Defining the API” and “Implementing the Controller”. The blog post does not cover every code edit needed to bring up WordPress. You are supposed to go to the author’s repository and fit the changes into your own repo. The author’s repo has a few more controller files such as common.go and mysql.go.

At the end of the lab, you should be able to run the controller and bring up WordPress. I used my own repository for this lab and made the code changes for it in a couple of commits. To test locally with the code:

$ git clone git@github.com:digihunch/wordpress-operator.git
$ cd wordpress-operator
$ make install run

Then we can validate the WordPress installation from a new terminal, as the instructions show:

$ kubectl create -f config/samples/wordpress_v1_wordpress.yaml
$ minikube service wordpress --url

For developers who require more details, Red Hat has an eBook on Kubernetes Operators that supplements the documentation. As a DevOps professional, I’m mainly concerned with understanding how Operators work and using them correctly.

Too many Tools?

Now we seem to have too many tools to choose from when it comes to deploying workloads on Kubernetes. Kustomize and Helm can deploy simple workloads. Operators can deploy stateful workloads, as well as keep the workload status in check. Further, we have FluxCD and ArgoCD, which are based on the GitOps workflow.

When assessing a tool, we should think about the complexity of the workload being deployed. If it is a single stateless workload, Kustomize or Helm should be sufficient. If it is stateful, we can consider using Helm charts developed by others. For multiple workloads, we can build our own parent chart to combine existing charts created by others.

If we want our deployment to be fully declarative, then we follow the Operator pattern by using a Kubernetes Operator. When we have many workloads of different levels of complexity, we can combine them with a GitOps tool. The Operator is one of the underlying technologies behind GitOps.

Workload profile          | Just installation              | Installation and maintaining status
Single stateless workload | Helm or Kustomize              | Operator (using Ansible or Helm)
Single stateful workload  | Helm or Kustomize              | Operator (using Golang)
Multiple workloads        | Helm (e.g. build parent chart) | GitOps in combination with Operator, Helm and Kustomize

The table above helps refine deployment requirements. It’s not a recommendation, but rather a model for analyzing deployment requirements.
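
For readers who prefer code to tables, the same model can be written as a small lookup. This is just the table above restated; the function name and the profile strings are my own choices for illustration, and the result should be read as a starting point for analysis rather than a verdict.

```python
# Keys are (workload profile, whether status must be maintained after install).
SUGGESTIONS = {
    ("single stateless", False): "Helm or Kustomize",
    ("single stateless", True): "Operator (using Ansible or Helm)",
    ("single stateful", False): "Helm or Kustomize",
    ("single stateful", True): "Operator (using Golang)",
    ("multiple", False): "Helm (e.g. build parent chart)",
    ("multiple", True): "GitOps in combination with Operator, Helm and Kustomize",
}


def suggest_tool(profile: str, maintain_status: bool) -> str:
    """Look up the table entry for a workload profile; raise on unknown profiles."""
    key = (profile.strip().lower(), maintain_status)
    if key not in SUGGESTIONS:
        raise ValueError(f"unknown workload profile: {profile!r}")
    return SUGGESTIONS[key]
```

For example, suggest_tool("single stateful", True) returns "Operator (using Golang)", which is the bottom-right reasoning of the single-workload rows.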