Knative Serving Introduction


As per IBM’s definition, Knative enables serverless workloads to run on Kubernetes clusters and makes building and orchestrating containers with Kubernetes faster and easier. It has drawn a lot of attention recently: it released version 1.0 in November 2021 and was accepted as a CNCF incubating project in March 2022.

Glory aside, the value Knative delivers is the ability to go serverless on a Kubernetes platform. Originally, Knative was built with three components: Build, Serving and Eventing. The Build component was deprecated in favour of the Tekton project. Tekton is a cloud-native CI/CD pipeline framework: it connects to source code repositories, builds artifacts, and can also deploy applications with pipelines as code. Tekton’s documentation includes a quality tutorial here, linked to an interactive terminal. In this post we’ll focus on Knative Serving, and cover Knative Eventing in the next.

Knative can be seen as a serverless framework that uses a number of open-source technologies as building blocks, such as Istio as the ingress gateway, Kafka or Google Pub/Sub as the event-streaming engine, and Prometheus for observability, to name a few. Despite sharing the same CLI tool (named kn), Knative Serving and Eventing are separate capabilities that use different groups of CRDs. They are installed separately, either with their respective YAML files or with an operator.

Serving vs Eventing?

After reading Ahmet Balkan’s article “Did we market Knative wrong”, my impression is that the two are not really related. They could have been two separate projects without sharing a common name. As the author puts it, the two share some core logic, but beyond that they have nothing in common.

Serving is lightweight; it requires a networking layer, which is Istio in this post. It provides the capability to scale from N to 0 when the system is idle, and from 0 to N when requests come in. It is the missing serving layer for running microservices on Kubernetes, and Ahmet positions it as the first thing people would install after creating a cluster.

The target audience of Eventing is much smaller. Eventing is for event-driven architectures and is more complex than Serving. Ahmet admits that they overestimated how many people on the planet want to build a Heroku-like PaaS layer on top of Knative. Only a couple of dozen companies would work through the complexity to build their own Kubernetes-based internal PaaS, or even a public-facing FaaS, using Knative. They are the niche audience of Knative Eventing.

For most platform builders who just need to run microservices, Knative Serving is all they need.

A Demo on Serving

The term “serverless” is loosely defined. Instead, the Knative documentation emphasizes the ability to scale to zero, noting that “some people call this Serverless”. This suggests that “the ability to scale to zero” is a more precise description of this capability. To deploy a workload with it, we use the Knative Service object with apiVersion serving.knative.dev/v1. Below is a quick hands-on guide, with some steps adapted from the Knative Serving tutorial and installation guide; optional steps (extensions, DNS) are skipped.

Follow the first part of this post, or this guide in the real-quick-cluster project, to install minikube with MetalLB and configure Istio (using istioctl as instructed). Then Knative Serving can be installed with two YAML manifests:

$ kubectl apply -f
$ kubectl apply -f

Next, install the Knative Istio controller to integrate the two, after which the Knative gateways become visible:

$ kubectl apply -f
$ kubectl -n knative-serving get gateway

Now we can deploy a dummy service, using either the kn CLI tool or a YAML manifest. The manifest is given below:

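apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  namespace: default
spec:
  template:
    metadata:
      name: hello-world
      annotations:
        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "rps"
        autoscaling.knative.dev/target: "50"
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "10"
    spec:
      containerConcurrency: 0
      containers:
      - name: user-container
        image: <hello-world-sample-image>  # placeholder for the sample app image
        env:
        - {name: TARGET, value: World}
        ports:
        - {containerPort: 8080, protocol: TCP}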

The annotations override some of the default configuration; they are explained in the autoscaling section of the Knative Serving documentation. Here we use requests per second (rps) as the metric, with a target of 50 per Pod, and the scale range is between 0 and 10 replicas. Once the resource is created, we can validate it with either of these commands:

$ kubectl get
$ kn service describe hello
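To make the annotation values more concrete, the Python sketch below computes a replica count the way a simple request-based autoscaler might: divide the observed requests per second by the per-Pod target, round up, and clamp the result to the min/max scale range. This is a simplification under our own assumptions; the function name and parameters are hypothetical, and the real KPA also applies target utilization, stable and panic windows, and scale-rate limits, all of which are ignored here.

```python
import math


def desired_replicas(observed_rps: float, target_rps: float = 50,
                     min_scale: int = 0, max_scale: int = 10) -> int:
    """Hypothetical, simplified KPA-style sizing (not Knative's actual code).

    Defaults mirror the manifest annotations: a target of 50 rps per Pod,
    min-scale 0 and max-scale 10.
    """
    if observed_rps <= 0:
        # No traffic at all: the service may shrink to min-scale (zero here).
        return min_scale
    wanted = math.ceil(observed_rps / target_rps)  # Pods needed at 50 rps each
    return max(min_scale, min(max_scale, wanted))  # clamp to [min-scale, max-scale]


for rps in (0, 30, 120, 2000):
    print(rps, "rps ->", desired_replicas(rps), "replica(s)")
```

With 120 rps, for instance, the sketch asks for 3 Pods, while any load above 500 rps is capped at max-scale, i.e. 10 Pods.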

Creating the Knative Service also creates a number of Kubernetes resources, such as a Deployment and Services:

$ kubectl get svc
NAME                  TYPE           CLUSTER-IP       EXTERNAL-IP                                            PORT(S)                                      AGE
hello                 ExternalName   <none>           knative-local-gateway.istio-system.svc.cluster.local   80/TCP                                       52s
hello-world           ClusterIP   <none>                                                 80/TCP                                       108s
hello-world-private   ClusterIP     <none>                                                 80/TCP,9090/TCP,9091/TCP,8022/TCP,8012/TCP   108s
kubernetes            ClusterIP        <none>
$ kubectl get deploy
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
hello-world-deployment   1/1     1            1           71s

Before testing, let’s mock the DNS entry in /etc/hosts so that the service hostname resolves to the Istio ingress service IP, like below:

Now we can confirm that the service returns the expected result (after a brief pause):

$ curl
Hello World!

If we wait for a short period without sending any requests from curl or a browser, the service should scale down to zero:

$ kubectl get deploy -l -w
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
hello-world-deployment   0/0     0            0           78s

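From a separate terminal, generate HTTP load with the hey command so that the service receives more than 50 rps per Pod (because -z is set, hey ignores -n and keeps sending requests from 20 concurrent workers for 20 seconds):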

$ hey -n 200 -c 20 -z 20s -m GET

We can watch the deployment get scaled up to 10. Once the hey command completes its task, the deployment will scale back down:

$ kubectl get deploy -l -w
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
hello-world-deployment   0/0     0            0           78s
hello-world-deployment   0/1     0            0           3m54s
hello-world-deployment   0/1     0            0           3m54s
hello-world-deployment   0/1     0            0           3m54s
hello-world-deployment   0/1     1            0           3m54s
hello-world-deployment   1/1     1            1           4m2s
hello-world-deployment   1/10    1            1           4m4s
hello-world-deployment   1/10    1            1           4m4s
hello-world-deployment   1/10    1            1           4m4s
hello-world-deployment   1/10    10           1           4m5s
hello-world-deployment   2/10    10           2           4m9s
hello-world-deployment   3/10    10           3           4m10s
hello-world-deployment   4/10    10           4           4m10s
hello-world-deployment   7/10    10           7           4m10s
hello-world-deployment   9/10    10           9           4m10s
hello-world-deployment   10/10   10           10          4m11s
hello-world-deployment   10/5    10           10          5m18s
hello-world-deployment   10/5    10           10          5m18s
hello-world-deployment   5/5     5            5           5m18s
hello-world-deployment   5/2     5            5           5m20s
hello-world-deployment   5/2     5            5           5m20s
hello-world-deployment   2/2     2            2           5m20s
hello-world-deployment   2/1     2            2           5m22s
hello-world-deployment   2/1     2            2           5m22s
hello-world-deployment   1/1     1            1           5m22s
hello-world-deployment   1/0     1            1           5m52s
hello-world-deployment   1/0     1            1           5m52s
hello-world-deployment   0/0     0            0           5m52s

Autoscaling in Knative Serving is backed by either the KPA (Knative Pod Autoscaler, used in the example above) or the HPA. Also note that the trigger to scale up and down is client traffic to the service (measured here in requests per second). This is different from KEDA. KEDA also supports scaling to zero, but with KEDA it is an event that scales a deployment up from zero. In Knative Serving, it is the connection to the Service itself that wakes up the service.

Cost of scale-to-zero

As discussed, both KEDA and Knative Serving support scaling to zero and waking up from zero, but the wake-up trigger differs. With KEDA, a workload is woken up by an event. With Knative Serving, a workload has to wake up when it receives a connection to the service. To solve this problem, Knative has four sub-components:

  • Activator – When a service is scaled to zero, its requests are routed to the activator, which waits for a Pod to become ready (wake up) and proxies the traffic while editing the underlying Endpoints to route the traffic directly to the Pod(s).
  • Autoscaler – The KPA, which decides how many Pods a revision needs.
  • Controller – The main component responsible for watching API objects for the Knative CRDs (KService, Configuration, Route, Revision) and managing their lifecycle: it creates the underlying Kubernetes resources and garbage-collects old objects.
  • Webhook – A Kubernetes admission webhook acting as both a validating and a mutating admission controller.

The key player is activator, which receives requests when a service is scaled to zero. The wake-up process is summarized as follows:

  • After receiving a request, the activator buffers (holds onto) it.
  • Then, it looks at the request’s hostname to find which KService it is for.
  • Then, it scales up the Kubernetes Deployment and waits for a Pod to become ready.
  • Meanwhile, it updates the Kubernetes Service to point to the Pod IP addresses (so that the activator gets out of the network path once the KService is awake).
  • Finally, the activator proxies the request to the started Pod.
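The buffering behaviour in these steps can be illustrated with a toy asyncio model in Python. It is not Knative's implementation (the real activator is written in Go and works against the Kubernetes API); the class, the hostname and the simulated start-up delay are all hypothetical. What it shows is the key property of the wake-up path: requests that arrive while the service is at zero are held, a single scale-up is triggered for all of them, and every held request is proxied once the Pod is ready. The step where the activator leaves the network path is not modelled.

```python
import asyncio


class ToyActivator:
    """Toy model of the activator wake-up path (hypothetical, not Knative code).

    `routes` maps a request hostname to a KService name; `_scale_from_zero`
    stands in for scaling the Deployment and waiting for a ready Pod.
    """

    def __init__(self, routes, startup_delay=0.05):
        self.routes = routes              # hostname -> KService name
        self.ready_pods = {}              # KService name -> list of ready Pods
        self.startup_delay = startup_delay
        self.wakeups = 0                  # number of scale-from-zero operations
        self._pending = {}                # KService name -> in-flight wake-up task

    async def _scale_from_zero(self, ksvc):
        self.wakeups += 1
        await asyncio.sleep(self.startup_delay)   # simulated Pod start-up time
        self.ready_pods[ksvc] = [f"{ksvc}-pod-1"]

    async def handle(self, host, path):
        ksvc = self.routes[host]                  # find the KService from the hostname
        if not self.ready_pods.get(ksvc):
            # Buffer the request: all callers await the same wake-up task.
            task = self._pending.get(ksvc)
            if task is None:
                task = asyncio.create_task(self._scale_from_zero(ksvc))
                self._pending[ksvc] = task
            await task
            self._pending.pop(ksvc, None)
        pod = self.ready_pods[ksvc][0]
        return f"proxied {path} to {pod}"         # proxy to the started Pod


async def demo():
    act = ToyActivator({"hello.default.example.com": "hello"})  # hypothetical host
    replies = await asyncio.gather(
        *(act.handle("hello.default.example.com", f"/req{i}") for i in range(3))
    )
    return act, replies


activator, replies = asyncio.run(demo())
print(activator.wakeups, replies)
```

Three concurrent requests produce a single wake-up, and all three are proxied to the same new Pod.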

So the “plumbing” of service connections managed by the activator Pod is the cost of scaling to zero. In the Knative documentation this falls under Load Balancing, and the behaviour can be tuned with two parameters, activator capacity and target burst capacity:

  • Target burst capacity: once the target Deployment has one or more Pods, service requests may still be routed via the activator or may bypass it. Target burst capacity determines at what point service requests start to bypass the activator.
  • Activator capacity: determines how many requests an activator can hold on to. Because service-to-service traffic is also routed through the Knative Service, a lot of connections may pass through the activator, so its capacity may need to be adjusted.
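The role of target burst capacity can be sketched as a single decision. The Python function below follows the “excess burst capacity” idea behind this setting: spare capacity is the number of ready Pods times the per-Pod target, minus the requests currently in flight, minus the target burst capacity, and the activator stays in the path while that value is negative. The special values follow the Knative documentation (-1 keeps the activator in the path permanently, 0 uses it only when the revision is scaled to zero). The function and parameter names are our own, and this is a simplified sketch rather than the autoscaler's actual code.

```python
import math


def activator_in_path(ready_pods: int, per_pod_target: float,
                      in_flight: float, target_burst_capacity: float) -> bool:
    """Hypothetical sketch of the excess-burst-capacity decision."""
    if ready_pods == 0:
        return True   # scaled to zero: only the activator can buffer requests
    if target_burst_capacity == -1:
        return True   # -1: activator always stays in the request path
    if target_burst_capacity == 0:
        return False  # 0: activator only used while scaled to zero
    excess = math.floor(ready_pods * per_pod_target
                        - in_flight - target_burst_capacity)
    return excess < 0  # not enough headroom for a burst: keep the activator


# One Pod with a 50 rps target cannot absorb a burst of 200 on its own...
print(activator_in_path(1, 50, 10, 200))
# ...but ten Pods with little traffic leave enough headroom.
print(activator_in_path(10, 50, 100, 200))
```

A larger target burst capacity therefore keeps the activator in the path until more Pods are ready, trading an extra network hop for smoother handling of sudden bursts.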

The additional overhead of managing and optimizing this “load balancer” is also part of the operational cost of scaling to zero.

In comparison, KEDA now has an HTTP add-on, still in beta, that allows connection-based wake-up. Its design is a little different, as discussed in this previous post.


Knative consists of two disparate components: Serving and Eventing. Serving uses the KPA to scale based on service requests, including the ability to scale to zero. It is often compared with KEDA, a single-purpose, lightweight tool for autoscaling. Here is a blurb from KEDA on their differences. The takeaway is that KEDA is more focused on scaling, whereas Knative Serving covers more aspects. For example, the Knative Service object also supports traffic splitting, and Knative Serving can integrate with Istio.