Service Proxy – from Nginx to Envoy

Update (Nov 20, 2022): 1. Envoy’s configuration schema can be hard to get used to. It lacks examples because the documentation is mostly generated. Use the examples directory to find real-life configurations. 2. The configuration file at the bottom of this blog post has been updated. See the current revision here.

Envoy proxy is the underlying technology for Istio, as well as a number of other service mesh products, such as App Mesh (AWS), Consul (HashiCorp) and Open Service Mesh (Azure). Most of the capabilities of Istio are ultimately provided by Envoy proxy. Envoy has a page outlining its differences from similar technologies. I decided to take a look into Envoy by replacing Nginx with it.

Rate limiting and Circuit Breaker

In a previous post, I used Nginx to front my Flask-based REST API. In most SDLCs, it is application developers who create backend APIs or server applications. Most developers specialize in application features and cannot be expected to master all the nuances of the TCP/IP network stack. Nginx lets them push network concerns (non-business features) to a dedicated proxy that handles the dynamics of network connections. Nginx can be configured as both a reverse proxy (handling incoming connections on behalf of the process) and a forward proxy (handling outgoing connections on behalf of the process). This is the prototype of the sidecar pattern, an important idea behind service mesh.

For example, when connections to the server-side application surge, the server process can become unresponsive (refer to “the queuing knee“ and Little’s Law) or get OOM-killed. When such interruptions are not automatically recovered from, the result is downtime. Traditionally this calls for some congestion control strategy on the TCP/IP queue, but two features provided by a network proxy can help circumvent the situation: rate limiting, and circuit breaking. Rate limiting keeps requests above a threshold from entering the queue. Circuit breaking relieves downstream pressure by cutting off in-queue requests. Nginx added both over the years, but circuit breaking remains a premium feature exclusive to Nginx Plus. Envoy, on the other hand, has offered both for free since launch.
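As a sketch of how this looks in Envoy, circuit breaking is configured per cluster via threshold limits (the cluster name below is hypothetical; rate limiting is configured separately, e.g. with the `envoy.filters.http.local_ratelimit` HTTP filter):

```yaml
# Sketch: per-cluster circuit breaker thresholds (illustrative names and values).
clusters:
- name: service-backend            # assumed cluster name for illustration
  connect_timeout: 1s
  type: STRICT_DNS
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_connections: 1024        # cap on upstream connections
      max_pending_requests: 256    # requests allowed to queue for a connection
      max_requests: 1024           # concurrent requests (HTTP/2 and above)
      max_retries: 3               # concurrent retries allowed to the cluster
```

Once a threshold is exceeded, Envoy fails new requests fast instead of letting the queue grow, which is exactly the pressure-release behavior described above.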

Envoy also supports other advanced traffic management features such as traffic shaping and mirroring. It is on top of these features that Istio introduces its own abstractions, such as virtual services and destination rules, to its users. In that sense, we can think of Istio as a configurator (control plane) for Envoy proxies (data plane), much as Ansible is to a fleet of Nginx instances.
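For instance, traffic mirroring is a route-level option in Envoy; the sketch below (with hypothetical cluster names) shadows a fraction of live traffic to a second cluster on a fire-and-forget basis:

```yaml
# Sketch: mirror 10% of requests to a shadow cluster (names are illustrative).
route_config:
  name: local_route
  virtual_hosts:
  - name: app
    domains: ["*"]
    routes:
    - match: { prefix: "/" }
      route:
        cluster: service-live              # responses come from here
        request_mirror_policies:
        - cluster: service-shadow          # mirrored copy; responses are discarded
          runtime_fraction:
            default_value: { numerator: 10, denominator: HUNDRED }
```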

Dynamic Configuration via API

I used Nginx previously in traditional environments and loved its flexibility. As the system grew, I started to feel the pain of management overhead. In one production system, there were 25+ instances of Nginx, each running on a VM, and I managed their configuration files with Ansible, which pushed out the files and triggered a reload on each instance. In the cloud-native era, where Pods are ephemeral, this kind of overhead would snowball to an unmanageable level. Envoy was designed for cloud-native applications, with exactly these problems in mind. Envoy has dynamic configuration: the majority of the configuration can be pulled from the xDS APIs, or from the file system. Updating configuration drains connections gracefully, without the runtime having to reload a file. The idea of centrally managing Nginx instances with Ansible also evolved into the concept of a control plane.

TLS origination

In the Orthweb project, I used Nginx to proxy TLS and HTTP traffic and performed TLS termination on both ports. This is known as TLS offloading. The traffic between the proxy and the upstream service travels in the clear, even though in most cases it does not cross different network interfaces. For true end-to-end encryption, it helps to also encrypt the traffic between the proxy and the upstream server, which requires the capability of securing TCP traffic to the upstream. With Nginx, the ability to secure HTTP traffic to the upstream server is offered in open source, while the ability to secure TCP traffic requires Nginx Plus or a self-compiled binary. In Envoy, both are available using the UpstreamTlsContext configuration.
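A minimal sketch of TLS origination on a cluster, assuming an illustrative upstream host (`backend.internal`) and CA file path:

```yaml
# Sketch: cluster that originates TLS to its upstream (names/paths are assumptions).
clusters:
- name: service-upstream
  type: STRICT_DNS
  load_assignment:
    cluster_name: service-upstream
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: backend.internal, port_value: 8443 }
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      sni: backend.internal                # SNI presented during the handshake
      common_tls_context:
        validation_context:
          trusted_ca: { filename: /etc/ssl/certs/ca.pem }  # verify the upstream cert
```

The same transport socket works whether the cluster sits behind an HTTP connection manager or a plain TCP proxy, which is why Envoy can originate TLS for both kinds of traffic.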

More pros

In another post, I also discussed Nginx as an LDAP proxy to front services such as Kibana and NiFi. It requires a helper service (ldap-auth in that case) to defer authentication to a third party. Envoy has this capability through its external authorization filter extension. Istio also exposes this capability, an enabler for the configuration proposed in my previous post. Envoy additionally uses WebAssembly for its extensibility.

Another useful feature is protocol detection. Envoy can use listener filters to detect the protocol (TLS or plain TCP) and route traffic to a predefined destination.
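A sketch of this, using the `tls_inspector` listener filter and two filter chains (cluster names and the port are hypothetical):

```yaml
# Sketch: one listener splitting TLS and plain TCP traffic (illustrative names).
listeners:
- name: multiplex_listener
  address:
    socket_address: { address: 0.0.0.0, port_value: 10000 }
  listener_filters:
  - name: envoy.filters.listener.tls_inspector   # sniffs for a TLS ClientHello
  filter_chains:
  - filter_chain_match:
      transport_protocol: tls                    # chosen when TLS is detected
    filters:
    - name: envoy.filters.network.tcp_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        stat_prefix: tls_traffic
        cluster: service-tls
  - filters:                                     # fallback chain: plain TCP
    - name: envoy.filters.network.tcp_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        stat_prefix: plain_traffic
        cluster: service-plain
```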

Performance-wise, this benchmark from 2018 ran a comparison among the popular options, in which Envoy led by a margin.

Observability (logging, metrics and tracing) is well supported in Envoy. Users can configure the log format, and changes take effect immediately. Many metrics work with Prometheus out of the box, and they can be extended using filters. On the tracing side, Envoy supports integration with Jaeger, Zipkin and Datadog.
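As an illustration, access logging with a custom format is attached to the HTTP connection manager; the format string below is a hedged example, not a recommendation:

```yaml
# Sketch: stdout access log with a custom text format (format is illustrative).
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
    log_format:
      text_format_source:
        inline_string: "[%START_TIME%] %REQ(:METHOD)% %REQ(:PATH)% %RESPONSE_CODE% %DURATION%ms\n"
```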

Basics of Envoy

The configuration of Envoy is more involved. There is an Envoy course by Tetrate, as well as two blog entries for Envoy 101: Envoy as gateway proxy and File-based dynamic configuration. Another good way to get started is the Sandboxes project, which covers a number of different configuration areas. The admin port (9901 by default; Istio’s default is 15000) provides helpful information. If we need to turn on debug logging for a component, we can do so with curl:

curl -X POST http://localhost:9901/logging?client=debug

Stats are exposed at the same port:

curl -X GET http://localhost:9901/stats

When packets are received at a listener, they are first processed by listener filters. Then, depending on filter chain matching, one or more network filter chains process the packet further, as illustrated below:

Envoy supports dynamic configuration, which uses a set of discovery services (xDS) APIs. Some of the important xDS APIs include:

  • LDS (Listener Discovery Service) – allows you to add listeners dynamically while Envoy is running
  • RDS (Route Discovery Service) – allows you to dynamically update routes for HTTP connection managers
  • CDS (Cluster Discovery Service) – allows you to update cluster definitions dynamically
  • EDS (Endpoint Discovery Service) – allows you to add or remove endpoints dynamically
  • SDS (Secret Discovery Service) – allows secrets such as TLS certificates and keys to be distributed dynamically
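In the simplest form, these discovery services can be backed by watched files rather than a management server. A minimal bootstrap sketch (paths and node names are assumptions):

```yaml
# Sketch: file-based xDS — listeners and clusters loaded from watched files.
node:
  id: envoy-1          # illustrative node identity
  cluster: demo
dynamic_resources:
  lds_config:
    path_config_source:
      path: /etc/envoy/lds.yaml   # edits here apply without restarting Envoy
  cds_config:
    path_config_source:
      path: /etc/envoy/cds.yaml
```

Swapping the file sources for a gRPC management server is what turns this into the control-plane model that Istio implements.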

The relation can be illustrated in this diagram below:

Envoy – xDS configuration API overview

This post from Tetrate has more examples.

Nginx to Envoy

Given these advantages, I decided to migrate from Nginx to Envoy in my Orthweb project. Using Envoy as a standalone service proxy is not its most common deployment (that would be as a sidecar), but it is how Envoy was originally used at Lyft to replace the ELB in 2015.

The original Nginx configuration was referenced in this old blog post. The Envoy setup also covers both TCP (DICOM) and HTTP (HTTPS) traffic, and for HTTP traffic it also encrypts the traffic to the upstream. Below is what it looks like:

admin:
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
  listeners:
  - name: https_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 443
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager 
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager 
          codec_type: AUTO
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: app
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: service-https
          http_filters:
          - name: envoy.filters.http.router
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_certificates:
            - certificate_chain: {"filename": "/etc/ssl/certs/site.pem"}
              private_key: {"filename": "/etc/ssl/certs/site.pem"}
  - name: dicomtls_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 11112
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: downstream_cx_total
          cluster: service-dicomtls 
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_certificates:
            - certificate_chain: {"filename": "/etc/ssl/certs/site.pem"}
              private_key: {"filename": "/etc/ssl/certs/site.pem"}
            validation_context:
              allow_expired_certificate: true
              trusted_ca: {"filename": "/etc/ssl/certs/site.pem"}
          require_client_certificate: false
  clusters:
  - name: service-https
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service-https
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: orthanc-backend 
                port_value: 8042
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
  - name: service-dicomtls
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service-dicomtls
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: orthanc-backend 
                port_value: 4242
layered_runtime:
  layers:
  - name: static_layer_0
    static_layer:
      envoy:
        resource_limits:
          listener:
            https_listener:
              connection_limit: 1000
      overload:
        global_downstream_max_connections: 5000

Moving from Nginx to Envoy, achieving nearly the same functionality takes about 100 lines of configuration instead of fewer than 30. The configuration also reads as more abstract, which is one of the cons of Envoy to consider before migrating.