Monitoring the Kubernetes Nginx Ingress with the Nginx InfluxDB Module
By Lorenzo Fontana / Feb 26, 2018 / InfluxDB, Community, Telegraf, Developer
In the process of moving some of our container workloads to Kubernetes we deployed the ingress-nginx project to have an Ingress controller that can instrument Nginx for incoming traffic to exposed services.
The project itself is well crafted, and it met all the expectations we had for a project under the Kubernetes umbrella. Overall we like the idea of a dedicated daemon (the controller) that manages, configures and scales nginx.
However, after a few weeks of usage we noticed that the setup lacked one of the most important observability features: the ability to track every incoming request in real time, both the ones served locally and the proxied ones.
Pulling aggregated metrics on the status of the controller was very easy, thanks to a combination of the Prometheus endpoint and the Telegraf plugin. In certain situations, though, we found it very useful to keep track of every single request and push it to InfluxDB as soon as it happens.
We want to be able to do that for three main reasons:
- Spot any proxy backend error or unexpected status code as soon as possible;
- Understand how clients are connecting to our services: HTTP methods, type of connection, requested endpoints;
- Take action (with Kapacitor) on consistent streams of raw data, which is more effective here because of the nature of the data itself. For example, I want an alert if requests are not processed and no response goes back to the client that made them, and then, after the alert, I want to take action on that specific request.
When a bad request happens, doing a query like this is the ideal situation:
    SELECT * FROM nginx_requests.threedays WHERE "status" = '502'

    1518524349994255769 173 0 text/html 152 GET 35 myserver 502 /bad
    1518524349994714916 173 0 text/html 152 GET 35 myserver 502 /bad
After some searching in the interwebs, we wrote a module (nginx-influxdb-module) that acts as a filter on each request in a non-blocking fashion and sends out the processed data to an InfluxDB backend using UDP and line protocol.
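To make "UDP and line protocol" concrete, here is a minimal Python sketch of what a single request could look like on the wire. It is not the module's code (the module is an Nginx module, not Python), and the tag and field names used here (server_name, method, status, bytes_sent) are assumptions for illustration only; the module's real schema may differ.

```python
import socket


def escape_key(value: str) -> str:
    """Escape the delimiters line protocol reserves in tag keys, tag values and field keys."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def format_field(value) -> str:
    """Render a field value: integers get an 'i' suffix, strings are double-quoted."""
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError("this sketch only handles int and str field values")
    if isinstance(value, int):
        return f"{value}i"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def format_point(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one line-protocol line: measurement,tags fields timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = measurement.replace(",", "\\,").replace(" ", "\\ ")
    # Tags are sorted by key, which InfluxDB recommends for write performance.
    for key in sorted(tags):
        head += f",{escape_key(key)}={escape_key(tags[key])}"
    body = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in fields.items())
    return f"{head} {body} {timestamp_ns}"


def send_point(line: str, host: str, port: int) -> None:
    """Fire-and-forget: one UDP datagram per point, no reply expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Hypothetical request data; the tag and field names are illustrative assumptions.
line = format_point(
    "nginx",
    {"server_name": "myserver", "method": "GET"},
    {"status": "502", "bytes_sent": 152},
    1518524349994255769,
)
print(line)
```

Calling send_point(line, "your-influxdb", 8089) would push the point to an InfluxDB UDP listener. Because UDP is connectionless, the sender never waits on the backend, which is what keeps this kind of reporting non-blocking.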
Kubernetes Ingress and Telegraf as Sidecar
After writing the Nginx module to serve our purpose, we needed a way to connect it to the Kubernetes Ingress Controller. To do so, we actually forked the ingress project to compile the module inside its nginx.
Why fork? We had two reasons:
- Preparing a full pull request that deeply integrates the module with the controller (ConfigMaps and everything) is a sizable effort that requires a fork to work in;
- Nginx supports dynamic modules, but its strict compatibility requirements do not allow just dropping in module shared objects compiled in some other environment.
To use the module in the Kubernetes Nginx ingress controller, you have two options:
- Plain usage with direct UDP connection
- Connection using Telegraf as sidecar proxy
Plain Usage with Direct UDP Connection
You can follow the official steps by just replacing the controller image in without-rbac.yml with our fork’s controller.
Nota bene: Our fork mirrors the tags of the official ingress controller. For each tag starting from nginx-0.10.2, you will find the equivalent one in our fork.
We don’t have a deeply integrated set of specific parameters for now, but we rely on the nginx.ingress.kubernetes.io/configuration-snippet annotation:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/configuration-snippet: |
      influxdb server_name=yourappname host=your-influxdb port=8089 measurement=nginx enabled=true;
A full example using the annotation to configure the InfluxDB module would look like this:
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: caturday
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: caturday
      namespace: caturday
      labels:
        app: caturday
    spec:
      selector:
        app: caturday
      ports:
        - name: caturday
          port: 8080
          protocol: TCP
    ---
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      annotations:
        kubernetes.io/ingress.class: nginx
        nginx.ingress.kubernetes.io/configuration-snippet: |
          influxdb server_name=acceptance-ingress host=127.0.0.1 port=8094 measurement=nginx enabled=true;
      name: caturday
      namespace: caturday
    spec:
      rules:
        - host: kittens.local
          http:
            paths:
              - backend:
                  serviceName: caturday
                  servicePort: 8080
                path: /
    ---
    apiVersion: apps/v1beta2
    kind: Deployment
    metadata:
      name: caturday
      namespace: caturday
      labels:
        app: caturday
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: caturday
      template:
        metadata:
          labels:
            app: caturday
        spec:
          containers:
            - name: caturday
              image: docker.io/fntlnz/caturday:latest
              resources:
                limits:
                  cpu: 0.1
                  memory: 100M
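If you manage many Ingress resources, you may want to generate the annotation rather than hand-write it. The following Python sketch builds the influxdb directive from the parameters shown in this post (server_name, host, port, measurement, enabled) and wraps it in the two annotations. The helper names are hypothetical, and the validation rules are an assumption based on how Nginx splits directive arguments on whitespace; the module itself may accept more or less than this.

```python
ANNOTATION_PREFIX = "nginx.ingress.kubernetes.io"


def _check_token(name: str, value) -> str:
    """Reject values that would break an nginx directive (assumed rule, see lead-in)."""
    text = str(value)
    if not text or any(ch.isspace() or ch in ";{}" for ch in text):
        raise ValueError(f"{name} must be non-empty, without whitespace, ';', '{{' or '}}'")
    return text


def influxdb_directive(server_name: str, host: str, port: int,
                       measurement: str = "nginx") -> str:
    """Render the single-line directive used in the configuration-snippet annotation."""
    if not 1 <= int(port) <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return (
        f"influxdb server_name={_check_token('server_name', server_name)} "
        f"host={_check_token('host', host)} port={int(port)} "
        f"measurement={_check_token('measurement', measurement)} enabled=true;"
    )


def ingress_annotations(directive: str) -> dict:
    """Return the annotations to place under metadata.annotations of the Ingress."""
    return {
        "kubernetes.io/ingress.class": "nginx",
        f"{ANNOTATION_PREFIX}/configuration-snippet": directive + "\n",
    }


# Same values as the caturday example above.
directive = influxdb_directive("acceptance-ingress", "127.0.0.1", 8094)
print(ingress_annotations(directive))
```

The returned dictionary can be serialized to YAML or JSON with whatever tooling you already use to template your manifests.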
Connection Using Telegraf as Sidecar Proxy
This configuration differs a bit from the official one because it deploys a Telegraf container as a sidecar proxy in every Nginx controller pod. To do this, we need a different controller definition, a ConfigMap holding Telegraf's configuration, and a secret for the InfluxDB URL, username, password and database, which reach Telegraf as environment variables.
    ---
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: nginx-ingress-controller
      namespace: ingress-nginx
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: ingress-nginx
      template:
        metadata:
          labels:
            app: ingress-nginx
          annotations:
            prometheus.io/port: '10254'
            prometheus.io/scrape: 'true'
        spec:
          serviceAccountName: nginx-ingress-serviceaccount
          initContainers:
            - command:
                - sh
                - -c
                - sysctl -w net.core.somaxconn=32768; sysctl -w net.ipv4.ip_local_port_range="1024 65535"
              image: docker.io/alpine:3.6
              imagePullPolicy: IfNotPresent
              name: sysctl
              securityContext:
                privileged: true
          containers:
            - name: nginx-ingress-controller
              image: quay.io/fntlnz/nginx-ingress-controller:kubernetes-controller-8b30ff6
              args:
                - /nginx-ingress-controller
                - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
                - --configmap=$(POD_NAMESPACE)/nginx-configuration
                - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
                - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
                - --annotations-prefix=nginx.ingress.kubernetes.io
              env:
                - name: POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: POD_NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
              ports:
                - name: http
                  containerPort: 80
                - name: https
                  containerPort: 443
              livenessProbe:
                failureThreshold: 3
                httpGet:
                  path: /healthz
                  port: 10254
                  scheme: HTTP
                initialDelaySeconds: 10
                periodSeconds: 10
                successThreshold: 1
                timeoutSeconds: 1
              readinessProbe:
                failureThreshold: 3
                httpGet:
                  path: /healthz
                  port: 10254
                  scheme: HTTP
                periodSeconds: 10
                successThreshold: 1
                timeoutSeconds: 1
            - name: nginx-telegraf-collector
              image: docker.io/telegraf:1.5.2
              ports:
                - name: udp
                  containerPort: 8094
              env:
                - name: HOSTNAME
                  valueFrom:
                    fieldRef:
                      fieldPath: spec.nodeName
                - name: ENV
                  valueFrom:
                    secretKeyRef:
                      name: telegraf
                      key: env
                - name: MONITOR_RETENTION_POLICY
                  valueFrom:
                    secretKeyRef:
                      name: telegraf
                      key: monitor_retention_policy
                - name: MONITOR_USERNAME
                  valueFrom:
                    secretKeyRef:
                      name: telegraf
                      key: monitor_username
                - name: MONITOR_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: telegraf
                      key: monitor_password
                - name: MONITOR_HOST
                  valueFrom:
                    secretKeyRef:
                      name: telegraf
                      key: monitor_host
                - name: MONITOR_DATABASE
                  valueFrom:
                    secretKeyRef:
                      name: telegraf
                      key: monitor_database
              volumeMounts:
                - name: config
                  mountPath: /etc/telegraf
          volumes:
            - name: config
              configMap:
                name: telegraf
The Telegraf ConfigMap
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: telegraf
      namespace: ingress-nginx
      labels:
        k8s-app: telegraf
    data:
      telegraf.conf: |+
        [global_tags]
          env = "$ENV"
        [agent]
          interval = "10s"
          round_interval = true
          metric_batch_size = 1000
          metric_buffer_limit = 10000
          collection_jitter = "0s"
          flush_interval = "10s"
          flush_jitter = "0s"
          precision = ""
          debug = false
          quiet = false
          logfile = ""
          hostname = "$HOSTNAME"
          omit_hostname = false
        [[outputs.influxdb]]
          urls = ["$MONITOR_HOST"]
          database = "$MONITOR_DATABASE"
          retention_policy = "$MONITOR_RETENTION_POLICY"
          write_consistency = "any"
          timeout = "5s"
          username = "$MONITOR_USERNAME"
          password = "$MONITOR_PASSWORD"
        [[inputs.socket_listener]]
          service_address = "udp://:8094"
The ConfigMap takes its InfluxDB connection settings from environment variables, which the controller Deployment fills from a secret named telegraf. Create that secret as follows:
    kubectl create secret -n ingress-nginx generic telegraf \
      --from-literal=env=acc \
      --from-literal=monitor_retention_policy="threedays" \
      --from-literal=monitor_username="" \
      --from-literal=monitor_password="" \
      --from-literal=monitor_host=http://your-influxdb:8086 \
      --from-literal=monitor_database=nginx_ingress
In the above example, we used “threedays” as the retention policy. You can leave the retention policy empty, but if you want to keep data for three days as in the example, create the policy with this query:
CREATE RETENTION POLICY threedays ON nginx_ingress DURATION 3d REPLICATION 1
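If you would rather create the policy from a script than from the influx shell, the InfluxDB 1.x HTTP API accepts management statements as a POST to the /query endpoint with the statement in the q parameter. Here is a minimal Python sketch that builds that request. The base URL is the placeholder host from the secret above, the helper names are hypothetical, and authentication is left out because the example secret uses an empty username and password.

```python
import urllib.parse
import urllib.request


def retention_policy_statement(name: str, database: str,
                               duration: str = "3d", replication: int = 1) -> str:
    """Build the InfluxQL statement; identifiers are double-quoted."""
    return (
        f'CREATE RETENTION POLICY "{name}" ON "{database}" '
        f"DURATION {duration} REPLICATION {replication}"
    )


def build_query_request(base_url: str, statement: str) -> urllib.request.Request:
    """Prepare (but do not send) a form-encoded POST to the /query endpoint."""
    body = urllib.parse.urlencode({"q": statement}).encode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/query", data=body, method="POST"
    )


statement = retention_policy_statement("threedays", "nginx_ingress")
request = build_query_request("http://your-influxdb:8086", statement)
print(request.get_method(), request.full_url)
print(statement)

# To actually run it against a reachable InfluxDB 1.x instance:
#   with urllib.request.urlopen(request, timeout=5) as response:
#       print(response.read().decode("utf-8"))
```

The request is only constructed here, so you can inspect it before pointing it at a real server.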
The general plan is to stabilize the module’s API and tag a 1.0 release. While doing this, we will complete the integration with the Kubernetes Ingress and send a PR to the upstream project to allow people to optionally enable the module in their Ingress controllers.
Is This Ready for Production?
We’re using all of this in production; however, we don’t recommend that you do the same at this stage. If you have a staging/acceptance environment, you can test it there and contribute to the module, which helps us move toward a 1.0 release and contribute to the kubernetes/ingress-nginx project.
Even if writing this kind of module is an easy pick, it still requires effort. Fortunately, the journey was made easier by the nginx/nginx-tests repo, which allowed us to spot bad and edge-case behaviors. Also, kudos to the Ingress controller, which is highly customizable and exposes the nginx.ingress.kubernetes.io/configuration-snippet annotation, which turned out to be really useful at this stage.