<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>InfluxData Blog - Lorenzo Fontana</title>
    <description>Posts by Lorenzo Fontana on the InfluxData Blog</description>
    <link>https://www.influxdata.com/blog/author/lorenzo/</link>
    <language>en-us</language>
    <lastBuildDate>Mon, 26 Feb 2018 12:47:32 -0700</lastBuildDate>
    <pubDate>Mon, 26 Feb 2018 12:47:32 -0700</pubDate>
    <ttl>1800</ttl>
    <item>
      <title>Monitoring the Kubernetes Nginx Ingress with the Nginx InfluxDB Module</title>
      <description>&lt;p&gt;In the process of moving some of our container workloads to Kubernetes we deployed the &lt;a href="https://github.com/kubernetes/ingress-nginx"&gt;ingress-nginx project &lt;/a&gt;to have an Ingress controller that can instrument Nginx for incoming traffic to exposed services.&lt;/p&gt;

&lt;p&gt;The project itself is well crafted, and it met all the expectations we had for a project under the Kubernetes umbrella. In particular, we like that a controlling daemon (the controller) drives nginx by managing, configuring and scaling it.&lt;/p&gt;

&lt;p&gt;However, after a few weeks of usage we noticed that the whole setup was missing one of the most important observability features: the ability to track every incoming request in real time, both requests served locally and those proxied to backends.&lt;/p&gt;

&lt;p&gt;In fact, pulling aggregated metrics on the status of the controller was easy for us, thanks to the combination of the &lt;a href="https://github.com/kubernetes/ingress-nginx/tree/39c30853ae1f0e61e2ce94cb5c16bb79be3b905a/internal/ingress/controller/metric/collector"&gt;Prometheus endpoint&lt;/a&gt; and the &lt;a href="https://github.com/influxdata/telegraf/tree/master/plugins/inputs/nginx"&gt;Telegraf plugin&lt;/a&gt;. For certain situations, however, we found it very useful to keep track of every single request, pushing it directly to InfluxDB as soon as it happens.&lt;/p&gt;

&lt;p&gt;We want to be able to do that for three main reasons:&lt;/p&gt;
&lt;ul&gt;
 	&lt;li&gt;Spot any proxy backend error or unexpected status code as soon as possible;&lt;/li&gt;
 	&lt;li&gt;Understand how clients connect to our services: HTTP methods, connection types, requested endpoints;&lt;/li&gt;
 	&lt;li&gt;Take action (with Kapacitor) on consistent streams of raw data. Raw data is more effective here because of the nature of the problem: for example, we want an alert when a request is not processed and never makes it back to the client that made it, and then, after the alert, we want to act on that specific request.&lt;/li&gt;
&lt;/ul&gt;
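The Kapacitor use case in the last point could be sketched as a TICKscript along these lines (the database, retention policy, measurement and tag names here are assumptions for illustration, not a tested alerting setup):

```
// Hypothetical sketch: alert on every 502 flowing through the raw request stream.
stream
    |from()
        .database('nginx_ingress')
        .retentionPolicy('threedays')
        .measurement('nginx')
    |alert()
        .crit(lambda: "status" == '502')
        .message('Bad gateway on {{ index .Tags "server_name" }}')
        .log('/tmp/nginx_502.log')
```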
&lt;p&gt;When a bad request happens, the ideal situation is being able to run a query like this:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;SELECT * FROM threedays.nginx_requests WHERE "status" = '502'

1518524349994255769 173             0                     text/html                                                          152         GET    35             myserver    502    /bad
1518524349994714916 173             0                     text/html                                                          152         GET    35             myserver    502    /bad&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After some searching on the interwebs, we ended up writing a module (&lt;a href="https://github.com/influxdata/nginx-influxdb-module"&gt;nginx-influxdb-module&lt;/a&gt;) that acts as a filter on each request in a non-blocking fashion and sends the processed data to an InfluxDB backend over UDP using the &lt;a href="https://docs.influxdata.com/influxdb/v1.4/write_protocols/line_protocol_tutorial/"&gt;line protocol&lt;/a&gt;.&lt;/p&gt;
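To make the mechanism concrete, here is a minimal Python sketch of the kind of write the module performs: build a line-protocol point and fire it over UDP. The measurement, tag and field names below are illustrative only, not the module's actual schema.

```python
import socket
import time

# Line-protocol point: measurement,tags fields timestamp(ns).
# Names are illustrative; the module defines its own schema.
point = (
    "nginx,server_name=myserver,method=GET,status=502 "
    "total_bytes_sent=173i,request_length=152i "
    + str(time.time_ns())
)

# UDP is fire-and-forget: no handshake, so the write never blocks
# the request path, which is why the module uses it.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(point.encode(), ("127.0.0.1", 8089))
sock.close()
```

If nothing is listening on the target port, the datagram is simply dropped; the sender is unaffected, which matches the non-blocking design goal.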
&lt;h2&gt;Kubernetes Ingress and Telegraf as Sidecar&lt;/h2&gt;
&lt;p&gt;After writing the Nginx module to serve our purpose, we needed a way to connect it to the Kubernetes Ingress Controller. To do so, we &lt;a href="https://github.com/fntlnz/ingress-nginx/commit/8b30ff67ce58f551bf09cffc9473d88303be2d21"&gt;forked&lt;/a&gt; the ingress project to compile the module into its nginx build.&lt;/p&gt;

&lt;p&gt;Why fork? We needed to for a couple of reasons:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;Crafting a full pull request that deeply integrates the module into the controller (&lt;a href="https://github.com/kubernetes/ingress-nginx/tree/master/docs/user-guide" target="_blank" rel="noopener noreferrer"&gt;with the configmaps&lt;/a&gt; and everything) is an effort that first requires a fork;&lt;/li&gt;
 	&lt;li&gt;Nginx supports dynamic modules, but has strict runtime requirements that do not allow simply dropping in a module shared object compiled in a different environment.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To use the module in the Kubernetes Nginx ingress controller, you have two options:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;Plain usage with direct UDP connection&lt;/li&gt;
 	&lt;li&gt;Connection using Telegraf as sidecar proxy&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Plain Usage with Direct UDP Connection&lt;/h2&gt;
&lt;p&gt;You can follow &lt;a href="https://github.com/kubernetes/ingress-nginx/blob/master/deploy/README.md#installation-guide"&gt;the official steps&lt;/a&gt; by just replacing the controller image in &lt;code class="language-markup"&gt;with-rbac.yml&lt;/code&gt; or &lt;code class="language-markup"&gt;without-rbac.yml&lt;/code&gt; with our &lt;a href="http://quay.io/fntlnz/nginx-ingress-controller:kubernetes-controller-8b30ff6"&gt;fork’s controller&lt;/a&gt;.&lt;/p&gt;
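For reference, the only change in the manifest is the container image; a sketch of the relevant fragment, with surrounding fields elided:

```yaml
# Fragment of with-rbac.yml / without-rbac.yml: only the image changes,
# pointing at the fork's build that bundles the InfluxDB module.
containers:
- name: nginx-ingress-controller
  image: quay.io/fntlnz/nginx-ingress-controller:kubernetes-controller-8b30ff6
```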

&lt;p&gt;&lt;strong&gt;Nota bene:&lt;/strong&gt; Our fork mirrors the tags of the official ingress controller. For each tag starting from &lt;code class="language-markup"&gt;nginx-0.10.2&lt;/code&gt; you will find the equivalent one &lt;a href="https://github.com/fntlnz/ingress-nginx/releases"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We don’t have a deeply integrated set of dedicated parameters yet, so for now we rely on the &lt;code class="language-markup"&gt;nginx.ingress.kubernetes.io/configuration-snippet&lt;/code&gt; annotation:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/configuration-snippet: |
  influxdb server_name=yourappname host=your-influxdb port=8089 measurement=nginx enabled=true;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A full example using the annotation to configure the InfluxDB module would look like this:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;---
apiVersion: v1
kind: Namespace
metadata:
  name: caturday
---
apiVersion: v1
kind: Service
metadata:
  name: caturday
  namespace: caturday
  labels:
    app: caturday
spec:
  selector:
    app: caturday
  ports:
    - name: caturday
      port: 8080
      protocol: TCP
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/configuration-snippet: |
      influxdb server_name=acceptance-ingress host=127.0.0.1 port=8094 measurement=nginx enabled=true;
  name: caturday
  namespace: caturday
spec:
  rules:
    - host: kittens.local
      http:
        paths:
          - backend:
              serviceName: caturday
              servicePort: 8080
            path: /
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: caturday
  namespace: caturday
  labels:
    app: caturday
spec:
  replicas: 3
  selector:
    matchLabels:
      app: caturday
  template:
    metadata:
      labels:
        app: caturday
    spec:
      containers:
      - name: caturday
        image: docker.io/fntlnz/caturday:latest
        resources:
          limits:
            cpu: 0.1
            memory: 100M&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Connection Using Telegraf as Sidecar Proxy&lt;/h2&gt;
&lt;p&gt;This configuration is a bit different from the official one because it involves deploying a Telegraf container as a sidecar proxy in every Nginx controller pod. To do this, we need a different controller definition, a configmap holding Telegraf’s configuration, and a secret providing the InfluxDB URL, username, password and database through environment variables.&lt;/p&gt;
&lt;h2&gt;The Controller&lt;/h2&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;---

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ingress-nginx
  template:
    metadata:
      labels:
        app: ingress-nginx
      annotations:
        prometheus.io/port: '10254'
        prometheus.io/scrape: 'true'
    spec:
      serviceAccountName: nginx-ingress-serviceaccount
      initContainers:
      - command:
        - sh
        - -c
        - sysctl -w net.core.somaxconn=32768; sysctl -w net.ipv4.ip_local_port_range="1024 65535"
        image: docker.io/alpine:3.6
        imagePullPolicy: IfNotPresent
        name: sysctl
        securityContext:
          privileged: true
      containers:
      - name: nginx-ingress-controller
        image: quay.io/fntlnz/nginx-ingress-controller:kubernetes-controller-8b30ff6
        args:
          - /nginx-ingress-controller
          - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
          - --configmap=$(POD_NAMESPACE)/nginx-configuration
          - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
          - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
          - --annotations-prefix=nginx.ingress.kubernetes.io
        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
      - name: nginx-telegraf-collector
        image: docker.io/telegraf:1.5.2
        ports:
        - name: udp
          containerPort: 8094
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: ENV
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: env
        - name: MONITOR_RETENTION_POLICY
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_retention_policy
        - name: MONITOR_USERNAME
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_username
        - name: MONITOR_PASSWORD
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_password
        - name: MONITOR_HOST
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_host
        - name: MONITOR_DATABASE
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_database
        volumeMounts:
        - name: config
          mountPath: /etc/telegraf
      volumes:
      - name: config
        configMap:
          name: telegraf&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;The Telegraf ConfigMap&lt;/h2&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;---

apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf
  namespace: ingress-nginx
  labels:
    k8s-app: telegraf
data:
  telegraf.conf: |+
    [global_tags]
      env = "$ENV"
    [agent]
      interval = "10s"
      round_interval = true
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      collection_jitter = "0s"
      flush_interval = "10s"
      flush_jitter = "0s"
      precision = ""
      debug = false
      quiet = false
      logfile = ""
      hostname = "$HOSTNAME"
      omit_hostname = false
    [[outputs.influxdb]]
      urls = ["$MONITOR_HOST"]
      database = "$MONITOR_DATABASE"
      retention_policy = "$MONITOR_RETENTION_POLICY"
      write_consistency = "any"
      timeout = "5s"
      username = "$MONITOR_USERNAME"
      password = "$MONITOR_PASSWORD"
    [[inputs.socket_listener]]
      service_address = "udp://:8094"&lt;/code&gt;&lt;/pre&gt;
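The socket_listener input simply receives datagrams on UDP port 8094 and forwards the parsed points to the InfluxDB output. A self-contained Python loopback sketch of that receive path (using an ephemeral port and a toy payload so it doesn't clash with a real sidecar):

```python
import socket

# Bind a UDP listener the way Telegraf's socket_listener does
# (port 0 picks a free ephemeral port for this demo).
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 0))
port = listener.getsockname()[1]

# nginx's side of the exchange: fire a line-protocol datagram at the listener.
payload = b"nginx,server_name=demo requests=1i"
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(payload, ("127.0.0.1", port))

# The datagram arrives verbatim; Telegraf would parse it here and
# batch it toward the InfluxDB output.
data, _ = listener.recvfrom(4096)
print(data.decode())
sender.close()
listener.close()
```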
&lt;p&gt;The configuration above references a secret that allows Telegraf to connect to the InfluxDB backend. Create it like this:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;kubectl create secret -n ingress-nginx generic telegraf \
  --from-literal=env=acc \
  --from-literal=monitor_retention_policy="threedays" \
  --from-literal=monitor_username="" \
  --from-literal=monitor_password="" \
  --from-literal=monitor_host=http://your-influxdb:8086 \
  --from-literal=monitor_database=nginx_ingress&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the above example, we used “threedays” as the retention policy. You can leave the retention policy empty, but if you want to keep data for three days as in the example, you can create the policy with this statement:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;CREATE RETENTION POLICY threedays ON nginx_ingress DURATION 3d REPLICATION 1&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Next Steps&lt;/h2&gt;
&lt;p&gt;The general plan is to stabilize the module’s API and tag a 1.0 release. While doing this, we will complete the integration with the Kubernetes Ingress and send a PR to the upstream project to allow people to optionally enable the module in their Ingress controllers.&lt;/p&gt;
&lt;h2&gt;Is This Ready for Production?&lt;/h2&gt;
&lt;p&gt;We’re using all of this in production; however, we don’t recommend that you do so at this stage. If you have a staging/acceptance environment, you can test it and contribute to the module, helping us move toward a 1.0 release and contribute back to the &lt;a href="https://github.com/kubernetes/ingress-nginx"&gt;kubernetes/ingress-nginx&lt;/a&gt; project.&lt;/p&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;Even though writing this kind of module sounds straightforward, it still requires effort. Fortunately, the journey was made easier by the &lt;a href="https://github.com/nginx/nginx-tests"&gt;nginx/nginx-tests&lt;/a&gt; repo, which allowed us to spot bad behaviors and edge cases. Also, kudos to the Ingress controller, which is highly customizable and exposes the &lt;code class="language-markup"&gt;nginx.ingress.kubernetes.io/configuration-snippet&lt;/code&gt; annotation, which turned out to be really useful at this stage.&lt;/p&gt;
</description>
      <pubDate>Mon, 26 Feb 2018 12:47:32 -0700</pubDate>
      <link>https://www.influxdata.com/blog/monitoring-kubernetes-nginx-ingress-nginx-influxdb-module/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/monitoring-kubernetes-nginx-ingress-nginx-influxdb-module/</guid>
      <category>Product</category>
      <category>Use Cases</category>
      <category>Developer</category>
      <author>Lorenzo Fontana (InfluxData)</author>
    </item>
  </channel>
</rss>
