Monitoring Kubernetes Architecture

There are two important points to think about when monitoring a Kubernetes architecture. The first is the underlying resources: the bare metal that Kubernetes is running on. The second covers every service, ingress, and pod that you deploy. To have good visibility into your clusters, you need metrics from both so that you can compare and cross-reference them. I am writing this article because at InfluxData we are getting our hands dirty with Kubernetes, and I think it is time to share some of the practices we applied to our clusters and get your feedback (also because they are working pretty well).

You should have a dedicated namespace for monitoring. We called it monitoring:

kubectl create namespace monitoring

Do not deploy it in the default namespace; in general, the default namespace should always stay empty. I am assuming here that you are able to deploy InfluxDB and Chronograf on Kubernetes yourself, or this article would become one long, unreadable YAML file.

Just a note about persistent volumes: InfluxDB, Kapacitor, and Chronograf store data on disk, which means we need to be careful about how we manage that storage, or our data will go away with the container. Kubernetes has a resource called PersistentVolume that helps you mount volumes appropriate to where your cluster is running. We are on AWS, and we claim EBS volumes to back /var/lib/influxdb and the other data directories.
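
As a sketch, a claim for the InfluxDB data directory might look like this (the claim name, size, and gp2 StorageClass are illustrative assumptions for an EBS-backed cluster):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdb-data        # hypothetical claim name
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2      # assumes an EBS-backed StorageClass exists
  resources:
    requests:
      storage: 50Gi

You would then reference this claim as a volume in the InfluxDB pod spec and mount it at /var/lib/influxdb.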

Now that you have your system running, we can use a DaemonSet to deploy Telegraf on every node. This Telegraf agent will collect host-level resources such as IOPS, network, CPU, memory, and disk, plus other services running on the host. To do that, we need to share some directories from the host, or Telegraf will end up monitoring the container instead of the host.

A DaemonSet is a Kubernetes resource that automatically runs a copy of a container on every node. It's very powerful when you need to deploy collectors for metrics or logs, as we are doing now.

apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
data:
  telegraf.conf: |+
    [global_tags]
      env = "$ENV"
    [agent]
      hostname = "$HOSTNAME"
    [[outputs.influxdb]]
      urls = ["$MONITOR_HOST"] # required
      database = "$MONITOR_DATABASE" # required

      timeout = "5s"
      username = "$MONITOR_USERNAME"
      password = "$MONITOR_PASSWORD"
      
    [[inputs.cpu]]
      percpu = true
      totalcpu = true
      collect_cpu_time = false
      report_active = false
    [[inputs.disk]]
      ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
    [[inputs.diskio]]
    [[inputs.kernel]]
    [[inputs.mem]]
    [[inputs.processes]]
    [[inputs.swap]]
    [[inputs.system]]
    [[inputs.docker]]
      endpoint = "unix:///var/run/docker.sock"
    [[inputs.kubernetes]]
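      # 1.1.1.1 is a placeholder: point this at the node's kubelet read-only API (default port 10255)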
      url = "http://1.1.1.1:10255"

---
# Section: DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
spec:
  selector:
    matchLabels:
      name: telegraf
  template:
    metadata:
      labels:
        name: telegraf
    spec:
      containers:
      - name: telegraf
        image: docker.io/telegraf:1.5.2
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 500m
            memory: 500Mi
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: "HOST_PROC"
          value: "/rootfs/proc"
        - name: "HOST_SYS"
          value: "/rootfs/sys"
        - name: ENV
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: env
        - name: MONITOR_USERNAME
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_username
        - name: MONITOR_PASSWORD
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_password
        - name: MONITOR_HOST
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_host
        - name: MONITOR_DATABASE
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_database
        volumeMounts:
        - name: sys
          mountPath: /rootfs/sys
          readOnly: true
        - name: proc
          mountPath: /rootfs/proc
          readOnly: true
        - name: docker-socket
          mountPath: /var/run/docker.sock
          readOnly: true
        - name: utmp
          mountPath: /var/run/utmp
          readOnly: true
        - name: config
          mountPath: /etc/telegraf
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sys
        hostPath:
          path: /sys
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock
      - name: proc
        hostPath:
          path: /proc
      - name: utmp
        hostPath:
          path: /var/run/utmp
      - name: config
        configMap:
          name: telegraf

As you can see, the ConfigMap and Telegraf require some environment variables, so I used a Secret to inject them. To create it, run this command, replacing the values with your own:

kubectl create secret -n monitoring generic telegraf \
  --from-literal=env=prod \
  --from-literal=monitor_username=youruser \
  --from-literal=monitor_password=yourpassword \
  --from-literal=monitor_host=https://your.influxdb.local \
  --from-literal=monitor_database=yourdb

There is a key called env, set to prod in this example. We set this variable on every instance of Telegraf, and it identifies the cluster. If you replicate environments as we do, you can create a single dashboard in Chronograf and use template variables to switch between clusters.
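
As a sketch, a Chronograf template variable for switching clusters can be backed by an InfluxQL query like this (assuming your database is named yourdb, with the env values coming from the global tag we set above):

SHOW TAG VALUES ON "yourdb" WITH KEY = "env"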

Now, if you did everything right, you will see your hosts and their points stored in InfluxDB and visible in Chronograf. This is just the first phase: we now have visibility into the hosts, but we still don't know anything about the services that we are running.
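
A quick sanity check (using the name=telegraf label from the DaemonSet above) is to confirm that one Telegraf pod is running per node:

kubectl get pods -n monitoring -l name=telegraf -o wide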

Telegraf Sidecar

There are different ways to address this problem; the one we use is called the sidecar pattern. The term became popular recently in the networking and routing-mesh world, and what we are doing here is similar: a helper container deployed next to the application in the same pod.

Let's assume that you need etcd because one of your applications uses it as storage. On Kubernetes, it will be a StatefulSet like this:

apiVersion: v1
data:
  telegraf.conf: |+
    [global_tags]
      env = "$ENV"
    [[inputs.prometheus]]
      urls = ["http://localhost:2379/metrics"]
    [agent]
      hostname = "$HOSTNAME"
    [[outputs.influxdb]]
      urls = ["$MONITOR_HOST"]
      database = "mydb"
      write_consistency = "any"
      timeout = "5s"
      username = "$MONITOR_USERNAME"
      password = "$MONITOR_PASSWORD"
kind: ConfigMap
metadata:
  name: telegraf-etcd-config
  namespace: myapp
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: "myapp"
  name: "etcd"
  labels:
    component: "etcd"
spec:
  serviceName: "etcd"
  selector:
    matchLabels:
      component: "etcd"
  # changing replicas value will require a manual etcdctl member remove/add
  # command (remove before decreasing and add after increasing)
  replicas: 3
  template:
    metadata:
      name: "etcd"
      labels:
        component: "etcd"
    spec:
      volumes:
        - name: telegraf-etcd-config
          configMap:
            name: telegraf-etcd-config
      containers:
      - name: "telegraf"
        image: "docker.io/library/telegraf:1.4"
        volumeMounts:
          - name: telegraf-etcd-config
            mountPath: /etc/telegraf
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: MONITOR_HOST
          valueFrom:
            secretKeyRef:
              name: monitor
              key: monitor_host
        - name: MONITOR_USERNAME
          valueFrom:
            secretKeyRef:
              name: monitor
              key: monitor_username
        - name: MONITOR_PASSWORD
          valueFrom:
            secretKeyRef:
              name: monitor
              key: monitor_password
        - name: ENV
          valueFrom:
            secretKeyRef:
              name: monitor
              key: env
      - name: "etcd"
        image: "quay.io/coreos/etcd:v3.2.9"
        ports:
        - containerPort: 2379
          name: client
        - containerPort: 2380
          name: peer
        env:
        - name: CLUSTER_SIZE
          value: "3"
        - name: SET_NAME
          value: "etcd"
        command:
          - "/bin/sh"
          - "-ecx"
          - |
            IP=$(hostname -i)
            for i in $(seq 0 $((${CLUSTER_SIZE} - 1))); do
              while true; do
                echo "Waiting for ${SET_NAME}-${i}.${SET_NAME} to come up"
                ping -W 1 -c 1 ${SET_NAME}-${i}.${SET_NAME} > /dev/null && break
                sleep 1s
              done
            done
            PEERS=""
            for i in $(seq 0 $((${CLUSTER_SIZE} - 1))); do
                PEERS="${PEERS}${PEERS:+,}${SET_NAME}-${i}=http://${SET_NAME}-${i}.${SET_NAME}:2380"
            done
            # start etcd. If cluster is already initialized the `--initial-*` options will be ignored.
            exec etcd --name ${HOSTNAME} \
              --listen-peer-urls http://${IP}:2380 \
              --listen-client-urls http://${IP}:2379,http://127.0.0.1:2379 \
              --advertise-client-urls http://${HOSTNAME}.${SET_NAME}:2379 \
              --initial-advertise-peer-urls http://${HOSTNAME}.${SET_NAME}:2380 \
              --initial-cluster-token etcd-cluster-1 \
              --initial-cluster ${PEERS} \
              --initial-cluster-state new \
              --data-dir /var/run/etcd/default.etcd

As you can see, there is a lot more going on here: a ConfigMap, plus two containers deployed in the same pod, etcd and Telegraf. Containers in the same pod share the network namespace, so reaching etcd from Telegraf is as easy as calling http://localhost:2379/metrics. etcd exposes Prometheus-style metrics, and you can use the Telegraf prometheus input plugin to scrape them.
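
To verify that the metrics endpoint answers, you can port-forward to one of the pods (assuming the first replica is named etcd-0, following the StatefulSet naming convention) and scrape it by hand:

kubectl port-forward -n myapp etcd-0 2379:2379
curl -s http://localhost:2379/metrics | head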

Now let's assume that your Go application pushes metrics to InfluxDB using our SDK. What you can do is deploy, in the same pod as your app, just as we did for etcd, a Telegraf that uses the http_listener input plugin. This plugin is powerful because it exposes an InfluxDB-compatible HTTP API: when you point your app to localhost:8086, you end up speaking to Telegraf without changing any code.
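
Here is a minimal sketch of what the sidecar's telegraf.conf could look like in this setup (the database name mydb is an assumption, and the environment variables follow the earlier examples):

[global_tags]
  env = "$ENV"
[agent]
  hostname = "$HOSTNAME"
# Accept InfluxDB line-protocol writes from the app on the pod-local network
[[inputs.http_listener]]
  service_address = ":8086"
# Batch and forward the points to the real InfluxDB
[[outputs.influxdb]]
  urls = ["$MONITOR_HOST"]
  database = "mydb"
  username = "$MONITOR_USERNAME"
  password = "$MONITOR_PASSWORD"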

Using Telegraf as a middleman between your app and InfluxDB is a plus because it batches requests, optimizing network traffic and reducing load on InfluxDB. Another optimization, although it requires a bit of code, is to move your application from TCP to UDP. The SDK supports both transports, and on the Telegraf side you can use the socket_listener input plugin.
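
On the Telegraf side, the UDP variant is a small change to the input section; a sketch, with 8089 as an arbitrary port choice:

# Listen for line-protocol points sent over UDP inside the pod
[[inputs.socket_listener]]
  service_address = "udp://:8089"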

This means your application speaks UDP to Telegraf, and because they share the network namespace, packet loss is minimal and your application gets faster; Telegraf then forwards the points to InfluxDB over TCP, so you can be reasonably sure everything lands in InfluxDB. Bonus: if Telegraf goes down for some reason, your application won't crash, because UDP doesn't care! Your application will keep working as usual; it just won't store any points. If this trade-off works for you, great!

The benefit of using Telegraf as a sidecar to monitor distributed applications on Kubernetes is that the monitoring configuration for a service lives right next to the application specification, so deployment is simple, and since both containers share the same pod, service discovery is as easy as calling localhost.

Service discovery is usually a problem in this kind of environment: with a single central collector, containers come and go and you never know where they will be scheduled. Configuration can be tricky, but with this architecture, Telegraf simply follows your application wherever it goes.