Scaling Kubernetes Deployments with InfluxDB & Flux

This article was written by InfluxDB Community member and InfluxAce David Flanagan

Eighteen hours ago, I was meeting with some colleagues to discuss our Kubernetes initiatives and grand plan for improving the integrations and support for InfluxDB running on Kubernetes. During this meeting, I laid out what I felt was missing for InfluxDB to really shine on Kubernetes. I won’t bore you with the details, but one of the things I insisted we needed was a metrics server integration to provide horizontal pod autoscaling (HPA) based on data within InfluxDB. As I proposed the options we could take to bootstrap this quickly, my wonderful colleague Giacomo chirped in:

“That already exists.”

TL;DR

  • You can deploy kube-metrics-adapter to your cluster, which supports annotating your HPA resources with a Flux query to control the scaling of your deployment resources.
  • InfluxData has a Helm Charts repository that includes a chart for InfluxDB 2.
  • Telegraf can be used as a sidecar for local metric collection.
  • InfluxDB 2 has a component called pkger that allows for a declarative interface, through manifests (like Kubernetes), for the creation and management of InfluxDB resources.

Scaling your deployments with Flux

Giacomo continued with a great explanation of what was built, but I’m going to keep this brief. It turns out that a former colleague of ours, Lorenzo Affetti, submitted some PRs to Zalando’s kube-metrics-adapter project at the beginning of the year. Those pull requests have since been merged, and we can use the project to scale our Deployments by annotating their HPA resources with a Flux query.

How does it work? It’s rather simple. Let me show you.
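The short version: the adapter is registered with the Kubernetes API aggregation layer as an external metrics server, periodically runs the Flux query it finds in your HPA’s annotations against InfluxDB, and serves the result back as an external metric for the HPA controller to act on. The shape of it, which we’ll build out fully at the end of this post, is roughly:

  metadata:
    annotations:
      metric-config.external.flux-query.influxdb/interval: "5s"
      metric-config.external.flux-query.influxdb/http_requests: |
        // any Flux query that returns a single row with a "metricvalue" column

Here http_requests is a query name you choose, which the HPA’s metric selector then references.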

Deploy InfluxDB

This article assumes you already have InfluxDB 2 running within your cluster. If you don’t, you can use our Helm Chart to deploy InfluxDB in 30s. I’ll start the clock now …

If you’re feeling brave, you can drop this into a terminal and hope for the best.

kubectl create namespace monitoring 
helm repo add influxdata https://helm.influxdata.com/ 
helm upgrade --install influxdb --namespace=monitoring influxdata/influxdb2
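
If all went well, that’s InfluxDB 2 up in well under 30 seconds. A quick sanity check (the exact pod and service names depend on the release name, influxdb here):

kubectl --namespace monitoring get pods
kubectl --namespace monitoring get svc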

Deploy Metrics Adapter

The first thing we need to do is deploy the metrics adapter to our Kubernetes cluster. Zalando doesn’t provide a Helm chart for this, but Banzai Cloud does. Unfortunately, the Banzai Cloud chart needs a few tweaks to support the InfluxDB collector, so for today we’re going to deploy it with custom manifests. I know it’s not great, but you only need to do it once.

The Manifests

A word of caution before you blindly copy and paste this into your cluster: there are three hard-coded variables in the args section of the Deployment resource. If you plan to roll this out to production, please use Secrets and mount them as files or environment variables (there’s a sketch after the list below), rather than taking the haphazard approach I use in this demo.

The three hard-coded variables are:

  • InfluxDB URL
  • Organization Name
  • Token
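
If you’d rather keep the token out of the manifests, here’s a minimal sketch of the Secrets approach; the Secret name influxdb-hpa is just a placeholder I’ve made up for this illustration:

kubectl --namespace custom-metrics-server create secret generic influxdb-hpa \
  --from-literal=token=secret-token

Then, in the Deployment below, source the token from the Secret as an environment variable and reference it in the args (Kubernetes expands $(VAR) in container args):

          env:
            - name: INFLUXDB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: influxdb-hpa
                  key: token
          args:
            - --influxdb-token=$(INFLUXDB_TOKEN)

With that caveat out of the way, here are the manifests, hard-coded values and all: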
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: custom-metrics-apiserver
  namespace: custom-metrics-server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-server-resources
rules:
  - apiGroups:
      - custom.metrics.k8s.io
    resources:
      - "*"
    verbs:
      - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-metrics-server-resources
rules:
  - apiGroups:
      - external.metrics.k8s.io
    resources:
      - "*"
    verbs:
      - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-reader
rules:
  - apiGroups:
      - ""
    resources:
      - namespaces
      - pods
      - services
    verbs:
      - get
      - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-collector
rules:
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - list
  - apiGroups:
      - apps
    resources:
      - deployments
      - statefulsets
    verbs:
      - get
  - apiGroups:
      - extensions
      - networking.k8s.io
    resources:
      - ingresses
    verbs:
      - get
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-controller-custom-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-server-resources
subjects:
  - kind: ServiceAccount
    name: horizontal-pod-autoscaler
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-controller-external-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-metrics-server-resources
subjects:
  - kind: ServiceAccount
    name: horizontal-pod-autoscaler
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: custom-metrics-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
  - kind: ServiceAccount
    name: custom-metrics-apiserver
    namespace: custom-metrics-server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
  - kind: ServiceAccount
    name: custom-metrics-apiserver
    namespace: custom-metrics-server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics-resource-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-resource-collector
subjects:
  - kind: ServiceAccount
    name: custom-metrics-apiserver
    namespace: custom-metrics-server
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: kube-metrics-adapter
    namespace: custom-metrics-server
  version: v1beta1
  versionPriority: 100
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: kube-metrics-adapter
    namespace: custom-metrics-server
  version: v1beta1
  versionPriority: 100
---
apiVersion: v1
kind: Service
metadata:
  name: kube-metrics-adapter
  namespace: custom-metrics-server
spec:
  ports:
    - port: 443
      targetPort: 443
  selector:
    app: kube-metrics-adapter
---
apiVersion: v1
kind: Namespace
metadata:
  name: custom-metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kube-metrics-adapter
  name: kube-metrics-adapter
  namespace: custom-metrics-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-metrics-adapter
  template:
    metadata:
      labels:
        app: kube-metrics-adapter
    spec:
      containers:
        - args:
            - --influxdb-address=http://influxdb.monitoring.svc:9999
            - --influxdb-token=secret-token
            - --influxdb-org=InfluxData
          image: registry.opensource.zalan.do/teapot/kube-metrics-adapter:v0.1.5
          name: kube-metrics-adapter
      serviceAccountName: custom-metrics-apiserver
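
One gotcha: the Namespace is declared near the bottom of the file, and kubectl apply doesn’t reorder resources, so either move that document to the top or create the namespace first. Then apply the manifests (the filename is just whatever you saved them as) and confirm the adapter is up and the metrics APIs are registered:

kubectl create namespace custom-metrics-server
kubectl apply -f kube-metrics-adapter.yaml
kubectl --namespace custom-metrics-server get pods
kubectl get apiservice v1beta1.external.metrics.k8s.io v1beta1.custom.metrics.k8s.io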

The big demo

Now that we have InfluxDB and Metrics Adapter running in our cluster, let’s scale some pods!

In the interest of keeping this demo reasonably complete, I’m going to cover using Telegraf as a sidecar to scrape metrics from nginx, and using pkger (run from a Kubernetes initContainer) to create the bucket for our metrics. To accomplish both of these steps, we need to inject a ConfigMap that provides a Telegraf configuration file and a pkger manifest. Our nginx configuration, which enables the status page, is also included.

You SHOULD read the comments above each file key within the YAML.

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-hpa
data:
  # This is our nginx configuration. It enables the status page (/nginx_status) so that Telegraf can scrape it over the pod's shared network namespace.
  default.conf: |
    server {
        listen       80;
        listen  [::]:80;
        server_name  localhost;

        location / {
            root   /usr/share/nginx/html;
            index  index.html index.htm;
        }

        location /nginx_status {
          stub_status;
          allow 127.0.0.1;	#only allow requests from localhost
          deny all;		#deny all other hosts
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   /usr/share/nginx/html;
        }
    }

  # This is our Telegraf configuration. It has the same hard-coded values we mentioned earlier. You'll want to move them to Secrets for a production deployment,
  # but I'm keeping that out of scope for this demo. We configure Telegraf to pull metrics from nginx and write them to our local InfluxDB 2 instance.
  telegraf.conf: |
    [agent]
      interval = "2s"
      flush_interval = "2s"

    [[inputs.nginx]]
      urls = ["http://localhost/nginx_status"]
      response_timeout = "1s"

    [[outputs.influxdb_v2]]
      urls = ["http://influxdb.monitoring.svc:9999"]
      bucket = "nginx-hpa"
      organization = "InfluxData"
      token = "secret-token"

  # Finally, we need a bucket to store our metrics. You don't need a long retention, as it's only used for HPA.
  buckets.yaml: |
    apiVersion: influxdata.com/v2alpha1
    kind: Bucket
    metadata:
      name: nginx-hpa
    spec:
      description: Nginx HPA Example Bucket
      retentionRules:
      - type: expire
        everySeconds: 900

Now I’m going to deploy nginx to the cluster. I’ve chosen nginx because it’s very easy to trigger a scaling event with any of the vast array of HTTP load-testing tools available; I’m going to use baton (there’s a load test after the HPA manifest below).

Our nginx manifest looks like this. Again, please remember to extract the hard-coded values and use Secrets!

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hpa
spec:
  selector:
    matchLabels:
      app: nginx-hpa
  template:
    metadata:
      labels:
        app: nginx-hpa
    spec:
      volumes:
        - name: influxdb-config
          configMap:
            name: nginx-hpa
      initContainers:
        - name: influxdb
          image: quay.io/influxdb/influxdb:2.0.0-beta
          volumeMounts:
            - mountPath: /etc/influxdb
              name: influxdb-config
          command:
            - influx
          args:
            - --host
            - http://influxdb.monitoring.svc:9999
            - --token
            - secret-token
            - pkg
            - --file
            - /etc/influxdb/buckets.yaml
            - -o
            - InfluxData
            - --force
            - "true"
      containers:
        - name: nginx
          image: nginx:latest
          volumeMounts:
            - mountPath: /etc/nginx/conf.d/default.conf
              name: influxdb-config
              subPath: default.conf
          ports:
            - containerPort: 80
        - name: telegraf
          image: telegraf:1.16
          volumeMounts:
            - mountPath: /etc/telegraf/telegraf.conf
              name: influxdb-config
              subPath: telegraf.conf

Finally, let’s take a look at the HorizontalPodAutoscaler manifest that completes our demonstration.

We’ve added an annotation, metric-config.external.flux-query.influxdb/http_requests, that lets us specify the Flux query to execute in order to get the metric that determines whether this deployment should be scaled up. Our Flux query fetches the waiting field from our nginx measurement; a value greater than zero is a strong indicator that we need to scale horizontally to handle the current flow of traffic.

Our goal is to keep that waiting number as close to 0 or 1 as possible. We can also use a second annotation, metric-config.external.flux-query.influxdb/interval, to define how frequently the adapter runs the query and checks whether a scaling event is needed. We’re going to use 5-second intervals.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  annotations:
    metric-config.external.flux-query.influxdb/interval: "5s"
    metric-config.external.flux-query.influxdb/http_requests: |
      from(bucket: "nginx-hpa")
        |> range(start: -30s)
        |> filter(fn: (r) => r._measurement == "nginx")
        |> filter(fn: (r) => r._field == "waiting")
        |> group()
        |> max()
        // Rename "_value" to "metricvalue" so the metrics server can properly unmarshal the result.
        |> rename(columns: {_value: "metricvalue"})
        |> keep(columns: ["metricvalue"])
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-hpa
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: External
      external:
        metric:
          name: flux-query
          selector:
            matchLabels:
              query-name: http_requests
        target:
          type: Value
          value: "1"

That’s it! Easy when you know how, right?

If you want to explore this in more detail, or want to know more about monitoring Kubernetes with InfluxDB — please check out my examples repository with many more goodies for you to peruse.