Docker Monitoring Tutorial – How to Monitor Docker with Telegraf and InfluxDB
Jan 11, 2023
This article was originally published on the CNCF blog and is written by Cameron Pavey. Scroll down for the author’s bio.
Docker is an increasingly popular choice for businesses dealing with containerized applications. However, as with any new technology, Docker introduces complexities that need to be managed. Some of these complexities relate to infrastructure and application monitoring. Due to the abstraction offered by containers, traditional monitoring solutions might not be suitable for Docker-based workloads.
Thankfully, tools like InfluxDB and Telegraf can help you mitigate this complexity, letting you monitor Docker-based applications easily. This tutorial will teach you how to set up and configure InfluxDB and Telegraf to collect metrics from a Docker installation.
Why monitor Docker
As with traditional non-containerized workloads, there are several reasons why it’s crucial to monitor your Docker-based applications. Keeping an eye on the critical metrics of your application allows you to be proactive when it comes to things like performance issues and resource allocation optimizations. If your application is routinely consuming far more resources than you would expect, you need to know about this sooner rather than later. If left unattended, these issues can result in unexpected bills from cloud providers or support requests from unsatisfied customers (if there are underlying performance issues).
If you have detailed metrics of how your application behaves at runtime, it puts you in a position to solve problems early. Typically, you want to collect metrics like CPU and memory usage, as well as Disk and Network I/O, and any number of other, more specific metrics depending on your application’s needs.
However, collecting large volumes of this kind of data can quickly become tricky. You need to collect the data and store it in a meaningful way and have a mechanism to pull the data back out for analysis and visualization. This is where InfluxDB and Telegraf come in.
InfluxDB is a purpose-built time series database with a lot of powerful features. In particular, the built-in data visualization tools and the powerful Flux data scripting language are appealing additions to an already robust time series database offering. InfluxDB can then be paired with Telegraf, a server-based agent for collecting metrics from various systems. Thanks to a flexible plugin architecture, Telegraf already supports over 300 plugins for collecting metrics from different endpoints, including Docker.
Before you begin this tutorial, you’ll need to have Docker installed on your system and configured correctly. You can refer to the official documentation for steps to install Docker on your operating system of choice. If you’re using a Unix-based OS, you can also add your user to the Docker group to let you work with Docker as a non-root user.
Once you have Docker installed and configured, you can verify that it’s working as expected by running the following command:
docker run hello-world
This command should print some output to confirm that things are functioning correctly:
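The first lines of that output should look similar to the following (abbreviated here):

```
Hello from Docker!
This message shows that your installation appears to be working correctly.
```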
You should also note that this tutorial assumes you are using a Linux-based OS, like Ubuntu, for the purposes of code snippets and commands. There is one section later in the tutorial where a command will fail if you’re using macOS, but rest assured, there is a simple solution if this happens.
Monitoring Docker with InfluxDB and Telegraf
With Docker installed, you can now look to install and configure InfluxDB and Telegraf. While you could install both of these pieces of software directly on your machine, this tutorial will show you how to install and integrate them as Docker containers.
Before creating the containers, you need to create a Docker network so that they can communicate with each other. You can do this with the following command:
docker network create --driver bridge influxdb-net
Now you need to create a new influx-data/ directory. You’ll mount this directory into the container as a volume, which means that your data will persist even if the container stops running:
mkdir influx-data && cd influx-data
After entering the directory, you can run the following command to pull the influxdb Docker image and create a container using it (with your influx-data/ directory mapped to the container’s volume):
docker run -d \
--name influxdb \
-p 8086:8086 \
--net influxdb-net \
--volume $PWD:/var/lib/influxdb2 \
influxdb:2
Running this command will start an instance of InfluxDB in a container, accessible at your host’s IP address on port 8086. Before configuring Telegraf, you need to set up your InfluxDB installation and set some parameters, like your bucket name.
Navigate to port 8086 on your Docker host’s IP address (if you are running Docker locally, try http://127.0.0.1:8086/). You should see the InfluxDB welcome screen. Click Get started, and you’ll be prompted to create your initial user. Make a note of the Initial Organization Name and Initial Bucket Name you provide here, as you will need them shortly for the Telegraf config:
Once you’ve filled out these details, click Continue, and on the final screen, choose Configure Later. Then you’ll be taken back to the main dashboard. From there, select Load your data, and on the subsequent page, select the Telegraf tab:
On this tab, select + Create Configuration. You’ll be asked what kind of system you want to monitor. Select Docker and click Continue. Give the configuration a descriptive name (like “Docker”) and click Create and verify. InfluxDB will then present you with an API token, a URL, and a partial config file for Telegraf:
Take note of this API token, as you will need it shortly.
With this token, you have the final piece you need to configure your Telegraf container and start collecting some metrics. For this tutorial, you don’t need the configuration file that InfluxDB generated, because the next step provides one that includes a few extra pieces.
Back in your terminal, if you’re still in the influx-data/ directory, you can leave this directory and create a new one called telegraf-data/. This directory will hold the Telegraf config file that Docker will mount into the container:
# If you’re still in the influx-data/ directory
mkdir telegraf-data && cd telegraf-data
In this directory, create a file called telegraf.conf and set its contents as follows:
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = ""
  omit_hostname = false

[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  gather_services = false
  container_names = []
  source_tag = false
  container_name_include = []
  container_name_exclude = []
  timeout = "5s"
  perdevice = true
  total = false
  docker_label_include = []
  docker_label_exclude = []
  tag_env = ["JAVA_HOME", "HEAP_SIZE"]

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "$INFLUX_TOKEN"
  organization = "influx-docker-demo"
  bucket = "influx-docker-demo"
You might need to edit the last block in this file if you used a different organization or bucket name. Similarly, if you gave your InfluxDB container a name other than influxdb, you will need to reflect that change in the value for the urls key.
Once you’ve created this file, you can run the following command from your telegraf-data/ directory to create the container:
docker run -d --name=telegraf \
-v $(pwd)/telegraf.conf:/etc/telegraf/telegraf.conf \
-v /var/run/docker.sock:/var/run/docker.sock \
--net influxdb-net \
--user telegraf:$(stat -c '%g' /var/run/docker.sock) \
--env INFLUX_TOKEN="your_api_key" \
telegraf
Note: This command may fail if you use macOS because the stat command varies slightly between Mac and Linux. If this happens, try substituting the -c flag for an -f flag, like so: --user telegraf:$(stat -f '%g' /var/run/docker.sock).
The previous command does a few things that are important for it to work correctly. Here it is, flag by flag:
- The first -v flag mounts the config file you just created.
- The second -v flag mounts the host’s Docker socket, which is what allows Telegraf to collect details about your Docker server.
- The --net flag ensures that this container and the previously created InfluxDB container are both on the same network, which is how the config file you created can refer to the InfluxDB container by its DNS name, influxdb.
- The --user flag is necessary because the Telegraf image internally uses a user and group named telegraf, so you must map this to a group that has permission to access the docker.sock file.
- The --env flag specifies an environment variable containing the API token for your InfluxDB instance, which is used in the config file you created.
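The group lookup inside the --user flag is the only non-portable part of the command. If you’d rather sidestep the stat differences between Linux and macOS entirely, the same value can be obtained with Python’s standard library. This is a hypothetical alternative, not something the tutorial requires:

```python
# Portable equivalent of `stat -c '%g' /var/run/docker.sock` (Linux) or
# `stat -f '%g' /var/run/docker.sock` (macOS): look up the numeric group
# ID that owns the Docker socket.
import os


def owning_gid(path: str = "/var/run/docker.sock") -> int:
    """Return the numeric group ID of the file at `path`."""
    return os.stat(path).st_gid


# Usage in the docker run command, via a one-liner:
#   --user telegraf:$(python3 -c "import os; print(os.stat('/var/run/docker.sock').st_gid)")
```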
After running this command, you can check the Docker logs to see if there are any unexpected issues:
docker logs telegraf
If everything is working, you should see some output like this, without errors:
If all looks well, you’re ready to visualize your data in InfluxDB.
If you see errors about permission being denied when trying to access /var/run/docker.sock, you can resolve them by running the following command, which alters the socket’s permissions inside the container so that the telegraf user can access it:
docker exec -u root -it telegraf /bin/sh -c "chmod 666 /var/run/docker.sock"
Visualizing the data
Navigate back to InfluxDB in your browser, and select Data from the sidebar. Next, select the Buckets tab and then the name of the bucket that you configured Telegraf to use:
This will take you to the Data Explorer page, where you should be able to see the data that Telegraf is feeding into InfluxDB. By default, there are quite a few metrics captured, including CPU and memory usage and networking statistics. You can click through the different filters available at the bottom of the screen to see all the other available data:
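If you prefer writing queries by hand, the Data Explorer also has a Script Editor where you can use Flux directly. As an illustration, a query like the following charts each container’s CPU usage over the last hour; it assumes the influx-docker-demo bucket name from earlier, and docker_container_cpu and usage_percent are the measurement and field names reported by Telegraf’s Docker input plugin:

```
from(bucket: "influx-docker-demo")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "docker_container_cpu")
  |> filter(fn: (r) => r._field == "usage_percent")
```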
For more information about deriving meaningful visualizations from your data, refer to the official documentation.
As a time series database, InfluxDB is perfectly positioned to store and visualize the kind of metrics that application monitoring often deals with, as there are usually lots of data points at regular intervals. With large volumes of data like this, you must have a mechanism to visualize, search, and understand the data to derive insights. InfluxDB fits the bill in this regard thanks to its easy-to-configure visualizations and the powerful Flux data scripting language that allows you to query and analyze your data.
About the author
Cameron is a full-stack dev living and working in Melbourne. He’s committed himself to the never-ending journey of understanding the intricacies of quality code, developer productivity, and job satisfaction.