Running InfluxDB 2.0 and Telegraf Using Docker

While the Docker buzz has faded a bit, replaced by new words like “Kubernetes” and “Serverless”, there is no arguing that Docker is the default toolchain for developers looking to get started with Linux containers, as it is fairly ubiquitous and tightly integrated with a variety of platforms.

Linux containers are an abstraction built from several pieces of underlying Linux functionality like namespaces and cgroups, which together provide a type of OS-level virtualization; to your applications, it seems like each one of them is running alone on their own copy of the OS. As a result, Docker provides a variety of benefits over running software directly on a host machine; it isolates your applications from the rest of your system, and each other, and makes it easier to deploy applications across a variety of operating systems. In general, it just keeps things clean, tidy and well-partitioned.

If you’re interested in learning more about containers, I highly recommend “What even is a container” by Julia Evans.

Personally, I run Docker on the desktop and deploy as many of my applications there as possible. Not only does it keep cruft off my host, it also makes it exceptionally easy to bring up a new copy of the stack when I need to test a particular issue, develop a new feature, or present a demo.

Linux containers, and Docker, are great tools that should be part of every developer’s arsenal. So how can we use them to run InfluxDB 2.x?

The Docker Pieces

You’ll need Docker installed on your local machine. The Docker website has documentation for installation on macOS and Windows. There are also instructions for several varieties of Linux in the Docker CE installation documentation.

Since Docker is an implementation of Linux containers, it requires a Linux OS to run. When you install Docker for Windows or macOS, it will set up and manage a virtual machine running the Linux kernel, which will run all of your containers. On macOS, the virtual machine is set up using HyperKit, while on Windows, virtualization is provided by Hyper-V. Unfortunately, Hyper-V is only supported on Windows 10 Enterprise, Professional, and Education; users running Windows Home will need to use the legacy Docker Toolbox.

To run InfluxDB 2.0 and Telegraf in containers, you’ll need:

  • a working Docker installation
  • the influxdb Docker image (we’ll use the 2.0 tag)
  • the telegraf Docker image
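If you like, you can pre-fetch both images before starting anything (assuming the standard images from Docker Hub):

$ docker pull influxdb:2.0
$ docker pull telegraf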

Running Docker Containers

There are a few ways to interact with Docker and manage your containers. The first is using the docker executable to issue commands like docker run and docker stop.

If you followed the installation instructions for Docker through to the end, you should have already launched the hello-world container using the following command:

$ docker run hello-world

When running more complex software, we often have to provide additional arguments to the docker run command to configure the container environment. This can include exposing ports in the container to the Docker host, or mounting volumes for persistent storage.
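In its general form, such a command looks something like this (the angle-bracket names are placeholders, not literal values):

$ docker run -p <host-port>:<container-port> -v <host-path>:<container-path> <image>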

Docker Networks

First, though, we’re going to set up a new Docker network. Docker comes with a built-in network that all containers are attached to by default, but creating a new network for our InfluxDB 2.0 and Telegraf deployment will keep them isolated while allowing them to communicate with one another. Isolation is good for a number of things, but it’s especially helpful for bringing up multiple instances of the stack for testing or development. You can read more about the differences between the default network and a user-defined one in the documentation.

We’ll create a new network using the following command:

$ docker network create --driver bridge influxdb-telegraf-net

which creates a new network using the bridge driver, and gives it the name influxdb-telegraf-net.
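If you want to confirm the network was created, Docker can list and inspect it:

$ docker network ls
$ docker network inspect influxdb-telegraf-net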

When we execute our docker run command to start the container, we’ll add the following argument to connect it to our newly created network:

--network influxdb-telegraf-net

We’ll also want to expose a port from the container to the Docker host, so we can communicate with the applications in our containers from the outside world. We’ll use InfluxDB v2 as an example. The database communicates over port 8086 by default. Use the publish flag, --publish or -p, and add the following argument to our docker run command to expose the port:

-p 8086:8086

Persistent Storage

Next up, we’ll need to make some decisions about where we want our configuration files to live, and where we’re going to store container data for those containers that need it.

Continuing to build out our InfluxDB run command: the database expects access to a single location on Linux, /root/.influxdb2, where it stores its data and configuration. It’s possible to customize this location, but for us the default is good enough.

We’re going to mount local folders into those locations in the container by adding the volumes flag, --volume or -v, to docker run. If you wanted, you could let Docker manage a volume for you, but binding a folder on the local machine ensures that we’ll be able to access the data from our host OS.

The bind points on the Docker host will vary based on the machine you’re running on. For the sake of example, we’ll use a folder on the host called /tmp/testdata/influx. We’ll provide the following argument to mount that folder inside the container:

-v /tmp/testdata/influx:/root/.influxdb2
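One caveat: if the host folder doesn’t already exist, the Docker daemon will create it for you, owned by root on Linux, so it can be cleaner to create it yourself first:

$ mkdir -p /tmp/testdata/influx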

Putting it all together: running InfluxDB in a container over a network

We’re just about ready to fire up our first container. We just need to add a few more things to our docker run command. We’re going to use the -d flag to start the container in “detached” mode, which runs it in the background, and we’re going to give it a name with the --name parameter. Finally, we’ll specify the influxdb image, pinned to the 2.0 tag.

docker run -d --name=influxdb \
      -p 8086:8086 \
      -v /tmp/testdata/influx:/root/.influxdb2 \
      --net=influxdb-telegraf-net \
      influxdb:2.0

After you run the command successfully, you should see a generated Container ID like:

8d868d437d5444c4ce0db04f208939843113db651a3729ce3c924bd11df19d38
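As a quick sanity check, you can confirm the database is up and reachable through the published port by hitting its health endpoint from the host (assuming curl is installed); it should report a status of “pass”:

$ curl http://localhost:8086/health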

Next we’ll run a Telegraf container in a similar fashion. But before we do, we need to configure our InfluxDB instance.

Configuring your InfluxDB 2.0 Instance

The easiest way to configure your InfluxDB 2.0 instance is through the UI. Visit http://localhost:8086 to complete the setup and gather the necessary authorization credentials. To write data to InfluxDB with Telegraf, you’ll need to:

  • set up an initial user, password, organization, and bucket
  • create (or copy) an API token with permission to write to that bucket

You can also complete the setup with the InfluxDB CLI. To launch the CLI within the container itself, use the docker exec command, which lets you run a command in a running container. Provide the -i and -t arguments (often combined as -it) to create an interactive session and allocate a pseudo-TTY, respectively. The influx setup command then walks you through setting up your InfluxDB instance interactively.

Use the following command to execute the influx CLI within our influxdb container:

$ docker exec -it influxdb influx setup
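If you’d rather script the setup than answer the prompts interactively, influx setup also accepts flags. Here’s a sketch using the example credentials that appear later in this post; substitute your own values:

$ docker exec influxdb influx setup \
      --username myusername \
      --password passwordpasswordpassword \
      --org myorg \
      --bucket mybucket \
      --force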

Managing Your Telegraf Configuration

After collecting all the credentials above, we can configure the output portion of our Telegraf configuration. It looks like this:

[[outputs.influxdb_v2]]
 ## The URLs of the InfluxDB cluster nodes.
 ##
 ## Multiple URLs can be specified for a single cluster, only ONE of the
 ## urls will be written to each interval.
 ## urls exp: http://127.0.0.1:8086
 urls = ["http://influxdb:8086"]

 ## Token for authentication.
 token = "$DOCKER_INFLUXDB_INIT_ADMIN_TOKEN"

 ## Organization is the name of the organization you wish to write to; must exist.
 organization = "$DOCKER_INFLUXDB_INIT_ORG"

 ## Destination bucket to write into.
 bucket = "$DOCKER_INFLUXDB_INIT_BUCKET"

Also note that we must change the URL from localhost to influxdb, the name of the container that’s running InfluxDB 2.0. This change allows our telegraf container to communicate with the influxdb container over our Docker network.

Mounting Telegraf Configuration Files

As with InfluxDB, we need to decide where our Telegraf configuration file will live on the host.

Building out our telegraf run command: on Linux, the agent expects to find its configuration at /etc/telegraf/telegraf.conf. It’s possible to customize this location, but for us the default is good enough.

Again, we’ll mount a local file into that location in the container by adding the volumes flag, --volume or -v, to docker run, which ensures we can edit the configuration from our host OS.

The bind points on the Docker host will vary based on the machine you’re running on. For the sake of example, we’ll pretend our Telegraf configuration lives at /mytelegrafconfigsdir/telegraf.conf on the host. We’ll provide the following argument to mount this configuration inside the container running Telegraf:

-v /mytelegrafconfigsdir/telegraf.conf:/etc/telegraf/telegraf.conf

Putting It All Together: Adding a Telegraf Container to Our Network

We’re just about ready to fire up our second container. We just need to add a few more things to our docker run command. We’re going to use the -d flag to start the container in “detached” mode, which runs it in the background, and we’re going to give it a name with the --name parameter. Finally, we’ll run the most recent version of the telegraf image.

docker run -d --name=telegraf \
      -v /mytelegrafconfigsdir/telegraf.conf:/etc/telegraf/telegraf.conf \
      --net=influxdb-telegraf-net \
      telegraf

Go ahead and run that at the command line. If it succeeds, it should return a unique hash that identifies the running container like before.

Operating Containers

Now that we have a running container, let’s talk about some additional operational details. The first thing we generally need to check is whether our containers are running. For this, we can use the docker ps command:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                          NAMES
e0be3cb0de4b        telegraf            "/entrypoint.sh tele…"   10 minutes ago      Up 10 minutes       8092/udp, 8125/udp, 8094/tcp   telegraf
8d868d437d54        influxdb:2.0        "/entrypoint.sh infl…"   12 minutes ago      Up 12 minutes       0.0.0.0:8086->8086/tcp         influxdb

Great! Our containers are up and running. But maybe they’re not behaving like we expect? We might want to dig in and take a look at the logs. We can get those using the docker logs command and providing the name of the container. We’ll also add the -f argument, which follows the log output, continuing to stream new output from the container’s STDOUT and STDERR. Watching the Telegraf logs can be very useful during debugging. Additionally, you might benefit from setting debug = true in the agent portion of your Telegraf config if you are having trouble writing data to InfluxDB. The command looks like this:

$ docker logs telegraf -f
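For reference, enabling debug output is a one-line change in the agent section of telegraf.conf:

[agent]
  ## Log verbose debug messages; useful while diagnosing write failures.
  debug = true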

Docker Compose

Docker Compose is “a tool for defining and running multi-container Docker applications”. It lets us bring up multiple containers and connect them together automatically, and we can use it to roll out and manage our complete InfluxDB and Telegraf deployment more easily. We’ll start by defining the various services within a docker-compose.yml file, then deploy those services using the Compose command line tool.

Below is an example docker-compose.yml (using Compose file format version 3) which defines three services, influxdb, influxdb_cli, and telegraf:

version: '3'
services:
  influxdb:
    image: influxdb:latest
    volumes:
      # Mount for influxdb data directory and configuration
      - /Users/anaisdotis-georgiou/temp/influxdb2:/var/lib/influxdb2:rw
    ports:
      - "8086:8086"
  # Use the influx cli to set up an influxdb instance. 
  influxdb_cli:
    links:
      - influxdb
    image: influxdb:latest
    volumes:
      # Mount for influxdb data directory and configuration
      - /Users/anaisdotis-georgiou/temp/influxdb2:/var/lib/influxdb2:rw
      - ./ssl/influxdb-selfsigned.crt:/etc/ssl/influxdb-selfsigned.crt:rw
      - ./ssl/influxdb-selfsigned.key:/etc/ssl/influxdb-selfsigned.key:rw
    environment:
      # Use these same configuration parameters in your telegraf configuration, mytelegraf.conf.
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=myusername
      - DOCKER_INFLUXDB_INIT_PASSWORD=passwordpasswordpassword
      - DOCKER_INFLUXDB_INIT_ORG=myorg
      - DOCKER_INFLUXDB_INIT_BUCKET=mybucket
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=mytoken
      - INFLUXD_TLS_CERT=/etc/ssl/influxdb-selfsigned.crt
      - INFLUXD_TLS_KEY=/etc/ssl/influxdb-selfsigned.key
    entrypoint: ["/entrypoint.sh"]
    restart: on-failure:10
    depends_on:
      - influxdb
  telegraf:
    image: telegraf
    links:
      - influxdb
    volumes:
      # Mount for telegraf config
      - ./telegraf/mytelegraf.conf:/etc/telegraf/telegraf.conf
    env_file:
      - ./influxv2.env
    environment: 
      - DOCKER_INFLUXDB_INIT_ORG=myorg
      - DOCKER_INFLUXDB_INIT_BUCKET=mybucket
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=mytoken
    depends_on:
      - influxdb_cli
volumes:
  influxdb2:

To deploy these services, run docker-compose up -d (like docker run, the -d argument starts the containers in headless “detached” mode). docker-compose uses the directory where it is executed to name the various components it manages, so putting your docker-compose.yml file into a well-named directory is a good idea. For this example, we’ll place our docker-compose.yml file in a directory called influxv2.

Running docker-compose up -d will do a number of things: first, it will create a new Docker network named influxv2_default, and then it will bring up a container for each of the services we defined, naming them influxv2_influxdb_1, influxv2_influxdb_cli_1, and influxv2_telegraf_1 respectively.

$ docker-compose up -d
Creating network "influxv2_default" with the default driver
Creating influxv2_influxdb_1 ... done
Creating influxv2_influxdb_cli_1 ... done
Creating influxv2_telegraf_1     ... done

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                          NAMES
9a61af93d70d        telegraf            "/entrypoint.sh tele…"   2 minutes ago       Up 2 minutes        8092/udp, 8125/udp, 8094/tcp   influxv2_telegraf_1
04e381afe410        influxdb:2.0        "/entrypoint.sh infl…"   2 minutes ago       Up 2 minutes        0.0.0.0:8086->8086/tcp         influxv2_influxdb_1

You should be able to navigate to the InfluxDB UI by visiting http://localhost:8086 and log in with the username and password defined by the environment variables in the influxdb_cli service: myusername and passwordpasswordpassword, respectively. Please keep in mind that your password must be a minimum of 8 characters.
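You can also verify the setup from the command line by calling the InfluxDB API with the admin token from the Compose file (a quick sanity check, assuming curl is available; it should return a JSON list of buckets):

$ curl -s http://localhost:8086/api/v2/buckets \
      -H "Authorization: Token mytoken"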

Finally, ensure that your InfluxDB setup parameters match your Telegraf configuration parameters. For example, the output portion of mytelegraf.conf would look like this:

# Output Configuration for telegraf agent
[[outputs.influxdb_v2]]
  ## The URLs of the InfluxDB cluster nodes.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  ## urls exp: http://127.0.0.1:8086
  urls = ["http://influxdb:8086"]

  ## Token for authentication.
  token = "$DOCKER_INFLUXDB_INIT_ADMIN_TOKEN"

  ## Organization is the name of the organization you wish to write to; must exist.
  organization = "$DOCKER_INFLUXDB_INIT_ORG"

  ## Destination bucket to write into.
  bucket = "$DOCKER_INFLUXDB_INIT_BUCKET"

  insecure_skip_verify = true
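Once Telegraf has had a minute or two to collect data, you can spot-check that it’s arriving with a Flux query run inside the influxdb container. This is a sketch using the example org, bucket, and token from the Compose file above:

$ docker exec influxv2_influxdb_1 influx query \
      --org myorg --token mytoken \
      'from(bucket: "mybucket") |> range(start: -5m)'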

How to Use AWS Serverless Functions

So far we’ve talked about getting things set up and configured properly on your local machine. But what about when it’s time to upload application code for a production environment? You can use the AWS console to deploy AWS serverless functions.

These functions are a part of the AWS serverless application model. They are ideal for time-series applications. Under this model, you are only charged when your code runs. That way you aren’t wasting money running a server that sits idle between requests. These functions are also beneficial for time-series solutions because they automatically scale. Time-series data is often intermittent and prone to large influxes of data at one time. This approach to running code means that the application can scale seamlessly to handle the incoming load. Check out our full tutorial on how to use AWS Lambda functions with InfluxDB.

Wrapping Up

Don’t forget to clean up any containers you created while following along with this blog post! Use the docker stop and docker rm commands to do so. Another thing to keep track of while using Docker is the container images themselves. Images aren’t automatically deleted, so over time, as you upgrade the versions of the software you’re using, old images can accumulate and start eating up disk space. There are some scripts out there which will manage this for you, but for the most part, just being aware of the potential issue and cleaning up manually every now and then will be good enough.
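For example, to tear down the containers we started with docker run, and then reclaim disk space from dangling images:

$ docker stop influxdb telegraf
$ docker rm influxdb telegraf
$ docker image prune

For the Compose stack, a single docker-compose down from the influxv2 directory will stop and remove the containers and the network it created.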

The code associated with this blog can be found in this repo.

I hope you find this blog useful. If you have any questions or product feedback, please post them on the community site, Slack channel, or tweet us @InfluxDB. Thanks!

FAQ

What are AWS serverless functions?

Serverless functions are a serverless computing service provided by AWS. It is an event-driven computing platform that lets you upload code to build serverless applications without the need to manage infrastructure. In a serverless architecture, all infrastructure maintenance, such as operating system configuration, patching, and security updates, is handled by AWS. There are several benefits to serverless functions:

Cost Savings: Serverless functions are cost-effective because you are only charged for data processing when an event takes place and the code runs.

Security: Serverless functions are run in containers that isolate their runtime environment from other serverless applications and external sources.

Speed: Serverless functions have high performance because they allow for automatic scaling up or down depending on the application’s traffic.

Simplicity: Instead of worrying about infrastructure management, capacity provisioning and hardware maintenance, teams can use AWS services to focus on application design, deployment, and delivery.

Reliability: Serverless functions are less prone to failure. Serverless applications are built from a series of interconnected services in the cloud, which makes them naturally redundant.

Scalability: Instead of increasing the total resource allocation for your entire serverless application, you can use a Function-as-a-Service (FaaS) model to scale up individual processes as needed. This flexibility reduces unnecessary costs for data processing and improves overall app efficiency.

Which services are serverless in AWS infrastructure?

  • AWS Lambda Functions: AWS Lambda functions are part of the serverless framework. These functions run event-driven serverless applications where compute resources are automatically managed by the cloud provider.
  • Amazon API Gateway: Amazon API Gateway is an AWS service in the serverless framework that lets developers build, test, deploy and monitor APIs.
  • Amazon DynamoDB: Amazon DynamoDB is a fully managed NoSQL database service in the serverless stack that lets you offload administrative tasks such as setup, configuration, and scaling.
  • Amazon S3: Amazon Simple Storage Service (S3) is the Amazon Web Services (AWS) platform for object storage.
  • Amazon Kinesis: Amazon Kinesis is a real-time, event-based streaming service.
  • Amazon Aurora: Amazon Aurora is a fully managed Amazon Relational Database Service (RDS) offering that handles all management tasks. It provides granular point-in-time recovery.
  • AWS Fargate: AWS Fargate is a serverless compute engine that uses a pay-as-you-go model where you are charged for compute only while your containers are running.
  • Amazon SNS: Amazon Simple Notification Service (SNS) is a simple, fast, reliable, highly scalable messaging service.

How does the AWS Lambda API work and why is it used with AWS resources?

AWS Lambda is a serverless, event-based service for running code. Lambda automatically manages compute resources for you. It differs from Fargate in its pricing model: Lambda charges per invocation and for the duration of each event, while Fargate charges for CPU and memory per second.

Are AWS Step Functions serverless?

Yes. Step Functions are serverless. Step Functions is an orchestration service that lets you combine AWS Lambda functions and other AWS services to build serverless workflows for a web application. Serverless workflows help you monitor each step in the workflow to verify that it runs in the order expected.