Using A Telegraf Gateway

By Rick Brown / Use Cases, Product, Developer
Nov 18, 2019

Navigate to:

If you meet InfluxData at a trade show, or we provide you with a demo, you might be shown the dashboard called “Rick’s House”. I’m that “Rick”, and this is how I’m sending my data to InfluxDB Cloud, which currently looks like this:

Sending data to InfluxDB Cloud

My home network contains numerous servers (many of them LXC containers in Proxmox), lots of software components (Node-Red, MQTT, MariaDB, etc), and a variety of smart-home sensors (SmartThings, Xiaomi, IKEA). It’s grown over many years, and I now find that I want to monitor the whole set-up in one place, because without monitoring everything I can’t visualize what’s happening, and without visualizing what’s happening, it’s difficult to understand what automations to execute or what maintenance I might need to do. It’s been a long-running vanity project, and I discovered InfluxDB during my journey. It’s partly because I found my “time to awesome” was so fast when implementing the software that I applied to join InfluxData!

So, I want to visualize my home network from data stored in InfluxDB. I want everything to write directly into my local instance of InfluxDB OSS, but I also want to be able to write to remote instances, such as InfluxDB Cloud. I also want this to be usable for larger use cases than mine. To do this, there are some restrictions:

InfluxDB Cloud can only accept HTTP ingest. Any UDP messages (Proxmox writes UDP to an InfluxDB endpoint) will be dropped before they get through the Internet.
Usernames and passwords might change, so I will need to edit every Telegraf configuration file on every machine on my LAN.
Access tokens might change over time (especially as I migrate to InfluxDB Cloud v2), so I will need to edit every Telegraf configuration file on every machine on my LAN.
I've had instances in corporate networks where I had to go through long security processes to open an outgoing TCP port from a host in a data centre. If I need to do this for many hosts in order to get Telegraf data sent to a remote InfluxDB, the elapsed time for security clearance would be correspondingly longer.
I want to be efficient sending data to remote instances of InfluxDB. This means that I should look at concepts like batching my outputs. If I write to InfluxDB with a Telegraf Gateway, batching and associated settings can be configured in that collator, and those settings are used for all writes. If I write directly to InfluxDB from client libraries, each one might have a different mechanism for batching, so I'll need to configure it in multiple places.

If I can collate all my updates into one place, all the administration of the points above goes away.

Therefore, implementing an interstitial instance of Telegraf a Telegraf Gateway is a good solution for these use cases. To create it, I’ll need to do some things:

1. Install a container for Telegraf
2. Enable Telegraf inputs for UDP & HTTP
3. Enable Telegraf outputs for wherever I'll be writing my data

So, I need to create a configuration for my Telegraf gateway. Telegraf reads default configurations from /etc/telegraf/telegraf.conf and /etc/telegraf/telegraf.d/.conf, so let’s create these.

Firstly, here is my /etc/telegraf/telegraf.conf for global settings

[[global_tags]]

[[agent]]
  interval = "10s"
  round_interval = true

  metric_batch_size = 1000   
  metric_buffer_limit = 10000

  collection_jitter = "3s"
  flush_interval = "10s"
  flush_jitter = "5s"

  precision = ""
   
  debug = false
  quiet = false
   
  logfile = "/var/log/telegraf/telegraf.log"   
  logfile_rotation_interval = "0d"
  logfile_rotation_max_size = "1MB"
  logfile_rotation_max_archives = 5

  hostname = ""

Now some specific config files for providing endpoints for Telegraf to act as an InfluxDB server:

/etc/telegraf/telegraf.d/socket_listener.conf

[[inputs.socket_listener]]
  service_address = "udp://:8089"

/etc/telegraf/telegraf.d/influxdb_listener.conf

# Influx HTTP write listener
[[inputs.influxdb_listener]]
  service_address = ":8086"

  read_timeout = "10s"
  write_timeout = "10s"

  max_body_size = "500MiB"
  max_line_size = "64KiB"

  database_tag = "bucket_name"

Note the use of database_tag = "bucket_name" in the config above. When I write directly to an InfluxDB HTTP endpoint using its URL, which I do from Node-Red, I select a database to write to. I don’t want to lose this information, so Telegraf stores it in a tag called bucket_name. Of course, I don’t want that sent as a tag to InfluxDB, because it’ll increase my cardinality for no purpose, so the output configuration will need to remove it. Telegraf does this by the configuration line exclude_database_tag = true. Here’s the output configuration:

/etc/telegraf/telegraf.d/output_influxdb.conf

# Local InfluxDB
[[outputs.influxdb]]
  urls = ["http://ip_address_of_local_InfluxDB_server:8086"]
  database_tag = "bucket_name"
  exclude_database_tag = true
   
# SE Cloud
[[outputs.influxdb]]
  urls = ["https://FQDN_of_Influx_Cloud:8086"]
  database = "database_name_to_write_to"
  username = "my_username"
  password = "my_password"   
  timeout = "30s"

# Cloud 2 instance
[[outputs.influxdb_v2]]
  urls = ["https://FQDN_of_Influx_Cloud_2"]
  token = "The_Token_Generated_within_Influx_Cloud_2"   
  organization = "my_registered_email_address_on_Influx_Cloud_2"   
  bucket = "my_bucket_to_write_to"

At this point, this gives me a working Telegraf gateway. However, it also gives me an extra thing to monitor on my LAN, so capturing internal Telegraf metrics seems like the thing to do. As this is a LXC container, and I want to monitor all my containers in the same way, I should add my usual set of OS monitoring configurations. I shouldn’t enable SMART monitoring or temperature monitoring, as those should come from the underlying server on which I’m running the container. Additionally, if this was a Windows machine rather than a LXC container, I would want to replace “processes” and “system” with “win_processes” and “win_perf_counters”. These are the defaults that I try and use with every server I include in my dashboards:

/etc/telegraf/telegraf.d/cpu.conf

[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false

/etc/telegraf/telegraf.d/disk.conf

[[inputs.disk]]  
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

[[inputs.diskio]]

/etc/telegraf/telegraf.d/internal.conf

[[inputs.internal]]  
  collect_memstats = true

/etc/telegraf/telegraf.d/kernel.conf

[[inputs.kernel]]  
  # no configuration

/etc/telegraf/telegraf.d/mem.conf

[[inputs.mem]]  
  # no configuration

/etc/telegraf/telegraf.d/net.conf

[[inputs.net]]

/etc/telegraf/telegraf.d/swap.conf

[[inputs.swap]]  
  # no configuration

The final two sections are for capturing Linux metrics - see the equivalents mentioned above for Windows metrics.

/etc/telegraf/telegraf.d/processes.conf

[[inputs.processes]]  
  # no configuration

/etc/telegraf/telegraf.d/system.conf

[[inputs.system]]  
  ## Uncomment to remove deprecated metrics.  
  # fielddrop = ["uptime_format"]

So now Telegraf will write to all three of my outputs, collating data from all three of the types of input I commonly use (Telegraf, HTTP custom endpoint, UDP).

This is what my Telegraf gateway looks like, showing a system view:

Telegraf gateway

I’ll be running my Telegraf implementation like this for a while, to see if this meets my needs as-is, or if it needs extra configuration. If you deploy Telegraf like this, or use alternative ways to get your data through Telegraf into InfluxDB, let me know - I’m interested in Telegraf deployment patterns!

Navigate to:

Try InfluxDB Cloud

Stop flying blind

Using A Telegraf Gateway

By Rick Brown / Use Cases, Product, Developer
Nov 18, 2019

Navigate to:

Try InfluxDB Cloud

Learn more about InfluxDB

Performance Benchmarking: InfluxDB 3.0 vs. InfluxDB Open Source

InfluxDB for Industrial IoT:  
A Live Demonstration

How Time Series Databases and Data Lakes Work Together

Data Warehousing

Network Monitoring

Time Series Data Analysis: Definitions and Best Techniques in 2024

Navigate to:

Try InfluxDB Cloud

Stop flying blind

Get Updates

Using A Telegraf Gateway

By Rick Brown / Use Cases, Product, Developer Nov 18, 2019

Navigate to:

Try InfluxDB Cloud

Learn more about InfluxDB

Performance Benchmarking: InfluxDB 3.0 vs. InfluxDB Open Source

InfluxDB for Industrial IoT: A Live Demonstration

How Time Series Databases and Data Lakes Work Together

Data Warehousing

Network Monitoring

Time Series Data Analysis: Definitions and Best Techniques in 2024

By Rick Brown / Use Cases, Product, Developer
Nov 18, 2019

InfluxDB for Industrial IoT:  
A Live Demonstration