How to Parse JSON with Telegraf into InfluxDB Cloud

In Telegraf 1.19 we released a new JSON parser (json_v2). The original parser suffered from an inflexible configuration, and there were a handful of pretty common cases where data could not be parsed. While many of those edge cases can be resolved using the Starlark processor, that is a more advanced approach that requires writing scripts. We have made a lot of enhancements in the new JSON parser that make it easy to read your JSON data into InfluxDB.

What is included in the new JSON v2 parser?

The json_v2 parser offers an improved approach to gather metrics from JSON but still supports all the features of the original JSON parser (which will be referred to as json_v1 to avoid confusion).

Here is a sample of what your json_v2 configuration would look like.

[[inputs.file]]
    files = []
    data_format = "json_v2"
    [[inputs.file.json_v2]]
        measurement_name_path = "" 
        timestamp_path = "" 
        timestamp_format = "" 
        [[inputs.file.json_v2.object]]
            path = "" 
            timestamp_key = "" 
            timestamp_format = "" 
            disable_prepend_keys = false
            [inputs.file.json_v2.object.renames] 
                key = "new name"
            [inputs.file.json_v2.object.fields] 
                key = "int"

Independent GJSON queries

A limitation with the json_v1 parser was that you could only provide a single GJSON query, and you could only use the returned data from that query to build line protocol. The json_v2 parser lets you provide multiple independent GJSON queries so that you can gather the timestamp (timestamp_path), measurement name (measurement_name_path), and field/tags from anywhere in the JSON.
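
For instance, a minimal sketch (assuming the file input plugin and the weather JSON shown later in this post, read from a hypothetical weather.json) could pull the measurement name, the timestamp, and a field from three unrelated parts of the document:

[[inputs.file]]
    files = ["weather.json"]
    data_format = "json_v2"
    [[inputs.file.json_v2]]
        # measurement name from the city name, timestamp from the dt key
        measurement_name_path = "name"
        timestamp_path = "dt"
        timestamp_format = "unix"
        [[inputs.file.json_v2.field]]
            # a field from yet another part of the document
            path = "main.temp"

Each of these paths is evaluated as its own GJSON query, so they don't need to share a common parent.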

More verbose but clearer Telegraf configuration

The configuration no longer requires prepending json_ to each config key; instead it uses TOML sub-tables ([[inputs.file.json_v2.object]]). The goal of this change is an easier-to-read configuration file in which you can clearly see the difference between the plugin config and the parser config. While this does lead to a more verbose config file, the improved readability and writability should make up for it.
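
As a rough sketch of the difference (file names and paths here are placeholders, and a fuller migration example appears at the end of this post), the same parser options move from prefixed keys inside the plugin config to their own sub-tables:

# json_v1: parser options are prefixed with json_ and sit directly in the plugin config
[[inputs.file]]
    files = ["example.json"]
    data_format = "json"
    json_query = "data"
    json_time_key = "time"
    json_time_format = "unix"

# json_v2: parser options live in dedicated sub-tables
[[inputs.file]]
    files = ["example.json"]
    data_format = "json_v2"
    [[inputs.file.json_v2]]
        [[inputs.file.json_v2.object]]
            path = "data"
            timestamp_key = "time"
            timestamp_format = "unix"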

More control over the field type

When using json_v1 you were limited to using json_string_fields to specify a field as a string. In json_v2 you can now specify any type that InfluxDB line protocol supports for a field, even converting between types if possible (setting type = "int" to convert all values to integers). Read the section Types in the parser README for more details.
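
For example, a field subtable along these lines (assuming the http input used later in this post and the numeric main.humidity value from the weather JSON) would coerce the value to an integer:

[[inputs.http.json_v2.field]]
    path = "main.humidity"
    type = "int"

Any of float, int, uint, string, or bool can be used for type.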

Control over the resulting tag/fields names

The json_v2 parser gives you the ability to rename any tag or field without requiring a post-processing step, unlike the json_v1 parser. You also have access to the disable_prepend_keys setting, which you can enable to automatically produce cleaner field/tag names that don't have the parent keys prepended when gathering from nested arrays/objects.
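
Both options appear in the sample configuration above. As a sketch, an object subtable that renames a key and drops the prepended parent keys (the path and the temperature rename are just illustrative) might look like:

[[inputs.file.json_v2.object]]
    path = "main"
    disable_prepend_keys = true
    [inputs.file.json_v2.object.renames]
        temp = "temperature"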

Configuring Telegraf to ingest JSON

Weather example

For the configuration example, we’ll use current weather JSON data from openweathermap.org. You’ll need to sign up and receive an API key to access the data, but a current weather response looks like the following.

{
   "coord":{
      "lon":-0.1257,
      "lat":51.5085
   },
   "weather":[
      {
         "id":802,
         "main":"Clouds",
         "description":"scattered clouds",
         "icon":"03n"
      }
   ],
   "base":"stations",
   "main":{
      "temp":16.15,
      "feels_like":16.22,
      "temp_min":15.31,
      "temp_max":16.98,
      "pressure":998,
      "humidity":92
   },
   "visibility":10000,
   "wind":{
      "speed":4.12,
      "deg":210
   },
   "clouds":{
      "all":40
   },
   "dt":1628202887,
   "sys":{
      "type":2,
      "id":2019646,
      "country":"GB",
      "sunrise":1628137828,
      "sunset":1628192550
   },
   "timezone":3600,
   "id":2643743,
   "name":"London",
   "cod":200
}

JSON path syntax

The query syntax used in the Telegraf JSON v2 parser is GJSON path syntax. Most of the configuration options in the JSON v2 parser require you to provide a GJSON query path, so as you configure the parser it’s really helpful to use the GJSON playground to develop and debug your queries.

Before you immediately jump into configuring your parser, be sure that you understand the format of your JSON and what you want as your timestamps, measurements, tags, and fields.

1. Set the input data format in your plugin

JSON is currently one of the many supported input data formats for Telegraf. This means that any input plugin containing the data_format option can be set to json_v2 and you can begin parsing JSON data using the json_v2 parser.

data_format = "json_v2"
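
With the HTTP input plugin used for the examples in this post, that looks roughly like the following (the URL is a placeholder):

[[inputs.http]]
  urls = ["https://example.com/data.json"]
  data_format = "json_v2"
  [[inputs.http.json_v2]]
      # parser options go here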

2. Set measurement name (optional)

The input plugin you are using will most likely already have a default measurement name, very often the name of the plugin itself. If you would like to override that default, you can hardcode a name with the measurement_name option in the parser. If you want your measurement name to be set by a value in your JSON document, you can set a GJSON query in the measurement_name_path option. The query must return a single data value or the parser will fall back to the default measurement name. The measurement_name_path option takes precedence over measurement_name if you happen to set both.
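
For the weather document shown above, either approach works; as a sketch (openweather matches the measurement name used in the full examples later in this post):

[[inputs.http.json_v2]]
    # hardcode the measurement name...
    measurement_name = "openweather"
    # ...or take it from the JSON document instead (here the city name, "London")
    # measurement_name_path = "name"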

3. Determine in your JSON document if you're going to use field and tag subtables or objects (or both!)

4. Set the value you want as your timestamp and its format (optional)

If you have timestamps in your JSON document that you would like to parse, you can use the timestamp_path and timestamp_format options together. You will configure timestamp_path with the GJSON query path to your time value. Then the timestamp_format must be set to unix, unix_ms, unix_us, unix_ns, or the Go “reference time” which is defined to be the specific time: Mon Jan 2 15:04:05 MST 2006.

When parsing times that don’t include a timezone specifier, times are assumed to be UTC. To default to another timezone, or to local time, specify the timestamp_timezone option. This option should be set to a Unix TZ value, such as America/New_York, or to Local to use the system timezone. If you set timestamp_timezone, then timestamp_format and timestamp_path must also be configured.

If none of these timestamp options are configured, by default Telegraf will use the current time for all created metrics.
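
For the weather JSON above, where dt holds the observation time as a Unix timestamp, a sketch of these settings under [[inputs.http.json_v2]] might be:

timestamp_path = "dt"
timestamp_format = "unix"

Since Unix timestamps are unambiguous, timestamp_timezone would only come into play with a reference-time format that lacks a timezone specifier.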

5. Set the tags you want from your JSON data

To designate which values in your JSON you want as tags, you will configure a .tag subtable like [[inputs.http.json_v2.tag]]. In your subtable you will add your GJSON query in the path option. If you define the GJSON path to return a single value, you will get a single resulting tag. However, if you define the GJSON path to return an array of values, then each value will be put into a separate metric (this works for the fields in the next step as well). As a reminder, tags are always stored as strings no matter what the original data type is in your JSON document.

For our weather data, I will add the city and ID as tags. In the weather JSON, the location field is called name. I want a more descriptive tag name so in my parser I can set rename in the subtable to rename it to city. You can only set one tag per subtable, but if you want multiple tags you can include multiple subtables in your parser.

[[inputs.http.json_v2.tag]]
    path = "name"
    rename = "city"
[[inputs.http.json_v2.tag]]
    path = "id"

6. Add the fields you want from your JSON data

You’ll want to add the fields you’ll be reading in from your JSON data. This is very similar to how you would configure your tags by using a .field subtable with the GJSON query to your JSON value set as path. You can easily rename any field names with the rename option and change the data type of a field with type. Data types can be set as float, int, uint, string, or bool. Unlike in the json_v1 parser, there does not need to be a special configuration in order to read in string fields in the new JSON parser.

In our weather example, we could technically read in the entire JSON document, but I only want a few values, such as temperature, wind speed, and lat/lon (to visualize my data on a map!). The parser removes prepended keys by default, so my wind speed would be read in simply as speed; I’m going to rename it to wind_speed. Later in this blog I’ll show how to retain the prepended keys when reading in entire objects.

I’m going to convert the temperature data type from a float to an integer by setting type = "int" in its subtable.

[[inputs.http.json_v2.field]]
    path = "weather.#.description"
[[inputs.http.json_v2.field]]
    path = "main.temp"
    type = "int"
[[inputs.http.json_v2.field]]
    path = "wind.speed"
    rename = "wind_speed"
[[inputs.http.json_v2.field]]
    path = "coord.lon"
[[inputs.http.json_v2.field]]
    path = "coord.lat"

7. Set the objects you want to parse

Your JSON document will often contain full JSON objects with keys and values that you want to parse. Instead of designating each individual key using the field or tag subtables discussed in the previous steps, you can use the object subtable to read in all or a portion of your JSON object. Let’s continue using the weather JSON for this configuration. For now we’ll read in the entire JSON document and show how you can easily select which keys to include.

a. Set the path of your JSON object

Using the GJSON playground to determine the path to your JSON object is incredibly helpful for making sure your query pulls in the entire JSON object you want. Since we want to read in our entire weather document, I am going to set path = "@this". If we wanted to parse just the generic weather statistics under the main object, we would set path = "main", which reads in the following data.

{
   "temp": 16.15,
   "feels_like": 16.22,
   "temp_min": 15.31,
   "temp_max": 16.98,
   "pressure": 998,
   "humidity": 92
}

However, for the rest of our examples we’re going to read in the entire JSON document with @this.
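
In configuration terms, that choice is just the path on the object subtable; a minimal sketch:

[[inputs.http.json_v2.object]]
    path = "@this"    # the entire document; use path = "main" to parse only the nested main object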

b. Set your timestamp and its format and timezone as outlined in Step 4 above

If there is a timestamp value in your JSON object that you would like to use as the metric timestamp, set the timestamp parameters under the object subtable. They follow the same rules as the global timestamp settings discussed above.
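
For the weather object read in with @this, where dt is the Unix observation time, a sketch might be:

[[inputs.http.json_v2.object]]
    path = "@this"
    timestamp_key = "dt"
    timestamp_format = "unix"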

c. Disable or enable prepended keys from your nested data

This setting is an exciting new feature of the JSON v2 parser that can help clean up your data in the right scenario. You can now easily remove parent keys from a nested key name by setting disable_prepend_keys to true. In the weather data, if this setting is set to false our field names would be coord_lat, coord_lon, and main_temp. By setting it to true, we get the cleaner names lat, lon, and temp, which still tell us what the value means.

It is important to note that if this is enabled, duplicate names will overwrite each other. For example, you’ll often have different nested keys titled name; if this setting is enabled and you are reading in multiple name keys without their parent keys, they will not all be included. So it is important to look at the keys of your JSON before you enable this setting. It defaults to false for that reason.
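
As a rough illustration based on the example output later in this post (showing only a few of the fields and tags), the same weather object would produce these field keys with the setting off and on:

# disable_prepend_keys = false (the default)
openweather,name=London coord_lat=51.5085,coord_lon=-0.1257,main_temp=13.68 1632859241000000000

# disable_prepend_keys = true
openweather,name=London lat=51.5085,lon=-0.1257,temp=13.68 1632859241000000000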

d. Define the list of keys you would like to include or exclude from your JSON object

When you’re dealing with a large JSON object and there are only a few fields you want to include (or, on the other hand, exclude), you can easily do that with the included_keys = [] or excluded_keys = [] settings. I’m going to keep my configuration simple and just grab latitude, longitude, temperature, and wind speed from the object with the following configuration.

included_keys = ["coord_lat", "coord_lon", "main_temp", "wind_speed"]

e. Define tags from your JSON objects

You will also want to define the keys in your JSON that you wish to become tags. Tags are always strings, so no matter what type the values originally are in your JSON, they will be stored as strings. Tags are normally metadata or other informative data, so for my weather data I’m going to set the city name and ID as tags.

tags = ["name", "id"]

Weather example configurations

Using field + tags subtables

[[inputs.http]]
  urls = [
   "https://api.openweathermap.org/data/2.5/weather?q=london&appid=$API_KEY&units=metric"
  ]
  tagexclude = ["url", "host"]
  data_format = "json_v2"
  [[inputs.http.json_v2]]
      measurement_name = "openweather"
      [[inputs.http.json_v2.tag]]
          path = "name"
          rename = "city"    
      [[inputs.http.json_v2.tag]]
          path = "id"         
      [[inputs.http.json_v2.field]]
          path = "weather.#.main"
          rename = "summary"
      [[inputs.http.json_v2.field]]
          path = "main.temp"
          type = "int"
      [[inputs.http.json_v2.field]]
          path = "wind.speed"
          rename = "wind_speed"
      [[inputs.http.json_v2.field]]
          path = "coord.lon"
      [[inputs.http.json_v2.field]]
          path = "coord.lat"

Output:

openweather,city=London,id=2.643743e+06 description="moderate rain",lat=51.5085,lon=-0.1257,temp=13i,wind_speed=4.63 1632859197000000000

Using field + objects subtables

[[inputs.http]]
  urls = [
   "https://api.openweathermap.org/data/2.5/weather?q=london&appid=$API_KEY&units=metric"
  ]
  tagexclude = ["url", "host"]
  data_format = "json_v2"
  [[inputs.http.json_v2]]
      measurement_name = "openweather"    
      [[inputs.http.json_v2.field]]
          path = "weather.#.description"
      [[inputs.http.json_v2.object]]
            path = "@this"
            disable_prepend_keys = true
            included_keys = ["coord_lat", "coord_lon", "main_temp", "wind_speed"] 
            tags = ["name", "id"]

Output:

openweather,id=2.643743e+06,name=London description="moderate rain",lat=51.5085,lon=-0.1257,speed=4.63,temp=13.68 1632859241000000000

Migrating your configurations from JSON v1 to v2

We do not plan to deprecate the json_v1 parser in any Telegraf 1.x versions. However, any enhancements made to JSON parsing in Telegraf will only be made in the json_v2 parser.

Since the json_v2 parser still supports all the features of the original json_v1 parser, migration should be possible for all configurations, and in the process you can hopefully improve on the old configuration.

Here is an example of how to migrate an old configuration, using the Baywheels bike-share station status data, along with a snippet of the JSON.

{
    "data": {
        "stations": [
            {
                "last_reported": 1626817316,
                "num_ebikes_available": 3,
                "station_id": "549",
                "is_returning": 1,
                "eightd_has_available_keys": false,
                "is_renting": 1,
                "num_bikes_disabled": 0,
                "legacy_id": "549",
                "num_docks_disabled": 0,
                "station_status": "active",
                "num_docks_available": 9,
                "num_bikes_available": 26,
                "is_installed": 1
            }
        ]
    },
    "last_updated": 1626818120,
    "ttl": 5
}

JSON v1:

[[inputs.http]]
  urls = ["https://gbfs.baywheels.com/gbfs/en/station_status.json"]
  data_format = "json"
  json_query = "data.stations"
  tag_keys = ["station_id"]
  json_string_fields = ["eightd_has_available_keys","legacy_id", "station_status"]
  json_time_key = "last_reported"
  json_time_format = "unix"

JSON v2:

[[inputs.http]]
    urls = ["https://gbfs.baywheels.com/gbfs/en/station_status.json"]
    data_format = "json_v2"
    [[inputs.http.json_v2]]
        [[inputs.http.json_v2.object]]
            path = "data.stations"
            tags = ["station_id"]
            timestamp_key = "last_reported"
            timestamp_format = "unix"

Both configurations will lead to this output:

http,host=MBP15-SWANG.local,station_id=549,url=https://gbfs.baywheels.com/gbfs/en/station_status.json eightd_has_available_keys=false,is_installed=1,is_renting=1,is_returning=1,legacy_id="549",num_bikes_available=26,num_bikes_disabled=0,num_docks_available=9,num_docks_disabled=0,num_ebikes_available=3,station_status="active" 1626817316000000000

More examples!

There are endless JSON formats that you may encounter. Here are some of the examples we specifically wanted to address with the new JSON parser. Be sure to check out the full list of test data examples here.

Nested tags

JSON hierarchy in general caused a lot of issues in the old parser, as it was difficult to retain tags while keeping the JSON hierarchy. Users would often have to flatten their JSON data in order to get the parser to work. With the new json_v2 parser, each array can be read in properly.

JSON input:

{
  "Group A": [
    {
      "Sub-group 1": [
        {
          "Count": 0,
          "Errors": 0,
          "Serial": "9JHNGTUT",
          "Model": "WDC WUH721414ALE604",
          "Firmware": "LDGSW07G"
        }
      ]
    }
  ],
  "Group B": [
    {
      "Sub-group 1": [
        {
          "Count": 0,
          "Errors": 0,
          "Serial": "9JHLPW9T",
          "Model": "WDC WUH721414ALE604",
          "Firmware": "LDGSW07G"
        }
      ]
    }
  ]
}

Configuration:

[[inputs.http]]
    urls = ["https://gist.githubusercontent.com/sjwang90/6c3b066656aa3cece50b4642a4cfef88/raw/3b591c7d3c0bd85687ea90ed920c091178d47ee5/hierarchy-tags.json"]
    tagexclude = ["url", "host"]
    data_format = "json_v2"
    [[inputs.http.json_v2]]
        [[inputs.http.json_v2.object]]
            path = "Group A.#.Sub-group 1"
            tags = ["Firmware", "Model", "Serial"]
        [[inputs.http.json_v2.object]]
            path = "Group B.#.Sub-group 1"
            tags = ["Firmware", "Model", "Serial"]

Output:

http,Firmware=LDGSW07G,Model=WDC\ WUH721414ALE604,Serial=9JHNGTUT Count=0,Errors=0 1633389803000000000
http,Firmware=LDGSW07G,Model=WDC\ WUH721414ALE604,Serial=9JHLPW9T Count=0,Errors=0 1633389803000000000

Timestamp outside of nested array

Timestamps that sit outside of the array being parsed caused problems in the JSON v1 parser. In the new JSON v2 parser, the timestamp can be gathered with an independent query. This format of JSON is common in IoT environments where an event reports from multiple devices at a specific time. To keep this blog and these examples exciting, I used this same structure for Britney Spears’ Spotify streams.

JSON input:

{
    "timestamp": 1626449400,
    "spotify": [
        {
            "album": "Oops...I Did It Again",
            "song": "Stronger",
            "single": true,
            "streams": 87467784
        },
        {
            "album": "Oops...I Did It Again",
            "song": "Dear Diary",
            "single": false,
            "streams": 2243088
        },
        {
            "album": "The Singles Collection",
            "song": 3,
            "single": true,
            "streams": 67916609
        }
    ]
}

Configuration:

[[inputs.http]]
    urls = ["https://gist.githubusercontent.com/sjwang90/a7b370eb8cdbd59970e4d5eccdd3438f/raw/3bd22befd0b1c5d0eb398c8391ec29eff2621630/britney.json"]
    tagexclude = ["url", "host"]
    data_format = "json_v2"
    [[inputs.http.json_v2]]
        measurement_name = "britney"
        timestamp_path = "timestamp"
        timestamp_format = "unix"
        [[inputs.http.json_v2.object]]
            path = "spotify"
            tags = ["album", "song"]

Output:

britney,album=Oops...I\ Did\ It\ Again,song=Stronger single=true,streams=87467784 1626449400000000000
britney,album=Oops...I\ Did\ It\ Again,song=Dear\ Diary single=false,streams=2243088 1626449400000000000
britney,album=The\ Singles\ Collection,song=3 single=true,streams=67916609 1626449400000000000

Try it out!

Use the new JSON parser to start sending your data into InfluxDB Cloud. Please reach out with any questions in the #telegraf channel of our InfluxData Community Slack or post any questions on our Community Site.