How to Parse Your XML Data with Telegraf
By Samantha Wang / Apr 14, 2021 / Community, Telegraf, Developer
In March, we released Telegraf 1.18, which included a wide range of new input and output plugins. One exciting new addition was an XML Parser Plugin that added support for another input data format to parse into InfluxDB metrics.
What is XML?
XML stands for eXtensible Markup Language and is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
XML is similar to HTML in being a markup language but is designed to be self-descriptive and to better store and transport data. For example, when you are trying to exchange data between incompatible systems and data needs to be converted, any data that is incompatible can be lost. XML aims to simplify that data sharing and transportation since it is stored in plain text format. This provides a software- and hardware-independent way of storing, transporting and sharing data.
Understanding your XML data
We will use the terms root, child, sub-child throughout this blog to help you understand which data points you’re trying to parse.
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
XML documents must contain exactly one root element that is the parent of all other elements.
This XML weather example from OpenWeather is a good basic example to help us understand XML data structure and how to parse it.
<current>
  <city id="5004223" name="Oakland">
    <coord lon="-83.3999" lat="42.6667" />
    <country>US</country>
    <timezone>-14400</timezone>
    <sun rise="2021-03-24T11:29:19" set="2021-03-24T23:50:05" />
  </city>
  <temperature value="62.26" min="61" max="64.4" unit="fahrenheit" />
  <feels_like value="54.63" unit="fahrenheit" />
  <humidity value="59" unit="%" />
  <pressure value="1007" unit="hPa" />
  <wind>
    <speed value="12.66" unit="mph" name="Moderate breeze" />
    <gusts value="24.16" />
    <direction value="200" code="SSW" name="South-southwest" />
  </wind>
  <clouds value="75" name="broken clouds" />
  <visibility value="10000" />
  <precipitation mode="no" />
  <weather number="803" value="broken clouds" icon="04d" />
  <lastupdate value="2021-03-24T16:15:35" />
</current>
In our weather data, current is the root element, with city, temperature, wind and the other elements at that same level as its child elements.
An XML element is everything from the element’s start tag <element> to its end tag </element>, inclusive. Some tags can close themselves, as in <coord />. Elements themselves can contain:
- Text - <country>US</country>
- Attributes - <coord lon="-83.3999" lat="42.6667"/>
  - Attributes are designed to contain data related to a specific element. This will be especially important when we are parsing our data values. They can be written in ways that look a little strange but are still valid.
- Child elements - <coord> is one of the elements nested in the <city> element
The relationships between elements are described by the terms parent, child, and sibling.
What is XPath?
The Telegraf XML Parser breaks down an XML string into metric fields using XPath expressions and supports most XPath 1.0 functionality. The parser will use XPath syntax to identify and navigate XPath nodes in your XML data. XPath supports over 200 functions, and the functions supported by Telegraf XML Parser are listed in the underlying library repository.
Note: Usually XPath expressions select a node or a node-set, and you have to call functions like number() to access the node’s content. However, when we discuss the Telegraf XML Parser Plugin in more detail below, you’ll see that it handles this in the following way for convenience: both metric_selection and field_selection only select the node or node-set, so they are normal XPath expressions. However, all other queries will return the node’s “string-value” according to the XPath specification. You can convert the types using functions as shown below.
I found this XPath tutorial particularly helpful in understanding XPath terminology and expressions. There is also this XPath cheat sheet that gives you a one page view of using XPath selectors, expressions, functions and more.
Before parsing any data, take a look at your XML and understand the nodes and node-sets of the data you want to parse. This XPath tester will come in really handy in testing out XPath functions and making sure you are querying the correct path to parse specific XML nodes.
| XPath expression | Result |
|---|---|
| current | Selects the child node(s) with the name of current |
| /current | Selects the root element current |
| current/city | Selects all city elements that are children of current |
| current//@name | Selects ALL attributes named name under current, at any depth |
| current/city/@name or //city/@name | Selects attributes named name of the city element |
| current/city/* | Selects all the child element nodes under the city element |
| current/city/@* | Selects all attributes in the city element |
W3Schools provides an extensive list of XPath syntax and dives deep into XPath axes with additional examples.
Configuring Telegraf to ingest XML
XML is currently one of the many supported input data formats for Telegraf. This means that any input plugin containing the data_format option can be set to xml and begin parsing your XML data, like this:
data_format = "xml"
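As a rough sketch of that pattern (not taken from the plugin documentation), here is the same option on the file input instead of tail. The file name weather.xml is a placeholder, and the sketch assumes the file holds the OpenWeather document shown earlier. The parser options inside the sub-table are covered in the walkthrough that follows; the point here is only that the sub-table is named after whichever input plugin you use.

```toml
# Sketch only: read a local XML file on every collection interval.
[[inputs.file]]
  files = ["weather.xml"]          # placeholder path, not from the original post
  data_format = "xml"              # hand the file contents to the XML parser

  # Parser settings go in a sub-table named after the input plugin:
  # inputs.file.xml here, inputs.tail.xml or inputs.http.xml elsewhere.
  [[inputs.file.xml]]
    [inputs.file.xml.fields]
      # Absolute path, so it does not depend on any node selection.
      humidity = "number(/current/humidity/@value)"
```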
Let’s discuss how to get your configuration just right to get that XML data into InfluxDB. As mentioned above, the XML parser breaks down an XML string into metric fields using XPath expressions. XPath expressions are what the parser uses to identify and navigate nodes in your XML data.
Here is the plugin’s default configuration for using the XML parser. As with other Telegraf configs, commented lines start with a pound sign (#).
[[inputs.tail]]
  files = ["example.xml"]

  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "xml"

  ## Multiple parsing sections are allowed
  [[inputs.tail.xml]]
    ## Optional: XPath-query to select a subset of nodes from the XML document.
    #metric_selection = "/Bus/child::Sensor"

    ## Optional: XPath-query to set the metric (measurement) name.
    #metric_name = "string('example')"

    ## Optional: Query to extract metric timestamp.
    ## If not specified the time of execution is used.
    #timestamp = "/Gateway/Timestamp"
    ## Optional: Format of the timestamp determined by the query above.
    ## This can be any of "unix", "unix_ms", "unix_us", "unix_ns" or a valid Golang
    ## time format. If not specified, a "unix" timestamp (in seconds) is expected.
    #timestamp_format = "2006-01-02T15:04:05Z"

    ## Tag definitions using the given XPath queries.
    [inputs.tail.xml.tags]
      name   = "substring-after(Sensor/@name, ' ')"
      device = "string('the ultimate sensor')"

    ## Integer field definitions using XPath queries.
    [inputs.tail.xml.fields_int]
      consumers = "Variable/@consumers"

    ## Non-integer field definitions using XPath queries.
    ## The field type is defined using XPath expressions such as number(), boolean() or string(). If no conversion is performed the field will be of type string.
    [inputs.tail.xml.fields]
      temperature = "number(Variable/@temperature)"
      power       = "number(Variable/@power)"
      frequency   = "number(Variable/@frequency)"
      ok          = "Mode != 'ok'"
Let’s walk through all the steps and components that will make up your XML parser configuration. Whenever you are setting up an XPath query in your configuration, the specified path can be absolute (starting with /) or relative. Relative paths use the currently selected node as reference.
- Select subset of nodes you want to parse (optional)
If you wish to parse only a subset of your XML data, use the metric_selection field to designate which part. In our weather example, if we only wanted to parse the data under the wind element, we would set this to current//wind. Let's go ahead and read the entire weather XML document, so I'm going to set metric_selection = "/current". There will be one metric per node selected by metric_selection. A benefit of setting this field is that in subsequent configuration fields, I won't need to add "current/" to my queries' paths.
- Set measurement name (optional)
You can override the default measurement name (which will most likely be the plugin name) by setting the metric_name field. I'm going to set metric_name = "'weather'" to change the measurement name from http to weather. You can also set the XPath query for metric_name to derive the measurement name directly from a node in the XML document.
- Set the value you want as your timestamp and its format (optional)
If your XML data contains a specific timestamp you want to assign to your metrics, you will need to set the XPath query of that value. Our weather data has a lastupdate value that indicates the exact time this weather data was recorded. I'll set timestamp = "lastupdate/@value" to read in that value as my timestamp. If the timestamp field isn't set, the current time will be used as the timestamp for all created metrics.
From there, you can designate the format of the timestamp you just selected. This timestamp_format can be set to unix, unix_ms, unix_us, unix_ns, or an accepted Go "reference time" layout. If timestamp_format isn't configured, Telegraf will assume the result of your timestamp query is a unix timestamp in seconds.
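To illustrate the two kinds of timestamp_format value, here is a small sketch that pairs each of the two timestamp styles appearing in this post with a matching format. The queries and formats are the ones used in the full examples further down; only the comments are added. A Go layout is written by arranging Go's reference time (January 2, 2006 at 15:04:05) the way your own timestamps look.

```toml
# Weather data: <lastupdate value="2021-03-24T16:15:35" />
[[inputs.http.xml]]
  timestamp = "lastupdate/@value"
  # Go layout: the reference time written in the same shape as the value above.
  timestamp_format = "2006-01-02T15:04:05"

# Bicycle data: <stations lastUpdate="1617397861012" version="2.0">
[[inputs.tail.xml]]
  timestamp = "/stations/@lastUpdate"
  # A 13-digit epoch value is in milliseconds, so "unix_ms" rather than "unix".
  timestamp_format = "unix_ms"
```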
- Set the tags you want from your XML data
To designate the values in your XML you want as your tags, you will need to configure a tags subsection such as [inputs.http.xml.tags]. In your subsection you will add a line for each tag in tag-name = query format with the XPath query. For our weather data, I will add the city and country names as tags with city = "city/@name" and country = "city/country". Multiple tags can be set under one subsection.
- Configure the fields of integer type you want from your XML data
For your XML data values that are integers that you want to read in as fields, you must configure the field names and XPath queries in a fields_int subsection such as [inputs.tail.xml.fields_int]. This is because XML values are limited to a single type, string, so all your data will be of type string if not converted by an XPath function. This will follow the field_name = query format. In our weather data, values such as humidity and clouds are always integers, so we will configure them in this subsection. Results of these fields_int queries will always be converted to int64.
[inputs.http.xml.fields_int]
  humidity = "humidity/@value"
  clouds = "clouds/@value"
- Configure the rest of your fields. Be sure to indicate the data type in the XPath function.
To add non-integer fields to the metrics, you will add the proper XPath query in a general fields subsection (ex: [inputs.http.xml.fields]) in the field_name = query format. It's crucial here to specify the data type of the field in your XPath query using the type conversion functions of XPath such as number(), boolean() or string(). If no conversion is performed in the query, the field will be of type string. In our weather data we have a combination of number and string values. For example, our wind speed is a number and will be specified as wind_speed = "number(wind/speed/@value)", whereas the wind description is text and will be formatted as a string in wind_desc = "string(wind/speed/@name)".
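As a quick side-by-side of the conversion functions, here is a sketch against the weather document. The first two lines are the ones from the text above; has_gusts and wind_code are names made up for this illustration and are not part of the final configuration.

```toml
[inputs.http.xml.fields]
  # number(): the attribute text "12.66" becomes a float field
  wind_speed = "number(wind/speed/@value)"
  # string(): explicit string field
  wind_desc = "string(wind/speed/@name)"
  # boolean(): in XPath 1.0 a node-set converts to true when it is non-empty,
  # so this is true whenever a <gusts> element is present (hypothetical field)
  has_gusts = "boolean(wind/gusts)"
  # no conversion function: the field falls back to type string (hypothetical field)
  wind_code = "wind/direction/@code"
```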
- Select a set of nodes from your XML data you want to parse as fields (optional)
If you have a large XML file with a large number of fields that would otherwise need to be individually configured, you can select a subset of them by configuring field_selection with an XPath query to the selection of nodes. This setting will also be commonly used if the node names are not yet known (ex: value of precipitation is not populated unless it's actively raining). Each node that is selected by field_selection forms a new field within the metric.
You can set the name and value of each field by using the optional field_name and field_value XPath queries. If these queries are not specified, the field's name defaults to the node name and the field's value defaults to the content of the selected field node. It is important to note that field_name and field_value queries are only used if field_selection is specified. You can also use these settings in combination with the other field specification subsections.
Based on the multi-node London bicycle example below, to retrieve all the attributes in the info nodes, the field_selection settings would be configured as
field_selection = "child::info"
field_name = "name(@*)"
field_value = "number(@*)"
- Expand field names to a path relative to the selected node (optional)
If you want your field names that have been selected with field_selection to be expanded to a path relative to the selected node, you will need to set field_name_expansion = true. This setting allows you to flatten out nodes with non-unique names in the subtree. This would be necessary if we selected all leaf nodes as fields and those leaf nodes did not have unique names. If field_name_expansion wasn't set, we would end up with duplicate names in the fields.
Basic Parsing example: OpenWeather XML data
I have been referencing the OpenWeatherMap XML API response so far in this blog when explaining XML concepts and the steps for configuring your XML parser. This configuration should help you understand how to parse somewhat simple XML data with Telegraf. There is also a 5 day OpenWeather forecast test case in the plugin’s testcases folder.
You can sign up for a free API key to retrieve this XML data over HTTP. Once you have your API key (this may take a few hours after signing up), you can set your URL to specify the location(s) of your weather. My configuration below retrieves Oakland, New York, and London current weather data in imperial units (blame us Americans for not knowing the metric system :)). If you want to test the example below, make sure you set your API_KEY as an environment variable to be read by the Telegraf config.
[[inputs.http]]
  ## OpenWeatherMap API, need to register for $API_KEY: https://openweathermap.org/api
  urls = [
    "http://api.openweathermap.org/data/2.5/weather?q=Oakland&appid=$API_KEY&mode=xml&units=imperial",
    "http://api.openweathermap.org/data/2.5/weather?q=New%20York&appid=$API_KEY&mode=xml&units=imperial",
    "http://api.openweathermap.org/data/2.5/weather?q=London&appid=$API_KEY&mode=xml&units=imperial"
  ]
  data_format = "xml"

  ## Drop url and hostname from list of tags
  tagexclude = ["url", "host"]

  ## Multiple parsing sections are allowed
  [[inputs.http.xml]]
    ## Optional: XPath-query to set the metric (measurement) name.
    metric_name = "'weather'"

    ## Optional: XPath-query to select a subset of nodes from the XML document.
    metric_selection = "/current"

    ## Optional: Query to extract metric timestamp.
    ## If not specified the time of execution is used.
    timestamp = "lastupdate/@value"
    ## Optional: Format of the timestamp determined by the query above.
    ## This can be any of "unix", "unix_ms", "unix_us", "unix_ns" or a valid Golang
    ## time format. If not specified, a "unix" timestamp (in seconds) is expected.
    timestamp_format = "2006-01-02T15:04:05"

    ## Tag definitions using the given XPath queries.
    [inputs.http.xml.tags]
      city = "city/@name"
      country = "city/country"

    ## Integer field definitions using XPath queries.
    [inputs.http.xml.fields_int]
      humidity = "humidity/@value"
      clouds = "clouds/@value"

    ## Non-integer field definitions using XPath queries.
    ## The field type is defined using XPath expressions such as number(), boolean() or string(). If no conversion is performed the field will be of type string.
    [inputs.http.xml.fields]
      ## Paths are relative to the /current node chosen by metric_selection
      temperature = "number(temperature/@value)"
      precipitation = "number(precipitation/@value)"
      wind_speed = "number(wind/speed/@value)"
      wind_desc = "string(wind/speed/@name)"
      clouds_desc = "string(clouds/@name)"
      lat = "number(city/coord/@lat)"
      lon = "number(city/coord/@lon)"
      ## If "precipitation/@mode" value returns "no", is_it_raining will return false
      is_it_raining = "precipitation/@mode = 'yes'"
Most of the settings for this weather configuration are explained above. The last field, is_it_raining, shows how you can use an XPath operator in your configuration to return a node-set, a string, a Boolean, or a number:
is_it_raining = "precipitation/@mode = 'yes'"
weather,city=New\ York,country=US clouds=1i,clouds_desc="clear sky",humidity=38i,is_it_raining=false,lat=40.7143,lon=-74.006,precipitation=0,temperature=58.15,wind_desc="Gentle Breeze",wind_speed=8.05 1617128228000000000
weather,city=London,country=GB clouds=0i,clouds_desc="clear sky",humidity=24i,is_it_raining=false,lat=51.5085,lon=-0.1257,precipitation=0,temperature=66.56,wind_desc="Light breeze",wind_speed=5.75 1617128914000000000
weather,city=Oakland,country=US clouds=90i,clouds_desc="overcast clouds",humidity=34i,is_it_raining=false,lat=42.6667,lon=-83.3999,precipitation=0,temperature=64.54,wind_desc="Moderate breeze",wind_speed=17.27 1617128758000000000
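The comparison used for is_it_raining is one of several XPath 1.0 operators that can appear in a field query. The sketch below shows a few more against the same weather document. Apart from is_it_raining, the field names and the threshold of 10 are invented for illustration, and it assumes the parser's XPath library evaluates standard XPath 1.0 comparison and arithmetic operators; check the library's list of supported features before relying on them.

```toml
[inputs.http.xml.fields]
  # Equality on a string value: Boolean field
  is_it_raining = "precipitation/@mode = 'yes'"
  # Numeric comparison: Boolean field (the threshold 10 is arbitrary)
  is_windy = "number(wind/speed/@value) > 10"
  # Arithmetic on two attributes: number field
  temperature_spread = "number(temperature/@max) - number(temperature/@min)"
```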
Multi-node selection example: COVID-19 Vaccine Distribution Allocations by Jurisdiction
Your XML data will commonly contain similar metrics for multiple sections (each section could be a different device; in this example, each section represents a different jurisdiction). You can use the XML Parser for multi-node selection to generate metrics for each chunk of data.
Considering this blog is being written during spring 2021, there is plenty of COVID-19 data out there. To stay somewhat optimistic, let’s take a look at some COVID-19 vaccine XML data provided by the Centers for Disease Control and Prevention (CDC). The CDC provides weekly allocation of vaccines by jurisdiction. There is an HTTP XML file for each vaccine manufacturer: Moderna, Pfizer or Janssen/Johnson & Johnson. Each vaccine has its own personality type too!
This COVID vaccine XML data will be a good example of how to do multi-node selection with the XML parser.
<response>
  <row>
    <row _id="row-vuan~mg8h_vwjk" _uuid="00000000-0000-0000-9614-D811B3DD0141" _position="0" _address="https://data.cdc.gov/resource/saz5-9hgg/row-vuan~mg8h_vwjk">
      <jurisdiction>Connecticut</jurisdiction>
      <week_of_allocations>2021-04-05T00:00:00</week_of_allocations>
      <_1st_dose_allocations>50310</_1st_dose_allocations>
      <_2nd_dose_allocations>50310</_2nd_dose_allocations>
    </row>
    <row _id="row-suay.uwx5_hiiz" _uuid="00000000-0000-0000-C448-E7F5D3B8E3CA" _position="0" _address="https://data.cdc.gov/resource/saz5-9hgg/row-suay.uwx5_hiiz">
      <jurisdiction>Maine</jurisdiction>
      <week_of_allocations>2021-04-05T00:00:00</week_of_allocations>
      <_1st_dose_allocations>19890</_1st_dose_allocations>
      <_2nd_dose_allocations>19890</_2nd_dose_allocations>
    </row>
    <row _id="row-dhdq_gsf8~rzrd" _uuid="00000000-0000-0000-6882-622E1430CDFA" _position="0" _address="https://data.cdc.gov/resource/saz5-9hgg/row-dhdq_gsf8~rzrd">
      <jurisdiction>Massachusetts</jurisdiction>
      <week_of_allocations>2021-04-05T00:00:00</week_of_allocations>
      <_1st_dose_allocations>95940</_1st_dose_allocations>
      <_2nd_dose_allocations>95940</_2nd_dose_allocations>
    </row>
    <row _id="row-jehx-8sxy_8dma" _uuid="00000000-0000-0000-56CD-DCA4760B56BC" _position="0" _address="https://data.cdc.gov/resource/saz5-9hgg/row-jehx-8sxy_8dma">
      <jurisdiction>New York</jurisdiction>
      <week_of_allocations>2021-04-05T00:00:00</week_of_allocations>
      <_1st_dose_allocations>153270</_1st_dose_allocations>
      <_2nd_dose_allocations>153270</_2nd_dose_allocations>
    </row>
    <row _id="row-chrx-6f37~qbn9" _uuid="00000000-0000-0000-30C3-4B8A23B1DF14" _position="0" _address="https://data.cdc.gov/resource/saz5-9hgg/row-chrx-6f37~qbn9">
      <jurisdiction>New York City</jurisdiction>
      <week_of_allocations>2021-04-05T00:00:00</week_of_allocations>
      <_1st_dose_allocations>117000</_1st_dose_allocations>
      <_2nd_dose_allocations>117000</_2nd_dose_allocations>
    </row>
  </row>
</response>
The snippet above is taken from the CDC COVID-19 Vaccine Distribution Allocations by Jurisdiction - Pfizer dataset.
This multi-node dataset doesn’t have many child values for us to configure, but it has many parent subsections. We will use week_of_allocations as our timestamp, jurisdiction as a tag, and _1st_dose_allocations and _2nd_dose_allocations as fields. Even though the Janssen/Johnson & Johnson data doesn’t contain _2nd_dose_allocations (one and done), we do not need a separate configuration for it; the parser just won’t emit a field for it.
I included processors.enum in my configuration. In the XML data itself there is no indicator besides the URL of which manufacturer the data belongs to. The enum processor I configured will add a tag with the manufacturer name for each corresponding URL.
[[inputs.http]]
  urls = [
    "https://data.cdc.gov/api/views/b7pe-5nws/rows.xml", # Moderna
    "https://data.cdc.gov/api/views/saz5-9hgg/rows.xml", # Pfizer
    "https://data.cdc.gov/api/views/w9zu-fywh/rows.xml"  # Janssen/Johnson & Johnson
  ]
  data_format = "xml"

  ## Drop hostname from list of tags
  tagexclude = ["host"]

  [[inputs.http.xml]]
    metric_selection = "//row"
    metric_name = "'cdc-vaccines'"
    timestamp = "week_of_allocations"
    timestamp_format = "2006-01-02T15:04:05"

    [inputs.http.xml.tags]
      state = "jurisdiction"

    [inputs.http.xml.fields_int]
      1st_dose_allocations = "_1st_dose_allocations"
      2nd_dose_allocations = "_2nd_dose_allocations"

[[processors.enum]]
  [[processors.enum.mapping]]
    ## Name of the tag to map. Globs accepted.
    tag = "url"

    ## Destination tag or field to be used for the mapped value. By default the
    ## source tag or field is used, overwriting the original value.
    dest = "vaccine_type"

    ## Table of mappings
    [processors.enum.mapping.value_mappings]
      "https://data.cdc.gov/api/views/b7pe-5nws/rows.xml" = "Moderna"
      "https://data.cdc.gov/api/views/saz5-9hgg/rows.xml" = "Pfizer"
      "https://data.cdc.gov/api/views/w9zu-fywh/rows.xml" = "Janssen"
Output (a snippet based on the sample of XML vaccine data above; the full configuration will produce a much larger output):
cdc-vaccines,state=Connecticut,url=https://data.cdc.gov/api/views/saz5-9hgg/rows.xml,vaccine_type=Pfizer 1st_dose_allocations=60840i,2nd_dose_allocations=60840i 1617580800000000000
cdc-vaccines,state=Maine,url=https://data.cdc.gov/api/views/saz5-9hgg/rows.xml,vaccine_type=Pfizer 1st_dose_allocations=23400i,2nd_dose_allocations=23400i 1617580800000000000
cdc-vaccines,state=Massachusetts,url=https://data.cdc.gov/api/views/saz5-9hgg/rows.xml,vaccine_type=Pfizer 1st_dose_allocations=117000i,2nd_dose_allocations=117000i 1617580800000000000
cdc-vaccines,state=New\ York,url=https://data.cdc.gov/api/views/saz5-9hgg/rows.xml,vaccine_type=Pfizer 1st_dose_allocations=188370i,2nd_dose_allocations=188370i 1617580800000000000
cdc-vaccines,state=New\ York\ City,url=https://data.cdc.gov/api/views/saz5-9hgg/rows.xml,vaccine_type=Pfizer 1st_dose_allocations=143910i,2nd_dose_allocations=143910i 1617580800000000000
Using field selectors for batch field processing (example: London bicycle data)
Your XML data will often contain metrics with so many fields that it would be tedious to configure each field in the [inputs.tail.xml.fields] subsection. Also, your XML data might generate fields that are unknown during configuration. In these situations, you can use field selectors to parse these metrics.
For our example, we’ll use the London cycle-for-hire data provided by Transport for London. The data contains the latest time the data was updated (lastUpdate), which we’ll use as our timestamp. The info nodes contain the bicycle station status information that we’ll use as our fields.
<stations lastUpdate="1617397861012" version="2.0">
</stations>
<response>
  <location id="1" name="River Street , Clerkenwell">
    <info terminalName="001023" />
    <info lat="51.52916347" />
    <info long="-0.109970527" />
    <info installDate="1278947280000" />
    <temporary>false</temporary>
    <info nbBikes="10" />
    <info nbEmptyDocks="9" />
    <info nbDocks="19" />
  </location>
  <location id="2" name="Phillimore Gardens, Kensington">
    <info terminalName="001018" />
    <info lat="51.49960695" />
    <info long="-0.197574246" />
    <info installDate="1278585780000" />
    <temporary>false</temporary>
    <info nbBikes="28" />
    <info nbEmptyDocks="9" />
    <info nbDocks="37" />
  </location>
  <location id="3" name="Christopher Street, Liverpool Street">
    <info terminalName="001012" />
    <info lat="51.52128377" />
    <info long="-0.084605692" />
    <info installDate="1278240360000" />
    <temporary>false</temporary>
    <info nbBikes="2" />
    <info nbEmptyDocks="30" />
    <info nbDocks="32" />
  </location>
</response>
In our configuration, we’ll still use the metric_selection option to select all location nodes. For each location we then use field_selection to select all info child nodes of the location as field-nodes. This field selection is relative to the selected nodes. For each selected field-node we will configure field_name and field_value to determine the field’s name and value, respectively. The field_name pulls the name of the first attribute of the node, while field_value pulls the value of the first attribute and converts the result to a number.
For our non-numerical fields, we can still use [inputs.tail.xml.fields] in conjunction with field_selection. We will still set the node temporary, which contains a string, to read in as a field. Also, note that my timestamp is outside my metric_selection, so I had to make sure the XPath query to pull lastUpdate was an absolute path prefixed with /.
[[inputs.tail]]
  files = ["/pathname/london-cycle-for-hire.xml"]
  data_format = "xml"

  [[inputs.tail.xml]]
    metric_selection = "response/child::location"
    metric_name = "string('bikes')"

    timestamp = "/stations/@lastUpdate"
    timestamp_format = "unix_ms"

    field_selection = "child::info"
    field_name = "name(@*)"
    field_value = "number(@*)"

    [inputs.tail.xml.tags]
      address = "@name"
      id = "@id"

    [inputs.tail.xml.fields]
      placement = "string(temporary)"
bikes,address=River\ Street\ \,\ Clerkenwell,host=MBP15-SWANG.local,id=1 installDate=1278947280000,lat=51.52916347,long=-0.109970527,nbBikes=10,nbDocks=19,nbEmptyDocks=9,placement="false",terminalName=1023 1617397861000000000
bikes,address=Phillimore\ Gardens\,\ Kensington,host=MBP15-SWANG.local,id=2 installDate=1278585780000,lat=51.49960695,long=-0.197574246,nbBikes=28,nbDocks=37,nbEmptyDocks=9,placement="false",terminalName=1018 1617397861000000000
bikes,address=Christopher\ Street\,\ Liverpool\ Street,host=MBP15-SWANG.local,id=3 installDate=1278240360000,lat=51.52128377,long=-0.084605692,nbBikes=2,nbDocks=32,nbEmptyDocks=30,placement="false",terminalName=1012 1617397861000000000
There is a folder of XML test cases in the Telegraf GitHub repository with more examples. If you think you have an example XML document + XML parser configuration that will be helpful to the community, please contribute a PR containing the documents.
Quick tips and other helpful resources
If you’re looking to do generic troubleshooting, be sure to set debug = true in your agent settings. For the *_selection settings, the parser will then walk up the nodes if the selection is empty and print how many children it found. This will help you see which part of the query could be causing the problem.
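For reference, a minimal sketch of where that setting lives. The agent table is the general settings section of a Telegraf config, and debug is a boolean there, so the value is written without quotes.

```toml
[agent]
  ## Log at debug level so the parser reports what the *_selection
  ## queries matched (or failed to match).
  debug = true
```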
An XPath tester like XPather or Code Beautify’s XPath Tester will be your best friend while configuring your XML parser to help you make sure you are selecting the proper XPath query for your data. It will make configuration a lot less frustrating when you can visibly see what nodes your XPath query is selecting.
A syntax point worth reiterating: when you are setting up an XPath query in your configuration, the specified path can be absolute (starting with /) or relative. This will be important to remember if you are querying a node outside of your metric selection. If you don’t include the starting /, you’d end up querying a node in your selected metrics that may not exist.
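Here is a sketch of how absolute and relative paths mix in one parser section, using the weather document. It selects only the wind element, which is not what the final weather configuration in this post does; the measurement name wind and the field names are made up for illustration.

```toml
[[inputs.http.xml]]
  ## Only <wind> is selected, so relative queries start from <wind>.
  metric_selection = "/current/wind"
  metric_name = "'wind'"

  ## <lastupdate> is a sibling of <wind>, i.e. outside the selection,
  ## so it has to be addressed with an absolute path.
  timestamp = "/current/lastupdate/@value"
  timestamp_format = "2006-01-02T15:04:05"

  [inputs.http.xml.tags]
    ## <city> is also outside <wind>: absolute path again.
    city = "/current/city/@name"

  [inputs.http.xml.fields]
    ## Relative paths: no "current/wind/" prefix needed.
    speed = "number(speed/@value)"
    gusts = "number(gusts/@value)"
```

A relative query such as lastupdate/@value in this section would look for a lastupdate child of wind, which does not exist.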
Lastly, something I kept running into when querying an attribute (ex: <coord lon="-83.3999" lat="42.6667"/>) is remembering to include the @. I would accidentally query the attribute without it (for example, current/city/coord/lon), which would return nothing, when the correct query is current/city/coord/@lon.
Here are some resources that will help you have a better understanding of the Telegraf XML Parser and XPath:
- XPath Golang library used for the Telegraf XML Parser
- Xpath cheatsheet
- The W3Schools XPath Tutorial
- XPath Tester and Evaluator
Incredibly massive shoutout to Sven Rebhan for building this plugin!
If you end up with any questions about parsing your XML data, please reach out to us (@Sven Rebhan if you’d like to chat with Sven specifically) in the #telegraf channel of our InfluxData Community Slack or post any questions on our Community Site.
Want to learn more about data acquisition through Telegraf? Register for free for InfluxDays EMEA to attend Jess Ingrassellino’s “Data Acquisition” talk covering Telegraf, CLI Integration to the cloud, and client libraries, on May 18, 2021.