Apache Zipkin Monitoring

Apache Zipkin is a distributed tracing system that helps gather timing data needed to troubleshoot latency problems in microservice architectures. It manages both the collection and lookup of this data and is based on the Google Dapper paper.

The Zipkin user interface presents a dependency diagram that shows how many traced requests went through each application, and when. This helps identify aggregate behavior, such as error paths or calls to deprecated services. Zipkin summarizes some of this data for you, and you can also search by attributes such as service, operation name, tags, and the duration of the event you want to examine.

Why use a Telegraf plugin for Apache Zipkin?

The Apache Zipkin Telegraf plugin collects the traces and time-stamped data needed to troubleshoot latency problems in microservice architectures. It receives spans sent by tracing clients in the Zipkin data format, converts them to Telegraf's internal data format, and writes this data into an InfluxDB instance. You can then query this data in InfluxDB to understand the behavior of your microservices.

How to collect your traces using the Apache Zipkin Telegraf plugin

The Apache Zipkin Telegraf plugin has a simple configuration that defines the URL path and port on which it listens for span data, and it accepts spans encoded as JSON or Thrift. The plugin uses tags and fields to track data from the spans.

  • Trace: a set of spans that share a single root span. Traces are built by collecting all spans that share a traceId.
  • Span: a set of Annotations and BinaryAnnotations that correspond to a particular RPC.
  • Annotation: records an occurrence in time, such as the beginning or end of a request. The plugin outputs one metric for each annotation and binary annotation of a span.

Annotations may have the following values:

  • CS (client start): the beginning of a span, when a request is made.
  • SR (server receive): the server has received the request and will start processing it. Network latency and clock jitter account for the difference between this and client start (CS).
  • SS (server send): the server has finished processing and sends the response back to the client. The difference between this and server receive (SR) is the time it took to process the request.
  • CR (client receive): the end of span; the client receives the response from the server, and the RPC is considered complete with this annotation.
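
As a rough illustration of how the four annotations relate, the sketch below derives a few intervals from their timestamps. The function name `breakdown` and the sample timestamps (in microseconds) are hypothetical and not part of the plugin.

```python
def breakdown(cs, sr, ss, cr):
    """Split one RPC into intervals from its four core annotation timestamps.

    All arguments are timestamps in microseconds (hypothetical sample values).
    """
    total = cr - cs           # full round trip, measured on the client clock
    server = ss - sr          # processing time, measured on the server clock
    network = total - server  # time spent outside the server (both network legs)
    return {"total_us": total, "server_us": server, "network_us": network}


sample = breakdown(cs=1_000, sr=1_150, ss=1_650, cr=1_800)
print(sample)
```

Each subtraction uses two timestamps from the same machine, so the result does not depend on the client and server clocks agreeing with each other.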

Note that as of 2020, the Apache Zipkin Telegraf plugin is still considered experimental. Its data schema and other properties may change in the future, based on its main use cases and on the evolution of the OpenTracing standard.

Having said that, configuring the Apache Zipkin Telegraf plugin is straightforward. If an incoming request does not set the "Content-Type" header, the plugin assumes the spans are in JSON format. Replace the default values in the following configuration with ones relevant to your deployment:

[[inputs.zipkin]]
    path = "/api/v1/spans" # URL path for span data
    port = 9411 # Port on which Telegraf listens
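
To try the listener out, a tracing client can POST a JSON array of spans to the configured path. The following is a minimal sketch, assuming Telegraf runs on localhost with the path and port shown above and that spans use the Zipkin v1 JSON field names. The helper names, service name, IDs, and endpoint values are hypothetical. The code only builds the request; nothing is sent until you call `urlopen`.

```python
import json
import time
import urllib.request

# Assumption: Telegraf runs locally with the path and port from the config above.
URL = "http://localhost:9411/api/v1/spans"


def make_span(service, name, trace_id, span_id, start_us, duration_us):
    """Build one span dict using Zipkin v1 JSON field names (times in microseconds)."""
    endpoint = {"serviceName": service, "ipv4": "127.0.0.1", "port": 8080}
    return {
        "traceId": trace_id,
        "id": span_id,
        "name": name,
        "timestamp": start_us,
        "duration": duration_us,
        "annotations": [
            {"timestamp": start_us, "value": "sr", "endpoint": endpoint},
            {"timestamp": start_us + duration_us, "value": "ss", "endpoint": endpoint},
        ],
        "binaryAnnotations": [
            {"key": "http.path", "value": "/checkout", "endpoint": endpoint},
        ],
    }


def build_request(spans, url=URL):
    """Wrap a list of spans in a POST request with an explicit JSON Content-Type."""
    body = json.dumps(spans).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


now_us = int(time.time() * 1_000_000)
span = make_span("checkout", "get /checkout", "5982fe77008310cc",
                 "bd7a977555f6b982", now_us, 25_000)
request = build_request([span])
# To actually send the span: urllib.request.urlopen(request)
```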

What do traces look like when you collect them with the Apache Zipkin Telegraf plugin?

The following shows the structure of your traces in the InfluxDB data format:

Tags

  • "id": The 64-bit ID of the span.
  • "parent_id": The ID of the span's parent. If the span has no parent, the parent ID is set to the span's own ID.
  • "trace_id": The 64-bit or 128-bit ID of a particular trace. Every span in a trace shares this ID. For a 128-bit ID, the high and low halves are concatenated and converted to hexadecimal.
  • "name": The name of the span.
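
The "trace_id" description is easier to follow with numbers. The sketch below assumes a 128-bit ID arrives as two unsigned 64-bit halves. The helper name `trace_id_hex` is hypothetical, and zero-padding the low half to 16 hex digits is an assumption that may not match the plugin's output exactly.

```python
def trace_id_hex(low, high=None):
    """Render a trace ID as hexadecimal from its 64-bit halves."""
    if high is None:
        # 64-bit trace ID: only the low half exists.
        return format(low, "x")
    # 128-bit trace ID: high half first, low half padded to 16 hex digits (assumed).
    return format(high, "x") + format(low, "016x")


print(trace_id_hex(0x5982FE77008310CC))
print(trace_id_hex(0x2, high=0x1))
```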

Annotations have these additional tags:

  • "service_name": The name of the service.
  • "annotation": The value of the annotation.
  • "endpoint_host": The IPv4 address of the endpoint joined with its listening port. If no port is present, only the address is used.

Binary Annotations have these additional tags:

  • "service_name": The name of the service.
  • "annotation": The value of the annotation.
  • "endpoint_host": The IPv4 address of the endpoint joined with its listening port. If no port is present, only the address is used.
  • "annotation_key": A label describing the annotation.

Fields

  • "duration_ns": The duration of the span in nanoseconds, that is, the time between its beginning and its end.
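
To make the tag and field layout concrete, the sketch below formats one annotation as an InfluxDB line protocol string. The helper names and sample values are hypothetical, and the measurement name "zipkin" is an assumption; check the plugin documentation for the exact output.

```python
def escape_tag(text):
    """Backslash-escape the characters line protocol treats as special in tags."""
    for ch in ",= ":
        text = text.replace(ch, "\\" + ch)
    return text


def endpoint_host(ipv4, port=None):
    """Join address and port; fall back to the bare address when no port is given."""
    return f"{ipv4}:{port}" if port else ipv4


def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point; field values are written as integers (trailing 'i')."""
    tag_part = ",".join(
        f"{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{k}={v}i" for k, v in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"


tags = {
    "id": "bd7a977555f6b982",
    "parent_id": "bd7a977555f6b982",  # no parent, so set to the span's own ID
    "trace_id": "5982fe77008310cc",
    "name": "get /checkout",
    "service_name": "checkout",
    "annotation": "sr",
    "endpoint_host": endpoint_host("127.0.0.1", 8080),
}
# "zipkin" as the measurement name is an assumption for illustration.
line = to_line("zipkin", tags, {"duration_ns": 25_000_000}, 1_600_000_000_000_000_000)
print(line)
```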

For more information, please check out the documentation.