Perform Distributed Tracing with Zipkin

This post, written by Tarun Telang, was originally published in The New Stack and is reposted here with permission.

Open source Zipkin offers a robust set of features that make it easier for developers to understand and optimize complex distributed systems.

Distributed tracing is a technique you can use to trace and monitor requests propagating through a distributed system. It can work in environments where multiple services process a request, making it an essential tool for modern microservices architectures.

Zipkin is an open source distributed tracing system for monitoring and troubleshooting complex systems. In this tutorial, we’ll show you how to set up and use Zipkin to trace issues and help troubleshoot common service problems.

Overview of Zipkin

While Twitter created Zipkin, the OpenZipkin community currently maintains it. Twitter designed it to be language-agnostic, and it supports a wide range of programming languages and frameworks, including C#, Go, Java, JavaScript, Ruby, Scala and PHP. It also integrates with other monitoring systems, such as Prometheus, InfluxDB and Grafana, to provide a comprehensive view of system performance.

Zipkin’s key features

Zipkin provides a robust set of features for distributed tracing that make it easier for developers to understand and optimize complex distributed systems. Below are some of its key features:

Distributed tracing

Zipkin allows developers to trace the path of a request as it passes through a system, making it easier to identify bottlenecks and errors.

Service graph visualization

Zipkin provides a service graph visualization that shows the dependencies between services in a distributed system, making it easier to understand how services interact.

Customizable sampling rate

Zipkin allows developers to customize the sampling rate of trace data, making it easier to balance the amount of data collected with the system’s performance.
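To make the idea of a sampling rate concrete, here is a minimal sketch of a per-trace keep-or-drop decision. It is not the sampler used by Zipkin or by any particular tracer library; the function name `should_sample` and the hashing approach are assumptions chosen for illustration. Because the decision is derived from the trace ID, every service that sees the same trace ID reaches the same decision.

```python
import hashlib


def should_sample(trace_id: str, rate: float) -> bool:
    """Decide whether to record a trace, given a sampling rate from 0.0 to 1.0."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    # Hash the trace ID into a stable 64-bit bucket so that the same
    # trace ID always yields the same decision.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(rate * 2**64)


# Hypothetical trace IDs, used only to show the effect of a 10% rate.
trace_ids = [f"{i:016x}" for i in range(10_000)]
kept = sum(should_sample(trace_id, 0.1) for trace_id in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

A lower rate reduces the volume of trace data sent and stored, at the cost of missing some requests; a rate of 1.0 records everything.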

Support for multiple languages and platforms

Zipkin supports a wide range of programming languages and platforms, making it easy to integrate into any distributed system.

Integration with other monitoring systems

Zipkin can be combined with other monitoring systems, such as Prometheus or Grafana, making it easier to analyze and troubleshoot system performance.

Annotations and tags

Zipkin supports using annotations and tags to provide additional context for each span in a trace, making it easier to identify and troubleshoot issues.
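As a rough illustration, the sketch below builds one span in the shape of Zipkin's v2 JSON format, with tags (key-value pairs describing the span) and annotations (timestamped events within it). The helper `make_span`, the IDs, the timestamps and the tag values are all hypothetical; in a real application an instrumentation library would normally produce this data for you.

```python
import json


def make_span(trace_id, span_id, name, service, start_us, duration_us,
              tags=None, annotations=None):
    """Build a span dictionary in the shape of Zipkin's v2 JSON format."""
    span = {
        "traceId": trace_id,
        "id": span_id,
        "name": name,
        "timestamp": start_us,    # start time, microseconds since the epoch
        "duration": duration_us,  # microseconds
        "localEndpoint": {"serviceName": service},
    }
    if tags:
        # Tags are key-value pairs that describe the whole span.
        span["tags"] = {key: str(value) for key, value in tags.items()}
    if annotations:
        # Annotations are timestamped events that happened during the span.
        span["annotations"] = [
            {"timestamp": ts, "value": value} for ts, value in annotations
        ]
    return span


start = 1_700_000_000_000_000  # hypothetical start time
span = make_span(
    "463ac35c9f6413ad", "a2fb4a1d1a96d312", "get /orders", "orders",
    start, 42_000,
    tags={"http.method": "GET", "http.status_code": 500, "error": "timeout"},
    annotations=[(start + 5_000, "cache.miss"), (start + 40_000, "retry")],
)
print(json.dumps(span, indent=2))
```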

Easy to deploy and configure

Zipkin is easy to deploy and configure, and it has a range of options for storage, transport and sampling.

Zipkin use cases

Developers most commonly use Zipkin to trace requests in complex distributed systems. In such a system, a single request can involve multiple services, each performing a different task and potentially running on a different server. Zipkin allows developers to trace the path of a request as it travels through these services and to monitor how long each service takes to complete its task. This information is valuable for identifying and troubleshooting performance bottlenecks or errors, understanding the flow of requests through the system and optimizing the system accordingly.

Troubleshooting and debugging

When something goes wrong in a distributed system, it can be difficult to identify where the problem originated. Zipkin helps developers identify where requests are being slowed down and where errors are occurring, making it easier to troubleshoot and debug issues.

Performance monitoring

Zipkin provides detailed metrics about the amount of time each service takes to process a request, making it possible for you to identify performance bottlenecks and optimize the system accordingly.
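The kind of per-service timing described here can be approximated from raw span data. The following sketch groups a handful of hypothetical spans by service name and reports the count, total and slowest duration for each. The span values are invented, and the function `duration_by_service` is illustrative only. Note that a span's duration includes time spent waiting on its child spans, so totals across services overlap.

```python
from collections import defaultdict

# Hypothetical spans from one trace; durations are in microseconds.
spans = [
    {"name": "get /checkout", "duration": 120_000, "localEndpoint": {"serviceName": "frontend"}},
    {"name": "get /orders", "duration": 80_000, "localEndpoint": {"serviceName": "orders"}},
    {"name": "select orders", "duration": 65_000, "localEndpoint": {"serviceName": "orders"}},
    {"name": "post /payment", "duration": 25_000, "localEndpoint": {"serviceName": "payments"}},
]


def duration_by_service(spans):
    """Aggregate span count, total time and slowest span per service (in ms)."""
    stats = defaultdict(lambda: {"count": 0, "total_ms": 0.0, "max_ms": 0.0})
    for span in spans:
        service = span.get("localEndpoint", {}).get("serviceName", "unknown")
        duration_ms = span.get("duration", 0) / 1000
        entry = stats[service]
        entry["count"] += 1
        entry["total_ms"] += duration_ms
        entry["max_ms"] = max(entry["max_ms"], duration_ms)
    return dict(stats)


# List services with their slowest span first.
ranked = sorted(duration_by_service(spans).items(), key=lambda item: -item[1]["max_ms"])
for service, entry in ranked:
    print(f"{service}: {entry['count']} spans, slowest {entry['max_ms']:.1f} ms")
```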

Dependency analysis

Zipkin allows developers to visualize the dependencies between services in a distributed system. This information helps you understand how different services are interacting with each other and can help you identify where changes need to be made.
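To show where such a dependency picture can come from, the sketch below derives service-to-service links from parent-child relationships between spans. This is a simplified illustration, not Zipkin's own dependency-linking logic; the `service` field is a stand-in for a real span's `localEndpoint.serviceName`, and the span data is hypothetical.

```python
from collections import Counter

# Hypothetical spans; "service" is a simplified stand-in for the
# localEndpoint.serviceName field of a real Zipkin span.
spans = [
    {"id": "1", "parentId": None, "service": "frontend"},
    {"id": "2", "parentId": "1", "service": "orders"},
    {"id": "3", "parentId": "2", "service": "orders-db"},
    {"id": "4", "parentId": "2", "service": "orders-db"},
    {"id": "5", "parentId": "1", "service": "payments"},
    {"id": "6", "parentId": "2", "service": "orders"},  # internal call, same service
]


def dependency_links(spans):
    """Count calls between distinct services based on parent-child spans."""
    service_of = {span["id"]: span["service"] for span in spans}
    links = Counter()
    for span in spans:
        parent_service = service_of.get(span["parentId"])
        # Only count edges that cross a service boundary.
        if parent_service and parent_service != span["service"]:
            links[(parent_service, span["service"])] += 1
    return links


for (parent, child), calls in sorted(dependency_links(spans).items()):
    print(f"{parent} -> {child}: {calls} call(s)")
```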

Capacity planning

By analyzing the performance metrics that Zipkin provides, developers can identify underutilized or overutilized areas of the system. This information is valuable for capacity planning and resource allocation.

Installing Zipkin

Below are step-by-step instructions for installing Zipkin:

Step 1: Install Java

Zipkin is a Java-based application, so you need to have Java installed on your system. Before installing, check to see if you already have it installed. To do this, open a terminal and run the following command:

java -version

If Java is already installed on your system, you should see something similar to the following:

java version "17.0.1" 2021-10-19 LTS
Java(TM) SE Runtime Environment (build 17.0.1+12-LTS-jvmci-21.3-b05)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.1+12-LTS-jvmci-21.3-b05, mixed mode, sharing)

If you see something like the above, Java is already installed, and you can skip to the next step. Otherwise, you can download the latest version of Java from the official website and install it.

Step 2: Download Zipkin

Once Java is installed, you can download the Zipkin executable jar file from the Maven Central repository.

Step 3: Start the Zipkin Server

Once you’ve downloaded Zipkin, copy the jar file to the directory where you want to install it. Then start the Zipkin server by running the following command from that directory in your terminal or command prompt:

java -jar zipkin-server-<version>-exec.jar

Here, “version” is the version of the Zipkin jar. This will start the Zipkin server on your local machine. Below is the output:

Figure: Command-line output on a successful execution of a Zipkin server

Step 4: Access the Zipkin UI

Once the Zipkin server is running, you can access the Zipkin UI by opening your web browser and navigating to http://localhost:9411/zipkin/.

Figure: The Zipkin UI

This web-based user interface displays the traces collected from instrumented applications. A trace is a collection of spans, where each span represents a unit of work that has occurred in the system. For example, a span could represent a database query, an HTTP request or a method call. Spans are connected to form a trace, which represents the complete path of a request as it travels through the system.
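The relationship between spans and a trace can be sketched in a few lines. The example below takes a flat list of hypothetical spans, links them through their parent IDs and prints the resulting tree, which is roughly the structure the trace view visualizes. The field names are simplified and the function `render_trace` is illustrative only.

```python
# Hypothetical spans from a single trace; timestamps and durations
# are in microseconds, and "service" is a simplified field name.
spans = [
    {"id": "1", "parentId": None, "name": "get /checkout", "service": "frontend",
     "timestamp": 0, "duration": 120_000},
    {"id": "2", "parentId": "1", "name": "get /orders", "service": "orders",
     "timestamp": 5_000, "duration": 80_000},
    {"id": "3", "parentId": "2", "name": "select orders", "service": "orders-db",
     "timestamp": 10_000, "duration": 65_000},
    {"id": "4", "parentId": "1", "name": "post /payment", "service": "payments",
     "timestamp": 90_000, "duration": 25_000},
]


def render_trace(spans):
    """Return the trace as indented lines, one per span, in start-time order."""
    children = {}
    for span in spans:
        children.setdefault(span.get("parentId"), []).append(span)

    lines = []

    def walk(parent_id, depth):
        for span in sorted(children.get(parent_id, []), key=lambda s: s["timestamp"]):
            duration_ms = span["duration"] / 1000
            lines.append(f"{'  ' * depth}{span['name']} [{span['service']}] {duration_ms:.1f} ms")
            walk(span["id"], depth + 1)

    walk(None, 0)  # the root span has no parent
    return lines


lines = render_trace(spans)
print("\n".join(lines))
```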

Step 5: Instrument your application

To trace requests in your application, you need to add Zipkin instrumentation to your code. Zipkin provides libraries for various programming languages that make it easy to trace requests. You can find the complete list of tracers and instrumentation libraries on Zipkin’s Tracers and Instrumentation page. Once the applications are instrumented, they can send trace data to the Zipkin server, which collects, stores and queries the trace data.
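In practice you would rely on one of these libraries, but it can help to see what they do under the hood. A core task is propagating trace context between services, commonly through B3 HTTP headers. The sketch below is a simplified illustration of that idea; the helper names are hypothetical, it samples every new trace, and it treats header names as case-sensitive dictionary keys, which real HTTP handling does not.

```python
import secrets


def new_id() -> str:
    """Return a random 64-bit ID as 16 lowercase hex characters."""
    return secrets.token_hex(8)


def outgoing_headers(incoming: dict) -> dict:
    """Build B3 headers for a downstream call made while handling a request."""
    trace_id = incoming.get("X-B3-TraceId")
    if trace_id is None:
        # No trace context arrived, so this service starts a new trace.
        # Assumption for this sketch: every new trace is sampled.
        return {"X-B3-TraceId": new_id(), "X-B3-SpanId": new_id(), "X-B3-Sampled": "1"}

    headers = {
        "X-B3-TraceId": trace_id,  # stay in the same trace
        "X-B3-SpanId": new_id(),   # new span for the downstream call
    }
    if "X-B3-SpanId" in incoming:
        # The caller's span becomes the parent of the new span.
        headers["X-B3-ParentSpanId"] = incoming["X-B3-SpanId"]
    if "X-B3-Sampled" in incoming:
        # Forward the sampling decision unchanged.
        headers["X-B3-Sampled"] = incoming["X-B3-Sampled"]
    return headers


root = outgoing_headers({})     # first service in the chain
child = outgoing_headers(root)  # next service, continuing the trace
print(root)
print(child)
```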

Step 6: Configure your application

Next, you need to configure your instrumented application to send trace data to the Zipkin server. You can do this by setting the endpoint URL of the Zipkin server in the application’s configuration file.
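What "sending trace data to the endpoint" amounts to can be sketched with the Python standard library. The example below prepares an HTTP POST of JSON-encoded spans to the server's v2 span collection path. The endpoint URL assumes the local server from the earlier steps, the span values are hypothetical, and the function names are illustrative; a tracer library would normally handle this reporting for you.

```python
import json
import urllib.request

# Assumption: Zipkin is running locally on its default port.
ZIPKIN_ENDPOINT = "http://localhost:9411/api/v2/spans"


def build_report_request(spans, endpoint=ZIPKIN_ENDPOINT):
    """Prepare (but do not send) a POST request carrying a list of spans."""
    body = json.dumps(spans).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def report(spans, endpoint=ZIPKIN_ENDPOINT):
    """Send the spans and return the HTTP status code of the response."""
    with urllib.request.urlopen(build_report_request(spans, endpoint), timeout=5) as response:
        return response.status


# Hypothetical span, shown only to illustrate the payload shape.
example_spans = [{
    "traceId": "463ac35c9f6413ad",
    "id": "a2fb4a1d1a96d312",
    "name": "get /orders",
    "timestamp": 1_700_000_000_000_000,
    "duration": 42_000,
    "localEndpoint": {"serviceName": "orders"},
}]
request = build_report_request(example_spans)
print(request.get_method(), request.full_url)
```

With a server running, calling `report(example_spans)` would send the span; the block above only builds the request so that it can be inspected without a network connection.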

Step 7: Generate and view traces

Once your applications are instrumented and configured, you can start sending requests through the system. Zipkin will collect the trace data and display it in a trace view. You can use this view to analyze the flow of requests through the system and identify performance bottlenecks and errors.

Step 8: Analyze and troubleshoot

Use the trace view to analyze your system’s performance and troubleshoot issues. You can use filters and annotations to narrow down the scope of your analysis. You can also search for a trace using the search bar in the Zipkin UI.
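The same kind of search can also be issued against the server's HTTP query API. As a hedged sketch, the helper below builds a trace search URL with a few of the query parameters of the v2 API (`serviceName`, `spanName`, `minDuration`, `lookback` and `limit`). The function name, the default base URL and the example values are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_trace_query(service_name, *, span_name=None, min_duration_us=None,
                      lookback_ms=None, limit=10,
                      base_url="http://localhost:9411"):
    """Build a URL that searches for traces of one service."""
    params = {"serviceName": service_name, "limit": limit}
    if span_name is not None:
        params["spanName"] = span_name
    if min_duration_us is not None:
        params["minDuration"] = min_duration_us  # microseconds
    if lookback_ms is not None:
        params["lookback"] = lookback_ms         # milliseconds
    return f"{base_url.rstrip('/')}/api/v2/traces?{urlencode(params)}"


# Traces for the hypothetical "frontend" service that took at least
# 100 ms, looking back over the last hour.
url = build_trace_query("frontend", min_duration_us=100_000, lookback_ms=3_600_000)
print(url)
```

Fetching the resulting URL from a running server returns the matching traces as JSON, which can be handy for scripted checks.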

The dependency view in Zipkin provides a visual representation of the dependencies between services in a distributed system. You can use the dependency view to identify services that are critical to the system’s performance. In addition, you can isolate services that are overutilized or underutilized and services that you may need to optimize or replace.

Storing Zipkin tracing data at scale

By default, Zipkin stores collected data in memory and does not persist tracing data long-term. Luckily, Zipkin is designed with a component-based architecture that makes it easy to use a number of different databases or data warehouses to store your data depending on your use case. Zipkin’s storage layer is abstracted through a simple interface, which supports plug-and-play functionality for different storage backends like InfluxDB, Elasticsearch, Cassandra and more.
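Switching storage backends is typically a matter of configuration when the server starts. As a sketch, the helper below prepares the command and environment for launching the Zipkin server with Elasticsearch storage, using the `STORAGE_TYPE` and `ES_HOSTS` environment variables. The function name, the jar path and the Elasticsearch address are placeholders for illustration; check the Zipkin server documentation for the settings that match your backend and version.

```python
import os


def zipkin_launch_config(jar_path, es_hosts):
    """Return the command and environment for starting Zipkin with Elasticsearch."""
    env = dict(os.environ)  # start from the current environment
    env["STORAGE_TYPE"] = "elasticsearch"
    env["ES_HOSTS"] = es_hosts
    command = ["java", "-jar", jar_path]
    return command, env


# Placeholder jar path and Elasticsearch address.
command, env = zipkin_launch_config("zipkin.jar", "http://localhost:9200")
print(" ".join(command))
print("STORAGE_TYPE =", env["STORAGE_TYPE"])
print("ES_HOSTS =", env["ES_HOSTS"])
```

Passing the result to `subprocess.run(command, env=env)` would start the server with that configuration.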

Zipkin also integrates with OpenTelemetry, which further enhances what you can do by opening up additional ways to use not only traces but other types of observability data generated by your application. When using OpenTelemetry with Zipkin, developers can leverage the extensive instrumentation libraries provided by OpenTelemetry, while also benefiting from the powerful visualization and analysis capabilities of Zipkin.

InfluxDB is a particularly good choice as a storage backend for Zipkin. As a time series database that uses column-oriented storage, it allows you to store all types of observability data in a single place, which reduces the complexity of your architecture. To send your Zipkin data to InfluxDB, you can use OpenTelemetry, Telegraf or your own custom solution via InfluxDB’s API.

Conclusion

Zipkin is a valuable tool for developers working with microservices architectures and distributed systems. By using Zipkin, developers can gain a comprehensive understanding of their system’s performance. As a result, it helps them make data-driven decisions to improve efficiency and reliability.
