Getting Started with OpenTelemetry for Observability
Charles Mahler
Jun 21, 2022
This article was published in The New Stack.
For most developers, software development means there is an API for almost everything, hardware is provisioned via the cloud and the core focus is on building only the features most crucial to your business.
Of course, all these integrations and modern distributed architectures create their own set of problems. Having full insight into your application has become even more important, and the ability to gain that insight is now commonly known as observability. Collecting the data required for this insight is itself a challenge, and as a result, a number of major tech companies have worked together to create a standardized framework that simplifies the collection of telemetry data through a project called OpenTelemetry.
What is OpenTelemetry?
Per the OpenTelemetry specification, OpenTelemetry is a collection of tools, APIs and SDKs used to generate and collect telemetry data, including metrics, logs and traces, to help analyze your software’s performance and behavior. OpenTelemetry resulted from the merger of the OpenCensus and OpenTracing projects and is an incubating project with the Cloud Native Computing Foundation (CNCF).
The purpose of OpenTelemetry is to simplify the collection and management of telemetry data to enable developers to adopt observability best practices. OpenTelemetry has support from some of the biggest companies in the tech industry, with active contributions from Microsoft, Google, Amazon, Red Hat, Cisco and many others.
What are the benefits of OpenTelemetry?
So why exactly are so many companies adopting OpenTelemetry? The benefits vary slightly depending on whether you use observability tools or are a vendor, but overall, the result is a better software ecosystem for everybody.
The biggest benefit of OpenTelemetry is that it provides a standard vendor-agnostic interface. For users, this means that you don’t have to worry about being locked in to an observability tool because migrating would require a bunch of code changes. For vendors, this means that the best service will win because there is less of a moat for newcomers to overcome. As long as vendors support OpenTelemetry (often abbreviated OTEL), they can acquire new users. This will drive innovation and improvement in the observability space. OpenTelemetry will even create competition in the open source ecosystem. Many libraries and frameworks are already embedding OTEL support into their projects so developers can get telemetry data out of the box.
Another benefit is that OTEL supports the three main types of telemetry data in the form of metrics, traces and logs. This saves developers time because they don’t need to use different tools or libraries to collect this data; they can just use OTEL. Features like auto-instrumentation also make it possible for OpenTelemetry to be added to an application without even needing to modify the codebase in many cases.
Flexibility and ease of use are probably the biggest advantages of OpenTelemetry. The project has been designed from the ground up to work with cloud native applications and modern architectures. No matter what a developer is doing, there is a way to integrate OpenTelemetry. Making the collection of telemetry data easier long term will result in higher-quality software for everyone.
Multiple data type support
OpenTelemetry aims to support the three main types of telemetry data: metrics, logs and traces. The specifications for metrics and traces are currently stable. The logging spec is still considered experimental but is expected to be finalized sometime in 2022.
APIs and SDKs
The OpenTelemetry APIs and SDKs are the programming-language-specific portion of the project. The APIs are used to instrument your application’s code and to generate telemetry data. SDKs act as a bridge between the API and exporting your telemetry data to its destination. SDKs can be used to filter or sample data prior to being exported. SDKs also allow data to be exported to multiple destinations, so different data types like metrics or traces can be sent to more specialized tools.
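To make the split between instrumentation and export more concrete, here is a toy model in plain Python. It does not use the real OpenTelemetry packages; `Span`, `ListExporter` and `ToySdk` are hypothetical names invented for this sketch. The application code only describes what happened, while the SDK-like layer decides which spans to keep (sampling) and fans them out to every configured destination.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """A minimal stand-in for one unit of trace data."""
    name: str
    attributes: dict = field(default_factory=dict)


class ListExporter:
    """Hypothetical exporter that 'sends' spans by appending them to a list."""

    def __init__(self) -> None:
        self.received: List[Span] = []

    def export(self, span: Span) -> None:
        self.received.append(span)


class ToySdk:
    """Sits between instrumentation calls and exporters.

    It samples spans and forwards each kept span to every exporter.
    """

    def __init__(self, exporters, sample_ratio: float, rng: random.Random) -> None:
        self.exporters = exporters
        self.sample_ratio = sample_ratio
        self.rng = rng

    def record(self, span: Span) -> None:
        # Probabilistic sampling: keep roughly sample_ratio of all spans.
        if self.rng.random() >= self.sample_ratio:
            return
        for exporter in self.exporters:
            exporter.export(span)


def handle_request(sdk: ToySdk, path: str) -> None:
    """Application code: it only describes what happened (the 'API' side)."""
    sdk.record(Span(name="handle_request", attributes={"http.target": path}))


# Two destinations, e.g. one general backend and one specialized tool.
backend_a = ListExporter()
backend_b = ListExporter()
sdk = ToySdk([backend_a, backend_b], sample_ratio=0.5, rng=random.Random(42))

for i in range(1000):
    handle_request(sdk, f"/items/{i}")

print(len(backend_a.received), len(backend_b.received))
```

Because sampling and export live in the SDK-like layer, changing the sample ratio or adding a destination does not touch `handle_request`. The real SDKs follow the same general idea, with far more configuration options.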
The OpenTelemetry Collector can be used as an alternative to direct in-process exporters available for different backends. The OTEL Collector is completely vendor- and language-agnostic. This means you can send data to the Collector from any of the language SDKs and export the data to any supported backend without modifying any application code, and you don’t have to import additional packages into your application.
The OTEL Collector consists of three main components: receivers, processors and exporters. These components can be thought of as a pipeline, and they can be configured using YAML files. The Collector has two deployment options. It can be deployed as either an agent that is run on every app instance as a sidecar proxy or binary, or as a gateway, which is a stand-alone service that receives telemetry data from multiple app instances.
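As a rough illustration of the pipeline idea, the sketch below mirrors the top-level layout of a Collector YAML file as a Python dictionary and checks that every component a pipeline references is actually defined. The component names (`otlp`, `batch`) and the endpoint are illustrative assumptions, and `undefined_components` is a hypothetical helper written for this example, not part of the Collector. Check the Collector documentation for the components available in your version.

```python
import json

# Mirrors the top-level layout of a Collector YAML file as a Python dict.
# Component names and the endpoint are illustrative assumptions.
collector_config = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "processors": {"batch": {}},
    "exporters": {"otlp": {"endpoint": "backend.example.com:4317"}},
    "service": {
        "pipelines": {
            # One pipeline per signal type: data flows from receivers,
            # through processors, to exporters.
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["otlp"],
            }
        }
    },
}


def undefined_components(config: dict) -> list:
    """Return (pipeline, section, name) for every pipeline reference
    that has no matching definition in the top-level sections."""
    problems = []
    for pipeline_name, pipeline in config["service"]["pipelines"].items():
        for section in ("receivers", "processors", "exporters"):
            for component in pipeline.get(section, []):
                if component not in config.get(section, {}):
                    problems.append((pipeline_name, section, component))
    return problems


print(json.dumps(collector_config, indent=2))
print("undefined components:", undefined_components(collector_config))
```

The same configuration shape applies whether the Collector runs as an agent next to each app instance or as a stand-alone gateway; only the deployment differs.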
Instrumentation is the process of wiring up your application so it is actually generating telemetry data. OpenTelemetry provides language-specific implementations for 11 popular programming languages. These libraries can be imported into your application, and you write the code to instrument your app.
Auto-instrumentation is another option. These are community- or vendor-provided tools that allow you to export OTEL telemetry data without having to make any manual code changes. Many of the most popular frameworks and libraries for each programming language already support auto-instrumentation. Using Python as an example, Flask and Django both have auto-instrumentation packages available.
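One way to picture how telemetry can appear without touching application code is runtime wrapping: a library function is replaced at startup with a version that records timing around the original call. The sketch below shows that general technique in plain Python. It is a conceptual illustration only, not how any particular OpenTelemetry instrumentation package is implemented, and `fake_http_lib`, `instrument` and `recorded_spans` are hypothetical names.

```python
import functools
import time
import types

# A stand-in for a third-party library the application already calls.
fake_http_lib = types.SimpleNamespace()


def _get(url: str) -> str:
    return f"response from {url}"


fake_http_lib.get = _get

recorded_spans = []


def instrument(module, func_name: str) -> None:
    """Replace module.func_name with a wrapper that records a timing span."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record the span even if the original call raises.
            recorded_spans.append({
                "name": func_name,
                "duration_s": time.perf_counter() - start,
            })

    setattr(module, func_name, wrapper)


def application_code() -> str:
    # Unchanged application code: it just calls the library.
    return fake_http_lib.get("https://example.com/health")


instrument(fake_http_lib, "get")  # done once at startup, outside the app code
result = application_code()
print(result, recorded_spans)
```

The application function never mentions telemetry, yet each call now leaves a timing record behind.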
How to Use Your OpenTelemetry Data
So what should you do with this data once you’ve collected it? Let’s look at a few use cases and tools you can use with OpenTelemetry to get value out of your data.
Storing your data
The first thing you need to consider is where you are going to export and store your telemetry data. The data store you choose will depend on the volume of data you will be storing, how long you will store it and how frequently you will be querying that data. If you are only storing a relatively small amount of data, you could use a more general-purpose solution. On the other hand, it might make sense to use more specialized solutions if you are storing large amounts of data and have a high number of queries on that data.
Some options to consider for storing your telemetry data:
- Search database: These data stores are designed for text searches and are useful for analyzing logs. Elasticsearch or Solr would be open source examples of a search database. An in-memory database like Redis with the RediSearch module is also an option, if it fits your architecture.
- Time-series database: Time-series databases are designed to handle high volumes of incoming writes and to query data across ranges of time. These range queries are common when working with metrics and are difficult for other databases to execute efficiently. Time-series databases also work well for log and tracing data.
- Multiple database combinations: There’s no reason to limit yourself to a single storage option. Using a graph database might not be your first thought for working with telemetry data, but you can find valuable insights and relationships by pulling data from your primary data store and then analyzing it using tools provided by a graph database. You could do the same by using a key-value database in combination with any number of the above storage options to provide new features to users or better analyze your data.
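To show the kind of time-range query that metrics workloads rely on, here is a small in-memory sketch in plain Python. The sample data is made up, and `mean_per_minute` is a hypothetical helper; a real time-series database would run the equivalent query over far larger volumes with purpose-built storage.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

start = datetime(2022, 6, 21, 12, 0, tzinfo=timezone.utc)
# Hypothetical latency samples: one point every 20 seconds for 5 minutes.
points = [(start + timedelta(seconds=20 * i), 100.0 + i) for i in range(15)]


def mean_per_minute(samples, range_start, range_end):
    """Average the values in [range_start, range_end), grouped by minute."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        if range_start <= timestamp < range_end:
            buckets[timestamp.replace(second=0, microsecond=0)].append(value)
    return {minute: sum(vals) / len(vals) for minute, vals in sorted(buckets.items())}


# Query only the second and third minutes of the data.
window = mean_per_minute(points, start + timedelta(minutes=1), start + timedelta(minutes=3))
for minute, mean in window.items():
    print(minute.isoformat(), mean)
```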
Analyzing your data
Once you have determined how you will store your data, you can begin to think about how you will analyze your data to get insights from it. The first step will probably be related to visualizing your data and creating dashboards. For this, you can use something like Grafana or build a custom UI using your preferred data visualization libraries.
The next step is often adding some form of monitoring and alerts to notify engineers when something is going wrong. By creating alerts and automated actions, you can begin to mitigate outages and other things that affect user experience.
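A simple alert rule can be sketched as a rolling window over recent request outcomes. The example below is a minimal illustration in plain Python, assuming a fixed window size and threshold; `ErrorRateAlert` is a hypothetical class, and production alerting tools offer much richer rules.

```python
from collections import deque


class ErrorRateAlert:
    """Fires when the error rate over the last `window` requests exceeds `threshold`."""

    def __init__(self, window: int, threshold: float) -> None:
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, is_error: bool) -> bool:
        """Record one request outcome; return True if the alert should fire."""
        self.outcomes.append(is_error)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data yet to judge
        return sum(self.outcomes) / len(self.outcomes) > self.threshold


alert = ErrorRateAlert(window=10, threshold=0.2)
# Hypothetical traffic: 10 healthy requests followed by a burst of 5 errors.
traffic = [False] * 10 + [True] * 5
fired_at = [i for i, is_error in enumerate(traffic) if alert.observe(is_error)]
print(fired_at)
```

Waiting for a full window before firing avoids alerting on the first failed request after a restart, at the cost of a short delay before the rule becomes active.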
The final step would be to not just react to issues, but to actually begin to understand your application well enough to be able to take action before things go wrong. You can also begin forecasting based on historical data and try to optimize and become more efficient over time.
The steady rise of observability and OpenTelemetry
Observability continues to gain mindshare and adoption with developers. OpenTelemetry will likely be a crucial piece in the observability ecosystem that helps to tie all the different tools and vendors together. Because of that, it is worthwhile for developers to become familiar with OpenTelemetry and start experimenting with what it can do.