Apache ZooKeeper Monitoring


Apache ZooKeeper is an open source project that centralizes configuration information, naming, synchronization, group services, and secrets such as passwords and certificates for large clusters in distributed systems. Centralizing this data makes configuration easier to manage and makes changes propagate more reliably.

With regard to application development, ZooKeeper is designed to provide infrastructure for cross-node synchronization, which it achieves by keeping status information in memory on the ZooKeeper servers. Under normal working conditions, each ZooKeeper server also keeps a copy of the state of the entire system in local log files. Larger Hadoop clusters (think: ones that span 500 or more commodity servers) are supported by multiple ZooKeeper servers, with one master (leader) server keeping the top-level servers synchronized.

Within the ZooKeeper infrastructure itself, an application can create something called a znode: a file-like data node that stays in memory on the ZooKeeper servers and can be updated by any node in the cluster. Any node can also register to be notified automatically when that znode changes. To put it another way, applications can synchronize their tasks across the distributed cluster simply by updating their status in a znode. This matters for both the management and the serialization of tasks across large, distributed sets of servers within an organization.
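
To make the update-and-notify idea concrete, here is a toy, in-memory model of a znode store with watches. It is only an illustration: the class `ToyZnodeStore`, its methods, and the path `/tasks/etl/status` are hypothetical names made up for this sketch and are not part of any ZooKeeper client API. The one-shot behavior mirrors how ZooKeeper's standard watches fire once and must be re-registered.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, bytes], None]


class ToyZnodeStore:
    """In-memory stand-in for a znode tree; NOT the ZooKeeper client API."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watchers: Dict[str, List[Callback]] = defaultdict(list)

    def watch(self, path: str, callback: Callback) -> None:
        """Register a callback to run on the next change to `path`."""
        self._watchers[path].append(callback)

    def set(self, path: str, value: bytes) -> None:
        """Create or update a znode, then notify and clear its watchers."""
        self._data[path] = value
        # Watches fire once; a caller re-registers to keep listening.
        pending = self._watchers.pop(path, [])
        for callback in pending:
            callback(path, value)

    def get(self, path: str) -> bytes:
        """Return the current value stored at `path`."""
        return self._data[path]


store = ToyZnodeStore()
seen = []

# One "node" registers interest; another updates the task status.
store.watch("/tasks/etl/status", lambda path, value: seen.append((path, value)))
store.set("/tasks/etl/status", b"running")  # triggers the watch
store.set("/tasks/etl/status", b"done")     # no watch registered any more

print(seen)
print(store.get("/tasks/etl/status"))
```

In a real deployment the store lives on the ZooKeeper servers and the callbacks run in client processes on other machines; the sketch keeps everything in one process purely to show the flow.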

Why use a Telegraf plugin for Apache Zookeeper?

Because Apache Zookeeper helps to keep your service-oriented architecture highly available, it makes sense to keep Apache Zookeeper up and running in a performant and efficient manner. You can use the Apache Zookeeper Telegraf Plugin to help you collect key performance metrics about your instance to do just that.

How to monitor Apache Zookeeper using the Telegraf plugin

Getting started with the Apache Zookeeper Telegraf Plugin requires modifying a few settings to fit your environment. Add the addresses to gather metrics from (IP or hostname with port), set a timeout, and configure the optional TLS settings. Once that is set up, the plugin starts sending metrics to your InfluxDB instance for you to query and visualize, so you can keep your Apache Zookeeper instance running smoothly.

The Telegraf plugin for Apache Zookeeper provides context that would otherwise be missing. When something goes wrong, the collected metrics help you see not only what happened but also where it occurred and why. Understanding the root cause in this way lets you take concrete steps to prevent the issue from happening again.

To get the Apache Zookeeper Telegraf Plugin up and running, replace the default values in the following configuration with the ones that are relevant to your own deployment:

# Reads 'mntr' stats from one or many zookeeper servers
[[inputs.zookeeper]]
  ## An array of addresses to gather stats about. Specify an IP or hostname
  ## with port, e.g. localhost:2181, 10.0.0.1:2181, etc.

  ## If no servers are specified, then localhost is used as the host.
  ## If no port is specified, 2181 is used
  servers = [":2181"]

  ## Timeout for metric collections from all servers.  Minimum timeout is "1s".
  # timeout = "5s"

  ## Optional TLS Config
  # enable_tls = true
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## If true, skip chain & host verification
  # insecure_skip_verify = true
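
The header comment in the configuration above says the plugin reads 'mntr' stats. To see roughly what that means, the following Python sketch sends the `mntr` four-letter command to a server over plain TCP and parses the tab-separated reply. It is not the plugin's implementation: `fetch_mntr` and `parse_mntr` are hypothetical helper names, the sketch does not handle TLS, and stripping the `zk_` prefix is an assumption made so that the keys match the field names used in this article. Depending on the ZooKeeper version, `mntr` may also need to be enabled in the server's four-letter-word whitelist.

```python
import socket
from typing import Dict, Union

Value = Union[int, float, str]


def parse_mntr(text: str) -> Dict[str, Value]:
    """Turn tab-separated `mntr` output into a dict of fields."""
    fields: Dict[str, Value] = {}
    for line in text.splitlines():
        key, sep, raw = line.partition("\t")
        if not sep or not key.startswith("zk_"):
            continue  # skip blank or unexpected lines
        name = key[len("zk_"):]
        raw = raw.strip()
        # Try the narrowest numeric type first, then fall back to text.
        for convert in (int, float):
            try:
                fields[name] = convert(raw)
                break
            except ValueError:
                continue
        else:
            fields[name] = raw  # e.g. version or server_state
    return fields


def fetch_mntr(host: str = "localhost", port: int = 2181,
               timeout: float = 5.0) -> str:
    """Send the `mntr` command and return the raw reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"mntr")
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # server closes the connection when done
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Parsing a hand-written sample, so no running server is needed here.
sample = (
    "zk_version\t3.4.0\n"
    "zk_avg_latency\t0\n"
    "zk_outstanding_requests\t0\n"
    "zk_server_state\tleader\n"
    "zk_followers\t4\n"
)
metrics = parse_mntr(sample)
print(metrics)
```

Against a live server, `parse_mntr(fetch_mntr("localhost", 2181))` would return the same kind of dictionary, which is a quick way to confirm that a server answers `mntr` before pointing Telegraf at it.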

Key Apache Zookeeper metrics to use for monitoring

Some of the important Apache Zookeeper metrics that you can proactively monitor with the Apache Zookeeper Telegraf plugin include:

  • approximate_data_size (integer)
  • avg_latency (integer)
  • ephemerals_count (integer)
  • max_file_descriptor_count (integer)
  • max_latency (integer)
  • min_latency (integer)
  • num_alive_connections (integer)
  • open_file_descriptor_count (integer)
  • outstanding_requests (integer)
  • packets_received (integer)
  • packets_sent (integer)
  • version (string)
  • watch_count (integer)
  • znode_count (integer)
  • followers (integer, leader only)
  • synced_followers (integer, leader only)
  • pending_syncs (integer, leader only)

Note, however, that the exact field names may vary depending on the configuration, the platform and the version of Apache Zookeeper that you are currently using.
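
As a rough illustration of how these fields might be turned into alerts, the sketch below checks one server's fields against a few thresholds. Everything here is an assumption for the example: the function name `check_zookeeper`, the default limits of 10 outstanding requests and 80% file descriptor usage, and the choice of which fields to compare. Suitable limits depend on your own workload.

```python
from typing import Dict, List, Union

Fields = Dict[str, Union[int, float, str]]


def check_zookeeper(fields: Fields,
                    max_outstanding: int = 10,
                    max_fd_ratio: float = 0.8) -> List[str]:
    """Return human-readable warnings for one server's fields."""
    warnings: List[str] = []

    # A growing request queue suggests the server cannot keep up.
    outstanding = fields.get("outstanding_requests", 0)
    if outstanding > max_outstanding:
        warnings.append(
            f"{outstanding} outstanding requests (limit {max_outstanding})"
        )

    # Compare open file descriptors with the configured maximum.
    open_fds = fields.get("open_file_descriptor_count")
    max_fds = fields.get("max_file_descriptor_count")
    if open_fds is not None and max_fds:
        ratio = open_fds / max_fds
        if ratio > max_fd_ratio:
            warnings.append(f"file descriptors at {ratio:.0%} of the limit")

    # Leader-only fields: absent on followers, so compare only if present.
    followers = fields.get("followers")
    synced = fields.get("synced_followers")
    if followers is not None and synced is not None and synced < followers:
        warnings.append(f"{followers - synced} follower(s) not in sync")

    return warnings


leader = {
    "outstanding_requests": 25,
    "open_file_descriptor_count": 900,
    "max_file_descriptor_count": 1000,
    "followers": 4,
    "synced_followers": 3,
}
for warning in check_zookeeper(leader):
    print(warning)
```

In practice the same comparisons would more likely be expressed as queries or alert rules on the data stored in InfluxDB; the Python version simply shows the logic in one place.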

For more information, please check out the documentation.

