At its core, Apache Aurora is a framework for long-running services, applications and cron jobs. In addition to being responsible for running applications and services across a shared pool of machines, Aurora is also tasked with keeping them running, no matter what. If some of those machines begin to fail, Aurora reschedules the affected jobs onto healthy machines so that your services stay available.
When updating a job, Aurora also monitors the health and status of the deployment and automatically rolls things back to a more stable state if necessary. Aurora also includes a quota system that helps guarantee resources for your specific, critical applications. It even supports multiple users deploying services on the same cluster.
Why use a Telegraf plugin for Apache Aurora?
The Apache Aurora Telegraf Plugin is designed to gather the important metrics from your Apache Aurora schedulers and bring them together in one place, where they are easy to chart and easy for anyone in your organization to understand. It acts as a one-stop shop for your essential scheduler metrics and related data, so that you always have the information you need to make good decisions about your deployment.
How to monitor Apache Aurora using the Telegraf plugin
Thankfully, the process of configuring the Apache Aurora Telegraf Plugin is quite straightforward. It can be added very simply using the inputs.aurora plugin ID. This will then gather all metrics that you select from any available Apache Aurora scheduler. Note that the scheduler itself will expose a significant amount of instrumentation data via its built-in HTTP interface. To get a quick overview of exactly what is going on at any moment, use the following command:
$ vagrant ssh -c 'curl -s localhost:8081/vars | head'
Just a few of the configuration options available in the Apache Aurora Telegraf Plugin include request timeouts, basic authentication and even an optional TLS configuration for the connection to the scheduler.
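To make the timeout, basic authentication and TLS settings more concrete, here is a minimal Python sketch of a request against the scheduler's /vars endpoint. It is an illustration only and not the plugin's own code. The function names, the 5-second default timeout and the `ca_file` parameter are assumptions made for this example.

```python
import base64
import ssl
import urllib.request


def build_request(base_url, username=None, password=None):
    """Build a GET request for the scheduler's /vars endpoint (hypothetical helper)."""
    req = urllib.request.Request(base_url.rstrip("/") + "/vars")
    if username is not None:
        # Basic authentication: base64-encoded "user:password" in a header.
        raw = f"{username}:{password or ''}".encode("utf-8")
        req.add_header("Authorization", "Basic " + base64.b64encode(raw).decode("ascii"))
    return req


def fetch_vars(base_url, timeout=5.0, username=None, password=None, ca_file=None):
    """Fetch the raw /vars text, giving up after `timeout` seconds."""
    context = None
    if base_url.startswith("https://"):
        # Optional TLS: verify the scheduler against a custom CA file if one is given.
        context = ssl.create_default_context(cafile=ca_file)
    req = build_request(base_url, username, password)
    with urllib.request.urlopen(req, timeout=timeout, context=context) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_vars("http://localhost:8081")` would return roughly the same text as the curl command above, assuming a scheduler is listening on that port.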
As soon as you have your Apache Aurora Telegraf Plugin properly set up, you can immediately begin putting your metrics into your InfluxDB instance for further review.
Key Apache Aurora metrics to use for monitoring
As stated, the Apache Aurora Telegraf Plugin is nothing if not versatile — meaning that you can use it to proactively monitor a large number of different elements of your deployment depending on your needs. Just a few of these include:
- Tags, including the URL of the scheduler and the role (meaning whether it is a leader or a follower).
- Fields. These are numeric metrics collected from the /vars endpoint. Note that string fields are not actually gathered during this process. Examples include:
  - jvm_uptime_secs. This allows you to see the number of seconds the JVM process has been running for.
  - system_load_avg. This shows you the current load average for the system, taken from data collected over the last minute.
  - process_cpu_cores_utilized. This shows the current number of CPU cores that are in use by the JVM process. Note that for the best results, this should never exceed the number of logical CPU cores that are actually present on the machine in question.
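The sketch below shows one way to apply these ideas by hand in Python: it keeps only the numeric values from /vars output and compares the CPU metric against the machine's logical core count. It assumes that /vars returns one "name value" pair per line and that the metric names are lowercase. The sample values and both function names are made up for illustration.

```python
import os

# Made-up sample in the assumed "name value" line format.
SAMPLE = """jvm_uptime_secs 5432
system_load_avg 0.75
process_cpu_cores_utilized 1.5
build_info some-string
"""


def parse_vars(text):
    """Return only the numeric metrics; string-valued lines are skipped."""
    metrics = {}
    for line in text.splitlines():
        name, _, raw = line.strip().partition(" ")
        if not name or not raw:
            continue
        try:
            metrics[name] = float(raw)
        except ValueError:
            # Non-numeric value: dropped, mirroring how string fields are not gathered.
            continue
    return metrics


def cpu_overcommitted(metrics, logical_cores=None):
    """True if the JVM reports using more cores than the machine has."""
    cores = logical_cores or os.cpu_count() or 1
    return metrics.get("process_cpu_cores_utilized", 0.0) > cores


metrics = parse_vars(SAMPLE)
print(metrics)
print(cpu_overcommitted(metrics, logical_cores=4))
```

In practice you would more likely express the same comparison as an alert on the data stored in InfluxDB; the function here only makes the threshold explicit.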