Template built by

Telegraf Plugins used: Prometheus Input Plugin

Included Resources:

  • 2 Labels: ceph, inputs.prometheus
  • 1 Telegraf Configuration
  • 1 Dashboard: Ceph Cluster
  • 1 Variable: bucket

Quick Install

If you have your InfluxDB credentials configured in the CLI, you can install this template with:
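As a rough sketch of the general shape of such a command: the influx CLI's apply subcommand reads a template from a URL with -u (or from a local file with -f). The install_template helper and the example URL below are placeholders introduced for illustration, not the template's actual location.

```shell
# Sketch: install a template with the influx CLI.
# install_template is a hypothetical helper; the URL you pass is whatever
# location the template is actually published at.
install_template() {
  template_url="$1"
  # `influx apply -u` fetches the template from a URL and applies it
  # using the credentials already configured in the CLI.
  influx apply -u "$template_url"
}

# Usage (placeholder URL, replace before running):
# install_template "https://example.com/path/to/ceph-template.yml"
```

The CLI shows a summary of the resources it will create (labels, dashboard, Telegraf configuration, variable) and asks for confirmation before applying them.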


Ceph cluster monitoring dashboard

Ceph is a free-software storage platform that implements object storage on a single distributed computer cluster and provides interfaces for object-, block- and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and free availability.

Why monitor your Ceph Cluster system?

Monitoring your Ceph Storage infrastructure is as important as monitoring the containers that your applications run in. Ceph uniquely delivers object, block, and file storage in one unified system. Ceph has become popular for being open source and free to use, and is favored by Kubernetes users for being highly reliable and easy to manage. Ceph delivers extraordinary scalability:

  • A Ceph Node leverages commodity hardware and intelligent daemons.
  • A Ceph Storage Cluster accommodates large numbers of nodes that communicate with each other to replicate and redistribute data dynamically.

The Ceph Storage Cluster receives data from Ceph Clients – whether it comes through a Ceph Block Device, Ceph Object Storage, the Ceph Filesystem, or a custom implementation you create using librados – and stores the data as objects.

How to use the Ceph Cluster Monitoring Template

Once your InfluxDB credentials have been properly configured in the CLI, you can install the Ceph Cluster Monitoring template using the Quick Install command. Once installed, the dashboard is populated by the included Telegraf configuration, which uses the Telegraf Prometheus Input Plugin. You might need to customize the input configuration to suit your environment, for example by specifying a different input value. How much you need to change depends on how your organization currently runs Ceph.

To find out more about environment variables within the Telegraf configuration, consult the following link.


Ceph Cluster:

The dashboard is compatible with a Rook Ceph cluster running in Kubernetes, and possibly with most Ceph clusters that have the Ceph MGR Prometheus module enabled.


The Telegraf Prometheus Input Plugin needs to scrape the Ceph MGR(s) Prometheus Metrics Endpoint(s).
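Before pointing Telegraf at an endpoint, it can help to confirm that the endpoint actually serves Ceph metrics. The sketch below assumes curl and grep are available; check_mgr_metrics is a hypothetical helper and the host name in the usage line is made up. On clusters not managed by Rook, the module is normally switched on with "ceph mgr module enable prometheus" and listens on port 9283 by default.

```shell
# Sketch: count the Ceph metric samples served by a MGR Prometheus endpoint.
# check_mgr_metrics is a hypothetical helper; pass the URL your deployment exposes.
check_mgr_metrics() {
  url="$1"
  # -f fails on HTTP errors, -sS stays quiet except for real errors.
  # Metrics exported by the module are named ceph_*, so count those lines.
  curl -fsS "$url" | grep -c '^ceph_'
}

# Usage (hypothetical host, default module port):
# check_mgr_metrics "http://ceph-mgr.example.internal:9283/metrics"
```

A count of zero, or a curl error, suggests the module is not enabled or the URL points at the wrong service.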

Telegraf configuration requires the following environment variables:

  • INFLUX_TOKEN - The token with permissions to read Telegraf configs and write data to the telegraf bucket. You can use your operator token to get started.
  • INFLUX_ORG - The name of your Organization.
  • CEPH_MGR_SVC_URLS - URLs to Ceph Manager metrics endpoint service(s) e.g.

Any configuration changes reflecting your specific Kubernetes or Ceph installation can be set in the Telegraf configuration manually.
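To illustrate how these variables typically appear inside a Telegraf configuration, here is a minimal sketch that writes one to a scratch file for review. It is not the configuration shipped with the template: the InfluxDB URL (localhost:8086) and the temporary file are assumptions, and the input stanza assumes a single endpoint URL.

```shell
# Sketch: write a minimal Telegraf config to a scratch file for review.
# Assumptions: InfluxDB at localhost:8086 and a temp file as the destination;
# the template's own configuration may differ.
conf_file="$(mktemp)"

# The quoted 'EOF' keeps ${...} literal so Telegraf, not the shell,
# substitutes the values from the environment when it loads the file.
cat > "$conf_file" <<'EOF'
[[inputs.prometheus]]
  ## Ceph MGR Prometheus metrics endpoint
  urls = ["${CEPH_MGR_SVC_URLS}"]

[[outputs.influxdb_v2]]
  urls = ["http://localhost:8086"]
  token = "${INFLUX_TOKEN}"
  organization = "${INFLUX_ORG}"
  bucket = "telegraf"
EOF

echo "wrote $conf_file"
```

Keeping the token and endpoint out of the file means the same configuration can be reused across clusters by changing only the environment.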

You MUST set these environment variables before running Telegraf using something similar to the following commands:

  • Your token can be found on the Load Data > Tokens page in your browser: export INFLUX_TOKEN=TOKEN
  • Your Organization name can be found on the Settings page in your browser: export INFLUX_ORG=my_org
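Putting the three variables together, a small preflight check can catch a missing value before Telegraf starts. This is a sketch with placeholder values throughout: require_env is a hypothetical helper, and the CEPH_MGR_SVC_URLS host is invented for illustration.

```shell
# Sketch: export the variables, then verify they are visible to child processes.
# All values are placeholders; the CEPH_MGR_SVC_URLS host is hypothetical.
export INFLUX_TOKEN="TOKEN"
export INFLUX_ORG="my_org"
export CEPH_MGR_SVC_URLS="http://ceph-mgr.example.internal:9283/metrics"

# require_env is a hypothetical helper: returns non-zero if any named
# variable is unset, empty, or not exported.
require_env() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what Telegraf inherits.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_env INFLUX_TOKEN INFLUX_ORG CEPH_MGR_SVC_URLS && echo "environment ready"
# Then start Telegraf with your configuration:
# telegraf --config <path-or-URL-of-your-config>
```

Checking with printenv rather than a plain shell test catches the common mistake of assigning a variable without exporting it.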

Key Ceph cluster metrics to monitor

Some of the most important Ceph cluster metrics to monitor proactively include:

  • Cluster Latency
  • Cluster Capacity
  • Cluster Pools
  • Cluster I/O
  • Objects in Pools
  • PGs/OSDs
  • MON Quorum Status
  • MON Quorum Total

Related Resources

Kubernetes Monitoring Template

The Kubernetes Monitoring Template provides 2 basic Kubernetes dashboards: Kubernetes Node Metrics and Kubernetes Inventory.

Linux System Monitoring Template

Monitoring Linux systems running Ubuntu, CentOS, Red Hat, or any other distro is crucial to ensuring uptime. Use this template to start monitoring now.

Kubernetes monitoring solution

Gain real-time visibility into your entire container-based environment for faster root cause analysis.
