Bcache is a cache in the Linux kernel's block layer that lets one or more fast storage devices, such as flash-based solid state drives (SSDs), act as a cache for one or more slower hard disk drives, either as a read/write cache or as a read cache. It essentially creates hybrid volumes that combine older and newer technology, improving performance even in demanding environments.
Bcache also brings a number of distinctive benefits. It minimizes write amplification by turning random writes into sequential writes, merging I/O operations to the benefit of both the cache and the primary storage device. This improves the performance of write-sensitive primary storage such as RAID 5 sets, and it also helps extend the lifetime of flash-based SSDs used as caches, which improves your return on investment.
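To make the idea of merging I/O concrete, here is a toy Python sketch of write coalescing. It is not Bcache's actual algorithm; the function name and the (start_sector, length) tuple format are assumptions made purely for illustration. It sorts a batch of scattered writes and merges the ones that touch or overlap into longer sequential runs.

```python
def coalesce_writes(writes):
    """Merge (start_sector, length) writes into sorted, contiguous runs.

    Illustrative only: a simplified model of turning scattered writes
    into sequential ones, not Bcache's real writeback logic.
    """
    runs = []
    for start, length in sorted(writes):
        end = start + length
        if runs and start <= runs[-1][1]:
            # This write touches or overlaps the previous run: extend it.
            runs[-1][1] = max(runs[-1][1], end)
        else:
            # Gap before this write: start a new sequential run.
            runs.append([start, end])
    return [(s, e - s) for s, e in runs]


# Five scattered 8-sector writes collapse into two sequential runs.
random_writes = [(96, 8), (0, 8), (16, 8), (8, 8), (104, 8)]
print(coalesce_writes(random_writes))  # [(0, 24), (96, 16)]
```

Fewer, larger writes are generally easier on both flash and parity-based arrays, which is the effect described above.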
Why use a Telegraf plugin for Bcache?
SSDs provide a significant performance improvement over traditional rotating SATA and SAS drives, and their cost has declined significantly, which makes the switch to SSDs feasible for more solutions. Cost was previously a major hurdle for organizations moving to SSDs, but it is becoming less of an issue as storage prices fall. Flash-based storage also has no moving parts to break down inside the device, which reduces the possibility of data loss from mechanical failure. As a result, more businesses are using SSDs, and under heavier workloads, than ever before.
Bcache provides some useful metrics about how well your caches are performing, and these metrics can be ingested into an InfluxDB instance using the Bcache Telegraf Plugin. By default, the Bcache Telegraf plugin gathers metrics for all Bcache devices, but you can also restrict the metric collection to specified Bcache devices.
You can test the Bcache plugin in your environment by running the following command, which gathers the plugin's metrics once and prints them:
./telegraf --config telegraf.conf --input-filter bcache --test
At that point, you can use the following configuration to get everything running exactly as you'd like it to be. Just replace the default values in the example below with the ones that make the most sense in the context of your deployment:
[[inputs.bcache]]
  # Bcache sets path
  # If not specified, then default is:
  # bcachePath = "/sys/fs/bcache"
  #
  # By default, telegraf gathers stats for all bcache devices
  # Setting devices will restrict the stats to the specified
  # bcache devices.
  # bcacheDevs = ["bcache0", ...]
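To show what the two settings control, here is a minimal Python sketch of the same kind of collection. It is not the plugin's code. It assumes a sysfs layout in which each backing device appears as a bdev* directory under a cache set directory, with a dirty_data file, a stats_total directory, and a dev symlink whose target ends in the block device name (for example bcache0). The function name and return format are hypothetical.

```python
import os
from pathlib import Path


def collect_bcache_stats(bcache_path="/sys/fs/bcache", devices=None):
    """Return {device_name: {stat_name: raw_text}} per backing device.

    Assumed layout: <bcache_path>/<cache-set>/bdev*/ containing
    dirty_data, stats_total/, and a dev symlink to the bcache device.
    """
    results = {}
    for bdev in sorted(Path(bcache_path).glob("*/bdev*")):
        # Resolve the dev symlink and keep its last path component.
        name = os.path.basename(os.path.realpath(bdev / "dev"))
        if devices and name not in devices:
            continue  # mirrors restricting collection with bcacheDevs
        stats = {"dirty_data": (bdev / "dirty_data").read_text().strip()}
        for stat_file in sorted((bdev / "stats_total").iterdir()):
            if stat_file.is_file():
                stats[stat_file.name] = stat_file.read_text().strip()
        results[name] = stats
    return results


if __name__ == "__main__":
    # On a machine without Bcache this simply prints an empty dict.
    print(collect_bcache_stats(devices=["bcache0"]))
```

Leaving devices unset collects from every backing device found, which corresponds to the default behavior described above.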
Key Bcache metrics to use for monitoring
Some of the important Bcache metrics that you can collect include:
- dirty_data: Amount of dirty data for this backing device in the cache. Continuously updated, unlike the cache set's version, but may be slightly off.
- bypassed: Amount of IO (both reads and writes) that has bypassed the cache.
- cache_bypass_hits, cache_bypass_misses: Hits and misses for IO that is intended to skip the cache are still counted, but broken out here.
- cache_hits, cache_misses: Hits and misses are counted per individual IO as Bcache sees them; a partial hit is counted as a miss.
- cache_miss_collisions: Counts instances where data was going to be inserted into the cache from a cache miss, but raced with a write and the data was already present (usually 0 since the synchronization for cache misses was rewritten).
- cache_readaheads: Count of times readahead occurred.
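Two small calculations often come up when working with these values, sketched below in Python. The helper names are hypothetical, and the unit handling is an assumption: it treats size values such as the dirty data figure as human-readable strings with a 1024-based suffix (for example 1.5k), which may not match every kernel version.

```python
# Assumed 1024-based suffixes for human-readable sizes.
UNITS = {"k": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def hit_ratio(hits, misses):
    """Percentage of counted IOs that were full cache hits.

    Partial hits are already counted as misses by Bcache, so they
    need no special handling here.
    """
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


def pretty_to_bytes(text):
    """Convert a value such as '1.5k' or '2G' to a byte count."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(float(text))


print(hit_ratio(90, 10))        # 90.0
print(pretty_to_bytes("1.5k"))  # 1536
```

Converting sizes to plain bytes before storing them makes the values easier to graph and compare across devices.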