In general, network bonding is a term used to describe the combination of network interfaces on one host, all for the purposes of redundancy and/or increased throughput. This is especially important in terms of virtualized environments, where redundancy is a critical factor. Obviously, you want to do whatever it takes to protect that virtualized environment from loss of service due to a disaster at a single point of failure or physical link. Network bonding goes a long way towards accomplishing precisely that.
Beyond redundancy, network bonding can increase network throughput and bandwidth, and it is also well suited to fault tolerance and to load balancing across links. This is not to be confused with the concept of teaming, which is essentially a newer way to implement network bonding by way of a separately provided driver.
Why use a Telegraf plugin for Bond?
Bonding your network interfaces increases network throughput for your users, so it is important to know the current state of these bond interfaces if you want to meet the performance guarantees you have made to those users.
By paying attention to what is happening during normal working conditions, you can also put yourself in a position to more quickly identify whenever anything starts to stray outside of those "default" boundaries. This insight can be a great way to spot a small problem and fix it right now, all before allowing it to become a much larger (and potentially more damaging) issue in the future.
The Bond Telegraf Plugin collects metrics so you know which interface is active, what the state of the bond interface is, and if there are any failures. Knowing this will help you to maintain these interfaces, and therefore keep your SLA promises to your customers. Furthermore, you can combine these metrics with metrics collected from other Telegraf plugins like SNMP, NetFlow, and Cisco gRPC Network Management Interface (gNMI) to get a fully integrated view of your network health.
How to monitor Bond using the Telegraf plugin
By default, the Bond Telegraf Plugin collects metrics from all bond interfaces, but you can restrict the metrics to specified bond interfaces.
To configure the Bond Telegraf Plugin, add the following section to your Telegraf configuration file. Note that you'll need to uncomment the settings you want to change and replace the default values with the ones that make the most sense for your deployment:
[[inputs.bond]]
  ## Sets 'proc' directory path
  ## If not specified, then default is /proc
  # host_proc = "/proc"

  ## By default, telegraf gather stats for all bond interfaces
  ## Setting interfaces will restrict the stats to the specified
  ## bond interfaces.
  # bond_interfaces = ["bond0"]
At that point, the plugin starts reporting the metrics that matter most to you. Below are the fields that you'll use most often, along with descriptions of the information each one returns:
active_slave. This will show you the current active slave interface, which is important when using the active-backup mode.
status. This will show you the status of the bond interface, or of the bond's slave interface. A value of 0 means that the interface is down, while a value of 1 means it is up.
failures. This will show you the total number of link failures recorded for the bond's slave interface.
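To make the three fields above more concrete, here is a minimal Python sketch of how they could be derived from the text that the Linux bonding driver exposes under the proc directory (for example /proc/net/bonding/bond0). This is an illustration only, not the plugin's actual implementation: the SAMPLE text is made-up data in the driver's usual "Key: value" layout, and the function name parse_bond is hypothetical.

```python
# Illustrative sample only: a shortened, made-up bonding status file.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up
Link Failure Count: 0

Slave Interface: eth1
MII Status: down
Link Failure Count: 2
"""


def parse_bond(text):
    """Return (bond, slaves) dictionaries built from bonding status text.

    Hypothetical helper: maps "MII Status" to 1 (up) or 0 (down),
    mirroring the status values described above.
    """
    bond = {"active_slave": None, "status": None}
    slaves = {}
    current = None  # name of the slave section being read, if any

    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "Key: value"
        key, value = key.strip(), value.strip()

        if key == "Slave Interface":
            current = value
            slaves[current] = {"status": None, "failures": None}
        elif key == "Currently Active Slave":
            bond["active_slave"] = value
        elif key == "MII Status":
            status = 1 if value == "up" else 0
            if current is None:
                bond["status"] = status  # status of the bond itself
            else:
                slaves[current]["status"] = status
        elif key == "Link Failure Count" and current is not None:
            slaves[current]["failures"] = int(value)

    return bond, slaves


bond, slaves = parse_bond(SAMPLE)
print(bond)
print(slaves)
```

On a real host you would read the file for your own bond interface instead of SAMPLE; in practice the plugin does this collection for you, so the sketch is only meant to show where each field comes from.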
Key bond metrics to use for monitoring
Some of the important bond metrics that you should proactively monitor include:
- Active Slave
- Status (up or down)
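As a rough illustration of what proactive monitoring of these metrics could look like, the following Python sketch turns one set of readings into a list of alert messages. It assumes the values have already been collected (for example, queried from wherever Telegraf writes them); the function name bond_alerts and the shape of its arguments are hypothetical and are not part of Telegraf.

```python
def bond_alerts(bond_status, slaves, previous_failures=None):
    """Return human-readable alerts for one bond interface.

    bond_status: 1 if the bond is up, 0 if it is down.
    slaves: maps slave name -> {"status": 0 or 1, "failures": int}.
    previous_failures: optional map of slave name -> failure count
        from the last check, used to detect new failures.
    """
    previous_failures = previous_failures or {}
    alerts = []

    if bond_status == 0:
        alerts.append("bond is down")

    for name, reading in sorted(slaves.items()):
        if reading["status"] == 0:
            alerts.append(f"{name} is down")
        # Without an earlier reading, treat the counter as unchanged.
        before = previous_failures.get(name, reading["failures"])
        delta = reading["failures"] - before
        if delta > 0:
            alerts.append(f"{name} had {delta} new link failure(s)")

    return alerts


readings = {
    "eth0": {"status": 1, "failures": 0},
    "eth1": {"status": 0, "failures": 3},
}
for message in bond_alerts(1, readings, previous_failures={"eth1": 1}):
    print(message)
```

Comparing the failure counter against its previous value, rather than against zero, is a deliberate choice in this sketch: the counter is cumulative, so only an increase signals a new problem.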