Network Performance Monitoring
Networks play a fundamental role in the adoption and growth of Internet applications. Penetrating enterprises, homes, factories, and even cities, networks sustain modern society. Keeping networks responsive and performant in today’s hybrid, distributed, and containerized application environments happens behind the scenes, hidden in intangible clouds and diagram abstractions, yet network glitches are more visible and unforgiving than ever.
The basics of network monitoring
Three pillars of network performance monitoring separate the monitored network from the unmonitored one: availability, latency and packet loss, and bandwidth consumption. Feeding this data into a centralized time series platform enables holistic network performance monitoring.
Availability refers to host reachability. If a host is unavailable, something related to endpoint health or the network path (such as a load balancer’s session limit or an expired SSL certificate) could be preventing traffic from reaching it.
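A host reachability check of the kind described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular agent implements it: it uses a plain TCP connection attempt as the probe, and the function names `is_reachable` and `availability_percent` are hypothetical.

```python
import socket
from typing import Iterable


def is_reachable(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def availability_percent(results: Iterable[bool]) -> float:
    """Share of successful probes as a percentage (0.0 if no probes ran)."""
    outcomes = list(results)
    if not outcomes:
        return 0.0
    return 100.0 * sum(outcomes) / len(outcomes)
```

Collecting the boolean result of `is_reachable` at a fixed interval and passing the series to `availability_percent` gives a simple availability figure. A TCP probe only shows that the port accepts connections, so it would not by itself detect something like an expired certificate.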
Latency refers to the time it takes for traffic to cross the network to a target, and packet loss measures the error rate experienced along the way. Together, latency and error rates determine whether the network is suitable for all applications or only for the less sensitive ones. For instance, high latency will completely undermine unified communications (voice and video) services.
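As a rough illustration of how these two figures are derived, the sketch below summarizes a batch of probe results. It assumes each probe is recorded as a round-trip time in milliseconds, with `None` standing for a probe that never got a reply. The function name and the result keys are made up for this example.

```python
from typing import Optional, Sequence


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize round-trip times; None entries count as lost packets."""
    received = [rtt for rtt in rtts_ms if rtt is not None]
    sent = len(rtts_ms)
    loss = 100.0 * (sent - len(received)) / sent if sent else 0.0
    return {
        "sent": sent,
        "received": len(received),
        "packet_loss_percent": loss,
        # Latency statistics are undefined when every probe was lost.
        "min_ms": min(received) if received else None,
        "avg_ms": sum(received) / len(received) if received else None,
        "max_ms": max(received) if received else None,
    }
```

For example, `summarize_probes([10.0, None, 30.0, 20.0])` reports 25% packet loss and an average latency of 20 ms over the three replies that arrived.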
Network Bandwidth Consumption
This tracks metrics from the network interface, which provide important information about bandwidth load at the interface and can be used to set up alerts before the interface is completely saturated. Combining network interface metrics with data from network traffic analysis appliances helps identify bandwidth bottlenecks: it provides insight into the sessions, IPs, protocols, and ports that are causing saturation, and helps identify resource misuse and potential Denial of Service (DoS) attacks.
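To make the interface-load idea concrete, here is a minimal sketch that turns two readings of an interface octet counter (such as the counters exposed over SNMP) into a utilization percentage and checks it against an alert threshold. The function names, the 80% default threshold, and the single-wrap handling are assumptions chosen for illustration.

```python
def utilization_percent(
    octets_start: int,
    octets_end: int,
    interval_s: float,
    link_speed_bps: float,
    counter_bits: int = 64,
) -> float:
    """Link utilization over one polling interval, from two octet counter readings."""
    if interval_s <= 0 or link_speed_bps <= 0:
        raise ValueError("interval_s and link_speed_bps must be positive")
    delta = octets_end - octets_start
    if delta < 0:
        # The counter wrapped around once between the two readings.
        delta += 2 ** counter_bits
    bits_per_second = delta * 8 / interval_s
    return 100.0 * bits_per_second / link_speed_bps


def should_alert(utilization: float, threshold_percent: float = 80.0) -> bool:
    """True once utilization reaches the threshold, before full saturation."""
    return utilization >= threshold_percent
```

With these definitions, 125,000,000 octets transferred in 10 seconds on a 1 Gbps link works out to 10% utilization, well below the example threshold.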
The functional architecture of the InfluxData network monitoring platform
- Telegraf – the collection agent with 200+ plugins and a vast client library. It can source metrics directly from the system it’s running on, pull metrics from third-party APIs, listen for metrics via streaming consumer services, and support monitoring protocols such as ICMP/Ping, SNMP, NetFlow, and sFlow, as well as gather metrics and logs with Syslog.
- InfluxDB – the database and storage engine purpose-built to handle time series data. A perfect metric store for multiple data sources to help you avoid a siloed approach.
- Chronograf – the visualization tool, with pre-canned dashboards that provide a standard baseline for network monitoring.
- Kapacitor – the rules engine for processing, monitoring, and alerting.
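To give a feel for what flows between these components, the sketch below formats a single measurement as an InfluxDB line protocol string (measurement, tags, fields, and a nanosecond timestamp), the text format in which points are written to InfluxDB. It is a simplified illustration rather than a replacement for an official client library: the helper names are hypothetical, it does not handle special float values such as NaN, and the measurement, tag, and field names in the usage note are only examples.

```python
from typing import Mapping, Union

FieldValue = Union[bool, int, float, str]


def _escape(value: str, chars: str) -> str:
    """Backslash-escape each character in chars, in the order given."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value: FieldValue) -> str:
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; escape backslashes first, then quotes.
    return '"' + _escape(str(value), '\\"') + '"'


def to_line_protocol(
    measurement: str,
    tags: Mapping[str, str],
    fields: Mapping[str, FieldValue],
    timestamp_ns: int,
) -> str:
    """Build one line: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # tags are emitted sorted by key
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"
```

Calling `to_line_protocol("ping", {"url": "example.com"}, {"average_response_ms": 12.5}, 1700000000000000000)` yields a single line that tags the point with its target and stores the latency as a float field.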
Customer Use Cases
“Out of all of the monitors there [in our office], we only have a couple that show some custom dashboards. The rest is entirely Grafana running on InfluxDB.”
Senior Software Engineer, NewVoiceMedia
“We are getting good mileage in a short time period from our investment in TICK and Grafana.”
Senior Director, Site Reliability Engineering, Coupa
“It’s very important today to deliver data with high
NewVoiceMedia, a UK-based cloud service company, chose InfluxData to provide 99.999% uptime monitoring of its global SaaS because InfluxData could meet and exceed its business and technical requirements. Grafana is used for all its graphing capabilities.
In just 4 weeks, Coupa (a cloud platform for business spend) went beyond a proof-of-concept with InfluxData and created a working prototype that was kept simple and iterated upon often. It used Telegraf to collect data, a single InfluxDB node to store data, Grafana to visualize data, and Kapacitor to analyze data.
Using InfluxDB, ntopng is open to “big data” systems that can scale with data in volume and speed. It is able to export monitoring information in JSON format towards various systems including Elasticsearch/Logstash and ZMQ. ntopng is also able to collect, self-produce (from packets), and export monitoring information by normalizing it in JSON format.
Want to Know More?
- Cisco: Introducing Pipeline: A Model-Driven Telemetry Collection Service
- Juniper: Guidelines for Aggregating Junos Telemetry Interface Data
- Meraki: How to Monitor SNMP Devices with Telegraf and InfluxDB and Grafana | Meraki WAN Data
- Metamako: Introducing Telemetry: Actionable Time Series Data from Counters