Use Cases for Time-Series Data

In the world of DevOps, one size does not fit all when it comes to monitoring unique application and infrastructure deployments, not to mention ever-evolving data structures. Raw data is “dumb” when randomly distributed. To make this data work for you, you need smart analytics that unlock the insights already present in it.

Emerging trends like microservices, APIs, containerization, elastic storage, software-defined networking (SDN) and hybrid clouds all keep pushing the boundaries of the coverage and depth that traditional monitoring solutions can provide out of the box.

The unique tooling and data needs of an organization’s DevOps team are often left unmet because the monitoring systems already deployed rarely keep pace with, and often cannot even support, the new applications and infrastructure constantly being rolled out. If you can’t monitor it, how can you support it? That’s why it often makes sense to create custom monitoring solutions using extensible open source technologies.

Why Companies Build Custom Monitoring Solutions

DevOps is not just a technology shift. It is also a process and cultural shift that has been under way for years and will continue for the foreseeable future.

Traditional waterfall development methodologies have been replaced by agile processes with shorter, incremental Code-Test-Build-Deploy cycles. Builds are pushed much more frequently, sometimes 5-10 times a day, whether to ship patches or to support fragmented deployment targets like mobile devices, sensors or servers.

The lines between client and server are being blurred by technologies like isomorphic JavaScript (JavaScript on the client, Node.js on the server), which lets the same code run on both ends. Virtual containers and deployment orchestrators let us dynamically provision microservices, while SDN switching helps route traffic dynamically in elastic clouds.

Most off-the-shelf monitoring products were built to give deep visibility into one layer of the stack and hence we have silos of monitoring solutions like APM (Application Performance), IM (Infrastructure), NFM (Network Flows), RUM (Real User), Crash Reporting (Mobile), and Log Analytics to name a few. The challenges with these tools are many.

Correlating events and metrics across silos is non-trivial, so we often see “Frankenstein” monitoring systems stitched together from several tools. These systems are typically riddled with the problems common to all large deployments:

  • Complexity
  • Maintenance overhead
  • Breaking dependencies
  • High TCO (Total Cost of Ownership)

Metric redundancy and alert duplication also render these “Frankenstein” monitors ineffective in real triage scenarios. Correlation between siloed metrics is unreliable because of incompatible data formats or differing persistence frequencies. Further, each silo may keep its metrics in a different data store, such as an RDBMS, cloud storage or flat files on disk, making cross-silo aggregation impractical or at least an ETL nightmare.

Other major problems frequently encountered are:

  • Lack of customizability around views, graphs, data overlays, custom timescales
  • Lack of support for newer languages, virtualization, network or storage stacks
  • Custom instrumentation to generate arbitrary metrics is typically expensive to support
  • Monitoring solutions built even 7-8 years ago struggle to scale with the explosion of data from distributed devices and sensors

 

The reality is that we are stuck with 20- to 30-year-old monitoring methodologies and solutions (whether on-premises or SaaS) that were not built for today’s dynamic DevOps reality.

Challenges in Building Custom Monitoring Solutions

Unfortunately, the list is long:

  • Breaking dependencies are bad. This forces monitoring components like visualization, storage, data collection and processing to be isolated from each other
  • Metric collection should be comprehensive and fully extensible so that any new technology can be added quickly
  • Integrating disparate open source projects can cause design challenges. For example, the granularity of data in an open source database could be quite different from what an open source graphing tool supports. A cohesive open source stack is more advisable than a collection of unaligned proprietary products.
  • Heterogeneous data is hard to handle and correlate. Usually you need to handle the following:
    • Unpredictable: events, exceptions, messages
    • SLA bound: performance, availability, statistics
    • Silos: customer, application, network, servers, containers, cloud
    • Protocols: Mix of UDP, SNMP, HTTP or socket-to-socket TCP
    • Endpoints: sensors, mobile, web apps, APIs
  • Picking the right datastore is crucial. Storing time-series data (the main data type in monitoring) in a relational database simply won’t meet the read-write speeds needed when you hit billions of data points.
  • Metric schema designs need to be flexible and, if possible, schemaless.
  • Storage needs to be highly scalable and designed to support “Big Data” architectures
  • Logic processing and pattern matching need to handle big data volumes with minimal processing overhead
  • Asymmetric metric granularity should not affect persistence or querying
  • Alerts, anomaly detection, reports, APIs, role-based access control, third-party integrations, etc. are all needed in mature monitoring systems, but may not be available as existing open source projects.
  • Data compression and rollups should be addressed from the get-go to keep storage costs in check.
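To make the rollup idea concrete, here is a minimal Python sketch that downsamples raw points into fixed time windows and keeps only a summary per window. It is an illustration only: the function name `rollup`, the tuple layout and the sample CPU readings are assumptions made for this example, not part of any InfluxData API.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, window_seconds):
    """Downsample (timestamp, value) points into fixed windows.

    Returns a list of dicts sorted by window start, each holding the
    count, min, max and mean of the raw values that fell into the window.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Integer division finds the start of the window for this point.
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)

    return [
        {
            "start": start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for start, values in sorted(buckets.items())
    ]


# Hypothetical CPU readings: (unix seconds, percent busy).
raw = [(0, 10.0), (10, 20.0), (20, 30.0), (60, 50.0), (70, 70.0)]
summary = rollup(raw, window_seconds=60)
# Five raw points collapse into two one-minute summaries.
```

Storing the summary instead of the raw points trades detail for space, which is why the window size is usually chosen per retention tier: fine-grained for recent data, coarse for older data.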

 

InfluxData for Custom DevOps Monitoring

The InfluxData platform is uniquely suited for building custom DevOps monitoring solutions.

Collecting Monitoring Metrics

Monitoring means collecting data from disparate systems, applications, datasources, services and infrastructure components. InfluxData’s Telegraf collector supports 30+ inputs and 10+ outputs and can be easily extended to support your own sources of data. Telegraf makes it simple to collect data in a format InfluxDB can consume. Here’s why:

  • MIT License
  • Minimal memory footprint
  • Extensible plugin design with 40+ input and output plugins
  • Support for datasources like MongoDB, MySQL and Redis
  • Messaging systems like Apache Kafka and RabbitMQ
  • Third party APIs like Mailchimp, AWS CloudWatch and Google Analytics
  • Collection of system metrics like CPU, memory and I/O


Because the InfluxData platform is extensible by design, you can also easily integrate other collection agents, like collectd, in conjunction with Telegraf.

Learn more about Telegraf

Storing Monitoring Metrics

Most of the data in any monitoring system is going to be in a time-series format. InfluxDB is designed from the ground up to handle just time-series data and to do it better than any other database. InfluxDB is the “I” in the TICK stack. More specifically, InfluxDB is an open source database written in Go to handle time-series data with high availability and high performance requirements. It installs in minutes without external dependencies, yet is flexible and scalable enough for complex deployments. Here’s why InfluxDB is the best choice for storing a custom monitoring solution’s time-series data:

  • MIT License
  • Simple to install, yet highly extensible
  • Purpose built for time-series data, no special schema design or custom app logic required
  • Thousands of writes per second with the new TSM1 storage engine
  • Horizontal clustering for high availability (in active development)
  • A native HTTP API means no server side code to manage
  • Time-centric functions and an easy-to-use SQL-like query language
  • Data can be tagged, allowing very flexible querying
  • Answers queries in real time: every data point is indexed as it comes in and is available to queries in less than 100 ms
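As an illustration of the tagged data model, the following Python sketch formats a single point in InfluxDB's text-based line protocol (measurement, comma-separated tags, a space, fields, and an optional timestamp). The helper names `to_line`, `_escape` and `_format_field` and the sample values are assumptions made for this example, and the sketch covers only the common escaping rules, so check the line protocol reference for your InfluxDB version before relying on it.

```python
def _escape(text, chars):
    """Backslash-escape every character from `chars` found in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value using line protocol type conventions."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Anything else is written as a double-quoted string.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=value field=value [timestamp]."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, ", ")
    # Tags are emitted in sorted key order.
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


line = to_line(
    "cpu",
    {"region": "us-west", "host": "web 01"},
    {"usage_idle": 87.5, "cores": 4},
    1465839830100400200,
)
# line == r"cpu,host=web\ 01,region=us-west usage_idle=87.5,cores=4i 1465839830100400200"
```

Tags such as `host` and `region` are indexed, so they are what you filter and group by in queries, while fields carry the measured values.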


Learn more about InfluxDB

Visualizing Monitoring Metrics

If you don’t already have a dashboarding or graphing UI in place, InfluxData provides Chronograf. It’s the “C” in the TICK stack. Chronograf is a downloadable binary you install behind your firewall to collaboratively, yet securely, perform ad-hoc visualizations on your time-series data. Features include:

  • Simple installation and configuration
  • Tight integration with InfluxDB, which makes connecting to your data easy
  • Support for ad-hoc visualizations
  • Smart query builder designed to work with large datasets
  • Collecting multiple graphs into dashboards
  • Templating, new graph types and visualizations coming!

 

Another visualization UI that offers tight integration with InfluxDB is the open source Grafana project. Either choice makes it simple to connect to and visualize time-series data.

Learn more about Chronograf
Learn more about InfluxDB & Grafana

Processing Monitoring Metrics

Inevitably, you are going to want to alert on, or in some other way process, the time-series data in your monitoring system. You’ll want to do this either before it gets written to InfluxDB or when it is retrieved. To address this need, the InfluxData platform ships with the open source Kapacitor project. Kapacitor is the “K” in the TICK stack. It’s an alerting and data processing engine specifically designed for time-series data. It lets you define your own custom pipeline to aggregate, select, transform or otherwise process data and then store it back in InfluxDB or trigger an event. Features include:

  • MIT licensed
  • Stream data from InfluxDB or query from InfluxDB
  • Trigger events/alerts based on complex or dynamic criteria
  • Perform any transformation currently possible in InfluxQL, for example: SUM, MIN, MAX, etc.
  • Store transformed data back into InfluxDB
  • Process historical data, for example: backfill data using a processing pipeline
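The general shape of such a pipeline (window the stream, aggregate, then trigger on a condition) can be sketched in a few lines of Python. This is not Kapacitor's own task language or API. The function name `threshold_alerts`, its parameters and the sample stream are hypothetical and only illustrate the idea of alerting on a moving average rather than on single noisy points.

```python
from collections import deque


def threshold_alerts(points, window_size, threshold):
    """Yield (timestamp, moving_average) whenever the average of the
    last `window_size` values is strictly greater than `threshold`."""
    window = deque(maxlen=window_size)  # old values fall off automatically
    for timestamp, value in points:
        window.append(value)
        if len(window) < window_size:
            continue  # not enough data yet for a full window
        average = sum(window) / window_size
        if average > threshold:
            yield timestamp, average


# Hypothetical stream of (unix seconds, CPU percent) points.
stream = [(0, 40.0), (10, 60.0), (20, 90.0), (30, 100.0), (40, 50.0)]
alerts = list(threshold_alerts(stream, window_size=3, threshold=80.0))
# Only the window ending at t=30 averages above 80, so one alert fires.
```

Because the function is a generator, the same code works on a live stream or on historical data replayed from storage, which mirrors the stream and backfill modes listed above.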


Learn more about Kapacitor

Testimonials

InfluxData products are used for custom DevOps monitoring by startups and large enterprises alike. Visit our Testimonials page for a comprehensive list.

