Why You Need a Centralized Approach to Monitoring
Jason Myers / Feb 06, 2023
This article was originally published in The New Stack and is reposted here with permission.
With a standard model for monitoring data across the organization, different teams can use a common infrastructure and extract maximum value from it.
Monitoring (also sometimes referred to as observability) involves collecting and analyzing data from a source over time to track its health and/or performance. Because change occurs over time, virtually all monitoring data is time series data, meaning it has a timestamp. So when anyone talks about monitoring data, they’re talking about time series data by default.
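As a minimal sketch of what "time series data" means in practice, the Python snippet below models a single monitoring observation and uses its timestamp to compute a rate of change. The `Sample` class, its field names and the example values are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Sample:
    """One monitoring observation: what was measured, its value, and when."""
    source: str          # hypothetical source identifier, e.g. a host name
    metric: str          # hypothetical metric name
    value: float
    timestamp: datetime  # the timestamp is what makes this time series data


def rate_of_change(earlier: Sample, later: Sample) -> float:
    """Change in value per second between two samples of the same series."""
    elapsed = (later.timestamp - earlier.timestamp).total_seconds()
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    return (later.value - earlier.value) / elapsed


# Two made-up readings of the same metric, ten minutes apart.
first = Sample("web-01", "disk_used_gb", 120.0,
               datetime(2023, 2, 6, 12, 0, 0, tzinfo=timezone.utc))
second = Sample("web-01", "disk_used_gb", 126.0,
                datetime(2023, 2, 6, 12, 10, 0, tzinfo=timezone.utc))

growth_per_second = rate_of_change(first, second)
```

Without the timestamp, the two values would say nothing about how quickly the disk is filling, which is why monitoring questions are almost always questions about change over time.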
It is common to find multiple teams within a single organization that each have their own monitoring solution. For example, some may store monitoring data in a relational database, while others use a tool better suited to this kind of data: a time series database. The fact that some teams use more effective tools than others, even within the same organization, speaks to the problem of organizational silos.
These silos usually reflect the organizational structure and how investment approval happens within the organization. However, this established way of doing things creates redundant overhead in procurement and deployment costs, and can create inefficiencies when working with the data itself.
Each team or role that uses monitoring data has different requirements for it, which makes the data itself an asset. But stakeholders need to analyze data to gain value from it. Reusing and repurposing data for other uses, much as manufacturing plants reuse raw material, yields additional insights and generates even more value. If one team stores data in a proprietary solution, it limits the capabilities of every other team in the organization.
To take full advantage of collected data while also addressing the needs of all monitoring data users in a timely manner, engineering teams are adopting a centralized approach to monitoring with a “Metrics as a Service” model.
Metrics as a Service (MaaS) is an approach to organizational data management whereby a company stores monitoring data in a central location so that different stakeholders can easily access it. With a standard model to provision monitoring data across the organization, different teams can use a common infrastructure and extract maximum value from collected data. This approach avoids silos and vendor lock-in, offers a better return on investment (ROI), and frees people up to work on higher-value tasks.
Monitoring data stakeholders
To build a MaaS solution, it’s important to understand who the stakeholders are and how they use monitoring data. We can identify at least three key groups that typically work with this kind of data.
IT operations teams care about the stability, availability and reliability of production environments. Therefore, it’s critical for them to have visibility into resource consumption, monitor system health and status and use diagnostic data for fast recovery. They draw monitoring data from a variety of sources to make sure that the underlying computing and networking infrastructure is functional and responsive to the needs of the applications running on it.
Primary concerns for IT operations monitoring include, but aren’t limited to, resource status and state of:
- Physical devices, operating systems and virtual machines
- Containers and orchestration of containerized services
- Network and meshed services
App developers primarily focus on agility and performance. To gain insight into these areas, they require granular data related to inter- and intra-system observability, tracing and end-to-end experience.
App developer concerns include things like:
- How performant an application and its dependencies are
- How long it takes to make new code available
- How fast they can find the root cause of an issue and restore a functional state
By controlling the code, developers can easily expose application custom metrics, push events, generate logs and trace delays to satisfy their observation needs. Once exposed, developers need a way to ingest, visualize, analyze and store that data, which demands high-performance writes and reads. Developers use this data to build effective solutions with a minimal change failure rate.
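The developer concerns above, such as change failure rate and time to restore a functional state, reduce to simple arithmetic once deployment events carry timestamps. The following Python sketch assumes a hypothetical `Deployment` record and measures restore time from the moment of deployment, which is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one release; not tied to any specific tool."""
    deployed_at: datetime
    failed: bool
    restored_at: Optional[datetime] = None  # set only for failed deployments


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Share of deployments that caused a failure (0.0 when there are none)."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_restore(deployments: List[Deployment]) -> Optional[timedelta]:
    """Average time from a failed deployment to restored service."""
    durations = [d.restored_at - d.deployed_at
                 for d in deployments
                 if d.failed and d.restored_at is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Made-up history: four releases, one of which failed and took 30 minutes to fix.
start = datetime(2023, 2, 6, 9, 0)
history = [
    Deployment(start, failed=False),
    Deployment(start + timedelta(days=1), failed=True,
               restored_at=start + timedelta(days=1, minutes=30)),
    Deployment(start + timedelta(days=2), failed=False),
    Deployment(start + timedelta(days=3), failed=False),
]

failure_rate = change_failure_rate(history)
restore_time = mean_time_to_restore(history)
```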
Business managers and data engineers seek to discover trends that affect profits and growth and to find opportunities to improve efficiency and efficacy. To accomplish this, they rely on monitoring the dynamics and success rates of transactions, user activities and business outcomes in correlation with other business dimensions.
Measuring success relies on key performance indicators (KPIs) that inform stakeholders how well things are working and can even shed light on how to achieve better results. Generating the input to feed KPIs requires advanced analytics that perform correlation, aggregation, summations and operations across measurements and multiple data sources. Processing data across a combination of factors reflects the complexity of systems that impact business indicators and presents a more holistic picture.
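To illustrate the kind of aggregation and cross-source operation that feeds a KPI, the Python sketch below sums two hypothetical measurements into fixed time windows and divides one by the other to get a transaction success rate. The readings, the 60-second window and the function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Reading = Tuple[int, int]  # (timestamp in seconds, count)

# Hypothetical readings from two separate data sources.
attempted: List[Reading] = [(0, 40), (30, 60), (60, 50), (90, 50)]
succeeded: List[Reading] = [(0, 38), (30, 57), (60, 50), (90, 40)]


def sum_by_window(readings: List[Reading], window_seconds: int) -> Dict[int, int]:
    """Aggregate counts into windows keyed by each window's start time."""
    totals: Dict[int, int] = defaultdict(int)
    for timestamp, count in readings:
        window_start = timestamp - timestamp % window_seconds
        totals[window_start] += count
    return dict(totals)


def success_rate_by_window(attempts: List[Reading],
                           successes: List[Reading],
                           window_seconds: int = 60) -> Dict[int, float]:
    """Combine two measurements into one KPI value per time window."""
    attempts_by_window = sum_by_window(attempts, window_seconds)
    successes_by_window = sum_by_window(successes, window_seconds)
    return {
        window: successes_by_window.get(window, 0) / total
        for window, total in sorted(attempts_by_window.items())
        if total > 0  # skip empty windows to avoid dividing by zero
    }


kpi = success_rate_by_window(attempted, succeeded)
```

In a real deployment the database's query language would normally perform this windowing and joining; the sketch only shows the shape of the computation.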
The overall goal is to keep a positive trend on business growth and to keep service-level agreements (SLAs) and service-level objectives (SLOs) in check.
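One common way to keep an SLO "in check" is to track how much of its error budget remains. The Python sketch below shows that arithmetic under assumed numbers; the 99.9% target and the request counts are illustrative, not taken from any real service.

```python
def error_budget_remaining(slo_target: float,
                           total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent.

    A result of 1.0 means no budget has been used; a negative result
    means the SLO has been breached for the period.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


# Assumed example: a 99.9% availability SLO over one million requests
# allows roughly 1,000 failures; 250 failures uses about a quarter of that.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
```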
Building a metrics-as-a-service offering
While each of these roles has different requirements for monitoring data, we can see how their combined needs cover the entire spectrum from the infrastructure that supports applications, to the functionality of the actual applications, to the end-user results of those applications. Everything is related, so it stands to reason that the monitoring data for each group has an impact on the others.
This interconnectedness is the very reason why a MaaS solution makes sense. One of the attractions of this model is that it turns costs associated with monitoring into an investment in information, which companies can monetize.
This represents a major leap forward from the old model of narrow-focus budget approval for a siloed, sunk-cost monitoring solution. Because the Metrics as a Service model touches all these different groups, it cannot reach its full potential when implemented with point solutions. It demands the scalability, performance and functionality of a purpose-built time series platform.
Any Metrics as a Service solution starts with data collection. It should be able to collect data from any source, and support both push and pull methods. An open source data collection agent like Telegraf has hundreds of plugins, so it can handle virtually any data source.
Telegraf is a lightweight tool, so it can run on traditional infrastructure, as well as on edge devices or in containers and virtual machines. This flexibility makes it ideal for the different needs of monitoring stakeholders.
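The difference between push and pull collection can be sketched in a few lines of Python. This is a toy model, not how Telegraf is implemented: the `Collector` class and its method names are hypothetical. In the pull case the collector calls the source on its own schedule; in the push case the source calls the collector.

```python
import time
from typing import Callable, Dict, List, Tuple

Reading = Tuple[str, float, float]  # (metric name, value, unix timestamp)


class Collector:
    """Toy collector that accepts pushed readings and polls pull sources."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._pull_sources: Dict[str, Callable[[], float]] = {}
        self.readings: List[Reading] = []

    def register_pull_source(self, metric: str, read: Callable[[], float]) -> None:
        """Pull: remember a function the collector will call to get a value."""
        self._pull_sources[metric] = read

    def push(self, metric: str, value: float) -> None:
        """Push: the source decides when to send a value to the collector."""
        self.readings.append((metric, value, self._clock()))

    def poll(self) -> None:
        """Pull: the collector decides when to read every registered source."""
        for metric, read in self._pull_sources.items():
            self.readings.append((metric, read(), self._clock()))


# A fixed clock keeps the example repeatable; real use would keep time.time.
collector = Collector(clock=lambda: 1675684800.0)
collector.register_pull_source("queue_depth", lambda: 7.0)
collector.push("checkout_latency_ms", 182.5)
collector.poll()
```

Supporting both styles matters because some sources can only be scraped, while others, such as short-lived jobs, may disappear before a poll ever reaches them.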
Collected data needs somewhere to go, and the best tool for storing this data is a time series database. A purpose-built time series database like InfluxDB provides multiple deployment options, including self-managed and fully managed cloud offerings. It integrates seamlessly with Telegraf to create efficient, durable and reliable data pipelines.
Just as important, InfluxDB can use Telegraf to output data to a wide range of destinations, and it also exposes HTTP endpoints to give a broad range of users access to the data stored within the database.
With these two tools, each group can set up its own Telegraf instances to collect the data it needs and send that data to separate buckets in an InfluxDB instance. This approach allows each group to create isolated data pipelines within the InfluxDB platform, but because they store everything in InfluxDB, users can query data from any bucket to which they have access.
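The idea of isolated pipelines that still allow cross-team queries can be modeled with a small in-memory store. The Python sketch below is a stand-in for the concept only; `CentralStore`, the bucket names and the permission model are hypothetical and do not reflect InfluxDB's actual authorization API. Timestamps are omitted for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[str, float]  # (metric name, value)


class CentralStore:
    """Toy stand-in for a central store with per-team buckets."""

    def __init__(self) -> None:
        self._buckets: Dict[str, List[Point]] = defaultdict(list)
        self._readers: Dict[str, Set[str]] = defaultdict(set)

    def grant_read(self, team: str, bucket: str) -> None:
        self._readers[bucket].add(team)

    def write(self, bucket: str, metric: str, value: float) -> None:
        self._buckets[bucket].append((metric, value))

    def query(self, team: str, buckets: List[str]) -> List[Tuple[str, str, float]]:
        """Return (bucket, metric, value) rows from every requested bucket."""
        rows: List[Tuple[str, str, float]] = []
        for bucket in buckets:
            if team not in self._readers.get(bucket, set()):
                raise PermissionError(f"{team} cannot read bucket {bucket}")
            rows.extend((bucket, metric, value)
                        for metric, value in self._buckets.get(bucket, []))
        return rows


store = CentralStore()

# Each group writes through its own pipeline into its own bucket.
store.write("it-ops", "cpu_percent", 71.0)
store.write("app-dev", "p95_latency_ms", 240.0)

# Access is granted per bucket, so one team can read across pipelines.
store.grant_read("business", "it-ops")
store.grant_read("business", "app-dev")
store.grant_read("app-dev", "app-dev")

combined = store.query("business", ["it-ops", "app-dev"])
```

Here the business team sees infrastructure and application data side by side, while the app-dev team would receive a `PermissionError` if it asked for the `it-ops` bucket.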
Creating value from data
The Metrics as a Service model brings changes that extend beyond data centralization. It shifts monitoring data from tactical and circumstantial decision-making processes to the strategic planning level for an organization. This shift fundamentally affects the way managers, providers and consumers view monitoring data because it provokes a change in perspective led by:
- ROI: This is the ability to quantify the value of data based on its impact on business performance. A potential outcome in this area is the opportunity to turn IT into a profit center.
- Accountability: This is the added visibility into data consumption, which aligns with an ROI approach to monitoring requests.
Stakeholders need to process, analyze and visualize their monitoring data to better understand it and to draw insights and value from it. InfluxDB provides multiple options for querying data, including SQL and Flux (InfluxDB’s native scripting language).
InfluxDB has native dashboarding and visualization tools, but users can also leverage third-party integrations with tools like Grafana to create visualizations. The platform also provides RESTful APIs that give users granular control over their data and the flexibility to extend the capabilities of the platform into other applications and systems.
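As a rough sketch of what programmatic access might look like, the Python snippet below builds, but does not send, an HTTP request that submits a Flux query. It assumes an InfluxDB 2.x-style `/api/v2/query` endpoint with token authentication; the server address, organization, token, bucket and measurement names are all placeholders, and other InfluxDB versions may expose different endpoints.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_flux_query_request(base_url: str, org: str,
                             token: str, flux: str) -> Request:
    """Build (but do not send) an HTTP request that runs a Flux query.

    Assumes an InfluxDB 2.x-style API; check the documentation for the
    version in use before relying on this shape.
    """
    url = f"{base_url.rstrip('/')}/api/v2/query?{urlencode({'org': org})}"
    return Request(
        url,
        data=flux.encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/vnd.flux",
            "Accept": "application/csv",
        },
        method="POST",
    )


# Placeholder bucket and measurement names, for illustration only.
flux_query = '''
from(bucket: "it-ops")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> aggregateWindow(every: 5m, fn: mean)
'''

request = build_flux_query_request(
    "http://localhost:8086", "example-org", "example-token", flux_query
)
```

Passing `request` to `urllib.request.urlopen` would send it, assuming a reachable server and a valid token. Because the interface is plain HTTP, the same query could be issued from a dashboard, a scheduled job or another application.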
This is just the tip of the iceberg when it comes to using a time series platform to build a Metrics as a Service solution. But as a broad framework, this approach holds true across verticals and use cases. Leveraging a purpose-built time series platform to handle critical time series data benefits all stakeholders because it simplifies the process of creating value from data.
Furthermore, a platform based on open source, like InfluxDB, provides a cost-effective way to consolidate monitoring solutions across an organization. The flexibility of a platform like InfluxDB also simplifies the process of centralizing data, encourages cross-functional collaboration and innovation and democratizes data access so key stakeholders have the information they need to make decisions to benefit their organization.