Automating Storage Forecasting Using a Time Series Database Puts the Future in Customers' Hands Today
By Chris Churilo / Jun 18, 2020 / InfluxDB, Community, Developer
When the stakes are high, every decision is only as good as the information behind it. With the right information, enterprises and vital sectors can confidently make informed decisions. Data becomes a foundation for action and a source of differentiation. But how do you store the relentless influx of data, especially when data storage costs, amplified by the risk of data loss, are among the top hurdles facing organizations today?
Proactive support through storage forecasting
Managing the cost of data storage is among the challenges that Veritas Technologies, a leader in data center backup and recovery, has taken on. Veritas Technologies, whose mission is to enable people to harness the power of information, offers a complete family of data protection and long-term retention appliances that ensure data availability and reduce costs and complexity while increasing operational efficiency. Among these are NetBackup appliances (NetBackup is a leading backup and recovery software for enterprises, commonly used for backing up data centers).
Veritas has more than 10,000 deployed NetBackup appliances that actively use AutoSupport and report several types of telemetry data daily for internal use. Previously, Veritas measured problems through their AutoSupport capabilities. They had years of Veritas AutoSupport information and hundreds of millions of telemetry data points from their deployed appliances. But they didn’t have any forecasting analytics that would let them prevent problems from happening. Visibility was a rear-view mirror. If an appliance runs out of storage, the backup fails. A failed backup means that, from that point on, any event in a company's infrastructure carries a risk of data loss.
Veritas needed to proactively reduce downtime with NetBackup appliances in order to lower risk and save cost for its customers, so they built Veritas Predictive Insights. It is a SaaS platform that uses artificial intelligence (AI) and machine learning (ML) to deliver predictive support services for Veritas appliance customers by detecting potential issues and offering prescriptive remediation before problems can occur. Storage forecasting runs in Veritas Predictive Insights to track storage consumption of NetBackup appliances and reduce downtime.
What is forecasting?
Forecasting is the process of making predictions of the future based on present and past data. The key assumption behind forecasting is that the way in which the environment is changing will continue into the future. Since forecasts are error-prone, a forecast is useful only when its error is small enough for the use case being addressed.
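To make the "change will continue" assumption concrete, here is a minimal sketch that fits a straight line to past storage usage and extrapolates it to the point where an appliance would fill up. The function names, the made-up readings and the 50 TB capacity are illustrative assumptions, and a plain linear trend is a stand-in, not the model Veritas uses.

```python
from statistics import mean

def fit_linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def days_until_full(used_tb, capacity_tb):
    """Days after the last observation until the trend line reaches capacity.

    Returns None when usage is flat or shrinking, since the line never
    reaches capacity in that case.
    """
    slope, intercept = fit_linear_trend(used_tb)
    if slope <= 0:
        return None
    day_full = (capacity_tb - intercept) / slope
    return max(0.0, day_full - (len(used_tb) - 1))

# Hypothetical daily readings (TB used) for one storage partition.
daily_used_tb = [40.0, 41.0, 42.0, 43.0, 44.0]
remaining = days_until_full(daily_used_tb, capacity_tb=50.0)
print(remaining)
```

If usage stops growing at the observed rate, the estimate is wrong, which is why the forecast error has to be tracked for each use case.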
Storage forecasting uses
The availability of a vast amount of time series data (collected for use internally from Veritas’ AutoSupport capabilities) enabled forecasting for several use cases. The most important is storage forecasting. Predictive analytics generated by Veritas Predictive Insights, which provide forecasts on probable events using past data, enable visibility and preventive action. Veritas wanted to utilize storage usage forecasts for:
- Resource planning
- Detection of workload anomalies
- Identifying possible data unavailability or SLA violations
- Capitalizing on sales opportunities
Veritas Predictive Insights is built on years of Veritas AutoSupport information and hundreds of millions of telemetry data points from over 10,000 Veritas appliances. Veritas had been using this data before internally, for their support engineers, when they would take a call about a particular problem. The customer could also access this data by logging into their support account. But there were no predictive capabilities from it. They didn’t have the engine.
Solving the challenge of storage forecasting automation
Once Veritas built the hardware setup in their ML platform, they needed to automate storage forecasting. This was challenging: they had more than 10,000 appliances and were forecasting for each type of storage partition on each appliance, so running the forecasts manually was impossible at this data volume.
The challenge was to scale a historically manual process, handcrafted for analyzing a single data series of just tens of data points, to the large-scale processing of thousands of time series and millions of data points. The natural next step? To select a time series database.
Time series forecasting at scale using a time series database
Veritas chose the InfluxDB time series database to implement their solution for tackling the issues of time series forecasting at scale, including continuous accuracy evaluation and algorithm hyperparameter optimization. They use InfluxDB for the storage forecasting implementation in Veritas Predictive Insights, which is capable of training, evaluating and forecasting over 70,000 time series daily. Veritas chose InfluxDB because it is purpose-built for time series data, which made that data easier to work with than it would be in other types of databases.
The custom ML platform architecture
<figcaption> Veritas Technologies’ custom ML platform design</figcaption>
Autonomous AI and ML-based data and infrastructure management
For each appliance, the telemetry data generates a System Reliability Score (SRS), a simple health score produced by an additive machine learning (ML) model. The model aggregates inputs from different ML processes to predict appliance health and displays the results in an easy-to-understand format.
<figcaption> A simple score resulting from complex aggregations. Image source.</figcaption>
The higher the SRS, the better the appliance is operating and the lower the chances of unplanned downtime.
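As a rough illustration of what "additive" means here, the sketch below combines several per-component health scores into one number with a weighted sum, the simplest form of an additive model. The component names, weights and 0-100 scale are assumptions made for the example; the actual inputs and formula behind the SRS are not described in this post.

```python
def system_reliability_score(component_scores, weights):
    """Weighted sum of per-component health scores (each assumed to be 0-100).

    Dividing by the total weight keeps the result on the same 0-100 scale
    even if the weights do not sum to exactly 1.
    """
    total_weight = sum(weights[name] for name in component_scores)
    weighted = sum(score * weights[name] for name, score in component_scores.items())
    return weighted / total_weight

# Hypothetical outputs of separate ML processes for one appliance.
weights = {"storage_forecast": 0.5, "hardware_health": 0.3, "backup_success": 0.2}
scores = {"storage_forecast": 80.0, "hardware_health": 90.0, "backup_success": 100.0}

srs = system_reliability_score(scores, weights)
print(round(srs, 1))
```

Because each component contributes independently, a drop in the overall score can be traced back to the input that caused it.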
Solving the three challenges of forecasting automation
To automate storage forecast implementation, Veritas had to overcome the three challenges of forecasting automation:
1- Determining which model is the best
Selecting the best model for the type of data they have can be done manually because they assume that the data is coming from a similar source. But numerous issues remained, such as how they handle missing values, outliers, trend & seasonality, trend changepoints, and algorithm parameters. They tackle them through algorithm adjustment, advanced detection methods and forecasting tools.
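Two of the issues listed above, missing values and outliers, can be sketched with standard-library Python. This is a generic illustration using linear interpolation and a median-based clipping rule; the function names and the threshold `k` are assumptions, and the post does not say which detection methods Veritas applies.

```python
from statistics import median

def fill_missing(values):
    """Linearly interpolate None gaps; gaps at the edges copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled

def clip_outliers(values, k=3.0):
    """Clamp points farther than k scaled MADs (median absolute deviations) from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against; leave the data unchanged
    limit = k * 1.4826 * mad  # 1.4826 makes the MAD comparable to a standard deviation
    return [min(max(v, med - limit), med + limit) for v in values]

# Hypothetical readings with two gaps and one obviously bad sample.
raw = [10.0, None, 12.0, 13.0, 500.0, 15.0, None]
filled = fill_missing(raw)
cleaned = clip_outliers(filled)
print(cleaned)
```

On a strongly trending series, clipping is better applied to the residuals after removing the trend, since a global median would otherwise flag legitimate growth.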
2- Evaluating the model's accuracy in production
Every time the model is run, an accuracy result is generated. They store this result as a percentage with a timestamp in InfluxDB, so they can monitor model accuracy over time. When model accuracy falls below a certain threshold, they can go back and either change the model or change parameters within the model in order to improve forecasting.
The more data, the more likely that your forecast will be accurate, but the slower the processing. To combat that, you make adjustments to those models, based on the accuracy, and that is the whole purpose of time series data: monitoring performance over time. You want to monitor the performance of models, and that monitoring will trigger a fresh look at the models and their parameters.
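The accuracy-tracking loop described above might look like the following sketch. It assumes accuracy is defined as 100 minus the mean absolute percentage error, which is one common choice and not necessarily the one Veritas uses. The measurement name, tag names, 90% threshold and timestamp are all hypothetical. The point is formatted as InfluxDB line protocol text; actually sending it to a server is left out.

```python
def accuracy_percent(actual, forecast):
    """100 minus the mean absolute percentage error, floored at 0."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0]
    if not errors:
        raise ValueError("need at least one non-zero actual value")
    mape = 100.0 * sum(errors) / len(errors)
    return max(0.0, 100.0 - mape)

def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as InfluxDB line protocol.

    Simplified: float fields only, and names/values are assumed to contain
    no spaces, commas or equals signs (real writers must escape those).
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"

ACCURACY_THRESHOLD = 90.0  # hypothetical cut-off that triggers a model review

actual_tb = [100.0, 200.0, 400.0]
forecast_tb = [90.0, 210.0, 400.0]
accuracy = round(accuracy_percent(actual_tb, forecast_tb), 2)

line = to_line_protocol(
    "forecast_accuracy",                               # hypothetical measurement
    {"appliance": "appl-001", "partition": "backup"},  # hypothetical tags
    {"accuracy_pct": accuracy},
    1592438400000000000,                               # nanosecond timestamp
)
needs_review = accuracy < ACCURACY_THRESHOLD
print(line, needs_review)
```

Tagging each point with the appliance and partition keeps one accuracy series per model, so a threshold check can single out the models that need attention.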
3- Continuously tuning the model
They had to solve several model update problems. They had thousands of models, and each had to be tuned for a specific time series (i.e., they need to tune more than 70,000 models). They also needed to adapt to changes in the underlying process to keep each model accurate. Further, backtesting is too computationally expensive, but online validation data is available. To solve these problems, they rely on a mathematical tool called Sequential Model-Based Optimization (SMBO), which iterates between fitting models and using them to make choices about which configurations to explore. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then choose new hyperparameters to test based on these models.
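The SMBO loop can be sketched in a few dozen lines. The example below tunes one hyperparameter, the smoothing factor of simple exponential smoothing, and is a toy under stated assumptions: the surrogate is a plain inverse-distance-weighted average, the exploration bonus `kappa` is arbitrary, and the data is made up. Production SMBO tools typically use richer surrogates such as Gaussian processes or random forests, and nothing here reflects Veritas' actual implementation.

```python
def one_step_error(series, alpha):
    """Mean absolute one-step-ahead error of simple exponential smoothing."""
    level, total = series[0], 0.0
    for value in series[1:]:
        total += abs(value - level)  # the forecast for this step is the current level
        level = alpha * value + (1 - alpha) * level
    return total / (len(series) - 1)

def surrogate(history, x):
    """Cheap model of the loss: inverse-distance-weighted average of past trials."""
    num = den = 0.0
    for xi, loss in history:
        d = abs(x - xi)
        if d == 0:
            return loss
        w = 1.0 / d ** 2
        num += w * loss
        den += w
    return num / den

def smbo(objective, candidates, n_init=3, n_iter=8, kappa=1.0):
    """Alternate between modelling past results and picking the next value to test.

    Assumes n_init >= 2 and a sorted list of candidate values.
    """
    step = max(1, (len(candidates) - 1) // (n_init - 1))
    history = [(x, objective(x)) for x in candidates[::step][:n_init]]
    for _ in range(n_iter):
        tried = {x for x, _ in history}
        untried = [x for x in candidates if x not in tried]
        if not untried:
            break

        def acquisition(x):
            # Prefer low predicted loss, with a bonus for unexplored regions.
            nearest = min(abs(x - xi) for xi, _ in history)
            return surrogate(history, x) - kappa * nearest

        nxt = min(untried, key=acquisition)
        history.append((nxt, objective(nxt)))  # the only expensive call per round
    return min(history, key=lambda trial: trial[1])

# Hypothetical storage usage series (TB) and candidate smoothing factors.
series = [40.0, 41.0, 42.0, 43.5, 44.0, 45.2, 46.0, 47.1, 48.0, 49.0]
candidates = [i / 20 for i in range(1, 20)]

best_alpha, best_loss = smbo(lambda a: one_step_error(series, a), candidates)
print(best_alpha, best_loss)
```

The appeal at this scale is that each round costs one real evaluation per model, and the rest of the search runs against the cheap surrogate.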
Predictive analytics to provide proactive support
Utilizing encrypted data from thousands of Veritas appliances, Veritas Predictive Insights’ cloud-based AI/ML Engine today detects potential issues and monitors system health to create proactive and prescriptive remediation. Veritas Predictive Insights enhances Veritas product and customer satisfaction and helps customers:
- Increase operational availability
- Resolve potential issues before they occur
- Reduce TCO by optimizing storage investments and avoiding overprovisioning
<figcaption> Veritas Predictive Insights is always on and always learning. Image source.</figcaption>
The technology to know, the knowledge to act
Continuous AI/ML self-learning processes in Veritas’ platform constantly improve insights and accuracy, identifying patterns, predicting trends and optimizing resiliency and utilization with intelligent forecasting and predictive maintenance.
Powered by InfluxDB as its time series database, Veritas Predictive Insights delivers immediate value for both new and existing installations with prescriptive support services that can mitigate problems before they occur.
For Veritas and its NetBackup appliance customers, visibility into the future through predictive analytics has made it possible to act in time, which can make all the difference in organizational decision-making, service and security outcomes.