Mist Clears the Way for Multicloud Observability

This article was originally published in The New Stack.

A multicloud strategy is a necessity for modern businesses, as the recent AWS outages made clear, but managing this infrastructure remains a huge challenge. Infrastructure management teams have long struggled to juggle diverse technology solutions, policies and services just to get a point-in-time view of their resources. The result is either waste through overprovisioning or heavy overhead from painstaking manual management and repetitive tasks.

A new breed of management apps is stepping into the breach to give teams a single platform to observe and manage their resources across clouds. Mist, powered by the time series database InfluxDB, empowers IT teams to manage their infrastructure by providing a unified interface enabled by a flexible REST API.

Background

Mist is an open source multicloud management platform that draws insights from public and private clouds, hypervisors, containers and bare metal servers. In the same UI, it powers easy provisioning, orchestration, monitoring, cost management and automation.

The application was born of frustration: Mist’s cofounders were running an IT consulting firm, and their customers’ infrastructure was spreading across cloud services, which made it a nightmare to manage.

Cofounder Chris Psaltis said they were faced with daily questions from customers about the resources they had, where they were, how to control access and how to automate common processes.

“We had customers around the world, and their infrastructure was all over the place: AWS, on-prem, co-located … In order to make good decisions, you need data, and that’s why monitoring metrics are essential when managing infrastructure. Since the early days of Mist, we integrated with monitoring tools so we can pool those monitoring metrics inside the platform and then help our users make good decisions,” he said.

Contextual data drives smart decisions

A view of overall VM utilization across different cloud services

Mist connects to infrastructure providers with a native API to manage virtual machines and view their performance including CPU usage and load average. It can spin up, provision, troubleshoot and destroy individual VMs, as well as monitor and compare performance across different clouds. It can show how many VMs are on each cloud, how much they cost and their performance. Users can set rules specific to machines or clouds, and dig through logs. They can execute scripts across machines to install applications and make backups, or run apps in clusters across any resources in that cloud.

Mist allows for fine-grained resource permissions, so managers can easily set access based on resource, at the cloud level, globally and via tags. For example, they can set different policies for administrators, dev and QA teams. They can also configure Mist to force their teams to set budgets and manage the destruction of resources to avoid cost overruns. If something goes wrong, the logs are easily accessible in one Mist dashboard.

Though Mist was originally conceived and built on Collectd and Graphite, the team soon migrated to InfluxDB and its collection agent, Telegraf, because they are built for time series data.

Infrastructure monitoring data lifecycle

The architecture of Mist's monitoring subsystem.

The Mist app is built on InfluxDB and uses the time series database as its core for both monitoring and metering data. Data is collected from the target VM or bare-metal server by the Telegraf agent, which sends monitoring data streams into InfluxDB. Metering data is aggregated by Gocky and also fed into InfluxDB. The metering data lets users measure their capacity, for example, data points per machine and per organization. From there, the Mist plugin Cilia acts as a rules engine, evaluating the streaming data against a set of rules and triggering actions such as executing scripts, sending alerts or calling webhooks.
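
The rule-evaluation step of that lifecycle can be illustrated with a minimal Python sketch. It is not Cilia's actual code or data model: the Point and Rule classes, the metric names and the action labels are all hypothetical, chosen only to show how incoming samples might be matched against rules to produce actions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Point:
    """One monitoring sample (hypothetical shape, not Mist's schema)."""
    machine: str
    metric: str
    value: float


@dataclass
class Rule:
    """Hypothetical rule: request `action` when `metric` exceeds `threshold`."""
    metric: str
    threshold: float
    action: str  # illustrative labels such as "alert" or "webhook"


def evaluate(points: List[Point], rules: List[Rule]) -> List[Tuple[str, str]]:
    """Return (machine, action) pairs for every sample that breaks a rule."""
    triggered = []
    for point in points:
        for rule in rules:
            if point.metric == rule.metric and point.value > rule.threshold:
                triggered.append((point.machine, rule.action))
    return triggered


points = [
    Point("vm-1", "cpu_usage", 97.0),
    Point("vm-2", "cpu_usage", 12.5),
    Point("vm-1", "load_avg", 0.7),
]
rules = [Rule("cpu_usage", 90.0, "alert")]
print(evaluate(points, rules))  # [('vm-1', 'alert')]
```

In a real deployment the samples would come from queries against the time series database rather than an in-memory list, and each action label would be dispatched to a script runner, alerting channel or webhook.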

With InfluxDB as its core, Mist’s developers could take advantage of its open source MIT license to build on it and use it in production. They didn’t have to preconfigure metric granularity or preallocate disk space, because InfluxDB does not require either up front. This reduces Mist’s operational burden when it comes to scaling its service and also saves money on hardware.

Using Telegraf also means their users can access its rich array of over 200 plugins to collect data from almost any data source with minimal configuration. Being able to deploy Telegraf as a standalone binary without dependencies also streamlines Mist’s deployment process.

Monitoring VMs in Mist

Users can get started in Mist almost instantly, choosing their cloud from a list of supported providers and pasting an API token to start collecting data. Using an upstream, unmodified version of Telegraf, Mist immediately starts recording performance. If Mist doesn’t have SSH access to the target machine, users can deploy the agent manually.

By default, the dashboard gives users a view of all the resources for that cloud and their costs. From here, users can see network information, file systems, CPU usage, processes, and drill down into individual VMs and logs. They can also write custom Python scripts to collect non-default information specific to their needs; for example, ping times to an IP address.
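
As a rough idea of what such a custom script could look like, the Python sketch below formats a latency reading as InfluxDB line protocol, the text format Telegraf can parse. Several things here are assumptions rather than anything the article specifies: the measurement name ping_time, the tag and field names, and the use of a TCP connection time as a stand-in for an ICMP ping (which usually needs elevated privileges). How Mist hands a custom script to its agent is not described here; one plausible route is a Telegraf input that runs a command and reads its output.

```python
import socket
import time


def tcp_connect_ms(host: str, port: int = 443, timeout: float = 2.0) -> float:
    """Time a TCP handshake in milliseconds (a stand-in for ICMP ping)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def escape_tag(text: str) -> str:
    """Escape commas, equals signs and spaces, as line protocol requires."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(measurement: str, tags: dict, fields: dict,
                     timestamp_ns: int) -> str:
    """Build one line: measurement,tag=value field=value timestamp."""
    tag_part = "".join(
        f",{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_tag(k)}={float(v)}" for k, v in sorted(fields.items())
    )
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Fixed sample values keep the example deterministic and offline.
line = to_line_protocol(
    "ping_time",                   # hypothetical measurement name
    {"target": "192.0.2.10"},      # documentation-range IP address
    {"connect_ms": 12.5},
    1700000000000000000,
)
print(line)  # ping_time,target=192.0.2.10 connect_ms=12.5 1700000000000000000
```

In practice the script would call tcp_connect_ms for the live reading and time.time_ns() for the timestamp, then print the resulting line for the agent to pick up.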

Automating for peak performance

Some common rules and automations in Mist.

The Cilia plugin continuously queries InfluxDB for time-series data and Elasticsearch for log data. Cilia allows users to manage alerts and other workflows based on specific triggers. For example, if there is more than one new machine spinning up per minute, which might indicate overprovisioning, it will send an email to a specific team.
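
The machine-creation example above amounts to a rate check over a sliding window. The Python sketch below shows one way to express it; the class name, its parameters and the in-memory event queue are hypothetical and are not taken from Cilia, which queries its data stores instead. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class CreationRateRule:
    """Fire when more than `max_events` creations fall inside the window."""

    def __init__(self, max_events: int = 1, window_seconds: float = 60.0):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = deque()

    def record(self, timestamp: float) -> bool:
        """Record one machine-creation event; return True if the rule fires."""
        self._events.append(timestamp)
        # Drop events that have aged out of the window.
        while self._events and timestamp - self._events[0] >= self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.max_events


rule = CreationRateRule()
fired = [rule.record(t) for t in (0.0, 90.0, 100.0, 200.0)]
print(fired)  # [False, False, True, False]
```

Only the third event fires, because it arrives 10 seconds after the second; a True result is where an email notification would be sent.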

Mist takes advantage of InfluxDB’s custom logic to identify dynamic thresholds and automatically scale infrastructure. Users can trigger webhooks or restart misbehaving VMs automatically. They can create rules by VM, cloud, group of clouds or tag, and those rules apply automatically to VMs added later, with no further configuration.
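
The article does not say how Mist computes its dynamic thresholds. As an illustration only, one common approach derives the threshold from recent history, for instance the mean plus a multiple of the standard deviation. The function names and the factor k in this Python sketch are assumptions, not Mist's or InfluxDB's API.

```python
from statistics import mean, pstdev
from typing import Sequence


def dynamic_threshold(history: Sequence[float], k: float = 3.0) -> float:
    """Threshold that adapts to recent values: mean + k standard deviations."""
    return mean(history) + k * pstdev(history)


def is_anomalous(history: Sequence[float], latest: float, k: float = 3.0) -> bool:
    """True when the latest sample exceeds the adaptive threshold."""
    return latest > dynamic_threshold(history, k)


cpu_history = [40.0, 42.0, 38.0, 41.0, 39.0]  # illustrative CPU usage samples
print(is_anomalous(cpu_history, 43.0))  # False
print(is_anomalous(cpu_history, 95.0))  # True
```

Unlike a fixed limit, a threshold of this kind moves with the workload, so a machine that normally runs hot is not flagged for behaving as usual.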

Using the InfluxDB platform, Mist developed a powerful app that smooths and democratizes infrastructure management, saving organizations money and paving the way for more automation.

To learn more about InfluxDB, visit www.influxdata.com.

About the author:

Lyndal Cairns, InfluxData

Lyndal Cairns (she/her) has been writing about technology since the internet made screeching noises down a phone line. A former journalist, she approaches her job with curiosity and a deep desire to understand how technologies can help us thrive.