TL;DR InfluxDB Tech Tips: Migrating to InfluxDB Cloud

If you’re an InfluxDB user, you might be considering migrating your workload to InfluxDB Cloud. You probably want to free yourself from the responsibilities associated with managing and serving your own OSS instance. Perhaps you are finding that you simply cannot scale your OSS instance vertically to meet your needs. Maybe you want to use all of the Flux functions that are available to you in InfluxDB Cloud. Perhaps you’ve decided that you want a centralized instance to store IoT data from OSS instances at the edge and you want to take advantage of the InfluxDB Edge Data Replication feature. Whatever the case, it’s important to be aware of scaling considerations when making the move.

If you’re a light OSS user, migrating shouldn’t be a problem, and you should be able to follow the OSS to Cloud migration guide without issues. However, if you’ve been scaling your OSS instance vertically to tackle impressive query and write workloads, then you need to readjust some of your thinking when migrating to Cloud. Specifically, you need to shift from “I’ll just supply more resources to my InfluxDB instance to query high volume data” to “I need to think about how I can leverage the task system in InfluxDB to get the query performance I expect. I need to create a continuous compute solution so that InfluxDB breaks my expensive queries into component parts.”

[Diagram: InfluxDB Cloud vertical scaling]

The diagram above describes the change in approach users must take when executing heavy queries in InfluxDB Cloud. Building a continuous compute solution on top of the task system lets users return the same data they previously obtained with inefficient queries.

Problems for users migrating to InfluxDB Cloud

There are three types of users that migrate to Cloud:

  1. Enterprise users

  2. OSS users

  3. Cloud 1 users

The most common problem that these users face when migrating to InfluxDB Cloud is that they experience query timeouts. For Enterprise and Cloud 1 users, this is because Enterprise and Cloud 1 support vertical and horizontal scaling, while InfluxDB Cloud (Cloud 2) doesn’t offer vertical scaling and queries time out at 2.5 minutes. Similarly, many OSS users dramatically scale their OSS instances vertically by supplying extra compute and storage resources or by increasing the query timeout to accommodate Flux scripts that query and transform large amounts of data. However, these users still want to take advantage of all the following features unique to InfluxDB Cloud:

  • Unified APIs across InfluxDB OSS v2 and InfluxDB Cloud

  • Edge Data Replication feature

  • Immense data processing and transformation capabilities with Flux

  • A managed serverless and elastic service

  • Multi-tenanted, horizontally scalable time series storage. To learn more about how InfluxDB Cloud offers horizontal availability with Kafka, read this blog.

  • A unified developer experience. Enterprise, Cloud 1, or OSS 1.x users no longer need to manage a stack of products to manage their time series data (Telegraf, InfluxDB, Chronograf, and Kapacitor). Neither do they need to learn multiple query languages (InfluxQL, continuous queries, and TICKscripts). Instead they can just use InfluxDB Cloud and Flux.

Avoiding query timeouts with continuous computing

To avoid query timeouts, migrated users must take advantage of the task system and offload query workload to tasks. For example, imagine that you were an OSS user. You used Flux to query the last 30 days of data, downsample it, and then perform a calculation. Your Flux query might look like this:

from(bucket: "<bucket>")
  |> range(start: -30d)
  |> filter(fn: (r) => r["_measurement"] == "<measurement>")
  |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
  |> yield(name: "mean")
  |> map(fn: (r) => ({ r with _value_celsius: (float(v: r._value) - 32.0) * (5.0 / 9.0) }))
  |> yield(name: "converted to celsius")

In this instance, the user is interested in:

  1. Querying over a large amount of data

  2. Viewing the downsampled data, or mean, of that data over 1h periods

  3. Performing a calculation on that downsampled data, i.e. converting the values from Fahrenheit to Celsius.

It’s possible that this user was able to execute this query by extending the query duration above the default 10s with the query manager. Alternatively, maybe they scaled up their VM to provide their OSS instance with more resources to execute this query faster. Either way, these options aren’t available to them in InfluxDB Cloud anymore. Instead, they must partition this work into a series of tasks. These tasks will execute the work in separate steps and write the intermediate values to new buckets. Aside from enabling the user to perform this query at all, splitting a query into tasks offers the following additional advantages:

  • Faster query execution

  • Faster loading of dashboards that rely on the outputs of these intermediate tasks

  • Some data redundancy across different buckets

The user must instead complete this work in two tasks:

  1. The first task will perform the downsampling at a higher frequency and write the output to a new bucket. Additionally, the user should consider creating a downsampling task for each tag instead of downsampling all of the data at once, to reduce the size of each query and break the work down further (a per-tag variant is sketched after the first task below).

  2. The second task will perform the calculation and write the result to a new bucket or a new measurement.

This way, the user only needs to query the output of the second task.

The first task might look like this:

option task = { name: "downsample", every: 1h0m0s, offset: 5m0s }
from(bucket: "<bucket>")
  |> range(start: -task.every)
  |> filter(fn: (r) => r["_measurement"] == "<measurement>")
  |> filter(fn: (r) => r["<tagKey1>"] == "<tagValue>")
  |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
  |> to(bucket: "<downsampled and calculation bucket>")
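
If you follow the suggestion from step 1 and create one downsampling task per tag, each additional task is simply a copy of this script scoped to a different tag value. A minimal sketch, assuming a hypothetical second tag value <tagValue2>:

option task = { name: "downsample tagValue2", every: 1h0m0s, offset: 5m0s }
from(bucket: "<bucket>")
  |> range(start: -task.every)
  |> filter(fn: (r) => r["_measurement"] == "<measurement>")
  // scope this task to a single tag value so each query stays small
  |> filter(fn: (r) => r["<tagKey1>"] == "<tagValue2>")
  |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
  |> to(bucket: "<downsampled and calculation bucket>")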

The second task might look like this:

option task = { name: "calculation", every: 1h0m0s, offset: 5m0s }
from(bucket: "<downsampled and calculation bucket>")
  |> range(start: -task.every)
  |> filter(fn: (r) => r["_measurement"] == "<measurement>")
  |> filter(fn: (r) => r["<tagKey1>"] == "<tagValue>")
  |> map(fn: (r) => ({ r with _value_celsius: (float(v: r._value) - 32.0) * (5.0 / 9.0) }))
  // this will overwrite your downsampled points to include the downsampled value and your celsius calculation
  |> to(bucket: "<downsampled and calculation bucket>")
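
If you would rather write the calculation to a new measurement, as mentioned in step 2 above, one option is to overwrite _value with the converted reading and relabel the measurement with set(). This is a minimal sketch under that assumption; the measurement name <measurement>_celsius is illustrative:

option task = { name: "calculation", every: 1h0m0s, offset: 5m0s }
from(bucket: "<downsampled and calculation bucket>")
  |> range(start: -task.every)
  |> filter(fn: (r) => r["_measurement"] == "<measurement>")
  // convert the downsampled mean from Fahrenheit to Celsius in place
  |> map(fn: (r) => ({ r with _value: (float(v: r._value) - 32.0) * (5.0 / 9.0) }))
  // store the converted series under its own measurement so the raw downsampled points stay untouched
  |> set(key: "_measurement", value: "<measurement>_celsius")
  |> to(bucket: "<downsampled and calculation bucket>")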

After the tasks have been running for a while, the user will be able to query for their data with:

from(bucket: "<downsampled and calculation bucket>")
  |> range(start: -30d)
  |> filter(fn: (r) => r["_measurement"] == "<measurement>")
  |> yield(name: "fast query")

This returns the data without a query timeout. If you want to be able to query over the long range from the start, you might have to perform some of that historical downsampling separately, and you can do so without a task: query for specific tags, incrementally change the range, and write the result to your “downsampled and calculation bucket”, as sketched below.
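
A minimal sketch of that backfill, run as a one-off query rather than a task. The 7-day chunk is an illustrative assumption; pick a chunk size and tag filter that keep each query comfortably under the Cloud timeout, then rerun with earlier start and stop values (-21d to -14d, -28d to -21d, and so on) until the full history is covered:

from(bucket: "<bucket>")
  // backfill one historical chunk at a time
  |> range(start: -14d, stop: -7d)
  |> filter(fn: (r) => r["_measurement"] == "<measurement>")
  |> filter(fn: (r) => r["<tagKey1>"] == "<tagValue>")
  |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
  |> to(bucket: "<downsampled and calculation bucket>")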

Final thoughts on query timeout for migrating users

I hope this blog post inspires you to take advantage of the task system to reduce your query load and get the most out of InfluxDB Cloud. If you are an InfluxDB open source user and need help, please reach out using our community site or Slack channel. If you’re developing a cool IoT application on top of InfluxDB, we’d love to hear about it, so make sure to share it on social using #InfluxDB! Additionally, feel free to reach out to me directly in our community Slack channel to share your thoughts, concerns, or questions. I’d love to get your feedback and help you with any problems you run into!