What to Expect When You’re Expecting InfluxDB: A Guide

Well, you’ve done it. You decided to take the plunge with InfluxDB. While vast and diverse possibilities await, you may have more short-term concerns. Namely: now what?

Getting started looks different for everyone because no two users are doing the exact same thing. This post is primarily aimed at InfluxDB Cloud Dedicated and InfluxDB Clustered users, or users of any other product that includes a support agreement. (If you have questions about that, you can chat with one of our sales folks.)

Our aim is to equip you with best practices and a clear set of expectations from the get-go. For those of us who like to read the instructions before we start putting that IKEA furniture together, this will make sense. If you’re the type to step up to the plate with a bandolier of hex wrenches and a disdain for instructions, this is one project where you may want to take a more cautious approach.

The goal here is to reduce (or eliminate!) headaches or issues when using InfluxDB.

Before you start

One of the most important first steps is making sure you choose the correct product for your workload. InfluxDB has a range of options to fit workloads of any size. While the following tips apply to all users, they’re critical for those with large workloads, which is why we’re focused on InfluxDB Cloud Dedicated and InfluxDB Clustered.

Schema design

Yes, InfluxDB is a schema-on-write database, which is really useful for workloads that change shape. That said, InfluxDB also lets you design your schema up front, and deciding which data maps to tags and which maps to fields can optimize your data collection and analysis. Here are a few things to keep in mind when you’re designing your schema.

  • Number of columns: Currently, an InfluxDB 3.x measurement supports a maximum of 250 columns. One is reserved for timestamps, giving you 249 columns for tags and fields. That’s a lot of columns! (If you need a refresher on InfluxDB’s line protocol data model, go here.) If you need more than 249 columns, you might think about narrowing your schema before you start writing a lot of data.
    • For example, instead of putting all the data from one plant into a single measurement, perhaps create a measurement for each machine in the plant and then roll those up to a single dashboard. That way, you can collect higher-resolution data on each device without running out of columns for your entire plant.


In the left panel, multiple machines send data to the same measurement, increasing the number of columns needed. In the right panel, each machine has its own measurement. The sketch below shows the same idea in line protocol.
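To make that concrete, here is a minimal sketch of the per-machine layout using the open source influxdb3-python client. The host, token, database, measurement, tag, and field names are all hypothetical placeholders; treat this as an illustration of the idea rather than a definitive implementation.

```python
# A minimal sketch, assuming the open source influxdb3-python client
# (pip install influxdb3-python). The host, token, database, and all
# measurement/tag/field names below are hypothetical placeholders.
from influxdb_client_3 import InfluxDBClient3

client = InfluxDBClient3(
    host="YOUR_CLUSTER_HOST",
    token="DATABASE_TOKEN",
    database="factory",
)

# Wide layout: every machine's readings become extra columns in one
# measurement, which eats into the 250-column limit as machines are added.
wide = "plant_metrics,site=plant_a press_01_temp=71.3,press_01_rpm=1200i,lathe_02_temp=65.8"

# Narrower layout: one measurement per machine, so each table keeps a
# small, stable set of columns and new machines don't widen existing tables.
per_machine = [
    "press_01,site=plant_a temp=71.3,rpm=1200i",
    "lathe_02,site=plant_a temp=65.8,rpm=890i",
]

client.write(record=per_machine)
```

Rolling the per-machine measurements back up into a single plant dashboard then happens at query time rather than at write time.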

  • Data types: Make sure you know what type of data your device(s) produce. Different components of InfluxDB line protocol accept different data types. For example, tag keys and values, along with field keys, must be unquoted strings, while field values can be quoted strings, floats, integers, unsigned integers, and booleans.
  • Tags: Tags are metadata that contextualize your data. Tags aren’t required but we strongly recommend using them. For example, if you have robotic arms in multiple plant locations, you would use tags to designate which robot is in what facility.
  • Fields: As mentioned above, you have a lot of flexibility when it comes to field data types. With that said, you need to be careful to avoid field-type conflicts. These occur when a field is mapped to one data type and you try to write a different type to the same field. InfluxDB is schema-on-write, and any undesignated numeric values are parsed by default as floats. Some client libraries may inherit data types from their own typing system and assign them to the field. However, you can explicitly designate field types in raw line protocol or through your client library of choice.
    • If you want to write a numeric value as an integer, append the i suffix to the value, so value=96.0 becomes value=96i (leave the value unquoted; quoted values are treated as strings). If you need to change field data types, you can use Telegraf and the Converter processor plugin to convert data types. (Be sure to read the documentation to ensure the plugin fits your use case.) There’s a short sketch of these type rules after this list.
  • Professional Services: Customers with support contracts can also leverage InfluxDB’s professional services team to get help with schema design and best practices. Remember, it’s better to take advantage of this before you start writing data to prevent additional work down the line. Check with your account manager for more information on this option.
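To illustrate the type rules above, here is a hedged sketch in raw line protocol, written through the same hypothetical influxdb3-python client. Every name is made up; the important details are the quoting rules and the trailing i on integers.

```python
# A minimal sketch, assuming the same hypothetical influxdb3-python setup
# as above. Every name here is illustrative, not a real schema.
from influxdb_client_3 import InfluxDBClient3

client = InfluxDBClient3(host="YOUR_CLUSTER_HOST", token="DATABASE_TOKEN", database="factory")

# Tag keys/values (robot, site) are unquoted strings. Field values carry
# their type in line protocol:
#   temp=96.0        -> float (the default for numeric values)
#   cycle_count=96i  -> integer (note the trailing i)
#   status="ok"      -> string (quoted)
#   enabled=true     -> boolean
line = 'robot_arm,robot=arm_07,site=plant_a temp=96.0,cycle_count=96i,status="ok",enabled=true'

# Writing cycle_count as a float later (cycle_count=96.0) would trigger a
# field-type conflict, so keep each field's type consistent from the first write.
client.write(record=line)
```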

Partitioning

When you query data, the amount of data the database needs to sift through to find what you need impacts the query response time. The more data the query needs to go through, the longer it takes. By default, InfluxDB 3.x partitions data by day and persists that data as Apache Parquet files. Partitioning your data splits it into smaller, logical groups so that queries can target smaller data sets and return results faster.

The best way to partition your data will depend on your data and what kinds of queries you want to run against it. For example, if you’re collecting temperature data from across the country, you might want to partition the data by city, state, region, day, or month. Just be sure you don’t create too many partitions because that can also impact query time. As with anything database-related, you will need to find the balance and weigh the tradeoffs between query response time and storage.
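As a hedged illustration of why this matters, here is a sketch using the same hypothetical influxdb3-python client. The database, measurement, and column names are invented, and the assumption is the default daily partition plus an optional custom partition on the city tag.

```python
# A minimal sketch, assuming the same hypothetical influxdb3-python client.
# The database, measurement, and column names are invented for illustration.
from influxdb_client_3 import InfluxDBClient3

client = InfluxDBClient3(host="YOUR_CLUSTER_HOST", token="DATABASE_TOKEN", database="weather")

# Filtering on the time range (the default daily partition) and on a tag you
# partition by (here, city) lets the engine skip Parquet files for other
# days and cities instead of scanning everything.
sql = """
    SELECT time, temp
    FROM air_temperature
    WHERE city = 'Austin'
      AND time >= now() - INTERVAL '1 day'
"""
table = client.query(query=sql, language="sql")  # returns a PyArrow table
print(table)
```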

Check out the partitioning best practices in our docs for more information.

After you start

For users who have support agreements, like those using InfluxDB Cloud Dedicated and InfluxDB Clustered, the InfluxDB support team will reach out directly to your organization once you’re ready to get going, to make sure all your licenses are in order and, in the case of Dedicated, to provision your cluster. At this time, our team will also confirm who from your organization should have access and get them set up properly. We’ll also schedule a call with one of our Support Engineers to begin the onboarding process; during this initial call, we can help with any schema or partition checks.

Following the initial onboarding, we will proactively schedule a system health check-in. We conduct these on a quarterly basis. The health check-in is an opportunity for us to analyze and understand your production environment(s) and to make sure that we understand how your organization defines success so that we can help you achieve it. This may include discussions around utilization and potential growth opportunities, queries and query optimization, outages, resizing, any other best practices that may help your use case, and a review of any issues or support tickets your organization filed since the last check-in.

At the end of your check-in, we will schedule the next one so that we have a consistent line of communication and ensure you always have a venue to surface questions or issues.

Final thoughts

Plenty of resources exist to help you get the most out of InfluxDB. For those with support contracts, it’s helpful to understand what you can expect from that agreement. Support can be a big difference-maker for some organizations. With regular system health checks, our support team works diligently so you have everything you need to reach your goals.

Learn more about InfluxDB 3.x here.