Forecasting and Visualizing Time Series with Tableau and InfluxDB Cloud
By Anais Dotis-Georgiou / Apr 06, 2023 / InfluxDB Cloud
Data analysis is a crucial aspect of any business or organization because it helps with making informed decisions and improving overall performance. However, with the vast amounts of data generated every day, it can be overwhelming to manually analyze and derive insights from it.
Thankfully, with the advent of powerful data analytics tools such as Tableau and InfluxDB Cloud powered by IOx, organizations can now easily store, process, and analyze their data in real-time. Tableau, a leading data visualization and business intelligence software, provides an intuitive and user-friendly interface for creating interactive visualizations and dashboards. InfluxDB Cloud is a cloud-based platform, built on top of the Apache ecosystem, that enables organizations to collect, store, and analyze time series data at scale. The IOx storage engine provides near-unlimited cardinality, and high-performance storage and querying capabilities.
Combining the power of Tableau and InfluxDB Cloud provides organizations with a powerful toolset for data analysis. This combination enables organizations to quickly create appealing and insightful visualizations, allowing them to derive insights and make data-driven decisions faster. In this blog post, we’ll explore how to use Tableau with InfluxDB Cloud for data analytics. We’ll upload Parquet files from InfluxDB Cloud to Tableau, query with SQL, create some visualizations, and forecast our data with Tableau.
Note: InfluxDB Cloud is built on top of the Apache ecosystem including Apache Arrow Flight SQL which enables interoperability with a variety of visualization tools. However, further development is required before you can use the JDBC Flight Arrow driver to connect Tableau directly to the InfluxDB Cloud server. I encountered the following issue #2415. This post will be updated to use the JDBC Flight Arrow driver when it’s available. However, InfluxDB Cloud users will also be able to query Parquet files directly as well and use the approach described here in the future as well. While these features aren’t currently available, this post aims to shed light on the future of InfluxDB Cloud.
Uploading Parquet files in Tableau
For this tutorial we will be loading a portion of the airSensor sample dataset from InfluxDB Cloud to Tableau. Specifically we’ll be loading temperature, humidity, and carbon monoxide levels for sensor TML0100. You can find the Parquet file from InfluxDB Cloud, the corresponding Parquet to Hyper file conversion script, and the resulting Hyper file in this repo.
In order to ingest Parquet files you must first convert the Parquet file to a Hyper file. To convert our data into a Hyper file we must understand what our current data looks like. Use the following command to view the columns, the types, and whether they’re nullable.
SHOW COLUMNS FROM “airSensors”
This information is important for when we construct a TableDefinition with the Tableau Hyper API.
create_hyper_file_from_parquet_file.py converts Parquet files to Hyper files. Most of it is boilerplate. In order to write our data, focus on the following section of code:
if __name__ == '__main__': try: # The `airSensors` table to read from the Parquet file. table_definition = TableDefinition( table_name="airSensors", columns=[ TableDefinition.Column("co", SqlType.double(), NULLABLE), TableDefinition.Column("humidity", SqlType.double(), NULLABLE), TableDefinition.Column("sensor_id", SqlType.text(), NULLABLE), TableDefinition.Column("temperature", SqlType.double(), NULLABLE), TableDefinition.Column("time", SqlType.timestamp(), NOT_NULLABLE), ] ) run_create_hyper_file_from_parquet( "airSensors.parquet", table_definition, "airSensors.hyper")
This is where we define our pathways for the source Parquet file, “airSensors.parquet”, and destination pathway of our Hyper file, “airSensors.hyper”. This is also where we make our TableDefinition, a SQL table definition. We use the information from our
SHOW COLUMNS command to inform our TableDefinition for our table, airSensors, as defined in the table_name variable. See all supported SqlTyes types here. If the script runs successfully, you should see the following terminal output.
python3 create_hyper_file_from_parquet.py COPY "airSensors" FROM 'airSensors.parquet' WITH (FORMAT PARQUET) -- 2888 rows have been copied from 'airSensors.parquet' to the table "airSensors" in 'airSensors.hyper'.
Visualize your time series data from InfluxDB in Tableau
To make a time series graph in Tableau, we must first load our Hyper file into Tableau. In the To a file section, select More… then select the Hyper file and click Open.
Now you can view your Data Source and the data there. Click on Sheet 1 to make a time series visualization. Specifically we’ll graph carbon monoxide levels over time.
In Sheet 1, drag and drop the Co values into the Rows shelf and the Time into the Columns shelf. To view the data with the correct timestamp resolution, right click and select More. By default we’ll visualize the sum across all of the sensors.
To view the raw Co values for each sensor, drag the
sensor_id into the Marks panel and set it as a Dimension by right clicking on it. Next, click on the icon to the right of
sensor_id and select Color to produce the four-colored circle icon and assign a color to each carbon monoxide time series based on the sensor value.
Finally, right click on the Co value in the Rows shelf and change it from a Measure to an Attribute to produce ATTR(Co) instead of SUM(Co).
Forecasting in Tableau
Tableau has many capabilities, including forecasting. One of the benefits of forecasting in Tableau is that it performs the algorithm selection automatically, so you can focus on taking action on forecasted data rather than generating forecasts. Now, let’s generate a forecast for the sum of our Co values by following this documentation. Specifically, go to the Analytics tab and drag Forecast onto your data. If you get the error “A forecast cannot be computed because there are too many missing values” try reducing the resolution of your data by changing the time resolution to MINUTE(Time) in the Columns shelf. If you encounter other errors while forecasting your data in Tableau, consult the following documentation.
I recommend using the following resources when working with Tableau, Parquet, and Hyper files:
- The repo that accompanies this tutorial.
- Tableau Hyper API documentation.
- Tableau Hyper API samples, a repo containing both Tableau-Supported Hyper API Samples and Community-Supported Hyper API Samples for converting data into Hyper files.
- Resolving Forecasting Errors documentation.
- Create a Forecast documentation. To take advantage of all the advancements with InfluxDB IOx, sign up here. If you would like to contact the InfluxDB IOx developers, join the InfluxData Community Slack and look for the #influxdb_iox channel.
I hope this blog post inspires you to explore InfluxDB Cloud and gets you excited about its future — especially the ability to transport large datasets from InfluxDB for data processing using the tools of your choice with the help of Parquet. If you’re interested in this interoperability, please reach out using our community site or Slack channel. I’d love to hear about what you’re trying to achieve and what features you’d like InfluxDB Cloud to have.