<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>InfluxData Blog - Rick Spencer</title>
    <description>Posts by Rick Spencer on the InfluxData Blog</description>
    <link>https://www.influxdata.com/blog/author/rick-spencer/</link>
    <language>en-us</language>
    <lastBuildDate>Wed, 08 Nov 2023 08:00:00 +0000</lastBuildDate>
    <pubDate>Wed, 08 Nov 2023 08:00:00 +0000</pubDate>
    <ttl>1800</ttl>
    <item>
      <title>Write a Database in 50 Lines of Code</title>
      <description>&lt;h2 id="writing-your-own-database"&gt;Writing your own database&lt;/h2&gt;
&lt;p&gt;Writing a database isn’t something that most people set out to do on a whim. There are a ton of different components and things to think about to build something that is truly functional and performant.&lt;/p&gt;

&lt;p&gt;But what if it wasn’t so difficult? What if you could use a set of tools like building blocks, interchanging them in a modular fashion, to write something that simply worked? There can be a lot of ‘reinventing the wheel’ when it comes to databases, so the less time you need to spend on table stakes, the more time you can spend on the features you really need.&lt;/p&gt;

&lt;p&gt;This is where open source and the community of developers that contribute to open source projects come into play. I thought it would be an interesting exercise to see if I could build a database using open source tools and as little code as possible.&lt;/p&gt;

&lt;h2 id="the-fdap-stack"&gt;The FDAP stack&lt;/h2&gt;
&lt;p&gt;Underpinning my thinking on this is a great article my colleague Andrew Lamb wrote on the &lt;a href="https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/"&gt;FDAP stack&lt;/a&gt; (&lt;a href="https://www.influxdata.com/glossary/apache-arrow-flight-sql/"&gt;Apache Arrow Flight&lt;/a&gt;, &lt;a href="https://www.influxdata.com/glossary/apache-datafusion/"&gt;DataFusion&lt;/a&gt;, &lt;a href="https://www.influxdata.com/glossary/apache-arrow/"&gt;Arrow&lt;/a&gt;, and &lt;a href="https://www.influxdata.com/glossary/apache-parquet/"&gt;Parquet&lt;/a&gt;). While he was working on that, I was experimenting separately with the Flight Python libraries. In many cases, these libraries represent the best way for users to interact with &lt;a href="https://www.influxdata.com/products/influxdb-overview/"&gt;InfluxDB 3&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After some learning, I put the simplest example of implementing an &lt;a href="https://www.influxdata.com/glossary/olap/"&gt;OLAP&lt;/a&gt; database using Python into a few lines of code. This demonstrates the power and simplicity of the Apache Arrow project. The result really highlights the power of FDAP for creating domain-specific analytic databases.&lt;/p&gt;

&lt;p&gt;This is possible because the Apache Arrow project maintains Python bindings and libraries for pretty much everything. So, starting from Python, I can access the underlying, high-performance code maintained upstream. This is particularly notable in the case of DataFusion, a library written in Rust for executing &lt;a href="https://www.influxdata.com/glossary/sql/"&gt;SQL&lt;/a&gt; queries against Arrow tables. As you will see below, I wrote about eight lines of code, which allowed me to load a Parquet file and execute SQL queries against its contents in a feature-complete and high-performance way.&lt;/p&gt;

&lt;p&gt;This is a trivial example and the code is inefficient, but it proves the power of a healthy upstream project, and the benefits to the community of working upstream.&lt;/p&gt;

&lt;h2 id="flightserver"&gt;FlightServer&lt;/h2&gt;
&lt;p&gt;There is an upstream class called FlightServerBase that implements all of the basic functionality to serve the standard Flight gRPC endpoints. Apache Arrow Flight is a protocol that allows users to send commands to the database and receive Arrow data back. Using FlightServerBase has some important advantages. First, data returns to the client as Arrow, which we can efficiently convert to Pandas or Polars. This means we can support high-performance analytics.&lt;/p&gt;

&lt;p&gt;Second, most popular programming languages have a Flight client library. As a result, users don’t need bespoke client tools, and you don’t need to build or support those tools either.&lt;/p&gt;

&lt;p&gt;To implement such a simple Flight server, you only need to implement two of the myriad methods on FlightServerBase (a minimal skeleton follows the list):&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;do_put(): This function receives a FlightDescriptor object and a MetadataRecordBatchReader. While that is a mouthful of a class name, it is really just an object that allows the server to read the data that a user passes in. It also supports nice things like batching.&lt;/li&gt;
  &lt;li&gt;do_get(): This is the function that processes a Flight Ticket, executes a query, and returns the results. As you will see, adding SQL support for this is dead easy.&lt;/li&gt;
&lt;/ol&gt;
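
&lt;p&gt;To make the shape of the server concrete, here is a minimal sketch (the class name is illustrative and not from the original code; the port matches the client examples below):&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;# Minimal sketch of a Flight server; only the two methods below matter here.
import pyarrow.flight as flight

class ParquetFlightServer(flight.FlightServerBase):
    def __init__(self, location="grpc://0.0.0.0:8081"):
        super().__init__(location)

    def do_put(self, context, descriptor, reader, writer):
        # Receive data from a client write; fleshed out below.
        ...

    def do_get(self, context, ticket):
        # Execute a query and stream Arrow back; fleshed out below.
        ...

if __name__ == "__main__":
    ParquetFlightServer().serve()
&lt;/code&gt;&lt;/pre&gt;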

&lt;h4 id="doput"&gt;do_put()&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;do_put&lt;/code&gt; function receives a FlightDescriptor and a Reader, and uses those two objects to add the data to the existing Parquet file.&lt;/p&gt;

&lt;p&gt;While this write implementation is trivial – it is non-performant and lacks any sort of schema validation – its simplicity is also instructive.&lt;/p&gt;

&lt;p&gt;To respond to a write, the code extracts the table name from the FlightDescriptor, and loads the data passed from the Reader into memory. It also calculates the file path for the Parquet file where the data is persisted.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;table_name = descriptor.path[0].decode('utf-8')
data_table = reader.read_all()
file_path = f"{table_name}.parquet"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the case where there is already data with that table name, the code loads the existing data into a table, then concatenates the new and old tables.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;if os.path.exists(file_path):
    try:
        existing_table = pq.read_table(file_path)
        data_table = pa.concat_tables([data_table, existing_table])
    except Exception as e:
        print(e)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Next, it writes the table to a Parquet file:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;try:
pq.write_table(data_table, file_path)
except Exception as e:
print(e)
&lt;/code&gt;&lt;/pre&gt;
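
&lt;p&gt;Putting those three pieces together, the whole method might look like the following sketch (the imports and method signature are assumptions based on the pyarrow Flight API, not copied from the original repo):&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;import os

import pyarrow as pa
import pyarrow.flight as flight
import pyarrow.parquet as pq

class ParquetFlightServer(flight.FlightServerBase):
    def do_put(self, context, descriptor, reader, writer):
        # Derive the table name and target Parquet file from the descriptor.
        table_name = descriptor.path[0].decode('utf-8')
        data_table = reader.read_all()
        file_path = f"{table_name}.parquet"

        # If the table already exists on disk, append by concatenating.
        if os.path.exists(file_path):
            try:
                existing_table = pq.read_table(file_path)
                data_table = pa.concat_tables([data_table, existing_table])
            except Exception as e:
                print(e)

        # Persist the combined table back to Parquet.
        try:
            pq.write_table(data_table, file_path)
        except Exception as e:
            print(e)
&lt;/code&gt;&lt;/pre&gt;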

&lt;p&gt;Here is some client code that creates some data, converts it to Arrow tables, creates a FlightClient, and calls &lt;code&gt;do_put()&lt;/code&gt; to write the data:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;import json
import io
from pyarrow.flight import FlightDescriptor, FlightClient
import pyarrow as pa

data = [{"col1":3, "col2":"one"},
{"col1":3, "col2":"two"}]

table = pa.Table.from_pylist(data)

descriptor = FlightDescriptor.for_path("mytable")
client = FlightClient("grpc://localhost:8081")

writer, _ = client.do_put(descriptor, table.schema)
writer.write_table(table)
print(f"wrote: {table}")
writer.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that these are all upstream objects in the Apache Arrow community. We don’t need any custom libraries. If you run the example &lt;code&gt;write.py&lt;/code&gt; file, you will see that it works fine.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;% python3 write.py                
wrote: pyarrow.Table
col1: string
col2: int64
----
col1: [["one","two"]]
col2: [[1,2]]
&lt;/code&gt;&lt;/pre&gt;

&lt;h4 id="doget"&gt;do_get()&lt;/h4&gt;
&lt;p&gt;At this point, you might be thinking: “That’s cool and all, but writing files to disk is not hard.” That may be true, but what about writing a SQL implementation that can query that data? Have you ever written a parser, a planner, and everything else needed to make a SQL implementation work? That’s not so easy.&lt;/p&gt;

&lt;p&gt;In this section, you can see how – with just a few lines of code – you can use DataFusion to add a full-featured and fast query experience.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;do_get()&lt;/code&gt; function receives a ticket object, which has the information needed to execute the query. In this case, that includes the table name and the SQL query itself.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;ticket_obj = json.loads(ticket.ticket.decode())
sql_query = ticket_obj["sql"]
table_name = ticket_obj["table"]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then, it creates a DataFusion SessionContext and reads the Parquet file into it.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;ctx = SessionContext()
ctx.register_parquet(table_name, f"{table_name}.parquet")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Finally, it executes the query and returns the result.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;result = ctx.sql(sql_query)
table = result.to_arrow_table()
return flight.RecordBatchStream(table)
&lt;/code&gt;&lt;/pre&gt;
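
&lt;p&gt;Assembled, with the imports it needs (assuming the &lt;code&gt;datafusion&lt;/code&gt; Python package provides &lt;code&gt;SessionContext&lt;/code&gt;), &lt;code&gt;do_get()&lt;/code&gt; might look like this sketch:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;import json

import pyarrow.flight as flight
from datafusion import SessionContext

class ParquetFlightServer(flight.FlightServerBase):
    def do_get(self, context, ticket):
        # Unpack the table name and SQL query from the ticket.
        ticket_obj = json.loads(ticket.ticket.decode())
        sql_query = ticket_obj["sql"]
        table_name = ticket_obj["table"]

        # Register the Parquet file with DataFusion and run the query.
        ctx = SessionContext()
        ctx.register_parquet(table_name, f"{table_name}.parquet")
        result = ctx.sql(sql_query)

        # Stream the Arrow result back to the client.
        table = result.to_arrow_table()
        return flight.RecordBatchStream(table)
&lt;/code&gt;&lt;/pre&gt;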

&lt;p&gt;Here is some client code that uses the pyarrow library to execute a query and outputs the results as a Pandas DataFrame:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;from pyarrow.flight import Ticket, FlightClient
import json

client = FlightClient("grpc://localhost:8081")
ticket_bytes = json.dumps({'sql':'select * from mytable', 'table':'mytable'}).encode('utf-8')
ticket = Ticket(ticket_bytes)
reader = client.do_get(ticket)
print(reader.read_all().to_pandas())
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;% python3 read.py 
  col1  col2
0  one     1
1  two     2
2  one     1
3  two     2
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It’s important to note that it takes a mere eight lines of code to add DataFusion support. In doing so, you get an incredibly fast and complete SQL engine. For example, you can see all the SQL statements DataFusion supports in its &lt;a href="https://arrow.apache.org/datafusion/user-guide/sql/index.html"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here is a slightly modified SQL query you can also try:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;'select mean(col2) as mean from mytable'&lt;/code&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;% python3 read.py
   mean
0   1.5
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id="we-value-open-source"&gt;We value open source&lt;/h2&gt;
&lt;p&gt;At InfluxData, one of our core company values is “We Value Open Source.” We demonstrate this value in different ways, such as our support for the &lt;a href="https://www.influxdata.com/time-series-platform/telegraf/"&gt;Telegraf&lt;/a&gt; project and the fact that we always offer a permissively licensed open source version of InfluxDB. With the introduction of &lt;a href="https://www.influxdata.com/products/influxdb-overview/"&gt;InfluxDB 3&lt;/a&gt;, the way we value open source expanded and deepened.&lt;/p&gt;

&lt;p&gt;I’m sure you can think of many companies that bill themselves as the “Company Behind” some open source technology. This model typically entails the company supporting an open source version that is free to use, permissively licensed, and with limited functionality. The company also offers one or more commercial versions of its software with more functionality. In these cases, the open source version largely functions as a lead generation tool for the company’s commercial products.&lt;/p&gt;

&lt;p&gt;InfluxData, the company behind InfluxDB, is no different. For both versions 1 and 2 of InfluxDB, InfluxData also released paid commercial versions.&lt;/p&gt;

&lt;p&gt;True, the InfluxDB source code may be interesting to anyone wishing to build their own database, but that code has limited utility beyond InfluxDB itself. It was useful for fixing bugs and extending the database, but if you wanted to create a new type of database, like a location database, the code for our time series database wasn’t a useful starting point. In other words, the core code and components of the database weren’t inherently reusable.&lt;/p&gt;

&lt;p&gt;This is where things really changed with InfluxDB 3. Yes, InfluxData is still the company behind InfluxDB. Yes, &lt;a href="https://www.influxdata.com/blog/the-plan-for-influxdb-3-0-open-source/"&gt;open source InfluxDB 3&lt;/a&gt; is already in the works. Yes, &lt;a href="https://www.influxdata.com/products/influxdb-overview/"&gt;we offer commercial versions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, a significant amount of the code that we write now goes into the upstream Apache Arrow project. The Apache Arrow project, and specifically the &lt;a href="https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/"&gt;FDAP&lt;/a&gt; stack, &lt;em&gt;is&lt;/em&gt; meant to be a starting point for anyone who wants to create a new type of database. This means that our contributions to the Apache Arrow project benefit community members seeking to create their own innovations in the database field.&lt;/p&gt;

&lt;h2 id="try-it-yourself"&gt;Try it yourself&lt;/h2&gt;
&lt;p&gt;If you want to play around with the code and client examples, you can find them in &lt;a href="https://github.com/InfluxCommunity/exambledb/blob/main/app.py"&gt;this GitHub repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I hope that this basic database example demonstrates the power of the “We Value Open Source” idea. Supporting open source doesn’t have to be limited to contributions to our own projects; hopefully, our contributions to, and support of, the upstream Apache Arrow project reflect that commitment. We also hope that the availability of these high-quality, performant tools inspires others to innovate around specific domains in the database space.&lt;/p&gt;
</description>
      <pubDate>Wed, 08 Nov 2023 08:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/write-database-50-lines-code/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/write-database-50-lines-code/</guid>
      <category>Developer</category>
      <author>Rick Spencer (InfluxData)</author>
    </item>
    <item>
      <title>How We Did It: Data Ingest and Compression Gains in InfluxDB 3.0</title>
      <description>&lt;p&gt;A few weeks ago, we published some &lt;a href="https://www.influxdata.com/resources/InfluxDB-3-0-vs-oss-tech-paper" title="InfluxDB 3.0 Benchmarks"&gt;benchmarking&lt;/a&gt; that showed performance gains in &lt;a href="https://www.influxdata.com/products/influxdb-overview/"&gt;InfluxDB 3.0&lt;/a&gt; that are orders of magnitude better than previous versions of InfluxDB – and by extension, other databases as well. There are two key factors that influence these gains: 1. Data ingest, and 2. Data compression. This begs the question, just how did we achieve such drastic improvements in our core database?&lt;/p&gt;

&lt;p&gt;This post sets out to explain how we accomplished these improvements for anyone interested.&lt;/p&gt;

&lt;h2 id="a-review-of-tsm"&gt;A review of TSM&lt;/h2&gt;
&lt;p&gt;To understand where we ended up, it’s important to understand where we came from. The storage engine that powered InfluxDB 1.x and 2.x was something we called the InfluxDB Time-Structured Merge Tree (TSM). Let’s take a brief look at data ingest and compression in TSM.&lt;/p&gt;

&lt;h3 id="tsm-data-model"&gt;TSM data model&lt;/h3&gt;

&lt;p&gt;There’s a lot of information out there about the TSM data model, but if you want a quick overview, check out the information in the TSM docs &lt;a href="https://docs.influxdata.com/influxdb/v1/concepts/storage_engine/"&gt;here&lt;/a&gt;. Using that as a primer, let’s turn our attention to data ingest.&lt;/p&gt;

&lt;h3 id="tsm-ingest"&gt;TSM ingest&lt;/h3&gt;
&lt;p&gt;TSM uses an inverted index to map metadata to a specific series on disk. If a write operation adds a new measurement or tag key/value pair, then TSM updates the inverted index. TSM needs to write this data in the proper place on disk, essentially indexing it when written. This whole process requires a significant amount of CPU and RAM.&lt;/p&gt;

&lt;p&gt;The TSM engine sharded files by time, which allowed the database to enforce retention policies, evicting expired data.&lt;/p&gt;

&lt;h3 id="introduction-of-tsi"&gt;Introduction of TSI&lt;/h3&gt;
&lt;p&gt;Prior to 2017, InfluxDB calculated the inverted index on startup, maintained it during writes, and kept it in memory. This led to very long startup times, so in the autumn of 2017, we introduced the Time Series Index (TSI), which is essentially the inverted index persisted to disk. This created another challenge, however, because the size of TSI on disk could become very large, especially for high &lt;a href="https://www.influxdata.com/glossary/cardinality/"&gt;cardinality&lt;/a&gt; use cases.&lt;/p&gt;

&lt;h3 id="tsm-compression"&gt;TSM compression&lt;/h3&gt;
&lt;p&gt;TSM uses run length encoding for compression. This approach is very efficient for metrics use cases where data timestamps occur at regular intervals. InfluxDB was able to store the starting time and time interval, and then calculate each timestamp at query time based only on the row count. Additionally, the TSM engine could use run length encoding on the actual field data. So, in cases where data did not change frequently and the timestamps were regular, InfluxDB could compress data very efficiently.&lt;/p&gt;
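
&lt;p&gt;To make that concrete, here is an illustrative sketch of the idea (not TSM’s actual implementation): regular timestamps collapse to a start time, an interval, and a row count, and each timestamp is recomputed at query time:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;def compress_timestamps(timestamps):
    """Store regular timestamps as (start, interval, count)."""
    interval = timestamps[1] - timestamps[0]
    assert all(b - a == interval for a, b in zip(timestamps, timestamps[1:]))
    return timestamps[0], interval, len(timestamps)

def decompress_timestamps(start, interval, count):
    """Recompute every timestamp from the row count alone."""
    return [start + i * interval for i in range(count)]

# Ten readings at a regular 10-second interval collapse to three integers.
ts = list(range(1_000_000, 1_000_100, 10))
assert decompress_timestamps(*compress_timestamps(ts)) == ts
&lt;/code&gt;&lt;/pre&gt;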

&lt;p&gt;However, use cases with irregular timestamp intervals, or where the data changed with nearly every reading, reduced the effectiveness of compression. The TSI complicated matters further because it could get very large in high cardinality use cases. This meant that InfluxDB hit practical compression limits.&lt;/p&gt;

&lt;h2 id="introduction-of-influxdb-30"&gt;Introduction of InfluxDB 3.0&lt;/h2&gt;
&lt;p&gt;When we embarked upon architecting InfluxDB 3.0, we were determined to solve these limitations related to ingest efficiency and compression, and to remove cardinality limitations to make InfluxDB effective for a wider range of time series use cases.&lt;/p&gt;

&lt;h3 id="data-model"&gt;3.0 Data model&lt;/h3&gt;
&lt;p&gt;We started by rethinking the data model from the ground up. We retain the notion of separating data into databases, but rather than persisting individual time series, InfluxDB 3.0 persists data by table. In the 3.0 world, a “table” is analogous to a “measurement” in InfluxDB 1.x and 2.x.&lt;/p&gt;

&lt;p&gt;InfluxDB 3.0 shards each table on disk by day and persists that data in the &lt;a href="https://www.influxdata.com/glossary/apache-parquet/"&gt;Parquet&lt;/a&gt; file format. Visualizing that data on disk looks like a set of Parquet files. The database’s default behavior generates a new Parquet file every fifteen minutes. Later, the compaction process can take those files and coalesce them into larger files where each file represents one day of data for a single measurement. The caveat here is that we limit the size of each Parquet file to 100 megabytes, so heavy users may have multiple files per day.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/1D6S5BElimUnlrZ6kvEDak/ac6eb181db2dcd544db0c1547998afdd/image2.png" alt="parquet file partitions" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Parquet file partitioning and data model example for InfluxDB 3.0&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;InfluxDB 3.0 retains the notion of tags and fields; however, they play a different role in 3.0. A unique set of tag values and a timestamp identifies a row, which enables InfluxDB to update it with an UPSERT on write.&lt;/p&gt;

&lt;p&gt;Now that we’ve explained a bit about the new data model, let’s turn to how it allows us to improve data ingest efficiency and compression.&lt;/p&gt;

&lt;h3 id="alternate-partitioning-options"&gt;Alternate partitioning options&lt;/h3&gt;
&lt;p&gt;We designed InfluxDB 3.0 to perform analytical queries (i.e. queries that summarize across a large number of rows) and optimized the default partitioning scheme for this. However, some users always query a subset of tag values for certain measurements; for example, every query might include a specific customer ID or sensor ID. Due to the way it indexes data and persists it to disk, TSM handles this type of query very well. This is not the case with InfluxDB 3.0, so those using the default partitioning scheme may experience a regression in performance for these specific queries.&lt;/p&gt;

&lt;p&gt;The solution to this in InfluxDB 3.0 is custom partitioning. Custom partitioning allows the user to define a partitioning scheme based on tag keys and tag values, and to configure the time range for each partition. This approach enables users to achieve similar query performance for these specific query types in InfluxDB 3.0 while retaining its ingest and compression benefits.&lt;/p&gt;

&lt;h2 id="influxdb-30-ingest-path"&gt;InfluxDB 3.0 ingest path&lt;/h2&gt;
&lt;p&gt;The data model for InfluxDB 3.0 is simpler than previous versions because each measurement groups all of its data together instead of separating it out by series. This streamlines the data ingest process. When InfluxDB 3.0 handles a write, it validates the schema and then finds the in-memory structure holding any other data for that measurement. The database then appends the data to the measurement and returns a success or failure to the user. At the same time, it appends the data to a write ahead log (WAL). We call these in-memory structures “mutable batches” or “buffers” for reasons that will become clear below.&lt;/p&gt;

&lt;p&gt;This process requires fewer compute resources than other databases, including InfluxDB 1.x and 2.x, because:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The ingest process does not sort or otherwise order data; it simply appends it.&lt;/li&gt;
  &lt;li&gt;The ingest process delays data deduplication until persistence.&lt;/li&gt;
  &lt;li&gt;The ingest process uses the buffer tree as an index. This index lives within each specific ingester instance and identifies the data required for specific queries. We were able to make this buffer tree extremely performant using hierarchical/sharded locking and reference counting to eliminate contention.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/QD3V5XVPzmoAqkT8mxVsX/c228a4f3d31fbcd0a657059163703380/image1.png" alt="Data Ingest Path" /&gt;
&lt;em&gt;&lt;a href="https://www.influxdata.com/blog/influxdb-3-0-system-architecture/"&gt;Source&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When the mutable batches run out of memory, InfluxDB persists the data in the buffers as Parquet files in the object store. By default, InfluxDB will persist all the data into Parquet every 15 minutes even if the memory buffer hasn’t filled. This is the point where InfluxDB 3.0 sorts and deduplicates data. Deferring this work off the ‘hot/write path’ keeps latency and variance low.&lt;/p&gt;

&lt;p&gt;Streamlining the ingest process in this way means that the database does less work on the hot/write path. The result is a process that moves expensive operations to persist time, so that ingest requires a minimal amount of CPU and RAM, even for significant write workloads.&lt;/p&gt;

&lt;p&gt;For completeness, there is also a write ahead log (WAL) that is flushed periodically, ensuring all writes are immediately durable; it is used to reconstitute the mutable batches in cases where the container fails. InfluxDB 3.0 only replays WAL files in unclean shutdown or crash situations. In the ‘happy path’ (e.g., an upgrade), the system gracefully stops, flushes (i.e., persists) all buffered data to object storage, and deletes all WAL entries. As a result, startup is fast, because InfluxDB doesn’t have to replay the WAL. Furthermore, in non-replicated deployments, the data that would otherwise sit on the WAL disk of an offline node is actually in object storage and readable, preserving read availability.&lt;/p&gt;

&lt;h4 id="leading-edge-queries"&gt;Leading edge queries&lt;/h4&gt;
&lt;p&gt;We also optimized InfluxDB 3.0 for querying the leading edge of data. Most queries, especially time sensitive ones, query the most recently written data. We call these “leading edge” queries.&lt;/p&gt;

&lt;p&gt;When a query comes in, InfluxDB converts the data to &lt;a href="https://www.influxdata.com/glossary/apache-arrow/"&gt;Arrow&lt;/a&gt;, which then gets queried by the &lt;a href="https://www.influxdata.com/glossary/apache-datafusion/"&gt;DataFusion&lt;/a&gt; query engine. In cases where all the data being queried already exists in the mutable batches, the ingester serves the querier the data in Arrow format, and then DataFusion quickly performs any necessary sorting and deduplicating before returning the data to the user.&lt;/p&gt;

&lt;p&gt;In cases where some or all of the data is not in the read buffer, InfluxDB 3.0 uses its catalog of Parquet files – stored in a fast relational database – to find the specific files and rows to load into memory in the Arrow format. Once loaded into memory, DataFusion is able to quickly query this data.&lt;/p&gt;

&lt;p&gt;&lt;img style="padding: 20px 0;" src="//images.ctfassets.net/o7xu9whrs0u9/5owQ1ra6r2c5KRyTq0Bp7u/a158e521b1f343d765882adc93e0b681/image4.png" alt="Querier" height="auto" width="450" /&gt;&lt;/p&gt;

&lt;h4 id="compaction"&gt;Compaction&lt;/h4&gt;
&lt;p&gt;A series of compaction processes help maintain the data catalog. These processes optimize the stored Parquet files by ordering and deduplicating the data. This makes finding and reading Parquet files on disk efficient.&lt;/p&gt;

&lt;h4 id="data-compression"&gt;Data compression&lt;/h4&gt;
&lt;p&gt;There are a few key reasons why we selected the Apache Arrow ecosystem, including Parquet, for InfluxDB 3.0. These formats were designed from the ground up to support high performance analytical queries on large data sets. Because they’re designed for columnar data structures, they can also achieve significant compression. The Apache Arrow community, which we’re proud to be a part of and contribute to, continues to improve these technologies.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/27kn9gcPf7qY1lVWSWvM6e/a668510b8d99f3a962f80569bd8d5ecc/image3.png" alt="Parquet file format" /&gt;
&lt;em&gt;Parquet file format&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So, because InfluxDB 3.0 starts with Arrow’s columnar structure, it inherits significant compression benefits. Using Parquet compounds these benefits. Furthermore, because InfluxDB is a time series database, we can make some assumptions about time series data that allow us to get the most out of these compression techniques.&lt;/p&gt;

&lt;p&gt;For example, we knew that retaining the notion of tag key/value pairs in InfluxDB 3.0 meant that dictionary encoding on these columns would yield the best compression. In this context, dictionary encoding assigns a number to each tag value. This number takes up a small number of bytes on disk. Then, Parquet can run length encode those numbers for each tag key column. On top of this encoding scheme, InfluxDB can apply general purpose compression (e.g., gzip, zstd) to compress the data even further.&lt;/p&gt;
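
&lt;p&gt;As an illustration (this is pyarrow used directly, not InfluxDB’s internal code), a low-cardinality tag column can be dictionary-encoded and written to Parquet with general purpose compression on top:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;import pyarrow as pa
import pyarrow.parquet as pq

# A repetitive tag column dictionary-encodes to small integer codes.
tags = pa.array(["sensor_a", "sensor_a", "sensor_b"] * 1000).dictionary_encode()
values = pa.array([0.1, 0.2, 0.3] * 1000)
table = pa.table({"sensor_id": tags, "value": values})

# Parquet can run length encode the dictionary codes, and zstd
# compresses the file even further.
pq.write_table(table, "tags.parquet", compression="zstd")
&lt;/code&gt;&lt;/pre&gt;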

&lt;p&gt;The overall combination of Arrow and Parquet results in significant compression gains. When you combine those gains with the fact that InfluxDB 3.0 relies on object storage for historical data, users can store a lot more data, in less space, for a fraction of the cost.&lt;/p&gt;

&lt;h2 id="wrap-up"&gt;Wrap up&lt;/h2&gt;
&lt;p&gt;Hopefully, you found this explanation of how InfluxDB 3.0 achieves such impressive ingest efficiency and compression interesting. In conjunction with the separation of compute and storage, customers can achieve significant total cost of ownership advantages over other databases, including InfluxDB 1.x and 2.x!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.influxdata.com/get-influxdb/"&gt;Try InfluxDB 3.0&lt;/a&gt; to see how these performance and compression gains impact your applications.&lt;/p&gt;
</description>
      <pubDate>Wed, 04 Oct 2023 07:35:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/improved-data-ingest-compression-influxdb-3-0/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/improved-data-ingest-compression-influxdb-3-0/</guid>
      <category>Product</category>
      <category>Company</category>
      <author>Rick Spencer (InfluxData)</author>
    </item>
    <item>
      <title>Announcing InfluxDB Clustered: InfluxDB 3 for Self-Managed Environments</title>
      <description>&lt;p&gt;Today, we’re excited to announce &lt;a href="https://www.influxdata.com/products/influxdb-clustered"&gt;InfluxDB Clustered&lt;/a&gt;, our latest product developed on the InfluxDB 3 product suite. InfluxDB Clustered is the evolution of InfluxDB Enterprise, our popular self-managed product for large-scale time series workloads. For enterprises, the performance leap from InfluxDB Enterprise to InfluxDB Clustered is orders of magnitude higher with significant improvements across analytics, storage, and costs.&lt;/p&gt;

&lt;p&gt;Like the rest of the &lt;a href="https://www.influxdata.com/products/influxdb-overview/"&gt;InfluxDB 3&lt;/a&gt; product suite, InfluxDB Clustered delivers the same high throughput for data writes and reads, support for unlimited data &lt;a href="https://www.influxdata.com/glossary/cardinality/"&gt;cardinality&lt;/a&gt;, real-time data analysis, and native &lt;a href="https://www.influxdata.com/glossary/sql/"&gt;SQL&lt;/a&gt; support for large time series workloads. InfluxDB 3 is developed in Rust and built on the &lt;a href="https://www.influxdata.com/glossary/apache-arrow/"&gt;Apache Arrow&lt;/a&gt; ecosystem (&lt;a href="https://www.influxdata.com/glossary/apache-datafusion/"&gt;DataFusion&lt;/a&gt;, &lt;a href="https://www.influxdata.com/glossary/apache-parquet/"&gt;Parquet&lt;/a&gt;, &lt;a href="https://www.influxdata.com/glossary/apache-arrow-flight-sql/"&gt;Flight&lt;/a&gt;). Since Apache Arrow is forming the core of a new set of large-scale analytics tools, InfluxDB 3 is able to immensely benefit from the inherent interoperability with this next-generation toolset.&lt;/p&gt;

&lt;h2 id="self-managed-for-large-scale-workloads"&gt;Self-managed for large-scale workloads&lt;/h2&gt;

&lt;p&gt;InfluxDB Clustered departs from the &lt;a href="https://www.influxdata.com/products/influxdb-cloud/serverless/"&gt;InfluxDB Cloud Serverless&lt;/a&gt; and &lt;a href="https://www.influxdata.com/products/influxdb-cloud/dedicated/"&gt;InfluxDB Cloud Dedicated&lt;/a&gt; products released earlier this year in that it is a self-managed product. This gives you ultimate control over your time series database, making it well-suited to meet enterprise and compliance requirements. InfluxDB Clustered runs where you need it – on-premises, in your private cloud, or in self-managed public cloud environments. This flexibility comes from the fact that we deliver InfluxDB Clustered as a collection of Kubernetes-based containers with decoupled, independently scalable ingest and query tiers.&lt;/p&gt;

&lt;p&gt;This high availability and scalability give you the ability to build and iterate technical infrastructure to meet your specific needs. Clustered allows you to scale your cluster up or down in size at will. Need to scale up for a few hours or days to accommodate an anticipated usage spike? Clustered lets you do that, too. No matter what your security or data residency requirements may be, InfluxDB Clustered can handle them.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/5OgiP2m0YmmgOc3RJ7yAmj/b0adcddd3e461c4e4afc46f7203f8af3/influxdb-clustered.png" alt="influxdb-clustered" width="600" height="auto" /&gt;&lt;/p&gt;

&lt;h2 id="high-performance-with-unlimited-scale"&gt;High performance with unlimited scale&lt;/h2&gt;

&lt;p&gt;Developed on Apache Arrow for high-performance analytical queries, InfluxDB Clustered – like the rest of the InfluxDB 3 product suite – can handle high-speed, high-volume analytics in real-time. This includes managing high cardinality data without impacting performance.&lt;/p&gt;

&lt;p&gt;Several factors play into this performance, one of which is the separation of compute and storage. Clustered’s self-managed configuration means that you, as the user, can scale the components of the database to best suit the specific needs of your data. Drilling down a bit further on the storage front, we also introduced multiple storage tiers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ingested data hits the hot storage tier first and it’s immediately available for querying.&lt;/strong&gt; There’s no need to wait for batching or other processing on leading-edge data. This enables queries to be &lt;a href="https://www.influxdata.com/blog/influxdb-3-0-is-2.5x-45x-faster-compared-to-influxdb-open-source/"&gt;45x faster&lt;/a&gt; than previous versions of InfluxDB. The hot storage tier consists of the data that you’re actually using, which can include data retrieved from cold storage as well (more on that in a moment). Combining this hot storage approach with a &lt;a href="https://www.influxdata.com/blog/influxdb-3-0-is-2.5x-45x-faster-compared-to-influxdb-open-source/"&gt;45x better data ingest rate&lt;/a&gt; and the ability to handle unlimited cardinality data means that users can derive insights on large datasets in real-time without degrading database performance.&lt;/p&gt;

&lt;p&gt;Clustered also improves the way it handles historical data. The cold storage tier consists of low-cost cloud object storage. InfluxDB moves historical data out of the hot tier to the cold tier for long-term storage. &lt;strong&gt;This historical data is always available and there are no additional fees for retrieving data from cold storage for current queries.&lt;/strong&gt;&lt;/p&gt;

&lt;h2 id="reduction-in-storage-costs"&gt;90% reduction in storage costs&lt;/h2&gt;

&lt;p&gt;Storage is a big concern when it comes to time series data. This is because sources produce massive volumes of time series data, especially at enterprise scale. Companies that want to get the maximum value from this data need to analyze it in real-time, but they also need to hold onto it so they can use it for historical or predictive analysis.&lt;/p&gt;

&lt;p&gt;With InfluxDB Clustered, organizations don’t need to choose between storing data and value-driven data analysis. InfluxDB 3 reduces storage costs by &lt;a href="https://www.influxdata.com/blog/influxdb-3-0-is-2.5x-45x-faster-compared-to-influxdb-open-source/"&gt;90% or more&lt;/a&gt;, allowing you to store more data using less space and at a fraction of the cost. One of the big factors in this reduction is the use of low-cost cloud object storage mentioned above.&lt;/p&gt;

&lt;p&gt;There’s another key factor in the storage/cost equation though, which is data compression. There are two main components to data compression in InfluxDB 3. First is the shift to a columnar database. This allows the database to compress each column on an individual basis. Because the data in each column is often similar, the per-column compression can be significantly greater.&lt;/p&gt;

&lt;p&gt;At the same time, InfluxDB uses Apache Parquet as its data persistence format. Parquet is a file format designed to work with columnar data structures, and it uses those structures to organize homogeneous data for better compression. It can use both dictionaries and run length encoding to efficiently compress and store repeated values.&lt;/p&gt;

&lt;p&gt;So, the combination of cheaper object storage and more highly compressed data means that you can retain more data, using less space, for less money.&lt;/p&gt;

&lt;h2 id="enterprise-grade-security-and-compliance"&gt;Enterprise-grade security and compliance&lt;/h2&gt;

&lt;p&gt;As always, InfluxDB encrypts data in transit by default. Users can expect to see the addition of enhanced security features in the very near future, too. These include private networking options, single sign-on (SSO), audit logging, high availability, and attribute-based access control (ABAC).&lt;/p&gt;

&lt;h2 id="get-started-today"&gt;Get started today&lt;/h2&gt;

&lt;p&gt;Going from InfluxDB Enterprise to InfluxDB Clustered is a gigantic leap forward. For a long time, users had to make difficult trade-offs between performance, data retention, and costs.&lt;/p&gt;

&lt;p&gt;InfluxDB Clustered (and the rest of the InfluxDB 3 products) virtually eliminates those challenges. It delivers real-time performance, on leading-edge (and historical) data, while lowering TCO. Not only does this mean that you can do more with your data, but, because you manage your own infrastructure with InfluxDB Clustered, you can make more cost-effective decisions that reduce initial startup costs and long-term maintenance and overhead needs.&lt;/p&gt;

&lt;p&gt;We’re so excited about getting these capabilities into the hands of our users. To get started, &lt;a href="https://www.influxdata.com/contact-sales-influxdb-clustered"&gt;request a proof of concept&lt;/a&gt; and our team of experts will contact you.&lt;/p&gt;
</description>
      <pubDate>Wed, 06 Sep 2023 05:30:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/announcing-influxdb-clustered/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/announcing-influxdb-clustered/</guid>
      <category>Product</category>
      <author>Rick Spencer (InfluxData)</author>
    </item>
    <item>
      <title>Save 96% on Data Storage Costs</title>
      <description>&lt;h2 id="painful-tradeoffs-between-utility-and-costs"&gt;Painful tradeoffs between utility and costs&lt;/h2&gt;

&lt;p&gt;Users with real-time and other analytic workloads want or need to keep large volumes of historical data to aid in important activities, such as ad hoc historical trend analysis and training AI models.&lt;/p&gt;

&lt;p&gt;However, storing this much data in a way that also makes it easily queryable becomes prohibitively expensive. As a result, users must balance data availability and usability with sacrificing data fidelity and storage costs.&lt;/p&gt;

&lt;p&gt;That is until now. With &lt;a href="https://www.influxdata.com/products/influxdb-overview/"&gt;InfluxDB 3.0&lt;/a&gt;, users don’t need to choose between the data they need and storage costs. We designed it so that users can keep and query large quantities of data in a cost-effective manner.&lt;/p&gt;

&lt;h2 id="how-influxdb-works"&gt;How InfluxDB works&lt;/h2&gt;

&lt;p&gt;InfluxDB 3.0 uses &lt;a href="https://www.influxdata.com/glossary/apache-parquet/"&gt;Parquet&lt;/a&gt; files as the underlying file storage. InfluxDB engineers spent two years fine-tuning InfluxDB 3.0 to squeeze every bit of efficiency out of Parquet. The result of their efforts is that data stored in InfluxDB 3.0 uses less space on disk for time series data than any other database.&lt;/p&gt;

&lt;p&gt;Not satisfied with compression gains alone, the InfluxDB engineers took it a step further, designing InfluxDB 3.0 for a distributed environment. That means that InfluxDB can store those Parquet files in inexpensive object storage, while maintaining high performance querying of that data. This effectively eliminates the need to make trade-offs between data storage costs, availability, and fidelity.&lt;/p&gt;

&lt;h2 id="the-math-is-simple"&gt;The math is simple&lt;/h2&gt;

&lt;p&gt;So, why does the title claim to save up to 96% on storage costs? We didn’t pull the 96% figure out of thin air. It comes from a real customer’s experience. This customer was collecting large amounts of IoT device data using the TSM storage engine in InfluxDB OSS 1.8 and 2.0.&lt;/p&gt;

&lt;p&gt;Let the cost of storing the data on attached disk be x. InfluxDB 3.0 compresses the same data down to half the space on disk, and object storage costs roughly 7.69% as much as attached disk. So:&lt;/p&gt;

&lt;p&gt;new_cost&lt;br /&gt;
= x * (1 / compression_factor) * object_storage_cost_ratio&lt;br /&gt;
= x * (1/2) * 7.69%&lt;br /&gt;
~= x * 4%&lt;/p&gt;

&lt;p&gt;In other words, storage costs drop to about 4% of the original, a savings of roughly 96%.&lt;/p&gt;
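
&lt;p&gt;A quick sanity check of that arithmetic in Python:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;compression_factor = 2        # InfluxDB 3.0 stores the data in half the space
object_vs_disk_cost = 0.0769  # object storage is ~7.69% the cost of attached disk

new_cost_ratio = (1 / compression_factor) * object_vs_disk_cost
print(f"new cost: {new_cost_ratio:.1%} of the original")  # ~3.8%
print(f"savings:  {1 - new_cost_ratio:.1%}")              # ~96.2%
&lt;/code&gt;&lt;/pre&gt;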

&lt;p&gt;It’s important to note here that InfluxDB 1.8 already compresses data efficiently on disk. If the customer had been using a solution like ClickHouse, the savings would be even more pronounced.&lt;/p&gt;

&lt;p&gt;The benefits of InfluxDB 3.0 over its open source predecessors are dramatic and significant. &lt;a href="https://www.influxdata.com/get-influxdb/"&gt;Try it for yourself&lt;/a&gt;.&lt;/p&gt;
</description>
      <pubDate>Wed, 05 Jul 2023 07:35:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/save-96-percent-on-data-storage-costs/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/save-96-percent-on-data-storage-costs/</guid>
      <category>Product</category>
      <author>Rick Spencer (InfluxData)</author>
    </item>
    <item>
      <title>Introducing InfluxDB 3.0: Available Today in InfluxDB Cloud Dedicated</title>
      <description>&lt;p&gt;It’s been literally years now that I have been first tangentially, and then intimately involved with the project that has become InfluxDB 3.0. I started using it so early that one of the DataFusion upstream developers literally calls me “User0” … a moniker of which I am not-so-secretly proud. Now, after those years of development, I am really happy to have a small role in sharing the work of the InfluxData team, and the wider Apache Arrow community with the world, as we roll out InfluxDB 3.0, and I sincerely expect that both existing and new users will find this release very useful.&lt;/p&gt;

&lt;h2 id="introducing-influxdb-30-the-evolution-of-influxdb-iox"&gt;Introducing InfluxDB 3.0: the evolution of InfluxDB IOx&lt;/h2&gt;

&lt;p&gt;As of today, &lt;a href="https://www.influxdata.com/products/influxdb-overview"&gt;InfluxDB 3.0&lt;/a&gt; now serves as the foundation for all InfluxDB products, both current and future, bringing high performance, unlimited cardinality, SQL support, and low-cost object store to the InfluxDB platform for the first time. Developed in Rust as a columnar database, InfluxDB 3.0 introduces support for the full range of time series data (metrics, events, and traces) in a single datastore to power use cases in observability, real-time analytics, and IoT/IIoT that rely on high-cardinality time series data.&lt;/p&gt;

&lt;p&gt;InfluxDB 3.0 is now available in InfluxData’s cloud products: &lt;a href="https://www.influxdata.com/products/influxdb-cloud/serverless"&gt;InfluxDB Cloud Serverless&lt;/a&gt; (our fully managed, elastic, multi-tenant database) and &lt;a href="https://www.influxdata.com/products/influxdb-cloud/dedicated"&gt;InfluxDB Cloud Dedicated&lt;/a&gt; (a fully managed, single-tenant version of InfluxDB), which we announced today. Stay tuned for our self-managed product coming later this year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;InfluxDB 3.0 Clustered:&lt;/strong&gt; The evolution of InfluxDB Enterprise.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/MM2l2D6mxcyKOPhMZjIdP/3a666c905cf446b2c76a890cdc98e558/InfluxDB-3.0-02.png" alt="InfluxDB 3.0 diagram" width="750" height="auto" /&gt;&lt;/p&gt;

&lt;h3 id="influxdb-cloud-dedicated-now-generally-available"&gt;InfluxDB Cloud Dedicated now Generally Available&lt;/h3&gt;

&lt;p&gt;InfluxDB Cloud Dedicated is an ideal solution for customers working with large data sets, who require the reassurance and security of data isolated in a dedicated, single tenant cluster. It offers custom configuration and enhanced security options (including Enterprise SSO, private connectivity, and Role-Based Access Controls) and a capacity-based pricing model.&lt;/p&gt;

&lt;h2 id="optimizing-influxdb-30-for"&gt;Optimizing InfluxDB 3.0 for…&lt;/h2&gt;

&lt;p&gt;If you fall into one of the following categories, we think you’ll want to check out InfluxDB 3.0:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;You’re an existing InfluxDB OSS user — InfluxDB 3.0 is likely to run your existing workload faster and cheaper with minimal changes. Plus, it gives you access to new functionality and the ability to use InfluxDB with more kinds of data.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;You’re not an existing InfluxDB user, but you need an analytics database with real-time capabilities for a large volume of data, or you’re struggling to get the most out of your existing analytics database. InfluxDB 3.0 will fulfill your needs.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id="performance-with-unlimited-scale"&gt;Performance with unlimited scale&lt;/h2&gt;

&lt;p&gt;InfluxDB 3.0 goes beyond InfluxDB 1.x and 2.x in some important ways. Enhancements to InfluxDB 3.0 bring InfluxDB to the forefront of analytics databases, allowing developers to ingest and query full fidelity time series data of all types, in real-time at scale, and with no compromises.&lt;/p&gt;

&lt;p&gt;InfluxDB 3.0 now supports unlimited &lt;a href="https://www.influxdata.com/glossary/cardinality/"&gt;cardinality&lt;/a&gt;, which expands the use cases for InfluxDB to any time-stamped data. Unlike other analytics databases, InfluxDB 3.0 features massive gains in ingest performance, scalability, resilience, and efficiency, even as data complexity and cardinality increase.&lt;/p&gt;

&lt;p&gt;For example, compared to previous versions of InfluxDB, the new InfluxDB 3.0 delivers performance gains in the following areas:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;100x faster queries&lt;/strong&gt; across high cardinality data, delivering real-time query response&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;10x ingest performance&lt;/strong&gt; to ingest, store, and analyze billions of time series data points per second without limitations or caps&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;10x greater data compression&lt;/strong&gt; from using the Apache Parquet file format, which is designed for efficient data storage and retrieval&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id="influxdb-in-the-arrow-ecosystem"&gt;InfluxDB in the Arrow ecosystem&lt;/h2&gt;

&lt;p&gt;We developed InfluxDB IOx – and by extension, InfluxDB 3.0 – around the &lt;a href="https://www.influxdata.com/glossary/apache-arrow/"&gt;Apache Arrow&lt;/a&gt; Project, an open source, in-memory specification for columnar data, which is the gold standard for high performance computing for analytics use cases. We built the InfluxDB IOx engine on Arrow to take advantage of its performance and ecosystem.&lt;/p&gt;

&lt;p&gt;InfluxDB 3.0 now uses the &lt;a href="https://www.influxdata.com/glossary/apache-parquet/"&gt;Apache Parquet&lt;/a&gt; file format for storing data. Parquet’s compression achieves orders of magnitude gains in efficient use of disk space. Having the ability to store more data in less space is important for controlling costs, as well as overall efficiency for large analytic workloads.&lt;/p&gt;

&lt;p&gt;Leveraging &lt;a href="https://www.influxdata.com/glossary/apache-datafusion/"&gt;Apache DataFusion&lt;/a&gt;, InfluxDB 3.0 has a modern and blazing-fast &lt;a href="https://www.influxdata.com/glossary/sql/"&gt;SQL&lt;/a&gt; implementation. Because it is based on open standards, you can bring your existing SQL knowledge and tools to your InfluxDB experience. We even enhanced DataFusion’s SQL dialect to include key time series functions.&lt;/p&gt;

&lt;p&gt;We also brought InfluxQL, InfluxData’s time series query language, forward into DataFusion. Now, InfluxQL runs faster than ever.&lt;/p&gt;

&lt;p&gt;At InfluxData, we believe in the Apache Arrow ecosystem. True to our open source ethos, our engineers made significant contributions to upstream Arrow projects to ensure that performance and capabilities meet the standards of InfluxDB and its dedicated user base. The introduction of InfluxDB 3.0 brings time series data to the Arrow ecosystem for the first time, allowing analytics workloads to more easily incorporate time series data. This also makes open source contributions easier to build and integrate.&lt;/p&gt;

&lt;h2 id="get-started-with-influxdb-30-today"&gt;Get started with InfluxDB 3.0 today&lt;/h2&gt;

&lt;p&gt;To get started with InfluxDB 3.0, try &lt;a href="https://cloud2.influxdata.com/signup"&gt;InfluxDB Cloud Serverless&lt;/a&gt;, or &lt;a href="https://www.influxdata.com/contact-sales-cloud-dedicated"&gt;request a proof of concept for InfluxDB Cloud Dedicated&lt;/a&gt; today.&lt;/p&gt;
</description>
      <pubDate>Wed, 26 Apr 2023 06:30:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/introducing-influxdb-3-0/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/introducing-influxdb-3-0/</guid>
      <category>Product</category>
      <category>Company</category>
      <author>Rick Spencer (InfluxData)</author>
    </item>
    <item>
      <title>Why InfluxDB Cloud, Powered by IOx is a Big Deal to Me</title>
      <description>&lt;p&gt;From time to time throughout my career, I have been involved in projects with dramatic releases when we built and delivered something very new and very special. The release of InfluxDB Cloud, powered by IOx (referred to as “InfluxDB IOx” for short below) absolutely meets those criteria. I want to explain my personal views of why this release is so impactful and why I am so excited to be part of it.&lt;/p&gt;

&lt;p&gt;For more information on the motivation and technologies that went into building this new database engine, check out the blog posts &lt;a href="https://www.influxdata.com/blog/announcing-influxdb-iox/"&gt;here&lt;/a&gt;, &lt;a href="https://www.influxdata.com/blog/influxdb-engine/"&gt;here&lt;/a&gt;, and &lt;a href="https://www.influxdata.com/blog/understanding-influxdb-iox-commitment-open-source/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First and foremost, we designed InfluxDB IOx to be fast for large time series workloads. This means:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Fast queries for the leading edge of data. Typically, but not always, this equates to the last two hours of data, though all queries should show a significant performance boost.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Fast ingest of massive amounts of data.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The smallest possible on-disk footprint.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Users no longer need to make compromises with their data: they can write as much data as necessary and query it efficiently. As a technologist and Product Manager, what could be more exciting than delivering the things that users want the most?&lt;/p&gt;

&lt;h1 id="expanded-approach-to-open-source"&gt;Expanded approach to open source&lt;/h1&gt;

&lt;p&gt;I have been an open source contributor and community member since at least 2008. InfluxData’s commitment to open source is one of the core things that attracted me to the company. InfluxData considers itself an open source company, a belief held so firmly that we encoded it in our company value statements. In the past, this entailed working in the open and maintaining open source software that was ready for users to pick up and use.&lt;/p&gt;

&lt;h2 id="apache-arrow-project"&gt;Apache Arrow project&lt;/h2&gt;

&lt;p&gt;However, with InfluxDB IOx, we went beyond simply delivering code for our projects. Instead, we are working upstream, contributing to the &lt;a href="https://www.influxdata.com/glossary/apache-arrow/"&gt;Apache Arrow&lt;/a&gt; project, with a special focus on &lt;a href="https://www.influxdata.com/glossary/apache-datafusion/"&gt;Apache DataFusion&lt;/a&gt;, the SQL query engine for Apache Arrow.&lt;/p&gt;

&lt;p&gt;What does this mean for the open source community? While it was always nice that users could see, tinker with, and contribute to InfluxDB code, in practice this had limited utility. For example, what if you wanted to build a different kind of database, say a location database? Sure, you could get some ideas by looking at the InfluxDB code base, but it wouldn’t help you actually build your database. The Apache Arrow project fundamentally changes what’s possible by providing the actual components you can extend and assemble into your own high performance database, tailored to your specific requirements. This approach to open source truly enables the community.&lt;/p&gt;

&lt;p&gt;Doing work that both strengthens the company, but also creates technical and economic opportunity for the wider community, is deeply satisfying for me.&lt;/p&gt;

&lt;h2 id="embracing-standards"&gt;Embracing standards&lt;/h2&gt;

&lt;p&gt;Embracing emerging standards means we don’t have to provide every piece of functionality ourselves. Users have a range of tools and services that they prefer, and by eschewing a walled garden approach, we can cooperate with other developers and companies to help satisfy all types of users.&lt;/p&gt;

&lt;h3 id="sql-query-language"&gt;SQL query language&lt;/h3&gt;

&lt;p&gt;Clearly, SQL isn’t an “emerging standard,” but the route we chose to implement SQL support is. As described in detail elsewhere, IOx uses DataFusion for querying, and DataFusion uses SQL as the query language. This investment in DataFusion means that anyone who knows even a little bit of SQL can query time series data in InfluxDB, and SQL experts can make heavy use of that same data. Additionally, when other contributing developers from the community and other companies improve DataFusion, those improvements flow into InfluxDB.&lt;/p&gt;

&lt;h3 id="parquet"&gt;Parquet&lt;/h3&gt;

&lt;p&gt;InfluxDB IOx stores files in Parquet as the native file format. Parquet is part of the Apache Arrow project and as such the two technologies work together very well. Parquet files deliver significant data compression, especially the way InfluxDB IOx uses them. Parquet is also an open standard with many high quality libraries to read and write Parquet files. As such, Parquet is emerging as the standard file format for interchanging large analytical datasets, whether that means running jobs on the data in place, or moving it to a service. Using Parquet makes it possible for services that rely on large datasets, such as anomaly detection, AI/ML, visualizations, etc., to easily work with InfluxDB IOx, and to do so without a time- or compute-intensive export step.&lt;/p&gt;
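
&lt;p&gt;For example, any Parquet-aware tool can read the data directly (the file name here is hypothetical, and this assumes pandas with a Parquet engine installed):&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;import pandas as pd

# Read a Parquet file into a DataFrame directly; no export step required.
df = pd.read_parquet("mytable.parquet")
print(df.describe())
&lt;/code&gt;&lt;/pre&gt;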

&lt;h3 id="flight-sql"&gt;Flight SQL&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.influxdata.com/glossary/apache-arrow-flight-sql/"&gt;Flight SQL&lt;/a&gt; is also a standard in the Apache Arrow project. It is a client/server protocol for ingesting SQL statements and returning results in Arrow. Any database is free to implement Flight SQL. As a result, by supporting a small set of drivers, Flight SQL enables almost any dashboard, visualization, or BI tool.&lt;/p&gt;

&lt;p&gt;For example, we created a Grafana-to-Flight SQL plugin and are in the process of contributing that to Grafana, so anyone who uses a database that supports Flight SQL can use that plugin. Similarly, we created a Flight SQL SQLAlchemy dialect so that anyone can use it to communicate between their database and Apache Superset.&lt;/p&gt;

&lt;p&gt;You can find the repos where we contributed these resources here:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Grafana-to-FlightSQL plugin &lt;a href="https://github.com/influxdata/grafana-flightsql-datasource"&gt;here&lt;/a&gt; (awaiting official inclusion in Grafana’s plugin library)&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Superset support via an upstream adapter that we wrote &lt;a href="https://github.com/influxdata/flightsql-dbapi"&gt;here&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A good quality JDBC Flight SQL driver and an ODBC Flight SQL driver already exist upstream. You can use these drivers with a plethora of tools to unlock access to your data. If you have teams that want to use tools like Tableau or PowerBI, they can use these established drivers to access data in InfluxDB.&lt;/p&gt;

&lt;p&gt;Truth be told, I am particularly excited about our support for Apache Superset. This is a full-featured, combination BI/dashboarding tool that is part of the Apache foundation. It is a very accessible Python codebase, and is well-supported by several large companies. I highly recommend &lt;a href="https://docs.influxdata.com/influxdb/cloud-iox/visualize-data/superset/"&gt;trying Superset&lt;/a&gt; if you haven’t already.&lt;/p&gt;

&lt;h1 id="querying-data-in-influxdb-iox"&gt;Querying data in InfluxDB IOx&lt;/h1&gt;

&lt;p&gt;We designed InfluxDB IOx from the ground up to support SQL queries, and for those queries to be fast. But standard SQL lacks some core time series capabilities. Because of our investment in DataFusion, we were able to implement some of these core time series functions directly upstream. Not only does this ensure that these functions are performant, but it also means that anyone in the Apache Arrow community can benefit from them and even contribute improvements to them.&lt;/p&gt;

&lt;p&gt;Currently, there are three time-series-specific functions in DataFusion:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;date_bin()&lt;/strong&gt; - creates rows that are time windows of data with an aggregate.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;selector_first()&lt;/strong&gt;, &lt;strong&gt;selector_last()&lt;/strong&gt; - provide the first and last rows of a table that meet specific criteria.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;time_bucket_gapfill()&lt;/strong&gt; - returns windowed data, filling in any windows that lack data.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
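
&lt;p&gt;To make the first of these concrete, here is a minimal sketch of calling &lt;strong&gt;date_bin()&lt;/strong&gt; through DataFusion’s Python bindings. It assumes the &lt;code&gt;datafusion&lt;/code&gt; package and a hypothetical Parquet file with &lt;code&gt;time&lt;/code&gt; and &lt;code&gt;reading&lt;/code&gt; columns; exact signatures can vary between DataFusion releases:&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;# A minimal sketch, assuming the `datafusion` Python package is installed
# and sensors.parquet (hypothetical) has `time` and `reading` columns.
from datafusion import SessionContext

ctx = SessionContext()
ctx.register_parquet("sensors", "sensors.parquet")

# date_bin() assigns each row to a fixed time window (10 minutes here),
# so the aggregate runs once per window instead of once per row.
df = ctx.sql("""
    SELECT
        date_bin(INTERVAL '10 minutes', time) AS window,
        avg(reading) AS avg_reading
    FROM sensors
    GROUP BY date_bin(INTERVAL '10 minutes', time)
    ORDER BY window
""")
df.show()&lt;/code&gt;&lt;/pre&gt;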

&lt;h2 id="arrow-libraries"&gt;Arrow libraries&lt;/h2&gt;

&lt;p&gt;So far I have discussed how the InfluxDB IOx engine leverages the Apache Arrow project to enable users to write time-series-specific queries, using a familiar language (SQL) and their tools of choice, with low-latency queries that act on the leading edge of data. But the Apache Arrow and Flight SQL combination brings another critical advantage.&lt;/p&gt;

&lt;p&gt;Remember that Apache Arrow is designed to move large amounts of columnar data and to allow tools to operate effectively on that data. Therefore, upstream libraries in the Apache Arrow project allow users to query large amounts of data, efficiently bring it onto their clients, and operate on that data in interesting ways.&lt;/p&gt;

&lt;p&gt;For me, the most exciting aspect of this is that pyarrow, the Python Arrow library, has built-in Pandas support. Pandas is by far the most popular data manipulation library, used by developers, business analysts, AI/ML practitioners, and more. Libraries such as Plotly Express and Neural Prophet are just a couple of examples of richly functional libraries built on Pandas. Long-time Python users will welcome the easy interoperability between InfluxDB IOx and these libraries. That said, Arrow has libraries available for &lt;a href="https://arrow.apache.org/docs/"&gt;many different languages&lt;/a&gt;.&lt;/p&gt;
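
&lt;p&gt;The handoff is a single call. A minimal sketch, with a small in-memory table standing in for the Arrow data a Flight SQL query would return:&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;# A minimal sketch of pyarrow's built-in Pandas support (requires pandas).
# In practice `table` would come back from a Flight SQL query; here a tiny
# in-memory table stands in for it.
import pyarrow as pa

table = pa.table({
    "time": [1, 2, 3],
    "reading": [72.0, 71.5, 73.2],
})

# One call hands the columnar data to Pandas, where the whole ecosystem
# of Pandas-based libraries can take over.
df = table.to_pandas()
print(df.describe())&lt;/code&gt;&lt;/pre&gt;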

&lt;p&gt;Remember, we designed InfluxDB IOx to process large time series workloads quickly, and its support for the Arrow libraries carries that goal all the way to your client code.&lt;/p&gt;

&lt;h1 id="writing-data"&gt;Writing data&lt;/h1&gt;

&lt;p&gt;While we made a lot of changes to InfluxDB, from a user’s perspective, we did not change the way data gets written to InfluxDB. This is because we optimized InfluxDB for ingesting large amounts of time series data long ago, and, in fact, this continues to be a core point of differentiation. Your Telegraf configs and line protocol code still work the same with the InfluxDB IOx engine.&lt;/p&gt;

&lt;p&gt;In truth, we did implement some changes in the configuration of an IOx-powered InfluxDB instance that enable even faster data ingestion and a shorter “Time to Be Readable.” But users don’t need to change the way they write data to InfluxDB to receive these benefits.&lt;/p&gt;

&lt;p&gt;Ultimately, this means that InfluxDB Cloud, powered by IOx, can ingest more data, faster, than any other database you might consider for such a workload.&lt;/p&gt;

&lt;h1 id="even-easier-to-use-and-for-more-use-cases"&gt;Even easier to use and for more use cases&lt;/h1&gt;

&lt;p&gt;One of the operational challenges of running a time series database is that the things you want to measure can come and go. For example, new sensor types may arrive on your factory floor, or you may begin monitoring a new kind of server infrastructure.&lt;/p&gt;

&lt;p&gt;With most databases, these changes are highly disruptive because you must deploy a schema change to account for the new data. These schema updates can be labor intensive and risky. InfluxDB always handled this problem by being “schema on write.” This means you can simply write data with a new schema and InfluxDB handles the changes for you behind the scenes. There is no change to this functionality with InfluxDB IOx.&lt;/p&gt;
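
&lt;p&gt;Here is a minimal sketch of what schema on write looks like from the InfluxDB Python client; the url, token, org, and bucket values are placeholders:&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;# A minimal sketch of schema on write, assuming the influxdb-client package;
# url, token, org, and bucket names are placeholders.
import influxdb_client
from influxdb_client.client.write_api import SYNCHRONOUS

client = influxdb_client.InfluxDBClient(url="https://example.cloud2.influxdata.com",
                                        token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# Yesterday's points carried only a device_id tag.
p1 = influxdb_client.Point("air_temp").tag("device_id", "01").field("reading", 72)

# A new sensor type shows up today: just add the new tag and write.
# There is no migration and no schema change to deploy.
p2 = influxdb_client.Point("air_temp").tag("device_id", "02").tag("firmware", "v2").field("reading", 71)

write_api.write(bucket="sensors", record=[p1, p2])&lt;/code&gt;&lt;/pre&gt;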

&lt;p&gt;In previous versions of InfluxDB, users had to be cognizant of the concept of &lt;a href="https://www.influxdata.com/glossary/cardinality/"&gt;cardinality&lt;/a&gt; to maintain good performance. InfluxDB IOx completely removes this concern. The way it stores and queries data does not require any notion of cardinality. This means that users can create a schema based on the way they think about the data, and how it is collected, without worrying about slowing down their queries!&lt;/p&gt;

&lt;h1 id="every-journey-starts-with-a-single-step"&gt;Every journey starts with a single step&lt;/h1&gt;

&lt;p&gt;This release is just the beginning for InfluxDB IOx. We are already hard at work on the next steps. This includes:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Optimize, optimize, optimize. We already see lots of opportunities to make InfluxDB IOx even faster, so expect the database to get faster and faster over the coming months.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Fast InfluxQL. We are implementing native InfluxQL support into DataFusion. This means that users with workloads from previous versions of InfluxDB who used InfluxQL for their queries will be able to get the benefits of InfluxDB IOx with minimal changes to their application.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
&lt;p&gt;Single Tenant and Enterprise versions. We are happy to release InfluxDB IOx to new multi-tenant customers, but over the course of the year you can expect offerings for customers who want InfluxDB IOx in a single-tenant managed service, as well as for those who want to manage InfluxDB IOx themselves.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h1 id="conclusion"&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;InfluxDB IOx is a real game changer in multiple ways, and I have to admit that I feel exhilarated. This feeling reminds me of paddling out to a break when surfing. That combination of excitement and bracing for the unknown really makes me feel alive.&lt;/p&gt;

&lt;p&gt;Aside from deepening InfluxData’s commitment to open source, InfluxDB IOx delivers better performance on bigger workloads, with support for all of your team’s favorite tools. If you are an existing InfluxDB user, you will have to think about how you use InfluxDB a bit differently to realize these benefits, but I think you will agree that the payoff is substantial as you bring your new time series workloads to InfluxDB IOx.&lt;/p&gt;

&lt;p&gt;At the time of writing, InfluxDB IOx is the engine powering InfluxDB Cloud in two specific regions. &lt;a href="https://cloud2.influxdata.com/signup"&gt;Sign up for an InfluxDB Cloud account&lt;/a&gt; in either the US East 1 (Virginia) or EU Central 1 (Frankfurt) regions. We plan to roll out the InfluxDB IOx engine to additional regions and cloud providers in the near future.&lt;/p&gt;
</description>
      <pubDate>Fri, 10 Mar 2023 07:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/why-influxdb-iox-big-deal-to-me/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/why-influxdb-iox-big-deal-to-me/</guid>
      <category>Product</category>
      <category>Use Cases</category>
      <author>Rick Spencer (InfluxData)</author>
    </item>
    <item>
      <title>More Capabilities, Less Code: Announcing Platform New Features at InfluxDays 2022</title>
      <description>&lt;p&gt;The InfluxDB platform has evolved a lot over the past decade. But with every innovation we’ve added to the platform, the focus behind our efforts has remained the same: &lt;b&gt;Build cool stuff for people who build cool stuff.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;What we mean by this is we want to make it incredibly easy for users to build valuable applications with their time series data. We do that by offering a wide range of tools, features, and resources that meet builders on their terms. Whether that means Telegraf plugins for your systems, &lt;a href="https://www.influxdata.com/products/data-collection/influxdb-client-libraries/"&gt;client libraries&lt;/a&gt; in the language(s) you know best, or providing &lt;a href="https://www.influxdata.com/products/data-collection/cloud-native/"&gt;cloud-to-cloud data collection&lt;/a&gt;, we want to facilitate as many options as possible to make every touchpoint experience with time series data as quick as possible.&lt;/p&gt;

&lt;p&gt;Today, at &lt;a href="https://www.influxdays.com/"&gt;InfluxDays 2022&lt;/a&gt;, our annual user and community event, we’re announcing even more new features – all built to deliver more capabilities and less code for our users.&lt;/p&gt;

&lt;p&gt;Without further ado, here are the latest updates to the InfluxDB platform:&lt;/p&gt;

&lt;h2 id="flux-10"&gt;Flux 1.0&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.influxdata.com/products/flux/"&gt;Flux&lt;/a&gt; is our native scripting and query language. It began as a re-thinking of our original query language, &lt;a href="https://docs.influxdata.com/influxdb/v1.8/query_language/"&gt;InfluxQL&lt;/a&gt;; however, Flux is much more powerful, capable of very complex tasks and data transformations. With Flux 1.0, we’re bringing even more collaboration, flexibility, and stability to the language:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Flux Editions:&lt;/b&gt; Flux is widely used across many applications, so any breaking change to the language can disrupt them. Starting with Flux 1.0, we are introducing Flux Editions, a feature that lets you opt into new capabilities that would otherwise be breaking changes, so you control when to upgrade. You can also run different segments of your systems on different Flux Editions, so if a Flux update is critical to certain parts of your system, you can choose to update only those. As a result, you can balance the stability of your system against the need for new Flux features and capabilities. Together these updates give Flux the best of both worlds: a stable language platform and cutting-edge features.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;Modules:&lt;/b&gt; Modules are an exciting addition to Flux. They follow industry standards, so if you’re familiar with other language packaging systems you’ll get the gist of Flux modules pretty quickly. Each Flux module can be versioned and is immutable, providing a safe ecosystem for shared code.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;Polymorphism:&lt;/b&gt; This feature works in tandem with Flux modules. Polymorphism allows you to use labels to write Flux scripts that are independent of your data schema. Labels are a new type we’ve added to Flux to allow you to abstract column names in your code. This makes them easier to share and repurpose across disparate buckets, so you can do more with less effort.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;Dynamic Type:&lt;/b&gt; This last feature is another new type in Flux 1.0. Flux is a strictly typed language and JSON is not, so the dynamic type allows users to accurately represent any JSON data with Flux data types.
  &lt;img src="//images.ctfassets.net/o7xu9whrs0u9/1Qj35IIu3tQ5OtVC17RRxl/38943edd95c5f95d328c7ddc62604f0a/Dynamic_Type.png" alt="Dynamic Type in Flux" width="700" height="321" /&gt;
  &lt;/li&gt;
 &lt;/ul&gt;

&lt;h2 id="telegraf-custom-builder"&gt;Telegraf Custom Builder&lt;/h2&gt;

&lt;p&gt;Of course, before you can work with your data you need to get it into InfluxDB. Custom Builder is a key new feature that makes &lt;a href="https://www.influxdata.com/time-series-platform/telegraf/"&gt;Telegraf&lt;/a&gt;, our open source data collection agent, more nimble and resource-friendly – especially for IoT users.&lt;/p&gt;

&lt;p&gt;As we continue to add new plugins to the Telegraf binary, its size increases significantly. But developers rarely need all of these plugins, and some run in resource-constrained environments that benefit from a smaller binary. Custom Builder allows you to quickly and easily build custom Telegraf agents that include only the plugins you need, saving space and compute resources for your data collection. The tool scans a Telegraf configuration file and builds a binary that includes only the selected plugins.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/138OermonGL5A5j6IsC9bY/279cacd91badb4b962b05d8e75487857/Telegraf_Custom_Builder.png" alt="Telegraf Custom Builder" /&gt;&lt;/p&gt;

&lt;h2 id="query-experience"&gt;Query Experience&lt;/h2&gt;

&lt;p&gt;The last feature we’re announcing focuses on the Query Experience within &lt;a href="https://www.influxdata.com/products/influxdb-cloud/"&gt;InfluxDB Cloud&lt;/a&gt; to make it faster and easier to contextualize data in the UI. The biggest update is the new Script Editor, which includes an instructional, integrated schema browser so that you can better understand the shape of your data, as well as upcoming multi-language support for Python and JavaScript. You can also combine point-and-click schema selections with raw script editing, enabling you to work with code faster and in a way that suits your needs.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/3GVHwphNq6bRT2AXMNyJzK/df5a48206ff3a6b58bf22c937cc97d4e/Query_Experience-Data_Explorer.png" alt="Query Experience-Data Explorer" /&gt;&lt;/p&gt;

&lt;p&gt;The new editor framework also supports multiple query languages. This comes on the heels of last week’s announcement of the &lt;a href="https://www.influxdata.com/products/influxdb-engine/"&gt;new InfluxDB time series engine&lt;/a&gt;, which includes native support for SQL queries using the PostgreSQL wire protocol, along with third-party query tools that leverage that protocol.&lt;/p&gt;

&lt;h2 id="all-this-and-more-at-influxdays"&gt;All this and more at InfluxDays&lt;/h2&gt;

&lt;p&gt;Tune into &lt;a href="https://www.influxdays.com/"&gt;InfluxDays 2022&lt;/a&gt; to learn more about these new features from InfluxData’s product managers and engineers who built them. Throughout the event, we’ll be talking more about how these new features extend the flexibility of InfluxDB, Telegraf, and Flux, and make the development process more efficient than ever.&lt;/p&gt;

&lt;p&gt;It’s been a packed year for InfluxData and we’re excited to showcase the breadth and depth of our latest solutions, and outline the next evolution of &lt;a href="https://www.influxdata.com/products/"&gt;InfluxDB: the Smart Data Platform&lt;/a&gt; during the next two days.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/5hpi1JhOrCExldaEx90tRT/a0281b78395fc376f1accff450bb5ef0/InfluxDB_Smart_Data_Platform.png" alt="InfluxDB Smart Data Platform" /&gt;&lt;/p&gt;

&lt;p&gt;We’re really excited to see what you build with these latest updates. Be sure to let us know what you think in our Slack channel and &lt;a href="https://community.influxdata.com/"&gt;community forums&lt;/a&gt;. See you there!&lt;/p&gt;
</description>
      <pubDate>Wed, 02 Nov 2022 07:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/influxdays-2022-platform-features/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/influxdays-2022-platform-features/</guid>
      <category>Product</category>
      <category>Company</category>
      <author>Rick Spencer (InfluxData)</author>
    </item>
    <item>
      <title>Building an IoT App with InfluxDB Cloud, Python and Flask (Part 3)</title>
      <description>&lt;p&gt;Last year I started an IoT project, Plant Buddy. This project entailed soldering some sensors to an Arduino, and teaching that device how to communicate directly with InfluxDB Cloud so that I could monitor those plants. Now I am taking that concept a step further and writing the app for plantbuddy.com. This app will allow users to visualize and create alerts from their uploaded Plant Buddy device data in a custom user experience.&lt;/p&gt;

&lt;p&gt;If you want some background, you can check out &lt;a href="/blog/prototyping-iot-with-influxdb-cloud-2-0/"&gt;Part 1&lt;/a&gt;, where I design the basics of the device and teach it to communicate and add some notifications, and &lt;a href="/blog/iot-prototyping-with-influxdb-cloud-2-0-part-2-queries-tasks-and-dashboards/"&gt;Part 2&lt;/a&gt;, where I start some downsampling and a dashboard. (In &lt;a href="/blog/plant-buddy-part-4-using-the-ui/"&gt;Part 4&lt;/a&gt;, &lt;a href="/blog/author/barbara-nelson/"&gt;Barbara Nelson&lt;/a&gt; provides a sequel to this article in which she shows how to use the UI, not the CLI, for the same functionality.)&lt;/p&gt;

&lt;h2&gt;Objectives (building on InfluxDB Cloud)&lt;/h2&gt;

&lt;p&gt;This tutorial is written very much with the application developer in mind. I happen to be using Python and Flask for this tutorial, but all of the concepts should apply equally to any language or web development framework.&lt;/p&gt;

&lt;p&gt;This tutorial will cover:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;How to bootstrap your development environment to get developing on InfluxDB Cloud&lt;/li&gt;
 	&lt;li&gt;How to use the InfluxDB Python Client library to receive data from your users, and write that data to InfluxDB Cloud&lt;/li&gt;
 	&lt;li&gt;How to query InfluxDB Cloud so that your app can visualize the data for your users.&lt;/li&gt;
 	&lt;li&gt;How to install a downsampling task in InfluxDB that will both save you money in storage costs as well as optimize the user experience for your users&lt;/li&gt;
 	&lt;li&gt;How to use InfluxDB Cloud's powerful "Checks and Notifications" system to help you provide custom alerting to your users&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This tutorial is meant to focus on InfluxDB, not IoT applications in general, so I will not be focused on things like:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;Creating your device and sending data from it&lt;/li&gt;
 	&lt;li&gt;User authorization for your own web app&lt;/li&gt;
 	&lt;li&gt;Managing secrets in your own web app&lt;/li&gt;
 	&lt;li&gt;How to use the many powerful visualization libraries for the different platforms&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Where's the code?&lt;/h3&gt;

&lt;p&gt;The code for this tutorial is available on GitHub. There are two branches:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;If you want to see just the starting point, that is in the &lt;a href="https://github.com/rickspencer3/plant-buddy/tree/demo-start"&gt;demo-start branch&lt;/a&gt;.&lt;/li&gt;
 	&lt;li&gt;If you want to see the completed project, that is in the &lt;a href="https://github.com/rickspencer3/plant-buddy/tree/demo-done"&gt;demo-done branch&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;plantbuddy.com Overview&lt;/h2&gt;

&lt;h3&gt;Data segmentation strategies&lt;/h3&gt;

&lt;p&gt;Before we start off bootstrapping the development environment, here is a bit of food for thought regarding whether and how to segment your user data. This section is fairly conceptual, so feel free to skip it.&lt;/p&gt;

&lt;p&gt;Most likely, your application will have many users. You will generally want to display to each user only the data that they wrote into the application. Depending on the scale of your application and other needs, there are three general approaches to consider.&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;&lt;strong&gt;Single Multi-User Bucket.&lt;/strong&gt; This is the default approach. In this approach, you put all of your users' data into a single bucket in InfluxDB and you use a different tag for each user to enable fast querying.&lt;/li&gt;
 	&lt;li&gt;&lt;strong&gt;Bucket per User.&lt;/strong&gt; In this approach you create a separate bucket for each user. This approach is more complex to manage in your app, but allows you to provide read and write tokens to your users so that they can read and write directly from their own bucket if needed for your application.&lt;/li&gt;
 	&lt;li&gt;&lt;strong&gt;Multi-Org.&lt;/strong&gt; In this approach, you create a separate organization for each user. This provides the ultimate separation of customer data, but is more complex to set up and manage. This approach is useful when you are acting as a sort of OEM, where your customers have their own customers and you need to provide management tools to your customers.&lt;/li&gt;
&lt;/ol&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="width: 19%;"&gt;&lt;span style="font-size: 18px; color: #292933; font-weight: 500;"&gt;Approach&lt;/span&gt;&lt;/td&gt;
&lt;td style="width: 27%;"&gt;&lt;span style="font-size: 18px; color: #292933; font-weight: 500;"&gt;Pros&lt;/span&gt;&lt;/td&gt;
&lt;td style="width: 27%;"&gt;&lt;span style="font-size: 18px; color: #292933; font-weight: 500;"&gt;Cons&lt;/span&gt;&lt;/td&gt;
&lt;td style="width: 27%;"&gt;&lt;span style="font-size: 18px; color: #292933; font-weight: 500;"&gt;Summary&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span style="font-size: 18px; color: #292933; font-weight: 500;"&gt;Multi-user Bucket&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;
&lt;ul&gt;
 	&lt;li&gt;Easy to aggregate across customers&lt;/li&gt;
 	&lt;li&gt;Easiest to set up&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;ul&gt;
 	&lt;li&gt;Requires authentication through custom gateway&lt;/li&gt;
 	&lt;li&gt;Effort to offboard customers&lt;/li&gt;
 	&lt;li&gt;One retention policy for all customers&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;td&gt;Default approach, using tags to segment customers in a single bucket&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span style="font-size: 18px; color: #292933; font-weight: 500;"&gt;Bucket per User&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;
&lt;ul&gt;
 	&lt;li&gt;Reads and writes with no gateway&lt;/li&gt;
 	&lt;li&gt;Multiple retention policies&lt;/li&gt;
 	&lt;li&gt;Easy to offboard customers&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;ul&gt;
 	&lt;li&gt;Harder to aggregate across customers&lt;/li&gt;
 	&lt;li&gt;Effort to manage&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;td&gt;Useful when segmenting customer data is important, but more effort. Allows direct reads and writes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span style="font-size: 18px; color: #292933; font-weight: 500;"&gt;Multi-Org&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;
&lt;ul&gt;
 	&lt;li&gt;Full data and compute segmentation&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;ul&gt;
 	&lt;li&gt;Extra layer of complexity&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;td&gt;Ultimate in customer data segmentation. Useful when your customers have their own customers.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Due to the relatively simple and low-sensitivity nature of Plant Buddy, the default approach of Multi-user Bucket makes the most sense. Plant Buddy has a fairly simple schema. But if you are interested in learning more about these topics, you can read up on data layout and schema design &lt;a href="/blog/data-layout-and-schema-design-best-practices-for-influxdb/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;IoT application and InfluxDB Cloud architecture overview&lt;/h3&gt;

&lt;p&gt;Zooming way out, Plant Buddy will work more or less like this:&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/IoT-application-and-InfluxDB-Cloud-architecture-overview.png" alt="IoT-application-and-InfluxDB-Cloud-architecture-overview" width="622" height="447" /&gt;&lt;/p&gt;

&lt;p&gt;At the top level is the user experience. They will be able to view the sensor readings in their web browser, and each user can have any number of Plant Buddies reporting data.&lt;/p&gt;

&lt;p&gt;The Plant Buddy application itself is roughly divided into two parts: what I call the “Write Gateway” and the “Read Gateway.” These “gateways” are, in fact, simply endpoints in the Plant Buddy Flask application, as you will see, but they could easily be divided into their own applications.&lt;/p&gt;

&lt;p&gt;User authentication for reads and writes between the user and their devices is handled by Plant Buddy itself. Each device and each browser request should be authorized by Plant Buddy.&lt;/p&gt;

&lt;p&gt;Finally there is the backend, which is InfluxDB. The read gateway and the write gateway write to the same bucket (to start, more on that later when we introduce downsampling). We will also be taking advantage of &lt;a href="https://docs.influxdata.com/influxdb/cloud/process-data/get-started/"&gt;InfluxDB’s task&lt;/a&gt; system and the provided _tasks and _monitoring &lt;a href="https://docs.influxdata.com/influxdb/cloud/reference/internals/system-buckets/"&gt;system buckets&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Authentication between the Plant Buddy code and the InfluxDB backend is managed by InfluxDB tokens, which are stored as secrets in the web app.&lt;/p&gt;

&lt;h3&gt;Plant Buddy starter code&lt;/h3&gt;

&lt;p&gt;To start out, Plant Buddy is able to receive information from a user’s device, but can only parse the data that is sent and print it out. Writing to InfluxDB is not yet implemented.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;@app.route("/write", methods = ['POST'])
def write():
   # Authorize the request, parse the uploaded reading, and (for now) just print it.
   user = users.authorize_and_get_user(request)
   d = parse_line(request.data.decode("UTF-8"), user["user_name"])
   print(d, flush=True)
   return {'result': "OK"}, 200

def parse_line(line, user_name):
   # The device payload is positional: chars 0-1 are the device id,
   # chars 2-3 are the sensor code, and the rest is the reading value.
   data = {"device" : line[:2],
           "sensor_name" : sensor_names.get(line[2:4], "unknown"),
           "value" : line[4:],
           "user": user_name}
   return data&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Likewise, there is an &lt;a href="https://github.com/rickspencer3/plant-buddy/blob/demo-start/src/templates/home.html"&gt;html template&lt;/a&gt; set up that contains a slot for rendering a graph, but so far it only prints the user name extracted from the request object.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;@app.route("/")
def index():
   user = users.authorize_and_get_user(request)

   return render_template("home.html",
                           user_name = user["user_name"],
                           graph_code = "&amp;lt;i&amp;gt;put graph here&amp;lt;/i&amp;gt;")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can see it all put together in VSCode:&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/Plant-Buddy-VSCode.png" alt="Plant Buddy-VSCode" width="1000" height="625" /&gt;&lt;/p&gt;

&lt;p&gt;You can see the full starter code &lt;a href="https://github.com/rickspencer3/plant-buddy/blob/demo-start/src/app.py"&gt;here&lt;/a&gt;. Essentially, the web application is set up with:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;A users module for validating users and extracting the user name (&lt;a href="https://github.com/rickspencer3/plant-buddy/blob/demo-done/src/users.py"&gt;this is fake&lt;/a&gt;).&lt;/li&gt;
 	&lt;li&gt;A secrets store module to securely store secrets for InfluxDB (&lt;a href="https://github.com/rickspencer3/plant-buddy/blob/demo-done/src/secret_store.py"&gt;also fake&lt;/a&gt;).&lt;/li&gt;
 	&lt;li&gt;An endpoint for accepting data from the Plant Buddy devices.&lt;/li&gt;
 	&lt;li&gt;An html template for the Flask server to render&lt;/li&gt;
 	&lt;li&gt;An endpoint for rendering that html template&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As you can see, I am running the application on my local host, within a Docker container to simplify development.&lt;/p&gt;
&lt;h2&gt;Bootstrapping IoT application development for InfluxDB Cloud&lt;/h2&gt;
&lt;p&gt;Now that we have all of that introductory material out of the way, it’s time to get set up to use InfluxDB!&lt;/p&gt;

&lt;p&gt;Prerequisites:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;Set up an &lt;a href="https://cloud2.influxdata.com/signup"&gt;InfluxDB Cloud account&lt;/a&gt;. A free account is fine.&lt;/li&gt;
 	&lt;li&gt;Set up &lt;a href="https://code.visualstudio.com/docs/introvideos/basics"&gt;VSCode&lt;/a&gt;.&lt;/li&gt;
 	&lt;li&gt;Download and install the &lt;a href="https://docs.influxdata.com/influxdb/v2.0/get-started/"&gt;Influx CLI&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Set up the InfluxDB CLI&lt;/h3&gt;

&lt;p&gt;I have an empty Cloud account, and I have the CLI installed on my developer laptop. It is possible to use the CLI right out of the gate, but I find it a hassle to constantly supply information like the region I am targeting, my organization name, etc., so it is much easier to set up a config for the CLI. In fact, you can set up numerous configs and easily switch between them, but for now, I will only need one.&lt;/p&gt;

&lt;p&gt;The Influx CLI has a good help system, so I will just ask how to create one:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;$ influx config create -h
	The influx config create command creates a new InfluxDB connection configuration
	and stores it in the configs file (by default, stored at ~/.influxdbv2/configs).

	Examples:
		# create a config and set it active
		influx config create -a -n $CFG_NAME -u $HOST_URL -t $TOKEN -o $ORG_NAME&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This tells me that I will need to supply a token, the org name, the host url, as well as to name the config.&lt;/p&gt;

&lt;p&gt;First, I will need to navigate to the InfluxDB UI to &lt;a href="https://docs.influxdata.com/influxdb/cloud/security/tokens/create-token/"&gt;generate my first token&lt;/a&gt;. I do this by using the lefthand Nav in the UI to go to &lt;strong&gt;Data&lt;/strong&gt; -&amp;gt; &lt;strong&gt;Tokens&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/Load-Data-Tokens.png" alt="Load Data - Tokens" width="1000" height="462" /&gt;&lt;/p&gt;

&lt;p&gt;Then click &lt;strong&gt;Generate Token&lt;/strong&gt; and choose an &lt;strong&gt;All Access Token&lt;/strong&gt;, and provide it a name.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/Generate-All-Access-Token.png" alt="Generate All Access Token" width="542" height="331" /&gt;&lt;/p&gt;

&lt;p&gt;Then I can grab the token string by clicking on the token in the list:&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/CLI-Token-1.png" alt="CLI-Token" width="980" height="824" /&gt;&lt;/p&gt;

&lt;p&gt;If I forgot my organization name, I can grab it from the About page.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/create-the-profile.png" alt="create the profile" width="1000" height="456" /&gt;&lt;/p&gt;

&lt;p&gt;Let’s stick it all into the CLI and create the profile:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;$ influx config create -t cgIsJ7uamKESRiDkRz2mNPScXw_K_zswiOfdZmIQMina1TCtWk2NGu3VssF7cJPPf-QR88nPdFFrlC9GleTpwQ== -o rick+plantbuddy@influxdata.com -u https://eastus-1.azure.cloud2.influxdata.com/ -n plantbuddy -a&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Produces the following output:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;Active  Name            URL                                             Org
*       plantbuddy      https://eastus-1.azure.cloud2.influxdata.com/   rick+plantbuddy@influxdata.com&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We now have a plantbuddy CLI config created.&lt;/p&gt;
&lt;h3&gt;Create a bucket to hold users' data&lt;/h3&gt;
&lt;p&gt;Next step is to create a bucket in InfluxDB to hold the users’ data. This is done with the &lt;a href="https://docs.influxdata.com/influxdb/cloud/organizations/buckets/create-bucket/"&gt;bucket create&lt;/a&gt; command:&lt;/p&gt;

&lt;p&gt;&lt;code class="language-bash"&gt;$ influx bucket create -n plantbuddy&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Which produces the following output:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;ID                      Name            Retention       Shard group duration    Organization ID
f8fbced4b964c6a4        plantbuddy      720h0m0s        n/a                     f1d35b5f11f06a1d&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This created a bucket for me with the default maximum retention period, the duration the bucket retains data for.&lt;/p&gt;
&lt;h3&gt;Upload some Line Protocol&lt;/h3&gt;
&lt;p&gt;Now that I have a bucket, I can upload some data to it so that I can start testing right away, rather than working on writing data from my Python code first. I generated 2 days of sample data.&lt;/p&gt;

&lt;p&gt;The line protocol format is &lt;a href="https://docs.influxdata.com/influxdb/cloud/reference/syntax/line-protocol/"&gt;well-documented&lt;/a&gt;. But here is an excerpt:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;light,user=rick,device_id=01 reading=26i 1621978469468115968
soil_moisture,user=rick,device_id=01 reading=144i 1621978469468115968
humidity,user=rick,device_id=01 reading=68i 1621978469468115968
soil_temp,user=rick,device_id=01 reading=67i 1621978469468115968
air_temp,user=rick,device_id=01 reading=72i 1621978469468115968
light,user=rick,device_id=01 reading=28i 1621978529468115968
soil_moisture,user=rick,device_id=01 reading=140i 1621978529468115968
humidity,user=rick,device_id=01 reading=65i 1621978529468115968
soil_temp,user=rick,device_id=01 reading=67i 1621978529468115968
air_temp,user=rick,device_id=01 reading=72i 1621978529468115968&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This shows that I am reporting 5 measurements: light, soil_moisture, humidity, soil_temp, and air_temp. For each point, there is a user tag and a device id tag. Note that this solution will only work when users are expected to have a limited number of devices. Otherwise, the combination of tag values could end up blowing out your cardinality. The field name is “reading,” which has an integer value. Using a quick and dirty Python script, I’ve generated one reading per measurement per minute for the last 48 hours, so it’s a long-ish file.&lt;/p&gt;
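
&lt;p&gt;That generator was a throwaway, but a minimal sketch of the idea looks like this (the sensor names match the excerpt above; the value ranges are made up):&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;# A quick-and-dirty sketch of a line protocol generator; the value
# ranges are invented, and output goes to generated.lp.
import random
import time

SENSORS = {"light": (20, 40), "soil_moisture": (120, 180),
           "humidity": (50, 80), "soil_temp": (60, 70), "air_temp": (65, 80)}

now_ns = time.time_ns()
with open("generated.lp", "w") as f:
    # One reading per measurement per minute, covering the last 48 hours.
    for minute in range(48 * 60):
        ts = now_ns - minute * 60 * 1_000_000_000
        for name, (lo, hi) in SENSORS.items():
            f.write(f"{name},user=rick,device_id=01 reading={random.randint(lo, hi)}i {ts}\n")&lt;/code&gt;&lt;/pre&gt;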

&lt;p&gt;&lt;code class="language-bash"&gt;$ influx write -b plantbuddy -f ~/generated.lp&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;No error was reported, so it looks like it worked, but let’s run a query to make sure.&lt;/p&gt;

&lt;h3&gt;Run a query&lt;/h3&gt;

&lt;p&gt;I can run a test query with the &lt;a href="https://docs.influxdata.com/influxdb/cloud/query-data/execute-queries/influx-query/"&gt;influx query command&lt;/a&gt; from the CLI like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-bash"&gt;$ influx query "from(bucket: \"plantbuddy\") |&amp;gt; range(start: 0)"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This produces a lot of output, so I know that uploading the line protocol worked, but I can refine the query to get some tighter output.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-bash"&gt;$ influx query "from(bucket: \"plantbuddy\") |&amp;gt; range(start: 0) |&amp;gt; last() |&amp;gt; keep(columns: [\"_time\",\"_measurement\",\"_value\"]) |&amp;gt; group()"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This query has the following output:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;Result: _result
Table: keys: []
                    _time:time                  _value:float     _measurement:string  
------------------------------  ----------------------------  ----------------------  
2021-05-27T16:00:22.881946112Z                            70                air_temp  
2021-05-27T16:00:22.881946112Z                            69                humidity  
2021-05-27T16:00:22.881946112Z                            36                   light  
2021-05-27T16:00:22.881946112Z                           173           soil_moisture  
2021-05-27T16:00:22.881946112Z                            66               soil_temp&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, I already have data and can start exploring it with queries, but iterating on a query via the CLI is not particularly easy. So, let’s install and set up the InfluxDB VSCode plugin.&lt;/p&gt;

&lt;h3&gt;Set up VSCode and the Flux extension for InfluxDB Cloud&lt;/h3&gt;

&lt;p&gt;As a VSCode user, I naturally would like to use that editor when writing my Flux queries, as well as keep my whole project under source control. So, the next step is to &lt;a href="https://docs.influxdata.com/influxdb/cloud/tools/flux-vscode/"&gt;set up the InfluxDB VSCode Extension&lt;/a&gt;. The &lt;a href="https://marketplace.visualstudio.com/items?itemName=influxdata.flux"&gt;Flux extension&lt;/a&gt; is easy to find by searching for “Influx” in the extension manager.&lt;/p&gt;

&lt;p&gt;After installing the extension, you can see an InfluxDB window added to the bottom left.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/VSCode-and-the-Flux-extension-for-InfluxDB-Cloud.png" alt="VSCode and the Flux extension for InfluxDB Cloud" width="1000" height="625" /&gt;&lt;/p&gt;

&lt;p&gt;Now I can use that window to set up a connection with my Cloud account by giving focus to the InfluxDB window and clicking the &lt;strong&gt;+&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;If I forgot any information that I need to configure the Flux extension, I can use the &lt;a href="https://docs.influxdata.com/influxdb/v2.0/reference/cli/influx/config/"&gt;influx config&lt;/a&gt; command to view my credentials:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;$ influx config
Active  Name            URL                                             Org
*       plantbuddy      https://eastus-1.azure.cloud2.influxdata.com/   rick+plantbuddy@influxdata.com&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I can find my token strings with the &lt;a href="https://docs.influxdata.com/influxdb/v2.0/reference/cli/influx/auth/"&gt;influx auth list&lt;/a&gt; command like this:&lt;/p&gt;

&lt;p&gt;&lt;code class="language-bash"&gt;$ influx auth list&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Because my org name has some special characters, I need to provide my organization id instead of my org name. I can find the org id with the &lt;a href="https://docs.influxdata.com/influxdb/v2.0/reference/cli/influx/org/list/"&gt;influx org list&lt;/a&gt; command like this:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;$ influx org list
ID                      Name
f1d35b5f11f06a1d        rick+plantbuddy@influxdata.com&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then complete the form.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/Complete-the-form.png" alt="Complete the form" width="1000" height="625" /&gt;&lt;/p&gt;

&lt;p&gt;Using the &lt;strong&gt;Test&lt;/strong&gt; button, I can see that the connection worked, and then save it. After I save, notice that the InfluxDB window is now populated, and I can browse the measurements.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/InfluxDB-window-populated.png" alt="InfluxDB window populated" width="230" height="251" /&gt;&lt;/p&gt;

&lt;h3&gt;Query InfluxDB Cloud from the Code Editor&lt;/h3&gt;

&lt;p&gt;Now that the connection is set up, I typically add a scratch Flux file to the project to run Flux queries with.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/Query-InfluxDB-Cloud-from-the-Code-Editor.png" alt="" width="980" height="281" /&gt;&lt;/p&gt;

&lt;p&gt;The first thing to notice is that formatting works. The Flux extension also includes statement completion and &lt;em&gt;in situ&lt;/em&gt; help as you would expect.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/Browser-Preview-Plant-Buddy.png" alt="" width="980" height="398" /&gt;&lt;/p&gt;

&lt;p&gt;Then I can try to run the query directly from the editor.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/Run-Query.png" alt="Run-Query" width="980" height="755" /&gt;&lt;/p&gt;

&lt;p&gt;This provides my results in a clean grid view, which is easier to use than results dumped to the terminal.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/grid-view-results.png" alt="grid view results" width="1000" height="625" /&gt;&lt;/p&gt;

&lt;p&gt;Note that information about the extension is written to the Flux language tab of the OUTPUT window. This is especially useful if your Flux has errors.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/Output-Window.png" alt="Output Window" width="1000" height="277" /&gt;&lt;/p&gt;

&lt;p&gt;Now that I have my development environment set up for InfluxDB, let’s implement the backend.&lt;/p&gt;

&lt;h2&gt;Implement writes&lt;/h2&gt;

&lt;p&gt;Now it’s back to writing some Python to make the web app work. First I am going to handle the calls to the /write endpoint by converting those calls to actually put the data into InfluxDB.&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;I already have a bucket called "plantbuddy" created for this.&lt;/li&gt;
 	&lt;li&gt;Next I will create a new token that has permission to read and write to that bucket.&lt;/li&gt;
 	&lt;li&gt;Then I will import the &lt;a href="https://github.com/influxdata/influxdb-client-python"&gt;InfluxDB v2 Python client library&lt;/a&gt;.&lt;/li&gt;
 	&lt;li&gt;Then I will create &lt;a href="https://github.com/influxdata/influxdb-client-python/blob/f851303692f25596bca7d495ae383d05bb4c050e/influxdb_client/client/write/point.py#L45"&gt;Points&lt;/a&gt; and write them with the client library.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Create a token&lt;/h3&gt;

&lt;p&gt;First, I will need a token that has permissions to write and read from the bucket. You can see from the &lt;a href="https://docs.influxdata.com/influxdb/cloud/reference/cli/influx/auth/create/"&gt;influx auth create&lt;/a&gt; help, that there are a lot of options for controlling the permissions for a token.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;$ influx auth create -h
Create authorization

Usage:
  influx auth create [flags]

Flags:
  -c, --active-config string          Config name to use for command; Maps to env var $INFLUX_ACTIVE_CONFIG
      --configs-path string           Path to the influx CLI configurations; Maps to env var $INFLUX_CONFIGS_PATH (default "/Users/rick/.influxdbv2/configs")
  -d, --description string            Token description
  -h, --help                          Help for the create command 
      --hide-headers                  Hide the table headers; defaults false; Maps to env var $INFLUX_HIDE_HEADERS
      --host string                   HTTP address of InfluxDB; Maps to env var $INFLUX_HOST
      --json                          Output data as json; defaults false; Maps to env var $INFLUX_OUTPUT_JSON
  -o, --org string                    The name of the organization; Maps to env var $INFLUX_ORG
      --org-id string                 The ID of the organization; Maps to env var $INFLUX_ORG_ID
      --read-bucket stringArray       The bucket id
      --read-buckets                  Grants the permission to perform read actions against organization buckets
      --read-checks                   Grants the permission to read checks
      --read-dashboards               Grants the permission to read dashboards
      --read-dbrps                    Grants the permission to read database retention policy mappings
      --read-notificationEndpoints    Grants the permission to read notificationEndpoints
      --read-notificationRules        Grants the permission to read notificationRules
      --read-orgs                     Grants the permission to read organizations
      --read-tasks                    Grants the permission to read tasks
      --read-telegrafs                Grants the permission to read telegraf configs
      --read-user                     Grants the permission to perform read actions against organization users
      --skip-verify                   Skip TLS certificate chain and host name verification.
  -t, --token string                  Authentication token; Maps to env var $INFLUX_TOKEN
  -u, --user string                   The user name
      --write-bucket stringArray      The bucket id
      --write-buckets                 Grants the permission to perform mutative actions against organization buckets
      --write-checks                  Grants the permission to create checks
      --write-dashboards              Grants the permission to create dashboards
      --write-dbrps                   Grants the permission to create database retention policy mappings
      --write-notificationEndpoints   Grants the permission to create notificationEndpoints
      --write-notificationRules       Grants the permission to create notificationRules
      --write-orgs                    Grants the permission to create organizations
      --write-tasks                   Grants the permission to create tasks
      --write-telegrafs               Grants the permission to create telegraf configs
      --write-user                    Grants the permission to perform mutative actions against organization users&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code class="language-bash"&gt;--write-bucket&lt;/code&gt; and &lt;code class="language-bash"&gt;--read-bucket&lt;/code&gt; are the options I am looking for. These options take the bucket id rather than the bucket name. The id is easy to find with the &lt;a href="https://docs.influxdata.com/influxdb/v2.0/reference/cli/influx/bucket/list/"&gt;influx bucket list&lt;/a&gt; command:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;$ influx bucket list
ID                      Name            Retention       Shard group duration    Organization ID
d6ec11a304c652aa        _monitoring     168h0m0s        n/a                     f1d35b5f11f06a1d
fe25b83e9e002181        _tasks          72h0m0s         n/a                     f1d35b5f11f06a1d
f8fbced4b964c6a4        plantbuddy      720h0m0s        n/a                     f1d35b5f11f06a1d&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then I can create the token with the &lt;a href="https://docs.influxdata.com/influxdb/v2.0/reference/cli/influx/auth/create/"&gt;influx auth create&lt;/a&gt; command like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-bash"&gt;$ influx auth create --write-bucket f8fbced4b964c6a4 --read-bucket f8fbced4b964c6a4&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output includes the token string which I can then register with my app server’s secrets store.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;ID                      Description     Token                                                                                           User Name                       User ID                 Permissions
0797c045bc99e000                        d0QnHz8bTrQU2XI798YKQzmMQY36HuDPRWiCwi8Lppo1U4Ej5IKhCC-rTgeRBs3MgWsomr-YXBbDO3o4BLJe9g==        rick+plantbuddy@influxdata.com  078bedcd5c762000        [read:orgs/f1d35b5f11f06a1d/buckets/f8fbced4b964c6a4 write:orgs/f1d35b5f11f06a1d/buckets/f8fbced4b964c6a4]&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;Import and set up the InfluxDB Python Library&lt;/h3&gt;

&lt;p&gt;After installing the &lt;a href="https://github.com/influxdata/influxdb-client-python"&gt;InfluxDB Python Client&lt;/a&gt; in your environment, the next steps are to:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;Import the library.&lt;/li&gt;
 	&lt;li&gt;Set up the client.&lt;/li&gt;
 	&lt;li&gt;Create the write and query APIs.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;import influxdb_client

client = influxdb_client.InfluxDBClient(
   url = "https://eastus-1.azure.cloud2.influxdata.com/",
   token = secret_store.get_bucket_secret(),
   org = "f1d35b5f11f06a1d"
)

write_api = client.write_api()
query_api = client.query_api()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I added this code at the top of my code file so I can use the write and query APIs easily throughout the module.&lt;/p&gt;

&lt;h3&gt;Create and write Points&lt;/h3&gt;

&lt;p&gt;Now that I have the client library and the related APIs set up, I can change my code from simply printing out the data that the user is uploading to actually saving it. The first part is to create a Point, which is an object that represents the data I want to write.&lt;/p&gt;

&lt;p&gt;I’ll do this by creating a new function “write_to_influx”:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;def write_to_influx(data):
   p = influxdb_client.Point(data["sensor_name"]).tag("user",data["user"]).tag("device_id",data["device"]).field("reading", int(data["value"]))
   write_api.write(bucket="plantbuddy", org="f1d35b5f11f06a1d", record=p)
   print(p, flush=True)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The function receives a Python dictionary and extracts the values to use for tags and the field. You can include multiple tags and fields in a Point, but Plant Buddy only uses a single field, “reading.” It also prints out the point at the end of the function, mostly so it is possible to see it working.&lt;/p&gt;

&lt;p&gt;Now I can update my write endpoint to use that function:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;@app.route("/write", methods = ['POST'])
def write():
   user = users.authorize_and_get_user(request)
   d = parse_line(request.data.decode("UTF-8"), user["user_name"])
   write_to_influx(d)
   return {'result': "OK"}, 200&lt;/code&gt;&lt;/pre&gt;
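&lt;p&gt;If you don’t have a device handy, a few lines of Python can stand in for one. This is a hypothetical smoke test: the payload follows the positional format parse_line() expects, and the header is whatever your own users module checks for:&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;# A hypothetical smoke test for the /write endpoint. "0102136" follows the
# positional payload format above: device 01, sensor code 02, reading 136.
import requests

resp = requests.post(
    "http://localhost:5000/write",
    data="0102136",
    # Assumption: substitute whatever users.authorize_and_get_user expects.
    headers={"Authorization": "my-user-token"},
)
print(resp.status_code, resp.json())&lt;/code&gt;&lt;/pre&gt;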
&lt;p&gt;I happen to have a Plant Buddy already writing points to the server, and it looks like it is working.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/Plant-Buddy-writing-points-to-the-server.png" alt="Plant Buddy writing points to the server" width="1000" height="625" /&gt;&lt;/p&gt;

&lt;p&gt;By running a simple query, I can confirm that the data is being loaded into my bucket.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/data-is-being-loaded-into-bucket.png" alt="data is being loaded into bucket" width="1000" height="625" /&gt;&lt;/p&gt;

&lt;h2&gt;Implement reads&lt;/h2&gt;

&lt;p&gt;The Plant Buddy webpage will be very simple to start. It will display a graph with the last 48 hours of Plant Buddy data. To make this work, I need to:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;Write a query that fetches the data&lt;/li&gt;
 	&lt;li&gt;Use the &lt;a href="https://github.com/influxdata/influxdb-client-python"&gt;InfluxDB v2 Python client library&lt;/a&gt; to fetch the data&lt;/li&gt;
 	&lt;li&gt;Create a graph&lt;/li&gt;
 	&lt;li&gt;Loop through the results and add them to the graph&lt;/li&gt;
 	&lt;li&gt;Display the graph in the web page&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Write the query&lt;/h3&gt;

&lt;p&gt;This is a simple query to write.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;from(bucket: "plantbuddy")
   |&amp;gt; range(start: -48h)
   |&amp;gt; filter(fn: (r) =&amp;gt; r.user == "rick")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This brings back a lot of data. What is important to understand is that the data comes back organized by time series. A time series is organized first by measurement, and then further broken down by tag values and fields. Each time series is then essentially a separate table of related data ordered by timestamp, and each of those tables maps neatly to one line in a graph.&lt;/p&gt;

&lt;p&gt;However, I won’t know the user name until run time, so I need to parameterize the query. I am going to use simple string replacement for this, so I just need to tweak the query a bit:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;from(bucket: "plantbuddy")
   |&amp;gt; range(start: -48h)
   |&amp;gt; filter(fn: (r) =&amp;gt; r.user == "{}")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Additionally, I am going to want to read this query at run time, so I added a file called “graph.flux” and saved the query in that file.&lt;/p&gt;

&lt;p&gt;Now that I have the query ready, I can use it to fetch the data.&lt;/p&gt;
&lt;h3&gt;Fetch the data from InfluxDB&lt;/h3&gt;
&lt;p&gt;In my index() function in app.py, I start by opening the Flux file, replacing the user name, and then using the query api that I previously instantiated to get back a results set, adding this to my index() function:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;query = open("graph.flux").read().format(user["user_name"])
result = query_api.query(query, org="f1d35b5f11f06a1d")&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;Visualize time series with mpld3 and InfluxDB Cloud&lt;/h3&gt;

&lt;p&gt;For Plant Buddy, I have selected the mpld3 library to create my graphs. I chose this library because it is very easy to use.&lt;/p&gt;

&lt;p&gt;After adding the &lt;a href="http://mpld3.github.io/"&gt;mpld3&lt;/a&gt; library and its dependencies to my environment, I need to import a couple of things:&lt;/p&gt;

&lt;p&gt;&lt;code class="language-python"&gt;import matplotlib.pyplot as plt, mpld3&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Now I can go ahead and build the graph with the results. To do this, for each measurement (light, humidity, etc.) I need to create a list of values for the x-axis, and a list of values for the y-axis. Of course, the x-axis will be time.&lt;/p&gt;

&lt;p&gt;As mentioned above, the Flux data model returns data in a perfect format for this application. It returns a table for each measurement, so I simply have to loop through the tables, then loop through each record in each table, build the lists, and ask matplotlib to plot them.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;fig = plt.figure()
for table in result:
    # Each Flux table holds the records for one measurement,
    # so each table becomes one line in the graph.
    x_vals = []
    y_vals = []
    label = ""
    for record in table.records:
        y_vals.append(record["_value"])
        x_vals.append(record["_time"])
        label = record["_measurement"]
    plt.plot(x_vals, y_vals, label=label)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then I finish off by requesting a legend, converting the graph to html, and passing it to the template to be rendered.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-python"&gt;plt.legend()
grph = mpld3.fig_to_html(fig)
return render_template("home.html",
                        user_name = user["user_name"],
                        graph_code = grph)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And now, when I load the index page, you can see that the graph is working.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/index-page-with-the-graph.png" alt="index page with the graph" width="1000" height="625" /&gt;&lt;/p&gt;

&lt;h2&gt;Aggregation and downsampling&lt;/h2&gt;

&lt;h3&gt;Aggregation&lt;/h3&gt;

&lt;p&gt;One thing to note is that it takes matplotlib a fair amount of time to create the graph, and a fair amount of time for mpld3 to convert it to html. But users don’t really need every point graphed. Therefore, we can speed up the UI by adding a bit of aggregation to the Flux query. Rather than retrieving every point, let’s retrieve just the average for every 10 minutes. I just add an aggregation to the query with the &lt;a href="https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/built-in/transformations/aggregates/aggregatewindow/"&gt;aggregateWindow()&lt;/a&gt; function:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;from(bucket: "plantbuddy")
   |&amp;gt; range(start: -48h)
   |&amp;gt; filter(fn: (r) =&amp;gt; r.user == "rick")
   |&amp;gt; aggregateWindow(every: 10m, fn: mean)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The page loads much faster now, and also the graph looks a bit nicer.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/nicer-graph.png" alt="nicer graph" width="1000" height="625" /&gt;&lt;/p&gt;

&lt;p&gt;Realistically, for such a simple query and data set, this will perform more than adequately for production use cases. However, we can demonstrate how to optimize UI latency further by downsampling.&lt;/p&gt;

&lt;h3&gt;Downsampling&lt;/h3&gt;

&lt;p&gt;&lt;a href="/blog/downsampling-influxdb-v2-0/"&gt;Downsampling&lt;/a&gt; entails computing lower-resolution data from the high-resolution data and saving it, pre-computed, for display or further calculations. Other than making for a snappier user experience, it can also save you storage costs, because you can keep your downsampled data in a bucket with a longer retention period than your raw data.&lt;/p&gt;

&lt;p&gt;To accomplish this, I need to:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;Create a new downsampling bucket and a new token that can read and write that bucket&lt;/li&gt;
 	&lt;li&gt;Create a downsampling Flux script&lt;/li&gt;
 	&lt;li&gt;Create a task to periodically run that script&lt;/li&gt;
 	&lt;li&gt;Change the flux in graph.flux to query the downsampled bucket&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;Create a new bucket and token&lt;/h4&gt;

&lt;p&gt;First I’ll create a new bucket as before and name it “downsampled.”&lt;/p&gt;

&lt;p&gt;&lt;code class="language-bash"&gt;$ influx bucket create -n downsampled&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The output kindly gives me the bucket id:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;ID                      Name            Retention       Shard group duration    Organization ID
c7b43676728de98d        downsampled     720h0m0s        n/a                     f1d35b5f11f06a1d&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However, for simplicity, I will create a single token that can read and write both buckets. First, list the buckets to get the bucket ids:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;$ influx bucket list
ID                      Name            Retention       Shard group duration    Organization ID
d6ec11a304c652aa        _monitoring     168h0m0s        n/a                     f1d35b5f11f06a1d
fe25b83e9e002181        _tasks          72h0m0s         n/a                     f1d35b5f11f06a1d
c7b43676728de98d        downsampled     720h0m0s        n/a                     f1d35b5f11f06a1d
f8fbced4b964c6a4        plantbuddy      720h0m0s        n/a                     f1d35b5f11f06a1d&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then create the token with read and write permissions for both buckets:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;$ influx auth create --write-bucket c7b43676728de98d --write-bucket f8fbced4b964c6a4 --read-bucket c7b43676728de98d --read-bucket f8fbced4b964c6a4
ID                      Description     Token                                                                                           User Name                       User ID                 Permissions
079820bd8b7c1000                        hf356pobXyeoeqpIIt6t-ge7LI-UtcBBElq8Igf1K1wxm5Sv9XK8BleS79co32gCQwQ1voXuwXu1vEZg-sYDRg==        rick+plantbuddy@influxdata.com  078bedcd5c762000        [read:orgs/f1d35b5f11f06a1d/buckets/c7b43676728de98d read:orgs/f1d35b5f11f06a1d/buckets/f8fbced4b964c6a4 write:orgs/f1d35b5f11f06a1d/buckets/c7b43676728de98d write:orgs/f1d35b5f11f06a1d/buckets/f8fbced4b964c6a4]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then grab that new token and replace the old token in my app’s secrets store. Restart and make sure everything is still working.&lt;/p&gt;

&lt;h4&gt;Create a downsampling Flux script&lt;/h4&gt;

&lt;p&gt;A downsampling script has two basic steps. First, do some aggregations, and second, write the aggregated data somewhere.&lt;/p&gt;

&lt;p&gt;I already figured out the aggregation I want while creating the graph.flux file, so I will add a “downsample.flux” file using that existing aggregation as a starting point. One key difference is that I want to downsample ALL the data, not just the data for a particular user. As such, my aggregation step skips the filter:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;from(bucket: "plantbuddy")
   |&amp;gt; range(start: -48h)
   |&amp;gt; aggregateWindow(every: 10m, fn: mean)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running this, I can see that it aggregates all of the data in the bucket.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/aggregate-all-of-the-data-in-the-bucket.png" alt="aggregate all of the data in the bucket" width="1000" height="625" /&gt;&lt;/p&gt;

&lt;p&gt;Now I just need to add the &lt;a href="https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/built-in/outputs/to/"&gt;to()&lt;/a&gt; function to write all of the downsampled data to my downsampled bucket:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;from(bucket: "plantbuddy")
   |&amp;gt; range(start: -48h)
   |&amp;gt; aggregateWindow(every: 10m, fn: mean)
   |&amp;gt; to(bucket: "downsampled")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Querying the downsampled bucket, I can see all the data is there.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/Querying-the-downsampled-bucket.png" alt="Querying the downsampled bucket" width="1000" height="625" /&gt;&lt;/p&gt;

&lt;p&gt;I have downsampled all of the existing data, so next I will set up a task to downsample new data as it flows in.&lt;/p&gt;

&lt;h4&gt;Create a downsampling task from the Flux script&lt;/h4&gt;

&lt;p&gt;This is a simple matter of taking my downsample.flux script, and registering it as a task to run every 10 minutes.&lt;/p&gt;

&lt;p&gt;The first step is to change the range to only look back for the last 10 minutes. I don’t want to constantly re-downsample data I’ve already downsampled.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;from(bucket: "plantbuddy")
   |&amp;gt; range(start: -10m)
   |&amp;gt; aggregateWindow(every: 10m, fn: mean)
   |&amp;gt; to(bucket: "downsampled")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, I need to add the &lt;a href="https://docs.influxdata.com/influxdb/cloud/process-data/get-started/#define-task-options"&gt;option&lt;/a&gt; record that every task needs. There are several fields available, but all I need are a name and “every”, which tells the task system how often to run the Flux.&lt;/p&gt;

&lt;p&gt;Now my full downsample.flux looks like this:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;option task = {
   name: "downsampled",
   every: 10m
}
from(bucket: "plantbuddy")
   |&amp;gt; range(start: -10m)
   |&amp;gt; aggregateWindow(every: 10m, fn: mean)
   |&amp;gt; to(bucket: "downsampled")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now I just need to register it with InfluxDB Cloud. This is a simple matter of using task create with the CLI:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;$ influx task create -f downsample.flux 
ID                      Name            Organization ID         Organization                    Status  Every   Cron
079824d7a391b000        downsampled     f1d35b5f11f06a1d        rick+plantbuddy@influxdata.com  active  10m&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since the output provides me with the task id, I can use that to keep an eye on the task:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-bash"&gt;$ influx task log list --task-id 079824d7a391b000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And I can see that it ran successfully, along with some other useful information, such as the Flux that was actually run.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;RunID                   Time                            Message
079825437a264000        2021-05-28T01:30:00.099978246Z  Started task from script: "option task = {\n    name: \"downsampled\",\n    every: 10m\n}\nfrom(bucket: \"plantbuddy\") \n    |&amp;gt; range(start: -10m)\n    |&amp;gt; aggregateWindow(every: 10m, fn: mean) \n    |&amp;gt; to(bucket: \"downsampled\")"
079825437a264000        2021-05-28T01:30:00.430597345Z  trace_id=0adc0ed1a407fd7a is_sampled=true
079825437a264000        2021-05-28T01:30:00.466570704Z  Completed(success)&lt;/code&gt;&lt;/pre&gt;

&lt;h4&gt;Update the graph query&lt;/h4&gt;

&lt;p&gt;Finally, I can update the query powering the graph to read from the downsampled bucket. This is a much simpler and faster query.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;from(bucket: "downsampled")
   |&amp;gt; range(start: -48h)
   |&amp;gt; filter(fn: (r) =&amp;gt; r.user == "{}" )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Restart everything, and it’s all working fast!&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/legacy-uploads/after-restart.png" alt="graph after restart" width="1000" height="625" /&gt;&lt;/p&gt;

&lt;h2&gt;Notifications&lt;/h2&gt;

&lt;p&gt;The final feature we will add is the ability for plantbuddy.com to notify users if their soil gets too dry. With InfluxDB you can use tasks to create status checks and notification rules that will send a message to your application under whatever conditions you define. You should not add polling logic to your application to read a status; you should let InfluxDB do this for you!&lt;/p&gt;

&lt;p&gt;While there are simpler ways to implement this functionality using a single task, for example by running a simple query and then using &lt;a href="https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/http/post/"&gt;http.post()&lt;/a&gt; directly in the same task, plantbuddy.com will take full advantage of the whole &lt;a href="/blog/influxdbs-checks-and-notifications-system/"&gt;Checks and Notifications System&lt;/a&gt;. The extra benefits include a record of every status and notification stored in the _monitoring bucket. Additionally, because this data is stored in a standard format, tools such as dashboards and queries can be easily shared.&lt;/p&gt;
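&lt;p&gt;To make the destination of those messages concrete, here is a rough sketch of what the notification half could look like once the check described below is writing statuses to the _monitoring bucket. This is illustrative only: the endpoint URL, the rule and endpoint ids, and the assumption that the app exposes a /notify route are all placeholders, not something we have built yet:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;import "influxdata/influxdb/monitor"
import "http"
import "json"

option task = {
   name: "dry soil notification",
   every: 10m
}

// Hypothetical endpoint exposed by the plantbuddy.com app
endpoint = http.endpoint(url: "https://plantbuddy.example.com/notify")

// Read recent statuses and send a notification for each critical one
monitor.from(start: -10m)
   |&amp;gt; filter(fn: (r) =&amp;gt; r._check_name == "soil moisture check")
   |&amp;gt; filter(fn: (r) =&amp;gt; r._level == "crit")
   |&amp;gt; monitor.notify(
       data: {
           _notification_rule_id: "rule1xxxxxxxxxxx",
           _notification_rule_name: "dry soil rule",
           _notification_endpoint_id: "endpoint1xxxxxxx",
           _notification_endpoint_name: "plantbuddy endpoint"
       },
       endpoint: endpoint(mapFn: (r) =&amp;gt; ({
           headers: {"Content-Type": "application/json"},
           data: json.encode(v: {user: r.user, message: r._message})
       }))
   )&lt;/code&gt;&lt;/pre&gt;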

&lt;h3&gt;Threshold Check&lt;/h3&gt;

&lt;p&gt;The first kind of task I want to set up is a Threshold Check. To build one, I will:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;Query the data for values of interest&lt;/li&gt;
 	&lt;li&gt;Check whether those values exceed certain thresholds&lt;/li&gt;
 	&lt;li&gt;Use the Flux monitor library to write status time series data to the _monitoring bucket&lt;/li&gt;
 	&lt;li&gt;Create a task from the Flux script and start it running&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;Query for the value you want to check&lt;/h4&gt;

&lt;p&gt;Plant Buddy will be able to notify any user whose soil moisture level drops too low. I will create the related Flux query (soon to be a task) in a new file, check.flux. A simple query can look back in time and find all (and only) the soil moisture levels:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;data = from(bucket: "plantbuddy")
   |&amp;gt; range(start: -10m)
   |&amp;gt; filter(fn: (r) =&amp;gt; r._measurement == "soil_moisture")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Notice I am querying the raw data in the plantbuddy bucket, rather than the downsampled data.&lt;/p&gt;

&lt;h4&gt;Define thresholds for the value&lt;/h4&gt;

&lt;p&gt;Next I need to define the thresholds to watch for. The Checks system will use those thresholds to set the status for each row of data. There are four possible levels for a Threshold Check:&lt;/p&gt;
&lt;ul&gt;
 	&lt;li&gt;ok&lt;/li&gt;
 	&lt;li&gt;info&lt;/li&gt;
 	&lt;li&gt;warn&lt;/li&gt;
 	&lt;li&gt;crit&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Set the thresholds by creating a predicate function for each level you want to use; each function returns a bool indicating whether a value falls within that threshold. You do not need to use all the levels in your threshold check. For Plant Buddy, soil moisture can be either “ok” or “crit”, so I just define these two functions:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-javascript"&gt;ok = (r) =&amp;gt; r.reading &amp;gt; 35
crit = (r) =&amp;gt; r.reading &amp;lt;= 35&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Basically, if the soil moisture reading falls to 35 or below, it’s critical.&lt;/p&gt;

&lt;p&gt;As mentioned above, the resulting statuses will be written to the _monitoring bucket. You also need to supply a function that creates the message that gets logged. This message can be complex or simple. I am keeping it simple:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-javascript"&gt;messageFn = (r) =&amp;gt; "soil moisture at ${string(v:r.reading)} for ${r.user}"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These functions are parameters for the &lt;a href="https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/monitor/check/"&gt;monitor.check()&lt;/a&gt; function, which generates the statuses and writes them to the _monitoring bucket, as we’ll see below.&lt;/p&gt;

&lt;h4&gt;Generate the status for each record with the monitor.check() function&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/monitor/"&gt;monitor package&lt;/a&gt; has a function called &lt;a href="https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/monitor/check/"&gt;monitor.check()&lt;/a&gt; that does the work of calculating and recording the statuses for you. It goes through each record in the data that was returned, calculates the status level (ok or crit in this case), calculates the message string, and then records all of that in the _monitoring bucket, along with some other information.&lt;/p&gt;

&lt;p&gt;First, though, we need to supply some metadata to monitor.check(). An id, name, type, and a list of tags are required. Tags are useful when you have multiple notification rules. Since I am currently only planning one, I will leave the tags object empty.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;check = {_check_id: "check1xxxxxxxxxx",
        _check_name: "soil moisture check",
        _type: "threshold",
        tags: {}}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now my script can go ahead and call monitor.check(). A couple of things to note:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;Pipe forwarding through the &lt;a href="https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/influxdb-schema/fieldsascols/"&gt;schema.fieldsAsCols()&lt;/a&gt; function is required, because that is the shape the monitor.check() function expects. It also makes some checks easier to write, because you can more easily write expressions that combine values from different fields.&lt;/li&gt;
 	&lt;li&gt;It's a little confusing because the check metadata record is passed to monitor.check() in a parameter named "data", even though it is the query results that are piped in as the actual time series data.&lt;/li&gt;
&lt;/ol&gt;
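&lt;p&gt;Putting the pieces together, the finished check.flux would look something like the sketch below. Everything here comes from the snippets above except the task option record, whose name and cadence are my assumptions (mirroring the downsampling task):&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;import "influxdata/influxdb/monitor"
import "influxdata/influxdb/schema"

// Assumed task options; the cadence matches the 10 minute look-back
option task = {
   name: "soil moisture check",
   every: 10m
}

// Query the raw soil moisture readings
data = from(bucket: "plantbuddy")
   |&amp;gt; range(start: -10m)
   |&amp;gt; filter(fn: (r) =&amp;gt; r._measurement == "soil_moisture")

// Threshold predicates and the message to log
ok = (r) =&amp;gt; r.reading &amp;gt; 35
crit = (r) =&amp;gt; r.reading &amp;lt;= 35
messageFn = (r) =&amp;gt; "soil moisture at ${string(v:r.reading)} for ${r.user}"

// Check metadata, passed to monitor.check in its "data" parameter
check = {_check_id: "check1xxxxxxxxxx",
        _check_name: "soil moisture check",
        _type: "threshold",
        tags: {}}

// Pivot fields into columns, then compute and record the statuses
data
   |&amp;gt; schema.fieldsAsCols()
   |&amp;gt; monitor.check(data: check, messageFn: messageFn, ok: ok, crit: crit)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Registering it would then work just like the downsampling task above, e.g. &lt;code class="language-bash"&gt;influx task create -f check.flux&lt;/code&gt;.&lt;/p&gt;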
</description>
      <pubDate>Fri, 02 Jul 2021 02:00:54 -0700</pubDate>
      <link>https://www.influxdata.com/blog/building-an-iot-app-with-influxdb-cloud-python-and-flask-part-3/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/building-an-iot-app-with-influxdb-cloud-python-and-flask-part-3/</guid>
      <category>Product</category>
      <category>Use Cases</category>
      <category>Developer</category>
      <author>Rick Spencer (InfluxData)</author>
    </item>
    <item>
      <title>How to Consolidate OSS Data into a Cloud Account</title>
      <description>&lt;p&gt;In this post, we will describe a simple way to share data from multiple InfluxDB 2.0 OSS instances with a central cloud account. This is something that community members have asked for when they have OSS running at different locations, but then they want to be able to visualize some of the data or even alert on the data in a central place.&lt;/p&gt;

&lt;p&gt;&lt;img class="aligncenter size-full wp-image-256134" src="/images/legacy-uploads/share-data-from-multiple-instances.png" alt="Cloud Account" width="405" height="333" /&gt;&lt;/p&gt;

&lt;p&gt;Please note that while the method presented here is simple and fast to set up, it has many limitations which may make it inappropriate for your production use case. These limitations are discussed at the end of the post.&lt;/p&gt;

&lt;p&gt;This method uses a combination of features available in InfluxDB today; namely, &lt;a href="https://docs.influxdata.com/influxdb/cloud/process-data/get-started/"&gt;tasks&lt;/a&gt;, &lt;a href="https://docs.influxdata.com/influxdb/cloud/reference/syntax/line-protocol/"&gt;line protocol&lt;/a&gt;, and the Flux &lt;a href="https://docs.influxdata.com/influxdb/cloud/reference/flux/stdlib/http/post/"&gt;http.post()&lt;/a&gt; method.&lt;/p&gt;
&lt;h2&gt;An InfluxDB IoT scenario&lt;/h2&gt;
&lt;p&gt;To demonstrate this functionality I created a pretend manufacturing plant where I am running an InfluxDB 2.0 OSS instance. I call this “Plant 001.” This site has two different kinds of sensors throughout the plant, differentiated with the tag “s_type”: each sensor is either s_type=1 or s_type=2. The plant has 50 of each of these sensors, and they report in every 5 seconds.&lt;/p&gt;

&lt;p&gt;Here is some example line protocol for the sensor data:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;sensors,s_type=1,s_id=s18 s_reading=40
sensors,s_type=2,s_id=s28 s_reading=91
sensors,s_type=1,s_id=s19 s_reading=36
sensors,s_type=2,s_id=s29 s_reading=99
sensors,s_type=1,s_id=s110 s_reading=33
sensors,s_type=2,s_id=s210 s_reading=71
sensors,s_type=1,s_id=s111 s_reading=37
sensors,s_type=2,s_id=s211 s_reading=67
sensors,s_type=1,s_id=s112 s_reading=45
sensors,s_type=2,s_id=s212 s_reading=75
sensors,s_type=1,s_id=s113 s_reading=31
sensors,s_type=2,s_id=s213 s_reading=61&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The users in Plant 001 can view the status of these sensors within their network, typically with the following dashboard.&lt;/p&gt;

&lt;p&gt;&lt;img class="aligncenter size-full wp-image-256140" src="/images/legacy-uploads/Plant-001-Readings-2-1.png" alt="Plant 001 Readings" width="1920" height="1200" /&gt;&lt;/p&gt;
&lt;h2&gt;Your InfluxDB Cloud account&lt;/h2&gt;
&lt;p&gt;But what if I am going to have many plants running in the same way? What if I want to be able to help monitor the plant, but the OSS instance is not running in an accessible place? To solve this, you will first need an InfluxDB Cloud account.&lt;/p&gt;

&lt;p&gt;A free tier account will work just fine for this. You can easily sign up for an account &lt;a href="https://cloud2.influxdata.com/signup"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Next, you can go ahead and &lt;a href="https://docs.influxdata.com/influxdb/cloud/organizations/buckets/create-bucket/"&gt;create a bucket&lt;/a&gt;. Because you will be passing the bucket name as part of a url, I suggest not including any special characters in the bucket name, to avoid issues with url encoding. For my example, I chose to name the bucket simply “remote.”&lt;/p&gt;

&lt;p&gt;After creating your bucket, go ahead and &lt;a href="https://docs.influxdata.com/influxdb/cloud/security/tokens/create-token/"&gt;create a write token&lt;/a&gt; for the bucket. Your cloud account is now set up to start collecting data from your OSS instance.&lt;/p&gt;
&lt;h2&gt;Create the task in your InfluxDB OSS instance&lt;/h2&gt;
&lt;p&gt;The first thing you want to do is store the write token for your cloud account bucket as a &lt;a href="https://docs.influxdata.com/influxdb/v2.0/security/secrets/"&gt;secret&lt;/a&gt; in your OSS instance. There is no UI for this, but it is simple enough to do with the CLI. Assuming that the &lt;a href="https://docs.influxdata.com/influxdb/cloud/reference/cli/influx/"&gt;influx CLI is configured&lt;/a&gt; to point to your OSS instance, you can use a command like this:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-bash"&gt;$ influx secret update -k remote-token -v LN1lYeE3j0we0dji_E027UyOUrmi1vLJK2xz-N3z8cDzxqiqDjTdV3xrUAjsBLQ6AbNZf67Nxsu3pvBtg3tsrg==
Key		Organization ID
remote-token	6994b3b5a01a431c&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we are ready to write from our OSS instance. To do this, we just need to:&lt;/p&gt;
&lt;ol&gt;
 	&lt;li&gt;Pull in the secret from the secrets store.&lt;/li&gt;
 	&lt;li&gt;Create the url string for the API.&lt;/li&gt;
 	&lt;li&gt;Select and aggregate the data you want to send.&lt;/li&gt;
 	&lt;li&gt;Call &lt;a href="https://docs.influxdata.com/influxdb/cloud/reference/flux/stdlib/http/post/"&gt;http.post()&lt;/a&gt; for each row using &lt;a href="https://docs.influxdata.com/influxdb/cloud/reference/flux/stdlib/built-in/transformations/map/"&gt;map()&lt;/a&gt; to send line protocol to your InfluxDB Cloud account.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;import "http"
import "influxdata/influxdb/secrets"

// Build the Authorization header value from the token stored as a secret
token = "Token ${secrets.get(key: "remote-token")}"

// The cloud write API, targeting the "remote" bucket
url = "https://us-west-2-1.aws.cloud2.influxdata.com/api/v2/write?orgID=27b1f32678fe4738&amp;amp;bucket=remote"

// Average the last 5 minutes of sensor readings, then post each row
// to the cloud as a single line of line protocol
from(bucket: "readings")
   |&amp;gt; range(start: -5m)
   |&amp;gt; filter(fn: (r) =&amp;gt; r["_measurement"] == "sensors")
   |&amp;gt; mean()
   |&amp;gt; map(fn: (r) =&amp;gt; ({r with http_code: http.post(
       url: url,
       headers: {"Authorization": token},
       data: bytes(v: "sensors,plant=p001,s_id=${r.s_id} m_reading=${string(v: r._value)}")
   )}))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now that the data is flowing in, I can go ahead and make a dashboard in my InfluxDB Cloud account so that I can keep an eye on the remote instances.&lt;/p&gt;

&lt;p&gt;&lt;img class="alignnone size-full wp-image-256369" src="/images/legacy-uploads/plant-buddy-remote-plant-monitoring-screenshot.png" alt="Plant Buddy - remote plant monitoring screenshot" width="1364" height="1054" /&gt;&lt;/p&gt;

&lt;p&gt;Of course, I can go ahead and repeat this process for more remote instances and thereby create a consolidated view.&lt;/p&gt;
&lt;h2&gt;Limitations&lt;/h2&gt;
&lt;p&gt;While this method has the advantage of being easy to set up and not requiring any additional software or integrations to make it work, it has some limitations that may make it inappropriate for your production setup.&lt;/p&gt;
&lt;h3&gt;Very small amounts of data&lt;/h3&gt;
&lt;p&gt;This method is only able to send a single line of line protocol at a time over the API. In my example, that means that every five minutes, it makes 100 separate calls to the write endpoint to write the data to the cloud account. For a meaningful number of points, this will become gothically slow. There is no batching or any other optimizations. As you can imagine, that means for this to work, the data must be aggressively downsampled.&lt;/p&gt;
&lt;h3&gt;Retry logic&lt;/h3&gt;
&lt;p&gt;A write call can fail for many reasons, for example if the network where the OSS instance is running becomes disconnected from the internet. In this example, those failed writes are simply ignored. While there are techniques you could try to overcome this lack of resiliency, if this level of availability is a concern, you will likely want to explore other options for this integration.&lt;/p&gt;
&lt;h2&gt;Next steps&lt;/h2&gt;
&lt;p&gt;As mentioned above, this technique is simple but has some significant limitations. I may follow up with more details about addressing some of these limitations.&lt;/p&gt;
&lt;h3&gt;Checks and notifications&lt;/h3&gt;
&lt;p&gt;The example presented here shows how to visualize data from a remote OSS instance, but it is more likely that you will be interested in &lt;a href="https://www.influxdata.com/blog/influxdbs-checks-and-notifications-system/"&gt;alerting&lt;/a&gt; based on data collected in those remote instances. This is possible by creating a check in your task that writes to your cloud account’s _monitoring bucket.&lt;/p&gt;
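&lt;p&gt;As a rough sketch of that idea, the OSS task could compute statuses locally with monitor.check() and forward them to the cloud account with experimental.to(), which writes pivoted data. Everything below is illustrative and untested: the check id and name, the 90.0 threshold, the cadence, and my assumption that experimental.to() accepts the same host, orgID, and token parameters as to() are placeholders rather than a verified setup:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-javascript"&gt;import "experimental"
import "influxdata/influxdb/monitor"
import "influxdata/influxdb/schema"
import "influxdata/influxdb/secrets"

option task = {
   name: "remote check",
   every: 5m
}

// Reuse the cloud write token already stored as a secret
token = secrets.get(key: "remote-token")

from(bucket: "readings")
   |&amp;gt; range(start: -5m)
   |&amp;gt; filter(fn: (r) =&amp;gt; r._measurement == "sensors")
   |&amp;gt; schema.fieldsAsCols()
   |&amp;gt; monitor.check(
       // Placeholder check metadata
       data: {_check_id: "plant001check1xx", _check_name: "plant 001 reading check", _type: "threshold", tags: {}},
       messageFn: (r) =&amp;gt; "reading at ${string(v: r.s_reading)} on ${r.s_id}",
       crit: (r) =&amp;gt; r.s_reading &amp;gt; 90.0,
       ok: (r) =&amp;gt; r.s_reading &amp;lt;= 90.0
   )
   // Forward the statuses (already pivoted) to the cloud _monitoring bucket
   |&amp;gt; experimental.to(bucket: "_monitoring", host: "https://us-west-2-1.aws.cloud2.influxdata.com", orgID: "27b1f32678fe4738", token: token)&lt;/code&gt;&lt;/pre&gt;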
&lt;h3&gt;Telegraf&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.influxdata.com/time-series-platform/telegraf/"&gt;Telegraf&lt;/a&gt; has many desirable features built in that can improve the resiliency of this system. By configuring a Telegraf instance that proxies your cloud account, you can improve the resiliency of this setup without significant code changes.&lt;/p&gt;
</description>
      <pubDate>Mon, 24 May 2021 04:00:05 -0700</pubDate>
      <link>https://www.influxdata.com/blog/how-to-consolidate-oss-data-into-cloud-account/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/how-to-consolidate-oss-data-into-cloud-account/</guid>
      <category>Product</category>
      <category>Use Cases</category>
      <category>Developer</category>
      <author>Rick Spencer (InfluxData)</author>
    </item>
  </channel>
</rss>
