<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>InfluxData Blog - Susannah Brodnitz</title>
    <description>Posts by Susannah Brodnitz on the InfluxData Blog</description>
    <link>https://www.influxdata.com/blog/author/susannah-brodnitz/</link>
    <language>en-us</language>
    <lastBuildDate>Wed, 14 Dec 2022 07:00:00 +0000</lastBuildDate>
    <pubDate>Wed, 14 Dec 2022 07:00:00 +0000</pubDate>
    <ttl>1800</ttl>
    <item>
      <title>The Immutability of Time Series Data</title>
      <description>&lt;p&gt;&lt;em&gt;This article was originally published in &lt;a href="https://thenewstack.io/the-immutability-of-time-series-data/"&gt;The New Stack&lt;/a&gt; and is reposted here with permission.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time series data often comes in large volumes that need to be handled carefully to produce insights in near real time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’re constantly moving through time. The time it took you to read this sentence is now forever in the past, unchangeable. This leads to something unique about data with a time dimension: It can only go in one direction.  &lt;a href="https://www.influxdata.com/what-is-time-series-data/"&gt;Time series data&lt;/a&gt;  is different from other data for many reasons. It often comes in large volumes that need to be handled carefully to produce insights in near real time. This blog post focuses on the unchangeable, immutable nature of time series data.&lt;/p&gt;

&lt;h2 id="the-past-is-the-past"&gt;The past is the past&lt;/h2&gt;

&lt;p&gt;In our world, time is immutable: once a moment is in the past, it can’t be changed, like an immutable object in programming. In a perfect world, data reflects that; you can’t change time series data any more than you can rewind the clock. Data should reflect reality, but sometimes bad data points get written into a database. Those points don’t reflect reality, so it makes sense to delete them.&lt;/p&gt;

&lt;p&gt;Changes to the historical record need to be handled with care.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;You need a database that can delete points without shifting other points around.&lt;/li&gt;
  &lt;li&gt;You might also need to edit the historical record by adding late-arriving points.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There’s a balance to strike so you’re not constantly rewriting the past in such a way that your data loses meaning, but you can still make necessary changes that enhance the context presented by time. When deciding if it makes sense to make an edit, consider whether the edit brings the data closer to reflecting reality or further from it.&lt;/p&gt;

&lt;h2 id="time-keeps-moving"&gt;Time keeps moving&lt;/h2&gt;

&lt;p&gt;The other thing about time is that it never stops; the present is continuously moving forward. Because time keeps moving, time series data updates continuously. When you think of a database, you might think of a place for storage where you write data and later read that data without changing it very often. A &lt;a href="https://www.influxdata.com/time-series-database/"&gt;time series database&lt;/a&gt;, by contrast, is constantly being changed and updated as new points arrive.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/3IHPspiuXpq3Gvcq3quY8a/741c0837305785c88cef8d154e898fd7/The-Immutability-of-Time-Series-Data-OG.jpg" alt="The-Immutability-of-Time-Series-Data" /&gt;&lt;/p&gt;

&lt;p&gt;You can’t collect data with the infinite precision of reality, but you can choose the level of precision that makes sense for your application. For example, averages are one of the most common and useful calculations. If you’re working with data that isn’t a time series, you might average the number of people per square mile in a state. With time series data, you might average the number of people entering a building every hour. The difference here is that at each moment, the start and end of the last hour changes. Here’s some example code for this sort of calculation:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-javascript"&gt;from(bucket: "sample")
    |&amp;gt; range(start: 2022-01-01, stop: 2022-01-31)
    |&amp;gt; filter(fn: (r) =&amp;gt; r["_measurement"] == "foot_traffic")
    |&amp;gt; aggregateWindow(column: "number_of_people", every: 1h, fn: mean, createEmpty: false)
    |&amp;gt; yield(name: "running mean")&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When you take a  &lt;a href="https://docs.influxdata.com/flux/v0.x/stdlib/universe/movingaverage/"&gt;moving average&lt;/a&gt;, you calculate a new average at specified intervals so you can see how your calculation changes over time, resulting in a new time series. You need to consider your data set to know what sort of interval makes sense. If you choose too broad an interval, you lose information and context, but if you choose one that is too precise, you’ll have windows without any data points, and your results will drop to zero in a way that doesn’t make sense and isn’t helpful.&lt;/p&gt;
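
&lt;p&gt;As a rough sketch of this sort of calculation (not from the original post; the "sample" bucket, "foot_traffic" measurement, and credentials below are hypothetical), here’s how you might compute a 24-hour moving average over hourly means using the Python client library:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;from influxdb_client import InfluxDBClient

# Placeholder credentials; substitute your own
url, token, org = "https://example.cloud2.influxdata.com", "YOUR_API_TOKEN", "you@example.com"

# Hourly means first, then a 24-point (one-day) moving average over them
flux = '''
from(bucket: "sample")
    |&amp;gt; range(start: 2022-01-01T00:00:00Z, stop: 2022-01-31T00:00:00Z)
    |&amp;gt; filter(fn: (r) =&amp;gt; r["_measurement"] == "foot_traffic")
    |&amp;gt; aggregateWindow(every: 1h, fn: mean, createEmpty: false)
    |&amp;gt; movingAverage(n: 24)
'''

client = InfluxDBClient(url=url, token=token, org=org)
df = client.query_api().query_data_frame(flux)&lt;/code&gt;&lt;/pre&gt;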

&lt;h2 id="the-context-of-time"&gt;The context of time&lt;/h2&gt;

&lt;p&gt;No matter how close to real time your data architecture is, there will be some small amount of lag between when your data is collected and when it lands in a database, ready to be queried. If you have automated queries or processing set up, this can skew your results. For example, if you calculate the mean of a metric from the last five minutes, the data that arrived at the database in the last five minutes might not include the full set of measurements that were taken at the edge in the last five minutes. &lt;a href="https://www.influxdata.com/the-best-way-to-store-collect-analyze-time-series-data/"&gt;InfluxDB&lt;/a&gt; lets you handle this with &lt;a href="https://youtu.be/SSVz6GtKd7U"&gt;task offsets&lt;/a&gt;.&lt;/p&gt;

&lt;div class="youtube-container"&gt;
  &lt;iframe class="responsive-iframe" src="https://www.youtube.com/embed/SSVz6GtKd7U" title="Flux Task Offsets in InfluxDB"&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;You can schedule tasks to run calculations like this while including some extra buffer time to allow all data to arrive in the database first. This is important to preserve the full context of when each point was collected.  &lt;a href="https://www.influxdata.com/time-series-platform/telegraf/"&gt;Telegraf&lt;/a&gt;, InfluxData’s open source data-collection agent, also allows for offsets.&lt;/p&gt;
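
&lt;p&gt;Offsets are configured on the tasks themselves, but as an illustrative sketch of the same buffering idea (hypothetical names, not code from this post), a scheduled query can simply shift its window into the past, averaging a five-minute window that ends one minute ago so late-arriving points have time to land:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;# Sketch only: the 5-minute window ends 1 minute in the past, leaving a
# 1-minute buffer for data still in transit to the database
flux = '''
from(bucket: "sample")
    |&amp;gt; range(start: -6m, stop: -1m)
    |&amp;gt; filter(fn: (r) =&amp;gt; r["_measurement"] == "foot_traffic")
    |&amp;gt; mean(column: "number_of_people")
'''
tables = client.query_api().query(flux)  # reuses the client from the previous sketch&lt;/code&gt;&lt;/pre&gt;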

&lt;p&gt;There are many reasons to downsample data. Sometimes you don’t have enough storage space for the full raw data set. Sometimes an averaged signal cuts through the noise and gives you more valuable information. When you take an average, some information is lost and some new information is added.  &lt;a href="https://youtu.be/ZZ7KfVVUE44"&gt;Averages aren’t the only way of downsampling either&lt;/a&gt;. Sometimes instead of maintaining the shape of the data, it might make more sense to count the number of times a metric goes above a set threshold.&lt;/p&gt;

&lt;p&gt;Whatever downsampling method you use, everything you do should be intentional so you don’t lose data you realize later is important. If you aren’t careful while downsampling and aren’t handling all your timestamps properly, downsampling can skew your time series. InfluxDB is built to handle downsampling using a variety of tools and processes, and it creates multiple backup copies of your data in InfluxDB Cloud, so you don’t accidentally delete a point you need. In order to keep as much context as possible, InfluxDB also supports nanosecond precision. Here’s some example code for &lt;a href="https://docs.influxdata.com/influxdb/cloud/process-data/common-tasks/downsample-data/"&gt;downsampling in Flux&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-javascript"&gt;from(bucket: "sample")
    |&amp;gt; range(start: 2022-01-01, stop: 2022-01-31)
    |&amp;gt; filter(fn: (r) =&amp;gt; r["_measurement"] == "foot_traffic")
    |&amp;gt; aggregateWindow(column: "number_of_people", every: 1h, fn: mean, createEmpty: false)
    |&amp;gt; to(bucket: "sample-downsampled")&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Of course, time isn’t the only important context to consider, and time series data isn’t the only important kind of data. Information like customer details, location or the version of a machine being used aren’t time series data, but they are important to record. Fortunately, InfluxDB allows you to join these other types of data with time series data to produce deeper insights into your systems and processes.&lt;/p&gt;

&lt;p&gt;Time is one of the fundamental building blocks of our reality, and understanding its nature helps you better understand the world and get more useful information out of your data.&lt;/p&gt;
</description>
      <pubDate>Wed, 14 Dec 2022 07:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/immutability-time-series-data/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/immutability-time-series-data/</guid>
      <category>Product</category>
      <author>Susannah Brodnitz (InfluxData)</author>
    </item>
    <item>
      <title>How Prescient Devices Uses Time Series Data for IoT Automation</title>
      <description>&lt;p&gt;&lt;em&gt;This article was originally published in &lt;a href="https://thenewstack.io/how-prescient-devices-uses-time-series-data-for-iot-automation/"&gt;The New Stack&lt;/a&gt; and is reposted here with permission.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Companies need to consider both how fast they can put edge applications into action and update them, and how quickly they can process incoming data.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Industrial processes are becoming  &lt;a href="https://www.influxdata.com/what-is-industry-4-0/"&gt;increasingly automated&lt;/a&gt;  as sensors on machines collect a growing amount of data. Much of this data is  &lt;a href="https://www.influxdata.com/what-is-time-series-data/"&gt;time-stamped&lt;/a&gt;  and can help companies improve processes. This large volume of  &lt;a href="https://www.influxdata.com/sensor-data-is-time-series-data/"&gt;sensor data&lt;/a&gt;  can become unwieldy if companies don’t manage it properly.&lt;/p&gt;

&lt;p&gt;Companies need to be able to handle the life cycle of time series data, from real-time analysis to downsampling of historical data. Many companies use purpose-built data management tools and products to help with this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.prescientdevices.com/"&gt;Prescient Devices&lt;/a&gt;  built an industrial Internet of Things (&lt;a href="https://thenewstack.io/managing-time-series-data-in-industrial-iot/"&gt;IoT&lt;/a&gt;) data management platform, powered by InfluxDB, that helps companies build applications to manage edge devices quickly and easily.&lt;/p&gt;

&lt;h2 id="connecting-edge-and-cloud"&gt;Connecting edge and cloud&lt;/h2&gt;

&lt;p&gt;Prescient Devices created Prescient Designer, a SaaS platform to help businesses manage distributed edge devices. It has an agent that users install on edge devices and an interface where they build and implement applications. InfluxDB is the backbone of Prescient Designer and handles time-stamped data.&lt;/p&gt;

&lt;p&gt;Prescient Devices built the platform using Node-RED, an open source project that has a low-code visual interface. Prescient Designer also uses InfluxDB Cloud, TensorFlow, and Grafana and can integrate with other tools that companies require.&lt;/p&gt;

&lt;p&gt;Prescient Edge is Prescient Devices’ edge management software. It uses  &lt;a href="https://thenewstack.io/node-red-hits-1-0-looks-to-a-serverless-microservices-future/"&gt;Node-RED&lt;/a&gt;, InfluxDB OSS, TensorFlow, and a runtime system. Prescient Edge deployments can include applications built by users in Prescient Designer. Data from edge devices comes in any number of formats. Prescient Devices enables users to collect a wide variety of sensor data, such as temperature, humidity, and acceleration, as well as data from cameras, APIs, and industrial equipment.&lt;/p&gt;

&lt;p&gt;Data comes into Prescient Designer in two separate streams, one for user data and one for application management and deployment data. Prescient Devices includes a broker to connect Prescient Designer and devices running Prescient Edge.&lt;/p&gt;

&lt;p&gt;Their system also supports custom brokers if users need support for compliance requirements like HIPAA or GDPR. For data visualization, Prescient Designer leverages Grafana dashboards to provide users insight into their applications in real time.&lt;/p&gt;

&lt;h2 id="low-code-framework"&gt;Low-code framework&lt;/h2&gt;

&lt;p&gt;Prescient Designer is based on Node-RED and lets users create applications using minimal code. This creates a lower barrier to entry and makes it easier for companies to develop applications and solutions quickly.&lt;/p&gt;

&lt;p&gt;This approach also enables companies to include subject-matter experts who aren’t developers more directly in building applications. So, key stakeholders like data engineers, system integrators, and other innovators can all use Prescient Devices to build better edge-to-cloud data solutions, without having to be experienced coders. Prescient Designer’s visual workspace also lets users get a view of their whole system in one place.&lt;/p&gt;

&lt;h2 id="saving-time"&gt;Saving time&lt;/h2&gt;

&lt;p&gt;The two key issues with data that Prescient Devices helps solve are scale and speed. IoT environments have huge volumes of data coming from hundreds of devices that each process a significant amount of data. Companies need to consider both how fast they can put edge applications into action and update them and how quickly they can process incoming data.&lt;/p&gt;

&lt;p&gt;Prescient Devices initially used  &lt;a href="https://www.influxdata.com/time-series-platform/telegraf/"&gt;Telegraf&lt;/a&gt;, InfluxData’s open source plug-in–driven agent for collecting metrics, to collect and process data from edge devices. They ultimately chose to use InfluxDB Cloud as their  &lt;a href="https://www.influxdata.com/the-best-way-to-store-collect-analyze-time-series-data/"&gt;time series database&lt;/a&gt;  because it supports Node-RED and runs well on edge devices with limited resources. Many of Prescient Devices’ customers already used InfluxDB and specifically requested InfluxDB support in the platform.&lt;/p&gt;

&lt;p&gt;Prescient Designer lets IoT companies create distributed systems quickly. Users can build applications within weeks and update and deploy changes to those applications within hours to seconds. Because users can build applications within a visual interface, experts can build applications even if they aren’t developers. This lets IoT manufacturers automate their systems quickly and get better insights from their data.&lt;/p&gt;
</description>
      <pubDate>Wed, 07 Dec 2022 07:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/how-prescient-devices-uses-time-series-data-iot-automation/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/how-prescient-devices-uses-time-series-data-iot-automation/</guid>
      <category>Developer</category>
      <author>Susannah Brodnitz (InfluxData)</author>
    </item>
    <item>
      <title>TL;DR Python, Pandas Dataframes, and InfluxDB</title>
      <description>&lt;p&gt;InfluxDB has over a dozen &lt;a href="https://www.influxdata.com/products/data-collection/influxdb-client-libraries/"&gt;client libraries&lt;/a&gt; so developers can get started more easily and program in the language they’re most comfortable with. One of our most popular options is the &lt;a href="https://www.influxdata.com/integration/python-client-library/"&gt;Python client library&lt;/a&gt;. InfluxDB supports not just Python but pandas, a tool popular with data scientists for analyzing and manipulating data. You can use the client library to output data from InfluxDB into a DataFrame format pandas can ingest, and you can write pandas DataFrames directly to InfluxDB. This makes it simple to incorporate InfluxDB into your data science applications and take advantage of real-time monitoring and alerting.&lt;/p&gt;

&lt;h2 id="querying-data-from-influxdb-to-a-pandas-dataframe"&gt;Querying data from InfluxDB to a Pandas Dataframe&lt;/h2&gt;

&lt;p&gt;To query InfluxDB and return a pandas DataFrame, you first need to install the Python client library. To set it up, import the library into your Python project and set your credentials, including your URL, token, org, and the name of the bucket you want to send data to and read data from. Once you’ve set up your credentials you can query data.&lt;/p&gt;
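
&lt;p&gt;As a minimal setup sketch (the URL, token, org, and bucket values below are placeholders for your own credentials, not real ones):&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;from influxdb_client import InfluxDBClient

# Placeholder credentials; substitute your own account details
url = "https://us-west-2-1.aws.cloud2.influxdata.com"
token = "YOUR_API_TOKEN"
org = "you@example.com"
bucket = "system"

client = InfluxDBClient(url=url, token=token, org=org)&lt;/code&gt;&lt;/pre&gt;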

&lt;p&gt;Here’s an example query:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;query= ‘’’
from (bucket: “system”)
|&amp;gt; range(start: -5m, stop: now())
|&amp;gt; filter(fn: (r) =&amp;gt; r._measurement = “cpu”)
|&amp;gt; pivot(rowKey: [“_time”], ColumnKey: [“tag”], ValueColumn: “value”)
|&amp;gt; keep(columns: [“_time”, “usage_user”, “cpu”])
‘’’
client = InfluxDBClient(url, token, org, debug = false)
system_stats = client.query.api().query_data_frame()
display(system_stats.head())&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This query is written in Flux, InfluxDB’s scripting and query language. It asks for data from the bucket “system” from the past five minutes and filters on the measurement “cpu.” The two most important functions to keep in mind when you’re working with InfluxDB and pandas are &lt;code class="language-python"&gt;pivot&lt;/code&gt; and &lt;code class="language-python"&gt;keep&lt;/code&gt;. &lt;code class="language-python"&gt;pivot&lt;/code&gt; rotates field values into columns. Pandas DataFrames are two-dimensional tables, so pandas functions and tools expect data in that format, and you’ll need to pivot the data you query from InfluxDB. &lt;code class="language-python"&gt;keep&lt;/code&gt; tells the query which columns to keep, dropping any other columns you don’t want in the output pandas DataFrame.&lt;/p&gt;

&lt;p&gt;After you’ve written your query, you need to call the InfluxDB client with your credentials and use the query API with the &lt;code class="language-python"&gt;query_data_frame()&lt;/code&gt; function to get the DataFrame format. Then you can display the first few rows of the pandas DataFrame with &lt;code class="language-python"&gt;.head()&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id="writing-data-from-a-pandas-dataframe-to-influxdb"&gt;Writing data from a Pandas Dataframe to InfluxDB&lt;/h2&gt;

&lt;p&gt;You can also write data from a pandas DataFrame into InfluxDB. Here’s some example code that does that:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;system_stats.drop(columns = [‘result’, ‘table’, ‘start’, ‘stop’])
system_stats.set_index(“_time”)
_write_client.write(bucket.name, record = system_stats)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Any column that you don’t remove from your pandas DataFrame using the &lt;code class="language-python"&gt;drop&lt;/code&gt; function becomes a field when you write the data to InfluxDB. In this example we’ve removed metadata columns, like the query result and table numbers, that we don’t want to send to InfluxDB as fields. You also need to set your index, which becomes your timestamp in InfluxDB. Then you use the Python client library &lt;code class="language-python"&gt;write&lt;/code&gt; function to write the data to InfluxDB. It takes in a bucket name and the record you want to write to that bucket; when the record is a DataFrame, you also pass a measurement name. You also need to set up credentials for the client, as in the first example.&lt;/p&gt;

&lt;p&gt;One thing to consider is the timestamp. Pandas comes with a datetime format. You can use that or you can use one of the following time formats:&lt;/p&gt;

&lt;p&gt;“2018-10-26”, “2018-10-26 12:00”, or “2018-10-26 12:00:00-05:00”&lt;/p&gt;
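
&lt;p&gt;As a quick sketch (with made-up sample values; the tag column and measurement name are assumptions for illustration), you can parse string timestamps with pandas and set them as the index before writing:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;import pandas as pd

df = pd.DataFrame({
    "time": ["2018-10-26 12:00", "2018-10-26 13:00"],  # any of the formats above
    "usage_user": [22.5, 17.8],
    "cpu": ["cpu0", "cpu0"],
})
df["time"] = pd.to_datetime(df["time"])  # parse strings into pandas datetimes
df = df.set_index("time")                # the index becomes the InfluxDB timestamp

_write_client.write(bucket, record=df,
                    data_frame_measurement_name="cpu",
                    data_frame_tag_columns=["cpu"])  # write the "cpu" column as a tag&lt;/code&gt;&lt;/pre&gt;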

&lt;p&gt;Flux is a powerful analytics language, but you might want to use pandas because it’s very popular within the Python community. It has a lot of support and many people are already familiar with it and comfortable working with DataFrames. Combining pandas with the InfluxDB Python client library lets you easily incorporate InfluxDB into your applications to get the benefits of a purpose-built time series database.&lt;/p&gt;

&lt;p&gt;To learn more about using pandas DataFrames with InfluxDB you can watch our &lt;a href="https://www.youtube.com/watch?v=cMkQXLCbFQY"&gt;Meet the Developer video&lt;/a&gt; on this topic.&lt;/p&gt;

&lt;div style="padding:56.25% 0 0 0;position:relative; margin-top: 20px;"&gt;&lt;iframe src="https://player.vimeo.com/video/745895643?h=c715811838&amp;amp;badge=0&amp;amp;autopause=0&amp;amp;player_id=0&amp;amp;app_id=58479" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen="" style="position:absolute;top:0;left:0;width:100%;height:100%;" title="Setup Pandas Dataframes with InfluxDB using Python"&gt;&lt;/iframe&gt;&lt;/div&gt;
&lt;script src="https://player.vimeo.com/api/player.js"&gt;&lt;/script&gt;

</description>
      <pubDate>Wed, 09 Nov 2022 07:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/tldr-python-pandas-dataframes-influxdb/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/tldr-python-pandas-dataframes-influxdb/</guid>
      <category>Product</category>
      <category>Use Cases</category>
      <category>Getting Started</category>
      <author>Susannah Brodnitz (InfluxData)</author>
    </item>
    <item>
      <title>Intro to InfluxDB IOx</title>
      <description>&lt;p&gt;&lt;a href="https://www.influxdata.com/blog/influxdata-deploys-next-generation-influxdb-time-series-engine/"&gt;Last week InfluxData announced IOx&lt;/a&gt;, the new time series engine for InfluxDB. We’ve revisited the core of our database to achieve big things with the underlying technology. Users can expect higher performance and more options for querying data. Here’s a quick intro to some of the most exciting things coming with &lt;a href="https://www.influxdata.com/products/influxdb-engine/"&gt;InfluxDB IOx&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id="unbounded-cardinality"&gt;Unbounded cardinality&lt;/h2&gt;

&lt;p&gt;At InfluxData, when we say &lt;a href="https://www.influxdata.com/glossary/cardinality/"&gt;cardinality&lt;/a&gt; we mean the number of unique measurement, tag set, and field key combinations in a bucket. In the past, we discouraged using large numbers of tags or tags that contained unbounded data in InfluxDB because it would slow down performance. This limited use cases, like observability and tracing, that rely on high-cardinality data such as logs and traces. InfluxDB IOx removes cardinality limits. Now users can write data with unlimited cardinality and monitor and query their time series data along any dimension they want without impacting performance.&lt;/p&gt;
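
&lt;p&gt;As a toy illustration of that definition (plain Python, not InfluxDB code), the cardinality of a bucket is the count of unique measurement, tag set, and field key combinations:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;# Three series: two hosts for usage_user, plus usage_system on one host
series = {
    ("cpu", (("host", "server-a"),), "usage_user"),
    ("cpu", (("host", "server-b"),), "usage_user"),
    ("cpu", (("host", "server-a"),), "usage_system"),
}
print(len(series))  # cardinality of this toy bucket: 3&lt;/code&gt;&lt;/pre&gt;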

&lt;h2 id="real-time-analytics"&gt;Real-time analytics&lt;/h2&gt;

&lt;p&gt;InfluxDB IOx is a columnar time series database. Columnar databases represent their data as tables and let you execute queries very quickly and at scale. Basing our new engine on a columnar database solves performance problems in several ways. Per-column compression helps with space on disk. Dictionary encoding makes sure you’re not repeatedly storing strings when you have tag values that apply to many timestamps, and run-length encoding stores genuinely repeated values efficiently. Vectorized execution lets you organize data so that the CPU can run through it very quickly. And queries are parallelized to run faster.&lt;/p&gt;

&lt;p&gt;One of the most important parts of the new engine is partitioning. With IOx, time series data is broken into partitions based on time (such as by days) and also by tags (such as regions). This lets you easily filter out sections of data that don’t fall into a query’s time range or other specification. Queries operate on all rows from included partitions one-by-one, and because of vectorized execution, IOx lets you query around 1 billion rows per second per core. This partitioning and incredibly fast query response means you can have large numbers of tags and still get real-time results to build dashboards and monitoring and alerting systems.&lt;/p&gt;

&lt;h2 id="sql-support"&gt;SQL support&lt;/h2&gt;

&lt;p&gt;InfluxDB IOx was written in the Rust programming language and uses &lt;a href="https://www.influxdata.com/glossary/apache-parquet/"&gt;Apache Parquet&lt;/a&gt; files for on-disk storage and &lt;a href="https://www.influxdata.com/glossary/apache-arrow/"&gt;Apache Arrow&lt;/a&gt; for operations between components. Apache Arrow is an in-memory specification for columnar data that makes analytical queries very fast. IOx also uses the &lt;a href="https://github.com/apache/arrow-datafusion"&gt;DataFusion&lt;/a&gt; library, a native SQL query engine, as its parser, planner, optimizer, and execution engine. This means that for the first time InfluxDB supports the PostgreSQL dialect and wire protocol, allowing you to connect to third-party libraries and BI tools like psql, Grafana, Tableau, and Apache Superset. Compatibility is a key focus at InfluxData, and this new engine supports many querying options. In addition to the new SQL support, InfluxDB IOx continues to support existing versions of the InfluxDB API and our own query and scripting languages. In the API layer you can communicate with the new engine using the 1.x or the 2.0 API, and you can query the database using Flux, InfluxQL, SQL, or any of the 12+ client libraries we offer.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/3mHkG61KEim64x0XLWpARR/9164eb4065c1a1f00f792be310d3b7d2/InfluxDB-powered-by-IOx.png" alt="InfluxDB-powered-by-IOx" /&gt;&lt;/p&gt;

&lt;h2 id="sign-up-for-the-influxdb-iox-beta-program"&gt;Sign up for the InfluxDB IOx Beta program&lt;/h2&gt;

&lt;p&gt;You can sign up to be a part of the &lt;a href="https://www.influxdata.com/influxdb-engine-beta/"&gt;InfluxDB IOx Beta program here&lt;/a&gt;. You’ll get the SQL compatibility, unbounded cardinality, and faster performance within InfluxDB Cloud. If you use OSS or InfluxDB Enterprise, you can expect builds next year with the new engine, these compatibility layers, and migration tooling.&lt;/p&gt;
</description>
      <pubDate>Thu, 03 Nov 2022 07:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/intro-influxdb-iox/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/intro-influxdb-iox/</guid>
      <category>Use Cases</category>
      <category>Product</category>
      <author>Susannah Brodnitz (InfluxData)</author>
    </item>
    <item>
      <title>Getting Started with Python and Geo-Temporal Analysis</title>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;This article was originally published in &lt;a href="https://thenewstack.io/getting-started-with-python-and-geo-temporal-analysis/"&gt;The New Stack&lt;/a&gt; and is reposted here with permission.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Working with geo-temporal data can be difficult. In addition to the challenges often associated with &lt;a href="https://www.influxdata.com/time-series-analysis-methods/"&gt;time-series analysis&lt;/a&gt;, like large volumes of data that you want real-time access to, working with latitude and longitude often involves trigonometry because you have to account for the curvature of the Earth. That’s computationally expensive. It can drive costs up and slow down programs. Fortunately, &lt;a href="https://docs.influxdata.com/influxdb/cloud/query-data/flux/geo/"&gt;InfluxDB’s geo-temporal package&lt;/a&gt; is designed to handle these problems.&lt;/p&gt;

&lt;p&gt;The package does this with &lt;a href="https://s2geometry.io"&gt;S2 geometry&lt;/a&gt;. The S2 system divides the Earth into cells to help computers calculate locations faster. Unlike a lot of other projections, it’s based on a sphere instead of a flat surface, so there are no gaps or overlapping areas. You can choose different levels to vary the size of each cell. With this system, a computer can check how many cells away two points are to get an estimate of their distance.&lt;/p&gt;

&lt;p&gt;In a lot of cases, the estimates you can get from S2 calculations are precise enough and much faster for computers than trigonometry. In cases where you need exact answers, using S2 geometry can still speed things up because computers can get rough estimates first and then do only the expensive calculations that are truly needed.&lt;/p&gt;
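
&lt;p&gt;To get a feel for cell IDs, here’s a small sketch using the s2sphere package, a Python port of S2 (an illustration only, not part of InfluxDB or this post’s original code). Two points that share a parent cell at a coarse level are cheap to flag as “possibly close” before doing any exact distance math:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;import s2sphere

def cell_token(lat, lon, level=10):
    """Return the token of the S2 cell containing (lat, lon) at the given level."""
    point = s2sphere.LatLng.from_degrees(lat, lon)
    return s2sphere.CellId.from_lat_lng(point).parent(level).to_token()

# Two nearby points in the Atlantic: a matching coarse cell is a cheap
# "maybe close" check before any expensive exact calculation
a = cell_token(20.5, -39.5)
b = cell_token(20.6, -39.4)
print(a, b, a == b)&lt;/code&gt;&lt;/pre&gt;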

&lt;h2 id="getting-started-with-an-example"&gt;Getting started with an example&lt;/h2&gt;
&lt;p&gt;For this example, we’re going to calculate the average surface temperature of the ocean over a specified region and at different time windows, as well as the standard deviation. We’re going to be working in InfluxDB’s Python client library in a Jupyter notebook. &lt;a href="https://www.influxdata.com/blog/getting-started-with-python-and-influxdb-v2-0/"&gt;Here are&lt;/a&gt; &lt;a href="https://www.influxdata.com/blog/streaming-time-series-with-jupyter-and-influxdb/"&gt;some other&lt;/a&gt; blog posts on these topics.&lt;/p&gt;

&lt;p&gt;I used Jupyter notebooks through Anaconda, so the first step for me was to install the InfluxDB Python client library in Anaconda by typing the following in the command window.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-markup"&gt;conda install -c conda-forge influxdb-client&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then, as prompted, I had to update Anaconda with&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-markup"&gt;conda update -n base -c defaults conda&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The file used in this example is in NetCDF format, so I also had to install the following to read it:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-markup"&gt;conda install -c conda-forge netcdf4&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;NetCDF files are a common format for scientific data. The data used in this example is the Roemmich-Gilson Argo temperature climatology, which is available &lt;a href="https://sio-argo.ucsd.edu/RG_Climatology.html"&gt;here&lt;/a&gt; from the second link on the page entitled “2004-2018 RG Argo Temperature Climatology.” This data comes from measurements made by thousands of floats throughout the ocean taking measurements at irregular times and in irregular locations, which are averaged into a gridded product with monthly values on a 1° grid.&lt;/p&gt;

&lt;p&gt;In your Jupyter notebook, start by running the following commands to import various packages needed for this example.&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;import matplotlib.pyplot as plt
import numpy as np
import datetime
import pandas as pd
import influxdb_client, os, time
from influxdb_client import InfluxDBClient, Point, WritePrecision, WriteOptions
from influxdb_client.client.write_api import SYNCHRONOUS
import netCDF4 as nc&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id="cleaning-up-the-data"&gt;Cleaning up the data&lt;/h2&gt;
&lt;p&gt;Run the following commands to read the file.&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;file_name = '/filepath/RG_ArgoClim_Temperature_2019.nc'
data_structure = nc.Dataset(file_name)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you run&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-markup"&gt;print(data_structure.variables.keys())&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then the following should be the output.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-markup"&gt;dict_keys(['LONGITUDE', 'LATITUDE', 'PRESSURE', 'TIME', 'ARGO_TEMPERATURE_MEAN', 'ARGO_TEMPERATURE_ANOMALY', 'BATHYMETRY_MASK', 'MAPPING_MASK'])&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;These are the various arrays within the file, which store the data. The ones we’re interested in are &lt;code class="language-markup"&gt;LONGITUDE&lt;/code&gt;, &lt;code class="language-markup"&gt;LATITUDE&lt;/code&gt;, &lt;code class="language-markup"&gt;TIME&lt;/code&gt;, &lt;code class="language-markup"&gt;ARGO_TEMPERATURE_MEAN&lt;/code&gt;, and &lt;code class="language-markup"&gt;ARGO_TEMPERATURE_ANOMALY&lt;/code&gt;. To read the data from the file run&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;lon = data_structure.variables['LONGITUDE']
lat = data_structure.variables['LATITUDE']
time = data_structure.variables['TIME']
temp_mean = data_structure.variables['ARGO_TEMPERATURE_MEAN']
temp_anom = data_structure.variables['ARGO_TEMPERATURE_ANOMALY']&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To see the dimensions of each array, run&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;print(lon.shape)
print(lat.shape)
print(time.shape)
print(temp_mean.shape)
print(temp_anom.shape)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You should see&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;(360,)
(145,)
(180,)
(58, 145, 360)
(180, 58, 145, 360)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are 360 longitude values, 145 latitude values and 180 time values. There are also 58 pressure values. Since we’re only interested in the ocean temperature at the surface, which is the lowest pressure, we’re going to subset the array to take the first pressure index.&lt;/p&gt;

&lt;p&gt;You will also notice that the mean temperature doesn’t have a time dimension. This dataset separates out temperature into the mean value over the whole time series and the monthly differences from the mean, called anomalies. To get the actual temperature for each month, we just add the mean and anomaly.&lt;/p&gt;

&lt;p&gt;If you try to display the value of a random point within the array, you should see&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;temp_anom[0,0,0,0]

masked_array(data=1.082,
             mask=False,
       fill_value=1e+20,
            dtype=float32)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Right now, each value is within an array as an artifact of being within a netCDF file. Run the following to get simple arrays of values:&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;lon = lon[:]
lat = lat[:]
time = time[:]
temp_mean = temp_mean[:]
temp_anom = temp_anom[:]

temp_anom[0,0,0,0]

1.082&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The unit of time is the number of months since the start of the dataset. To make an understandable date-time vector, run the following.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-markup"&gt;time_pass=pd.date_range(start='1/1/2004', periods=180, freq='MS')&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To create an array of the temperature just at the surface, add together the mean and anomaly arrays at the first pressure index with the following code.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-markup"&gt;ocean_surface_temp=np.empty((180,145,360))
for itime in range(180):
ocean_surface_temp[itime,:,:]=temp_mean[0,:,:]+temp_anom[itime,0,:,:]&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id="writing-data-to-influxdb-cloud"&gt;Writing data to InfluxDB Cloud&lt;/h2&gt;
&lt;p&gt;Now that the data we want is in a nicely formatted array, we can get started sending it to &lt;a href="https://cloud2.influxdata.com/signup/"&gt;InfluxDB Cloud&lt;/a&gt;. To save storage space, for this example we’re not going to upload the whole thing, just 10 years of data from a 10-degree-by-10-degree box in the Atlantic Ocean. The coordinates I picked for this are 15.5°N to 25.5°N and 34.5°W to 44.5°W, or latitude indices 80 to 90 and longitude indices 295 to 305.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/1m3REf9rPUwjXfE0mpIcCW/1672ede043d288a7ac0719264a4d3891/data_from_Atlantic_Ocean.png" alt="data from Atlantic Ocean" /&gt;&lt;/p&gt;

&lt;p&gt;To send data to InfluxDB most efficiently, we’re going to create an array of data points. We’re going to call each point &lt;code&gt;ocean_temperature&lt;/code&gt;, and that name will be set as its measurement.&lt;/p&gt;

&lt;p&gt;To use the geo-temporal package in InfluxDB, you need to send in your data with latitude and longitude as fields, so each of our points will have latitude, longitude and temperature fields. In InfluxDB, if there are two field values at the same timestamp, the next value you upload will overwrite the previous one. This is a problem for us because our data has many values for latitude, longitude and temperature at the same time.&lt;/p&gt;

&lt;p&gt;A simple way to prevent data from being overwritten is to give each point a location tag. In our data, there are many measurements at the same time but no two at the same location and time. You can create other unique tags to keep points that share a timestamp from overwriting each other, as described in more detail &lt;a href="https://docs.influxdata.com/influxdb/v2.4/write-data/best-practices/duplicate-points/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;points_to_send = []
for itime in range(120):
    for ilat in range(80, 90):
        for ilon in range(295, 305):
            p = Point("ocean_temperature")
            p.tag("location", str(lat[ilat]) + str(lon[ilon]))
            p.field("lat", lat[ilat])
            p.field('lon', lon[ilon])
            p.field('temp', ocean_surface_temp[itime,ilat,ilon])
            p.time(time_pass[itime])
            points_to_send.append(p)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then in the InfluxDB UI, create a bucket and token from the “load data” sidebar, as shown in these screenshots.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/4wr7nkFTcnFeVJWXiI1Vhm/6cf54f3500b54c75acc6d989bbf539a3/Creat_Bucket.png" alt="Creat Bucket" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/22p5GZLLFM2E50l4vsahPp/95beebef4661d77ff2a2073899fdd6b1/Create_API_Token.png" alt="Create API Token" /&gt;&lt;/p&gt;

&lt;p&gt;Your org is the email address for your account, and the url is the URL of your InfluxDB Cloud account.&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;token=token
org = org
url = url
bucket = bucket&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here we set the batch size to 5,000 because that makes things more efficient, as described in the docs &lt;a href="https://docs.influxdata.com/influxdb/v2.4/write-data/best-practices/optimize-writes/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;with InfluxDBClient(url=url, token=token, org=org) as client:
    with client.write_api(write_options=WriteOptions(batch_size=5000)) as write_api:
        write_api.write(bucket=bucket, record=points_to_send)&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id="querying-the-data"&gt;Querying the data&lt;/h2&gt;
&lt;p&gt;Now that the data is in InfluxDB, we can query it. I’m going to use several queries to show you what each command does step by step. First, set up the query API.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-markup"&gt;client = influxdb_client.InfluxDBClient(url=url, token=token, org=org)
query_api = client.query_api()&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Within your Jupyter notebook, each query is a string of Flux code, which you then call with the query API. In all of these cases, we have many options for what to do with the results of our queries. For the first ones, I’m going to print the first few results to make sure they’re working, and then finally we’ll end with a plot.&lt;/p&gt;

&lt;p&gt;This query simply gathers all of the latitude, longitude and temperature fields.&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;query1='from(bucket: "sample_geo")\
  |&amp;gt; range(start: 2003-12-31, stop: 2020-01-01)\
  |&amp;gt; filter(fn: (r) =&amp;gt; r["_measurement"] == "ocean_temperature")\
  |&amp;gt; filter(fn: (r) =&amp;gt; r["_field"] == "lat" or r["_field"] == "temp" or r["_field"] == "lon")\
  |&amp;gt; yield(name: "all points")'&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This query returns latitude.&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;query2='from(bucket: "sample_geo")\
  |&amp;gt; range(start: 2003-12-31, stop: 2020-01-01)\
  |&amp;gt; filter(fn: (r) =&amp;gt; r["_measurement"] == "ocean_temperature")\
  |&amp;gt; filter(fn: (r) =&amp;gt; r["_field"] == "lat")\
  |&amp;gt; yield(name: "lat")'&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To execute either of these queries and print the number of points and the first few results, you can run the following, changing which query you pass in.&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;result = client.query_api().query(org=org, query=query2)
results = []
for table in result:
    for record in table.records:
        results.append((record.get_value(), record.get_field()))
print(len(results))
print(results[0:10])&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now we’re going to get started using the geo-temporal package. The &lt;code class="language-markup"&gt;geo.shapeData&lt;/code&gt; function reformats the data and assigns each point an S2 cell ID. You specify what your latitude and longitude field names are, “lat” and “lon” in this case, and what S2 cell level you want. In this case, I’ve chosen 13, which corresponds to an average cell area of 1.27 square kilometers. You can read about the cell levels &lt;a href="https://s2geometry.io/resources/s2cell_statistics.html"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Next we’re going to use the &lt;code class="language-markup"&gt;geo.filterRows&lt;/code&gt; function to select the region we want to calculate the average temperature of. I’m picking a 150 kilometer circle centered around 20.5°N and 39.5°W, but you can pick any sort of box, circle or polygon as described &lt;a href="https://docs.influxdata.com/influxdb/cloud/query-data/flux/geo/filter-by-region/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;By default, the data is grouped by s2_cell_id, so to calculate a running mean over this whole region, we have to run the group function and tell it to group by nothing so all the data in the region is grouped together. Then you can use the &lt;code class="language-markup"&gt;aggregateWindow&lt;/code&gt; function to calculate running means and standard deviations over time windows of your choice.&lt;/p&gt;

&lt;p&gt;Putting this all together, the code below calculates and plots the mean over this circle every three months and every year, and the standard deviation every three months, which I’ve put as error bars in the plot below.&lt;/p&gt;

&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;query3='import "experimental/geo"\
from(bucket: "sample_geo")\
    |&amp;gt; range(start: 2003-12-31, stop: 2020-01-01)\
    |&amp;gt; filter(fn: (r) =&amp;gt; r["_measurement"] == "ocean_temperature")\
    |&amp;gt; geo.shapeData(latField: "lat", lonField: "lon", level: 13)\
    |&amp;gt; geo.filterRows(region: {lat: 20.5, lon: -39.5, radius: 150.0}, strict: true)\
    |&amp;gt; group()\
    |&amp;gt; aggregateWindow(column: "temp", every: 3mo, fn: mean, createEmpty: false)\
    |&amp;gt; yield(name: "running mean")\
    '

query4='import "experimental/geo"\
from(bucket: "sample_geo")\
    |&amp;gt; range(start: 2003-12-31, stop: 2020-01-01)\
    |&amp;gt; filter(fn: (r) =&amp;gt; r["_measurement"] == "ocean_temperature")\
    |&amp;gt; geo.shapeData(latField: "lat", lonField: "lon", level: 13)\
    |&amp;gt; geo.filterRows(region: {lat: 20.5, lon: -39.5, radius: 150.0}, strict: true)\
    |&amp;gt; group()\
    |&amp;gt; aggregateWindow(column: "temp", every: 3mo, fn: stddev, createEmpty: false)\
    |&amp;gt; yield(name: "standard deviation")\
    '

query5='import "experimental/geo"\
from(bucket: "sample_geo")\
    |&amp;gt; range(start: 2003-12-31, stop: 2020-01-01)\
    |&amp;gt; filter(fn: (r) =&amp;gt; r["_measurement"] == "ocean_temperature")\
    |&amp;gt; geo.shapeData(latField: "lat", lonField: "lon", level: 13)\
    |&amp;gt; geo.filterRows(region: {lat: 20.5, lon: -39.5, radius: 150.0}, strict: true)\
    |&amp;gt; group()\
    |&amp;gt; aggregateWindow(column: "temp", every: 12mo, fn: mean, createEmpty: false)\
    |&amp;gt; yield(name: "running mean")\
    '

result = client.query_api().query(org=org, query=query3)
results_mean = []
results_time = []
for table in result:
    for record in table.records:
        results_mean.append((record["temp"]))
        results_time.append((record["_time"]))

result = client.query_api().query(org=org, query=query4)
results_stddev = []
for table in result:
    for record in table.records:
        results_stddev.append((record["temp"]))
result = client.query_api().query(org=org, query=query5)

results_mean_annual = []
results_time_annual = []
for table in result:
    for record in table.records:
        results_mean_annual.append((record["temp"]))
        results_time_annual.append((record["_time"]))

plt.rcParams["figure.figsize"] = (10,7)
plt.errorbar(results_time,results_mean,results_stddev)
plt.plot(results_time_annual,results_mean_annual)
plt.xlabel("Time")
plt.ylabel("Degrees C")
plt.title("Average Ocean Surface Temperature")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/6eHlEGPoFuQbQOUT2vpjfr/5cac234b33e99b207669589bcab92660/geo_blog_plot.png" alt="Geo-temporal graph" /&gt;&lt;/p&gt;

&lt;h2 id="further-resources"&gt;Further resources&lt;/h2&gt;
&lt;p&gt;Using InfluxDB in Python makes geo-temporal analysis more efficient. I hope this example of the kinds of calculations you can do with this package sparks some ideas for you. This is really the tip of the iceberg. You can also use the package to calculate distances, find intersections, find whether certain regions contain specific points and more. And it can make an even bigger difference in saving computations with more complicated data sets with more points.&lt;/p&gt;

&lt;p&gt;The combination of a platform built for &lt;a href="https://www.influxdata.com/what-is-time-series-data/"&gt;time-series data&lt;/a&gt; and the S2 cell system is very powerful. For more information, you can read about the Flux geo-temporal package in our docs &lt;a href="https://docs.influxdata.com/influxdb/cloud/query-data/flux/geo/"&gt;here&lt;/a&gt; and watch our Meet the Developers mini-series on the subject &lt;a href="https://www.youtube.com/watch?v=OlT1-kMNdCs"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;div style="padding:56.25% 0 0 0;position:relative; margin: 30px 0px;"&gt;&lt;iframe src="https://player.vimeo.com/video/767146468?h=38354445e8&amp;amp;badge=0&amp;amp;autopause=0&amp;amp;player_id=0&amp;amp;app_id=58479" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen="" style="position:absolute;top:0;left:0;width:100%;height:100%;" title="Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays 2022"&gt;&lt;/iframe&gt;&lt;/div&gt;
&lt;script src="https://player.vimeo.com/api/player.js"&gt;&lt;/script&gt;

</description>
      <pubDate>Mon, 31 Oct 2022 07:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/getting-started-python-geo-temporal-analysis/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/getting-started-python-geo-temporal-analysis/</guid>
      <category>Product</category>
      <author>Susannah Brodnitz (InfluxData)</author>
    </item>
    <item>
      <title>A Sneak Peek at InfluxDB University’s Upcoming Live Telegraf Training</title>
      <description>&lt;p&gt;&lt;a href="https://www.influxdata.com/time-series-platform/telegraf/"&gt;Telegraf&lt;/a&gt;, InfluxDB’s open source data collection agent, uses hundreds of plugins to gather data, transform it, and send it where you want it to go. It’s a very powerful tool to incorporate into your data pipeline. If you’ve been wanting to get started with Telegraf but haven’t known where to start, a great way to learn is at InfluxDB University’s live virtual training on November 1st. You can &lt;a href="https://www.influxdays.com/taming-the-tiger-tips-and-tricks-for-using-telegraf/"&gt;register here&lt;/a&gt; for free to learn from experts in a hands-on lab setting. By the time the training is done you’ll be a Telegraf pro and have a digital badge to show for it. This training is part of &lt;a href="https://www.influxdays.com"&gt;InfluxDays&lt;/a&gt;, our free virtual user conference on November 2-3, 2022. Anyone who signs up for the training will be automatically registered for InfluxDays. Here’s a sneak peek at some of the topics covered in the training.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/5VR7wlP9QqPwwJkLo4TRnH/aa747279ccdad81759c1b0eef0f2ce34/influxdb-university-telegraf-training.png" alt="InfluxDB U Telegraf Training" /&gt;&lt;/p&gt;

&lt;h2 id="understanding-telegraf"&gt;Understanding Telegraf&lt;/h2&gt;

&lt;p&gt;During the training you’ll learn about all the things you can do with Telegraf. There are hundreds of plugins for collecting data from sensors, systems, and stacks, but Telegraf goes beyond that. You’ll learn about processor and aggregator plugins that let you filter and transform data before it reaches its endpoint. Processor plugins can do things like print metrics as they pass through Telegraf and add tags to metrics. Aggregator plugins create new metrics, such as running means, standard deviations, or quantiles. You can choose within Telegraf whether to pass through both the original metrics and the aggregates or to drop the originals and just send the aggregate values to your endpoint.&lt;/p&gt;

&lt;p&gt;You’ll also get hands-on experience configuring and running Telegraf. The instructors leading this course have designed exercises to give you the skills you need to set up your local environment, configure your cloud environment, and enable plugins to collect and filter data. This training includes several exercises you can complete live, including learning to debug and learning dual writes to send data to two different outputs. The great thing about taking a course live is that you can get your questions answered as they come up, and you can also hear the responses to questions other participants bring up that you might not have thought to ask.&lt;/p&gt;

&lt;h2 id="edge-data-replication"&gt;Edge Data Replication&lt;/h2&gt;

&lt;p&gt;This training will also show you how to use InfluxDB with &lt;a href="https://www.influxdata.com/products/influxdb-edge-data-replication/"&gt;Edge Data Replication&lt;/a&gt; (EDR). Edge Data Replication lets you automatically replicate data from an edge OSS instance of InfluxDB to InfluxDB Cloud. This is great for situations where you want to work with data in detail locally and also in aggregate in the Cloud. For example, a factory floor manager might want to keep tabs on the status of a machine in detail, but a data scientist at the company might want to work with downsampled data from many factories in the Cloud. Edge Data Replication uses a durable queue so it’s also great for situations with intermittent network connectivity. In this training you’ll learn to work with InfluxDB OSS locally and replicate data to InfluxDB Cloud.&lt;/p&gt;

&lt;p&gt;If you can’t take the live course but still want to learn more about Telegraf, InfluxDB University has you covered! &lt;a href="https://university.influxdata.com/courses/data-collection-with-telegraf-tutorial/"&gt;Data Collection with Telegraf&lt;/a&gt; is a free self-paced course you can take to become a Telegraf expert and earn a shareable badge. &lt;a href="https://university.influxdata.com"&gt;InfluxDB University&lt;/a&gt; also has courses on Flux, InfluxDB Cloud, InfluxDB Enterprise, and more. Don’t forget to register for &lt;a href="https://www.influxdays.com"&gt;InfluxDays&lt;/a&gt;. There will be some exciting InfluxDB University announcements in the session on November 3rd, and the Become an InfluxDB Pro in 20 Minutes and Become a Flux Pro sessions are must-sees. If you can’t make it live, register for free to get a link to all the on-demand sessions when they become available.&lt;/p&gt;
</description>
      <pubDate>Mon, 24 Oct 2022 07:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/sneak-peek-influxdb-university-live-telegraf-training/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/sneak-peek-influxdb-university-live-telegraf-training/</guid>
      <category>Product</category>
      <category>Use Cases</category>
      <author>Susannah Brodnitz (InfluxData)</author>
    </item>
    <item>
      <title>TL;DR InfluxDB’s MQTT Native Collector</title>
      <description>&lt;p&gt;There are a lot of ways to get data into InfluxDB, including over a dozen client libraries, hundreds of Telegraf plugins, and Native Collectors. Native Collectors let you directly ingest data from a cloud broker to InfluxDB Cloud so you don’t need to install anything else or write extra code.&lt;/p&gt;

&lt;p&gt;This blog will focus on the &lt;a href="https://www.influxdata.com/integration/mqtt-native-collector/"&gt;MQTT Native Collector&lt;/a&gt;. &lt;a href="https://www.influxdata.com/mqtt/"&gt;MQTT&lt;/a&gt; is one of the most popular IoT message brokers because it’s lightweight and works well in environments with limited bandwidth and intermittent network connectivity. The MQTT Native Collector is the fastest way to get your data into InfluxDB so you can analyze it, transform it, create visualizations, and set up alerts. That speed is very important for many IoT applications that rely on real-time analysis to keep smart devices running. To get started with the MQTT Native Collector you just need to have a message broker in place and an InfluxDB Cloud account.&lt;/p&gt;

&lt;h2 id="when-to-use-influxdb-mqtt-native-collector"&gt;When to use InfluxDB MQTT Native Collector&lt;/h2&gt;
&lt;p&gt;Because there are so many ways to get data into InfluxDB, it’s helpful to know when to use each method. A Native Collector is useful when you don’t want the additional step of writing ingestion software with a client library. In that case you may also want to consider &lt;a href="https://www.influxdata.com/integration/mqtt-telegraf-consumer/"&gt;Telegraf&lt;/a&gt;, which includes processor and aggregator plugins to transform and downsample your data before it reaches InfluxDB.&lt;/p&gt;

&lt;p&gt;However, Telegraf requires you to install it somewhere. If your message broker is in the cloud and you’re using InfluxDB Cloud, there might not be a convenient place to install Telegraf, especially if you’re using a third-party cloud service. It can also be more efficient to keep your data pipeline in the cloud rather than needlessly downloading your data onto a server before uploading it back to InfluxDB Cloud. You can sometimes install Telegraf in a virtual machine in your cloud environment, such as if you’re running &lt;a href="https://www.influxdata.com/partners/hivemq/"&gt;HiveMQ&lt;/a&gt; or Mosquitto yourself, but if you’re using a third-party cloud-based message broker like HiveMQ Cloud that’s not possible.&lt;/p&gt;

&lt;p&gt;The MQTT Native Collector is a great solution for people who want to keep their data in the cloud and get it into InfluxDB quickly. You save on time by getting rid of download or upload speed and bandwidth issues because you don’t need to download data to wherever you’re running Telegraf. You also save compute time associated with running Telegraf on a virtual machine.&lt;/p&gt;

&lt;p&gt;Using a client library gives you the most flexibility because the code you write is specifically designed for your application, and that may be a good option if there are complex transformations you need to do on your data. But if you’re trying to ingest raw data directly from IoT devices to InfluxDB and speed matters most, the MQTT Native Collector is the simplest and quickest method.&lt;/p&gt;

&lt;h2 id="configuring--parsing-mqtt-data-with-influxdbs-native-collector"&gt;Configuring &amp;amp; parsing MQTT data with InfluxDB’s Native Collector&lt;/h2&gt;
&lt;p&gt;It’s simple to &lt;a href="https://docs.influxdata.com/influxdb/cloud/write-data/no-code/native-subscriptions/"&gt;configure native collection&lt;/a&gt; and get started. First you need to configure the IP address or URL of your MQTT message broker and any authentication parameters. Then you need to set the topic(s) you want to subscribe to. If there are multiple topics you want to subscribe to, you can repeat this process for each topic or you can use wildcards.&lt;/p&gt;

&lt;p&gt;Topics generally come in a slash-separated format like &lt;code class="language-bash"&gt;my/fav/set/topics&lt;/code&gt;, where each segment is a tree level. At any tree level you can use a &lt;code class="language-bash"&gt;+&lt;/code&gt; to indicate that you want to subscribe to anything at that tree level, such as &lt;code class="language-bash"&gt;my/+/set/topics&lt;/code&gt;, or a &lt;code class="language-bash"&gt;#&lt;/code&gt; to indicate you want to subscribe to anything from that tree level down, such as &lt;code class="language-bash"&gt;my/fav/#&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Finally, you need to set parsing rules. These tell InfluxDB how to read raw data from the MQTT broker and where to put it. Data in an MQTT broker can arrive in any format, and InfluxDB needs to know which parts are the measurement, timestamp, fields, and tags. You don’t need parsing rules if you’re subscribing to a topic where the data is already in line protocol format.&lt;/p&gt;
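
&lt;p&gt;For example, here’s a hedged sketch of a device publishing data that’s already in line protocol, using the paho-mqtt library. The broker host, topic, and measurement names are made up:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;# Sketch: publishing a reading already in line protocol, so the
# native collector needs no parsing rules. Host, topic, and names
# are hypothetical.
import time
import paho.mqtt.publish as publish

# line protocol: measurement,tag_key=tag_value field_key=field_value timestamp
ts = int(time.time() * 1e9)  # nanosecond-precision timestamp
line = f"temperature,device=sensor-1 celsius=22.5 {ts}"
publish.single("sensors/line-protocol", line, hostname="broker.example.com")
&lt;/code&gt;&lt;/pre&gt;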

&lt;p&gt;One of the most common data formats is JSON, so InfluxDB has a specific parser already set up for it. If your data is in another format, you can use string parsing to tell InfluxDB which variables in your data should be read in as measurements, fields, timestamps, and tags. It’s that simple, and it can be set up in minutes.&lt;/p&gt;
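
&lt;p&gt;As a sketch of what that looks like in practice, here’s a hypothetical JSON payload published with paho-mqtt, with comments noting which parts the parsing rules would map to InfluxDB concepts. The names are made up, and the actual mapping is configured in the InfluxDB Cloud UI:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;# Sketch: a JSON payload the JSON parser could map to InfluxDB
# concepts. Names are made up; the mapping itself is configured
# in the InfluxDB Cloud UI, not in this code.
import json
import time
import paho.mqtt.publish as publish

payload = {
    "device": "sensor-1",           # map to a tag
    "celsius": 22.5,                # map to a field
    "timestamp": int(time.time()),  # map to the point's timestamp
}
publish.single("sensors/json", json.dumps(payload),
               hostname="broker.example.com")
&lt;/code&gt;&lt;/pre&gt;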

&lt;p&gt;To learn more about InfluxDB’s MQTT Native Collector you can watch our series of videos on this topic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=J3bQrL8ihSU"&gt;Part one&lt;/a&gt; is an introduction to Native Collectors.&lt;/p&gt;

&lt;div class="youtube-container"&gt;
  &lt;iframe class="responsive-iframe" src="https://www.youtube.com/embed/J3bQrL8ihSU" title="Introduction to InfluxDB Native Collector"&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=I583LSj0bgs"&gt;Part two&lt;/a&gt; explains when to use Native Collectors.&lt;/p&gt;

&lt;div class="youtube-container"&gt;
  &lt;iframe class="responsive-iframe" src="https://www.youtube.com/embed/I583LSj0bgs" title="When to use InfluxDB Native Collector for MQTT"&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;And &lt;a href="https://www.youtube.com/watch?v=uI_HYgx_PIQ"&gt;part three&lt;/a&gt; gives examples for how to set up the MQTT Native Collector.&lt;/p&gt;

&lt;div class="youtube-container"&gt;
  &lt;iframe class="responsive-iframe" src="https://www.youtube.com/embed/uI_HYgx_PIQ" title="Configuring &amp;amp; Parsing MQTT Data with InfluxDB's Native Collector"&gt;&lt;/iframe&gt;
&lt;/div&gt;
</description>
      <pubDate>Tue, 18 Oct 2022 07:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/tldr-influxdb-mqtt-native-collector/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/tldr-influxdb-mqtt-native-collector/</guid>
      <category>Product</category>
      <author>Susannah Brodnitz (InfluxData)</author>
    </item>
    <item>
      <title>We’re Going to AWS re:Invent 2022</title>
      <description>&lt;p&gt;From November 28–December 3, InfluxData is heading to Las Vegas for &lt;a href="https://www.influxdata.com/reinvent/"&gt;AWS re:Invent 2022&lt;/a&gt;. We’re super excited to connect with our community in person and take part in this huge event. You can get InfluxDB on AWS and we have some great demos prepared to show how well they work together. Stop by our booth (#3648) in the Data and Analytics Competency Area, to get your questions answered by our experts. We also have a virtual booth if you can’t make it in person.&lt;/p&gt;

&lt;h2 id="watch-live-demos"&gt;Watch live demos&lt;/h2&gt;

&lt;p&gt;Every day of the expo, we will have live demos at our booth showing how you can use InfluxDB to build powerful time series applications. Come check out what we have planned. Ever wondered how much energy it takes to power the coffee needs of an events team? We even have a connected Lego robot for the IoT fans out there. These demos are super fun and a great way to learn about all the built-in developer tools that make InfluxDB so powerful.&lt;/p&gt;

&lt;h2 id="meet-with-influxdb-experts"&gt;Meet with InfluxDB experts&lt;/h2&gt;

&lt;p&gt;Stop by our booth to meet with InfluxDB pros. These events are exciting for us because we love to talk directly with people like you! If you’re interested in InfluxDB and want to ask how it can help with your time series applications, swing on by. You can book a 1:1 onsite meeting from our &lt;a href="https://www.influxdata.com/reinvent/"&gt;event page&lt;/a&gt; and get a free IoT weather station. Plus, if you come to the booth, set up a Cloud account, and register an OSS instance with us, you can win a drone.&lt;/p&gt;

&lt;p&gt;You can also stop by the Marketplace Pavilion Wednesday and Thursday at Kiosk #3525-9 to ask any and all questions about running InfluxDB in AWS. Be sure to ask one of our team members how getting InfluxDB through the AWS Marketplace can simplify billing and help meet AWS annual spend commitments.&lt;/p&gt;

&lt;h2 id="interviews-you-can-use"&gt;Interviews you can use&lt;/h2&gt;

&lt;p&gt;Catch some of InfluxData’s executives on theCUBE and Screaming in the Cloud discussing the latest InfluxDB enhancements. This includes &lt;a href="https://www.influxdata.com/products/influxdb-edge-data-replication/"&gt;Edge Data Replication&lt;/a&gt;, as well as the company’s broader focus on expediting time series data pipelines from edge to cloud.&lt;/p&gt;

&lt;h2 id="win-a-free-conference-pass"&gt;Win a free conference pass&lt;/h2&gt;

&lt;p&gt;Enter to win a free full conference pass for AWS re:Invent! Simply visit our &lt;a href="https://www.influxdata.com/win-a-free-ticket-aws-reinvent/"&gt;contest page&lt;/a&gt; to sign up for and complete a qualified meeting with our sales team by November 2, 2022. The page is also packed with resources for using InfluxDB with AWS, like blogs, videos, case studies, templates, and more.&lt;/p&gt;

&lt;h2 id="see-you-soon"&gt;See you soon!&lt;/h2&gt;

&lt;p&gt;We hope to see you at AWS re:Invent November 28–December 3! We love talking about the cool things you can do with InfluxDB and we hope you’ll stop by our booth so we can clear up any questions, spark some ideas for using InfluxDB with AWS, and give out swag (Did someone say socks?!). See you soon!&lt;/p&gt;
</description>
      <pubDate>Mon, 17 Oct 2022 07:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/aws-reinvent-2022/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/aws-reinvent-2022/</guid>
      <category>Company</category>
      <author>Susannah Brodnitz (InfluxData)</author>
    </item>
    <item>
      <title>Why Use a Purpose-Built Time Series Database?</title>
      <description>&lt;p&gt;&lt;em&gt;This article was originally published in &lt;a href="https://thenewstack.io/why-use-a-purpose-built-time-series-database/"&gt;The New Stack&lt;/a&gt; and is reposted here with permission.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For many workloads, using a time series database is a smart choice that saves time and storage space.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Developers and companies have more database choices than ever. Choosing the right database for a project saves time when writing and querying data. As companies work with larger datasets to make increasingly intelligent and automated systems, efficiency is key. For many workloads, using a time series database is a smart choice that saves time and storage space.&lt;/p&gt;

&lt;h2 id="how-time-series-data-is-different"&gt;How time series data is different&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.influxdata.com/what-is-time-series-data/"&gt;Time series data&lt;/a&gt;  is any metric with a timestamp. It includes many kinds of variables, from weather patterns to CPU usage. It often comes from sensors, systems or applications that need to make real-time decisions. This data is vital to understanding past performance and creating models to  &lt;a href="https://www.influxdata.com/time-series-forecasting-methods/"&gt;predict future outcomes&lt;/a&gt;. The amount of data involved in these calculations can quickly add up, and it’s important not to lose resources to an inefficient data architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.influxdata.com/time-series-database/"&gt;Time series databases&lt;/a&gt;  are designed to handle typical time series workloads. They’re optimized to  &lt;a href="https://thenewstack.io/you-dont-need-a-blockchain-you-need-a-time-series-database/"&gt;measure change over time&lt;/a&gt;, rather than relationships between data points. The two main kinds of time series data are metrics, which are taken at regular intervals, and events, which are taken at irregular intervals due to outside events or user measurements. It’s important that a time series database is able to handle both metrics and events, and is able to average events and convert them to metrics.&lt;/p&gt;

&lt;h2 id="storing-data"&gt;Storing data&lt;/h2&gt;

&lt;p&gt;A good database needs to store data securely and efficiently. Users have to be able to write data to it quickly and feel confident that it can handle the volume of data they plan to store in it. Time series data can come in huge volumes, and the databases it’s stored in need to be built to accommodate that. Time is linear, and time series databases can take advantage of this by appending new data to existing data. They’re optimized for fast writes of time-stamped data in the way it most commonly arrives, saving time from the moment users begin writing data.&lt;/p&gt;

&lt;p&gt;Time series databases may also have lifecycle management built in. It’s common for developers or companies to initially collect and analyze highly detailed data and, as time goes on, to want to store smaller, downsampled datasets that describe trends without taking up as much storage space. Time series databases can take this into account and automatically aggregate and delete data as necessary for each application. If developers use a more basic database, they often need to build new systems to manage data this way. With a time series database, it’s already taken care of, and developers can focus on their applications.&lt;/p&gt;
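
&lt;p&gt;As a hedged sketch of what that lifecycle management can look like, here’s how you might create a bucket with a 30-day retention rule using the InfluxDB Python client, so that older data expires automatically. The connection details and bucket name are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;# Sketch: a bucket whose data is expired automatically after 30 days.
# The URL, token, org, and bucket name below are placeholders.
from influxdb_client import InfluxDBClient, BucketRetentionRules

client = InfluxDBClient(url="https://us-east-1-1.aws.cloud2.influxdata.com",
                        token="my-token", org="my-org")
retention = BucketRetentionRules(type="expire",
                                 every_seconds=30 * 24 * 60 * 60)
client.buckets_api().create_bucket(bucket_name="raw-sensor-data",
                                   retention_rules=retention)
&lt;/code&gt;&lt;/pre&gt;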

&lt;p&gt;Time series databases also need to be easily scalable. For example, in IoT use cases, as more  &lt;a href="https://www.influxdata.com/sensor-data-is-time-series-data/"&gt;sensors&lt;/a&gt;  are added and projects expand, data volumes grow rapidly. This is common in time series workloads, and the databases used for these projects need to be able to accommodate it.&lt;/p&gt;

&lt;h2 id="querying-data"&gt;Querying data&lt;/h2&gt;

&lt;p&gt;Using a time series database also speeds up query time for time series workloads. One of the most common tasks with time series data is summarizing it over a large period of time. This kind of query is slow in a typical relational database that uses rows and columns to describe the relationships between data points. A database designed to process time series data can handle such queries dramatically faster. Time series databases may also have built-in  &lt;a href="https://www.influxdata.com/how-to-visualize-time-series-data/"&gt;visualization tools&lt;/a&gt;  or advanced functions that make common kinds of time series analysis simpler.&lt;/p&gt;
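
&lt;p&gt;For example, here’s a sketch of the kind of summarizing query this refers to, written in Flux and sent through the InfluxDB Python client. The connection details, bucket, and measurement names are hypothetical:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;# Hedged sketch: summarizing a month of data with a Flux query sent
# through the InfluxDB Python client. The connection details, bucket,
# and measurement names are hypothetical.
from influxdb_client import InfluxDBClient

client = InfluxDBClient(url="https://us-east-1-1.aws.cloud2.influxdata.com",
                        token="my-token", org="my-org")
flux = '''
from(bucket: "iot-bucket")
  |&gt; range(start: -30d)
  |&gt; filter(fn: (r) =&gt; r._measurement == "temperature")
  |&gt; aggregateWindow(every: 1d, fn: mean)
'''
tables = client.query_api().query(flux)
for table in tables:
    for record in table.records:
        print(record.get_time(), record.get_value())
&lt;/code&gt;&lt;/pre&gt;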

&lt;h2 id="choosing-a-time-series-database"&gt;Choosing a time series database&lt;/h2&gt;

&lt;p&gt;There are a few time series databases out there that you can explore. For this blog post, we will look at the leading time series database according to  &lt;a href="https://db-engines.com/en/ranking/time+series+dbms"&gt;DB-Engines&lt;/a&gt;, InfluxDB. InfluxDB assigns a measurement name and timestamp to data and uses key/value pairs for data values and metadata. It keeps measurement names and sets of tags in an inverted index, which speeds up queries. Users can write queries based on measurement, tag and/or field across a time range and receive results in milliseconds. A single InfluxDB server can handle over 2 million writes per second.  &lt;a href="https://www.influxdata.com/products/compare/"&gt;Compared to a NoSQL database&lt;/a&gt;  like Cassandra, InfluxDB writes data 4.5 times faster, uses 2.1 times less storage space, and returns queries 45 times faster.&lt;/p&gt;
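
&lt;p&gt;As a small sketch of that data model, here’s a point built with the InfluxDB Python client: a measurement name, a tag key/value pair for metadata, a field key/value pair for the data value, and a timestamp. All the names here are examples:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;# Sketch of the InfluxDB data model; the measurement, tag, and field
# names are examples.
from datetime import datetime, timezone
from influxdb_client import Point

point = (
    Point("cpu_usage")                 # measurement name
    .tag("host", "server-01")          # metadata as a key/value pair
    .field("usage_percent", 63.2)      # data value as a key/value pair
    .time(datetime.now(timezone.utc))  # timestamp
)
print(point.to_line_protocol())
&lt;/code&gt;&lt;/pre&gt;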

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/1PDmtUKTAPN2JvOiJuUuNM/befbf5d5a4316175ecf03bacb065c770/db-engines-influxdb-time-series-database.png" alt="db-engines-influxdb-time-series-database" /&gt;&lt;/p&gt;

&lt;p&gt;Databases are the backbone of many applications, and working with time-stamped data in a time series database saves developers time and storage space. Choosing the right database for an application lets developers focus on building cool projects, rather than spending time managing architecture before they can get started.&lt;/p&gt;
</description>
      <pubDate>Thu, 13 Oct 2022 07:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/why-use-purpose-built-time-series-database/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/why-use-purpose-built-time-series-database/</guid>
      <category>Developer</category>
      <author>Susannah Brodnitz (InfluxData)</author>
    </item>
  </channel>
</rss>
