Exploring Geo-Temporal Flux

Flux recently added geo-temporal capabilities to its arsenal, and I have been exploring how to use this new combination of time series and geolocation data effectively. To help get you started, we’ll begin with a geo-temporal overview and then work through a few examples.

If you would like to follow along using InfluxDB 2.0 OSS (open source) or InfluxDB Cloud, deploy this AWS Lambda to collect hourly earthquake data from the U.S. Geological Survey (USGS).

Geo-temporal Flux: an overview

Flux’s geo-temporal powers are derived from the Go S2 Geometry Library. S2 is based on spherical geometry, which makes it well-suited for answering geolocation questions.

S2 has many details, but the key concepts in relation to Flux are S2 Cells and their levels (from 0 to 30). S2 Cells are bounded by four spherical geodesics. At level 0, a single S2 Cell represents approximately 85M km², and as the level increases, each cell is recursively subdivided into four smaller cells. At level 30, a single cell represents an area of a little less than 1 cm².

Technically, geo-temporal analysis in Flux requires only latitude and longitude. However, tagging your data with S2 Cell Identifiers (IDs) helps to efficiently analyze data at scale. S2 Cell IDs — sometimes referred to as S2 Cell tokens — uniquely identify an S2 Cell at the designated level. To use S2 Cell IDs, you can either calculate them at query time or as the data is written to InfluxDB.
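
To make tokens concrete, here is a minimal sketch (using a hypothetical point in downtown San Francisco) that computes the token for the same location at two different levels; each additional level pins the location down more precisely:

import "array"
import "experimental/geo"

// Hypothetical point: downtown San Francisco.
// The same coordinates map to a coarse cell at level 9
// and a much smaller cell at level 16.
array.from(rows: [{
    level9: geo.s2CellIDToken(point: {lat: 37.7749, lon: -122.4194}, level: 9),
    level16: geo.s2CellIDToken(point: {lat: 37.7749, lon: -122.4194}, level: 16)
}])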

You can calculate the S2 Cell ID at query time with Flux, but repeating that calculation with every query is inefficient when multiple queries execute on the same dataset. In contrast, preparing the S2 Cell ID while ingesting the data skips the query-time calculation at the cost of increased cardinality. For those who use Telegraf for data acquisition, there is an S2 Geo processor plugin that calculates the S2 Cell ID from your latitude and longitude data and adds it as a tag.
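
If you take the query-time route, a minimal sketch (assuming, as in the examples below, that your points are stored as lat and lon fields in a geo measurement) uses geo.shapeData(), which pivots the latitude and longitude fields into columns and derives an s2_cell_id token at the requested level:

import "experimental/geo"

from(bucket: "Earthquake")
  |> range(start: -24h)
  |> filter(fn: (r) => r["_measurement"] == "geo")
  |> geo.shapeData(latField: "lat", lonField: "lon", level: 9)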

If you decide to add the S2 Cell ID at ingest time, consider the precision of your geo-temporal calculations and the resulting cardinality, and adjust your S2 level accordingly. For example, at level 0, six S2 Cell IDs cover the earth, but at level 16, over 25 billion are required. Of course, if you are only tracking the movement of something on land, this can significantly reduce the number of S2 Cells you need, meaning you can likely increase the precision, assuming the latitude and longitude data you’ve captured has sufficient precision as well.

The earthquake Lambda calculates and tags earthquakes with level 9 S2 Cell IDs at ingest time. At level 9, each S2 Cell is roughly 12 miles (19.31 km) wide, and there are about 1.5M possible S2 Cell IDs on earth. For analyzing earthquake data, this strikes a nice balance between granularity and cardinality. For your use case, you might need a higher or lower level of precision.

Geo-temporal Flux examples

Let’s dig into a real-time analysis of USGS earthquake data. This initial example finds earthquakes that occurred in the last 24 hours within a region that roughly represents the west coast of the continental United States:

import "experimental/geo"

from(bucket: "Earthquake")
  |> range(start: -24h)
  |> filter(fn: (r) => r["_measurement"] == "geo")
  |> filter(fn: (r) => r["_field"] == "lat" or r["_field"] == "lon")
  |> geo.filterRows(
    region: {
      minLat: 32.245914,
      maxLat: 49.259394,
      minLon: -126.860380,
      maxLon: -114.715901
    }
  )

We defined a box with minimum and maximum latitudes and longitudes, and geo.filterRows() returns only the earthquakes that fall within those borders.

For those times when a simple box does not fit your use case, Flux also supports circular regions (shown in the final example below) and general polygons. For example, here I investigate the earthquakes from the last 7 days in an area roughly bounded by the five San Francisco Bay Area counties I regularly travel to (San Francisco, Marin, Alameda, Contra Costa, and San Mateo):

import "experimental/geo"

from(bucket: "Earthquake")
  |> range(start: -7d)
  |> filter(fn: (r) => r["_measurement"] == "geo")
  |> filter(fn: (r) => r["_field"] == "lat" or r["_field"] == "lon")
  |> geo.filterRows(
    region: {
      points: [
        {lat: 38.296395, lon: -123.022551}, {lat: 38.320102, lon: -122.879729}, {lat: 38.190695, lon: -122.580351}, {lat: 38.060455, lon: -122.394375},
        {lat: 38.043152, lon: -121.751675}, {lat: 38.112338, lon: -121.581387}, {lat: 37.825442, lon: -121.536068}, {lat: 37.546042, lon: -121.552121},
        {lat: 37.482863, lon: -121.466977}, {lat: 37.451254, lon: -121.931149}, {lat: 37.454980, lon: -122.112390}, {lat: 37.214760, lon: -122.148095},
        {lat: 37.054918, lon: -122.406274}, {lat: 37.990496, lon: -123.027002}
      ]
    }
  )

Though, if you happen to live in the Bay Area, it’s probably best not to dwell too long on the results.

This last example illustrates an iterative query. First, we find the strongest earthquake the USGS observed worldwide in the last 24 hours. Then, using a circular region defined by a center point and a radius in kilometers, we count how many earthquakes occurred within 200 km of that event over the last 7 days:

import "experimental/geo"

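// Find the strongest earthquake observed worldwide in the last 24 hours.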
max_eq = 
   from(bucket:"Earthquake")
       |> range(start: -24h)
       |> filter(fn: (r) => r._measurement == "geo")
       |> geo.toRows()
       |> group()
       |> max(column: "mag")
       |> findRecord(fn: (key) => true, idx: 0)

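// Count earthquakes from the last 7 days within 200 km of that event.
// strict: false also keeps points whose S2 cells merely intersect the circle,
// trading a bit of precision at the edges for speed.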
from(bucket:"Earthquake")
    |> range(start: -7d)
    |> geo.filterRows(region: {lat: max_eq.lat, lon: max_eq.lon, radius: 200.0}, strict: false)
    |> group()
    |> count(column: "mag")

What’s next?

To explore more of InfluxDB’s emerging geo-temporal capabilities, check out Tim Hall’s InfluxDays roadmap presentation, where he covers the work we’ve done on geo-temporal data acquisition, queries, and visualization in more detail. I will also continue publishing new blogs exploring Flux’s advanced geo-temporal capabilities.

Want to analyze your own data today? Check out Flux’s geo-temporal features in InfluxDB Cloud 2.0, InfluxDB Enterprise 1.8, and InfluxDB OSS 2.0.

As always, if you have questions or feedback, please let us know in our community forum or join us in Slack!