Learning Flux (#fluxlang) is About As Difficult As Learning an API

Flux (#fluxlang) is the new data scripting language that we’re creating to make querying and analyzing time series and other kinds of data quick and easy. Flux will be able to work with data from InfluxDB, Prometheus, relational databases, CSV files, S3, and any other kind of API or data source. One of the concerns we heard when I first started talking about Flux (previously IFQL) was that asking users to pick up a new language would present too steep a learning curve. However, I think that learning Flux is no more difficult than learning any new API, and it may even be easier than using SQL, even if you already “know” the latter.

I’ll lay out the basics of the language before jumping into the specifics of the API. Let’s start with some code. Consider the following script:

// Here are some basic parts of the language. This is a comment

// you can assign variables, like a string
s = "this is a string"

// or an int64
i = 1

// or a float64
f = 2.0


// or an object
o = {foo: "bar", asdf: 2.0, jkl: s}
// now access one of those properties
foo = o.foo
asdf = o["asdf"]

// you can also create an object with a shorthand notation
// where the keys are the same as the variable names
o = {s, i, f}
// equivalent to o = {s: s, i: i, f: f}


// here's an array
a = ["a", "b", "c"]

// here's a duration
d = 2h10m5s
// they're actually represented as seconds, days, and months.
// This is because days and months can vary with time zones, the
// Gregorian calendar and all that. Here's one with months and days
d = 1mo7d

// here's a time
t = 2018-08-28T10:20:00Z

// define a function that takes an argument named n. Flux only has
// named arguments (no positional ones)
square = (n) => {
  // the standard math operators are in the language
  return n * n
}
// call that function
num = square(n: 23)

// or, if a function is a single statement, you can omit the braces
square = (n) => n * n

// Now let's do a query. The functions in this query work
// with a stream of data. The stream is made up of tables
// that have columns and records (like in CSV).
// Conceptually, functions are applied to each table in
// the stream

// start by getting data from the telegraf DB on the InfluxDB server
from(host: "https://localhost:9070", bucket:"telegraf/default")
    // here's the pipe-forward operator. It says to send the
    // output of the previous function to the next one (range)
    // range will filter by time. You can also pass start as
    // a time, so functions have polymorphic arguments
    |> range(start:-1h)

    // now filter for specific data. Here we pass an anonymous function.
    // note that braces and a return statement aren't required if
    // the function is a single statement. Also note that we have
    // comparison and boolean operators
    |> filter(fn: (r) => r._measurement == "cpu" and r.host == "serverA")

    // now group the records from all tables into a table for
    // each region and service that we have. This converts
    // however many tables we have into one table for each
    // unique region, service pair in the data.
    |> group(keys: ["region", "service"])

    // now compute some aggregates in 10 minute buckets of time.
    // each of these aggregates will get applied to each table.
    // Note that we're passing in an array of functions.
    // That is, min, max, and mean are all functions that are
    // pipe-forwardable and can be applied to streams.
    |> applyWindow(aggregates: [min, max, mean], every: 10m)

    // And we can iterate over the records in a table and add to or
    // modify the returned result with the map function. This
    // will add a new column called spread to every table.
    |> map(fn: (r) => {return {spread: r.max - r.min}})

    // Return the output of the function chain as a stream called
    // "result". This isn't necessary if there's only one
    // function chain in the script.
    |> yield(name:"result")

// Finally, here's how we can define a function
// that is pipe-forwardable:
// Filter the stream to a specific set of measurements. This
// can be used with other filter functions and they'll combine.
filterMeasurements = (names, tables=<-) => {
  return tables |> filter(fn: (r) => r._measurement in names)
}
// Note that tables=<- defines a parameter called tables that
// can be passed in as a function argument, or pipe-forwarded
// from another function.
// Now we can call it through a pipe-forward operation
from(bucket:"telegraf/default") |> range(start: 2018-08-27)
    |> filterMeasurements(names: ["cpu", "mem"])
    |> yield(name:"cpu_mem")

// Or we can pass the tables input as an argument like this
filterMeasurements(names: ["cpu", "mem"], tables: (
    from(bucket:"telegraf/default")
        |> range(start:2018-07-27)
    ))
  |> yield(name:"cpu_mem")

That short script introduces the main syntactic constructs of the language and shows an example of querying data from InfluxDB. There’s a little more to the language, but this is enough to do many things in Flux. The rest of the learning curve is pure API: seeing what functions exist, what their arguments are, what they do, and what they return.

In practice, learning Flux is mostly about learning the API to get things done. This would be true even if we had chosen Lua, JavaScript, or SQL with our own defined functions. The syntactic elements of the language will be familiar to anyone who has even a little experience with JavaScript. The strangest thing in there is the pipe-forward operator, but that’s something you get used to very quickly.
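
Since the pipe-forward operator is the one genuinely new piece, here’s a minimal sketch showing that it’s just another way of passing the tables argument (reusing the bucket from the examples above):

// the pipe-forward operator sends the stream on its left into the
// tables=<- parameter of the function on its right, so these two
// filtered results are equivalent
data = from(bucket: "telegraf/default") |> range(start: -1h)
f1 = data |> filter(fn: (r) => r._measurement == "cpu")
f2 = filter(tables: data, fn: (r) => r._measurement == "cpu")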

Learning the API means learning what kind of data goes into a function, what the arguments are, what the function does, and what it outputs. That would be important regardless of what language a user was working with. User interfaces with builders that include all this information will go a long way towards introducing new users to the language without them even having to learn these constructs.
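
As a sketch of what an API reference entry boils down to, here’s a small pipe-forwardable function annotated with exactly those pieces. The topN helper is hypothetical, and it assumes sort and limit functions that order a table’s records and keep the first n of them:

// input: a stream of tables, pipe-forwarded via tables=<-
// argument: n, the number of records to keep from each table
// output: each table sorted by _value and cut down to n records
topN = (n, tables=<-) => {
  return tables
      |> sort(columns: ["_value"])
      |> limit(n: n)
}

// usage: the first 3 records per table, after sorting by value
from(bucket: "telegraf/default")
    |> range(start: -1h)
    |> topN(n: 3)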

Conceptually, functions in Flux fall across four major boundaries. First, we have input functions, which read from some source like InfluxDB, a CSV file, Prometheus, or wherever. Next, we have functions that either combine tables in the resulting stream or split them apart; group, join, and window fit this role. Then we have functions that get applied to each table in a stream, like aggregates, selectors, and sorting. Finally, there are output functions for sending the result to some data sink. The yield function returns results to the user, while there will be other outputs for sending results to InfluxDB, Kafka, S3, files, and others.
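
To make those boundaries concrete, here’s a short sketch that runs one function from each category in a single chain (sum as the per-table aggregate is an assumption; the rest appear in the script above):

from(bucket: "telegraf/default")    // input: read from a source
    |> range(start: -1h)            // limit to the last hour of data
    |> group(keys: ["region"])      // combine/split: one table per region
    |> sum()                        // per-table: aggregate each table's records
    |> yield(name: "by_region")     // output: return the stream to the user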

If you’ve read this far, you’re now about as expert on Flux as you’ll need to be to do most querying and data tasks as long as you’re paired with an API reference. Not bad for under 1,000 words of prose and code.

Now, at the risk of getting a mob of SQL enthusiasts angry, let’s revisit the idea of learning Flux vs. using SQL when you already know SQL. Across the group of developers I know, there is huge variance in the level of SQL knowledge they have. Frequently, this is driven by how often they actually have to write SQL statements. In many cases, developers work with relational databases every day without writing any SQL at all because of ORMs like ActiveRecord. They only drop down to writing SQL once in a great while, which means their knowledge quickly atrophies.

For myself, I learned SQL back in 2001 and then again in 2008, and I can safely say that I’ve forgotten more about SQL than I currently know. Anything more complex than basic SELECT, INNER JOIN, WHERE, and HAVING statements requires looking up documentation and references. So the fact that I already “know” SQL is of little help when it comes to actually getting things done. It’s presumptuous of me, but I think a vast number of programmers are in the same boat.

For more complex time series and analytics queries, you almost certainly need to look up WINDOW functions, stored programs, and probably a few other things I’m forgetting. Pair that with the fact that SQL’s syntax doesn’t look like any other language, and frequently reads like Yoda querying data, and you’ve got something with a real learning curve even if you’ve already been introduced to it. Unless you’re writing queries on a regular basis, the syntax will probably require documentation lookups.

One of our goals with Flux is that it should be readable and understandable even for newcomers to the language. We want developers to carry as little cognitive load as possible when working with the language itself so they can spend their mental effort thinking about what they want to do with their data. Making the language look like many other popular languages was a specific design goal. We’ve already been iterating out in the open, and we’re always interested in hearing more feedback.

We’ll be releasing early builds of Flux that are more suitable for everyone’s use in the next few months. We’ll include it in the open source release of InfluxDB 1.7, an alpha of InfluxDB 2.0, and a separate flux executable that can be run like the Ruby or Python interpreters. In the meantime, you can look at Flux-specific issues, the Flux language spec, more of our motivations in building Flux, or the Flux talk I gave at InfluxDays London in June.