InfluxDB Tech Tips – August 2016
Recorded: August 2016
In this session, Regan Kuchan from the InfluxData Support and Services team will walk you through the following key topics: GROUP BY time () queries, Continuous Queries, and a couple of tips on querying data from a non-DEFAULT retention policy base, querying data that occurs in the future, as well as acceptable boolean syntax use.
Watch the Webinar
Watch the webinar “InfluxDB Tech Tips – August 2016” by clicking on the download button on the right. This will open the recording.
Transcript +
Here is an unedited transcript of the webinar “InfluxDB Tech Tips – August 2016.” This is provided for those who prefer to read than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
Speakers:
• Regan Kuchan: Technical Writer, InfluxData
Regan Kuchan 00:03.002 Hello. My name is Regan and I’m on the support team here at InfluxData. I also write a weekly blog that Jimmy mentioned that covers some of the frequently asked questions that we get on the support team as well as some general tips and tricks for working with our database. And today I’m going to be going over some of the topics that I’ve covered in that blog. So I grouped topics into three main categories. The first category is group by time queries. The behavior of group by time queries can be a little bit weird the first time you experience them so I’ll be going into detail about them today and hopefully, I’ll get them to a point where they’ll work for you. The second topic that I’m going to cover is continuous queries. The syntax for continuous queries, how often they run, and then how they kind of work underneath it all are really common questions that we get on the support team so I’ll be going over that today. And then the last category is just a couple of tips that I wanted to share with you.
Regan Kuchan 01:00.581 So let’s start with group by time queries. Group by time queries are InfluxQL queries that group query results into distinct time buckets. Group by time queries include an InfluxQL function and a group by time clause. And why are they important? Why am I talking to you about them today? Well, group by time queries are great for analyzing time series data and they’re good for doing things like down sampling. Down sampling is just taking something like 10-second resolution data and then aggregating those data to the 1-hour resolution. So now that you know kind of what they are and why they’re important, let’s get into an example to kind of see how they work in practice. So I have a sample data set up here that shows the number of sunflower orders that I get at 15-minute intervals. But what I’m actually interested in is the average number of sunflower orders that I get per hour. And I can do this for all of my data with a single query and that query is a group by time query. So this query is selecting the average of the field sunflowers from the measurement flower orders and it’s covering the time range between 6:00 PM and 7:45 PM and it’s asking InfluxDB to group the results into 1-hour time buckets. And here are the results for that query. You can see that you have two averages. The first average is for that 6:00 PM time bucket and the second average is for the 7:00 PM time bucket. Those purple frames up there show you what points go into each average.
Regan Kuchan 02:31.180 So that was a pretty simple example but when you start playing around with time ranges with GROUP BY time queries you can start seeing some weird behaviors. So here are the same sample data that I had before and the same group by time query except this time I’ve shifted out the start of the time range in the WHERE clause by 15 minutes. So instead of starting at 6:00 PM it’s going to start at 6:15 PM. And here are the results for that query. And there are two kinds of weird things going on here the first time you look at this. The first one, which is probably the most obvious, is the first timestamp in your results. And you’ll see that that timestamp says 6:00 PM. And 6:00 PM actually occurs before the start of the time range that we specified in our WHERE clause. So that’s the first weird thing to notice. The second one is what points actually go into each average. So that first average is made up of three points and those points occur between 6:15 PM and 6:45 PM. And then the second average is made up of four points and those points occur between 7:00 PM and 7:45 PM. So if you’re like me when you first ran this query, you may have thought that every single mean would consist of four points. So the first one would have points from 6:15 all the way through 7:00 PM and the second one would have points from 7:15 all the way through 8:00 PM.
Regan Kuchan 03:57.696 So how can we explain what we’re seeing on this slide? InfluxDB has preset time buckets and by default, it automatically maintains those time buckets. So the 6:00 PM hour is a time bucket and then the 7:00 PM hour is a time bucket. So when I increased the WHERE time clause by 15 minutes InfluxDB will no longer include any data that occurred before 6:15 in that 6:00 PM time bucket but any point that makes it into that 6:00 PM time bucket has to occur in that 6:00 PM hour. And the same goes for the 7:00 PM time bucket. Any data that makes it into the 7:00 PM time bucket has to occur in the 7:00 PM hour. If you don’t like this default behavior, you can change it and you can change it by using an offset interval. So in this query, I’ve included an offset interval and that’s that bolded text in the group by time clause that says 15 minutes. And what the offset interval does is it shifts those preset time buckets that I was talking about in the previous slide. So here are the results for that query and the first thing to notice is that now the first timestamp in the results is for 6:15 instead of 6:00 PM. And then the points that go into each average—you have four points that go into each average so it’s a little bit weird when you first encounter it but once you’re aware of that behavior of group by time queries, you can pretty much make any group by time query work for you.
Regan Kuchan 05:25.437 So moving on to CONTINUOUS QUERIES. CONTINUOUS QUERIES are InfluxQL queries that run automatically and periodically within a database and they store the results of that query in a specified measurement. For those of you who have never used CONTINUOUS QUERIES, I will be going into more detail so bear with me. But first, why are they important? CONTINUOUS QUERIES are great for automatic down sampling. So instead of having to down sample your data by manually running the same query over and over again, you can execute a CONTINUOUS QUERY once and InfluxDB will take care of it for you from then on.
Regan Kuchan 06:06.347 So now that you know what they are and why they’re important, let’s get into the basic syntax. So this is a basic syntax for a continuous query. This is the minimum for what you would need to create one. And I have grouped the syntax into three main sections to try to make it a little bit easier to read. In that first section, you basically announce that you are creating a continuous query and you give that continuous query a name and then you tell it what database you’re going to be working with. The second section on the screen is where you specify the actual InfluxQL query that InfluxDB will be executing over and over again after you run this. And in that query, you specify how you want to aggregate your data; that’s the function stuff part. You specify where you want to write the results of the query; that’s the into clause. You specify what data you want to be working with; that’s the from part. And then finally you specify the group by time interval. And then the last section there is where you say, you’re done, yeah, with end.
Regan Kuchan 07:12.688 So now I’m going to walk through a basic continuous query example. So I’ve sampled it up here for my ice cream consumption at the minute level and what I’m actually interested in is my average hourly ice cream consumption. And I want to store that average hourly data in a separate measurement. So I can do this automatically, very easily with a continuous query and this is that continuous query. It’s called hourly ice cream and it’s on the database, food consumption. And what it’ll do is it’ll run the query that asked for the average of the field ice cream from the measurement minute intake and it asks InfluxDB to group the results by one-hour intervals and it will also write the results of that query into the measurement average hourly ice cream. So here’s an example of what that would end up looking like. And notice in the second code block, I am querying data from average hourly ice cream because the continuous query wrote the data there. To give you kind of a better idea of how continuous queries are working in the background to go from the minute resolution data to the hourly resolution data, I put two log entries on the slide to make it clearer. And then that first entry you can see that InfluxDB is executing the query at 7:00 PM. And when it executes the query, it runs a single query and that query covers a time range between 6:00 PM and 7:00 PM. The next time it runs the continuous query is at 8:00 PM, so an hour later. And when it runs the query at 8:00 PM, it runs a single query that covers the time range between 7:00 PM and 8:00 PM. So there are two main things to take away from this. The first is that with the basic CONTINUOUS QUERY, InfluxDB will execute the continuous query at the same interval as the group by time interval. So if you remember in the previous slide, the group by time interval was one hour so this continuous query is going to run every hour.
Regan Kuchan 09:19.750 The second thing to notice is that when InfluxDB does execute the continuous query, the continuous query is going to run one query and that query will have a time range between now and now minus the group by time interval. So if you go back and look at the logs, you can see that—let’s say now is 7:00 PM. So the query that’s run is between 7:00 PM and then now minus the group by time interval which is 7:00 PM minus one hour which is 6:00 PM. So these two things are actually configurable and you can configure them using the advanced syntax for continuous queries. So once again, I’ve broken the syntax into separate chunks again. The new part that I want you to focus on is that RESAMPLE clause in that second part. And the EVERY interval determines how often the CONTINUOUS QUERY will be run. So this setting overrides the default behavior of running the continuous query at the same interval as the group by time interval. And then the for interval determines the time range over which the continuous query will run queries. So you can use this RESAMPLE clause with both the EVERY interval and the for interval, just the EVERY interval, or just the for interval. To make it a little bit clearer, here’s the same continuous query but this time it has a RESAMPLE clause. And this RESAMPLE clause is telling InfluxDB to run the continuous query every two hours. And when it runs that continuous query, the CONTINUOUS QUERY will run queries for the past four hours. And because our GROUP BY time interval is one hour, that means every time the CONTINUOUS QUERY executes it will run four separate queries, one for each of the four past hours.
Regan Kuchan 11:09.159 So here are the logs once again, to show you how that actually works in practice. So you can see that InfluxDB executed the continuous query at 6:00 PM and when it did, it ran four separate queries, one query for each of the four past hours. Then you can see the InfluxDB executed the CONTINUOUS QUERY again at 8:00 PM, so two hours later, and it ran for four queries. So what to take away from the advanced syntax is that the every interval determines how often InfluxDB executes the continuous query. And the for interval determines the time range over which the continuous query runs queries. How you end up actually configuring your CONTINUOUS QUERIES if you’re using the basic syntax or the advanced syntax is all going to depend on your data. So now you’re aware of this and you can figure it out.
Regan Kuchan 12:02.206 So now I’m going to combine information kind of from the first part of this talk with the CONTINUOUS QUERY part of the talk. You may have noticed that CONTINUOUS QUERIES are actually just another form of group by time queries. Which means that you can use an offset interval with CONTINUOUS QUERIES. So in this CONTINUOUS QUERY, I’ve included an offset interval of 15 minutes, that’s the bold part. And what that’ll do is that’ll shift the execution time of the continuous query by 15 minutes and it’ll also shift the time window that’s covered by the continuous query. So in the lines below you can see that InfluxDB executes the CONTINUOUS QUERY at 4:15. Usually, if it was executing an hourly CONTINUOUS QUERY it would execute at 4:00. So this just shifts when it runs. And then you can see in the WHERE time clause it goes from 3:15 to 4:15. So this just lets you know that you can configure your continuous query’s time buckets.
Regan Kuchan 13:02.046 So we’re in the last part of my talk right now and I just have a couple quick tips for you. And the first one is how to query data from a non-default retention policy. So in this code block, you can see that I have a database called my DB. And my DB has two retention policies, autogen and Atlantis. And notice that autogen is the default retention policy for the database. And let’s say I’ve written a ton of data only to the Atlantis retention policy. If I go connect to the CLI and then I do a basic SELECT star from measurement name query, I’m not going to be able to see any of my data. And the reason for that is InfluxDB automatically queries the default retention policy. So if you want to query data that are in a non-default retention policy you have to fully qualify the measurement. And that’s what I’m doing in the second query. So I do SELECT star from and I specify the database, and then the retention policy, and then the measurement name, and then you can find all of the data that you wrote.
Regan Kuchan 14:02.093 My second quick tip is how to query data that occur in the future. So here I’m writing a single point with the timestamp April 16th, 2020 at 10:00 PM. Incidentally, this is when I will turn 30. So if I go and if I just do a basic SELECT star from party, I’m not going to get anything back. And the reason for this is because InfluxDB uses now as a upper bound on queries—as an upper bound on your time for queries. And now is your server’s local timestamp. So if you want to query data that occur after now you have to specify this explicitly in the WHERE clause. And that’s what I’m doing in the second query. So I’m asking for data WHERE time is less than now, plus 208 weeks from now. And that’s how you can figure out what to do on my 30th birthday. Celebrate. Yeah, all right.
Regan Kuchan 14:58.377 And my last tip for today is when to use what Boolean syntax where. And you can see in my code block that I have successfully written data to my database using an uppercase T and uppercase F as Booleans but when I go to query the data using the same uppercase T and uppercase F, I get nothing. But I can query the data if I’m using an all lowercase true or all lowercase false. And the reason this happens is because acceptable Boolean syntax differs across queries and writes. So in the table down there I just have what works where. And basically, if you’re using just a one letter be wary of it because it will not work in queries.
Regan Kuchan 15:42.913 So that’s it for me. If you’d like more information I recommend checking out the documentation. You can also read the blog that comes out every Thursday morning. And if you like longer form stuff, there are a bunch of awesome papers at the technical papers link, so. Thanks so much.
[applause]