Webinar Date: 2018-12-13 08:00:00 (Pacific Time)
In this session you will learn how to tune your queries for performance plus strategies for effective schema design.
Watch the webinar “Optimizing Your TICK Stack” by filling out the form and clicking on the download button on the right. This will open the recording.
Here is an unedited transcript of the webinar “Optimizing Your TICK Stack.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcription errors.
• Chris Churilo: Director Product Marketing, InfluxData
• Dave Patton: Director of Sales Engineering, InfluxData
Chris Churilo 00:00:02.683 All right. Three minutes after the hour, I promised we would get started. Good morning, good afternoon, everybody. My name is Chris Churilo. And I am your host today for today’s training on optimizing your TICK Stack. We have our Director of Sales Engineering, Dave Patton, who has a lot of experience with ensuring that you have a really great implementation of one or all four of our projects. And so with that, I’m going to let Dave take over and start the training.
Dave Patton 00:00:30.264 All right. Thanks, Chris. Good morning everybody. As she said, yeah. My name is Dave Patton. I’m a director of Sales Engineering here at Influx. And part of my day job is working with the sales teams to answer any technical questions. And a lot of that is making sure that people that are adopting Influx do things correctly, recognize and know some of the best practices. So we’ve put together some of those here and hopefully, we can learn some little pearls of wisdom today. So in terms of questions, if you guys have some questions, don’t wait till the end; just go ahead and raise your hand because probably by the end of it I’m going to forget what it was I was talking about anyway. So raise your hand at any time and jump on in.
Dave Patton 00:01:14.644 So with that, what we’re going to talk about today, the data model. We will talk about kind of what a cluster looks like, some of the different nuances and what we do call minefields of setting up a cluster, setting up a data model as well as what are some of the best ways to do some replication. And just some other little nuggets that we should get. All right. So for those of you that aren’t familiar – some of you might be with this – but Influx comes in kind of two flavors. We have the open source version, which is really just a single node. And then we have InfluxDB Enterprise, which provides clustering. It provides actual fine-grained authorization, whereas the open source just does authentication. And it has some additional backup and recovery modes.
Dave Patton 00:02:03.108 But one of the questions we get a lot is, what does a cluster look like? What kind of hardware, what do I need to do to build out a cluster? So this diagram here just gives you a general overview of what that cluster’s going to look like. And we have two different kinds of nodes. One is called the meta nodes and the other, the data nodes. And the meta nodes is really, it’s just a quorum, and they’re very lightweight. It just keeps track more or less of what shard lives on what node. To give you kind of a benchmark of what hardware you use. We also have InfluxDB Cloud, which is basically InfluxDB Enterprise that we run in AWS. And we use T2 micros for the meta nodes. So they’re very lightweight, like a gig of RAM, maybe half of that. They hardly do anything. They’re just keeping consensus and what lives where.
Dave Patton 00:02:53.184 The other kind of nodes are the data nodes. And those are actually where your instance of Influx and/or Kapacitor will reside. And those could be beefy, or beefier, or beefiest depending upon kind of your volume. So again, to give you a comparison, we use the R4 series on AWS for InfluxDB Cloud. And that’s R4 large all the way up to extra large. Or even larger than that depending upon your volume. There is some debate about should you put Kapacitor with Influx on the same machine. I don’t think we have a definitive answer. I kind of lean towards the side of yes only because my background is in big data. And that whole concept of data locality was just beaten into my head over the past few years that I’ve been in there. But there are some considerations you need to take into account about how to actually isolate resources. Because both Influx and Kapacitor will gladly use as much memory as they can get their grubby little paws on.
Dave Patton 00:03:57.640 So in order to isolate them, you may have to set up some sort of cgroup containerization to limit CPU and memory resources. A load balancer in front I think is probably a must. Unless you want to manually route requests to individual nodes, just do a round robin. That could be Nginx, it could be HAProxy, it could be whatever you want. But all it’s going to do is just going to round robin the reads and the writes to the data nodes. And I think we’ll get into what a write looks like later on. And that is pretty much a cluster. Another question we get is how many nodes does a cluster need? If we apply the 80/20 rule, by far, most of the clusters we set up are either two or four nodes. Although the meta nodes have three, you really don’t want to set up three nodes on the data nodes, or really any prime number. And because of the way the shard groups work and how shards are distributed, you really want the number of nodes to be a multiple of your replication factor. So if you have an RF of 2, you want two or four nodes or six. If you have an RF of 4, then you want four or eight nodes. So something along those lines.
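As a rough illustration of the sizing rule described above (this sketch is not from the webinar, and the function name is made up for the example), the acceptable data node counts are simply the multiples of the replication factor:

```python
def valid_data_node_counts(replication_factor: int, max_nodes: int) -> list[int]:
    """List data node counts up to max_nodes that are multiples of the replication factor."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    # Start at RF itself and step by RF, so every result divides evenly by RF.
    return list(range(replication_factor, max_nodes + 1, replication_factor))


print(valid_data_node_counts(2, 6))  # [2, 4, 6]
print(valid_data_node_counts(4, 8))  # [4, 8]
```

With an RF of 2, a three-node or five-node cluster never shows up in the list, which matches the advice to avoid those sizes.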
Dave Patton 00:05:23.199 All right. So the meta nodes, and we talked about them a little bit. They keep track [inaudible] a consistent state. It is basically our implementation of the Raft protocol, is what they are running here. There is really never any reason to do anything other than three nodes. I suppose if you had a giant cluster, and by giant cluster, I mean we have not yet hit that point. You could go to five or seven. Or if you’re just really truly paranoid, you could do that. But virtually, every single cluster that we set up is just three nodes. And that works for like 99.99% of the use cases out there. So like I said they will keep track of the retention policies, the continuous queries. They will do all the good housekeeping that we need to keep track of what lives where.
Dave Patton 00:06:15.386 And the data nodes. So the data nodes will actually hold all your data. You see here I mentioned that the number of data nodes should be divisible by the replication factor. So again, two nodes, four nodes, six nodes, definitely not prime numbers. Just remember that. One takeaway from here, prime numbers, bad. Don’t set up prime numbers. But the data nodes that we set up will actually be where the data lives. And with the new TSI feature, which we will get into, that will actually live on disk as well, which is the new indexing feature that we’ve created. The data nodes, they do not participate in any sort of consensus model. They do talk to each other, but they will definitely also talk to the meta nodes. So in terms of networking, make sure that the ports are open so that all the nodes can talk to each other. Preferably, when you set this up, they should all be on the same subnet.
Dave Patton 00:07:14.501 Do not split clusters across regions. I’m not even a big fan of splitting it across Availability Zones. I think we are starting to explore that and experiment with that option. But in any distributed system, the closer you can get them together and reducing the number of hops between nodes, generally the better. So try to keep them as close as possible, logically if you can. Don’t put them on the same rack because if that rack goes down, you’ve just lost all your nodes. So you may want to split them across racks. If you are using virtual machines, I’ve seen this happen, is someone will spin up three virtual machines and they think they’re covered. But what they don’t know is that those virtual machines are actually on the same physical host. So if that one physical host goes down, all three of those VMs have gone down. So make sure if you do VMs that they are actually spread across multiple physical hosts. Any questions, guys? If you do, like I said, chime in at any time.
Chris Churilo 00:08:19.844 Yeah. There is a question in the Q&A.
Dave Patton 00:08:23.932 I can’t see it because I’m in presentation. I mean, I’m sorry.
Chris Churilo 00:08:25.090 Sure. No problem. So [inaudible] asks, what’s the benefit that he gets by keeping meta nodes on different hosts?
Dave Patton 00:08:35.849 Well, the meta nodes, they should be on different hosts, only because it is a quorum. So what you can do is because you set up a quorum, you can withstand failures as long as, what is it? The old math rule: N divided by 2 plus 1 nodes are still up. So if you have three, you can lose one of the meta nodes and still remain in a consistent and up state. With two meta nodes, since they do a leader election, there’s a potential to get into what you call a split-brain scenario. But generally, put them on different hosts because if you have a physical hardware failure, you still have two meta nodes that are running. And you can still stay up and then bring up a third meta node at the earliest opportunity. So does that answer your question?
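The quorum arithmetic behind that answer can be sketched as follows. This is a generic majority-quorum calculation written for illustration, not code from InfluxDB, and the function names are hypothetical:

```python
def quorum_size(meta_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return meta_nodes // 2 + 1


def tolerated_failures(meta_nodes: int) -> int:
    """How many nodes can be lost while a majority is still up."""
    return meta_nodes - quorum_size(meta_nodes)


for n in (2, 3, 5, 7):
    print(f"{n} meta nodes: quorum {quorum_size(n)}, can lose {tolerated_failures(n)}")
```

Three meta nodes need two up and can lose one. Two meta nodes also need two up, so they tolerate no failures at all, which is why an even-sized quorum buys nothing over the next smaller odd size.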
Chris Churilo 00:09:24.002 Let’s see. Nope. But hang on. [inaudible] says—
Dave Patton 00:09:26.553 Okay [laughter].
Chris Churilo 00:09:27.250 “Sorry, I meant not on data node itself.”
Dave Patton 00:09:32.219 I’m sorry. I didn’t hear that, Chris.
Chris Churilo 00:09:33.661 Oh. He said, “Sorry. I meant not on data node itself.” What is the benefit by keeping data nodes on different hosts? I guess if I combine this [crosstalk]
Dave Patton 00:09:44.809 Oh, data nodes, not meta nodes. Okay. Well, more or less the same applies if one goes down. If I have a cluster of two nodes and they are on different hosts, if one of those hosts goes down, I could still remain operational. The way writes work, and I think we have a slide on this later, but if not, we’ll cover it now. When a write comes in, the write goes to one of the nodes and eventually is replicated to the other node. So if you have multiple nodes on multiple hosts and a host goes down, you are still in an operational state. When the other host comes back up, all the data that was written since that host went down will then be replicated back across to the other data node when it comes up so that you’ll remain eventually consistent. So if you had a larger number of nodes, putting them on separate physical hosts is going to help protect you and keep you in a highly available state.
Chris Churilo 00:10:47.953 Perfect. Right?
Dave Patton 00:10:48.942 Right. Sweet. On disk, I want to hit this point real fast. On disk the data’s organized. So below the parent directory which is /var/lib/influxdb/data. I’m sorry, influxdb/data. Below that, you will see things organized by the database name, the retention policy, and then the shard ID if you go digging down. If you were to now go into the shard ID, you would see two things. One would actually be the shard file which are TSM files. And there could be multiple ones of that because it does go through compactions. In addition, you will now also see a TSI folder. And that is where the index for that shard currently resides. And that is the new feature that we’ll get into when we talk about it a little bit here.
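A small sketch of the directory layout just described, assuming the parent directory mentioned in the talk. The database name, retention policy name, and shard ID below are placeholders chosen for the example:

```python
from pathlib import PurePosixPath


def shard_dir(database: str, retention_policy: str, shard_id: int,
              data_root: str = "/var/lib/influxdb/data") -> PurePosixPath:
    """Build the path <data_root>/<database>/<retention policy>/<shard id>."""
    return PurePosixPath(data_root) / database / retention_policy / str(shard_id)


# "telegraf", "autogen" and 42 are illustrative values only.
print(shard_dir("telegraf", "autogen", 42))
# /var/lib/influxdb/data/telegraf/autogen/42
```

The TSM files and the per-shard index folder mentioned above would sit inside the directory this function returns.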
Dave Patton 00:11:39.923 So we talked a little bit about the hardware. I would say, at a minimum, the meta nodes we mentioned half a gig to a gig. It could be a hard drive of any size. It could be a spinning platter as well. Can be run in VMs or containers. One core I think is probably sufficient. Two cores maybe. In terms of the data nodes, I would say, eight, in a production environment. So again, with Influx, you can spin up a cluster on your laptop if it’s something you just wanted to play around with. But in a production sense, I would say, at a minimum, you’re going to want eight cores, 64 gigs plus, and the important part here is the disk IO. Depending upon your volume, our official recommendation is a single direct-attached SSD. That’s the party line. Now, given that, and given all the folks I’ve talked to (and I’ve talked to a lot of customers running a lot of different setups), to me, the important point here is that you want to have at least 1,000 IOPS of read and write capability to and from your disk. So I have seen folks that are running ZFS. I have seen folks that are running RAID setups.
Dave Patton 00:12:54.221 The most interesting one to me is a customer, a very large customer is running RAID 6. And if you guys know anything about RAID, you will know that RAID 6 is generally not known for its performance. It’s one of those RAID modes that’s very good at protecting your data, but it’s not really great for performance. I was a little surprised that they did that. But they said they’ve been running it for two years and have not had any problems. What you do not want to do is use NAS or really spinning platters in a production environment, because you’re not going to get 1,000 IOPS out of that. On your laptop, totally fine. So ZFS is very interesting to us for a number of reasons. And if you’re not familiar with ZFS, you should be. It’s fantastic. It basically creates a pool of disks. And I think we’re doing some really good exploration into that. It will give you a lot of speed. It will also give you a lot of volume, which kind of solves the problem of a single SSD not being enough volume to store your data over time.
Dave Patton 00:14:02.062 Network, I know we have 10 gig NIC. That is pretty much the standard now in a lot of production systems if you were to go buy a new rack server. 1 gig is probably fine, depending upon your volume, but 10 gig seems to be the standard. You could also do two 1 gig bonded to get 2 gigs in that sense. But 10 gig, we’re seeing that more and more. It’s pretty hard to saturate network channels at this point. But I think of any of these, you will definitely be RAM-bound and potentially IO-bound depending upon your write volume. Those would be the first things that you’re going to be bound by. RAM is cheap. You can always have more RAM. Questions on hardware?
Chris Churilo 00:14:51.563 Nope.
Dave Patton 00:14:52.269 Good. Right on. All right. So there is no tuning, no OS tuning other than just common-sense principles for any kind of Linux. Open source will run on Linux; it will run on Windows. Specifically, we support Red Hat / CentOS / Fedora and Ubuntu / Debian. We actually, I believe, support Solus. We do have some customers running on Solus. Not a lot. And if you ask me a question about Solus, my answer’s going to be, “I don’t know.” In terms of the Enterprise, we use Linux. There is no Windows version that you can run for Enterprise. So Linux again, either Red Hat or Ubuntu by far are the two most common. However, we also do ARM. So you can run the open source version on ARM. So if you want to put it on like a Raspberry Pi, to do direct monitoring at the source for a lot of IoT scenarios, you can do that. And one of our guys, David Simmons, has done a lot of that. He’s built out a lot of IoT devices and actually put the entire TICK Stack on it, which is pretty cool. So unlike like a large Hadoop system, there is no specific OS tuning, I—
Chris Churilo 00:16:44.510 Hey, Dave. Can you hear us? We have lost your audio.
Dave Patton 00:16:50.561 Sorry, I hit my—accidentally, I hit my mute button. So not sure what the last thing you guys heard was.
Chris Churilo 00:16:56.275 We heard on just the 70% utilization common sense.
Dave Patton 00:17:02.659 Yeah. Versions of the OS that you can run on. You can’t run it on anything earlier than Red Hat 6 or Ubuntu 14. Only because there’s no Go runtime for a Red Hat 5 or a Ubuntu 12 or earlier. So the entire stack is written in Go, so you need that Go runtime. So generally, Ubuntu 16 and the Red Hat 7.x line are what you want to do. So there is no specific OS tuning. You don’t have to worry about ulimits or anything like that. Just stock system is fine. For open source, 70% utilization, you still want to leave some headroom. You don’t want to run a system that’s 80, 90% only because you’re going to need room for compactions. You want to do some peak capacity planning, things of that nature and backfilling data if you have it.
Dave Patton 00:17:58.628 If you’re running a cluster of two nodes, I would say don’t really run it over 50%. And the reason being, one, you still need that room for compactions and everything else that the open source is doing. But if you are running two nodes and one node fails, that other node will take up that volume. So if I am running both of them at 80%, 80 plus 80 is 160, that’s more resources than we have available. So the other node will more than likely go down following the first node that went down. So if you’re running two nodes, 50%, only because you want that failover, that cushion. If you’re running more than a two-node cluster, then you can probably push them a little higher because if any single node fails, the rest of them can take up the slack. You will see spikes, CPU spikes, that is probably the compaction process, so.
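The failover arithmetic above can be written out as a short calculation. It assumes, purely for illustration, that the failed node’s load is spread evenly over the survivors; real clusters will not balance this neatly:

```python
def load_after_one_failure(nodes: int, utilization: float) -> float:
    """Per-node utilization after one node fails and the rest absorb its load evenly."""
    if nodes < 2:
        raise ValueError("need at least two nodes for failover")
    return utilization * nodes / (nodes - 1)


print(load_after_one_failure(2, 0.8))  # 1.6, i.e. 160%: the survivor is overloaded
print(load_after_one_failure(2, 0.5))  # 1.0, i.e. the survivor is exactly full
print(load_after_one_failure(4, 0.7) < 1.0)  # True: a four-node cluster at 70% survives
```

Any result above 1.0 means the surviving nodes would need more resources than they have, which is the cascading failure described in the talk.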
Dave Patton 00:18:57.853 Retention policies. So in a cluster, we have this concept of retention policies and shard duration. And the difference is this. The retention policy is how long the system will keep your data around before it kind of puts it out as exhaust onto the data room floor. The shard duration, which we’ll talk about here in a little bit, is how long a given shard is open for a write. And those two really work hand in hand. And we’ll see how they work. So the retention policies are unique per database, along with the measurement and the tag set. And together, they define what really was called a series. So a series has definite implications and an impact on what we call cardinality. Cardinality matters, so we say. There is somewhat of an upper bound. You can’t have infinite cardinality. And by that, I mean, you can’t have an infinite number of unique series in your database. That number is capped, but it’s a very big number at this point. So when you create a database, Influx will—it has what we call a default retention policy.
Dave Patton 00:20:12.236 So you can use that one, or you can create your own retention policy when you create a database. So the default obviously has a replication factor of 1 on OSS. I think it’s an RF of 2 on a cluster. And it’s an infinite duration. It means it will always keep your data, and the shard group duration is seven days. So the shard duration, like I said, is how long that shard is open for write. So if I have a seven-day retention policy and a one-day or 24-hour shard duration, what that means is that at any given time if I were to go look on disk, I would probably have, let’s say, I would have seven shard files, one of which is currently open that is receiving all my hot writes. So that would be a [inaudible] windows every day. My latest one would probably go away. A new shard file would be created and opened up for writes. And the one for yesterday would be closed and kind of archived. When it’s archived, it means it’s still queryable, but it’s not currently open for writes.
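To make the relationship between retention period and shard duration concrete, here is a back-of-the-envelope count of how many shard groups would be on disk at once. It is only an approximation written for this transcript; the actual number at any moment can differ by one depending on when expired shards are removed:

```python
import math
from datetime import timedelta


def approx_shard_groups(retention: timedelta, shard_duration: timedelta) -> int:
    """Approximate number of shard groups kept on disk for a retention policy."""
    if shard_duration <= timedelta(0):
        raise ValueError("shard duration must be positive")
    # Dividing two timedeltas yields a float ratio.
    return math.ceil(retention / shard_duration)


print(approx_shard_groups(timedelta(days=7), timedelta(days=1)))    # 7
print(approx_shard_groups(timedelta(days=30), timedelta(hours=1)))  # 720
```

The second call shows how a short shard duration on a long retention period quickly produces hundreds of shard files.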
Dave Patton 00:21:17.540 So that’s how it’s going to really store things. So it stores it in shards and shard groups that does [inaudible]. And with the retention policy, I’m sorry, a shard duration, there is a Goldilocks principle to it. It is a tunable thing. So you don’t want to set it so low that you get a gazillion shard files. And you don’t want to set it so high that you get one very large file. So in running InfluxDB Cloud, where we run about 200 to 300 different clusters, we have kind of learned some lessons. And we’ve seen that you want to shoot for a shard file that is about 4 to 5 gigabytes on disk. So what that means is you might want to do some production tuning on that shard duration for a couple different things. One, to get that shard file sized to about 4 to 5 gigabytes, but also look at your general query time. So if you query for generally over a one-day period, having a shard duration that’s two times that means that on most of my queries, I’m only going to be reading from one shard. Which makes that query a lot more efficient. If I had to go run a query and had it open up a bunch of different shard files, that’s a lot of overhead. That’s a lot of extra work, and it’s going to make your query less-performing. Questions?
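One way to turn the 4 to 5 gigabyte target into a starting shard duration is to divide the target size by the observed on-disk growth rate. This is a hypothetical helper, not an InfluxData tool, and the growth rate has to be measured on your own system after compression:

```python
def shard_duration_hours(on_disk_bytes_per_hour: float, target_gb: float = 4.5) -> float:
    """Hours of data needed for one shard to reach roughly target_gb on disk."""
    if on_disk_bytes_per_hour <= 0:
        raise ValueError("growth rate must be positive")
    return target_gb * 1024 ** 3 / on_disk_bytes_per_hour


# Assumed example: shards grow by about 200 MiB per hour on disk.
hours = shard_duration_hours(200 * 1024 ** 2)
print(f"{hours:.1f} hours")  # 23.0 hours, so a one-day shard duration is a sensible start
```

The result is only a first guess. As the talk notes, the typical query time range should also influence the final value.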
Chris Churilo 00:22:50.246 None right now.
Dave Patton 00:22:51.243 Okay. Yes, so you don’t want to set it so low again, that you’re going to have a gazillion files. You don’t want to set it so low that each file might only have one or two points. So a lot of this when you’re setting this up, look at your use case, look at your data, and look at your volume of data coming in, and make some determinations off of that. So if you have a very low volume, you might want to have a very high shard duration. Because we don’t want a shard that’s just got 10 points. That’s not very efficient. If you have a very high data volume, you might want to tune that down a little bit because you’re going to fill up a shard pretty quickly. But again, also keep in mind how you’re going to query the data. So the hinted handoff queue. You shouldn’t have to ever interact with this directly. When we talked about a cluster, a write comes into a node, and then it’s going to be replicated to a different node. But what happens if that other node is down? So in that situation, what will happen is there’s this concept of the hinted handoff queue. So the writes will go into that queue, and they will be stored on disk. So when that other node comes back up, that hinted handoff queue will be drained out to the other node. That’s how it keeps itself in an eventually consistent state. So if you are monitoring your cluster, and you should be monitoring your cluster, one of the things you can watch is the hinted handoff queue depth.
Dave Patton 00:24:26.441 Having a sawtooth pattern on that graph is okay. What that means is that periodically, the other node is not accessible. And that could be a network hiccup. It could be that the other node is doing a big compaction. It could be that it’s running a big query, whatever. It’s just saying: “I can’t receive data right now.” But then that hinted handoff queue will drain out. So you’ll see that sawtooth pattern on the graph. What you don’t want to see is a hinted handoff queue depth chart that looks like a slope. It’s just continually going up. That is a really good indication that something is not right, and something is bad. What that means is that this node cannot write to the other node, and the backup is just happening. And it’s going to keep filling up until something goes pop. So you should probably set a threshold, an alert on that or keep an eye on that. Again, that sawtooth pattern is fine. And interestingly enough, if you look at the CPU usage, that hinted handoff queue pattern will generally match up with the spike in CPU on the nodes. Which again, probably means that when it’s doing a big compaction or a big query, it’s like, “Hey, I need to take a timeout processing.” And then when it’s done, it can drain out that hinted handoff queue. So the hinted handoff queue is definitely something to keep an eye on in a clustered situation. And one of the newer features we have introduced is anti-entropy. And what that will do is help maintain that eventual consistency between the nodes and make sure that the shards stay in a consistent state.
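The difference between a healthy sawtooth and a queue that only climbs can be checked with a very simple rule over recent samples. The sketch below is one possible heuristic, not how any InfluxData alert is implemented; the window size and drain threshold are arbitrary assumptions:

```python
def queue_is_backing_up(depths: list[int], window: int = 6, drained_below: int = 100) -> bool:
    """Flag a hinted handoff queue that has not drained and is still rising."""
    recent = depths[-window:]
    if len(recent) < window:
        return False  # not enough samples to judge yet
    never_drained = min(recent) >= drained_below  # a sawtooth dips back toward zero
    still_rising = recent[-1] > recent[0]
    return never_drained and still_rising


sawtooth = [0, 500, 900, 0, 400, 800, 0, 300]
slope = [0, 500, 900, 1400, 2000, 2700, 3500, 4400]
print(queue_is_backing_up(sawtooth))  # False
print(queue_is_backing_up(slope))     # True
```

In practice the same comparison would be expressed as an alert rule in whatever tool monitors the cluster.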
Dave Patton 00:26:04.684 All right. Continuous queries and Kapacitor. So in the database, there is a feature called continuous queries. Excuse me. Let me have a sip of water. That was an early-on feature. And continuous queries are really meant to do kind of aggregations and rollups. They’re not tremendously efficient at large volumes. So if you are familiar with the TICK stack, we have introduced Kapacitor. And Kapacitor is the data processing engine. And Kapacitor was originally built to be more or less just for alerting, but it has since assumed a much bigger role in terms of all processing. So if you are running on your laptop and just want to play around, continuous queries are probably fine. If you are running in production, I would definitely push your aggregations and rollups and all your alerting to Kapacitor. Kapacitor’s going to be able to handle it a lot better. It is multithreaded whereas continuous queries are not. They are serial. Kapacitor can run them in parallel. So you’re going to get a lot more bang for your buck out of Kapacitor.
Dave Patton 00:27:16.366 You can also do a lot more because Kapacitor uses what’s called TICKscript, which is a functional-based query language. Whereas continuous queries use InfluxQL, which is a SQL-like language. So you’re going to be able to do a lot more in Kapacitor. So my general recommendation is, in production, run Kapacitor concurrent with the database and build out all your aggregations in there. So when we’re building a data model on a flow-through model, we want to keep in mind our ingest, we want to keep in mind our retention policies. We also want to keep in mind our aggregations and rollups. So if I am receiving 10-second interval data that might be my raw data from my systems monitoring, I might want to keep that for about three days and then roll that up to, let’s say, a minute using Kapacitor. Keep that, say, for I don’t know a month or two, whatever. Whatever works for your business needs. And then do another rollup to, say, an hour resolution and keep that indefinitely. So all of this goes along with setting up your data pipeline and your data model.
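The rollup idea (10-second raw data aggregated to one-minute values) boils down to grouping points into fixed time buckets and reducing each bucket. The following plain Python sketch shows only that arithmetic; it is not TICKscript and not how Kapacitor is implemented, and the sample data is invented:

```python
from collections import defaultdict
from statistics import mean


def rollup_mean(points: list[tuple[int, float]], bucket_seconds: int) -> list[tuple[int, float]]:
    """Average (timestamp_seconds, value) points into fixed-width time buckets."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Two minutes of made-up 10-second samples with values 0.0 through 11.0.
raw = [(t, float(t // 10)) for t in range(0, 120, 10)]
print(rollup_mean(raw, 60))  # [(0, 2.5), (60, 8.5)]
```

Running the same function again with a 3600-second bucket over the one-minute output would give the hourly tier mentioned in the talk.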
Chris Churilo 00:28:23.958 So Dave, can we just spend a little bit more time on that because we have a question from Eli. And he wants you to explain a little bit more. What are those advantages of Kapacitor versus using continuous query?
Dave Patton 00:28:36.594 Well, CQs are serial, Kapacitor’s parallel. Kapacitor is multithreaded. Kapacitor was actually specially built to do data processing. And I think as a company and a platform the direction we’re going is a specialization of concerns with the different pieces of the platform. We want the database to be good at doing what a database is supposed to do. And by that we mean receiving data, storing data, and serving data up very efficiently. We don’t want the database to process data. So that is really Kapacitor assuming a much bigger role in the stack. Some of you may have heard of IFQL, I think. Since Chris is in marketing. Chris, it’s going to get a much better name at some point in the near future [laughter]?
Chris Churilo 00:29:25.517 Yeah. Exactly.
Dave Patton 00:29:26.010 No [laughter]. So IFQL is the Influx Functional Query Language. And over the past few years (Paul Dix is our CTO and really the father of Influx), in talking to customers, we’ve learned that time series data and working with time series data lends itself more towards a functional-based approach versus a SQL approach. There are a lot of things that we want to be able to do that we really can’t do in SQL, or to do it in a SQL language is very difficult. So IFQL is really a superset of TICKscript. And the idea is, like, I want to select a big bucket of data and then apply different filters to that data, which you can think of just like WHERE clauses in a SQL statement, and then do something to that data set. So maybe kick an alert, calculate a mean, kick off some sort of predictive algorithm, something of that nature. So all of that falls in that bucket of data processing. And that is where Kapacitor, and next year a new daemon that is much more specially built for that, will assume that role. So again, CQs, they’re fine for play, they’re fine for proofing. Anything in production, I would definitely run it in Kapacitor.
Chris Churilo 00:30:51.170 So, Eli, hopefully that answered your original question as well as your subsequent question. Where you wanted to know just from the other perspective why you use CQ. And hopefully Dave’s answered that for you. And just let me know, Eli, if you need more clarification. And then Abhishek asks, he wants to know: “Do you have any recommended good practices to push certain metrics to some kind of external database like Cassandra for historical storage or—?”
Dave Patton 00:31:23.502 Why would you want to push it to [crosstalk]—?
Chris Churilo 00:31:25.155 Yeah, then he also says, “From InfluxDB”. So maybe, Abhishek, you can clarify that a little bit more and we’ll wait for your answer—we’ll continue in the meantime.
Dave Patton 00:31:37.900 Yeah. Well, I can make one general recommendation. I come from the big data world. I’m a huge fan of keeping my raw data indefinitely. Now, that doesn’t mean I need to keep it indexed in the database indefinitely. Because if you’re talking about metrics, the value of high resolution data decreases rapidly as my timescale from that data increases. So by that, I mean, two years from now, do I really care what my CPU usage was at a 1 second interval? Probably not. I’m more concerned with the pattern. But in the spirit of keeping my raw data, what you can do, remember we talked about using rollups to downsample and aggregate your data. So when you’re doing your raw sample, before that retention policy expires, and data just—poof—goes away. Why not just export it, and then take that data and store it to something long-term and cheap? Put it in Glacier, put it in S3, put it in HDFS if you want. And there’s a couple of reasons to do that. If you have all your raw data, you can always rebuild your system from source. Plus if you have your raw data, you can also do—later down the road—different kinds of analysis that you may not have originally thought you might want to do when you built the system in the first place. So having access to raw data, to me, has value. But there’s no reason to keep it in the database and keep it indexed. I can just export it and store it in a longer-term, colder storage. All right. So hopefully, I answered this question.
Dave Patton 00:33:21.337 So some other pearls of wisdom. If any of you guys have ever played, probably with Cassandra or TSDB or Hadoop. You know that probably the power and the peril of the systems is that there are about a thousand different knobs and dials that you can twist and turn to try to tune things out. We do not have a thousand. We have a very much smaller number. So there’s just a few things that you might want to tune out here. Out of the box, we are very tuned, very much towards the write side. We’re very write-efficient. If you want to tune things towards the read side, there’s a couple tuning parameters that you can do. Which is the ‘max-concurrent-queries’, ‘max-select-points’, ‘max-select-series’, and ‘max-select-buckets’. They already have pretty high values. Try not to set it to 0, which seems to be what everybody does. They won’t set it any higher. They’ll just set it to 0, and 0 means that it’s unlimited. You can just select an unlimited amount. That might sound great, but like I said, it is memory-hungry, and it will gladly use as much memory as it can get its hands on.
Dave Patton 00:34:37.319 So try tuning those things. Just don’t turn it off to 0. When you’re doing a query, if we’re coming from the database background, the first query we want to run to test things is ‘SELECT *’ or ‘SELECT COUNT(*)’ from everything. Because it’s a very simple query to write. And it usually gives me some output, and I can make sure things are working. That is the worst query you can possibly give a Time Series Database. You should always include a time slice on every single query that you do. And by time slice, if you’ve ever seen a query in Influx, you’re just going to select whatever from my measurement where time is greater than now minus some arbitrary duration. It could be minus 1 day, minus 1 week, minus 1 month, whatever. But always put that time slice on your query. In fact, we’ve debated putting a feature in where a query won’t even run unless you have a time slice on it. And that time slice could be NOW minus an entire year. It could be a fairly large time slice. But always put that time slice in. It is actually going to improve the efficiency of your query, but it’s just a good practice to get into.
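To show what a query with a time slice looks like, here is a small helper that refuses to build an InfluxQL statement without a lookback window. The helper itself is hypothetical, and the measurement and field names are examples; only the resulting `WHERE time > now() - <duration>` clause reflects InfluxQL syntax:

```python
import re

# InfluxQL duration literal: an integer followed by a unit such as s, m, h, d or w.
_DURATION = re.compile(r"^\d+(ns|u|ms|s|m|h|d|w)$")


def bounded_mean_query(field: str, measurement: str, lookback: str) -> str:
    """Build a mean() query that always carries a time slice."""
    if not _DURATION.match(lookback):
        raise ValueError(f"lookback must be an InfluxQL duration, got {lookback!r}")
    return f'SELECT mean("{field}") FROM "{measurement}" WHERE time > now() - {lookback}'


print(bounded_mean_query("usage_idle", "cpu", "1d"))
# SELECT mean("usage_idle") FROM "cpu" WHERE time > now() - 1d
```

The helper does no escaping of names, so it is suitable only for trusted, hard-coded identifiers.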
Dave Patton 00:35:54.721 If you are backfilling data, and by backfilling I mean pumping some historical data into the system, a couple of things to do. First, that data should be in a chronologically ordered format. So if you were to give it just a list of historical data, having all those points in a chronologically arbitrary order is pretty inefficient. Because for the first point, it might have to open a shard file, do the write, do the reindexing, and then close it. And then the second point might be in a completely different shard, and it’s got to do the same thing. So writing historical data is an expensive operation to begin with. Don’t make it more expensive than it needs to be. If all that data is in chronological order, it could just open the one shard file, start pumping points into it in order, and then close it when it has to write to a different shard. It’s a lot more efficient. Another tuning thing for historical data, and it’s a minor one, but it adds up, is your tags. If they are in alphabetical order, that will actually help. If they’re not, it’s not a huge deal. But the system’s going to put them into alphabetical order anyway. So those are a few cycles that you could probably spare by just making sure you’re doing data cleansing before you put stuff in. So it’s a minor thing, but it will help in the aggregate.
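A minimal sketch of that pre-processing step in Python, assuming each historical point is held as a tuple of measurement, tags, fields, and a nanosecond timestamp. The `Point` layout and the name `prepare_backfill` are illustrative assumptions, not an InfluxDB API.

```python
from typing import Dict, List, Tuple

# Assumed in-memory shape of a point: (measurement, tags, fields, timestamp in ns).
Point = Tuple[str, Dict[str, str], Dict[str, float], int]


def prepare_backfill(points: List[Point]) -> List[Point]:
    """Order points oldest-first and sort each point's tag keys alphabetically."""
    ordered = sorted(points, key=lambda p: p[3])
    # dict() preserves insertion order, so the tags come out sorted by key.
    return [(m, dict(sorted(tags.items())), fields, ts) for m, tags, fields, ts in ordered]


if __name__ == "__main__":
    raw = [
        ("cpu", {"region": "us-west", "host": "a"}, {"usage_user": 1.0}, 3),
        ("cpu", {"region": "us-west", "host": "a"}, {"usage_user": 2.0}, 1),
    ]
    for point in prepare_backfill(raw):
        print(point)
```

Sorting once on the client is cheap compared with making the database hop between shards for every point.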
Dave Patton 00:37:18.358 Fine-grained authorization. So previous to, I think it was the 1.3 line, we didn’t have fine-grained authorization. So people were using databases as that security container and trying to set up authorization by setting up multiple measurements or multiple databases. And that will eventually lead them into trouble. You don’t want too many databases. So use fine-grained authorization instead of a database as your security container. You can set up fine-grained authorization down to the series level. So that gives you a lot of stuff to work with there. In the database, there is the internal database. And that will monitor Influx itself. Don’t use that in production. And there’s a couple of reasons. If you’re running on your laptop or in a pre-prod environment, great, use it. It’s very easy. That’s that _internal database, if any of you have used it. Don’t use that in production. One, it puts a little bit of load on the system. And those are resources that you could be using for other things.
Dave Patton 00:38:31.597 Number two, if you use production to monitor production, and production goes down, how do you know that production went down? So kind of like if a tree falls in the forest, how do you know it fell if no one can hear it? The best practice is to set up a separate OSS instance to monitor all your production clusters. And we follow this ourselves. So for InfluxDB Cloud, like I said, we’re running anywhere between 200 and 300 different clusters. We actually have a separate instance of OSS that we use to monitor all those other clusters. So if anything happens, that’s what kicks off our alerts, and we know about it. That is the best practice. So turn off the internal database. And actually, if you use Telegraf, put that on your data nodes and your meta nodes. There’s actually an Influx and a Kapacitor input plugin for Telegraf. There’s also a Telegraf input plugin for Telegraf. Although I cannot, for the life of me, think of why you would need to use that. That’s a little too meta for me, but they have that as well. I think yeah, I’m sure something else will come up. But I think those are the pearls of wisdom here.
Dave Patton 00:39:47.926 TSI. So this has been a big feature that has been in the works for quite some time. And if you were not aware, we just released 1.5 of both Enterprise and OSS. That came out last week. Last Tuesday, I think, it was. So for a little bit, just to clarify, it’s a little confusing. On the OSS line, we had 1.3, 1.4, 1.5. On the Enterprise line, it went from 1.3 straight to 1.5. There is no 1.4 Enterprise database. We just made the decision to skip that. So as part of 1.5, time series indexing is on by default now. It has gone GA. And what that is, is previous to TSI, the index was stored in memory and that—although it was very fast, it did eventually cause problems as we got into high production loads. So it put an arbitrary cap on, you’ve heard me mention the series cardinality, because every unique series has to be indexed, and again that index is stored in memory, so the more unique series, the higher my cardinality. The more things are stored in memory, the more memory I’m using. And eventually, I’m going to run out of memory. So I could OOM the box very easily.
Dave Patton 00:41:11.653 At startup time it also built that index because it was in memory. So if I had a high series cardinality, it could take a long time for my system to start up. So I think previously, we saw a series cardinality cap, depending on your resources, anywhere between 7 and 13 million unique series, let’s say. So we thought, “How do we fix this?” So we built TSI. And what TSI is, as a gross simplification, is a spill-to-disk feature of the indexing. So I mentioned that each shard will have that TSI folder; that is the index for that shard. So what is now stored in memory is the index of indexes, so a pointer to the index for each shard, using the operating system’s memory-mapped files, is stored there, as well as the hot series. So the last couple of series for each different database and retention policy are stored in memory, and as that cools off, it’ll eventually be pruned out of memory. But if you need it, you can go to that TSI index on disk and bring it back.
Dave Patton 00:42:25.371 So with that, the result is that startup times are a lot faster, virtually instantaneous. And the cap on series cardinality, our goal is to get up to a billion. We have hit 700 million, but that was really just doing writes and just storing things. I think for production purposes, you’re probably safe in the high tens, low hundreds of millions of series cardinality. And again, this depends on the resources that you have available to the box. But that cardinality cap has been lifted at least one or two orders of magnitude. So what that means is that it will not help you if you still try to do that query across all series, across all time, but where it very much will help you is if you have ephemeral series. So think in terms of container monitoring. Generally, we have the container ID as one of the tags or an index value. Those are very ephemeral because when a container goes down, I’m not going to see that container ID again.
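To make the cardinality point concrete, here is a rough Python sketch that counts unique series as unique combinations of measurement and tag set. The function name `series_cardinality` and the sample measurement and tag names are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_cardinality(points: Iterable[Tuple[str, Dict[str, str]]]) -> int:
    """Count unique (measurement, tag set) combinations in a stream of points."""
    seen: Set[SeriesKey] = set()
    for measurement, tags in points:
        # Sort the tag pairs so that tag order does not create a "new" series.
        seen.add((measurement, tuple(sorted(tags.items()))))
    return len(seen)


if __name__ == "__main__":
    # Two hosts, each of which has run three short-lived containers.
    with_ids = [
        ("docker_cpu", {"host": f"h{h}", "container_id": f"c{h}-{c}"})
        for h in range(2)
        for c in range(3)
    ]
    without_ids = [("docker_cpu", {"host": tags["host"]}) for _, tags in with_ids]
    print(series_cardinality(with_ids), series_cardinality(without_ids))
```

Every container restart adds a new series that never receives data again, which is why ephemeral tags inflate the index so quickly.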
Dave Patton 00:43:34.444 So anytime you have a scenario where one of those tags or many of those tags are very ephemeral in nature, TSI will definitely help you. If all of your tags are persistent rather than ephemeral and you still need to query across all those tags, you’re not going to get as much bang for your buck out of TSI as in the first scenario. Questions on TSI? Oh, well, because I think I know what one of those questions will be, let me cover it. You can go back and forth between in-memory and TSI. So we did release a tool. So if you have been using the system, you can run this tool, and what it will do is read the shards and create TSI indexes for all those existing shards. Once you switch on TSI, that means any new shards that are created will be with TSI. If for whatever reason you wanted to go back, all you have to do is just erase all those TSI folders, switch the config back to in-mem, and you’re good to go. So it’s pretty easy to go back and forth between the two. So, questions?
Chris Churilo 00:44:37.584 We have questions not about TSI, but Anthony asks: “Is there anything like fine-grained auth in open source to avoid the problems described in having many databases?”
Dave Patton 00:44:48.199 No. There is authentication in the open source, but fine-grained authorization is only in Enterprise.
Chris Churilo 00:44:56.988 And then, Abhishek added some more color to this original question about good practices pertaining to pushing certain metrics to some external database. So he was talking about custom metrics. “How many exceptions happen when a trade flow between different systems?” I’m not really sure I understand that. You know, Abhishek, why don’t we do this? Let’s maybe get you on the phone at the end, and we can get some clarifying—a little bit more understanding what you’re asking there? All right. Let’s keep going.
Dave Patton 00:45:31.603 Okay. Hey, Chris. Can you give me a time check or—?
Chris Churilo 00:45:35.457 Yeah. You’ve got about 10 minutes.
Dave Patton 00:45:38.135 Oh. Oh, jeez. Okay. Maybe I should hurry up, then. All right. Let’s talk about multi DC replication. This comes up a lot. And it’s something we can do. We don’t offer anything like GoldenGate replication, if that’s something that you are used to or have used in the past. I probably wouldn’t try to do that. But again, we’re talking about metrics capturing, which is by far the biggest use case we deal with. So trying to set up two data centers where they’re hot, hot is generally an expensive proposition. But we can keep them in an eventually consistent state. You can do a hot, warm. That’s pretty easy. But there’s two different kinds of data that we’re concerned with when we set up replication. One is that bucket of data that’s being ingested, the new data. And that’s actually the easiest to deal with. So with that, you could use straight Telegraf. So have Telegraf write to each cluster. That’s probably the easiest and most straightforward to set up.
Dave Patton 00:46:40.245 If you need some more guarantees on that delivery, you can use an interstitial queue, something like Kafka, NATS, NATS Streaming, or MQTT, or something of that nature that’s going to give you that durable persistence and guarantee delivery eventually. So Telegraf could write to Kafka, and then you can pull off of the Kafka queue again with Telegraf and push it down to each of the clusters. So those are probably the two easiest patterns there. You could use subscriptions; however, they are not as efficient as the first two models. With subscriptions, you do need to be careful to not set up an infinite loop because, as we know, you don’t cross the streams. The second set of data, and we’ll talk about that real fast (I don’t think I have a slide, but it’s important), is what we call derived data. And that derived data could be the output of a Kapacitor job, one of those aggregations or rollups that we talked about. It could also be the output of some sort of ‘SELECT INTO’ query. So how do we keep it in sync? The Kapacitor job is easiest, because if I set up the same job on both data centers, and I know I’m getting the same data in, then my aggregations and rollups will be the same on each cluster.
Dave Patton 00:48:00.229 So instead of transferring data, you could just set up the same job and make sure they’re in sync that way. For the output of some of those other jobs, probably the only way to do that is with subscriptions, to make sure that their output goes to a subscription that is being replicated, or to do it manually, or as a last resort, you could use the backup and restore features on the cluster. So all these methods we talked about are for keeping things in sync, in an operational state. There’s still also backup and restore, which you should probably be doing anyway. If you’re running this on AWS, like we are with InfluxDB Cloud, we actually just use EBS Snapshots. And that’s just a lot easier because if the node goes down, I can just reattach my last snapshot. So that’s how we do backup and restore in InfluxDB Cloud.
Dave Patton 00:48:54.354 As a last pattern, we call this really the mothership pattern. When first we’re doing data center monitoring, instead of having kind of one cluster that is responsible for everything, I’m going to set up a different cluster in each of my data centers. And that cluster’s responsible for all the hosts in just that data center. All its queries, all its dashboards are very specifically geared towards the host-level monitoring in that data center. So if I had three data centers, I would set up three different clusters there, they’re going to first be identical to each other. Then we’re going to set up a higher-level cluster we call the mothership cluster. And its scope of concerns is really, and its dashboards and queries are concerned with, monitoring at the data center level. So each of those data center clusters could do its aggregations and rollups and send that to the mothership cluster as well as itself. But it’s also going to be sending its state to the mothership cluster. So that cluster will know, is a data center down, versus an individual host in a data center. It will also kind of give me a broader shallower view. We joke and we say you could put that dashboard in CEO mode that just has a lot of flashing lights and blinking things that your CEO thinks is, “Ooh. Uh. Look at that. It’s fantastic,” Versus having him look at an individual host and get a lot of detail, but it’s there if someone needs it. So that’s a very common pattern we’re seeing.
Dave Patton 00:50:29.221 So in terms of data, things to remember. Tags are indexed, so use tags if you need to ‘Group By’. When we talk about line protocol, that’s our native format. It is composed of a measurement name, a tag set, and a field set. Tags are metadata about our fields. So tags could be like my host name, or my data center name. Those are indexed, and this is what that index is built off of. Tags can only be strings, and anything that I need to Group By in a query has to be a tag. Conversely, fields are what it is I’m measuring. So it might be like memory usage, CPU usage, etc. Fields can be strings, ints, floats, or booleans. Fields are not indexed, and you can only apply functions to fields. So something like a MIN or some sort of derivative can only be done on a field. It can’t be done on a tag. And then there’s a timestamp, and the timestamp is down to nanosecond precision. So if you send in a line, you can send it with the timestamp, and the database will gladly use the timestamp you give it. If you send in a line that does not have a timestamp, the database will append the timestamp when it receives that data. So you can kind of see a scenario. Sometimes I want to know when something has happened at its source, and in that case, I want to have a timestamp. Sometimes I might want to know when the data arrives at the database. And in that case, you may not want to put a timestamp on it and let the database append it.
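As a rough illustration of the line protocol layout described here (measurement, tag set, field set, optional timestamp), the following Python sketch assembles a single line. The function `to_line` is a hypothetical helper, not an InfluxData client call, and its escaping is simplified.

```python
from typing import Dict, Optional, Union

FieldValue = Union[str, int, float, bool]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character of `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry a trailing "i"
    if isinstance(value, float):
        return repr(value)
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line(measurement: str, tags: Dict[str, str], fields: Dict[str, FieldValue],
            timestamp_ns: Optional[int] = None) -> str:
    """Build one line: measurement[,tag=value...] field=value[,...] [timestamp]."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key, val in sorted(tags.items()):  # tags written in alphabetical order
        head += "," + _escape(key, ",= ") + "=" + _escape(val, ",= ")
    body = ",".join(_escape(k, ",= ") + "=" + _format_field(v) for k, v in fields.items())
    line = head + " " + body
    if timestamp_ns is not None:  # omit it to let the database assign arrival time
        line += f" {timestamp_ns}"
    return line


if __name__ == "__main__":
    print(to_line("cpu", {"region": "us-west", "host": "server-5"},
                  {"usage_user": 0.64}, 1444234982000000000))
```

Passing `timestamp_ns=None` corresponds to the second scenario above, where the database appends the time at which the point arrives.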
Dave Patton 00:52:10.015 So some good schema design. Don’t encode data into the measurement name; this is important, especially when you’re coming from things like Graphite format. Graphite, TSDB, Datadog, they have a different data format that is ‘this.this.this.this’ equals some value. Conversely, with line protocol, I can contain multiple values on the same line, so it’s not hierarchical. Line protocol’s not at all hierarchical. That Graphite format is very hierarchical. Line protocol is a lot flatter. But in my mind, it gives me a lot more flexibility because on a single line, I can contain a lot of different values. So you see here, we have a measurement name that’s cpu.server-5.us-west. Don’t do that. It is much better to say CPU might be my measurement name, host is a tag, so the name of the host, and then region or data center is us-west. And then we have that tag, I’m sorry, the field. And in this case, the field key is value and the field value is 2. I think there’s a thing in here later, but try to use descriptive field names. To me, a field key that says value doesn’t mean anything. It doesn’t tell me anything. What is it I’m measuring? So a lot of plugins will send data kind of in that hierarchical format. So if it does, odds are there’s probably a Telegraf plugin that will break it up.
Dave Patton 00:53:46.333 If there is not, you’re going to have to do that yourself. It’s pretty easy, but again, try to break that hierarchical format into a flatter thing, and take advantage of the tags and then the fields. So here our field names are rx_packets, tx_bytes, rx_bytes. That is descriptive, that actually means something to me. So don’t overload tags. So inside the value again, so we said don’t encode things into the name. Same thing with the value. Don’t really try to encode things into the value. So if I had a value that is localhost.us-west, break it up into a host tag and a region tag. That’s just the better practice. In the end, it’s going to make it a lot easier to run a query. In the first example, there’s no way for me to group things by us-west or group things by region or data center. If I have a tag that is region or data center, I can absolutely group things that way. Don’t use the same name for a field and a tag. Hopefully, that should be just common sense. If you do, look at why am I doing this? Can I combine them? And should it be a field? Or should it be a tag? Not everything should be a tag. I want to put that out again. Don’t make everything a tag. It can be tempting to say, “Oh. You know, I want everything indexed.” Don’t do that. I would actually be ruthless about what is a tag, and what is not a tag. So don’t use too few tags [laughter], but don’t use too many tags. This I think, yeah.
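If no plugin does the splitting for you, breaking a dotted, Graphite-style name into a measurement plus tags can be sketched in a few lines of Python. The function `split_dotted_name` and the default tag keys `host` and `region` are assumptions chosen to match the example discussed here.

```python
from typing import Dict, Sequence, Tuple


def split_dotted_name(name: str,
                      tag_keys: Sequence[str] = ("host", "region")) -> Tuple[str, Dict[str, str]]:
    """Split 'cpu.server-5.us-west' into a measurement and a tag dictionary.

    The first segment becomes the measurement; the remaining segments are
    paired, in order, with the caller-supplied tag keys.
    """
    measurement, *parts = name.split(".")
    if len(parts) != len(tag_keys):
        raise ValueError(f"expected {len(tag_keys)} tag segments in {name!r}, got {len(parts)}")
    return measurement, dict(zip(tag_keys, parts))


if __name__ == "__main__":
    print(split_dotted_name("cpu.server-5.us-west"))
```

With `host` and `region` stored as separate tags, a query can group by either one, which the single dotted name does not allow.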
Chris Churilo 00:55:30.820 Hey, Dave. I’m just going to interrupt you really quickly. So just want to let attendees know that we are recording the session. And I think we’re going to just continue to go and go a little bit long. If you have to drop off, feel free. We’ll send you a note later on letting you know where the recording is.
Dave Patton 00:55:47.479 I think I’m almost—I think there’s only maybe one slide left after this. I’m not sure this is actually still an issue, but try to write your data with the correct precision. As I mention the timestamp can go down to nanosecond, which is beyond like—nanosecond is beyond human comprehension. I mean, can you really see a nanosecond? No. So if you’re writing in data, getting it down to the second level is fine. If your system is writing data, the general timestamp that comes out is fine, but just try to maintain a general consistency level, so.
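One way to keep precision consistent is to reduce timestamps on the client before writing. The sketch below assumes timestamps start out as nanosecond epoch integers; the name `truncate_timestamp` and the unit table are illustrative. The chosen unit would then need to be declared to the database when writing, for example through the `precision` parameter of the HTTP write endpoint.

```python
# Nanoseconds per unit, for the coarser precisions covered by this sketch.
_NS_PER_UNIT = {"ns": 1, "u": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def truncate_timestamp(ts_ns: int, precision: str = "s") -> int:
    """Convert a nanosecond epoch timestamp to `precision`, dropping the remainder."""
    if precision not in _NS_PER_UNIT:
        raise ValueError(f"unsupported precision: {precision!r}")
    return ts_ns // _NS_PER_UNIT[precision]


if __name__ == "__main__":
    print(truncate_timestamp(1444234982123456789, "s"))
```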
Dave Patton 00:56:30.774 We talked about this, not creating too many logical containers. So don’t create too many databases. We had one customer that had 1,500 databases at one point, and that was bad. Every database creates a lot of database overhead on the system, and with 1,500 databases, they didn’t have a lot of room left to really do much of anything. I would say this says dozens, hundreds are okay. I would probably lower that down. I would say, try to keep the number of databases under like 50. There’s really no reason to have more than 50 databases. If you really need to, you might want to look at setting up a different cluster.
Dave Patton 00:57:12.325 Oh, and last write wins. So we are not a relational system. There is no concept of updates in Influx. Just get it in your mind, and it’s going to help you in the long run. Just think of it as an append-only system. Just start thinking in those terms. It is possible to overwrite a point. It isn’t easy, but it is possible. So by last write wins, what that means is that if I were to send in the exact same line with the exact same measurement, the exact same tag set, the same field keys, and the exact same timestamp, and the only thing that changed was the field value, that will overwrite the point. If the field key or anything in the tags is different, it will write it as a new point in a new series. So if you had two things that were sending in identical lines, the last one that comes in is going to win. So you’re not going to get duplicate points. I think that’s the salient point here: you can’t duplicate data in the database. It’s virtually impossible. If it had a different timestamp, though, it would treat it as a different point at a different time. So keep that in mind.
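The last-write-wins behavior can be modeled with a toy in-memory store in Python. This is only a sketch of the rule as described above, keyed on measurement, tag set, field key, and timestamp; the class `ToyStore` is hypothetical and says nothing about how InfluxDB is actually implemented.

```python
from typing import Dict, Tuple

Key = Tuple[str, Tuple[Tuple[str, str], ...], str, int]


class ToyStore:
    """Toy model: one stored value per (measurement, tag set, field key, timestamp)."""

    def __init__(self) -> None:
        self._values: Dict[Key, float] = {}

    @staticmethod
    def _key(measurement: str, tags: Dict[str, str], field_key: str, ts_ns: int) -> Key:
        return (measurement, tuple(sorted(tags.items())), field_key, ts_ns)

    def write(self, measurement: str, tags: Dict[str, str], field_key: str,
              value: float, ts_ns: int) -> None:
        # A second write with an identical key replaces the earlier value.
        self._values[self._key(measurement, tags, field_key, ts_ns)] = value

    def read(self, measurement: str, tags: Dict[str, str], field_key: str, ts_ns: int) -> float:
        return self._values[self._key(measurement, tags, field_key, ts_ns)]

    def count(self) -> int:
        return len(self._values)


if __name__ == "__main__":
    store = ToyStore()
    store.write("cpu", {"host": "a"}, "usage_user", 10.0, 1000)
    store.write("cpu", {"host": "a"}, "usage_user", 12.5, 1000)  # same key: overwrites
    store.write("cpu", {"host": "a"}, "usage_user", 11.0, 2000)  # new timestamp: new point
    print(store.count(), store.read("cpu", {"host": "a"}, "usage_user", 1000))
```

Because identical keys collapse to a single entry, resending the same line is harmless, while a correction has to reuse the exact timestamp of the point it replaces.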
Dave Patton 00:58:29.604 And I guess, the stack. So you’d use Telegraf. It is awesome. It is very easy. It is a Swiss Army knife. It is fantastic. It is easy to work with. It’s pretty much bulletproof. I’ve really yet to meet somebody that Telegraf broke down on them. Telegraf, if you’re not familiar, is our agent. There are 140 different plugins for things to collect data, process data, and output data. It is fantastic, and I highly recommend you should use it.
Dave Patton 00:59:01.374 Chronograf and Kapacitor. So we talked about Kapacitor, but Chronograf is our UI. The thing I’m going to say about it is that I know probably 80% of our customers are using Grafana for visualization. So Chronograf is fantastic. I think a lot of people that use it really enjoy it, but it is also our management UI. So it’s not an either-or thing between Chronograf and Grafana. If you’re using Grafana for visualization, odds are you are still going to use Chronograf for management. So it is— and I think once you start using it for a lot of visualization, I think you’ll really enjoy it. So don’t just rule out Chronograf because it’s going to make management of your cluster a lot easier. And that’s it. That’s all I got.
Chris Churilo 00:59:46.198 Awesome. Okay. So let’s just go through the Q&A. Anonymous said: “Thanks. Good stuff. That’s awesome” Okay. Abhishek, do you want to maybe give a little bit more detail about your questions so you can get that answered? Okay. Good. He’s going to give us a little bit more details. So and if anyone else has any other questions, please feel free to throw them into the chat or the Q&A. As I mentioned earlier, I will be posting—after I do an edit of this, I will post it and will send an email so you have the link. And the link is easily found if you just go to our Resources page under Trainings, you’ll see all of our trainings. But we’ll get that posted for you guys by the end of today. So Abhishek, just go ahead and post a little more details. Or if you want me to unmute you, just let me know. That might be easier. So we’ll just wait for him to give us a little bit—okay. I’m going to unmute him. Hang on. Here we go. All right. You have permission to speak.
Abishek 01:00:59.983 Hey. Am I audible?
Dave Patton 01:01:02.414 Yep.
Abishek 01:01:03.359 Hey. Okay. First of all, yeah, thanks for the nice presentation. So basically what I’m doing currently with Influx is I’m sending some custom metrics like user login and all. Okay. So now what I want to measure is that write as we [inaudible] continues to deliver features. So I want to then measure [inaudible] IDs like new users are getting logged in or something. I know so over the period of time, I would say historically, I want you to know say for example, for the last let’s, say, one and a half year, let’s say, deliver these many features and what is the user response? It is like gone down or gone up. So my question is, so can I store this kind of custom metrics historically in Influx or I need to push to some other database from Influx because, yeah?
Dave Patton 01:02:03.731 Yeah. Well, I think you could do both. But anything that has a time-based component is a good candidate data set to be stored in Influx. So, and I want to keep saying that we’re not a relational system, but if it has that time-based component, absolutely. Put it in Influx. But it doesn’t preclude you from also putting that into something like Cassandra or SQL Server where you might want to do some relational analysis on that data. So they can work together. And it’s something we kind of pride ourselves on. One is our ease of use. But two, we want to be good citizens and play well in the sandbox with other things. So with that, we try to make it very easy to get data in and out of the system. So if you have a different kind of analysis, if you want to put it in Tableau, Tableau doesn’t connect to us. You have to put that into a relational system. So if you wanted to do that, by all means, put it in something else. It’s not an either-or proposition. It’s your data. But if it has that time-based component, absolutely put it into Influx.
Abishek 01:03:15.276 Yeah. Currently, I’m putting in Influx. And I’m using the retention period as like one year. Okay. I mean, of course, I don’t have yet the one-year data. But my cluster setup has one-year retention period. So I mean, other question is, after one year, what will happen to those data? Whether I can still read those data?
Dave Patton 01:03:36.786 Well, if that’s your retention policy, after a year, that data will, poof, go away.
Abishek 01:03:43.703 Yeah. I mean, so exactly. So that was kind of my concern. And then yeah, because I want to do some historical analysis also with data.
Dave Patton 01:03:54.481 Yeah. So you heard me talking about the backup and restore, but there is also an export and an import. So what you can do before that retention policy expires, excuse me, is do an export of that data. And the difference is, if you do an export, the resulting file will be raw line protocol. Like you could open it up in a text editor and look at it. If you do a backup, the resulting file is actually almost like a TSM file. It’s like binary. It’s not very human-readable. It’s smaller because it’s compressed, whereas the export file will be pretty big. But you could do an export, and then take that data, and then put it into Cassandra or whatever.
Abishek 01:04:37.043 Understand.
Dave Patton 01:04:38.056 Or store it in Glacier S3, do whatever you want to do with it.
Abishek 01:04:42.332 Understand. Thank you for the answer.
Dave Patton 01:04:45.870 Yeah, of course. All right. Chris?
Chris Churilo 01:04:51.968 All right. So if you have any other questions, I think we’re just down to a couple of people. I just want to remind everybody I will do an edit and then post this recording. But if you do have any other questions after today’s session, feel free to post them on our community site. And Dave goes there to take a look at the questions that are there, and will get those answered. Alternatively, if you send me an email, then I’ll post it there. And if I can answer it, I’ll answer it on our community site as well. Looks like we have another question. All right. Dylan, nice long one. Okay [laughter]. So Dylan asks: “One of your Kapacitor slides mentions, ‘Use batch where guaranteed processing is required,’” which resonates with the problem that he’s been seeing with streaming scripts posting to remote Influx databases. “If for some reason (network, etc.) Kapacitor can’t connect to the database, I miss the aggregation. Do batch queries work better under these conditions; i.e., is there a way to automatically handle this and backfill when the database is available?”
Dave Patton 01:05:52.716 No. Not inherently. So the difference between stream and batch is, one, how they operate, but also, what’s the use case. So with a stream, when I set up a stream in TICK script, what that does is it sets up a subscription on the database. And what that means is that any point that comes into the database, if it matches a subscription, will be teed off to Kapacitor for processing concurrently, before it even hits disk on the database. So on a streaming TICK script, you’ll notice you generally have it with a window node. And what that means is for that window node, let’s say it was 10 minutes, every point that comes in during those 10 minutes is going to be stored in memory on Kapacitor. So again, what that means is if Kapacitor goes down during that 10-minute window, all those points for the past however many minutes are gone also.
Dave Patton 01:06:51.698 With a batch TICK script, you’ll notice when you write a batch script, there’s a query. It will actually run that query against the database, get the points, the result sets for that query, and then apply its functions. Whereas with streaming, it will wait 10 minutes to get all the points it needs and then apply the functions. So you can do the same thing. It’s just how it receives the data and then what it’s going to do. So generally for aggregations and rollups, I would probably use a batch. For alerts, I would use a streaming TICK script. So a batch puts a little bit more load on the database. A streaming TICK script puts a little bit more load on the memory usage of Kapacitor. If I have a batch script and the database goes down, it should rerun that when you bring it up. I think there’s a timing on the TICK script even when it’s a batch. So if you had it every 10 minutes, and it was in the middle of doing it, and it went down, I think you would have to wait another 10 minutes for it to rerun the script. I don’t think it can keep state of its last run. So hopefully, that answers your question, Dylan.
Chris Churilo 01:08:17.651 So, Dylan, I also allowed you to talk if you want to—oh, he just left. So [inaudible] must have done. Okay [laughter].
Dave Patton 01:08:24.123 Either that or I just offended him.
Chris Churilo 01:08:25.708 Oh. No, no, no. What I will do is I will send him a note.
Dylan 01:08:27.676 Well, I’m—
Chris Churilo 01:08:28.431 Oh. There he is.
Dylan 01:08:28.966 I’m back.
Dave Patton 01:08:29.542 All right.
Dylan 01:08:30.999 Sorry. For some reason, I logged out or something. No, that was great. I was writing a follow-up question actually to that, which is that what I’m doing is I’m posting my data directly into Kapacitor, and then using Kapacitor to send data to various Influx databases depending on where it is. And I’m wondering if that’s not really a recommended pattern.
Dave Patton 01:08:55.672 If you’re processing the data, or if you have... I wouldn’t use Kapacitor in that capacity, no pun intended, to just do a pass-through. I think Telegraf would be much more efficient. If, let’s say, that data’s coming in and you had to do a lookup, like a GeoIP lookup or some sort of decoration or processing of that data before it hit the database, that is a really good case, and that is absolutely our pattern. But if it’s just a pass-through, I would probably look at using Telegraf for that.
Dylan 01:09:25.166 Okay. Great.
Dave Patton 01:09:26.714 Because in Telegraf, you can have multiple outputs. So you can still write to more than one InfluxDB cluster.
Dylan 01:09:34.223 Okay. Great. Yeah. I’ll probably give that a try.
Dave Patton 01:09:43.867 Chris.
Chris Churilo 01:09:46.208 Are there any other questions? We’ll just wait one more second here. Like I’ve mentioned, I’ll post this training. And if you have other questions, please post them to the community site. If you send me an email, I’ll post it there as well. And yeah, look for Dave at a number of the events. I think he’s going to two Devopsdays, in Seattle and Vancouver, coming up. And you can just check our website influxdata.com/events for a number of events that our developers and SEs are attending. So with that, thanks for joining, everybody.
Dave Patton 01:10:20.837 Yeah. Thank you. Thanks, guys.
Chris Churilo 01:10:22.709 And I hope you guys have an awesome time playing with InfluxData.
Dave Patton 01:10:26.680 Take care.
Chris Churilo 01:10:27.630 Bye.
Dylan 01:10:28.052 Thanks.