InfluxDB Roadmap: What's New and What's Coming
Session date: Jun 21, 2022 08:00am (Pacific Time)
The engineering teams have been busy working on a number of cool features that will make your InfluxDB apps better than ever. In this session, Balaji Palani, Senior Director of Product Management, will review the details behind some work that will boost query performance, a new way to act on data before it is stored, updates to Telegraf, support for Cloud-native ingest, an update on IOx and much much more! You don’t want to miss this! This is a 1-hour event featuring a technical demo and Q&A.
Watch the Webinar
Watch the webinar “InfluxDB Roadmap: What’s New and What’s Coming” by filling out the form and clicking on the Watch Webinar button on the right. This will open the recording.
[et_pb_toggle _builder_version="3.17.6" title="Transcript" title_font_size="26" border_width_all="0px" border_width_bottom="1px" module_class="transcript-toggle" closed_toggle_background_color="rgba(255,255,255,0)"]
Here is an unedited transcript of the webinar “InfluxDB Roadmap: What’s New and What’s Coming”. This is provided for those who prefer to read than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
- Caitlin Croft: Sr. Manager, Customer and Community Marketing, InfluxData
- Balaji Palani: Director, Product Management, InfluxData
Caitlin Croft 00:00:00.269 Hello, everyone, and welcome to today’s webinar. My name is Caitlin, and I’m joined today by Balaji, who will be providing an InfluxDB roadmap update. So please feel free to pose any questions you may have for Balaji in the Q&A or in the chat. We will be monitoring both. And just want to remind everyone to please be respectful and courteous to all attendees and speakers. We want to make sure this is a fun and happy place for the entire community. Without further ado, I will hand things off to Balaji.
Balaji Palani 00:00:37.917 All right, hi, everyone. Good morning. Hopefully, you guys are doing great. I’m coming to you here live from the Bay Area, California, and I’m here to talk about InfluxDB. If you’re not familiar with me, my name is Balaji. I head the product team at InfluxData. And who are we? We are the makers of a time series database platform, built especially for developers who like to build cool applications. We believe that, fundamentally, things are going to change with [inaudible] observed and monitored and so on. And InfluxDB is an [inaudible] fantastic choice of platform for saving and storing time-series data, acting on it, analyzing it, and so on. I have a lot of content for today. So in case I don’t get into it too deeply or you have questions later on, feel free to hit me up on my email, [email protected]. Or, as Caitlin mentioned, I’m active on the community Slack, so hit me up on Slack as well. Without further ado, let me launch into this. We really want to share what we recently shipped in InfluxDB. There’s a number of things we did. We’ve been shipping a lot of features and capabilities, adding to our platform, so I want to share that. And we can also take a look at the upcoming next few months, what’s coming as part of IOx, our next-gen storage engine. So stay tuned for that later on.
Balaji Palani 00:02:26.154 Okay. A little bit of brief history, because I think it’s good to set the context. I joined InfluxData around 2018. Since I joined, my main focus was the 2.0 launch, which we did: in May 2019, we announced the availability of InfluxDB 2.0. We started collecting our first production customers starting in September 2019. And what we did was launch the open source and cloud with a similar, single API. I say similar because some of the API may not be available in cloud. But mainly, the idea was you start developing on open source and then move to cloud if you want to run production workloads or you need high availability, durability, and so on. With Cloud, we launched a little bit differently. In 1.x, we had a single-tenant cloud. In 2.0, what we did was use Kubernetes as the backbone, putting all of our services on top, and [inaudible] launched a multi-tenant cloud, meaning a single cluster can host multiple workloads. We built it in a very secure manner: we are encrypting data both at rest and in transit. It’s highly durable. For example, for all incoming data, there’s an out-of-band service which actually backs up the line protocol data into S3. The intention is, even if something goes wrong, we can replay all your data and make it available. So data availability was high on our mind. It’s also highly available, both in terms of being replicated into multiple copies, but also in terms of people, because it’s monitored 24/7 by the respective engineering teams who built those services, in true DevOps fashion. So really, we launched with that.
Balaji Palani 00:06:16.089 And we introduced four vectors: the amount of data you ingest, how long you’re storing the data, how many queries you’re running, and how much data you’re egressing out. All of that made the subscription model, or the usage model, really work for us. And if you wanted to do some longer-term planning and you know what your usage looks like, you can talk to our sales team and convert to an annual contract spend as well. And we didn’t stop there. Since then, we have gone from a single cloud provider to three cloud providers: AWS, GCP, and Azure. We are now running in 14 regions, some of them single-tenant clusters hosted privately for individual customers, because they are large enough that they wanted a separate region. We also launched marketplace billing, in the sense that you can now go to a cloud provider marketplace, subscribe to our software from there, and really have InfluxDB usage as a charge on your bill. You don’t have to worry about budgeting; it’s all part of the same cloud provider bill. And we are now in all the cloud provider marketplaces, including - you can talk to us for private offers, which are nothing but the equivalent of an annual contract, but from the marketplaces.
Balaji Palani 00:07:37.862 We’re also doing daily deploys. We have a robust pipeline of features being delivered, and I’m going to chat a little bit about what those are. But that’s a brief history of where we are. So before we talk about the features, let’s talk about how we deploy these features into cloud. It all starts at the top: you have code and config being merged into main. And when that happens, there’s a series of things that happen. Our continuous integration, or CI, pipeline takes them and deploys them to the staging environments, one on each of the cloud providers. This is because, even though it’s all similar, there are some nuances: we use the AKS, GKE, or EKS Kubernetes services, one for each cloud provider, so you might find differences. We want to make sure that all of those are tested. Once it goes through staging, automated testing happens, and then it gets applied to Tools, which is our internal production cluster. We use Tools internally for observability - every engineering team pushes metrics into Tools. So we’re really, really pushing the boundaries of InfluxDB on Tools. We treat it as a production cluster: high cardinality use cases, new features behind feature flags, and so on. All of those are available on Tools. So we get to really roll things out, look at them, do some UAT testing, and then it gets pushed into the production process, which I talked about earlier - the 14 different regions across AWS, GCP, and Azure.
Balaji Palani 00:09:20.189 So anyway, that’s the CI pipeline. We are able to continuously deliver and deploy software every day, sometimes a couple of times a day. So how do we decide which features get into this? From that perspective, I wanted to talk about who and what we are focused on. We are focused on you. If you’re a developer and you’re building cool real-time applications requiring you to store time-series data, query them, analyze or transform them in real time, or alert and notify on them, looking for certain patterns - you are the core persona for us. And there are a number of things we are really focused on. Developer happiness: you can think about this as usability. We really have a world-class design team which works really, really closely with engineering, right from when we say, “Hey, we want to develop a particular feature or capability.” We are really trying to bring human-centered design and development. What does that mean?
Balaji Palani 00:10:23.535 Human-centered means we know we want to cater to mistakes. We want to recover from them. We want to provide further incentives in the tool that can really make things happen. It just makes it super easy to build on the platform. “Time to awesome.” This has been our mantra from the beginning, right? What does time to awesome mean? We really want to get you there fast enough. We know that we are a tool, and we don’t want to be getting in the way. We want you to get from where you are to where you want to go so that you can focus on really building your applications fast. Scale reliability and availability, this is super essential because once you start building a platform, you want to grow. So we want to grow with you. And as you grow from maybe 1 task to 20,000 tasks, we want to offer you things that make sense at scale. If tasks go down, we want to have some monitoring information that can retry task if they go down automatically without even you having to do them manually and so on. So all of these are feature and front and center for us. And I’m going to talk about how each of these things, what’s new gets into each of these pillars, right? So for example, in developer happiness, one of the first thing you want to talk about is Flux, right? So this would be an incomplete presentation without really talking why Flux, right?
Balaji Palani 00:11:49.889 So in 1.X, we had InfluxQL. And InfluxQL was quite simple, really easy to do SQL, so really fast to get started on. But there was some kind of hindrances if you want to do more than query language, right? So for example, your incoming data is coming from a variety of different sources. You want to transform them. You want to analyze them. For example, we built a continuous query language. And then, finally, we also built the text scripts back in 1.2. What that really told us was we are really going for beyond a query language, something which can script, something which can analyze and act, allow developers to really do that. And for example, these are all the things you could do, right? On the query side, we can not only query time-series data from within our platform but also from a variety of other platforms like SQL, right? So if you have metadata information from Postgres or BigQuery or some other areas, you can bring them on into InfluxDB using Flux. And then you can join them. You can filter, aggregate, also do join. All of these are functions. So you can think about, “Oh, yeah, I’m just calling a function. I’m passing parameters, and I’m looking for the output which can then pass into another one.” So really, construct of Flux is to make sure it’s very flexible and allows you to really do the thing you want to do as a developer. Not only that - and analyze. You can see it’s from simple math across measurements but also things like anomaly detection, more complex things like correlation, statistical functions, down-sampling. We have customers who are using some of these functions to do predictive analysis and predictive maintenance and so on.
Balaji Palani 00:13:40.570 You can also act. So it’s not only analyzing - you’re also taking those results and writing if-then conditional logic. You create checks. Checks are nothing but monitors: you look for certain patterns, and then you notify, you alert on them. You can notify different endpoints like PagerDuty, Slack, and others - SNS and things like that, sendmail, SendGrid. But you can also output it: you can push to a completely different HTTP sink, or even MQTT. You can feed it back to your MQTT sink or broker in case you want to set up some complex pipelines. So Flux can really do more than query, and that’s the power of Flux. It might seem like it has a bit of a learning curve if you want to use the full power, but you can pace yourself. And what you will find is it will really be worth it once you start learning the basics of Flux. Once you understand the concepts, using it is as simple as calling different functions, passing parameters, outputting, and so on. We really feel like Flux is a differentiator for us, and we continuously add new functions.
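As a rough sketch of the “act” side he describes, a Flux script can apply a threshold and push a notification to Slack. The webhook URL, bucket name, and threshold below are placeholder assumptions:

```flux
import "slack"

from(bucket: "telegraf")
    |> range(start: -5m)
    |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_user")
    |> mean()
    // If-then conditional logic: classify each row by severity
    |> map(fn: (r) => ({r with level: if r._value > 90.0 then "crit" else "ok"}))
    |> filter(fn: (r) => r.level == "crit")
    // Notify an endpoint - Slack here, but PagerDuty, SendGrid, etc. work similarly
    |> map(fn: (r) => ({r with _sent: slack.message(
            url: "https://hooks.slack.com/services/XXX/YYY/ZZZ",
            channel: "#alerts",
            text: "CPU usage is critical",
            color: "danger",
        )}))
```

In practice a check like this would usually be created through the UI or saved as a task rather than run ad hoc.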
Balaji Palani 00:15:06.509 What else? From a developer happiness perspective, we know that developers love to live in their IDE. VS Code is one of the most popular IDEs. We added a VS Code Flux extension that not only allows you to connect to cloud and browse through your buckets, your tasks, your scripts - all of the objects that are available in cloud - but also has a language server. If you’re running into errors, it does simple syntax checks and will show you where the error is. Once you fix it, it’ll also give you some basic table visualization. You can really develop using Flux, store it as a script, and then, once you log into the InfluxDB UI, you’ll see all of those changes in real time. The idea is that you don’t have to go to a different tool to start developing your applications - you live and code in the IDE of your choice. Among the other things we launched recently, we introduced templates. What’s a template? Think about all of the objects within InfluxDB: dashboards, tasks, buckets, labels. Everything can be exported as a JSON or YAML document. That’s what a template is. You can take all of your configurations from one environment and push them to a different environment, a different account, a different organization. You can also use templates to import things into open source.
Balaji Palani 00:16:48.188 We have a community template repo where you can download templates that have been released from community. For example, we use templates to deliver stuff for monitoring, usage tracking. We added usage tracking template. Or monitoring your tasks, we had operational monitoring templates. So you can really easily install that template, get those resources within your application, and just use them. We also have secrets. So secrets are useful because, I mean, Influx is just one tool, right? So you might have to connect and do, for example, AWS [inaudible] or PagerDuty or some other things to import data from. So you might be having API key secrets. So you can deploy or you can create a secret and use them within Flux or within your UI, within notebooks, within different places. So you’re developing in a very secure manner. In case you’re looking for help, we recently added a help button on the left side which allows not only for you to go check out a documentation or InfluxDBU courses but also has links to status page to see if there is an incident or an outage impacting your region. And for paid customers, we also offer the ability to contact support right from within the app. You can create different things. And I believe it doesn’t allow attachments yet, but that’s something that we are working on to enable in the future as well. So you can attach logs and so on. But this is super cool because, really, you don’t have to go to another support site, especially if you’re a paid customer. You can just simply submit a ticket, and Support will get back to you.
Balaji Palani 00:20:24.982 Other than decoupling - now, all of a sudden, I can have a couple of engineers work on Flux and create these as API endpoints, and that enables the rest of the company to really start using those APIs. It really allows you to separate out the Flux, get the output of that, and give all of your developers access to that data - democratization of data. You can do it in a very secure manner. Instead of asking all your developers to develop against the v2 query endpoint, you can block it, or not provide API keys for it except to a specific set of developers. To the rest, you can give specific API endpoints with API keys. And you can also rapidly connect to other things like NodeJS - or sorry, Node-RED - or other external tools for interoperability. So this has a number of benefits. We’re also deepening this - it’s currently only available in the API. But what we’re also doing, as you will see coming up, is making sure scripts can fire off your tasks. What that will allow you to do is reusability: your tasks or alerting could all run off a single set of scripts, which is common, and all you’re doing is passing parameters. A very powerful feature but very under-advertised, in my opinion. And this is something that we are really, really promoting to all of our developers: start using invokable scripts.
Balaji Palani 00:22:08.919 This is an example of an invokable script. You can see that the bottom portion is the Flux query, and the bucket is nothing but a parameter. You’re defining mybucket as a parameter that can take in anything. So you can basically send in any bucket name - “telegraf” or anything - when you’re calling or invoking that API. So this is a very simple way to, as I mentioned previously, decouple your back-end logic and just pass parameters and get the output. The output would still be CSV, a kind of table format, but there are a lot of things available out there that can convert that into JSON or whatever format you actually want to use within your application. So a really, really powerful example of how you can use invokable scripts.
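Calling an invokable script from outside is a plain HTTP request against the Cloud API; the region URL, script ID, and parameter name below are illustrative placeholders:

```shell
# Invoke a stored script, passing "mybucket" as a runtime parameter
curl -s \
  "https://us-east-1-1.aws.cloud2.influxdata.com/api/v2/scripts/SCRIPT_ID/invoke" \
  -H "Authorization: Token $INFLUX_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"params": {"mybucket": "telegraf"}}'
```

The response comes back as annotated CSV, which client code can then reshape into JSON or whatever the application needs.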
Balaji Palani 00:23:05.501 Other things: we know that Telegraf is a very popular download, an open source agent. For those of you who are not familiar, Telegraf is an agent that can be installed in specific locations. It has a really, really small memory footprint, so you can put it at the edge, collect the data, and send it into Influx. It has 250 to 300 plugins for input and output. One of the output plugins is InfluxDB, so you can send data to InfluxDB open source or cloud or enterprise and so on. It’s a very popular tool that everybody uses. So with this, what we did was add the ability to create, store, and manage your Telegraf config within cloud. You can go to your cloud instance, click on Telegraf, search for any of the plugins that are available, select one, configure it right within the UI, save it, and then use that configuration on all of your Telegraf instances. Now, I’m not saying this wasn’t possible before - you may already be using Ansible or other automation tools to deploy Telegraf. This is not a replacement but more of a convenience, in case you want to store everything in a central location and use the Telegraf config from there.
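A minimal Telegraf configuration of the sort you could assemble in the cloud UI might look like this; the URL, org, and bucket names are placeholders:

```toml
# Collect CPU and memory metrics from the local host
[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

# Ship everything to InfluxDB Cloud (OSS and Enterprise use the same plugin)
[[outputs.influxdb_v2]]
  urls = ["https://us-east-1-1.aws.cloud2.influxdata.com"]
  token = "$INFLUX_TOKEN"
  organization = "my-org"
  bucket = "telegraf"
```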
Balaji Palani 00:24:30.905 We also added Starlark. Starlark is a Python-like language. It allows a little bit of pre-processing before you send data into Influx. So suppose you’re looking at a certain field and you want to add a tag or another field. Starlark allows you to express that in a Python-like format, and Telegraf will process it before it actually sends the data to cloud or open source. Very popular as well - something to check out. This is the third bucket: scale, reliability, and availability. So let’s see what we have. Apart from Flux - which is definitely the query and scripting language supported in the UI, the API, and everything - we also added, for 1.x compatibility, the InfluxQL API endpoints. Basically, as a compatibility effort - it’s not available in the UI, but if you want to bring forward your 1.x dashboards or things like that, you can do that. You can still run InfluxQL as long as you go through that 1.x compatibility endpoint. And what this shows you is that, since then, there have been a number of performance improvements we’ve made on both the Flux and InfluxQL side. We added group caching for InfluxQL. We added, for example, parallelization for Flux - we didn’t used to parallelize, but now we do. We added parallelization for aggregate window functions. So a lot of improvements have happened over the past few months in improving query performance.
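A small sketch of the Starlark processor he describes - deriving a new tag from a field value before Telegraf writes the metric out. The field name and threshold are invented for illustration:

```toml
[[processors.starlark]]
  source = '''
def apply(metric):
    # Inspect a field and derive a new tag before the metric is sent on
    if "usage_user" in metric.fields and metric.fields["usage_user"] > 90:
        metric.tags["load"] = "high"
    return metric
'''
```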
Balaji Palani 00:26:22.835 I mean - this is not new. But I wanted to mention that, other than Telegraf, you can also use open source at the Edge because Edge is becoming smarter and you have things like autonomous cars or boats, marina, things like that, where you’re really having a large footprint. You can install open source. You can do a lot of the processing at the edge. But with what we recently released and launched, you can also send - or you can replicate data from your Edge into cloud. We are super excited about this feature. And the reason is it starts unlocking a couple of new use cases, right? For example, open source at Edge has been something that’s not new, right? So you can use that. You use to really process the high-granularity data at the edge and so on. My apologies, I have something going on in the background. But what you can also do is replicate either the complete data or, if not that, down-sample data into cloud, maybe for long-term storage. Maybe you want to analyze different edges differently. Maybe you want to build a predictive model on cloud. So really, really exciting set of use cases that gets unlocked because of this.
Balaji Palani 00:27:41.957 So what is this? How do you set it up? EDR or Edge replication is nothing but a durable view that is set up at a bucket level. So what happens is, on the open source side, you can actually define remote configurations and say, “Hey, this is my cloud host token and so on.” Once you do that, you can set up every - at a bucket level, you can say, “Hey, I want to set up a replication stream for specific buckets and going into the remote endpoint or remote configuration going to a specific bucket.” You can do one is to one. So you can send one bucket in Edge into one bucket in cloud. And by the way, you can also do Edge-to-Edge, or you can do multiple edges into a single-cloud bucket. Or you can have a single bucket on the Edge to go to multiple cloud. And perhaps you have a primary and a secondary cloud account where you want to set up a DR situation. You want to do it, right? You can do that, too.
Balaji Palani 00:28:47.138 What’s the benefit of the queue approach, right? So let’s say something happens and your connection breaks down. The queue starts. And basically, all of the incoming data will get deposited into the queue. And once the connection is reestablished, this flushes into cloud. So really, really powerful use cases, especially in terms of when you have network connectivity issues. For example, I know we work with several satellite launching companies. So satellites, when they go on the Earth, there’s only few kind of time intervals where the connector is established, where they can flush all the queue into cloud. But otherwise, it has a local open source which uses for transforming and analyzing telemetry on that satellite itself and then taking actions based on that, right? So you can still do that. But you can also flush into account, and this Edge Data Replication really makes that possible. This is a very recently released feature, and we are really excited about working with you guys. If you have other use cases or if you want to enhance how this works, happy to really connect and work on this.
Balaji Palani 00:30:03.339 Last but not least, we have also been consistently updating Enterprise 1.x. For example, we added 2.0 API compatibility. What this ensures is that you can start developing with the 2.0 client libraries, and when it’s time to move to cloud, you can do that as well. This also helps if you’re not ready for Enterprise 2 yet - once it’s ready, you can easily migrate to 2 without having to change your front-end logic. We have added the Flux library to enterprise, and we are doing regular updates to it to make sure that what we build for Flux on cloud is being introduced to enterprise as well. We have made improvements to backup and restore. For example, we recently introduced a tool that can estimate the size of a backup. You can back up only metadata; you can do incremental or full backups and so on. We added migration tooling to help migrate workloads into cloud. This migration tooling is currently support-assisted. It’s automated; it’s built on top of Kubernetes. You provide a set of configurations, you take the TSM data and upload it to an S3 location, and it just does it - it keeps on migrating in the back-end. Once it’s completed, it lets us know, and then support can let you know. We are also working to enable self-serve migration; that will be available once we iron out the issues. Right now, it is a support-assisted thing.
Balaji Palani 00:31:42.603 We also added a number of performance and security updates, things like configurable password hashing, improved performance by making compact full write [inaudible] applied both TSM and TSI. So a lot of enterprise 1.X enhancement improvements still happening. But the main enterprise 2 is coming, and I’ll talk about it when I’m talking about IOx. So let’s transition to what’s coming. And again, I have split into these three, time-to-awesome, developer happiness, and IOx. And let’s see what’s coming under trying-to-awesome. So with Data Explorer that’s currently out there, we’ve had a number of positive feedback from users. It really helps. Especially if you’re new to InfluxDB, you don’t have to learn Flux. You use the Data Explorer and that works. But there are a number of limitations, though. For example, as a user or developer, you’re not able to see the schema or the shape of the data in the bucket before you start. Or there [inaudible] example, the data set is large or high cardinality, it will run in the background, but you wouldn’t understand, “Hey, what’s it doing?” And then, at the end of it, you’ll see a time on. So really, we focus on query experience, making it really easy to get started quickly. For example, we are adding a new schema browser. You can select the bucket. You can select any buckets, and quickly, the schema will come back. And you can look at all the tags. You can look at all the fields. And with a single click, you can start to see the Flux. You don’t have to create the Flux. It’ll create it for you, especially for the range, for filters, and so on. And you can see that Flux code right in front of the eyes.
Balaji Palani 00:33:40.332 You can see the results panel, and those are interactable. So you can add group by. You can add specific things. So we’re really taking the data explorer that we built, and we’re taking it to the next level. That’s all the [inaudible] about. This is imminent. We’re at least going to have an MEP launch pretty soon, and you will see that this will get exposed to all of the cloud. And eventually, it will find its way into open source and enterprise, too, as well. Cloud-native ingest. So Telegraf is great, right? But it requires installing an agent somewhere in the middle, right? So we’ve talked to a number of users who said, “Hey, we are serverless. We don’t have any server footprint. What do we do?” So really, this is to help aid those kind of things where you can’t install Telegraf or you don’t want to because it’s a cloud service. And then you can use our cloud data ingest. We are focused on first enabling for MQTT, but we built this architecture in such a way. We are using Apache NiFi which is awesome in terms of scale and performance. And really, right now, we are focusing on - if you have data into an MQTT broker, you can connect into it and then ingest that data from that, right? So this is also pretty imminent. Hopefully, by the end of this month or early next, we will release this, and you should be able to connect to your broker, configure it. We allow you to look at a specific topic. We allow you to transform that. Before you ingest into line protocol, you can do some decent amount of regex and so on and then ingest into our cloud. Super excited about it. Once MQTT is done, we can rapidly enable for Kafka cloud for other sorts of things because we know that there’s data setting on all of these cloud services that we will just love for you to ingest into cloud.
Balaji Palani 00:35:42.056 Multi-org is another exciting thing. The concept here is that you have an account, but for whatever reason - sometimes you’re building your platform for your customers, right? So you might want to have a bucket, a set of tasks, a set of alerts for every customer that you onboard. There are several use cases like that. Multi-org really enables all of that. You have a single account, you have multiple organizations that you create, and these can be located in specific regions. You don’t have to have everything in a single region. You can have, for example, one organization in US East 1 and another in US West 2 on AWS, maybe another one in Sydney, and so on. So what this allows you to do is have a single account, have different orgs, and manage those orgs. You can still see usage per org at the account level, and you can have a single bill for all of your orgs’ usage. Super excited about it. We have been working on this: we enabled multi-account first, and now this is next. And what’s coming after? User roles are also coming, because we saw that, once we enable organizations, you can add users to them. Right now, every user is an owner and admin with full privileges, so this is really designed for high-trust teams. But we are working to enable other roles: things like account admin, billing admin, or maybe a simple viewer who has no permissions except to view data - can’t edit anything, just gets access to dashboards in a view-only format, or something like that. This is also coming pretty soon. We are nearly there with multi-org, and once that is done, we’ll start rolling out the out-of-the-box roles one by one.
And what that allows us to do is then transition into more customized, full role-based access control at a very granular level - this bucket, that resource, that alert or task - where you create a custom role and add users to it.
Balaji Palani 00:37:58.098 So again, all of these are coming. I know this has been the most asked-for feature. It’s something we are working on, and we want to make it the best. One more thing about developer happiness - this is not coming next, but later; we have worked out the design. The concept here is that today we have tasks that fire on a schedule. But sometimes you need something trigger-based: “Hey, as soon as the data comes in, before it gets into storage, I want to fire off a notification to really save time.” That’s a streaming use case, and it’s something that will be done with triggers. Also, sometimes you have data arriving late - days or weeks or months afterwards - and you might need to go back and reprocess all of those down-sampling tasks. You can do that today, but you have to set up batch processing outside, which is a little bit of a pain. With triggers, on incoming data you can say, “Hey, watch for any data that’s older than X, and if it comes, then run this script,” which reruns all of the tasks, or runs a particular task, or something like that. So you can put the exact late-arriving-data logic you want into a script and have that trigger fire it. This is something we believe we want to focus on once we enable IOx and so on - not next, but a little bit later.
Balaji Palani 00:39:29.451 This is super important, and that's why it has a separate section. This goes into scale. We're really, really working on the next-gen storage engine you've heard about: IOx. There were two or three reasons we wanted to build IOx. One, we know that cardinality is a problem at larger scale. It's not a problem at small scale - if you want more cardinality there, just talk to us and we'll give you whatever cardinality you need. But fundamentally, unlimited cardinality - things like enabling distributed tracing, where every transaction, every trace has an ID you want to track - you can't really do that without blowing up your cardinality. IOx is built to handle that. The way it's done is we have taken a lot of high-performance, high-scale building blocks: Apache Parquet, which is a really great columnar format for storing on disk - and you can store it on S3 and so on. We use Apache Arrow for storing in memory; again, a columnar store, but in memory. So that's the storage view. We also have DataFusion, which is the query engine, and it has built-in SQL. We are top contributors to all of these open source projects. In addition, we're adding Flux, because there are some specific advantages to Flux we want to bring in, and InfluxQL as well. All of this will be enabled in cloud first. We are launching IOx in cloud so that it becomes the default storage engine for everybody.
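The Parquet/Arrow point above is about columnar layout. Here is a toy Python illustration (not IOx code, and the measurements are made up) of the difference: the same three points stored row-wise versus column-wise, where a columnar engine only has to scan the one array a query touches.

```python
# The same three points, first as rows (one dict per point)...
rows = [
    {"time": 1, "host": "a", "usage": 0.61},
    {"time": 2, "host": "a", "usage": 0.64},
    {"time": 3, "host": "b", "usage": 0.12},
]

# ...and as columns: one contiguous array per column, which is roughly how
# Arrow lays data out in memory and Parquet persists it on disk.
columns = {
    "time":  [r["time"] for r in rows],
    "host":  [r["host"] for r in rows],
    "usage": [r["usage"] for r in rows],
}

# mean(usage) reads a single array instead of walking every row dict;
# at scale this is what makes columnar scans and compression so effective.
mean_usage = sum(columns["usage"]) / len(columns["usage"])
```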
Balaji Palani 00:41:17.367 Once that is done - that's kind of the focus - it opens up the window for us to take what we built for cloud and really start creating Enterprise 2, which is built on IOx and has all of this scale and functionality. So this opens up the window towards, "Hey, how do we get Enterprise 2 launched?" - which is the most common ask from a lot of customers as well. I talked about supporting SQL; it supports Flux. But we're also working so that Flux can take advantage of IOx. For example, you can call IOx specifically: Flux actually provides data unpivoted, but IOx provides it in pivoted format. There are specific advantages to both, and we really want to expose them - so not only do you get unpivoted data, but you can take advantage of the pivoted data as well. Another thing IOx has is a catalog. We want to leverage that catalog for the query experience you saw earlier and make schema browsing really, really fast - like microseconds; you will see everything. Currently, we have to read and extract all the data and then show it to you, but with IOx it's going to be really quick. It already supports InfluxQL as well. Again, we are landing this in the cloud first, and all users will be able to take advantage of it once that is done. I think this will also open up the window towards Enterprise.
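The pivoted-versus-unpivoted distinction above can be shown with a tiny Python sketch (illustrative only - the field names are made up, and this is not library code). Unpivoted data has one row per field value; pivoting folds it into one row per timestamp with fields as columns.

```python
# Unpivoted (Flux-style): one row per (time, field, value) triple.
unpivoted = [
    {"time": 1, "field": "usage", "value": 0.64},
    {"time": 1, "field": "temp",  "value": 51.0},
    {"time": 2, "field": "usage", "value": 0.70},
    {"time": 2, "field": "temp",  "value": 52.5},
]

def pivot(records):
    """Fold field/value pairs into one row per timestamp, fields as columns."""
    out = {}
    for r in records:
        out.setdefault(r["time"], {"time": r["time"]})[r["field"]] = r["value"]
    return [out[t] for t in sorted(out)]

pivoted = pivot(unpivoted)
```

Pivoted rows are convenient for SQL-style queries and dashboards; unpivoted streams are convenient for per-field transformations, which is why exposing both is useful.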
Balaji Palani 00:42:57.508 Bulk import and export. Since we are using Apache Parquet as the file format, we can now move data between different instances, making it hopefully super trivial to move between open source instances, Enterprise, or cloud. Offline backups will be easier, so this will also allow us to do self-serve backups: "Hey, back up my bucket." And it allows us to really get to tiered storage, so you can actually say, "Anything beyond this age, store it in S3; between this and this, store it locally, where it will be in memory and in S3," and so on. So there's a number of things this will enable for us as well. Again, the goal is to unlock more time-series use cases - higher-cardinality data, things like events, tracing, logs. And we really think that, in the shorter timeframe, we'll launch in cloud, and once that is available, the rest just opens up from there. So that's about it - that was my last slide on IOx. As next steps, I really want you guys to get started. This sounds exciting, and if you want to learn more, there's a lot of self-service content, documentation, and university courses available. Or you can just sign up for Cloud: go to influxdata.com and click on InfluxDB. You can also go through the marketplaces. It's been a pleasure. Hopefully this was useful. I'll open up for questions.
Caitlin Croft 00:44:39.399 Awesome. Thank you, Balaji. Wow. Our team has been really busy. [laughter] So I know you talked briefly about Edge data replication. Just wanted to let everyone know that we have a webinar on just Edge data replication next Tuesday, and I've dropped the registration link into the Zoom chat. So please register for it - it's a free webinar, and it'll be great. Sam Dillard, our product manager for Edge, will be presenting. All right, a couple of questions here. Does Flux provide built-in functions for interpolation, or can I write my own UDF for multiple kinds of interpolation?
Balaji Palani 00:45:23.350 Great question. It's something that's not listed here, but what we are actually working on is Flux packages, or modules. What this allows you to do is create your own user-defined function. At first it won't be public, so you can create that module and share it within your org or within your account. But eventually making it public - let's say you build something you think is useful and want to push out for a lot of users to take advantage of - that will also be possible in the future. Right now, we're working on the first MVP of modules, which allows you to create UDFs. But yeah, you can write your own thing even today; it's just harder to share so that what you built can be leveraged by other users within your company.
Caitlin Croft 00:46:16.420 Cool. Do you envision InfluxDB serving as the database of choice for IIoT applications with high data flow rates, such as hundreds of points per millisecond?
Balaji Palani 00:46:29.595 I did not get the question there. Sorry.
Caitlin Croft 00:46:32.282 So I think they're just asking, "Do you see InfluxDB being the best database of choice for collecting industrial IoT data where you're dealing with hundreds of metrics per second, or even per millisecond?"
Balaji Palani 00:46:48.451 Yes - the short answer is yes. The detailed answer is that, right now, it is pretty scalable. We have specific users on cloud running at millions of data points per second. And if you want to push the boundaries even beyond that, we believe IOx should help us get there. We have a number of customers even today who are using it to monitor industrial IoT. We have public references from PTC, who are an OEM partner of ours; they leverage InfluxDB underneath. So a number of industrial IoT customers are really pushing InfluxDB to the limits. So yes.
Caitlin Croft 00:47:41.885 Awesome. Is it possible for Flux queries to merge data from the cloud and Edge buckets?
Balaji Palani 00:47:50.738 Technically, yes, it is possible. Flux has things like the to() function, which can take a remote endpoint, and an http.post() function, which can post to a remote endpoint. So there are a number of options. What the optimal or most efficient way is will depend on the use case - I'm happy to help if you want to send it my way on the community Slack, or just email me. But yes, the short answer is you should be able to take data from Edge and send it to cloud using replication, or you can use tasks to query the data and then combine and join it together.
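Conceptually, joining an edge bucket with a cloud bucket in a task boils down to matching rows on timestamp. Here is a plain-Python sketch of that merge (the bucket contents and helper name are made up; in practice Flux's join would do this for you):

```python
# Made-up samples from an "edge" bucket and a "cloud" bucket.
edge  = [{"time": 1, "edge_val": 10},  {"time": 2, "edge_val": 12}]
cloud = [{"time": 1, "cloud_val": 100}, {"time": 2, "cloud_val": 95}]

def join_on_time(left, right):
    """Inner join two record lists on their 'time' key."""
    right_by_time = {r["time"]: r for r in right}
    merged = []
    for l in left:
        r = right_by_time.get(l["time"])
        if r is not None:  # keep only timestamps present on both sides
            merged.append({**l, **r})
    return merged

joined = join_on_time(edge, cloud)
```

Whether you do this join in the cloud (after replicating edge data up) or in a task that queries both endpoints is the efficiency question Balaji mentions.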
Caitlin Croft 00:48:38.609 Right. Okay, someone is asking, "Is there more documented information on the recent Edge data replication enhancement?" So there was a press release, and obviously there's the webinar next week. I'm sure you guys are working on more documentation.
Balaji Palani 00:49:06.864 Yes. Edge Data Replication was launched with Open Source 2.2, and I believe 2.3 is in the works. So if you go to docs.influxdata.com, go to Open Source and search for replication, there's a ton of content available. And like Caitlin mentioned, we'll have Sam deliver more content for you guys.
Caitlin Croft 00:49:25.629 Perfect. We'll stay on the line here for another minute or two, just in case anyone has any further questions. Balaji, that was great. I think it was great that you talked about how, if you're using InfluxDB 1.x, to upgrade and move all your data to InfluxDB 2.0, because there's just so much functionality there.
Balaji Palani 00:49:49.533 Oh, yeah, there's so much functionality. And the easiest way to do it - I really encourage everybody to sign up for cloud. Super easy; it won't even take a minute. Try it out. And once you're ready to move - or even if you want to stay on open source - open source has everything that can help you get there.
Caitlin Croft 00:50:14.394 In your experience working with customers who have gone through the upgrade process, what are some tips and tricks that you wish that more community members knew?
Balaji Palani 00:50:25.849 One of the fundamental things is the upgrade itself. You're going from a completely different kind of 1.x setup, where you may have TICKscripts and Kapacitor, to cloud, where you have Flux tasks and so on. It really depends on how you're using 1.x. If you don't have all of those things, it's super straightforward: just migrate the data. As I mentioned, support will help you migrate. If it's not a very big database, you can bring all of your data through the front end as line protocol and push it into cloud. But if you have terabytes of data and you're afraid to migrate them, talk to us and open a support ticket - we'll help you migrate. That is all done in an automated fashion; it doesn't go through the front end, it goes through Kafka directly. The main thing is, if you're using TICKscripts, Kapacitor, continuous queries, those kinds of things, you have to have a plan for transitioning them to Flux. You can still use Kapacitor with 2.x: you might have to dual-write data into both cloud and Kapacitor, and then Kapacitor's TICKscripts continue to work just like they do right now.
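For the "push your data through the front end as line protocol" path, here is a minimal Python sketch of building a line protocol string (the measurement, tag, and field values are made up; a production migration should also escape commas, spaces, and quotes per the line protocol reference):

```python
def to_line(measurement, tags, fields, ts_ns):
    """Build one InfluxDB line protocol line:
    measurement,tag=val field=val timestamp(ns)."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))

    def fmt(v):
        # Strings are quoted, integers get an 'i' suffix, floats stay bare.
        if isinstance(v, str):
            return f'"{v}"'
        if isinstance(v, int):
            return f"{v}i"
        return repr(v)

    field_str = ",".join(f"{k}={fmt(v)}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line("cpu", {"host": "server01"}, {"usage": 0.64}, 1655769600000000000)
```

Lines like this can then be POSTed in batches to the write API, which is what the client libraries do under the hood.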
Caitlin Croft 00:51:59.981 Right. Do you see InfluxDB’s clustering capability eventually being offered in the open source version?
Balaji Palani 00:52:09.158 Actually, this is a great question for Paul. I think the next biggest thing for us is IOx, and we do have open source IOx also. So I don’t know about clustering specifically, and I don’t want to promise or provide any messaging here.
Caitlin Croft 00:52:30.458 Totally fair. Yeah, I thought that would be an interesting question to ask. [laughter] For people who are curious about clustering, definitely keep an eye out as we move forward with InfluxDB IOx - I think that's the safest answer we can give right now. Cool. Well, thank you, everyone, for joining today's webinar. I hope to see everyone at next week's webinar with Sam; I posted it in the chat again. Once again, the session has been recorded and will be made available later today, along with Balaji's slides. Thank you, everyone.
Balaji Palani 00:53:10.555 Thanks, everyone.
VP, Product Marketing, InfluxData
Balaji Palani is InfluxData’s Vice President of Product Marketing, overseeing the company’s product messaging and technical marketing. Balaji has an extensive background in monitoring, observability and developer technologies and bringing them to market. He previously served as InfluxData’s Senior Director of Product Management, and before that, held Product Management and Engineering positions at BMC, HP, and Mercury.