How ntop Built a Web-Based Traffic Analysis & Flow Collection with InfluxDB
Webinar Date: 2018-05-29 08:00:00 (Pacific Time)
ntopng is the next-generation version of ntop, a network traffic probe that monitors network usage. It provides an intuitive, encrypted web user interface for the exploration of real-time and historical traffic information. In this webinar, Luca Deri, Founder of ntop, will share how ntopng works. He will share how they use InfluxDB to collect real-time network traffic monitoring metrics and events to support their ntopng platform. He will share how they made the switch from RRD to InfluxDB, what their data architecture looks like, as well as share some challenges and successes using InfluxDB at ntop.
Here is an unedited transcript of the webinar “How ntop Built Their High-Speed Web-Based Traffic Analysis and Flow Collection with InfluxDB”. This is provided for those who prefer to read than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
• Chris Churilo: Director Product Marketing, InfluxData
• Luca Deri: Founder, ntop
Chris Churilo 00:00:01.728 All right. It’s three minutes after the hour, so we will get started here. Good morning, everybody. Good afternoon. My name is Chris Churilo, and I work at InfluxData, and today I’m really excited to have Luca Deri from ntop, who’s going to be sharing with us his SaaS solution, some of the capabilities of the product, as well as how they built it using InfluxDB as their Time Series Database. Just a couple of housekeeping items. Just a reminder. I am recording this session, and after I do a quick edit, I will post it back on the website so you can take another listen to it. You can use the same link that you used for the registration. Alternatively, an automated email will be sent to you tomorrow with the link, so you’ll be able to get that tomorrow as well. And if you have any questions, feel free to post them in the Q and A or the chat panel in your Zoom app. And with that, we’ll get started and hand it over to Luca. Thanks Luca.
Luca Deri 00:00:58.606 Hi. Hi. Good morning. Thank you, Chris. So today I’m going to talk about ntopng, what ntopng is about, and what our experience with Influx is. Let’s first start with an overview of ntopng, and then jump into the time series problem. So in the first part, I want to explain to you what is the idea, what is the problem that we are trying to tackle with ntopng, what is the history, what are the ideas that we have put into the application, and how we moved to Influx. I have been in this business for a very long time, the business of network traffic monitoring. In 1998, when I was back in Italy as an assistant at university, I was asked to monitor network traffic. And at that time, it was not really possible with a PC. Networks were very different from today’s networks and there were many different types of protocols; Ethernet was not so important at that time. No WiFi. Nothing. And then, in 1998, I started to build ntop, the original ntop. And I coded it as open source software. And today ntopng, the successor of ntop, is an open source application. And since then, we have created not just ntopng but also many other tools that have improved traffic monitoring tools such as ntopng.
Luca Deri 00:02:34.715 So for instance, we have worked in the past on high-speed packet processing and on packet inspection. That means that we analyze the packet content to understand what a certain communication flow is about. So let’s say if we’re going to Facebook, or if we are connecting to another website, or if we are doing home banking. And then, as we are open source developers, we are also contributing our code to other projects and products. So for instance, IDS and IPS such as Bro and Suricata are the types of applications that we have accelerated in the past. And our tools, being open source, are also embedded in various products. These are just a couple of examples. So for instance the Cisco 100 was the first product by Cisco that integrated our tools, and our software is also embedded in more hardware devices, like much other open source software. So what are the goals? In essence, we produce open source software. But in addition to that, we also have some proprietary applications that allow us to live, because open source is nice but sometimes it doesn’t pay our bills. But in any case, we still believe that it is important to spread the software. So even though we have some commercial software or tools, they are free for education and research, so this is very important.
Luca Deri 00:03:55.661 Our goal is to increase visibility, to give people the ability to analyze network traffic in a simple way. So therefore you should look at us as a source of data. Okay. So we have a visualization platform, but we need time series. You will hear about that. But in essence, our tools are a probe, a sensor that can produce data. And especially today, the world has changed quite a bit since 1998 when I started. At that time, it was important to speak a certain language, let’s say SNMP, NetFlow, these types of things. Today, it doesn’t really matter where the data is coming from. What is important is that you can manipulate it. And in this respect, the format today is more or less JSON, okay? And once you have translated the information from a specific format to JSON, then you can manipulate it easily, and this is an example. In essence, we need to move from various types of binary-based formats to a string-based format, and based on that, you can do whatever you want. And ntopng tries to complement existing information with information that is coming from network traffic.
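As a minimal sketch of the move from a binary format to a string-based one, the Python snippet below decodes a fixed-size binary record and emits JSON. The record layout, the field names, and the `record_to_json` helper are assumptions made up for illustration; they are not the NetFlow wire format and not ntopng's actual JSON schema.

```python
import json
import socket
import struct

# Hypothetical fixed-size binary record (NOT a real NetFlow layout):
# source IPv4, destination IPv4, source port, destination port,
# protocol number, byte count. "!" means network byte order, no padding.
RECORD = struct.Struct("!4s4sHHBI")

def record_to_json(raw: bytes) -> str:
    """Decode one binary record and return it as a JSON string."""
    src, dst, sport, dport, proto, nbytes = RECORD.unpack(raw)
    return json.dumps({
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "src_port": sport,
        "dst_port": dport,
        "protocol": proto,
        "bytes": nbytes,
    })

# Build a sample record and convert it.
sample = RECORD.pack(socket.inet_aton("192.168.1.10"),
                     socket.inet_aton("10.0.0.1"), 51000, 443, 6, 1500)
print(record_to_json(sample))
```

Once the record is a JSON string, any downstream tool can parse it without knowing the original binary layout.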
Luca Deri 00:05:09.475 Our approach is to be able to do everything using, in essence, more [inaudible]. So you go to a shop, you buy a PC, okay, a generic PC, you put that software on it, and it works, more or less like Influx. So we don’t want to use, let’s say, specific hardware because we don’t want to be vendor locked. We don’t want to do anything proprietary because our goal is to make sure that the same software can be applied on a huge cluster of machines. This is very important for us. The performance will be different but the idea, the software, would be more or less the same. And also, this allows us to be open. Because for us, open source does not just mean that we disclose the code to you. This is one part of the story. The main part is that we give you data in a format that you can reuse, okay. So for us, an open source application that generates data in a proprietary format is not our goal. Our goal is to produce data and to integrate this data with other types of data coming from other applications. This is the main goal that we are trying to show. And it is important to do that on commodity hardware, so that you can pick the software and run it on your machine. And we want to stay at the software level, we don’t want to do any hardware. So we are a software company, in essence.
Luca Deri 00:06:29.701 And the motivation for doing traffic monitoring is very simple. So if you want to improve, if you want to produce a better product or deliver a better product, in essence, you need to monitor yourself and to make sure that you are meeting the expectations of your customers, of your users. So therefore, like any other market, we are trying to monitor ourselves, we are trying to monitor the network traffic, and we’re trying to extract from the traffic the information people are interested in, in essence to report about what is happening. And what is happening on the network is very important. First of all, because we need to make sure that we have control of the activities on our network. Because many times people are, let’s say, operating the network in a sub-optimal way, because they are happy if the network kind of works, but they don’t pay too much attention to problems that are probably not big problems but would be worth analyzing and tackling. So therefore, it is important for us not just to understand if the network is running in a healthy way, but also to use the tools to improve and to understand whether everything is in good shape or whether there is something that can be improved. So this is very important. And in order to do that, we need correlation. So we need to be open. And that’s why I stress the point of open data, it’s actually as important as open source.
Luca Deri 00:08:02.519 The first thing that every network administrator knows is that packets never lie. So in essence, packets contain information that is the ground truth. But unfortunately, they are not what we want. First of all, because there are too many packets on the network. We cannot store all of them. And also because we need to interpret the packets. So we need to correlate them in a way that people can understand. Because our users don’t understand low-level information, they just want to know if the network is fast enough, or whether they can deliver this presentation in a good way. So in essence, we have to translate networking information like packet loss, [inaudible], and so on into something meaningful to people. And in particular, for security, we need to understand if something happening on our network is not expected. And in case that happens, to understand where it is happening and how to fix it. So this is the new problem. So packets are good but they are too many, so we need to compress them. When I say compress them, I don’t mean that we have to squeeze them so that they take less space. We have to represent this information in a different format. And usually the typical format is something called a flow. That is a five tuple. So in essence, we have the source IP, destination IP, source port, destination port, and protocol. In addition to that, we have the [inaudible] and other attributes. But these five tuple elements are usually enough to understand the sort of communication that is flowing in our network. So therefore, the first step is to move from packets to flows, and to create and compute metrics on these flows.
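The step from packets to flows can be sketched as grouping packets by the five tuple and keeping counters per group. This is an illustrative Python sketch, not ntopng's C++ engine: the `FlowKey` type, the `aggregate` function, and the packet dictionaries are hypothetical, and each direction of a conversation is treated as its own flow for simplicity.

```python
from collections import defaultdict
from typing import NamedTuple

class FlowKey(NamedTuple):
    """The five tuple that identifies a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int

def aggregate(packets):
    """Group packets by five tuple and keep per-flow packet and byte counters."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = FlowKey(p["src_ip"], p["dst_ip"],
                      p["src_port"], p["dst_port"], p["protocol"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p["length"]
    return dict(flows)

# Three packets: two belong to the same TCP flow, one is a DNS query over UDP.
packets = [
    {"src_ip": "10.0.0.2", "dst_ip": "1.2.3.4", "src_port": 50000,
     "dst_port": 443, "protocol": 6, "length": 1500},
    {"src_ip": "10.0.0.2", "dst_ip": "1.2.3.4", "src_port": 50000,
     "dst_port": 443, "protocol": 6, "length": 400},
    {"src_ip": "10.0.0.2", "dst_ip": "8.8.8.8", "src_port": 53000,
     "dst_port": 53, "protocol": 17, "length": 80},
]
for key, counters in aggregate(packets).items():
    print(key, counters)
```

Three packets collapse into two flow records, which is the kind of compression described above: the representation changes, not the encoding.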
Luca Deri 00:09:52.339 Before I go into the details of how ntopng works, I would like to tell you that, if you want, you can start from source, it’s on GitHub. And this is the address. We support many distributions, so out of the box, the software compiles on various Linux distributions, macOS and so on. And if you want, you can also have a binary package so that you can install the package with the [inaudible]. Please remember that ntopng is part of various distributions, first of all, Ubuntu. So if you want to install it from the distribution, you simply have to do [inaudible] and that’s it. Don’t pay attention to company. Okay. Let’s go back to 1998. When I started, the world, as I told you, was very different. So this is an example of the web GUI. It was not very, very fast. It was old HTML with some shades of gray. As you can see, no JavaScript, nothing. So it was time after 10 years to start again. It was time to rethink the application because the application was focusing on information that was probably not too relevant. And also the code was polluted by many things like [inaudible] protocol or IPX or AppleTalk. Many protocols that were no longer existing, and instead it was not really focusing on the needs of year 2000 people. So therefore I decided to start over, but I had some legacy code. And if you see, there is a red square here, that means RRD. So in the original ntop architecture, time series information was stored in the RRD database. And this is where, in essence, we stored our [inaudible], and from which we moved to Influx. As you can see, the web interface was very limited. There was a sort of mobile GUI that was for all the WAP-based telephones. And as you can see, this is what was happening in the year 2000. And based on this architecture, we started the redesign. We designed the new ntopng. ntopng was more or less taking the lessons learned on ntop and trying to fix many, many things. So to give you an idea, this is the user interface of ntopng.
Luca Deri 00:12:13.869 ntopng, in essence, is a software application that you can start on your PC. So, in essence, it has to be where the traffic flows. It analyzes the traffic and it has a web server. To you, it looks self-contained. So you don’t have to install Nginx, or Apache, or anything like that. So everything happens inside ntopng. ntopng shows you the traffic of your network and displays the traffic according to various criteria, flows, hosts, [inaudible] interfaces and so on. Again, the user interface is more or less like this. So in essence, it is a live application that analyzes the traffic, has almost no configuration, and keeps the information in memory. And you can click anywhere in the web interface and jump from one host to a [inaudible] representation to a time series and so on. Of course, we want to do more than simply display data on the screen, we also want to interpret it. So just to give you an example. When we have to interpret data, we have to tell you not only that this is HTTP, okay, but we also have to tell you more about it. So for instance, if it is good or not. So we have traffic that is in plain text that we can analyze. HTTP is a good example. We can do that, but when we have to deal with encrypted traffic like SSL, the problem is a little bit more complicated. I will tell you what we have done. But in essence we are interpreting the traffic. We are saying this is good, this is bad. So even people without much experience can have a simple pie chart that says, okay, you’re in good shape. Okay. So the problem are not here.
Luca Deri 00:14:02.307 Of course sometimes something goes wrong, so you have alerts. And we have two types of alerts in ntopng. So first of all, we have alerts with [inaudible]. So in essence, when you have a threshold and you cross the threshold, you go from good to bad. And as long as you are inside this state, okay, ntopng will remind you that there’s a problem. And as soon as you go back to normality, it will say, okay, you are back to normal. And when we do that, we don’t simply tell you there is a problem, but we try to put this problem in a context to tell you why the problem has originated, because there is, let’s say, this communication, and what are the [inaudible]. So this source has to send this [inaudible] to this other source. So this is very important. And of course we also can be passive, like ntopng [inaudible]. Active discovery is one of the things that we have introduced in the latest version, because we have realized that people in modern networks put many devices on the desk. Sometimes they don’t even know that there’s a device. Okay. And it is important to understand what people have in their own network. And sometimes if you put the tool, so ntopng, in the wrong location, where you don’t have full visibility of the traffic, you can see only a part of the picture. So we believe that it is important to see the whole picture. So therefore, ntopng complements the traditional passive discovery, where whenever we see traffic flowing in our network we learn from the traffic itself (let’s say, from the user agent it can tell you that this is an Android device or a tablet device that’s issuing a certain request), with active discovery.
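The threshold alert described above behaves like a small state machine: it is raised when a value crosses the threshold, repeated while the value stays above it, and cleared when the value returns to normal. The sketch below is one way to model that in Python. The `ThresholdAlert` class and the event names are invented for illustration and do not reflect ntopng's internal alert API.

```python
class ThresholdAlert:
    """Stateful alert: engaged on crossing, ongoing while above, released after."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.engaged = False

    def update(self, value):
        """Return 'engaged', 'ongoing', 'released' or None for a new sample."""
        above = value > self.threshold
        if above and not self.engaged:
            self.engaged = True
            return "engaged"      # went from good to bad
        if above:
            return "ongoing"      # still bad: keep reminding
        if self.engaged:
            self.engaged = False
            return "released"     # back to normal
        return None               # was good, still good

# Hypothetical samples (for example Mbit/s on an interface), threshold 100.
alert = ThresholdAlert(threshold=100)
events = [alert.update(v) for v in [10, 120, 130, 50, 40]]
print(events)  # [None, 'engaged', 'ongoing', 'released', None]
```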
Luca Deri 00:15:55.793 Active discovery means we are probing the network. Of course this is an optional activity. It’s not compulsory to enable it. But it allows you to have an overview of the devices that are running on your network. We are trying to bind an operating system and a device type to each device, so that you know exactly how many tablets, how many modern televisions, how many laptops you have on your network, and so on. And this is the important thing, because ntopng is implementing that, so therefore you can create nice reports about, “What are my tablets doing?” or, “Are my printers just printing or also connecting to certain websites they should not talk to?” these types of things. And of course you can drill down. Drill down means having the ability to interpret the traffic in a more low-level fashion. So in essence, to move from a high-level overview of a certain device’s traffic to a low-level analysis of the flows. So for instance, in this example, you can see that we have a machine that is a Linux machine, okay, that’s set [inaudible] address and so on. Many people don’t read this information. Many people don’t pay attention to this type of data. But it is important to know that ntopng is a tool for various types of people. So sometimes you want to go down to the [inaudible] level, sometimes you’re going to stay up.
Luca Deri 00:17:24.734 And we are also trying to understand and to mimic the users, trying to figure out whether the users are happy or not. Just to give you an example, whenever you connect to a website that is not secure, your browser reports to you that there is a problem with the [inaudible] certificate. We are also trying to do the same from the ntopng point of view. So we want to give you a warning that there is a problem. And [inaudible] data, this means that your certificate has expired or your website is not properly configured, or that [inaudible] are going to websites that are probably not that safe. So we produce this information and we don’t want to do everything in ntopng. ntopng, like I said in the beginning, is a probe. So in essence, its goal is to produce data and somebody has to interpret it. We do a light-weight interpretation, like I said, we have alerts. But this is not the only point where that can be done. So in essence, ntopng can be used to export data to other people, to other applications, and do that externally.
Luca Deri 00:18:26.341 Another example here at the bottom of the slide is ICMP. So if you see too many destination port unreachable messages, it means that the machines are trying to connect to a peer that is not up. So the service that you’re trying to connect to is not available. So this might be a problem, so this is down, or it might be a misconfiguration. We moved from this server to another server, and some of us are still using the old one. So in essence, ntopng gives you an idea of the problem. A network administrator has to decide whether this is a problem or not. We do want to say, yes, we support this and that. So in essence, we try to support everything. Of course, I’ve not included Influx here because I want to discuss it later on. But like I said, ntopng is a probe, it’s a source of data. So we have to be integrated with more or less all the modern stuff. We’re going from Slack, to Grafana, to Vagrant, to Wireshark and so on. And we do that because, like I said in the beginning, it is not just important to be open source. It is important to be open in data, so it’s important to export data to other applications, to make our tool available to other applications, and to make sure that altogether, we do a better job.
Luca Deri 00:19:43.704 Like I said, the design goal is a clean separation between the monitoring engine and the reporting facilities. This allows us to keep the engine very simple so that the user interface can be coded by anyone, anywhere, using the scripting language we use. We use HTML 5. We try to have, right now, let’s say a pretty user interface. And we also try to categorize information through packet inspection. And we like people to modify the user interface by adding new scripts to ntopng. So you simply create a [inaudible], you put it in the ntopng Lua scripts folder, and then you have modified the GUI and adapted it to your needs. So to give you a quick overview. Ntop is divided into three parts. The lower part is the packet capture. So it’s the activity of reading traffic from a network adapter. We have an engine based on packet inspection that is coded in C++, and that is decoding information and sorting it according to flow, host, interface, and so on. And we have a user interface [inaudible] Lua. Like I said, the user interface is not compulsory. So people don’t have to use a web GUI to access the monitoring data. Because we have many users who are using ntopng just to produce data and to deliver it to other applications. So this is very important.
Luca Deri 00:21:09.432 And we have an API, and this API internally is in C++, but to the user it is Lua. And in essence, we have three big broad concepts. One is the network device, so we can list multiple network interfaces. The second is the flow, that is the five tuple like I told you before. So when we receive a packet, we classify it according to its flow. So protocol, source IP, destination IP, source port, and destination port. And the third is the host. A host can also be seen at two levels. At the IP level or at the [inaudible] address level, so you can also the information [inaudible] physical device, or the logical device, that’s the IP. So this is the monitoring engine. And on top of this, we have the Lua user interface. Everything is kept in memory. But because memory can be exhausted, memory is not infinite, as you know, at some point we have to purge data. So in essence, when hosts are idle or when a flow is over, we remove this information from the user interface and from memory. So in essence, we have in and out, call something in and out to disk where we [inaudible] for the state of a host, because the host is the longest [inaudible]. And when we see new traffic for the host, we can resume from the old state.
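The purge step can be sketched as a periodic sweep over the in-memory table that removes entries not seen for a while and hands them back, so that the caller can save the state of a host before dropping it. This is an illustrative Python sketch only. The `purge_idle` function, the `last_seen` field, and the 60-second limit are assumptions, not ntopng's actual data structures or defaults.

```python
def purge_idle(entries, now, max_idle=60.0):
    """Remove entries idle for more than max_idle seconds and return them.

    `entries` maps a key (host or flow) to a dict with a 'last_seen'
    timestamp in seconds. The returned dict holds what was purged, so the
    caller can persist it and resume later if new traffic shows up.
    """
    idle = {k: v for k, v in entries.items() if now - v["last_seen"] > max_idle}
    for key in idle:
        del entries[key]
    return idle

# Two hosts in memory; only the first has been silent for too long.
hosts = {
    "10.0.0.5": {"last_seen": 100.0, "bytes": 12_000},
    "10.0.0.9": {"last_seen": 290.0, "bytes": 3_400},
}
purged = purge_idle(hosts, now=300.0)
print(sorted(purged))  # ['10.0.0.5']
print(sorted(hosts))   # ['10.0.0.9']
```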
Luca Deri 00:22:36.208 The packet journey is very simple. So we capture a packet using PF_RING, PF_RING is our infrastructure for capturing packets, or we can use libpcap, which is part of every Unix, Windows, and macOS distribution. We decode the packet. If this is not an IP packet, then we just account it, so if it is an [inaudible] resolution protocol [inaudible] plus one. Instead, if it’s an IPv4/IPv6 packet, we map it to a flow, okay, and we apply packet inspection. I will go to that in a second. So in essence, we try to figure out specific information about this flow. So for instance, from SSL, we extract information about the encryption keys so that we know if we’re connecting to Facebook or we’re connecting to, let’s say, Spotify, which is very important. And we classify this according to our application protocol so that people know more or less what is happening on the network. And then we move to the next packet.
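The first decision in that packet journey, IP versus non-IP, can be sketched by reading the EtherType field of an Ethernet frame. This Python sketch is a simplification under stated assumptions: it handles only untagged Ethernet II frames (no VLAN tags), `handle_frame` is a hypothetical helper, and the flow mapping and packet inspection that would follow for IP traffic are only marked by a comment.

```python
import struct
from collections import Counter

# Standard EtherType values.
ETH_IPV4, ETH_IPV6, ETH_ARP = 0x0800, 0x86DD, 0x0806

def handle_frame(frame: bytes, stats: Counter) -> str:
    """Classify one untagged Ethernet frame by EtherType and update counters."""
    if len(frame) < 14:                      # shorter than an Ethernet header
        stats["malformed"] += 1
        return "malformed"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype in (ETH_IPV4, ETH_IPV6):
        stats["ip"] += 1                     # next: map to a flow, then inspect
        return "ip"
    if ethertype == ETH_ARP:
        stats["arp"] += 1                    # non-IP: just account for it
        return "arp"
    stats["other"] += 1
    return "other"

stats = Counter()
macs = b"\x00" * 12                          # dummy destination + source MAC
print(handle_frame(macs + b"\x08\x06" + b"\x00" * 28, stats))  # arp
print(handle_frame(macs + b"\x08\x00" + b"\x00" * 20, stats))  # ip
print(dict(stats))
```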
Luca Deri 00:23:38.017 DPI is very important today because saying this is port 80 does not mean anything today. So even if I say HTTP, it doesn’t mean much. So people want to understand what type of services we are using. And with the [inaudible] of the cloud, in essence, many communications are very complicated because we are going for [inaudible] to Amazon for storage, we are going there if we’re accessing content and for storing information because we have a machine on AWS. So it’s very important to categorize a certain flow of communication and to understand the nature of the information that [inaudible]. So therefore DPI is very important. The problem of DPI is that the toolkits available in the market are not what we want. First of all, they are proprietary. So many times we have to sign an NDA just to talk about the toolkit, and they cost a lot of money. So this is not compatible with the idea of an open source application. Okay. I mean I’m not just talking about price, I’m talking about the fact that you have to integrate a component that you cannot inspect into an application that is open source. And the problem is that this component has to be open because it is manipulating our data, our packets. So I want to understand what is [inaudible]. This is very important.
Luca Deri 00:25:00.503 It’s not [inaudible]. So therefore we said, okay, the market does not offer it, so let’s create our own toolkit and make it open source as well. So this was the reason why we have created nDPI. That is another library, DPI-based, that we have on our website, that allows you to classify the traffic into over 240 protocols. So we have all the latest protocols. When I say protocol, I don’t mean a protocol that is described in [inaudible]. When I say an application protocol, I mean something that is meaningful for people. So for example, WhatsApp, okay, Facebook, Webex. So this is what nDPI is about. So nDPI inside ntopng allows us to create layer 7 statistics, to forget about MAC addresses and IP addresses. Because in essence, they are not very meaningful today. I want to know if I’m connected to Webex or if I’m downloading a file from Google Drive. So this is the important thing. And we make the statistics persistent on disk.
Luca Deri 00:26:07.038 Sometimes this is not what we want, or this is not enough. Because ntopng has no access to packets, because in many enterprise networks packets are not available. This means that there are routers that are producing this information, or there are switches that are producing flow information. So in this case, ntopng allows you to collect this information from switches and routers. And we have another component, nProbe, that is doing the conversion from the flow format to [inaudible]. And this information is delivered over a messaging queue called ZMQ to ntopng. So in essence, ntopng can do at the same time packet collection and flow collection. And, of course, we can correlate this information seamlessly. In addition to that, we have to provide information that is coming from devices. SNMP is a very popular protocol that allows us to do that. So in particular, ntopng allows us to query devices to read the state of network interfaces, to see the traffic in and out, to understand if there is a problem. I mean, it might be that the port is misconfigured or a one gigabit port is running at low speed. Something like that. And ntopng allows you to correlate that. So it means that we can go from a very high-level view of the traffic down to the network port. So I can tell you exactly, this host is creating some trouble. This host is connected to this port of this switch. So you can go down and troubleshoot your problem. And this is very important because in the end we have to give a complete report of the state of our network.
Luca Deri 00:27:53.226 Of course there are some protocols like sFlow or NetFlow that can also do an intersection of SNMP and packet information. And we support this as well, so therefore you can drill down from the host to the physical port, to the switch, simply by clicking on the web interface, and, of course, have historical data. Of course we try to support big data, because one of the things that we have analyzed is that in this market, the main problem is that we’re constantly producing data. So in essence, we produce numbers whether there is traffic or not. And of course we produce more numbers, more metrics, whenever we have more traffic, but in essence, we produce traffic and monitoring metrics all the time. So it means that at some point, we are exhausting our resources. So we need to rely on a big data system. So this means that we have to send data to a database that might not be local, that might be elastic and scale up as necessary. And in this case, we support at the moment Elasticsearch and Logstash, or we can deliver this information in ZMQ format so that there are people that are reading from these types of feeds, and take this information, let’s say, and put it on a Kafka message bus, or deliver it to very big data applications. So like I said, ntopng is a sort of gateway that can convert packets or flows, sometimes in a proprietary format, into a JSON-based format that other people can reuse and can send to other applications.
Luca Deri 00:29:41.415 So this was my overview of ntopng. Now, I want to go deep into the time series problem because this is the main reason why we’re here today. Okay. Now, let’s say that first of all, monitoring traffic is kind of special because we have, like [inaudible], various factors, for instance, an IP address, MAC address, a protocol. So the cardinality of this information is not certain because it might be that at some point, we have very little traffic. I’m talking with my printer, and sometimes I open Skype or I’m doing a webinar so I talk with hundreds of people. So it means that the number of metrics that we have to monitor can change over time. And because monitoring information can be a lot, there are various techniques that allow devices, in particular network devices, to save CPU cycles or to save memory. So one example is packet sampling. So in essence, instead of looking at every single packet, we analyze one packet out of N. So this allows us to reduce the load on the device a lot, but then we don’t have data that is 100% accurate because we see only a part of the picture.
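A small Python sketch of 1-in-N packet sampling shows both the saving and the loss of accuracy: only every N-th packet is inspected, and the totals are estimated by multiplying by N. The `sample_one_in_n` function and the packet sizes are illustrative assumptions. Real devices typically sample randomly rather than at fixed positions.

```python
def sample_one_in_n(packet_lengths, n):
    """Keep every n-th packet and scale the observed totals back up by n."""
    sampled = packet_lengths[::n]            # deterministic 1-in-n sampling
    return {
        "sampled_packets": len(sampled),
        "estimated_packets": len(sampled) * n,
        "estimated_bytes": sum(sampled) * n,
    }

# Uniform traffic: the estimate matches the real total of 100,000 bytes.
uniform = [100] * 1000
print(sample_one_in_n(uniform, 10))

# Skewed traffic: one large packet followed by nine small ones.
skewed = [1500] + [100] * 9
print(sum(skewed))                           # real total: 2400 bytes
print(sample_one_in_n(skewed, 10))           # estimate: 15000 bytes
```

With uniform traffic the estimate is exact, while the skewed case is off by a wide margin, which is the part of the picture that sampling does not see.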
Luca Deri 00:31:11.480 Another problem is counter polling. So in essence, when we have an SNMP device, just to make an example, we need to go to the device and ask, let’s say, for the number of bytes observed on port X. But in order to do that, we have to go and do this polling. So it means that we cannot go too fast because the device is not designed for that. The device is a switch, let’s say. So first of all, its job is to switch packets. And secondly, we cannot go too fast because we have many ports, so [inaudible] is pretty common today. And in many offices, we have many of them, so the number of metrics we have to poll is not small. So we have to do a lot every, let’s say, X seconds. So our wish to monitor very often is not really possible, okay, because we put too much pressure on the device, and of course on the polling application, that’s ntopng in our case. Another problem we see is that when I look at, let’s say, host monitoring, when I look at the latest trend in time series, I see that people like to go down and to have a really small granularity. So let’s say to see a counter every second, okay, and this is [inaudible] every three seconds. But because of what I’ve just told you, you see that this is not really possible for SNMP.
Luca Deri 00:32:43.294 And another problem is that if you use other enterprise protocols like NetFlow, this is even worse. Because sometimes the default is set in minutes, so it means that the number that we see is an average over one minute. That’s it. So therefore, even if we [inaudible] fast disk, if we poll a certain counter very often, the counter won’t change, because it changes in steps. So the fact that we have averaged data is sometimes part of the problem. Just to give you an idea of the timeouts, this is taken from a Cisco document. Cisco is the company that has created NetFlow. And as you can see, the inactivity timeout for a flow is 15 seconds, so it’s not too bad. But for long flows, we see 30 minutes. So it means that if I’m doing a webinar like this that lasts, let’s say, one hour, I receive two updates. One after half an hour, and the other one at the end of the hour. So it means that I don’t understand, I don’t know what is happening in the first half an hour or in the second half an hour. I just know the totals, and this is not very nice. So in essence we have two driving forces. One is to have very fine granularity, and the other is that [inaudible] taking this data from routers is a problem.
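The effect of the 30-minute timeout on a long flow can be worked out with a few lines of Python. This is a simplified model, not a NetFlow implementation: `export_times` is a hypothetical helper, it assumes one update per active timeout plus a final update when the flow ends, and it ignores the 15-second inactivity timeout that would slightly delay the last record.

```python
def export_times(duration_s, active_timeout_s=1800):
    """Return the times (seconds from flow start) at which updates arrive.

    One update is emitted each time the active timeout expires while the
    flow is still running, plus a final update when the flow ends.
    """
    times = list(range(active_timeout_s, duration_s, active_timeout_s))
    times.append(duration_s)
    return times

# A one-hour webinar with the 30-minute timeout mentioned above.
print(export_times(3600))                       # [1800, 3600]

# The same flow with a hypothetical 60-second timeout gives 60 updates.
print(len(export_times(3600, active_timeout_s=60)))
```

With the default, a one-hour flow yields only two data points, so anything that happens inside each half hour is invisible in the time series.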
Luca Deri 00:34:01.831 And then we have the problem of real-time measurements. Like I’ve said, ntopng is designed mostly for real-time, so it means that whenever we collect packets, so whenever we read packets, ntopng allows you to see this information immediately, in real-time, on a web interface. But whenever we go and we have this information from a remote ntopng, we cannot do that often. Okay. This is because we don’t want to [inaudible] interface too often or we don’t want to poll data too often, let’s say, through a web socket. Because we don’t want to put too much pressure on ntopng in case there are many users that are reading the same data at the same time. Because also at the same time, we have to dump this information to disk and to a time series, and the problem becomes a bit complicated if the cardinality of [inaudible] increases. So this is what I’m going to talk about now. So let’s make an example through SNMP. So like I said, a standard switch with 24 or 48 ports is pretty common today. Okay. So per interface, we have at least five counters. One counter is packets in and out. Another is bytes in and out. And another one is errors. So we have five counters per port. And if, let’s say, we have at least 24 ports, even though in a network there are big switches. Let’s say, with 500-plus ports. Or, if we have many switches, so at least we’d have 100 ports. So 100 ports times 5 means 500 counters, okay? So this is the very bare minimum. And then we have the problem that sometimes we want to know more. Let’s say, I have to poll the [inaudible] because I want to know more about the topology of the switch. The number of [inaudible] which are using a certain port and so on.
Luca Deri 00:35:57.841 So we have, let’s say, thousands of ports to monitor on our [inaudible], so you can imagine on a big enterprise. And unless the enterprise is using a protocol like sFlow that is delivering those counters to ntopng, it is not really possible to implement, let’s say, one-minute polling or [inaudible]. But we have to stay at least at five minutes, because we don’t have enough time, okay, to poll the data. Because if something went wrong, let’s say, if a switch is unresponsive, we need to retry. Okay. And if we poll the switch too often, the switch will think that either we are polling too often, so it does not reply because it looks like an attack. Or the switch will report us the same counter that we have polled, let’s say, 30 seconds ago. So this is a big problem. So people want to have granularity, but SNMP is not really about that.
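To make the "same counter as 30 seconds ago" effect concrete, here is a minimal Python sketch of turning polled counters into rates. The helper name, the sample readings, and the 60-second device refresh period are all assumptions chosen for illustration; they are not taken from ntopng or from any particular switch.

```python
def counter_rate(prev, curr, interval_s, counter_bits=64):
    """Per-second rate between two readings of an increasing counter,
    allowing for a single wrap-around of a counter_bits-wide counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr - prev
    if delta < 0:  # the counter wrapped between the two polls
        delta += 2 ** counter_bits
    return delta / interval_s


# Hypothetical byte counter polled every 30 s on a device that only
# refreshes its counters every 60 s: every other poll repeats the value.
readings = [1000, 1000, 4000, 4000, 7000]
rates = [counter_rate(a, b, 30) for a, b in zip(readings, readings[1:])]
print(rates)  # [0.0, 100.0, 0.0, 100.0]
```

Polling faster than the device updates does not add information: the computed rate alternates between zero and a spike, while the real average in this made-up case is 50 bytes per second.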
Luca Deri 00:37:00.176 And also we have the problem of the traffic that we are monitoring, the Internet traffic. So in ntopng, we are keeping counters only on local hosts. So if you’re monitoring your [inaudible] and you go to, let’s say, YouTube, because you want to watch a video, you probably don’t want to keep, okay, a counter of the remote YouTube server that is delivering your video. But you keep a counter of how many bytes you, I mean a local host, used to watch a YouTube movie. So even in a small enterprise today, we have at least 20 hosts. If you go and think about your home, okay, you have many connected devices, so it is very easy: your computer, or laptop, or a fridge, many new devices today are connected. So to have 20 hosts is very easy. And nDPI supports about 240 protocols, like I said. From our experience, we have seen that usually a host uses, on average, about 20 or 30 protocols. So let’s say we have to do 30 times 20 hosts, and each host also has packets in, packets out, you have to create a traffic matrix. So let’s say we have many, many metrics, even for a small network. So let’s do some math. So suppose that we have at least four counters, okay, in and out packets, in and out bytes, times 30. Okay. So it makes 120. So if I have another 20 counters for [inaudible], for, let’s say, packets out of order, for anything, there is a number. So I don’t want to talk about a website, this becomes a bigger problem. But in general, we have about 140 counters [inaudible]. So if you do times 20, if you want to save every minute, so you see you have many counters. And if you have other counters for autonomous systems, so what autonomous system you have connected to? What type of countries you are connected to? How often? And so on. We can very easily have thousands of metrics even in a small network. So this is the problem with [inaudible].
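The arithmetic in this passage can be written out as a short Python sketch. The per-host figures (4 counters per protocol, 30 protocols, 20 extra counters) come from the talk; the function names are hypothetical, and the 254-host case assumes every usable address of a /24 network is in use.

```python
def counters_per_host(protocols=30, per_protocol=4, extra=20):
    """Counters kept for one local host: in/out packets and in/out bytes
    per protocol, plus a block of additional per-host counters."""
    return protocols * per_protocol + extra


def counters_total(hosts, **kwargs):
    """Counters to save at every interval for the given number of hosts."""
    return hosts * counters_per_host(**kwargs)


print(counters_per_host())   # 140
print(counters_total(20))    # 2800  (small office or home)
print(counters_total(254))   # 35560 (assumed fully populated /24)
```

Autonomous-system and country counters would come on top of these totals.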
Luca Deri 00:39:16.861 So this means that if we have a /24 network, we have very easily 35 to 50 thousand counters to save every minute if I want to have a one-minute poll. And this is a problem. This is a big problem, I tell you. Because first of all the number of counters changes over time. And like I’ve said, we have hosts that are coming and leaving our network. If we stay at the IP level, sometimes we are mixed up in [inaudible], because today my IP is used by me, tomorrow it’s used by another device, so we have to make sure that we don’t mix things. And also we have to merge this information with other counters that are, for instance, polled every five minutes via SNMP. So in essence, we have to do a lot just with the information, and then we have to save this data into a Time Series Database. Inside ntopng we have a sort of Internet [inaudible], so where we poll some information every second, let’s say the number of packets on an interface. Sometimes every minute, sometimes every 5 minutes, every hour, every day. So usually we stay at the second level for interface counters. We go to the five-minute level for host and SNMP counters. So in essence, every five minutes, we have a lot of information. And until now we have used RRD.
Luca Deri 00:40:34.889 RRD is used because this was a heritage from the original ntop. It was a good idea for us, we thought, because it was based on files. So it means it’s self-contained. We don’t have to go and speak with a remote server. It is nice, in essence, because we have a lot of experience with that. But we understand that this choice was good 20 years ago but it’s no longer a good idea today. First of all, because with a large number of hosts, five minutes are not enough anymore. Even with SSD drives. So with SSDs, in theory, it is very fast to write to disk. But when you have many counters and we have a tool like RRD that, whenever we have to add a new point, has to open a file, update the file and close the file, it means that we have to put a lot of pressure on the file system. And very often, even when I was [inaudible], let’s say 200 hosts for a small company, we have many problems doing that. And when we are not able to react, then we are dropping information because we cannot choose how fast we can go. It’s the metrics size that is deciding that. And please understand that adding a backup drive does not help because it’s the system itself that is inefficient.
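A back-of-the-envelope Python sketch shows why a five-minute window can run out when every point costs a separate open, update and close. The 10 ms cost per update is purely an assumed figure for illustration, not a measurement of RRD, and the 140 counters per host reuses the estimate given earlier in the talk.

```python
def write_budget(counters, seconds_per_update, interval_s=300):
    """Total time to update every counter once, and whether that
    fits inside one polling interval."""
    total_s = counters * seconds_per_update
    return total_s, total_s <= interval_s


ASSUMED_UPDATE_COST_S = 0.010  # hypothetical open/update/close cost per file

for hosts in (20, 200, 254):
    total_s, fits = write_budget(hosts * 140, ASSUMED_UPDATE_COST_S)
    print(f"{hosts} hosts: {total_s:.1f} s per cycle, fits in 5 min: {fits}")
```

Under that assumption, 200 hosts already use almost the whole interval and 254 hosts overrun it, so points would have to be dropped no matter how the polling is scheduled.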
Luca Deri 00:41:51.012 RRD has many limitations, for instance. First of all, it’s an old programming library, so it’s a sort of hack that when it works, it [inaudible] for threads. It has many dependencies. And for, let’s say, at least five years it has been in maintenance mode, so it’s not actively developed. So for us, this was a big bottleneck so we decided to fix that. And also, another problem is that the way RRD is designed, it is constantly doing an average computation. So in essence, if I put one inside RRD, I will read 0.999999. So in essence, we are not very precise. Okay. The fact that we don’t have one is not a big problem, but this gives an idea of how the system works. And it also gives a bad feeling to the users because they understand they put a number and they read something else. So the confidence is not that high.
Luca Deri 00:42:47.701 So we started to evaluate Influx in 2014. We were happy with that but we said it was probably too early to jump to it, because we had seen many changes to the project that we were not very certain about. The problem, we had to fix it. I mean we understood that RRD was not the solution anymore, and we also understood that we probably had to improve ntopng a little bit and to wait until Influx was [inaudible] product. And last year we decided that it was really time to find the solution, and we evaluated again the solutions [inaudible] in the market. And we realized that the two real solutions were Influx and Prometheus. But we decided to adopt Influx for a few reasons. First of all because the ecosystem is very good. So we don’t have just the database but we have an ecosystem and a vibrant community. This is very important. So whenever we have a problem or whenever we need some information, we think [inaudible] and on the website and search, and we definitely find a solution.
Luca Deri 00:44:01.526 Second is the flexibility. So the way Influx has been designed is closer to what we expect. So in essence, ntopng wants to push data. So ntopng has to decide when there’s some data available and to push it to Influx instead of doing [inaudible] model that Prometheus is doing. And third, the most important thing is that we know many people that are using your database and they’re very happy with it. And we believe that this is a very good advertisement, it’s much better than many other things you can read. So we have decided to jump on Influx because it’s generic, because of the ecosystem, because of the community, and because of the good advertisement people told us. So in essence, we have integrated Influx inside ntopng. You get an idea of where we are today. But we were also faced with some things that we would like you to consider in an official version. First of all, our application is probably special with respect to other applications. So it would be desirable for us to inject data, let’s say, through a library instead of going through HTTP or going through UDP. Let’s say a streaming service, it would be nice for us to write directly into the library instead of going through this path. Also for security reasons, and in particular here in Europe, where there is a new law, GDPR, it is very important that other people don’t look at our data. So the fact that Influx, let’s say, comes out of the box without a password or without a user. We would like to have a fight in a way, for us, is more under control.
Luca Deri 00:45:49.068 Second, we would like not to have an intermediate text format, even though it works very well. And we would like to push data into it. So this is the only comment we can give you today, of an improvement that we would like to see in a future version. At the moment, we had to do many changes in our library. Not because of Influx, but because Influx is so powerful, okay, that in order to exploit the library, sorry, the database, properly, we had to do a bit of design in our application. Okay. First of all, we had to change the way the rollup, the data consolidation, works because Influx is working in a very different way from RRD. So this was a big deal because our library in ntopng was really relying on that, so we had to do many changes. And also, the way Influx is handling counters. So their very high precision opens up a lot of opportunities for us. So we have to exploit that. And because RRD was not able to do that, we had to redesign some components of the library to make use of that, so that we have very fine-grained counters, also [inaudible] more network interface and [inaudible] some second. And we have less granular counters for data coming from SNMP.
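The rollup, or consolidation, step mentioned here can be pictured with a small Python sketch that averages fine-grained points into coarser buckets. This is only an illustration in plain Python with made-up sample data; with InfluxDB the same kind of grouping would normally be done by the database itself, for example with a `GROUP BY time()` clause.

```python
from collections import defaultdict


def downsample_mean(points, bucket_s):
    """Consolidate (timestamp, value) points into fixed-size time buckets,
    keeping the mean of each bucket. Buckets are keyed by their start time."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        buckets[timestamp - timestamp % bucket_s].append(value)
    return [(start, sum(values) / len(values))
            for start, values in sorted(buckets.items())]


# Hypothetical one-minute samples rolled up to five-minute averages.
samples = [(0, 1.0), (60, 3.0), (120, 2.0), (300, 5.0), (360, 7.0)]
print(downsample_mean(samples, 300))  # [(0, 2.0), (300, 6.0)]
```

Keeping the raw points and consolidating only when reading is what makes it possible to drill down later, at the cost of storing more data.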
Luca Deri 00:47:13.063 Another thing that we had to fix is the way data is handled by the library. Not that we were used to it, but with Influx you have several ways of doing that. Several ways to consolidate data. And the fact that we now have this set of possibilities makes us willing to use them. So therefore we had to do changes in the way we can explore this data. I mean RRD was, let’s say, a bit limited on that. So now we have this possibility and we want to do it. Another problem is data resolution. So in Influx we can have more precise measurements than what we had in RRD. So sometimes we have to make sure that we don’t have too many data [inaudible] for the web interface, but this also means that whenever we want to drill down, we now have the data. Whereas in RRD, in order to save space on disk and not exhaust the disk, because of the nature of the system itself, we couldn’t have so many granular points, so therefore this has put, let’s say, an implication on our web interface because in order to react, we had to, let’s say, change a bit the way the user interface was working. So at the moment, we have completed the migration from RRD to Influx. So in essence, we have created a library that allows us to write, at the same time, on RRD and on Influx. So therefore we can monitor your data, we can use ntopng to visualize information or you can ignore completely the web interface and just read data from Influx. So all the data is coming more or less in real-time.
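One way to picture a library that writes to RRD and to Influx at the same time is a small fan-out writer, sketched below in Python. This is not ntopng's implementation: the class names are hypothetical and the in-memory backends only stand in for real RRD and InfluxDB drivers.

```python
class MemoryBackend:
    """Stand-in for a real storage driver (RRD files, InfluxDB, ...)."""

    def __init__(self, name):
        self.name = name
        self.points = []

    def write(self, series, timestamp, value):
        self.points.append((series, timestamp, value))


class FanOutWriter:
    """Send every point to all configured backends, so the old and the
    new storage stay in step while a migration is in progress."""

    def __init__(self, backends):
        self.backends = list(backends)

    def write(self, series, timestamp, value):
        failed = []
        for backend in self.backends:
            try:
                backend.write(series, timestamp, value)
            except Exception:
                # One failing backend should not block the others.
                failed.append(backend.name)
        return failed


rrd_like = MemoryBackend("rrd")
influx_like = MemoryBackend("influx")
writer = FanOutWriter([rrd_like, influx_like])
failed = writer.write("iface:eth0:bytes", 1527606000, 123456)
print(failed, rrd_like.points == influx_like.points)
```

Because callers see a single `write` method, the rest of the application does not need to know which storage is active, which also makes it possible to drop the old backend later.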
Luca Deri 00:49:02.075 And what we have to do at the moment is to complete this integration because we also have to extract data from Influx. And it takes a little bit of time because, like I told you, the way we have designed the ntop and data GUI, it was really tied to RRD. So therefore now we have to make some changes to extend this library, not just to write data but also to read data from Influx. And as soon as this process is completed, we would like to invite people to move to Influx because Influx opens a big set of opportunities. But already today, we have users that are using our tools not with the ntopng user interface but using, let’s say, Chronograf to read data out of InfluxDB, completely ignoring the web interface of ntopng and using ntopng as a source of data. So just to wrap up, Influx is therefore the step forward with respect to RRD. We had to do many changes in our library, not because of the Influx design, but because Influx gives us a lot of new opportunities that we want to exploit. We don’t want to use Influx just as a replacement for RRD, but we also want to improve our [inaudible] with Influx. Because Influx allows us to have very granular counters, it is very important for us to jump on it because we need to scale. And we cannot scale unless we change the way we write data to disk because this is a big bottleneck for us.
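For the read path, external tools typically fetch data from InfluxDB 1.x through its HTTP `/query` endpoint using InfluxQL. The Python sketch below only builds such a URL; the measurement and field names, database name and host are hypothetical placeholders, and this is not how ntopng or Chronograf are actually implemented.

```python
from urllib.parse import urlencode


def build_query_url(base_url, database, measurement, field,
                    window="1h", step="5m"):
    """URL for an InfluxQL query returning the mean of one field,
    grouped into fixed time steps over the most recent window."""
    influxql = (f'SELECT MEAN("{field}") FROM "{measurement}" '
                f"WHERE time > now() - {window} GROUP BY time({step})")
    return f"{base_url}/query?" + urlencode({"db": database, "q": influxql})


url = build_query_url("http://localhost:8086", "ntopng",
                      "host_traffic", "bytes_sent")
print(url)
```

A GET on that URL returns JSON, so a user interface can ask for exactly the resolution it wants to draw instead of reading every stored point.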
Luca Deri 00:50:36.175 So the final message is that it is very important today to deliver to people data with high granularity, because people want to correlate host monitoring with traffic monitoring. And we believe that we made a good choice to jump into this new project and to jump to the Influx database. Thank you very much.
Chris Churilo 00:51:01.989 Wow. Very, very detailed. Thank you so much, Luca. That was really great. And definitely a trip down memory lane when it comes to some of those old network protocols. We are going to stay open for a few minutes. So if you have any questions for Luca about his solution, about how he created a solution, about how he uses InfluxDB, please feel free to put your questions either in the Q and A or the chat panel, and Luca will definitely be able to answer these questions for you. So Luca, what version of InfluxDB are you guys running now?
Luca Deri 00:51:42.343 The 1.5. That is the latest version.
Chris Churilo 00:51:45.259 Okay. Great. Very nice. And have you had a chance to play with IFQL [now Flux] at all?
Luca Deri 00:51:54.841 Yes. Of course we did. We are still learning. So I’m not saying that we are experts because, like I said, this is a big [inaudible] process. But this gives a lot of flexibility, yes. So we played with it.
Chris Churilo 00:52:10.112 Oh, good. Excellent. Well, it’s definitely the path that we’re moving on to hopefully give you guys a lot more flexibility. And it’s one of the things that we’re going to focus on at our conference in June, where we have that hackathon. So I invite people who are listening. If you haven’t had the chance to take a look at that, please do. And then, Luca, so everybody on today’s webinar can just go to your website to get the open source bits from ntop, is that correct?
Luca Deri 00:52:42.385 Yeah. It’s correct. You can go to GitHub and download the source if you want to compile it. Yes.
Chris Churilo 00:52:48.150 Okay. And I definitely think everyone should definitely try it. It’s a pretty powerful tool. And I think it’s going to—it looks pretty simple to be able to use, and probably you’ll find some surprising things on your network.
Luca Deri 00:53:05.108 Great.
Chris Churilo 00:53:06.068 All right. Well, looks like we don’t have any questions. But as always, when you guys do have questions, feel free to email me and I will forward them to Luca. And if you happen to be in San Francisco at the end of June, Luca is actually going to be in town. So shoot me an email if you need the details and you’ll be able to meet with him face-to-face. And maybe have a conversation about his implementation of InfluxDB. Hey, Luca. Thank you so much. And as I’ve mentioned everyone, we will post a recording after I do just some small edits. And you’ll get the automated email tomorrow.
Luca Deri 00:53:41.442 Thank you, Chris. Thank you very much.
Chris Churilo 00:53:42.402 Thank you so much, Luca. And we’ll see you soon. Buh-bye everybody.
Luca Deri 00:53:46.101 Sure. Buh-bye.