<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>InfluxData Blog - Philip O'Toole</title>
    <description>Posts by Philip O'Toole on the InfluxData Blog</description>
    <link>https://www.influxdata.com/blog/author/philipo/</link>
    <language>en-us</language>
    <lastBuildDate>Thu, 05 Nov 2015 08:00:46 -0700</lastBuildDate>
    <pubDate>Thu, 05 Nov 2015 08:00:46 -0700</pubDate>
    <ttl>1800</ttl>
    <item>
      <title>Make your mark on the InfluxDB source</title>
      <description>&lt;p&gt;InfluxDB is an &lt;a href="https://github.com/influxdb/influxdb"&gt;open-source project&lt;/a&gt;, and we rely heavily on the community when it comes to development. Bug reports, performance numbers, and &lt;a href="https://groups.google.com/forum/#!forum/influxdb"&gt;helping each other&lt;/a&gt;, as we iterate on the software, are all very helpful. In fact, it is in these ways that the community makes its biggest impact.&lt;/p&gt;

&lt;p&gt;Of course, there is another way to contribute to InfluxDB, and that is to actually &lt;a href="https://github.com/influxdb/influxdb/graphs/contributors"&gt;improve the source code&lt;/a&gt;. Knowing that code you wrote will be run by thousands of developers and companies worldwide can be a real source of pride.&lt;!--more--&gt;&lt;/p&gt;
&lt;h2 style="text-align: left;"&gt;Help Wanted!&lt;/h2&gt;
&lt;p&gt;To help you make your mark on the InfluxDB source code, we on the Core team recently selected a set of issues that we think are the most approachable. All the &lt;a href="https://github.com/influxdb/influxdb/labels/status%2Fhelp-wanted"&gt;issues are logged on GitHub and are marked with the &lt;em&gt;help-wanted&lt;/em&gt; label&lt;/a&gt;. We have tried to add some introductory notes to each issue to help you get started, and will make a particular effort to answer questions about these issues as you proceed. Some issues are more difficult than others, and have been labelled as such.&lt;/p&gt;

&lt;p&gt;Before you start working with the source, be sure to check out the &lt;a href="https://github.com/influxdb/influxdb/blob/master/CONTRIBUTING.md"&gt;CONTRIBUTING guide&lt;/a&gt; to learn how to configure your build environment. You will also need to &lt;a href="https://influxdb.com/community/cla.html"&gt;sign the CLA&lt;/a&gt; before any of your changes can be merged.&lt;/p&gt;
&lt;h2 style="text-align: left;"&gt;Get your t-shirt&lt;/h2&gt;
&lt;p&gt;As an added incentive, anyone whose PR is merged to master will get an InfluxDB t-shirt. Just remind us in the Issue when the change is merged, and we’ll contact you for shipping details. Wear the shirt with pride, knowing your code is running across the world!&lt;/p&gt;
&lt;h2 style="text-align: left;"&gt;So start coding!&lt;/h2&gt;
&lt;p&gt;We know it can be difficult to get up to speed on a complex code base like InfluxDB, so we hope this will help make it fun. Learning how a time-series database works can be a fascinating experience, and there is no better way to do so than getting into the code. Hopefully these particular issues will help you get going.&lt;/p&gt;
</description>
      <pubDate>Thu, 05 Nov 2015 08:00:46 -0700</pubDate>
      <link>https://www.influxdata.com/blog/make-your-mark-on-the-influxdb-source/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/make-your-mark-on-the-influxdb-source/</guid>
      <category>Use Cases</category>
      <author>Philip O'Toole (InfluxData)</author>
    </item>
    <item>
      <title>Testing InfluxDB Storage Engines</title>
      <description>&lt;p&gt;When you decide to build a database, you set yourself a particular software engineering challenge. As infrastructure software it must &lt;em&gt;work&lt;/em&gt;. If people are going to rely on your system for reliably storing their data, you need to be sure it does just that.&lt;/p&gt;

&lt;p&gt;InfluxDB is tested in various ways. We use &lt;a href="https://www.circleci.com/"&gt;CircleCI&lt;/a&gt; for unit-level testing, as well as some basic integration-level testing. We find CircleCI to be easy to use, well-designed, and responsive. But as we approach the 1.0 release our testing is becoming more sophisticated and thorough.&lt;/p&gt;

&lt;p&gt;Correct software is obviously critical – every data point received must be indexed, and every query must return the correct results. But resource usage – CPU, disk IO, and RAM – is just as important. We want a system that is stable when running, and monitoring resource usage during testing can flag issues. Memory leaks become apparent, and excessive disk IO can indicate a sub-optimal design or implementation. Bugs, too, may be exposed – a pegged CPU may indicate a problem in the code. So in this post we discuss how we monitor resource usage on systems under test.&lt;!--more--&gt;&lt;/p&gt;
&lt;h2 style="text-align: left;"&gt;Test Infrastructure&lt;/h2&gt;
&lt;p style="text-align: left;"&gt;A high-level view of our test infrastructure is shown below.&lt;/p&gt;
&lt;p style="text-align: center;"&gt;&lt;img class="aligncenter size-full wp-image-8488" src="/images/legacy-uploads/test-arch-1.png" alt="test-arch-1" width="743" height="455" /&gt;&lt;/p&gt;
&lt;p style="text-align: left;"&gt;On each node under test we run &lt;a href="https://collectd.org/"&gt;collectd&lt;/a&gt;, configured to output in &lt;a href="https://collectd.org/wiki/index.php/Plugin:Write_Graphite"&gt;Graphite format&lt;/a&gt;, and &lt;a href="https://github.com/influxdb/telegraf"&gt;Telegraf&lt;/a&gt;. Data from these agents is then sent to another InfluxDB system, which we run specifically to store metric data from our test systems. This setup achieves two objectives &amp;ndash; it allows us to analyze the test results and means we are also &lt;a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food"&gt;dogfooding&lt;/a&gt; our own software.&lt;/p&gt;
&lt;p&gt;While an InfluxDB system is under load we record various metrics about the host machine during the entire test run. Most importantly we record disk IO, memory usage, CPU load, and the &lt;a href="https://en.wikipedia.org/wiki/Resident_set_size"&gt;resident set size&lt;/a&gt; of the InfluxDB process. During and after a test the results are shared with the team, and with the help of our Grafana dashboards, we study the results for problems, and (hopefully!) confirm our software is working as expected.&lt;/p&gt;
&lt;h2 style="text-align: left;"&gt;Testing the TSM1 Engine&lt;/h2&gt;
&lt;p style="text-align: left;"&gt;We recently started testing the &lt;a href="https://influxdb.com/blog/2015/10/07/the_new_influxdb_storage_engine_a_time_structured_merge_tree.html"&gt;new tsm1 storage engine&lt;/a&gt;. A recent test ran for about 8 hours and involved writing billions of points to a single InfluxDB node, across thousands of different series. The target retention policy also had a duration of 1 hour, so we could test those code paths too &amp;ndash; since old data would be deleted hourly as new data was being indexed. The Grafana dashboard, showing results of the complete test run, is shown below.&lt;/p&gt;
&lt;p style="text-align: center;"&gt;&lt;img class="aligncenter size-full wp-image-8490" src="/images/legacy-uploads/100b-1hrt.png" alt="100b-1hrt" width="1532" height="1061" /&gt;&lt;/p&gt;
&lt;p&gt;Some interesting features are present. Write load is steady, as is CPU load. Disk usage reached a steady state as the retention enforcement deleted data at about the rate incoming data was being indexed. The sawtooth pattern in disk usage is a mixture of compaction performed by the tsm1 engine and retention enforcement. Interestingly, InfluxDB RSS correlates closely with disk usage, which makes sense to us. The software memory-maps data on disk, so as data on disk is deleted, memory usage declines. But memory usage is generally steady over the long run, at about 30% of total physical RAM on the host machine. This tells us the software does not suffer from any detectable memory leaks. (The short spike at the very left of each graph is an aborted run.)&lt;/p&gt;

&lt;p&gt;By comparison, the bz1 engine consumes resources in a more irregular manner, as shown below.&lt;/p&gt;
&lt;p style="text-align: center;"&gt;&lt;img class="aligncenter size-full wp-image-8491" src="/images/legacy-uploads/0_9_4.png" alt="" width="1350" height="941" /&gt;&lt;/p&gt;

&lt;h2 style="text-align: left;"&gt;Test Environments&lt;/h2&gt;
&lt;p&gt;We run this kind of testing on a mixture of systems – &lt;a href="https://aws.amazon.com/ec2/"&gt;AWS EC2&lt;/a&gt; instances, &lt;a href="https://www.digitalocean.com/"&gt;Digital Ocean&lt;/a&gt; droplets, and physical machines. Physical machines are a particularly important part of our test infrastructure as they allow us to focus on our software – after all, nothing changes between test runs except our code. We don’t have to worry about &lt;a href="https://www.liquidweb.com/blog/why-aws-is-bad-for-small-organizations-and-users/"&gt;noisy neighbours&lt;/a&gt; or busy networks – running on physical hardware allows us to rule out all factors except the ones we control. But, of course, testing with cloud environments is very important since so many of our users run InfluxDB systems in such environments.&lt;/p&gt;

&lt;p&gt;We also use &lt;a href="http://www.ansible.com/"&gt;Ansible&lt;/a&gt; for our test system deployment and configuration management.&lt;/p&gt;
&lt;h2 style="text-align: left;"&gt;Test Strategies&lt;/h2&gt;
&lt;p&gt;We recently started building &lt;a href="https://golang.org/doc/articles/race_detector.html"&gt;race-detection&lt;/a&gt; enabled builds alongside our &lt;a href="https://influxdb.com/download/index.html"&gt;nightly builds&lt;/a&gt;. This allows us to run the test suite against these binaries, which helps us detect race conditions in our code. We also differentiate between stress testing – loading a system to the extreme until it falls over – versus moderate-to-high load testing, which we expect to run for days without any problems. We call this latter testing “burn-in”.&lt;/p&gt;
&lt;h2 style="text-align: left;"&gt;Plans for the Future&lt;/h2&gt;
&lt;p&gt;We have much more to do as we ramp up testing. Unit and basic integration testing can only take you so far, and it’s important to run tests that last for hours and days. Other key features – such as clustering – are still in beta, so as new features come online, testing in those areas will increase. Query performance is another key area, which will undergo significant work and testing in the near future.&lt;/p&gt;

&lt;p&gt;As &lt;a href="http://www.feynman.com/"&gt;Richard Feynman&lt;/a&gt; once said: “For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.” And our testing makes sure we don’t fool ourselves.&lt;/p&gt;
</description>
      <pubDate>Tue, 20 Oct 2015 08:00:34 -0700</pubDate>
      <link>https://www.influxdata.com/blog/testing-influxdb-storage-engines/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/testing-influxdb-storage-engines/</guid>
      <category>Product</category>
      <author>Philip O'Toole (InfluxData)</author>
    </item>
    <item>
      <title>How to Use the SHOW STATS Command and the _internal Database to Monitor InfluxDB</title>
      <description>&lt;p&gt;InfluxDB supports &lt;a href="https://docs.influxdata.com/platform/monitoring/influxdata-platform/tools/show-stats"&gt;statistics&lt;/a&gt; and &lt;a href="https://docs.influxdata.com/platform/monitoring/influxdata-platform/tools/show-diagnostics"&gt;diagnostics&lt;/a&gt; meta queries which allows developers and system administrators to make better use of their InfluxDB system, diagnose problems, and troubleshoot issues.&lt;/p&gt;

&lt;p&gt;This post outlines some of the statistics and diagnostics currently gathered by InfluxDB, and offers some advice on how to work with this information.&lt;/p&gt;
&lt;h2 style="text-align: left;"&gt;Who watches the watchers?&lt;/h2&gt;
&lt;p&gt;A common use of InfluxDB is the monitoring and analysis of IT infrastructure. And to run a successful InfluxDB system, the database itself must be monitored. The command &lt;code class="language-ini"&gt;SHOW STATS&lt;/code&gt; allows you to do just that. &lt;code class="language-ini"&gt;SHOW STATS&lt;/code&gt; returns information about the various components of the system, for the node receiving the query. Each module exporting statistics exports a &lt;em&gt;Measurement&lt;/em&gt; named after the module, and various series are associated with the Measurement. (The fact that it is a Measurement is important, as will be seen shortly.)&lt;/p&gt;

&lt;p&gt;Let’s take a look at the &lt;em&gt;runtime&lt;/em&gt; statistics, which capture details about the &lt;a href="https://golang.org/pkg/runtime/"&gt;Go runtime&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;&amp;gt; show stats

name: runtime
-------------
Alloc   Frees   HeapAlloc       HeapIdle        HeapInUse       HeapObjects     HeapReleased    HeapSys Lookups Mallocs NumGC   NumGoroutine    PauseTotalNs    Sys             TotalAlloc
4056352 15134   4056352         1712128         4874240         7001            0               6586368 71      22135   4       51              1573952         10918136        13093576&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case &lt;code class="language-ini"&gt;SHOW STATS&lt;/code&gt; gives you an overview of memory usage by the InfluxDB system, within the Go runtime. Many Go developers will recognize the importance of these numbers.&lt;/p&gt;

&lt;p&gt;Another key set of statistics comes from the &lt;em&gt;httpd&lt;/em&gt; module:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;name: httpd
tags: bind=:8086
query_req       query_resp_bytes        req
---------       ----------------        ---
2               418                     2&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This output shows the number of queries received (&lt;code class="language-ini"&gt;query_req&lt;/code&gt;), by this node, since the system started – 2 in this example – and the number of bytes returned to the client, 418 in this case (this system just started!).&lt;/p&gt;

&lt;p&gt;Most inputs, such as &lt;a href="http://graphite.readthedocs.org/en/latest/"&gt;Graphite&lt;/a&gt; and &lt;a href="http://opentsdb.net/"&gt;OpenTSDB&lt;/a&gt;, also have detailed statistics available. We get plenty of questions about the performance of these inputs, so this information can be particularly useful when working with them.&lt;/p&gt;

&lt;p&gt;Here are example statistics for the Graphite input:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;name: graphite
tags: bind=:2003, proto=tcp
batches_tx      bytes_rx        connections_active      connections_handled     points_rx       points_tx
----------      --------        ------------------      -------------------     ---------       ---------
62              1658490         6                       6                       69006           62000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This shows the number of points received by the Graphite service on port 2003 (&lt;code class="language-ini"&gt;points_rx&lt;/code&gt;), for the TCP protocol. It also shows the number of points sent to the database (&lt;code class="language-ini"&gt;points_tx&lt;/code&gt;). Notice that &lt;code class="language-ini"&gt;points_rx&lt;/code&gt; is greater than &lt;code class="language-ini"&gt;points_tx&lt;/code&gt;. This shows that the Graphite input is buffering points internally, as it batches writes into the database for maximum throughput.&lt;/p&gt;

&lt;p&gt;These are just a few quick examples of what &lt;code class="language-ini"&gt;SHOW STATS&lt;/code&gt; can do. Keep in mind that depending on what services are enabled, and what code paths execute within the database, you may see statistics from other components.&lt;/p&gt;
&lt;h2 style="text-align: left;"&gt;The &lt;code&gt;_internal&lt;/code&gt; database&lt;/h2&gt;
&lt;p&gt;All this statistical information is very useful, but is reset when the system restarts. What if we want to analyze the performance of our system over time? Of course, InfluxDB is a time series database, built especially for storing this kind of data. So the system periodically writes all statistical data to a special database called &lt;code class="language-ini"&gt;_internal&lt;/code&gt;, which allows you to use the full power of &lt;a href="https://influxdb.com/docs/v0.9/query_language/spec.html"&gt;InfluxQL&lt;/a&gt; to analyze the system itself.&lt;/p&gt;

&lt;p&gt;Some examples may help.&lt;/p&gt;

&lt;p&gt;If you have questions about how InfluxDB is using the Go heap, it’s easy to see how usage changes over time. For example, using the &lt;code class="language-ini"&gt;influx&lt;/code&gt; CLI, issue the following queries to see Go heap usage, sampled every 10 seconds.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;&amp;gt; USE _internal
Using database _internal
&amp;gt; SHOW MEASUREMENTS
name: measurements
------------------
name
graphite
httpd
runtime
shard
write

&amp;gt; SELECT HeapAlloc FROM runtime LIMIT 5
name: runtime
-------------
time                            HeapAlloc
2015-09-18T18:40:04.199587653Z  548536
2015-09-18T18:40:14.199761008Z  3895536
2015-09-18T18:40:24.199791989Z  2057504
2015-09-18T18:40:34.19971719Z   2111680
2015-09-18T18:40:44.199490569Z  2169848&lt;/code&gt;&lt;/pre&gt;
&lt;p style="text-align: left;"&gt;Even better, when coupled with a tool like &lt;a href="https://w2.influxdata.com/time-series-platform/chronograf/"&gt;Chronograf&lt;/a&gt;, you can visualize all this data.&lt;/p&gt;
&lt;p style="text-align: center;"&gt;&lt;img class="aligncenter wp-image-249640 size-large" src="/images/legacy-uploads/Screen-Shot-2020-08-24-at-12.02.03-PM-1024x508.png" alt="SHOW STATS command analyze performance Chronograf" width="1024" height="508" /&gt;&lt;/p&gt;
&lt;p style="text-align: left;"&gt;The next example of a query, also visualized using Chronograf, shows a &lt;code&gt;derivative&lt;/code&gt; query of the total garbage collection (GC) pause time of the Go runtime. Since this graph shows rate-of-change, the spikes in the graph show when a GC pause took place.&lt;/p&gt;
&lt;p style="text-align: center;"&gt;&lt;img class="alignnone wp-image-249641 size-large" src="/images/legacy-uploads/Screen-Shot-2020-08-24-at-12.05.00-PM-1024x412.png" alt="Chronograf derivative query" width="1024" height="412" /&gt;&lt;/p&gt;

&lt;h2 style="text-align: left;"&gt;Cluster-level statistics&lt;/h2&gt;
&lt;p&gt;Because every node in your cluster writes these statistics to the &lt;code class="language-ini"&gt;_internal&lt;/code&gt; database, queries against &lt;code class="language-ini"&gt;_internal&lt;/code&gt; return data for the whole cluster, which can be very useful. However, all data is tagged with the hostname and node ID, so analysis of a specific node is always possible. Shown below is &lt;code class="language-ini"&gt;points_rx&lt;/code&gt; for the Graphite service on just the node with hostname &lt;code class="language-ini"&gt;malthus&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;&amp;gt; SHOW TAG VALUES WITH key=hostname
name: hostnameTagValues
---------------------
hostname
malthus
&amp;gt; SELECT points_rx FROM graphite WHERE hostname='malthus' LIMIT 5
name: graphite
--------------
time                            points_rx
2015-09-18T18:40:54.199425753Z  141001
2015-09-18T18:41:04.19947468Z   315608
2015-09-18T18:41:14.1993757Z    476001
2015-09-18T18:41:24.199438213Z  641001
2015-09-18T18:41:34.199454694Z  802001&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But remember, the commands &lt;code class="language-ini"&gt;SHOW STATS&lt;/code&gt; and &lt;code class="language-ini"&gt;SHOW DIAGNOSTICS&lt;/code&gt; only ever return data for the &lt;strong&gt;node on which the query executes&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 style="text-align: left;"&gt;Expvar support&lt;/h2&gt;
&lt;p&gt;All statistics data are available in standard &lt;a href="https://golang.org/pkg/expvar/"&gt;expvar&lt;/a&gt; format, if you wish to use external tools to monitor InfluxDB. This information is available at the endpoint &lt;code class="language-ini"&gt;/debug/vars&lt;/code&gt;.&lt;/p&gt;
&lt;h2 style="text-align: left;"&gt;Diagnostics&lt;/h2&gt;
&lt;p&gt;Diagnostic information is treated a little differently within the InfluxDB system. It’s mostly information about the system that is not necessarily numerical in format. It is important to note that diagnostic information is not stored in the &lt;code&gt;_internal&lt;/code&gt; database.&lt;/p&gt;

&lt;p&gt;Examples include the build version of your InfluxDB instance and its uptime. This information is particularly useful to InfluxDB Support, so be sure to include the output of this query anytime you file a Support ticket or GitHub issue.&lt;/p&gt;

&lt;p&gt;Example output is shown below.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;&amp;gt; SHOW DIAGNOSTICS
name: system
------------
PID     currentTime                     started                         uptime
7299    2015-09-18T20:32:22.219545782Z  2015-09-18T19:54:04.069260449Z  38m18.150285438s


name: build
-----------
Branch  Commit                                   Version
master  d81618c57fae135d9b1c1a8fb3403722ceb29354 0.9.4


name: runtime
-------------
GOARCH  GOMAXPROCS      GOOS    version
amd64   8               linux   go1.5


name: network
-------------
hostname
malthus&lt;/code&gt;&lt;/pre&gt;
&lt;h2 style="text-align: left;"&gt;More to come&lt;/h2&gt;
&lt;p&gt;As always, we encourage open-source developers to add statistics and diagnostics to any code they contribute, if it makes sense. We hope you find this information useful, as you work with InfluxDB. If you have any questions or concerns, come join us in our &lt;a href="https://w2.influxdata.com/slack"&gt;Community Slack&lt;/a&gt;.&lt;/p&gt;
</description>
      <pubDate>Tue, 22 Sep 2015 08:00:50 -0700</pubDate>
      <link>https://www.influxdata.com/blog/how-to-use-the-show-stats-command-and-the-internal-database-to-monitor-influxdb/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/how-to-use-the-show-stats-command-and-the-internal-database-to-monitor-influxdb/</guid>
      <category>Product</category>
      <author>Philip O'Toole (InfluxData)</author>
    </item>
  </channel>
</rss>
