<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>InfluxData Blog - Ryan Betts</title>
    <description>Posts by Ryan Betts on the InfluxData Blog</description>
    <link>https://www.influxdata.com/blog/author/ryanbetts/</link>
    <language>en-us</language>
    <lastBuildDate>Mon, 27 Nov 2017 14:17:08 -0700</lastBuildDate>
    <pubDate>Mon, 27 Nov 2017 14:17:08 -0700</pubDate>
    <ttl>1800</ttl>
    <item>
      <title>InfluxDB Internals 101 - Part Two</title>
      <description>&lt;ul&gt;
 	&lt;li&gt;Query path: reading data from InfluxDB
&lt;ul&gt;
 	&lt;li&gt;Indexing points for query&lt;/li&gt;
 	&lt;li&gt;A note on TSI (on disk indexes)&lt;/li&gt;
 	&lt;li&gt;Executing queries&lt;/li&gt;
 	&lt;li&gt;A note on IFQL&lt;/li&gt;
 	&lt;li&gt;DELETE and DROP - removing data from InfluxDB&lt;/li&gt;
 	&lt;li&gt;Updating points&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://w2.influxdata.com/blog/influxdb-internals-101-part-one/"&gt;Part One&lt;/a&gt; of this series describes the InfluxDB write path: how the database persists and organizes data being written to the database. This part (Part Two) describes the other main interaction with the database: querying data once it has been persisted. Note that Part One also defines the InfluxDB jargon used in this post (&lt;code&gt;tagset&lt;/code&gt;, &lt;code&gt;fieldset&lt;/code&gt;, &lt;code&gt;measurement&lt;/code&gt;, &lt;code&gt;series&lt;/code&gt;) which will be helpful to new readers.&lt;/p&gt;

&lt;p&gt;InfluxDB is queried using a SQL dialect called &lt;code&gt;influxql&lt;/code&gt;. There is quite a bit of &lt;a href="https://docs.influxdata.com/influxdb/v1.3/query_language/"&gt;documentation&lt;/a&gt; for the language as well as a &lt;a href="https://docs.influxdata.com/influxdb/v1.3/query_language/data_exploration/"&gt;guide&lt;/a&gt; to using &lt;code&gt;influxql&lt;/code&gt; for different querying tasks. This post focuses on how the query engine works and not on the semantics of the language itself.&lt;/p&gt;

&lt;p&gt;Time series applications tend to query in two patterns. Queries either window data and produce per-window aggregates (for example, windowing data into one-minute intervals and calculating the average for each minute), or they search for a specific point (often the &lt;code&gt;last()&lt;/code&gt;, or most recent, point in a series). Both query patterns filter the points in the database by criteria applied to a set of dimensions; for example, all the data where &lt;code&gt;region = us-east&lt;/code&gt; or where &lt;code&gt;measurement = 'cpu'&lt;/code&gt;. In InfluxDB, these dimensions are stored as &lt;code&gt;tagsets&lt;/code&gt;.&lt;/p&gt;
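
&lt;p&gt;Both patterns can be sketched in &lt;code&gt;influxql&lt;/code&gt; (the &lt;em&gt;cpu&lt;/em&gt; measurement and &lt;em&gt;usage&lt;/em&gt; field here are hypothetical):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-- windowed aggregate: average usage per one-minute interval
SELECT mean(usage) FROM cpu WHERE time &amp;gt; now() - 1h GROUP BY time(1m)

-- most recent point per series
SELECT last(usage) FROM cpu WHERE region = 'us-east'&lt;/code&gt;&lt;/pre&gt;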

&lt;p&gt;Finally, before we get into more detail, it is important to note that &lt;code&gt;influxql&lt;/code&gt; supports &lt;code&gt;selection&lt;/code&gt; and &lt;code&gt;projection&lt;/code&gt; operators but does not support traditional relational &lt;code&gt;joins&lt;/code&gt;. Optimizing query performance in InfluxDB requires finding the initial point for each series and then leveraging columnar storage to efficiently scan a sequence of points following that initial point. The use of flexible schema-on-write &lt;code&gt;tagsets&lt;/code&gt; vs. pre-defined dimension tables in a star-schema is one of the more interesting differences between InfluxDB and a traditional SQL columnar OLAP database.&lt;/p&gt;
&lt;h2 id="indexingpointsforquery"&gt;Indexing Points for Query&lt;/h2&gt;
&lt;p&gt;Part One describes the different data structures populated by incoming writes to achieve durability and compact long-term storage. There is one additional data structure populated by writes to make queries efficient: the &lt;em&gt;index&lt;/em&gt;. InfluxDB automatically maintains an index to make filtering by &lt;code&gt;tagsets&lt;/code&gt; efficient.&lt;/p&gt;

&lt;p&gt;The index maintains mappings of &lt;code&gt;measurement name&lt;/code&gt; to &lt;code&gt;field keys&lt;/code&gt;, of &lt;code&gt;measurement name&lt;/code&gt; to &lt;code&gt;series ids&lt;/code&gt; (an internal series identifier), of &lt;code&gt;measurement name&lt;/code&gt; to &lt;code&gt;tag keys&lt;/code&gt; to &lt;code&gt;tag value&lt;/code&gt; to &lt;code&gt;series id&lt;/code&gt;, and of &lt;code&gt;series id&lt;/code&gt; to &lt;code&gt;shards&lt;/code&gt;. The index (as of version 1.4) also maintains sketches of &lt;code&gt;series&lt;/code&gt; and &lt;code&gt;measurements&lt;/code&gt; for fast cardinality estimates. You can read the &lt;a href="https://github.com/influxdata/influxdb/tree/master/tsdb" target="_blank" rel="noopener noreferrer"&gt;index implementation&lt;/a&gt; on GitHub for more detail.&lt;/p&gt;

&lt;p&gt;That’s a lot of different mappings to think about and understand. Personally, I find it easier, and conceptually accurate, to think of the index as a posting list (aka inverted index) that maps tag key/value pairs to a list of series keys. This slight abstraction captures the primary purpose of the index: to make it efficient at query time to identify all series that need to be scanned based on a &lt;code&gt;tagset&lt;/code&gt; filter in an &lt;code&gt;influxql&lt;/code&gt; WHERE predicate.&lt;/p&gt;
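
&lt;p&gt;Conceptually, for a hypothetical &lt;em&gt;cpu&lt;/em&gt; measurement tagged with &lt;em&gt;host&lt;/em&gt; and &lt;em&gt;region&lt;/em&gt;, the posting list looks something like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;host=serverA   -&amp;gt; [cpu,host=serverA,region=us-east  cpu,host=serverA,region=us-west]
host=serverB   -&amp;gt; [cpu,host=serverB,region=us-east]
region=us-east -&amp;gt; [cpu,host=serverA,region=us-east  cpu,host=serverB,region=us-east]&lt;/code&gt;&lt;/pre&gt;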
&lt;h2 id="anoteontsiondiskindex"&gt;A Note on TSI (On-disk Index)&lt;/h2&gt;
&lt;p&gt;The current default index is stored in-memory. This allows fast lookup for query planning. However, it also means that high-cardinality data (data that includes a large number of unique &lt;code&gt;tagsets&lt;/code&gt;) requires a lot of memory to index. This is why we suggest that users use &lt;code&gt;tagsets&lt;/code&gt; for lower-cardinality dimension data and use unindexed &lt;code&gt;field values&lt;/code&gt; for high-cardinality data.&lt;/p&gt;

&lt;p&gt;We are developing a new index structure, Time Series Index (TSI), which is now shipping as an &lt;a href="https://w2.influxdata.com/blog/path-1-billion-time-series-influxdb-high-cardinality-indexing-ready-testing/"&gt;opt-in preview&lt;/a&gt;. TSI stores the index on SSD, allowing much higher cardinality datasets than the default in-memory index.&lt;/p&gt;
&lt;h2 id="parsingandplanning"&gt;Parsing and Planning&lt;/h2&gt;
&lt;p&gt;With the index described, we can walk through the internal workflow that parses, plans, and executes an example &lt;code&gt;influxql&lt;/code&gt; query. The query engine:&lt;/p&gt;
&lt;ul&gt;
 	&lt;li&gt;Determines the type of query (one with an expression or a raw data query)&lt;/li&gt;
 	&lt;li&gt;Determines and then separates the time range and the condition expression for filtering data&lt;/li&gt;
 	&lt;li&gt;Determines which shards it needs to access using the list of measurements and the time frame&lt;/li&gt;
 	&lt;li&gt;Expands any wildcards&lt;/li&gt;
 	&lt;li&gt;Validates that the query is semantically correct&lt;/li&gt;
 	&lt;li&gt;Directs the storage engine to create the iterators for each shard&lt;/li&gt;
 	&lt;li&gt;And merges the shard iterator outputs, performing any post-processing on the data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Sample query:
&lt;code&gt;select user, system from cpu where time &amp;gt; now() - 1h and host = 'serverA'&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The database receives the query and parses out the measurements that are accessed, fields returned, grouping time intervals, filter predicates, and other &lt;code&gt;influxql&lt;/code&gt; query components. You can read the &lt;a href="https://github.com/influxdata/influxql/blob/master/ast.go#L980"&gt;AST&lt;/a&gt; structure for the SELECT statement in the influxdata/influxql GitHub repository.&lt;/p&gt;

&lt;p&gt;After parsing, the query engine determines which series are needed to produce an answer. In this example, the query engine uses the index to find all &lt;code&gt;series&lt;/code&gt; that are part of the &lt;em&gt;cpu&lt;/em&gt; &lt;code&gt;measurement&lt;/code&gt;. It then uses the index to find all &lt;code&gt;series&lt;/code&gt; that have the &lt;code&gt;tag key, tag value&lt;/code&gt; pair &lt;em&gt;host, serverA&lt;/em&gt;. The intersection of these sets provides the &lt;code&gt;series&lt;/code&gt; that need to be scanned. The time range in the query, &lt;em&gt;now() - 1h&lt;/em&gt;, limits the scan to &lt;code&gt;shard groups&lt;/code&gt; covering the last one hour.&lt;/p&gt;
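
&lt;p&gt;You can mirror these index lookups interactively with meta-queries; adding the tag predicate in the second query below narrows the series list to the same intersection the planner computes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SHOW SERIES FROM cpu
SHOW SERIES FROM cpu WHERE host = 'serverA'&lt;/code&gt;&lt;/pre&gt;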

&lt;p&gt;The query engine instantiates an iterator for each series, for each shard. These iterators are nested, forming a tree. The iterator tree is executed bottom-up, reading, filtering, and merging data to produce a final result set.&lt;/p&gt;

&lt;p&gt;The version 1.4 &lt;code&gt;EXPLAIN&lt;/code&gt; and &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; statements provide statistics on iterators created and TSM blocks decoded as part of query execution. There are example outputs in the &lt;a href="https://w2.influxdata.com/blog/whats-new-influxdb-oss-1-4/"&gt;What’s New in InfluxDB 1.4&lt;/a&gt; blog post.&lt;/p&gt;
&lt;h2 id="anoteonifql"&gt;A Note on IFQL&lt;/h2&gt;
&lt;p&gt;The combination of schema-on-write, automatic indexing of &lt;code&gt;tagsets&lt;/code&gt;, and SQL-like syntax produce a system that allows newcomers to be productive quickly, that feels familiar, and requires minimal setup to get started.&lt;/p&gt;

&lt;p&gt;However, there are drawbacks. First, the pre-allocation of narrowly scoped iterators means that high-cardinality queries, and queries that produce a very large number of groups, are expensive to plan; in the worst case, the iterator structures can consume GBs of RAM. Second, the iterator allocation during planning and other implementation details make multi-query resource management difficult. Finally, while SQL-like syntax is a good fit for simple queries, it becomes cumbersome for more sophisticated analytics. Time series queries are often sets of functions applied to groupings of filtered streams. Expressing these queries using select-project-join logic with advanced SQL partition and over clauses requires an experienced SQL programmer and is no longer beginner-friendly.&lt;/p&gt;

&lt;p&gt;We recently announced a prototype query language, &lt;a href="https://w2.influxdata.com/blog/announcing-ifql-a-new-query-language-and-engine-for-influxdb/"&gt;IFQL&lt;/a&gt;, to explore solutions to these problems: cheaper planning, better resource management, and easier expression of complex queries.&lt;/p&gt;
&lt;h2 id="deleteanddropremovingdatafrominfluxdb"&gt;DELETE and DROP: Removing Data from InfluxDB&lt;/h2&gt;
&lt;p&gt;InfluxDB supports retention policies to enforce time-to-live (TTL) policies against data. This is always the preferred way to regularly delete points from the database. However, applications sometimes write bad data to the database, and that data needs to be removed to return to normal operation. In these cases, &lt;code&gt;DELETE&lt;/code&gt; and &lt;code&gt;DROP&lt;/code&gt; can be used to delete unwanted points.&lt;/p&gt;

&lt;p&gt;DELETE and DROP statements are processed through the query layer, not the write layer. This allows DELETE and DROP to re-use the selection and expression features of &lt;code&gt;influxql&lt;/code&gt;.&lt;/p&gt;
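
&lt;p&gt;For example (measurement and tag names hypothetical), &lt;code&gt;DELETE&lt;/code&gt; removes points matching tag and time predicates, while &lt;code&gt;DROP SERIES&lt;/code&gt; also removes the matching series from the index:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-- remove points for one host within a time range
DELETE FROM cpu WHERE host = 'serverA' AND time &amp;gt;= '2017-11-01T00:00:00Z' AND time &amp;lt; '2017-11-02T00:00:00Z'

-- remove all points for the matching series and drop them from the index
DROP SERIES FROM cpu WHERE host = 'serverA'&lt;/code&gt;&lt;/pre&gt;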

&lt;p&gt;Deleting data from a columnar database is expensive. InfluxDB organizes data on disk into immutable runs of values for a single column of a series. A delete operation needs to undo a lot of that work for a subset of points.&lt;/p&gt;

&lt;p&gt;In InfluxDB, deleting a row from the database produces a tombstone. A tombstone includes a &lt;code&gt;series key&lt;/code&gt; and the min and max time of the deleted range. This allows a very compact expression for the primary delete use case: delete all data for an invalid series between times &lt;em&gt;t1&lt;/em&gt; and &lt;em&gt;t2&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;When sufficient tombstones collect, TSM data is re-compacted into a new immutable file with the deleted data removed and tombstone records deleted. At query time, tombstones are checked to avoid processing data marked as deleted.&lt;/p&gt;

&lt;p&gt;Over the last six months, substantial work has gone into making tombstone management, compaction based on accumulated deletes, and index updates after deletes correct and efficient.&lt;/p&gt;
&lt;h2 id="updatingpoints"&gt;Updating Points&lt;/h2&gt;
&lt;p&gt;InfluxDB does not support an &lt;code&gt;UPDATE&lt;/code&gt; statement. However, re-inserting a fully qualified &lt;code&gt;series key&lt;/code&gt; at an existing timestamp will replace the old point’s &lt;code&gt;field value&lt;/code&gt; with the new &lt;code&gt;field value&lt;/code&gt;.&lt;/p&gt;
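
&lt;p&gt;A quick CLI sketch (hypothetical measurement): writing the same series key at an existing timestamp replaces the field value, leaving a single point:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;insert cpu,host=serverA usage=0.64 1510620507000000000
insert cpu,host=serverA usage=0.99 1510620507000000000
select usage from cpu where host = 'serverA'
-- one point is returned, with usage=0.99&lt;/code&gt;&lt;/pre&gt;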
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Hopefully this post has added to your mental model of InfluxDB. It discusses four key concepts:&lt;/p&gt;
&lt;ul&gt;
 	&lt;li&gt;&lt;code&gt;series&lt;/code&gt; and &lt;code&gt;tagsets&lt;/code&gt; are indexed for query planning.&lt;/li&gt;
 	&lt;li&gt;Query planning uses the index to identify series to scan.&lt;/li&gt;
 	&lt;li&gt;Query planning generates and executes a tree of iterators.&lt;/li&gt;
 	&lt;li&gt;DELETE and DROP statements are part of &lt;code&gt;influxql&lt;/code&gt; and result in tombstones to annotate deleted data.&lt;/li&gt;
&lt;/ul&gt;
</description>
      <pubDate>Mon, 27 Nov 2017 14:17:08 -0700</pubDate>
      <link>https://www.influxdata.com/blog/influxdb-internals-101-part-two/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/influxdb-internals-101-part-two/</guid>
      <category>Product</category>
      <category>Developer</category>
      <author>Ryan Betts (InfluxData)</author>
    </item>
    <item>
      <title>InfluxDB 1.4 Now Available: InfluxQL Enhancements, Prometheus Read/Write, Better Compaction and a Lot More!</title>
      <description>&lt;h2&gt;What’s New in InfluxDB 1.4&lt;/h2&gt;
&lt;p&gt;We are announcing InfluxDB 1.4, now available in open source. This release, unlike our previous releases, is not paired with a corresponding InfluxDB Enterprise release; all of the features and changes described here are available in open source InfluxDB. This blog post is assembled largely from the pull requests and feature descriptions written by the InfluxDB platform team and community members. Thank you!&lt;/p&gt;
&lt;h2 id="influxqlenhancements"&gt;InfluxQL Enhancements&lt;/h2&gt;
&lt;p&gt;InfluxDB 1.4 includes new InfluxQL capabilities to make it easier to explore metadata and understand query execution. We’ve added &lt;code class="language-markup"&gt;SHOW CARDINALITY&lt;/code&gt; queries to make it much easier to query for series cardinality.&lt;/p&gt;

&lt;p&gt;The &lt;code class="language-markup"&gt;SHOW CARDINALITY&lt;/code&gt; commands come in two flavors: estimated and exact. The estimated values are calculated using sketches and are a safe default for all cardinality sizes. The &lt;code class="language-markup"&gt;EXACT&lt;/code&gt; variations count directly from TSM data and are expensive to run for high-cardinality data, so we suggest preferring the estimates. We have also started adding predicate support (&lt;code class="language-markup"&gt;WHERE&lt;/code&gt; clause support) to meta-queries; however, filtering by &lt;code class="language-markup"&gt;time&lt;/code&gt; is currently only supported with TSI. We will continue to improve these capabilities as we develop 1.5.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;SHOW MEASUREMENT CARDINALITY — estimates the cardinality of the measurement set for the current database.

SHOW MEASUREMENT CARDINALITY ON db0 — estimates the cardinality of the measurement set on the provided database.

Note: SHOW MEASUREMENT CARDINALITY also supports GROUP BY tag and WHERE tag. However, when these options are used, the query falls back to an exact count.

SHOW SERIES CARDINALITY — estimates the cardinality of the series set for the current database.

SHOW SERIES CARDINALITY ON db0 — estimates the cardinality of the series set on the provided database.

Note: SHOW SERIES CARDINALITY also supports FROM measurement, GROUP BY tag, WHERE tag, etc. However, when these options are used, the query falls back to an exact count.

SHOW MEASUREMENT EXACT CARDINALITY — counts exactly the number of measurements on the current database.

SHOW SERIES EXACT CARDINALITY — counts exactly the number of series on the current database.

SHOW TAG KEY CARDINALITY — estimates the number of tag keys on the current database. Note: this is currently implemented as an exact count.

SHOW TAG VALUES CARDINALITY WITH KEY = "X" — estimates the number of tag values for the provided tag key, on the current database. Note: this is currently implemented as an exact count.

SHOW TAG KEY EXACT CARDINALITY — counts exactly the number of tag keys on the current database.

SHOW TAG VALUES EXACT CARDINALITY WITH KEY = "X" — counts exactly the number of tag values for the provided tag key, on the current database.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We’ve also added support for &lt;code class="language-markup"&gt;EXPLAIN&lt;/code&gt; and &lt;code class="language-markup"&gt;EXPLAIN ANALYZE&lt;/code&gt; to help understand query costs. &lt;code class="language-markup"&gt;EXPLAIN&lt;/code&gt; parses and plans the query and then prints a summary of estimated costs. Many SQL engines use &lt;code class="language-markup"&gt;EXPLAIN&lt;/code&gt; to show join order, join algorithms, and predicate and expression pushdown. InfluxQL doesn’t support joins. Instead, the cost of a query in InfluxQL is typically a function of total series accessed, number of iterator accesses to a TSM file, and number of TSM blocks that need to be scanned. Consequently, these are the elements of InfluxQL &lt;code class="language-markup"&gt;EXPLAIN&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;&amp;gt; explain select sum(pointReq) from "_internal"."monitor"."write" group by hostname;
QUERY PLAN
------
EXPRESSION: sum(pointReq::integer)
NUMBER OF SHARDS: 2
NUMBER OF SERIES: 2
CACHED VALUES: 110
NUMBER OF FILES: 1
NUMBER OF BLOCKS: 1
SIZE OF BLOCKS: 931&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running &lt;code class="language-markup"&gt;EXPLAIN ANALYZE&lt;/code&gt; executes the query and counts the actual costs during runtime.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;&amp;gt; explain analyze select sum(pointReq) from "_internal"."monitor"."write" group by hostname;
EXPLAIN ANALYZE
-----------
.
└── select
    ├── execution_time: 242.167µs
    ├── planning_time: 2.165637ms
    ├── total_time: 2.407804ms
    └── field_iterators
        ├── labels
        │   └── statement: SELECT sum(pointReq::integer) FROM "_internal"."monitor"."write" GROUP BY hostname
        └── expression
            ├── labels
            │   └── expr: sum(pointReq::integer)
            ├── create_iterator
            │   ├── labels
            │   │   ├── measurement: write
            │   │   └── shard_id: 57
            │   ├── cursors_ref: 1
            │   ├── cursors_aux: 0
            │   ├── cursors_cond: 0
            │   ├── float_blocks_decoded: 0
            │   ├── float_blocks_size_bytes: 0
            │   ├── integer_blocks_decoded: 1
            │   ├── integer_blocks_size_bytes: 931
            │   ├── unsigned_blocks_decoded: 0
            │   ├── unsigned_blocks_size_bytes: 0
            │   ├── string_blocks_decoded: 0
            │   ├── string_blocks_size_bytes: 0
            │   ├── boolean_blocks_decoded: 0
            │   ├── boolean_blocks_size_bytes: 0
            │   └── planning_time: 1.401099ms
            └── create_iterator
                ├── labels
                │   ├── measurement: write
                │   └── shard_id: 58
                ├── cursors_ref: 1
                ├── cursors_aux: 0
                ├── cursors_cond: 0
                ├── float_blocks_decoded: 0
                ├── float_blocks_size_bytes: 0
                ├── integer_blocks_decoded: 0
                ├── integer_blocks_size_bytes: 0
                ├── unsigned_blocks_decoded: 0
                ├── unsigned_blocks_size_bytes: 0
                ├── string_blocks_decoded: 0
                ├── string_blocks_size_bytes: 0
                ├── boolean_blocks_decoded: 0
                ├── boolean_blocks_size_bytes: 0
                └── planning_time: 76.192µs&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For the moment, these statistics are provided to help users and our support team understand the “cost” of the queries being executed. It helps to explain what the query engine is actually doing and hopefully provides greater insight into the data set being accessed. Unfortunately, there isn’t much more you can do to act on these results other than confirming that the number of series being accessed makes sense based on expected results. But we believe the insight is useful by itself.&lt;/p&gt;
&lt;h2 id="supportforprometheusreadandwriteendpoints"&gt;Support for Prometheus Read and Write Endpoints&lt;/h2&gt;
&lt;p&gt;As &lt;a href="https://w2.influxdata.com/blog/influxdb-now-supports-prometheus-remote-read-write-natively/"&gt;announced&lt;/a&gt; earlier, we added Prometheus read and write endpoints. These have been available on &lt;code class="language-markup"&gt;master&lt;/code&gt; for a while and are shipping with InfluxDB 1.4.&lt;/p&gt;
&lt;h2 id="compactionperformanceimprovements"&gt;Compaction Performance Improvements&lt;/h2&gt;
&lt;p&gt;TSM compactions have been improved in a few areas: performance, scheduling, and observability. Performance has improved to better handle higher cardinalities within a shard. These changes include using off-heap memory for TSM indexes, disk-based index buffering when creating TSM files, and reductions in allocations. They should reduce GC pressure, which lowers CPU and memory utilization in almost all cases, and they prevent OOMs caused by very-high-cardinality compactions.&lt;/p&gt;

&lt;p&gt;Compaction scheduling has been improved to better coordinate resources across shards and adapt to changing workloads. Previously, each shard scheduled and limited its compactions independently. The scheduler now uses a weighted-queue approach instead of fixed scheduling limits and coordinates better across shards, allowing higher-priority work to take advantage of available cores more dynamically. The &lt;code class="language-markup"&gt;max-concurrent-compactions&lt;/code&gt; limit added in 1.3 is now enabled by default to limit compactions to 50% of available cores. This better controls memory and CPU utilization when many shards are active.&lt;/p&gt;

&lt;p&gt;Monitoring of compactions now includes a metric for the depth of the queue at each level. Each shard exposes gauge-style metrics, such as &lt;code class="language-markup"&gt;tsmLevel3CompactionQueue&lt;/code&gt;, that indicate how long the queue is for that level and shard. The sum across all levels in a shard indicates whether compactions are backing up and you may need more CPU cores. The combination of the &lt;code class="language-markup"&gt;*Active&lt;/code&gt;, &lt;code class="language-markup"&gt;*Err&lt;/code&gt;, and &lt;code class="language-markup"&gt;*Queue&lt;/code&gt; metrics provides basic utilization, saturation, and error (USE) metrics. The existing &lt;code class="language-markup"&gt;*Duration&lt;/code&gt; metrics can be used to monitor compaction latencies if you follow the four golden signals approach to monitoring.&lt;/p&gt;
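
&lt;p&gt;These per-shard statistics can also be inspected with a meta-query (the &lt;em&gt;tsm1_engine&lt;/em&gt; module name here is an assumption; check the &lt;code class="language-markup"&gt;SHOW STATS&lt;/code&gt; output on your build):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;SHOW STATS FOR 'tsm1_engine'&lt;/code&gt;&lt;/pre&gt;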
&lt;h2 id="clientandhttpenhancements"&gt;Client and HTTP Enhancements&lt;/h2&gt;
&lt;p&gt;There are a number of features that make using the HTTP interface easier.&lt;/p&gt;

&lt;p&gt;HTTP responses to the &lt;code class="language-markup"&gt;/query&lt;/code&gt; endpoint no longer force &lt;code class="language-markup"&gt;Connection: close&lt;/code&gt;. This allows re-use of HTTP connections by clients. The issue &lt;a href="https://github.com/influxdata/influxdb/issues/8525"&gt;#8525&lt;/a&gt; includes useful discussion of the change.&lt;/p&gt;

&lt;p&gt;InfluxDB HTTP responses now include the InfluxDB version in the header &lt;code class="language-markup"&gt;X-Influxdb-Build&lt;/code&gt; for applications that need to distinguish database versions. Internally, Chronograf will use this to more easily manage combinations of open source and enterprise InfluxDB instances.&lt;/p&gt;

&lt;p&gt;Errors from queries and writes are now available via the &lt;code class="language-markup"&gt;X-InfluxDB-Error&lt;/code&gt; header, and 5xx error messages are written to server logs when &lt;code class="language-markup"&gt;log-enabled = true&lt;/code&gt; is set in the &lt;code class="language-markup"&gt;[httpd]&lt;/code&gt; configuration section.&lt;/p&gt;

&lt;p&gt;InfluxDB now honors the &lt;code class="language-markup"&gt;X-Request-Id&lt;/code&gt; header so that callers can pass a correlation id as part of the request. HTTP responses populate both &lt;code class="language-markup"&gt;X-Request-Id&lt;/code&gt; and &lt;code class="language-markup"&gt;Request-Id&lt;/code&gt; to maintain backwards compatibility with previous versions and to support the more common &lt;code class="language-markup"&gt;X-Request-Id&lt;/code&gt; header name. More details are recorded in the &lt;a href="https://github.com/influxdata/influxdb/pull/8619"&gt;pull request&lt;/a&gt;.&lt;/p&gt;
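
&lt;p&gt;A minimal sketch with &lt;code class="language-markup"&gt;curl&lt;/code&gt;, assuming an InfluxDB instance on the default port; the correlation id is echoed back in the response headers:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;curl -i -G 'http://localhost:8086/query' \
  -H 'X-Request-Id: my-correlation-id' \
  --data-urlencode 'q=SHOW DATABASES'&lt;/code&gt;&lt;/pre&gt;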

&lt;p&gt;Finally, thanks to @emluque, the InfluxDB CLI now supports Ctrl+C to cancel a running query.&lt;/p&gt;
&lt;h2 id="messagepackformatsforresponses"&gt;Message Pack formats for responses&lt;/h2&gt;
&lt;p&gt;Message pack can now be used for responses by setting &lt;code class="language-markup"&gt;application/x-msgpack&lt;/code&gt; in the &lt;code class="language-markup"&gt;Accept&lt;/code&gt; header. The server will respond with message pack serialized responses.&lt;/p&gt;
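
&lt;p&gt;For example, again assuming a local instance on the default port:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;curl -G 'http://localhost:8086/query' \
  -H 'Accept: application/x-msgpack' \
  --data-urlencode 'q=SELECT * FROM cpu LIMIT 1'&lt;/code&gt;&lt;/pre&gt;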
&lt;h2 id="experimentalandpreviewfeatures"&gt;Experimental and Preview Features&lt;/h2&gt;
&lt;h3 id="tsiprogress"&gt;TSI Progress&lt;/h3&gt;
&lt;p&gt;A lot of work has gone into TSI over the course of developing 1.4. However, we are not yet ready to release TSI as the default production index. We’ll be writing more about our TSI progress as we work on 1.5.&lt;/p&gt;

&lt;p&gt;This release does include a utility to generate TSI indexes from TSM data. This allows TSI indexes to be rebuilt even when they are larger than in-memory support would allow; it also allows building TSI indexes for older shards for experimentation.&lt;/p&gt;

&lt;p&gt;Further description and a usage example are available in the pull request: &lt;a href="https://github.com/influxdata/influxdb/pull/8669"&gt;#8669&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="previewofuint64support"&gt;Preview of uint64 Support&lt;/h3&gt;
&lt;p&gt;We have added unsigned 64-bit integer (aka &lt;code class="language-markup"&gt;uint64&lt;/code&gt;) support that can be enabled with an InfluxDB build flag. We are leaving this behind a build flag until we implement &lt;code class="language-markup"&gt;uint64&lt;/code&gt; support through the rest of the TICK stack. Telegraf, Chronograf, and Kapacitor do not yet support this field type, and there are some client libraries where &lt;code class="language-markup"&gt;uint64&lt;/code&gt; values cannot be naturally expressed.&lt;/p&gt;

&lt;p&gt;To enable &lt;code class="language-markup"&gt;uint64&lt;/code&gt;, build InfluxDB with &lt;code class="language-markup"&gt;go install -tags uint64 ./...&lt;/code&gt;. Write &lt;code class="language-markup"&gt;uint64&lt;/code&gt; values by suffixing an integer with &lt;code class="language-markup"&gt;u&lt;/code&gt; in the write protocol.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;create database u64ex
use u64ex
Using database u64ex
insert cpu v1=18446744073709551615u
select v1 from cpu
name: cpu
time                v1
----                --
1510620507267476000 18446744073709551615&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id="ifqlprototypeinterfaces"&gt;IFQL Prototype Interfaces&lt;/h3&gt;
&lt;p&gt;InfluxDB OSS 1.4 includes the prototype RPC interface to support &lt;a href="https://w2.influxdata.com/blog/announcing-ifql-a-new-query-language-and-engine-for-influxdb/"&gt;IFQL&lt;/a&gt;. This API &lt;em&gt;will change&lt;/em&gt; as we advance the IFQL prototype and we are not establishing any compatibility promises for this new interface. However, you can enable and access storage if you want to explore the interface as an access point to the database. An &lt;code class="language-markup"&gt;ifql&lt;/code&gt; section is now available on the configuration file.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-markup"&gt;[ifql]
# Determines whether the RPC service is enabled.
enabled = true
# Determines whether additional logging is enabled.
log-enabled = true
# The bind address used by the ifql RPC service.
bind-address = ":8082"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The IFQL RPC interface is protobuf based; the protobuf file is available for your coding pleasure.&lt;/p&gt;
&lt;h2 id="otherchangelogtidbits"&gt;Other Changelog Tidbits&lt;/h2&gt;
&lt;ul&gt;
 	&lt;li&gt;&lt;a href="https://github.com/influxdata/influxdb/issues/8426"&gt;#8426&lt;/a&gt;: Add &lt;code class="language-markup" style="white-space: pre-wrap;"&gt;parse-multivalue-plugin&lt;/code&gt; to allow users to choose how multivalue plugins should be handled by the &lt;code class="language-markup" style="white-space: pre-wrap;"&gt;collectd&lt;/code&gt; service.&lt;/li&gt;
 	&lt;li&gt;&lt;a href="https://github.com/influxdata/influxdb/issues/8548"&gt;#8548&lt;/a&gt;: Allow panic recovery to be disabled when investigating server issues.&lt;/li&gt;
 	&lt;li&gt;&lt;a href="https://github.com/influxdata/influxdb/pull/8592"&gt;#8592&lt;/a&gt;: Mutex profiles are now available.&lt;/li&gt;
 	&lt;li&gt;&lt;a href="https://github.com/influxdata/influxdb/pull/8854"&gt;#8854&lt;/a&gt;: Report the task status for a query.&lt;/li&gt;
 	&lt;li&gt;&lt;a href="https://github.com/influxdata/influxdb/issues/8830"&gt;#8830&lt;/a&gt;: Separate importer log statements to stdout and stderr.&lt;/li&gt;
 	&lt;li&gt;&lt;a href="https://github.com/influxdata/influxdb/issues/8690"&gt;#8690&lt;/a&gt;: Implicitly decide on a lower limit for fill queries when none is present.&lt;/li&gt;
&lt;/ul&gt;
</description>
      <pubDate>Tue, 14 Nov 2017 04:00:56 -0700</pubDate>
      <link>https://www.influxdata.com/blog/whats-new-influxdb-oss-1-4/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/whats-new-influxdb-oss-1-4/</guid>
      <category>Product</category>
      <category>Developer</category>
      <author>Ryan Betts (InfluxData)</author>
    </item>
    <item>
      <title>InfluxDB Internals 101 - Part One</title>
      <description>&lt;p&gt;Paul Dix led a series of internal InfluxDB 101 sessions to teach newcomers InfluxDB internals. I learned a lot from the talks and want to share the content with the community. I’m also writing this to organize my own understanding of InfluxDB and to perhaps help others who want to learn how InfluxDB is architected. A lot of this information is gathered from InfluxDB documentation as well — the goal with this series is to present a consolidated overview of the InfluxDB architecture.&lt;/p&gt;

&lt;p&gt;There’s a lot to digest so it’s presented in three parts. This first post explains the data model and the write path. Post two explains the query path. Post three explains InfluxDB Enterprise clustering.&lt;/p&gt;
&lt;h2 id="seriestableofcontents"&gt;Series Table of Contents&lt;/h2&gt;
&lt;ol&gt;
 	&lt;li&gt;Data model and write path: adding data to InfluxDB
&lt;ul&gt;
 	&lt;li&gt;Data model terminology&lt;/li&gt;
 	&lt;li&gt;Receiving points from clients&lt;/li&gt;
 	&lt;li&gt;Persisting points to storage&lt;/li&gt;
 	&lt;li&gt;Compacting persisted points&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
 	&lt;li&gt;Query path: reading data from InfluxDB
&lt;ul&gt;
 	&lt;li&gt;Indexing points for query&lt;/li&gt;
 	&lt;li&gt;A note on TSI (on disk indexes)&lt;/li&gt;
 	&lt;li&gt;Parsing and planning&lt;/li&gt;
 	&lt;li&gt;Executing queries&lt;/li&gt;
 	&lt;li&gt;A note on IFQL&lt;/li&gt;
 	&lt;li&gt;DELETE and DROP - removing data from InfluxDB&lt;/li&gt;
 	&lt;li&gt;Updating points&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
 	&lt;li&gt;Clustering: InfluxDB Enterprise
&lt;ul&gt;
 	&lt;li&gt;Understanding the meta-service&lt;/li&gt;
 	&lt;li&gt;Understanding data-nodes&lt;/li&gt;
 	&lt;li&gt;Understanding data distribution and replication&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="datamodelandwritepathaddingdatatoinfluxdb"&gt;Data model and write path: adding data to InfluxDB&lt;/h2&gt;
&lt;h3 id="datamodelandterminology"&gt;Data model and terminology&lt;/h3&gt;
&lt;p&gt;An InfluxDB database stores &lt;code&gt;points&lt;/code&gt;. A point has four components: a &lt;code&gt;measurement&lt;/code&gt;, a &lt;code&gt;tagset&lt;/code&gt;, a &lt;code&gt;fieldset&lt;/code&gt;, and a &lt;code&gt;timestamp&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;measurement&lt;/code&gt; provides a way to associate related points that might have different &lt;code&gt;tagsets&lt;/code&gt; or &lt;code&gt;fieldsets&lt;/code&gt;. The &lt;code&gt;tagset&lt;/code&gt; is a dictionary of key-value pairs to store metadata with a point. The &lt;code&gt;fieldset&lt;/code&gt; is a set of typed scalar values — the data being recorded by the point.&lt;/p&gt;

&lt;p&gt;The serialization format for points is defined by the &lt;a href="https://docs.influxdata.com/influxdb/v1.8/write_protocols/"&gt;line protocol&lt;/a&gt; (which includes additional examples and explanations if you’d like to read more detail). An example point from the specification helps to explain the terminology:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;temperature,machine=unit42,type=assembly internal=32,external=100 1434055562000000035&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;measurement&lt;/code&gt; is &lt;em&gt;temperature&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;tagset&lt;/code&gt; is &lt;em&gt;machine=unit42,type=assembly&lt;/em&gt;. The keys, &lt;em&gt;machine&lt;/em&gt; and &lt;em&gt;type&lt;/em&gt;, in the &lt;code&gt;tagset&lt;/code&gt; are called &lt;code&gt;tag keys&lt;/code&gt;. The values, &lt;em&gt;unit42&lt;/em&gt; and &lt;em&gt;assembly&lt;/em&gt;, in the &lt;code&gt;tagset&lt;/code&gt; are called &lt;code&gt;tag values&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;fieldset&lt;/code&gt; is &lt;em&gt;internal=32,external=100&lt;/em&gt;. The keys, &lt;em&gt;internal&lt;/em&gt; and &lt;em&gt;external&lt;/em&gt;, in the &lt;code&gt;fieldset&lt;/code&gt; are called &lt;code&gt;field keys&lt;/code&gt;. The values, &lt;em&gt;32&lt;/em&gt; and &lt;em&gt;100&lt;/em&gt;, in the &lt;code&gt;fieldset&lt;/code&gt; are called &lt;code&gt;field values&lt;/code&gt;.&lt;/p&gt;
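&lt;p&gt;To make the terminology concrete, here is a minimal sketch in Python that splits the example point above into its four components. It handles only this simple case (no escaped commas, spaces, or quoted strings) — the real line protocol parser is considerably more involved.&lt;/p&gt;

```python
def parse_point(line):
    # Simple case only: no escaped commas, spaces, or quoted field values.
    tags_part, fields_part, ts = line.split(" ")
    measurement, _, tagset = tags_part.partition(",")
    parse_kv = lambda s: dict(kv.split("=") for kv in s.split(","))
    return {
        "measurement": measurement,
        "tagset": parse_kv(tagset),
        "fieldset": parse_kv(fields_part),
        "timestamp": int(ts),
    }

point = parse_point(
    "temperature,machine=unit42,type=assembly internal=32,external=100 1434055562000000035"
)
```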

&lt;p&gt;Each point is stored within exactly one &lt;code&gt;database&lt;/code&gt; within exactly one &lt;code&gt;retention policy&lt;/code&gt;. A &lt;code&gt;database&lt;/code&gt; is a container for users, retention policies, and points. A &lt;code&gt;retention policy&lt;/code&gt; configures how long InfluxDB keeps points (duration), how many copies of those points are stored in the cluster (replication factor), and the time range covered by shard groups (shard group duration). The &lt;code&gt;retention policy&lt;/code&gt; makes it easy for users (and efficient for the database) to drop older data that is no longer needed. This is a common pattern in time series applications.&lt;/p&gt;

&lt;p&gt;We’ll explain &lt;code&gt;replication factor&lt;/code&gt;, &lt;code&gt;shard groups&lt;/code&gt;, and &lt;code&gt;shards&lt;/code&gt; later when we describe how the write path works in InfluxDB.&lt;/p&gt;

&lt;p&gt;There’s one additional term that we need to get started: &lt;code&gt;series&lt;/code&gt;. A series is a group of points that share a &lt;code&gt;measurement&lt;/code&gt; + &lt;code&gt;tagset&lt;/code&gt; + &lt;code&gt;field key&lt;/code&gt;.&lt;/p&gt;
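&lt;p&gt;A small sketch of how series keys could be derived from a set of points (the &lt;code&gt;#&lt;/code&gt; separator between tagset and field key is an illustrative choice here, not InfluxDB's internal encoding):&lt;/p&gt;

```python
def series_keys(points):
    # One series per unique measurement + tagset + field key combination.
    # The "#" separator is an illustrative choice, not the real encoding.
    keys = set()
    for p in points:
        tag_str = ",".join("%s=%s" % kv for kv in sorted(p["tagset"].items()))
        for field_key in p["fieldset"]:
            keys.add("%s,%s#%s" % (p["measurement"], tag_str, field_key))
    return sorted(keys)

points = [
    {"measurement": "temperature",
     "tagset": {"machine": "unit42", "type": "assembly"},
     "fieldset": {"internal": 32, "external": 100}},
]
keys = series_keys(points)
```

The single point above belongs to two series, one per field key.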

&lt;p&gt;You can refer to the &lt;a href="https://docs.influxdata.com/influxdb/v1.8/concepts/glossary/"&gt;documentation glossary&lt;/a&gt; for these terms or others that might be used in this blog post series.&lt;/p&gt;
&lt;h3 id="receivingpointsfromclients"&gt;Receiving Points from Clients&lt;/h3&gt;
&lt;p&gt;Clients POST points (in line protocol format) to InfluxDB’s HTTP &lt;code&gt;/write&lt;/code&gt; endpoint. Points can be sent individually; however, for efficiency, most applications send points in batches. A typical batch ranges in size from hundreds to thousands of points. The POST specifies a database and an optional retention policy via query parameters. If the retention policy is not specified, the default retention policy is used. All points in the body will be written to that database and retention policy. Points in a POST body can be from an arbitrary number of series; points in a batch do not have to be from the same measurement or tagset.&lt;/p&gt;
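&lt;p&gt;For illustration, here is a sketch that assembles such a batched write request (the host and database names are hypothetical; nothing is actually sent):&lt;/p&gt;

```python
from urllib.parse import urlencode

def build_write_request(host, database, points, retention_policy=None):
    # Points from any mix of series can share one batch; the database and
    # (optional) retention policy are passed as query parameters.
    params = {"db": database}
    if retention_policy:
        params["rp"] = retention_policy
    url = "http://%s/write?%s" % (host, urlencode(params))
    body = "\n".join(points)  # one line-protocol point per line
    return url, body

url, body = build_write_request(
    "localhost:8086", "mydb",
    ["temperature,machine=unit42 internal=32 1434055562000000035",
     "pressure,machine=unit42 psi=7 1434055562000000036"],
    retention_policy="two_weeks")
```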

&lt;p&gt;When the database receives new points, it must (1) make those points durable so that they can be recovered in case of a database or server crash and (2) make the points queryable. This post focuses on the first half, making points durable.&lt;/p&gt;
&lt;h3 id="persistingpointstostorage"&gt;Persisting Points to Storage&lt;/h3&gt;
&lt;p&gt;To make points durable, each batch is written and &lt;code&gt;fsynced&lt;/code&gt; to a write ahead log (&lt;code&gt;WAL&lt;/code&gt;). The &lt;code&gt;WAL&lt;/code&gt; is an append only file that is only read during a database recovery. For space and disk IO efficiency, each batch in the &lt;code&gt;WAL&lt;/code&gt; is compressed using &lt;a href="http://google.github.io/snappy/"&gt;snappy compression&lt;/a&gt; before being written to disk.&lt;/p&gt;
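&lt;p&gt;The mechanics can be sketched as a toy append-only log. This sketch uses zlib as a stdlib stand-in for snappy, and length-prefixed records so the file can be replayed on recovery:&lt;/p&gt;

```python
import os
import struct
import tempfile
import zlib

class SimpleWAL:
    # Toy write-ahead log: each batch is compressed (zlib here, standing in
    # for snappy) and appended to the file as a length-prefixed record.
    def __init__(self, path):
        self.path = path

    def append(self, batch_bytes):
        record = zlib.compress(batch_bytes)
        with open(self.path, "ab") as f:
            f.write(struct.pack("I", len(record)))
            f.write(record)
            f.flush()
            os.fsync(f.fileno())  # durable before the write is acknowledged

    def replay(self):
        # Only read during recovery: walk the records front to back.
        batches = []
        with open(self.path, "rb") as f:
            header = f.read(4)
            while header:
                (size,) = struct.unpack("I", header)
                batches.append(zlib.decompress(f.read(size)))
                header = f.read(4)
        return batches

wal = SimpleWAL(os.path.join(tempfile.mkdtemp(), "wal.bin"))
wal.append(b"temperature,machine=unit42 internal=32 1434055562000000035")
wal.append(b"pressure,machine=unit42 psi=7 1434055562000000036")
```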

&lt;p&gt;While the &lt;code&gt;WAL&lt;/code&gt; format efficiently makes incoming data durable, it is an exceedingly poor format for reading — making it unsuitable for supporting queries. To make new data immediately queryable, incoming points are also written to an in-memory &lt;code&gt;cache&lt;/code&gt;. The &lt;code&gt;cache&lt;/code&gt; is an in-memory data structure that is optimized for query and insert performance. The &lt;code&gt;cache&lt;/code&gt; data structure is a map of &lt;code&gt;series&lt;/code&gt; to a time-sorted list of fields.&lt;/p&gt;
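&lt;p&gt;A toy version of that cache structure might look like the following: a map from series key to a list kept in timestamp order, supporting time-range scans. This is a sketch of the idea, not InfluxDB's actual implementation:&lt;/p&gt;

```python
import bisect

class Cache:
    # Toy cache: series key mapped to a time-sorted list of (timestamp, value).
    def __init__(self):
        self.series = {}

    def insert(self, series_key, timestamp, value):
        entries = self.series.setdefault(series_key, [])
        bisect.insort(entries, (timestamp, value))  # keep time order on insert

    def scan(self, series_key, t_min, t_max):
        # Return (timestamp, value) pairs with t_min up to (excluding) t_max.
        entries = self.series.get(series_key, [])
        lo = bisect.bisect_left(entries, (t_min,))
        hi = bisect.bisect_left(entries, (t_max,))
        return entries[lo:hi]

cache = Cache()
cache.insert("temperature#internal", 30, 33)  # out-of-order arrival is fine
cache.insert("temperature#internal", 10, 31)
cache.insert("temperature#internal", 20, 32)
```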

&lt;p&gt;The &lt;code&gt;WAL&lt;/code&gt; makes new points durable. The &lt;code&gt;cache&lt;/code&gt; makes new points queryable. If the system crashes or shuts down before the &lt;code&gt;cache&lt;/code&gt; is written to &lt;code&gt;TSM&lt;/code&gt; files, it is rebuilt when the database starts by reading and replaying the batches stored in the &lt;code&gt;WAL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The combination of &lt;code&gt;WAL&lt;/code&gt; and &lt;code&gt;cache&lt;/code&gt; works well for incoming data but is insufficient for long-term storage. Since the &lt;code&gt;WAL&lt;/code&gt; must be replayed on startup, it is important to constrain it to a reasonable size. The &lt;code&gt;cache&lt;/code&gt; is limited to the size of RAM, which is also undesirable for many time series use cases. Consequently, data needs to be organized and written to long-term storage blocks on disk that are size-efficient (so that the database can store a lot of points) and efficient for query.&lt;/p&gt;

&lt;p&gt;Time series queries are frequently aggregations over time — scans of points within a bounded time range that are then reduced by a summary function like mean, max, or moving windows. Columnar database storage techniques, where data is organized on disk by column and not by row, fit this query pattern nicely. Additionally, columnar systems compress data exceptionally well, satisfying the need to store data efficiently. There is a lot of literature on column stores. &lt;a href="https://searchdatamanagement.techtarget.com/definition/columnar-database"&gt;Column-oriented Database Systems&lt;/a&gt; is one such overview.&lt;/p&gt;
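&lt;p&gt;One reason columns compress so well is worth a quick illustration. When a run of timestamps for one series is stored together, each value can be stored as a small delta from its predecessor — a simplified sketch of the kind of encoding a columnar time series store can apply (the real TSM encodings are more sophisticated):&lt;/p&gt;

```python
def delta_encode(timestamps):
    # Store the first timestamp, then only the gap to each successor.
    # Regular collection intervals produce long runs of tiny, repeated deltas,
    # which compress far better than the raw absolute values.
    deltas = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out

ts = [1000, 1010, 1020, 1030, 1045]
encoded = delta_encode(ts)
```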

&lt;p&gt;Time series applications often evict data from storage after a period of time. Many monitoring applications, for example, will store the last month or two of data online to support monitoring queries. It needs to be efficient to remove data from the database if a configured time-to-live expires. Deleting points from columnar storage is expensive, so InfluxDB additionally organizes its columnar format into time-bounded chunks. When the time-to-live expires, the time-bounded file can simply be deleted from the filesystem rather than requiring a large update to persisted data.&lt;/p&gt;

&lt;p&gt;Finally, when InfluxDB is run as a clustered system, it replicates data across multiple servers for availability and durability in case of failures.&lt;/p&gt;

&lt;p&gt;The optional time-to-live duration, the granularity of time blocks within the time-to-live period, and the number of replicas are configured using an InfluxDB &lt;code&gt;retention policy&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE RETENTION POLICY &amp;lt;retention_policy_name&amp;gt; ON &amp;lt;database_name&amp;gt; DURATION &amp;lt;duration&amp;gt; REPLICATION &amp;lt;n&amp;gt; [SHARD DURATION &amp;lt;duration&amp;gt;] [DEFAULT]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;duration&lt;/code&gt; is the optional time to live (if data should not expire, set &lt;code&gt;duration&lt;/code&gt; to &lt;code&gt;INF&lt;/code&gt;). &lt;code&gt;SHARD DURATION&lt;/code&gt; is the granularity of data within the expiration period. For example, a one-hour &lt;code&gt;shard duration&lt;/code&gt; with a 24-hour &lt;code&gt;duration&lt;/code&gt; configures the database to store 24 one-hour shards. Each hour, the oldest shard is expired (removed) from the database. Set &lt;code&gt;REPLICATION&lt;/code&gt; to configure the replication factor — how many copies of a shard should exist within a cluster.&lt;/p&gt;
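&lt;p&gt;The bucketing and expiry arithmetic can be sketched in a few lines — this is an illustration of the idea, not the actual retention enforcement code:&lt;/p&gt;

```python
import bisect

HOUR = 3600  # seconds; timestamps here are in seconds for readability

def shard_group_for(timestamp, shard_duration):
    # Integer-divide the timestamp into a fixed-width time bucket.
    start = (timestamp // shard_duration) * shard_duration
    return (start, start + shard_duration)

def expired_shards(sorted_starts, now, duration, shard_duration):
    # A shard is droppable once its whole time range ends at or before
    # now - duration, i.e. once its start is at or before
    # now - duration - shard_duration. Dropping a shard is a cheap file
    # delete rather than a point-by-point update.
    cutoff_start = now - duration - shard_duration
    idx = bisect.bisect_right(sorted_starts, cutoff_start)
    return sorted_starts[:idx]

# 26 hourly shards, a 24-hour duration: the two oldest are droppable.
starts = [h * HOUR for h in range(26)]
dropped = expired_shards(starts, now=26 * HOUR,
                         duration=24 * HOUR, shard_duration=HOUR)
```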

&lt;p&gt;Concretely, the database creates this physical organization of data on disk:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Database directory            /db
    Retention Policy directory    /db/rp
        Shard Group (time-bounded, logical)
            Shard directory       /db/rp/Id#
                TSM0001.tsm (data file)
                TSM0002.tsm (data file)
                …
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The in-memory &lt;code&gt;cache&lt;/code&gt; is flushed to disk in the TSM format. When the flush completes, flushed points are removed from the &lt;code&gt;cache&lt;/code&gt; and the corresponding &lt;code&gt;WAL&lt;/code&gt; is truncated. (The WAL and cache are also maintained per-shard.) The TSM data files store the columnar-organized points. Once written, a TSM file is immutable. A detailed description of the TSM file layout is available in the InfluxDB documentation.&lt;/p&gt;
&lt;h3 id="compactingtsmdata"&gt;Compacting TSM Data&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;cache&lt;/code&gt; is a relatively small amount of data. The TSM columnar format works best when it can store long runs of values for a series in a single block. A longer run both compresses better and requires fewer seeks when scanning a field for a query. The TSM format is based heavily on log-structured merge-trees. New (&lt;code&gt;level one&lt;/code&gt;) TSM files are generated by cache flushes. These files are later combined (&lt;code&gt;compacted&lt;/code&gt;) into level two files. Level two files are further combined into &lt;code&gt;level three&lt;/code&gt; files. Additional levels of compaction occur as the files become larger and eventually become cold (the time range they cover is no longer hot for writes). The documentation reference above offers a detailed description of compaction.&lt;/p&gt;
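&lt;p&gt;At its core, a compaction is a merge of sorted runs, much like the merge step of an LSM tree. A heavily simplified sketch, treating each TSM file as a list of (series key, timestamp, value) entries sorted by series then time:&lt;/p&gt;

```python
import heapq

def compact(tsm_files):
    # Each input "file" holds (series_key, timestamp, value) entries sorted
    # by series then time; a streaming merge yields one longer sorted run
    # per series, improving compression and sequential scans.
    return list(heapq.merge(*tsm_files))

# Two level-one "files" produced by separate cache flushes.
level1_a = [("temperature#internal", 10, 31),
            ("temperature#internal", 30, 33)]
level1_b = [("temperature#external", 15, 99),
            ("temperature#internal", 20, 32)]
level2 = compact([level1_a, level1_b])
```

The real compactor also rewrites blocks, drops tombstoned data, and re-encodes columns; the merge of sorted runs is just the structural heart of it.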

&lt;p&gt;There’s a lot of logic and sophistication in the TSM compaction code. However, the high-level goal is quite simple: organize values for a series together into long runs to best optimize compression and scanning queries.&lt;/p&gt;
&lt;h2 id="concludingpartone"&gt;Concluding Part One&lt;/h2&gt;
&lt;p&gt;In summary, batches of &lt;code&gt;points&lt;/code&gt; are POSTed to InfluxDB. Those batches are snappy compressed and written to a &lt;code&gt;WAL&lt;/code&gt; for immediate durability. The points are also written to an in-memory &lt;code&gt;cache&lt;/code&gt; so that newly written points are immediately queryable. The &lt;code&gt;cache&lt;/code&gt; is periodically flushed to &lt;code&gt;TSM&lt;/code&gt; files. As &lt;code&gt;TSM&lt;/code&gt; files accumulate, they are combined and &lt;code&gt;compacted&lt;/code&gt; into higher level &lt;code&gt;TSM&lt;/code&gt; files. &lt;code&gt;TSM&lt;/code&gt; data is organized into &lt;code&gt;shards&lt;/code&gt;. The time range covered by a &lt;code&gt;shard&lt;/code&gt; and the replication factor of a &lt;code&gt;shard&lt;/code&gt; in a clustered deployment are configured by the &lt;code&gt;retention policy&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Hopefully this post helps to explain how InfluxDB receives and persists incoming writes. In the next post, we’ll discuss how the system supports query, update, and delete operations.&lt;/p&gt;
</description>
      <pubDate>Fri, 27 Oct 2017 04:00:12 -0700</pubDate>
      <link>https://www.influxdata.com/blog/influxdb-internals-101-part-one/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/influxdb-internals-101-part-one/</guid>
      <category>Product</category>
      <category>Use Cases</category>
      <category>Developer</category>
      <author>Ryan Betts (InfluxData)</author>
    </item>
    <item>
      <title>Why I Joined InfluxData - Ryan Betts</title>
      <description>&lt;p&gt;Ever take a long break and think about changing careers or industries only to realize that you still absolutely love building high-performance databases? I did, and I’m thrilled to have recently joined InfluxData to continue that path. I took seven months off and in early March joined InfluxData to lead the team that builds InfluxDB - the ‘I’ in the &lt;a href="https://w2.influxdata.com/open-source/"&gt;TICK&lt;/a&gt; stack.&lt;/p&gt;

&lt;p&gt;I spent eight years as a founding developer and then CTO at VoltDB building a high-velocity ACID relational database. I’ve thought a lot about high-velocity data, and I sought out and joined InfluxData very intentionally. InfluxData is building the right tools for the right users in the right way.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Making high-velocity data simple&lt;/em&gt; for developers is an unsolved problem that needs to be solved. Real-time metrics, events, and interactions are the heart of large-scale operational workloads and critical to creating value from IoT. Monetizing IoT often requires meaningful real-time action on real-time data. It’s still too hard to get right.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Easy for developers&lt;/em&gt; means thinking about the full stack, not just the database. Teams that set off to build high-performance data applications can quickly lose themselves in the weeds of integration, interoperability, and complex distributed systems management. It is still too hard to do simply.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Open source is the foundation&lt;/em&gt; for modern infrastructure. InfluxData is working hard to build a sustainable business (so we’ll be here for you in the future) around our open source tools and stacks. We’re committed to our community and determined to build the heart of our systems in the open.&lt;/p&gt;

&lt;p&gt;Finally, a talent-driven business must recruit based on skills and experience, not location. InfluxData &lt;em&gt;embraces distributed development teams&lt;/em&gt;. Why should your employer dictate your neighborhood? (&lt;a href="https://w2.influxdata.com/careers/"&gt;Recruiting pitch here&lt;/a&gt;: just email me!)&lt;/p&gt;

&lt;p&gt;I feel a lot of gratitude for the offer to join InfluxData. The people here are amazingly welcoming and focused on the company’s mission. There is a lot of work to do, and I look forward to sharing with you what we learn and build over the next few years.&lt;/p&gt;
</description>
      <pubDate>Thu, 30 Mar 2017 04:01:00 -0700</pubDate>
      <link>https://www.influxdata.com/blog/joining-influxdata/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/joining-influxdata/</guid>
      <category>Developer</category>
      <category>Company</category>
      <author>Ryan Betts (InfluxData)</author>
    </item>
  </channel>
</rss>
