The Chronograf Files: The Curious Case of JavaScript’s `sort`

Will Faurot — April 5, 2016

“The world is full of obvious things which nobody by any chance ever observes.”
— Sir Arthur Conan Doyle, The Hound of the Baskervilles*

It was your average day at the InfluxData offices. The usual sounds filled the air: the clack of a ping pong ball, the ever-present drip of an overworked coffee pot. A day just like any other — or so I thought.

It all started when a new Chronograf ticket came in, where a user had reported their graphs behaving strangely.

The Bug

I give you ‘Exhibit A’, a screenshot submitted by our affected user:

chrono-out-of-order

Though this graph is quite the visual cacophony, the issue was clear immediately: Chronograf was plotting this user’s data out of order. For a product like Chronograf, this is about as critical as a bug can be. Because we’re in the business of time series data, we’d better be sure we’re respecting the laws of physics and that time continues to function as it has since the beginning of, well, time.

At this point, Chronograf had seen several releases over the better part of a year. How had we not seen behavior like this before?

The user also provided us with the sample dataset that resulted in the above graph, hereby dubbed ‘Exhibit B’. The dataset itself was large so I won’t reproduce it in full, but here is a small sample we’ll use for investigative purposes:
`timestamp value
——— ——
100000000 4
200000000 9
300000000 6
400000000 7
500000000 6
600000000 4
700000000 9
800000000 7
900000000 10
1000000000 6
1100000000 4
…`

Note: InfluxDB supports multiple time formats, though in this case we’re using Unix time. Not to spoil too much, but this will be highly relevant soon!

At first glance, the data seems reasonable. The values seem OK, and the timestamps are ordered chronologically. No cause for alarm, right?

The Fix

“Wait, you said `sort` does WHAT??'”
— Anonymous

Now I had exhibit A, a graph which explained how the bug affected the end-user. And I had exhibit B, a sample dataset I used to accurately reproduce the issue.

The next step was to ask myself a few questions which I’ve found helpful when beginning the debugging process:

1) What sections of the code does this bug touch? Who is the most likely offender?

I immediately thought of a piece of code that converts raw InfluxDB responses, which use JSON formatting, into data structures that our graphing library understands. As far as I could tell, the sample dataset was ordered correctly. It had to be something farther down the line.

2) Are there any obvious anomalies?

As far as plotting data chronologically, Chronograf seemed to function properly in all other known cases. Given the graph we saw earlier, it’s also the kind of bug you feel like you’d tend to notice.

Here is where it starts to get good: the test data was noteworthy because the first point starts on day one of what Unix considers to be the dawn of the universe. Unix timestamps represent the amount of time (usually in seconds or milliseconds) that has passed since 00:00:00 UTC, January 1, 1970.

For example, the ‘Unix time’ as of me finishing this sentence is `1458851975430`. On its own, maybe this isn’t a particularly exciting or interesting detail. It’s test data, who cares if we’re plotting points from 1970 or 2016? We do, as it turns out!

I took a hard look at the data after it had gone through the conversion process, immediately before it made its way into our graphing library. I confirmed it was out of order. Using the same small subset of data from the last section, here’s a view of the test data as it appeared to me:

`timestamp value
——— ——
100000000 4
1000000000 6
1100000000 4
200000000 9
300000000 6
400000000 7
500000000 6
600000000 4
700000000 9
800000000 7
900000000 10
…`

Wait, what? Why on earth is `1000000000` (9 zeroes) appearing before `200000000` (8 zeroes)? This new insight was enough for me locate the perpetrator, a line that looked something like this:

`// allTimestamps is a list of unique timestamps,
// e.g. [100000000, 200000000, 300000000 …]
allTimestamps.sort()`

I remembered why this line was initially added: if for any reason the data came in to the converter out of order, or merging multiple series was required, we wanted to ensure that our data was chronological before trying to plot it on a graph.

And suddenly it all made sense. Let’s try something…

If you’re on a device with a keyboard, open your browser’s developer console. On Chrome you can choose View -> Developer -> JavaScript Console and type in the following and hit enter: `[2, 10, 5].sort()`

Does the result surprise you at all? I imagine it does, as this is the output you should have seen:

`[2, 10, 5].sort()
// => [10, 2, 5]`

It boils down to this: JavaScript’s built-in `sort` function, by default, sorts its items ‘lexicographically’, i.e. alphabetically. ‘1’ comes before ‘2’ in the alphabet according to JavaScript. From this perspective, the output we just saw makes sense. Is this the default behavior we expected when sorting a list of numbers? Not in the slightest.

The fix itself was simple. Instead of using `sort` without any arguments, we provided what Mozilla calls a `compareFunction`

If you ran the previous example in your browser’s console, give this one a try:

`[2, 10, 5].sort((a, b) => a – b)`

Better, right?

OK, now we now how `sort` works. We know what to fix, but we still haven’t fully answered our original question: Why did we have to wait so long for a bug like this to make an appearance? The answer lies in the number of significant digits for each timestamp. With the provided test data, we had multiple rollovers where the timestamps not only grew in numerical size, they grew enough to require another significant digit to represent them.

We can illustrate this by using the test data, and it only requires a few points:

`timestamp value
——— ——
800000000 4
900000000 6
1000000000 4`

Try sorting these timestamps without a `compareFunction` and see the results:

`[800000000, 900000000, 1000000000].sort()
// => [1000000000, 800000000, 900000000]`

`1000000000` appears at the beginning of the list, which is technically correct according to JavaScript. But because it has an extra digit, it results in non-chronological data. Most of the data being fed into Chronograf is recent, typically from the last several years. Enough time has passed since 1970 (the beginning of Unix time) to the point where it will be quite a while before Unix timestamps need another digit to be represented. If we’re using milliseconds, Nov 21, 2286 in fact.

During normal Chronograf usage, if InfluxDB was returning Unix timestamps, they were of the same length in virtually all cases. The fact that JavaScript was treating these numbers as strings when trying to sort them was irrelevant — we’d get the same results regardless of whether we told JavaScript to explicitly sort them as numbers.

One last experiment for the console, where the results should be identical:
`// Using Three realistic InfluxDB timestamps:
[1459444040000, 1459444050000, 1459444060000].sort()
[1459444040000, 1459444050000, 1459444060000].sort((a, b) => a – b)`

Chronograf was dealing with timestamps like this in virtually all cases, explaining why it took a unique dataset for us to track down this particular bug.

Voila! Another case in the books.

The Lesson

“History is not a burden on the memory but an illumination of the soul.”
— Lord Acton

As a developer, this experience was fantastic. It was a fun, unique, mildly infuriating bug with no shortage of teaching potential. Here were my big takeaways:

  • A deep understanding of your tools will make you a better *insert thing here*. This idea spans all fields and disciplines, though it is especially important in software. The `sort` bug was able to bite us because of a gap in our knowledge, and it serves as inspiration for learning even more about how JavaScript, our most valuable tool, works. It will help inform decisions we make in the future, and maybe even prevent a few bugs.
  • Critical bugs lurk beneath the surface, and often manifest themselves under the strangest of circumstances.

For us to finally see `sort` behave in a way we didn’t expect, we needed a dataset that used a specific time format and spanned over two very specific years, 1970-1972. I’ve found that software never ceases to amaze with the sheer multitude of ways in which it can break. In this regard I have a begrudging respect for it’s raw and unbridled creativity.

If you’re interested in a more in-depth look at how `sort` works under the hood, this article from javascriptkit.com was invaluable.

What’s next? Get Started With Chronograf!

Which now sorts numerical timestamps with aplomb!

If you’d like to try Chronograf, the ‘C’ in the TICK stack, you can download the latest version here. We love reading and appreciate any and all feedback, so let us know what you think. Whether it’s bug reports, feature requests, general experience improvements, or any other thoughts or ideas you have about the product, don’t hesitate to email the team: chronograf@influxdata.com.