AWS Intends for Their New Project to be an Elasticsearch Fork

Yesterday Amazon (AWS) unveiled Open Distro for Elasticsearch in a blog post along with its own companion site. In another blog post titled Keeping Open Source Open they make an argument for the motivation behind the project and the desire to create a 100% open version of Elasticsearch. The new open source version takes direct aim at Elastic’s commercial version with three headline features including security, monitoring, and SQL execution. In this post, I’ll argue that it is absolutely AWS’ intention to fork the Elastic community. I’ll also explore the implications for the Elasticsearch community and Elastic the company, and ponder what this could mean for other commercial open source vendors.

Is AWS an upstanding OSS citizen?

Based on a casual reading of the AWS blog post, it looks like they have nothing but the best of intentions. They’re offering freely available open source versions of features that were previously only available in Elastic’s proprietary version. They’re also responding to their customers’ concerns about Elastic intermingling commercially licensed code with the Apache 2.0 licensed code in the repository. They make an argument for why open source needs to be open and available. They even claim that this isn’t intended as a fork and they’ll be attempting to contribute their changes upstream.

While I agree with their take on the commingling of commercial code with open source being a problem, I think there’s quite a bit here that’s disingenuous. They can’t for a moment believe that Elastic will accept their upstream changes. Elastic obviously won’t take their contributions because they fall outside the scope of the open source project and squarely within the scope of Elastic’s commercial offering.

Adding to this is AWS’ website for this project, which has separate community and contributing tabs. They also, not so subtly, accuse Elastic of being poor stewards of the open source project along with doing a bait and switch on the community:

…customers must be able to trust that open source projects stay open. The maintainers of open source projects have the responsibility of keeping the source distribution open to everyone and not changing the rules midstream. When important open source projects that AWS and our customers depend on begin restricting access, changing licensing terms, or intermingling open source and proprietary software, we will invest to sustain the open source project and community.

Their post attempts to put their efforts on the hallowed ground of protecting open source while throwing Elastic (the company) under the bus. It’s clear from all of this that they have every intention of this becoming a fork with a life of its own. You don’t get a project owner to collaborate with you and accept your pull requests by coming out with a blog post that accuses them of doing a disservice to the community they fostered with their free code and creation. If they actually don’t intend this to be a fork, they have a very odd way of showing it.

Even more surprising is that they chose to use “Elasticsearch” in the name of the project. I predict they’ll be picking a new name for this in very short order, driven either by a cease and desist from Elastic or clearer thinking from their team. Not choosing a new and distinct name is actually a disservice to the community because it confuses things, which was their primary complaint about Elastic’s X-Pack code in the open repo.

While the individuals working on this at AWS may believe in what they’re doing here, the whole thing is tainted by AWS’ clear commercial ambitions. Here’s a quote from the post about their beliefs on open source:

At AWS, we believe that maintainers of an open source project have a responsibility to ensure that the primary open source distribution remains open and free of proprietary code so that the community can build on the project freely, and the distribution does not advantage any one company over another.

Taken at face value, many developers would agree with this sentiment. However, AWS is making sure that the open source version actually does favor one vendor over another. By open sourcing all of Elastic’s commercial functionality, any vendor that can buy hardware, bandwidth, and data centers at scale will have an advantage over others. AWS has a clear advantage here over Elastic.

This is like Microsoft giving away Internet Explorer for free in the ’90s and claiming it was the right thing to do for their users. They were taking advantage of their Windows platform monopoly to get wider distribution for their browser software and take out Netscape. Here AWS is taking advantage of their market-leading position in hosted infrastructure and services to expand their footprint into additional hosted services (Elasticsearch). They’re using profits from that leadership position to invest in building free software, which under normal circumstances wouldn’t be viable. If you look at this in combination with their existing customer base and things like IAM, this is a monopolistic move.

Investing in open software at a loss to depose a leader and commoditize the software is exactly what Google did with Kubernetes. Google Cloud needed a way to compete with the market leader, AWS. Kubernetes represents a perfect strategy for commoditizing the data center software in a free platform. Leading the development of that platform meant that Google was first to market with a solid Kubernetes offering. While having Kubernetes in open source is likely a net positive for developers at large, Google still has a commercial motive for its development. Incidentally, GKE continues to be the best way to run Kubernetes and I’m certain they’ve picked up many customers as a result of their open source strategy.

Finally, why is this the project that AWS is open sourcing? Last year they unveiled DocumentDB, a service targeting MongoDB users. It’s API compatible, but it’s not MongoDB. So they created a whole new implementation. Why isn’t that implementation also open source to encourage a healthy and growing MongoDB community? My inner cynic says that it’s because there was no need to do so.

All of this is just good business for Amazon so it makes perfect sense. However, I would prefer a more honest approach and a realistic viewpoint from their team on what this project really is: a fork of Elasticsearch.

What does this mean for Elastic and the community?

Of course, all of this is really just sour grapes. Who cares if AWS’ motives are pure, as long as they’re putting everything out in the open under Apache 2, right? Maybe, but maybe not. We have to consider what this might mean for the Elasticsearch community and the broader open source vendor community, who will think long and hard about what they open source and what they keep closed.

Say this goes on to have some degree of popularity. That will effectively bifurcate the community. Once AWS picks a real name for this project, you’ll get blog posts and questions on Stack Overflow that ask questions either about Elasticsearch or OpenDistroES (or whatever name they pick). You’ll have conferences organized by Elastic and other ones organized by community members for OpenDistro (since it almost certainly won’t get coverage at Elastic’s event).

Early on, having the community fracture won’t be that big of a deal. The APIs are still the same, so it’s all kind of the same. However, what happens as the two projects diverge more and more? You’ll get different APIs, different operational capabilities and behaviors, and ultimately two different things. Although maybe that’s good too? More software for the software gods! But it does create an issue with developers building their career on these tools. Larger communities mean more jobs and opportunities to show your expertise.

The more interesting question is what will Elastic do now that this is out there. If they do nothing or accept all those upstream changes, I predict their business will be significantly negatively impacted. My guess is that Elastic will move to have more and more features in closed source and aim to offer those features either on-premise or, more likely, as cloud only services and add-ons to the core open source offering.

At that point, the set of Elastic developers that were driving the vision, features and platform forward will be focused mostly on closed source. And the Elastic community will then be looking to AWS to drive feature development forward. Management of a large-scale project is a significant effort beyond just the code side of things. Will AWS really strive to deliver all it can for the community, or just for customers of its managed service?

At that point, the righteous thing for AWS to do would be to contribute their new fork to a foundation like CNCF or Apache so that it can be taken forward. In the long term, that might be the best thing for the community, but it would obviously be ruinous for Elastic, which would cause a significant amount of uncertainty in the Elasticsearch community.

What does this mean for commercial OSS?

I’m interested in what effect this will have on commercial players in terms of what licenses they choose and how it might impact their fundraising prospects and commercial viability. From a licensing perspective, I think this bolsters my argument that source available and copyleft licenses are a disservice to the community. I think a clear distinction between what’s open and what’s commercial is necessary, with the commercial project being proper closed source. Part of AWS’ argument was that the commingling of the code was a primary motivator for this move.

It will also be very interesting to see how this plays out with other open source startups in the infrastructure space. Will open source vendors become more aggressive in terms of what they keep closed? I predict that they will.

All of this is somewhat coincidental with discussions we’ve been having internally here at InfluxData around InfluxDB 2.0. Because we’re updating the major version, we’re taking another look at some features to determine if we can put more into the open source project. We want to have as much in the open as possible while continuing to build a growing business. However, we want there to be a clear line that separates open from commercial that makes sense to our community without confusion.

Much of this debate is centered around how we can build a commercial offering that gives us some sort of moat against AWS or other vendors, while having an open source product that delivers real value. Of course, InfluxDB isn’t quite as old or as popular as Redis, MongoDB, or Elasticsearch (all projects and vendors that AWS has come for), but as we build our community and grow our business we become a more appealing target.

I can say that InfluxDB 2.0 will have at least as much out in the open as InfluxDB 1.x. We’ve already released alpha versions and written about what we’re trying to achieve with InfluxDB 2.0. We’ve also decoupled our scripting and query language, Flux, from the InfluxDB project and licensed it under MIT so that it can have a life of its own. Philosophically, our preference is for liberal licenses for our open source work and we want to put as much of it out there as possible.

In the meantime, I hope AWS picks a new name for their fork. And if it gains popularity, I hope they decide to contribute it to The Apache Software Foundation or some other organizing body like that. Neither of those would be very good for Elastic the company, but at this point the cat’s out of the bag. Elastic may have to go back to the drawing board and build out a new differentiated commercial offering.