Rust Object Store Donation

Today we are happy to officially announce that InfluxData has donated a generic object store implementation to the Apache Arrow project.

Using this crate, the same code can easily interact with AWS S3, Azure Blob Storage, Google Cloud Storage, local files, memory, and more with a simple runtime configuration change.

You can find the latest release on crates.io.

We expect this will accelerate the pace of innovation within the Rust ecosystem. Whether you are building a cloud-agnostic service to handle user-uploaded videos, images, and documents, a high-performance analytics system, or something else that needs access to commodity object storage, this crate can help you, and we can’t wait to see what people build with it.

Why do you need an object store crate?

Aside from providing bulk data storage for many cloud-based services, we believe the future of analytic systems in particular involves querying data stored on object storage.

Object store is the generic term for what might be loosely described as an “infinite FTP server in the cloud” that offers almost unlimited, highly available, and durable key-value storage on demand. Alongside virtual machines and block storage, object storage is one of the key commodity services provided by all modern cloud service providers. Examples include S3, Microsoft Azure Blob Storage, Google Cloud Storage, MinIO, Ceph Object Gateway, HDFS, and others.

To achieve this near-infinite scaling, object stores provide a subset of the functionality of traditional file systems such as NTFS or ext4. Specifically, they identify objects with a “key” and store arbitrary bytes as a value:

Figure 1: Object stores store arbitrary bytes identified by a string key.

Unlike filesystems, object stores typically lack an explicit notion of directories, and best practice uses a restricted subset of ASCII for keys. Instead, path-like traversal is achieved using LIST operations with a prefix, and illegal character sequences are percent-encoded.
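To make the percent-encoding idea concrete, here is a minimal, std-only sketch. The function names and the "safe" character set (ASCII letters, digits, '-', '_' and '.') are assumptions chosen for illustration; they are not the rules used by the object_store crate or by any particular cloud provider.

```rust
/// Percent-encode every byte of `segment` that is not in a small "safe" set.
/// The safe set here (ASCII letters, digits, '-', '_', '.') is an
/// illustrative assumption, not the rule any particular store enforces.
fn percent_encode_segment(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for byte in segment.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' => {
                out.push(byte as char)
            }
            // Everything else becomes %XX with uppercase hex digits.
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}

/// Encode each path segment and join the results with '/' as the delimiter.
fn build_key(segments: &[&str]) -> String {
    segments
        .iter()
        .map(|s| percent_encode_segment(s))
        .collect::<Vec<_>>()
        .join("/")
}

fn main() {
    let key = build_key(&["pictures", "summer trip", "beach#1.jpg"]);
    // prints "pictures/summer%20trip/beach%231.jpg"
    println!("{}", key);
}
```

Encoding each segment separately keeps '/' reserved as the delimiter, so a slash inside a segment name cannot accidentally create a new level in the quasi-directory structure.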

Figure 2: Object stores can LIST objects with a specified prefix, which can be used to group files together. In this example, asking for objects with prefix “/pictures/” results in all the .jpg objects, while asking for prefix “/parquet/” results in all the .parquet objects.
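The key/value model and prefix-based LIST can be sketched with nothing more than a sorted map. The `ToyStore` type below is a hypothetical, in-memory stand-in written for this illustration; it is not part of the object_store crate.

```rust
use std::collections::BTreeMap;

/// A toy in-memory "object store": string keys mapped to arbitrary bytes.
struct ToyStore {
    objects: BTreeMap<String, Vec<u8>>,
}

impl ToyStore {
    fn new() -> Self {
        Self { objects: BTreeMap::new() }
    }

    /// PUT: store (or overwrite) the bytes under `key`.
    fn put(&mut self, key: &str, value: &[u8]) {
        self.objects.insert(key.to_string(), value.to_vec());
    }

    /// LIST: return every key that starts with `prefix`, in sorted order.
    /// There are no directories; a "folder" is just a shared key prefix.
    fn list(&self, prefix: &str) -> Vec<&str> {
        self.objects
            // Keys are sorted, so all matches sit in one contiguous run
            // starting at the first key >= prefix.
            .range(prefix.to_string()..)
            .take_while(|(key, _)| key.starts_with(prefix))
            .map(|(key, _)| key.as_str())
            .collect()
    }
}

fn main() {
    let mut store = ToyStore::new();
    store.put("/pictures/cat.jpg", b"jpeg bytes");
    store.put("/parquet/1.parquet", b"parquet bytes");
    store.put("/pictures/dog.jpg", b"more jpeg bytes");

    for key in store.list("/pictures/") {
        println!("{}", key);
    }
}
```

Because the keys are kept sorted, listing a prefix is a range scan rather than a directory walk, which is roughly why prefix LIST is the natural traversal primitive for this data model.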

Consistently listing and traversing the quasi-directory structure encoded in the object keys across object store implementations and local file systems is one common source of frustration: not only do filesystems behave very differently from object stores, but each object store implementation has its own quirks.

Having a focused, easy-to-use, high-performance, async object store library, written in idiomatic Rust, frees you from worrying about these details and lets you instead focus on your system’s logic. The underlying implementation is abstracted away from application code, and can easily be selected at runtime, allowing the same binary to run in multiple clouds.

This flexibility also facilitates local development, as it allows testing against a local filesystem, or even an in-memory store, without requiring any additional binaries such as MinIO, and allows the use of familiar tools such as ls, cat or your choice of file browser.
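The general pattern of hiding the backend behind a trait and choosing it at runtime can be sketched using only the standard library. Everything below (the `BlobStore` trait, `MemoryStore`, `DirStore`, `open_store`, and the `STORE_KIND` environment variable) is hypothetical and heavily simplified; the real crate exposes a richer, async `ObjectStore` trait.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

/// Hypothetical, heavily simplified stand-in for an object store interface.
trait BlobStore: Send + Sync {
    fn put(&self, key: &str, bytes: &[u8]) -> io::Result<()>;
    fn get(&self, key: &str) -> io::Result<Vec<u8>>;
}

/// Keeps everything in a map; handy for unit tests.
#[derive(Default)]
struct MemoryStore {
    objects: Mutex<HashMap<String, Vec<u8>>>,
}

impl BlobStore for MemoryStore {
    fn put(&self, key: &str, bytes: &[u8]) -> io::Result<()> {
        self.objects.lock().unwrap().insert(key.to_string(), bytes.to_vec());
        Ok(())
    }

    fn get(&self, key: &str) -> io::Result<Vec<u8>> {
        self.objects
            .lock()
            .unwrap()
            .get(key)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, key.to_string()))
    }
}

/// Maps keys onto files below `root`, so `ls` and `cat` work on the data.
/// Note: this sketch does no key sanitization (for example against "..").
struct DirStore {
    root: PathBuf,
}

impl BlobStore for DirStore {
    fn put(&self, key: &str, bytes: &[u8]) -> io::Result<()> {
        let path = self.root.join(key);
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(path, bytes)
    }

    fn get(&self, key: &str) -> io::Result<Vec<u8>> {
        fs::read(self.root.join(key))
    }
}

/// Pick a backend from a runtime setting; callers only see the trait.
fn open_store(kind: &str) -> Arc<dyn BlobStore> {
    match kind {
        "dir" => Arc::new(DirStore {
            root: std::env::temp_dir().join("blob_store_demo"),
        }),
        _ => Arc::new(MemoryStore::default()),
    }
}

fn main() -> io::Result<()> {
    // In a real program this would come from a flag or configuration file.
    let kind = std::env::var("STORE_KIND").unwrap_or_else(|_| "memory".to_string());
    let store = open_store(&kind);
    store.put("parquet/1.parquet", b"hello")?;
    println!("read back {} bytes", store.get("parquet/1.parquet")?.len());
    Ok(())
}
```

The application code in `main` never names a concrete backend, so switching between the in-memory and directory-backed stores requires no recompilation, only a different runtime setting.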

How to use it?

Here is a simple example that counts the zero bytes in files stored on remote object storage:

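use std::sync::Arc;

use futures::stream::{FuturesOrdered, StreamExt};
use object_store::{path::Path, ObjectStore};

// assumes the tokio runtime is used to drive the async code
#[tokio::main]
async fn main() {
    // create an ObjectStore (see the get_object_store definitions below)
    let object_store: Arc<dyn ObjectStore> = get_object_store();

    // list all objects under the "parquet" prefix (aka directory)
    let path: Path = "parquet".try_into().unwrap();
    let list_stream = object_store
        .list(Some(&path))
        .await
        .expect("Error listing files");

    // for each listed object, fetch its contents and count the zero bytes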
    list_stream
        .map(|meta| async {
            let meta = meta.expect("Error listing");

            // fetch the bytes from object store                                                                                                                                    
            let stream = object_store
                .get(&meta.location)
                .await
                .unwrap()
                .into_stream();

            // Count the zeros                                                                                                                                                      
            let num_zeros = stream
                .map(|bytes| {
                    let bytes = bytes.unwrap();
                    bytes.iter().filter(|b| **b == 0).count()
                })
                .collect::<Vec<usize>>()
                .await
                .into_iter()
                .sum::<usize>();

            (meta.location.to_string(), num_zeros)
        })
        .collect::<FuturesOrdered<_>>()
        .await
        .collect::<Vec<_>>()
        .await
        .into_iter()
        .for_each(|i| println!("{} has {} zeros", i.0, i.1));
}

Which prints out something like:

test_fixtures/parquet/1.parquet has 174 zeros
test_fixtures/parquet/2.parquet has 53 zeros

As written, the code lists the files (in a paginated way) and fetches the contents of all of them concurrently. This may not be great if there are thousands of files. However, we can easily take advantage of Rust streams and change

.collect::<FuturesOrdered<_>>()
.await

to

.buffered(10)

which limits the program to 10 concurrent GET requests.

The coolest part of the object_store crate is that the same code works for all the different object stores; the only thing that changes is the definition of get_object_store.

To read from S3:

fn get_object_store() -> Arc<dyn ObjectStore> {
    let s3 = AmazonS3Builder::new()
        .with_access_key_id(ACCESS_KEY_ID)
        .with_secret_access_key(SECRET_KEY)
        .with_region(REGION)
        .with_bucket_name(BUCKET_NAME)
        .build()
        .expect("error creating s3");

    Arc::new(s3)
}

To read from Azure:

fn get_object_store() -> Arc<dyn ObjectStore> {
    let azure = MicrosoftAzureBuilder::new()
        .with_account(STORAGE_ACCOUNT)
        .with_access_key(ACCESS_KEY)
        .with_container_name(BUCKET_NAME)
        .build()
        .expect("error creating azure");

    Arc::new(azure)
}

To read from GCP:

fn get_object_store() -> Arc<dyn ObjectStore> {
    let gcs = GoogleCloudStorageBuilder::new()
        .with_service_account_path(PATH_TO_SERVICE_ACCOUNT_JSON)
        .with_bucket_name(BUCKET_NAME)
        .build()
        .expect("error creating gcs");
    Arc::new(gcs)
}

To read from the local filesystem:

fn get_object_store() -> Arc<dyn ObjectStore> {
    let local_fs = LocalFileSystem::new_with_prefix(PREFIX)
        .expect("Error creating local file system");
    Arc::new(local_fs)
}

To reiterate, the major benefit is that you do not have to integrate different abstractions for the different object stores – the client code is always the same and under the covers uses the appropriate optimized implementation.

The object_store crate is also extensible, which allows plugging in other object storage systems. At the same time, it retains the ability to read files directly from the local filesystem, so callers can take advantage of the optimized file access offered by some systems (see GetFileResult).

A more full-featured and working example can be found in the rust_object_store_demo repository.

Why donate to Apache

The dream for Rust is the development productivity of Python or Ruby with the speed and memory efficiency of C/C++. Part of delivering this dream is ensuring that it integrates easily with the broader technology ecosystem, and in modern analytic systems this increasingly means data on object storage.

Thus, it is important to make it easy, and yet still efficient, for Rust programs to read and write data to object stores (such as AWS S3, Azure Blob Storage, and Google Cloud Storage). There are individual crates which implement cloud-provider-specific SDKs such as rusoto_s3 or azure_storage; however, accessing the most common feature set via the same interface is often what is needed to accelerate the development of cross-cloud analytic systems. This crate is explicitly NOT meant to replace the full-blown cloud SDKs, but instead to provide a consistent object store abstraction that is portable across the many different underlying implementations.

We had exactly this requirement when we set out to develop influxdb_iox. InfluxDB and InfluxData Cloud run on AWS, GCP, Azure, and on-prem, and we needed IOx to do so as well. We could not find an existing library that suited our needs, so the InfluxData IOx team developed one within our project.

This effort was originally implemented by Rust Ecosystem Legend Carol (Nichols || Goulding) @carols10cents (primary author of the Rust Book) and heavily extended by Marco Neumann and Raphael Taylor-Davies as we crafted its integration into DataFusion.

IOx uses the Rust, Apache Arrow, Apache Parquet and DataFusion projects, which we also contribute to heavily, and it was increasingly important that IOx’s object store interactions were efficient via DataFusion. As we investigated the alternatives, we hit the point where this required deeper integration with the object store.

We hope that this donation further accelerates the creation of high-quality analytic systems in Rust and can’t wait to see what the community builds with it! We especially hope that the alignment with Apache Arrow will permit an elegantly integrated experience with libraries that can easily and efficiently read Arrow-compatible files, such as Parquet, CSV and newline-delimited JSON, natively from local or remote object storage. For applications that desire SQL or other higher-level query engine capabilities, check out Apache Arrow DataFusion.

You can see more about the donation and its rationale in this GitHub issue and this one as well.

What’s next

In the near term, we plan better integration with the parquet crate. In particular, the async parquet reader has been explicitly developed with a generic object_store crate in mind. It currently supports projection and row-group level predicate pushdown to minimize the data fetched from object storage, and support for page and row-level predicate pushdown is likely to land in the next release, slated for 22 August 2022.
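To illustrate why row-group level predicate pushdown reduces the data fetched, here is a simplified, std-only sketch. It assumes each row group carries min/max statistics for one integer column; the `RowGroupStats` struct and `prune_between` function are hypothetical names for this illustration and are not the parquet crate's API.

```rust
/// Hypothetical per-row-group statistics for a single integer column.
#[derive(Debug, Clone, PartialEq)]
struct RowGroupStats {
    index: usize,
    min: i64,
    max: i64,
}

/// Return the indexes of the row groups that could contain a row with
/// `lo <= column <= hi`. A group whose [min, max] range does not overlap
/// [lo, hi] cannot match, so its bytes never need to be fetched.
fn prune_between(groups: &[RowGroupStats], lo: i64, hi: i64) -> Vec<usize> {
    groups
        .iter()
        .filter(|g| g.max >= lo && g.min <= hi)
        .map(|g| g.index)
        .collect()
}

fn main() {
    let groups = vec![
        RowGroupStats { index: 0, min: 0, max: 99 },
        RowGroupStats { index: 1, min: 100, max: 199 },
        RowGroupStats { index: 2, min: 200, max: 299 },
    ];

    // Only row groups 1 and 2 can hold values between 150 and 250.
    let to_fetch = prune_between(&groups, 150, 250);
    println!("fetch row groups {:?}", to_fetch);
}
```

The pruning decision uses only the small statistics records, so the byte ranges for skipped row groups are never requested from object storage. Page and row-level pushdown apply the same idea at a finer granularity.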

We also expect to continue to improve the integration with Apache Arrow DataFusion, ensuring it provides best-in-class support for querying data from object storage, efficiently decoupling IO from CPU-bound work, and making the most efficient use of modern multicore processors.

Finally, there is an ongoing effort to move away from depending on large SDKs such as rusoto and the Azure SDK for Rust. Whilst they have served us well, moving away from them will significantly reduce the dependency burden, simplify the implementation, and further improve consistency across the various implementations.

Join the community

We think a thriving community drives everyone forward. We encourage you to check out the crate, and lend us a hand! Try it out in your project and let us know how it goes, or find us on GitHub here. There is a list of good open items for newcomers here.

Kudos

Thank you to Raphael Taylor-Davies, Paul Dix, Nga Tran, and Marco Neumann who reviewed early versions of this document and contributed many improvements.