# mmap-sync ![build](https://img.shields.io/github/actions/workflow/status/cloudflare/mmap-sync/ci.yml?branch=main) [![docs.rs](https://docs.rs/mmap-sync/badge.svg)](https://docs.rs/mmap-sync) [![crates.io](https://img.shields.io/crates/v/mmap-sync.svg)](https://crates.io/crates/mmap-sync) [![License](https://img.shields.io/badge/license-Apache%202.0-blue)](LICENSE) `mmap-sync` is a Rust crate designed to manage high-performance, concurrent data access between a single writer process and multiple reader processes, leveraging the benefits of memory-mapped files, wait-free synchronization, and zero-copy deserialization. We're using `mmap-sync` for large-scale machine learning, detailed in our blog post: ["Every Request, Every Microsecond: Scalable machine learning at Cloudflare"](http://blog.cloudflare.com/scalable-machine-learning-at-cloudflare). ## Overview At the core of `mmap-sync` is a `Synchronizer` structure that offers a simple interface with "write" and "read" methods, allowing users to read and write any Rust struct (`T`) that implements or derives certain rkyv traits. ```rust impl Synchronizer { /// Write a given `entity` into the next available memory mapped file. pub fn write(&mut self, entity: &T, grace_duration: Duration) -> Result<(usize, bool), SynchronizerError> { … } /// Reads and returns `entity` struct from mapped memory wrapped in `ReadResult` pub fn read(&mut self) -> Result, SynchronizerError> { … } } ``` Data is stored in shared mapped memory, allowing the `Synchronizer` to "write" and "read" from it concurrently. This makes `mmap-sync` a highly efficient and flexible tool for managing shared, concurrent data access. ## Mapped Memory The use of memory-mapped files offers several advantages over other inter-process communication (IPC) mechanisms. It allows different processes to access the same memory space, bypassing the need for costly serialization and deserialization. This design allows `mmap-sync` to provide extremely fast, low-overhead data sharing between processes. ## Wait-free Synchronization Our wait-free data access pattern draws inspiration from [Linux kernel's Read-Copy-Update (RCU) pattern](https://www.kernel.org/doc/html/next/RCU/whatisRCU.html) and the [Left-Right concurrency control technique](https://github.com/pramalhe/ConcurrencyFreaks/blob/master/papers/left-right-2014.pdf). In our solution, we maintain two copies of the data in separate memory-mapped files. Write access to this data is managed by a single writer, with multiple readers able to access the data concurrently. We store the synchronization state, which coordinates access to these data copies, in a third memory-mapped file, referred to as "state". This file contains an atomic 64-bit integer, which represents an `InstanceVersion` and a pair of additional atomic 32-bit variables, tracking the number of active readers for each data copy. The `InstanceVersion` consists of the currently active data file index (1 bit), the data size (39 bits, accommodating data sizes up to 549 GB), and a data checksum (24 bits). ## Zero-copy Deserialization To efficiently store and fetch data, `mmap-sync` utilizes zero-copy deserialization with the help of the [rkyv](https://rkyv.org/) library, directly referencing bytes in the serialized form. This significantly reduces the time and memory required to access and use data. The templated type `T` for `Synchronizer` can be any Rust struct implementing specified `rkyv` traits. ## Getting Started To use `mmap-sync`, add it to your `Cargo.toml` under `[dependencies]`: ```toml [dependencies] mmap-sync = "1" ``` Then, import `mmap-sync` in your Rust program: ```rust use mmap_sync::synchronizer::Synchronizer; ``` Check out the provided examples for detailed usage: * [Writer process example](examples/writer.rs): This example demonstrates how to define a Rust struct and write it into shared memory using `mmap-sync`. * [Reader process example](examples/reader.rs): This example shows how to read data written into shared memory by a writer process. These examples share a [common](examples/common/mod.rs) module that defines the data structure being written and read. To run these examples, follow these steps: 1. Open a terminal and navigate to your project directory. 2. Execute the writer example with the command `cargo run --example writer`. 3. In the same way, run the reader example using `cargo run --example reader`. Upon successful execution of these examples, the terminal output should resemble: ```shell # Writer example written: 36 bytes | reset: false # Reader example version: 7 messages: ["Hello", "World", "!"] ``` Moreover, the following files will be created: ```shell $ stat -c '%A %s %n' /tmp/hello_world_* -rw-r----- 36 /tmp/hello_world_data_0 -rw-r----- 36 /tmp/hello_world_data_1 -rw-rw---- 16 /tmp/hello_world_state ``` With these steps, you can start utilizing `mmap-sync` in your Rust applications for efficient concurrent data access across processes. ## Tuning performance Using `tmpfs` volume will reduce the disk I/O latency since it operates directly on RAM, offering faster read and write capabilities compared to conventional disk-based storage: ```rust let mut synchronizer = Synchronizer::new("/dev/shm/hello_world".as_ref()); ``` This change points the synchronizer to use a shared memory object located in a `tmpfs` filesystem, which is typically mounted at `/dev/shm` on most Linux systems. This should help alleviate some of the bottlenecks associated with disk I/O. If `/dev/shm` does not provide enough space or if you want to create a dedicated `tmpfs` instance, you can set up your own with the desired size. For example, to create a 1GB `tmpfs` volume, you can use the following command: ```shell sudo mount -t tmpfs -o size=1G tmpfs /mnt/mytmpfs ``` ## Benchmarks To run benchmarks you first need to install `cargo-criterion` binary: ```shell cargo install cargo-criterion ``` Then you'll be able to run benchmarks with the following command: ```shell cargo criterion --bench synchronizer ``` Benchmarks presented below are executed on Linux laptop with `13th Gen Intel(R) Core(TM) i7-13800H` processor and compiler flags set to `RUSTFLAGS=-C target-cpu=native`. ```shell synchronizer/write time: [250.71 ns 251.42 ns 252.41 ns] thrpt: [3.9619 Melem/s 3.9774 Melem/s 3.9887 Melem/s] synchronizer/write_raw time: [145.25 ns 145.53 ns 145.92 ns] thrpt: [6.8531 Melem/s 6.8717 Melem/s 6.8849 Melem/s] synchronizer/read/check_bytes_true time: [40.114 ns 40.139 ns 40.186 ns] thrpt: [24.884 Melem/s 24.914 Melem/s 24.929 Melem/s] synchronizer/read/check_bytes_false time: [26.658 ns 26.673 ns 26.696 ns] thrpt: [37.458 Melem/s 37.491 Melem/s 37.512 Melem/s] ```