Crates.io | depot |
lib.rs | depot |
version | 0.2.0 |
source | src |
created_at | 2018-11-11 00:07:32.644873 |
updated_at | 2018-11-12 19:12:31.373525 |
description | A (disk) persistent queue library |
homepage | https://github.com/longshorej/depot |
repository | https://github.com/longshorej/depot |
max_upload_size | |
id | 95972 |
size | 53,684 |
Depot is a persistent queue library. You can store items on disk and later retrieve them as an ordered stream. An item is a collection of bytes (u8
) and is assigned a monotonically increasing id. The ids are not necessarily sequential.
It's important to note that Depot is focused strictly on low-level storage. Replication and remote access are outside the scope of Depot.
Depot is currently under development but should be "finished" within a few weeks. Development happens on Saturday and Sunday afternoons, CDT.
extern crate depot;
use depot::Queue;
fn main() {
// Create a queue that writes data into /tmp/my-queue (a directory)
let mut queue = Queue::new("/tmp/my-queue");
// Append an item
let message = format!("the quick brown fox jumped over the lazy dog");
let data = message.as_bytes();
queue.append(&data).unwrap();
queue.sync().unwrap();
// Read all of the items and print them
let mut stream = queue.stream(None).unwrap();
while let Some(item) = stream.next() {
println!("read item: {:?}", item.unwrap());
}
}
Depot stores its data in plain files using a binary encoding. An escape mechanism handles collisions on the record separator and failure bytes.
Each record stored in Depot costs a constant byte of overhead, plus ~2% overhead for the encoding mechanism. In the worst case, an item may require 100% of its size to store, if all of its bytes consist of those that need to be escaped. In general, this may increase by four bytes per item if a CRC mechanism is added to the implementation. Additionally, truncated items, which can occur due to power loss or crash, result in two bytes being added to them during recovery.
When opening the queue for appending, it reads the last byte of the file. If it's a 10, the presumption is that the system hasn't crashed.
However, if it isn't a 10 (and the file is not empty), Depot assumes that the previous writer has crashed, and it appends two 45 values, followed by 10. The API allows readers to differentiate between items that were fully written and those that were potentially only partially written. Note that it is not possible for these values to occur in an item's encoded payload, as they are translated to other values via an escape/control byte mechanism.
The low level primitive, Section, is largely limited by disk I/O speed. For a very flawed initial test, given a Lenovo Thinkpad, i7-6600U, with a consumer-grade SSD, 50 byte payloads, about 4.7M reads/sec can be performed by a single reader. This translates to ~300MB/sec. For a writer, given the same constraints, about 1.7M writes/sec, translating to ~110MB/sec. Be sure to take these measurements with a grain of salt.
The primary interface, Queue, has similar performance characteristics but measurements haven't been done yet.
Given its append only design, it should also perform well with "spinning rust" disks.
Multiple concurrent writers are not supported. A library such as semalock can be used if coordination between processes is required, but it's better to use messaging and a single writer if possible.
Conceptually, yes, but this hasn't been implemented yet. Given that queue's are split into files that contain a bounded number of items, each of these files can be rewritten and then atomically renamed over the old section. On Linux, readers that may have the old file open will continue to work until they release their file descriptor.
A single Depot queue can technically store ~3.8PB of data, given a limit of 1.9B files at ~2GB each. This is because depot uses a 64bit offset for efficiently resuming from a position in the queue. 32bits are used to address the file, and 32bits to address the position in the file. You're likely to run into underlying storage limitations before this, whether that is hardware (disk size) or software (filesystem). Nothing close to this has been tested, though.
Support is planned via JNI, see the jvm directory which is working toward a proof of concept to use Depot from Java and Scala code.
Depot is licensed under the Apache License, Version 2. See LICENSE.
Jason Longshore hello@jasonlongshore.com