| Crates.io | rsdos |
| lib.rs | rsdos |
| version | 0.2.0 |
| created_at | 2025-01-26 12:33:57.519612+00 |
| updated_at | 2025-01-28 02:27:57.982834+00 |
| description | key-value store for file I/O on disk |
| homepage | https://github.com/unkcpz/rsdos |
| repository | https://github.com/unkcpz/rsdos |
| max_upload_size | |
| id | 1531299 |
| size | 176,102 |
RSDOS ([R]u[S]ty [D]isk-[O]bject[S]tore) is a fast, server-less, Rust-native disk object store for dataset management.
It handles huge datasets without breaking a sweat, whether you’re juggling thousands of tiny files or streaming multi-gigabyte blobs. It’s not designed as a backup solution, but rather for storing millions of files in a compact and manageable way.
It packs data intelligently to make efficient use of disk space and deduplicates content via SHA-256 hashing.
It applies on-the-fly compression (zstd by default, or zlib) whenever it’s beneficial, with no manual tuning required.
It keeps I/O straightforward with streaming-based insert and extract methods, so you don’t flood your RAM when dealing with large files.
Thanks to Rust’s memory safety guarantees, RSDOS delivers great performance without the usual headaches or subtle bugs. If you’re integrating with Python, that’s covered through pyo3 bindings.
More design details can be found in the design notes.
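As a rough illustration of the SHA-256 deduplication mentioned above, here is a minimal content-addressed store in Python. It is only a sketch under stated assumptions: the `TinyStore` class and its in-memory dictionary are hypothetical and say nothing about how RSDOS lays out data on disk. It only shows that keying objects by their digest means identical content is stored once.

```python
import hashlib


class TinyStore:
    """Hypothetical in-memory content-addressed store (not the RSDOS API)."""

    def __init__(self):
        self._objects = {}

    def add(self, content: bytes) -> str:
        # The SHA-256 hex digest of the content is used as the object's key.
        key = hashlib.sha256(content).hexdigest()
        # setdefault keeps the first copy, so identical content is stored once.
        self._objects.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


store = TinyStore()
k1 = store.add(b"same bytes")
k2 = store.add(b"same bytes")   # duplicate content, same key
k3 = store.add(b"other bytes")  # different content, new key
```

Adding the same bytes twice returns the same key and leaves a single stored object, which is the property a deduplicating store relies on.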
You can install RSDOS using various methods. Pick whichever approach suits your workflow or distribution:
To build from source (requires Rust and Cargo):
cargo install rsdos
This compiles RSDOS locally and places the rsdos binary in your Cargo bin directory (often ~/.cargo/bin).
For systems without Rust installed, or if you prefer manual downloads:
Download and unpack a release archive, for example with curl (replace vX.Y.Z with the release you want):
curl -LO https://github.com/unkcpz/rsdos/releases/download/vX.Y.Z/rsdos-x86_64-unknown-linux-musl.tar.gz
tar xvf rsdos-x86_64-unknown-linux-musl.tar.gz
sudo mv rsdos /usr/local/bin/
rsdos --help
If you need the Python API or want to use RSDOS via Python scripts or Jupyter notebooks, you can install the Python wrapper:
pip install rsdos
(This also provides an rsdos CLI command if the package is set up accordingly.)
Once installed, confirm everything is working by running:
rsdos --version
Manage your large file datasets through the CLI:
rsdos init --pack-size=512 --compression=zstd
# [info] Container initialized at ./container
rsdos add-files --to loose ./mydata1.txt ./mydata2.bin
# abc123... - mydata1.txt: 1.2 MB
# def456... - mydata2.bin: 3.4 MB
rsdos optimize pack
# [info] Packed 2 loose objects into pack file #1
rsdos status
# [container]
# Location = ./container
# Id = 0123456789abcdef
# ZipAlgo = zstd
#
# [container.count]
# Loose = 0
# Packs = 1
# Pack Files = 1
#
# [container.size]
# Loose = 0 B
# Packs = 4.6 MB
# Packs Files = 4.6 MB
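Packing can be pictured as concatenating many small objects into one file and remembering where each one starts. The sketch below illustrates that idea with an in-memory buffer. The `pack_objects` and `read_packed` helpers and the dictionary index are hypothetical, chosen for illustration only; they do not describe the actual RSDOS pack format or index.

```python
import hashlib
import io


def pack_objects(objects, pack):
    """Append each object to the pack stream; return {hashkey: (offset, length)}."""
    index = {}
    for content in objects:
        key = hashlib.sha256(content).hexdigest()
        # Seek to the end of the pack; seek() returns the new absolute offset.
        offset = pack.seek(0, io.SEEK_END)
        pack.write(content)
        index[key] = (offset, len(content))
    return index


def read_packed(pack, index, key):
    """Read one object back using its recorded offset and length."""
    offset, length = index[key]
    pack.seek(offset)
    return pack.read(length)


pack_file = io.BytesIO()  # stand-in for a pack file on disk
index = pack_objects([b"first", b"second"], pack_file)
key_second = hashlib.sha256(b"second").hexdigest()
second = read_packed(pack_file, index, key_second)
```

One large file with an offset index avoids the per-file overhead of storing millions of tiny files individually.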
Here’s a quick-start guide for the Python API, showcasing core operations:
from rsdos import Container, CompressMode
# 1. Create a new container (or open an existing one) at a specified path:
cnt = Container("/path/to/container")
# 2. Initialize the container with desired settings
cnt.init_container(
    clear=False,
    pack_size_target=4 * 1024 * 1024 * 1024,  # 4 GB pack size target
    loose_prefix_len=2,
    hash_type="sha256",
    compression_algorithm="zlib+1",  # zlib with level +1
)
# 3. Add objects in loose storage
num_files = 10
content_list = [b"ExampleData" + str(i).encode("utf-8") for i in range(num_files)]
hashkeys = []
for content in content_list:
    hkey = cnt.add_object(content)
    hashkeys.append(hkey)
# 4. Pack all loose objects for optimal storage
cnt.pack_all_loose(CompressMode.YES)
# 5. Retrieve the content of the first file
retrieved_data = cnt.get_object_content(hashkeys[0])
print("Retrieved:", retrieved_data)
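The `loose_prefix_len=2` setting in the quick-start suggests that loose objects are sharded into subdirectories named after the first characters of their hash, as aiidateam/disk-objectstore does. The sketch below illustrates such a layout only. The `loose_path` helper and the `loose` directory name are hypothetical, and the actual on-disk layout of RSDOS may differ.

```python
import hashlib
from pathlib import PurePosixPath


def loose_path(root: str, hashkey: str, prefix_len: int = 2) -> PurePosixPath:
    """Hypothetical helper: shard a hash key into <prefix>/<rest> under root/loose."""
    if not 0 < prefix_len < len(hashkey):
        raise ValueError("prefix_len must be between 1 and len(hashkey) - 1")
    return PurePosixPath(root) / "loose" / hashkey[:prefix_len] / hashkey[prefix_len:]


hashkey = hashlib.sha256(b"ExampleData0").hexdigest()
path = loose_path("/path/to/container", hashkey)
print(path)
```

With a two-character hexadecimal prefix, objects are spread over at most 256 subdirectories, which keeps any single directory from growing too large.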
Batch Insertion
files_data = [b"file1", b"file2", b"file3"]
hashkeys = cnt.add_objects_to_pack(
    content_list=files_data,
    compress=True,
)
print("Inserted files:", hashkeys)
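Compression "whenever it's beneficial" can be pictured as compressing the data and keeping the result only if it is actually smaller. The following sketches that heuristic with Python's standard `zlib` module. The `maybe_compress` function is hypothetical and is not a description of how RSDOS decides internally.

```python
import zlib


def maybe_compress(data: bytes, level: int = 1) -> tuple[bytes, bool]:
    """Return (payload, is_compressed); keep the raw bytes when compression does not help."""
    compressed = zlib.compress(data, level)
    if len(compressed) < len(data):
        return compressed, True
    return data, False


# Highly repetitive data shrinks, so the compressed form is kept.
text_payload, text_flag = maybe_compress(b"abc" * 10_000)

# A one-byte input only grows once the zlib header and checksum are added,
# so the raw bytes are kept instead.
tiny_payload, tiny_flag = maybe_compress(b"x")
```

The returned flag would need to be stored alongside the payload so that a reader knows whether to decompress.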
Streaming to and from Files
# Write from a file
with open("large_file.bin", "rb") as infile:
    stream_hash = cnt.add_streamed_object(infile)
print("Stored large file, hash:", stream_hash)

# Read back into a file-like object
with cnt.get_object_stream(stream_hash) as instream:
    if instream:
        with open("restored_file.bin", "wb") as outfile:
            outfile.write(instream.read())
    else:
        print("Object not found in container.")
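Streaming keeps memory use bounded by processing fixed-size chunks rather than whole files. As a sketch of that idea, the following copies one file-like object to another while computing the SHA-256 digest along the way. The `stream_copy_with_hash` helper and the 64 KiB chunk size are assumptions made for illustration and are not part of the RSDOS API.

```python
import hashlib
import io


def stream_copy_with_hash(src, dst, chunk_size: int = 64 * 1024) -> str:
    """Copy src to dst in chunks and return the SHA-256 hex digest of the data."""
    hasher = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        hasher.update(chunk)
        dst.write(chunk)
    return hasher.hexdigest()


source = io.BytesIO(b"0123456789" * 100_000)  # stand-in for a large file
target = io.BytesIO()
digest = stream_copy_with_hash(source, target)
```

Only one chunk is held in memory at a time, so the same loop works for inputs far larger than available RAM.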
RSDOS is heavily inspired by aiidateam/disk-objectstore; this reimplementation aims to explore alternative designs and performance optimizations.
sled as a K/V DB (v2)
io_uring (v2)
packs → packed (v2)