| Crates.io | block-db |
| lib.rs | block-db |
| version | 0.2.0 |
| created_at | 2024-11-15 22:11:37.845968+00 |
| updated_at | 2025-03-21 06:29:32.212145+00 |
| description | Local, multi-threaded, durable byte DB. |
| homepage | |
| repository | https://gitlab.com/robertlopezdev/block-db |
| max_upload_size | |
| id | 1449664 |
| size | 141,131 |
Local, multi-threaded, durable byte DB.
```sh
cargo add block-db
```
A BlockDB manages a write-ahead log (WAL) and a collection of DataFiles. Each DataFile maintains its own WAL and a binary file that stores DataBlocks, each composed of one or more chunks.

A DataFile is configurable via max_file_size, and a DataBlock is configurable via chunk_size. Each DataBlock is associated with a BlockKey, allowing BlockDB to function as a persistent, atomic key-value store for arbitrary byte data.
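Schematically, the layout this describes:

```text
BlockDB
├── WAL
└── DataFile (one or more, capped by `max_file_size`)
    ├── WAL
    └── binary file
        └── DataBlock (addressed by a `BlockKey`)
            └── chunk (one or more, `chunk_size` bytes each)
```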
BlockDB is designed to be durable, minimal, and predictable. Unlike many modern storage engines, it avoids hidden memory buffers, background threads, and delayed writes, making it ideal for systems where correctness and resource footprint matter more than raw throughput.
A key benefit of this approach is an almost nonexistent memory footprint, along with an ergonomic and reliable foundation for building higher-level DBMS layers.
This project is still in its early stages, and as it's developed alongside other database systems, major optimizations and refinements will continue to be made over time.
```rust
use block_db::{batch::BatchResult, BlockDB};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut block_db = BlockDB::open("./db", None).await?;

    // Write bytes
    let block_key_one = block_db.write(b"Hello").await?;

    // Free bytes
    let freed_bytes = block_db.free(&block_key_one).await?;

    // 4_096 (Default `chunk_size`)
    println!("{freed_bytes}");

    // Write bytes in the previously freed space
    let block_key_two = block_db.write(b"World!").await?;

    // Batch writes then frees
    let BatchResult {
        freed_bytes,
        new_block_keys,
    } = block_db
        .batch(vec![b"Hallo", b"Welt!"], vec![&block_key_two])
        .await?;

    // 4_096 (Default `chunk_size`)
    println!("{freed_bytes}");

    // Read bytes
    // None
    println!("{:?}", block_db.read(&block_key_one).await?);
    // None
    println!("{:?}", block_db.read(&block_key_two).await?);
    // Some(b"Hallo")
    println!("{:?}", block_db.read(&new_block_keys[0]).await?);
    // Some(b"Welt!")
    println!("{:?}", block_db.read(&new_block_keys[1]).await?);

    // Compact `DataFile`s by removing all free `DataBlock`s
    block_db.compact_data_files().await?;

    Ok(())
}
```
This section contains useful information on how the database works, but it does not cover the API in depth; for that, see the full crate documentation on docs.rs.
When constructing a BlockDB, you can provide BlockDBOptions to configure two key parameters:
- chunk_size: the size (in bytes) of a chunk within a DataBlock. Default: 4_096
- max_file_size: the maximum size (in bytes) of a DataFile. Default: 4_096_000_000
max_file_size can be changed later by re-opening the BlockDB with a new BlockDBOptions.
chunk_size, however, cannot be changed after the initial creation of the database.
These options are stored on disk in JSON format. After the initial creation, you may pass None to BlockDB::open, and it will automatically load the previously stored options.
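For illustration, here is a minimal sketch of both open paths. The field names of BlockDBOptions below are assumptions that mirror the parameter names described above; check docs.rs for the struct's exact shape.

```rust
use block_db::{BlockDB, BlockDBOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed field names, mirroring the options described above.
    let options = BlockDBOptions {
        chunk_size: 8_192,            // fixed at initial creation
        max_file_size: 1_024_000_000, // may be changed on a later re-open
    };

    // First open: creates the database and persists the options as JSON.
    let block_db = BlockDB::open("./db", Some(options)).await?;
    drop(block_db);

    // Subsequent opens: pass `None` to load the previously stored options.
    let _block_db = BlockDB::open("./db", None).await?;

    Ok(())
}
```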
The max_file_size option can be a bit misleading. Rather than being a strict size limit, it functions more like a soft threshold, and even then, it may only be exceeded once per DataFile.
Consider a fresh BlockDB instance with max_file_size set to 10 GB. Here's how writes are distributed:
1: Write 10 GB
2: Write 1 GB
3: Write 25 GB
4: Write 1 GB (three times)
Resulting Distribution:

```text
DataFile(1):
└── DataBlock(10 GB)

DataFile(2):
├── DataBlock(1 GB)
└── DataBlock(25 GB)

DataFile(3):
├── DataBlock(1 GB)
├── DataBlock(1 GB)
└── DataBlock(1 GB)
```
There is no internal index of the creation order of DataFiles, so when multiple non-full DataFiles exist, the first non-full DataFile detected is the one written to. If you would like the ability to write to a specific DataFile, please create an issue or PR.
Some methods are annotated in their doc-comments as either Non-corruptible or Corruptible. If a method is marked Corruptible, it's important to understand how to handle potential corruption scenarios.
If you encounter an Error::Corrupted, it will only come from a method marked as Corruptible. This typically indicates an issue with filesystem (FS) or hardware stability, which has caused an operation to fail and left the system in a corrupted state.
Before proceeding, ensure that the FS and hardware are stable. Then, attempt to recover by calling BlockDB::uncorrupt, passing in the action extracted from the Error::Corrupted { action, .. }. This operation may also fail, but it will continue to return Error::Corrupted, allowing you to retry uncorruption multiple times safely and without overlap.
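As a rough sketch of that flow (the `write_with_recovery` helper is hypothetical, and the exact shapes of `Error::Corrupted` and `BlockDB::uncorrupt` are assumptions based on the description above; see docs.rs for the real signatures):

```rust
use block_db::{BlockDB, Error};

// Hypothetical helper: performs a corruptible write, then retries
// recovery until `uncorrupt` succeeds.
async fn write_with_recovery(db: &mut BlockDB, bytes: &[u8]) -> Result<(), Error> {
    match db.write(bytes).await {
        Ok(_block_key) => Ok(()),
        Err(Error::Corrupted { action, .. }) => {
            // Before retrying, make sure the FS and hardware are stable.
            let mut pending = Some(action);
            while let Some(action) = pending.take() {
                if let Err(Error::Corrupted { action, .. }) = db.uncorrupt(action).await {
                    // Still corrupted: retry with the action from the new error.
                    pending = Some(action);
                }
            }
            Ok(())
        }
        Err(other) => Err(other),
    }
}
```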
In the rare event that one or more DataFiles become deadlocked during an uncorrupt attempt, this signals a more serious issue, likely a problem with the write-ahead log (WAL) or the binary file itself. In such cases, automatic recovery is no longer possible.
- Optimizations
- DataBlock integrity feature
Open to any contributions. All tests must pass, and the new features or changes should "make sense" based on the current API.
MIT License
Copyright (c) 2024 Robert Lopez
See LICENSE.md
I plan to continue maintaining this project for the foreseeable future.