| Crates.io | dataspool-rs |
|---|---|
| lib.rs | dataspool-rs |
| version | 0.2.0 |
| created_at | 2025-12-22 21:48:18.754223+00 |
| updated_at | 2026-01-08 05:28:54.275771+00 |
| description | Efficient data bundling system with indexed .spool files and SQLite vector database |
| homepage | |
| repository | https://github.com/Blackfall-Labs/dataspool-rs |
| max_upload_size | |
| id | 2000320 |
| size | 49,747 |
DataSpool is a high-performance data bundling library that eliminates per-file filesystem overhead by concatenating many items (cards, images, binary blobs) into a single indexed .spool file, backed by SQLite-based metadata and vector embeddings.
Build and write a spool:

```rust
use dataspool::{SpoolBuilder, SpoolEntry};

// Create spool builder
let mut builder = SpoolBuilder::new();

// Add entries
builder.add_entry(SpoolEntry {
    id: "item1".to_string(),
    data: b"Item 1 data".to_vec(),
});
builder.add_entry(SpoolEntry {
    id: "item2".to_string(),
    data: b"Item 2 data".to_vec(),
});

// Write to file
builder.write_to_file("data.spool")?;
```
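The snippets in this README use `?`, so they assume an enclosing function that returns a compatible `Result`. A minimal harness (the boxed error type here is a stand-in, not the crate's actual error type):

```rust
// Minimal harness for the README snippets. Box<dyn Error> is an
// assumption; dataspool's concrete error type is not shown here.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // paste any snippet from this README here
    Ok(())
}
```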
Read entries back:

```rust
use dataspool::SpoolReader;

// Open spool
let reader = SpoolReader::open("data.spool")?;

// Read a specific entry
let data = reader.read_entry(0)?; // Read first entry
println!("Item 0: {} bytes", data.len());

// Iterate entries
for (index, entry) in reader.iter_entries().enumerate() {
    let data = entry?;
    println!("Item {}: {} bytes", index, data.len());
}
```
Store and search embeddings with the persistent vector store:

```rust
use dataspool::{PersistentVectorStore, DocumentRef};

// Create persistent store
let mut store = PersistentVectorStore::new("vectors.db")?;

// Add a document with its embedding
let doc_ref = DocumentRef {
    id: "doc1".to_string(),
    file_path: "data.spool".to_string(),
    source: "web-scrape".to_string(),
    metadata: Some(r#"{"title": "Example"}"#.to_string()),
    spool_offset: Some(0),
    spool_length: Some(1024),
};
let embedding = vec![0.1, 0.2, 0.3, 0.4]; // Example embedding vector
store.add_document_ref(&doc_ref, &embedding)?;

// Search by vector similarity
let query_vector = vec![0.15, 0.25, 0.35, 0.45];
let results = store.search(&query_vector, 10)?;
for result in results {
    println!("ID: {}, Score: {:.3}", result.id, result.score);
}
```
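Each `DocumentRef` carries a `spool_offset`/`spool_length` pair, so a search hit can be resolved to raw bytes with a single seek into the `.spool` file. A minimal sketch using only `std`, assuming the offset is an absolute position in the file (consistent with the format diagram below):

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

// Sketch: fetch one entry's bytes given its offset and length.
// Assumes offsets are absolute byte positions in the .spool file.
fn read_span(path: &str, offset: u64, length: u64) -> std::io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; length as usize];
    file.read_exact(&mut buf)?;
    Ok(buf)
}
```

`SpoolReader::read_entry` covers this in practice; the sketch only illustrates why random access costs one seek rather than a scan.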
.spool file:

```text
┌──────────────────────────────┐
│ Magic: "SP01"      (4 bytes) │
│ Version: 1         (1 byte)  │
│ Card Count         (4 bytes) │
│ Index Offset       (8 bytes) │
├──────────────────────────────┤
│ Card 0 Data                  │
│ Card 1 Data                  │
│ ...                          │
│ Card N Data                  │
├──────────────────────────────┤
│ Index:                       │
│   [offset0, len0]            │
│   [offset1, len1]            │
│   ...                        │
│   [offsetN, lenN]            │
└──────────────────────────────┘
```
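Per the layout above, the fixed header is 17 bytes: 4 (magic) + 1 (version) + 4 (card count) + 8 (index offset). A hand-rolled decoder would look roughly like this; little-endian integer encoding is an assumption, since the README does not state endianness:

```rust
// Sketch: decode the fixed .spool header described above.
// Little-endian encoding is an assumption, not confirmed by this README.
fn parse_header(bytes: &[u8]) -> Option<(u8, u32, u64)> {
    if bytes.len() < 17 || &bytes[0..4] != b"SP01" {
        return None; // too short, or wrong magic
    }
    let version = bytes[4];
    let card_count = u32::from_le_bytes(bytes[5..9].try_into().ok()?);
    let index_offset = u64::from_le_bytes(bytes[9..17].try_into().ok()?);
    Some((version, card_count, index_offset))
}
```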
.db file (SQLite):

```text
┌──────────────────────────────┐
│ documents table:             │
│   - id                       │
│   - file_path                │
│   - source                   │
│   - metadata (JSON)          │
│   - spool_offset             │
│   - spool_length             │
├──────────────────────────────┤
│ embeddings table:            │
│   - doc_id                   │
│   - vector (BLOB)            │
└──────────────────────────────┘
```
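An approximate schema matching that diagram, written against `rusqlite` (which dataspool already depends on, per the dependency tree below). Column types and constraints are assumptions; the diagram only names the columns:

```rust
use rusqlite::Connection;

// Sketch: the kind of schema the diagram above implies. Column types
// and constraints are assumptions, not taken from the crate's source.
fn create_schema(conn: &Connection) -> rusqlite::Result<()> {
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS documents (
             id           TEXT PRIMARY KEY,
             file_path    TEXT NOT NULL,
             source       TEXT,
             metadata     TEXT,    -- JSON string
             spool_offset INTEGER,
             spool_length INTEGER
         );
         CREATE TABLE IF NOT EXISTS embeddings (
             doc_id TEXT REFERENCES documents(id),
             vector BLOB             -- packed embedding values
         );",
    )
}
```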
Header and index fields:

- `SP01` (4 bytes) - Identifies spool format
- `1` (1 byte) - Format version
- Index entries are `[offset: u64, length: u64]` pairs

Data flow:

```text
┌──────────────┐
│   DataCard   │ (compressed CML)
└──────┬───────┘
       │
       v
┌──────────────┐     ┌──────────────┐
│ SpoolBuilder │────>│ .spool file  │
└──────┬───────┘     └──────────────┘
       │
       v
┌──────────────┐
│  SpoolReader │
└──────┬───────┘
       │
   ┌───┴──────────────────┐
   v                      v
┌───────────────────┐  ┌────────────────┐
│ PersistentVector  │  │ .db (SQLite)   │
│ Store             │<─│  - documents   │
└───────────────────┘  │  - embeddings  │
                       └────────────────┘
```
Bundle thousands of documentation cards into a single file:
```rust
// Build spool from cards
let mut builder = SpoolBuilder::new();
for card in documentation_cards {
    builder.add_entry(SpoolEntry {
        id: card.id,
        data: card.compressed_data,
    });
}
builder.write_to_file("rust-stdlib.spool")?;

// Create vector index
let mut store = PersistentVectorStore::new("rust-stdlib.db")?;
for (i, embedding) in embeddings.iter().enumerate() {
    store.add_document_ref(&DocumentRef {
        id: format!("card_{}", i),
        file_path: "rust-stdlib.spool".to_string(),
        source: "rust-stdlib".to_string(), // remaining DocumentRef fields
        metadata: None,
        spool_offset: Some(offsets[i]),    // per-entry offsets from the spool index
        spool_length: Some(lengths[i]),
    }, embedding)?;
}
```
Store image collections with metadata:
```rust
let mut builder = SpoolBuilder::new();
for image_path in image_paths {
    let data = std::fs::read(&image_path)?;
    builder.add_entry(SpoolEntry {
        // file_stem() yields an OsStr, which needs an explicit conversion to String
        id: image_path.file_stem().unwrap().to_string_lossy().into_owned(),
        data,
    });
}
builder.write_to_file("images.spool")?;
```
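`image_paths` is assumed to exist in the snippet above; one way to build it with `std` alone (the directory name and extension list are illustrative, not part of the dataspool API):

```rust
use std::path::PathBuf;

// Sketch: gather image paths from a directory. The extension filter
// is illustrative; adjust for the formats you actually store.
fn collect_images(dir: &str) -> std::io::Result<Vec<PathBuf>> {
    let mut paths = Vec::new();
    for entry in std::fs::read_dir(dir)? {
        let path = entry?.path();
        match path.extension().and_then(|e| e.to_str()) {
            Some("png") | Some("jpg") | Some("jpeg") => paths.push(path),
            _ => {}
        }
    }
    Ok(paths)
}
```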
Archive arbitrary binary data with fast random access:
```rust
// Write blobs
let mut builder = SpoolBuilder::new();
builder.add_entry(SpoolEntry { id: "blob1".into(), data: blob1 });
builder.add_entry(SpoolEntry { id: "blob2".into(), data: blob2 });
builder.write_to_file("blobs.spool")?;

// Random-access read
let reader = SpoolReader::open("blobs.spool")?;
let blob1_data = reader.read_entry(0)?; // Direct access, no scan
```
Benchmark results (3,309 items, Rust stdlib documentation):
| Operation | Time | Notes |
|---|---|---|
| Build spool | ~200ms | Writing all items + index |
| Read single item | <1ms | Direct byte offset seek |
| Read all items | ~50ms | Sequential read |
| SQLite insert (1 doc) | ~0.5ms | With embedding |
| Vector search (10 results) | ~5ms | Cosine similarity + index |
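The vector-search row relies on cosine similarity, which is just a dot product normalized by the two vector lengths. A generic reference implementation, not the crate's internal code:

```rust
// Sketch: cosine similarity between two embedding vectors.
// Generic reference implementation, not dataspool's internals.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}
```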
Compared with alternative storage approaches:

| Approach | Read Speed | Storage Overhead | Random Access |
|---|---|---|---|
| Individual files | Slow (3,309 inodes) | High (4KB/file) | Yes |
| tar archive | Slow (must scan) | Low | No |
| zip archive | Fast | Medium | Yes |
| DataSpool | Fast | Minimal | Yes |
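At roughly 4 KB of block padding per file, the 3,309-item benchmark corpus wastes about 13 MB on filesystem slack alone, overhead that a single .spool file avoids.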
Core dependencies:

```toml
[dependencies]
dataspool = "0.2.0"
bytepunch = "0.1.0"  # For decompressing compressed items
```
```text
dataspool
├── bytepunch (compression)
├── rusqlite (SQLite database)
├── serde (serialization)
└── thiserror (error handling)
```
The default feature set covers basic spool read/write and the persistent vector store.

`async` - Async APIs for non-blocking I/O:

```toml
[dependencies]
dataspool = { version = "0.2.0", features = ["async"] }
```

```rust
use dataspool::async_api::AsyncSpoolReader;

let reader = AsyncSpoolReader::open("data.spool").await?;
let data = reader.read_entry(0).await?;
```
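The `.await` calls require an async runtime; the README does not say which executor the `async` feature targets, so the Tokio harness below is only an assumption:

```rust
// Assumption: tokio is shown as a common executor choice; the async
// feature's actual runtime requirements are not documented here.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = dataspool::async_api::AsyncSpoolReader::open("data.spool").await?;
    let data = reader.read_entry(0).await?;
    println!("read {} bytes", data.len());
    Ok(())
}
```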
Add to Cargo.toml:

```toml
[dependencies]
dataspool = "0.2.0"
```

Or with async support:

```toml
[dependencies]
dataspool = { version = "0.2.0", features = ["async"] }
```
```bash
# Run all tests
cargo test

# Run with logging
RUST_LOG=debug cargo test

# Test specific modules
cargo test spool
cargo test persistent_store
```
See the examples/ directory:

- `build_spool.rs` - Build a spool from files
- `read_spool.rs` - Read entries from a spool
- `vector_search.rs` - Semantic search with embeddings

Run with:

```bash
cargo run --example build_spool
cargo run --example read_spool
cargo run --example vector_search
```
Extracted from the SAM (Societal Advisory Module) project, where it provides the spool bundling system for knowledge base archival.
MIT - See LICENSE for details.
Magnus Trent <magnus@blackfall.dev>