| Crates.io | vsf |
| lib.rs | vsf |
| version | 0.2.3 |
| created_at | 2024-08-14 14:37:33.534577+00 |
| updated_at | 2026-01-16 07:11:34.466899+00 |
| description | Versatile Storage Format |
| homepage | |
| repository | https://github.com/nickspiker/vsf |
| max_upload_size | |
| id | 1337434 |
| size | 1,380,739 |
A self-describing binary format designed for optimal integer encoding and type safety.
VSF addresses a fundamental challenge in binary formats: how to efficiently encode integers of any size while maintaining O(1) skip-ability. The solution enables efficient storage of everything from a single photon's wavelength to the number of atoms in the observable universe (and yes, both fit comfortably).
Most binary formats face a tradeoff when encoding integers:
- Fixed-width approach (TIFF, PNG, HDF5): wastes space on small values and imposes hard upper limits on large ones.
- Variable-width approach (Protobuf, MessagePack): compact, but continuation bits force sequential decoding and break O(1) skipping.
VSF's solution - Exponential-width with explicit size markers:
```
Value 42:        'u' '3' 0x2A          (2 decimal digits)
Value 4,096:     'u' '4' 0x10 0x00     (4 digits)
Value 2^32-1:    'u' '5' + 4 bytes     (10 digits)
Value 2^64-1:    'u' '6' + 8 bytes     (20 digits)
Value 2^128-1:   'u' '7' + 16 bytes    (39 digits)
Value 2^256-1:   'u' '8' + 32 bytes    (78 digits)
RSA-16384 prime: 'u' 'D' + 2048 bytes  (4932 digits)
Actual max:      'u' 'Z' + 8 GB        (~20 billion digits)
```
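A minimal sketch of how a writer could pick the size class, assuming the digit markers '0'-'9' map directly to the exponent d (as the table above suggests); the letter markers are omitted because their exact numeric mapping isn't spelled out here, and the function names are illustrative, not vsf's API:

```rust
// Hypothetical sketch (not vsf's API): pick the smallest EWE size class
// whose 2^d bits hold a value. Assumes digit markers '0'-'9' map directly
// to the exponent d; letter markers ('A'-'Z') are omitted here.

/// Smallest digit size class whose 2^d bits can hold `value`.
fn size_class(value: u128) -> char {
    let bits_needed = (128 - value.leading_zeros()).max(1);
    for d in 0..=9u32 {
        if 1u32 << d >= bits_needed {
            return char::from_digit(d, 10).unwrap();
        }
    }
    unreachable!("a u128 always fits in class '7' (128 bits)")
}

/// Total encoded size: type marker + size marker + ceil(2^d / 8) data bytes.
fn encoded_len(value: u128) -> usize {
    let d = size_class(value).to_digit(10).unwrap();
    2 + ((1usize << d) + 7) / 8
}

fn main() {
    assert_eq!(size_class(42), '3');                   // 8-bit class, 1 data byte
    assert_eq!(size_class(4_096), '4');                // 16-bit class
    assert_eq!(encoded_len(42), 3);                    // matches the table above
    assert_eq!(encoded_len(u128::from(u64::MAX)), 10); // 2 markers + 8 bytes
}
```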
Properties:
Every binary format faces this question: "How do you encode a number when you don't know how big it will be?"
Until VSF, every format in existence picked one of these three bad answers:
Answer 0: "We'll use fixed sizes" (TIFF, PNG, HDF5)
Answer 1: "We'll use continuation bits" (Protobuf, MessagePack)
Answer 2: "We'll store the length first" (Most TLV formats)
VSF introduces Exponential Width Encoding (EWE) - a novel byte-aligned scheme where ASCII markers map directly to exponential size classes:
How it works:
0. Type: 'u' (unsigned), 'i' (signed), etc.
1. Size: ASCII character '0'-'Z'
2. Data: Exactly 2^(size character's numeric value) bits follow
3. 0=bool, 3=8 bits, 4=16 bits, 5=32 bits, 6=64 bits, ..., Z=2^36 bits (8 GB)
Example:

```
'u' '5' [0x01234567]
 │   │  └─ Data (2^5 bits = 32 bits = 4 bytes)
 │   └─ Size class marker
 └─ Type marker
```
Result: O(1) seekability + unbounded integers
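The seekability claim can be sketched directly: the two marker bytes alone determine the field's length, so a reader can jump past a value without decoding it. This is a hedged illustration (digit size classes only, illustrative function name), not vsf's parser:

```rust
// Hedged sketch of O(1) skipping: the size marker alone tells us how many
// bytes to jump - no need to decode the payload. Digit markers only,
// same assumption as above: '3' => 2^3 bits, '4' => 2^4 bits, etc.

/// Advance past one integer field, returning the new offset.
fn skip_field(buf: &[u8], pos: usize) -> Option<usize> {
    let size_marker = *buf.get(pos + 1)? as char; // buf[pos] is 'u', 'i', ...
    let d = size_marker.to_digit(10)? as usize;   // digit size classes
    let data_bytes = ((1usize << d) + 7) / 8;     // 2^d bits, rounded to bytes
    let end = pos + 2 + data_bytes;
    (end <= buf.len()).then_some(end)
}

fn main() {
    // 'u' '3' 0x2A  followed by  'u' '4' 0x10 0x00  (42, then 4096)
    let buf = [b'u', b'3', 0x2A, b'u', b'4', 0x10, 0x00];
    let after_first = skip_field(&buf, 0).unwrap();
    assert_eq!(after_first, 3);                       // skipped 42 in O(1)
    assert_eq!(skip_field(&buf, after_first), Some(7)); // skipped 4096
}
```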
Why this works:
Every number can be represented as mantissa × 2^exponent:
Novel properties of EWE:
Let's look at what it costs to encode numbers of different magnitudes:
```
Value 42:        2 bytes overhead + 1 byte data  = 3 bytes total
Value 2^64-1:    2 bytes overhead + 8 bytes data = 10 bytes total
RSA-16384 prime: 2 bytes overhead + 2048 bytes   = 2050 bytes total
```
The overhead stays negligible even for numbers larger than the universe.
Here are real-world numbers that break other formats but VSF handles trivially:
❌ Planck volumes in observable universe: ~10^185
(Needs ~615 bits; Protobuf stops at 64)
✅ VSF: fits in a 1024-bit integer - 2 bytes of overhead + 128 bytes of data = 130 bytes total
❌ Cryptographic keys (RSA-16384 = 2048 bytes)
JSON can't represent integers > 2^53 exactly
✅ VSF: 'u' 'D' + 2048 bytes = 2050 bytes
❌ Storing 1 million boolean flags as u64
Wastes 8 MB instead of 125 KB
✅ VSF bitpacked: 125 KB (64x smaller)
With marker 'Z' (ASCII 90), VSF can encode:
2^(2^36) = 2^68,719,476,736 possible values
That's an integer with ~20.7 billion decimal digits!
For context:
- Atoms in universe: ~10^80 (needs 266 bits)
- Planck volumes in universe: ~10^185 (needs 615 bits)
VSF handles all of these with **two bytes of overhead.**
You will run out of storage, memory, and life WAY before VSF hits any limits.
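The bit counts above are simple back-of-the-envelope arithmetic, easy to check (this calculation is an aside, not part of vsf):

```rust
// Quick arithmetic behind the magnitude claims: bits needed to represent
// 10^n, and the decimal digit count of the largest 'Z'-class integer
// (2^36 bits of data).

fn bits_for_pow10(n: u32) -> u32 {
    (f64::from(n) * 10f64.log2()).ceil() as u32
}

fn main() {
    assert_eq!(bits_for_pow10(80), 266);  // atoms in the universe, ~10^80
    assert_eq!(bits_for_pow10(185), 615); // Planck volumes, ~10^185
    // Digits of a 2^36-bit integer: 2^36 * log10(2), in billions
    let digits_billions = (2f64.powi(36) * 2f64.log10()) / 1e9;
    assert!((20.0..21.0).contains(&digits_billions)); // ~20.7 billion digits
}
```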
Today's "unreasonably large" is tomorrow's "barely sufficient":
- 1970s: "640KB ought to be enough for anybody"
- 1990s: "Why would anyone need more than 4GB?" (u32 addresses)
- 2010s: "2^64 is effectively infinite" (IPv6, filesystems)
- 2020s: Quantum computing, cosmological simulations, genomic databases hitting 2^64 limits
VSF's design principle: Stop predicting the future. Build a format that mathematically cannot impose artificial limits.
VSF is the only format that combines:
This is possible because I solved the fundamental problem: How do you easily encode the exponent of arbitrarily large numbers?
Answer: Directly, using ASCII characters as exponential size class markers (Exponential Width Encoding).
Every other format either:
0. Uses fixed exponents (hits limits, wastes space on small numbers), or
1. Uses continuation bits or length prefixes (loses O(1) seekability)
VSF is written entirely in Rust with zero wildcards in all match statements:

```rust
match self {
    VsfType::u0(value) => encode_bool(value),
    VsfType::u3(value) => encode_u8(value),
    VsfType::u4(value) => encode_u16(value),
    // ... 208 more explicit cases ...
    VsfType::p(tensor) => encode_bitpacked(tensor),
}
// No `_ =>` arm anywhere - the compiler proves every case is handled
```
Why this matters:
Why Rust specifically? It's the only language that gives you proven:
This isn't possible in any other language:
Ways VSF can break:
That's it. Those are the only ways VSF breaks. Everything else is systematically impossible! Cool eh?
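The zero-wildcard discipline described above is easy to demonstrate with a toy enum (this example is illustrative, not vsf's actual types):

```rust
// Toy demonstration of the "zero wildcards" discipline: with no `_` arm,
// adding a variant to the enum turns every match that forgot it into a
// compile error instead of a silent runtime fallthrough.

enum SizeClass {
    Bool, // '0'
    U8,   // '3'
    U16,  // '4'
}

fn marker(class: &SizeClass) -> char {
    match class {
        SizeClass::Bool => '0',
        SizeClass::U8 => '3',
        SizeClass::U16 => '4',
        // No `_ =>` arm: add a `U32` variant to the enum and this match
        // (and every other one) fails to compile until it is handled.
    }
}

fn main() {
    assert_eq!(marker(&SizeClass::U8), '3');
    assert_eq!(marker(&SizeClass::U16), '4');
}
```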
Camera RAW data, scientific sensors, and ML models often use non-standard bit depths:
```rust
// 12-bit camera RAW (common in photography)
BitPackedTensor {
    shape: vec![4096, 3072], // 12.6 megapixels
    bit_depth: 12,
    data: packed_bytes,
}
// ~18 MB vs ~25 MB as a 16-bit array
```
Supports 1-256 bits per element efficiently.
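The packing idea can be sketched in a few lines. This is an illustration of MSB-first bit packing (as described in the format details below), not vsf's `BitPackedTensor` internals:

```rust
// Illustrative MSB-first bit packer for non-standard bit depths. Each
// sample contributes `bit_depth` bits, packed contiguously, high bit first.

fn pack_bits(samples: &[u64], bit_depth: u32) -> Vec<u8> {
    let total_bits = samples.len() * bit_depth as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    let mut bit_pos = 0usize;
    for &s in samples {
        for i in (0..bit_depth).rev() { // MSB of each sample first
            let bit = ((s >> i) & 1) as u8;
            out[bit_pos / 8] |= bit << (7 - bit_pos % 8);
            bit_pos += 1;
        }
    }
    out
}

fn main() {
    // Two 12-bit samples fit exactly in 3 bytes (24 bits).
    assert_eq!(pack_bits(&[0xABC, 0x123], 12), vec![0xAB, 0xC1, 0x23]);
    // A 12.6-megapixel 12-bit frame: ~18.9 MB instead of ~25.2 MB at 16 bits.
    let pixels = 4096 * 3072;
    assert_eq!(pixels * 12 / 8, 18_874_368);
}
```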
Hashes, signatures, and keys are first-class types:
```rust
VsfType::a(algorithm, mac_tag)   // Message Authentication Code
VsfType::h(algorithm, hash)      // Hash (BLAKE3, SHA-256, etc.)
VsfType::g(algorithm, signature) // Signature (Ed25519, ECDSA, RSA)
VsfType::k(algorithm, pubkey)    // Public key
```
VSF natively supports Spirix - two's complement floating-point as an alternative to IEEE-754:
```rust
VsfType::s53(spirix_scalar) // 32-bit fraction, 8-bit exponent Scalar (F5E3)
VsfType::c64(spirix_circle) // 64-bit fractions, 16-bit exponent Circle (complex numbers!)
```
Why Spirix exists: IEEE-754 breaks fundamental math:
What Spirix fixes:
Undefined states that actually tell you what went wrong: Instead of IEEE's generic NaN, Spirix tracks why something became undefined:
- [℘ ⬆+⬆] - You added two exploded values (whoops!)
- [℘ ⬇/⬇] - You divided two vanished values
- [℘ ⬆×⬇] - You multiplied infinity by zero?

VSF stores all 25 Scalar types (F3-F7 × E3-E7) and 25 Circle types as first-class primitives.
Store Earth coordinates with millimeter precision:
```rust
VsfType::w(WorldCoord::from_lat_lon(47.6062, -122.3321))
```
Uses Fuller's Dymaxion projection - 2.14mm precision in 8 bytes.
Unicode strings with global frequency table:
```rust
VsfType::x(text) // Automatically compressed
// ~36% compression on English text
// 83 MB/s encode, 100+ MB/s decode on an average 2025 CPU
```
✅ Pretty damn complete type system - 211 variants
✅ Encoding/decoding
✅ Huffman text compression (requires `text` feature)
  - `cargo build --features text`
  - Without it, use `VsfType::l` for ASCII labels instead of `VsfType::x`
✅ Cryptographic support (requires `crypto` feature)
✅ Camera RAW builders
✅ Builder pattern with dot notation
  - `raw.camera.iso_speed = Some(800.0)`
✅ Zero-copy mmap support
  - `'p' [bit_depth] [ndim] [shapes...]`, then mmap the data
✅ Hierarchical field names
  - `"camera.sensor"`, `"raw.calibration"`, etc. via `builder.add_section("camera.sensor", items)`
✅ Type-safe schema system
  - `(d"field_name":value)`
✅ Two-tier parsing API
  - Low-level (`VsfSection::parse`): schema-agnostic, extracts raw data, no validation
  - High-level (`SectionBuilder::parse`): schema-validated, type-safe, modify → re-encode workflow
🚧 Structured capability tokens - formal capability types built on existing crypto primitives (g, k, h, a)
Note on File I/O: VSF gives you bytes - do whatever you want with them:
```rust
let bytes = encode(&my_data)?;
std::fs::write("data.vsf", &bytes)?; // Or network, database, embedded, etc.
```

File I/O is intentionally out of scope - you know your use case better than we do. Network streaming? Memory-mapped regions? SQLite blobs? Custom compression? VSF doesn't impose opinions about your storage layer.
```rust
use vsf::{VsfType, BitPackedTensor, Tensor};
use vsf::crypto_algorithms::HASH_BLAKE3;

// Store 12-bit camera RAW
let raw = BitPackedTensor::pack(12, vec![4096, 3072], &pixel_data);
let encoded = VsfType::p(raw).flatten();

// Store a tensor (8-bit grayscale image)
let tensor = Tensor::new(vec![1920, 1080], grayscale_data);
let img = VsfType::t_u3(tensor);

// Store text (automatically Huffman compressed)
let doc = VsfType::x("Hello, world!".to_string());

// Store a hash (BLAKE3)
let hash = VsfType::h(HASH_BLAKE3, hash_bytes);

// Round-trip
let decoded = VsfType::parse(&encoded)?;
assert_eq!(original, decoded);
```
VSF uses optional features to keep the default build minimal:
```toml
[dependencies]
# Pick the line that matches your needs:
vsf = "0.2"                                      # Core only (no optional features)
vsf = { version = "0.2", features = ["text"] }   # + Huffman compression
vsf = { version = "0.2", features = ["crypto"] } # + Ed25519, X25519, AES-GCM
vsf = { version = "0.2", features = ["spirix"] } # + Spirix arithmetic
```
| Feature | Enables | Use Case |
|---|---|---|
| (none) | Core types, tensors, encoding/decoding | Basic serialization |
| `text` | `VsfType::x` Huffman-compressed strings | Human-readable text storage |
| `crypto` | Ed25519 signatures, X25519 key exchange, AES-GCM | Signed/encrypted data |
| `spirix` | Spirix two's-complement floats (`s33`-`s77`, `c33`-`c77`) | Two's complement floating-point |
Note: Without text, use VsfType::l for ASCII labels. Without crypto, hash types (h) still work (BLAKE3 is always available), but signatures (g) and encryption require the feature.
VSF provides two approaches to parsing sections, each suited to different use cases:
`VsfSection::parse()` - schema-agnostic parsing that extracts raw data without validation. Located in `src/file_format.rs`.
```rust
use vsf::VsfSection;

let mut ptr = 0;
let section = VsfSection::parse(&bytes, &mut ptr)?;
// Returns VsfSection { name: String, fields: Vec<VsfField> }
// No schema required, no validation performed
// Caller controls pointer position (useful for streaming)
```
Use when:
`SectionBuilder::parse()` - schema-validated parsing for type-safe workflows. Located in `src/schema/section.rs`.
```rust
use vsf::schema::{SectionSchema, SectionBuilder, TypeConstraint};

// Define expected structure
let schema = SectionSchema::new("camera")
    .field("iso", TypeConstraint::AnyUnsigned)
    .field("shutter", TypeConstraint::AnyFloat)
    .field("model", TypeConstraint::AnyString);

// Parse with validation
let mut builder = SectionBuilder::parse(schema, &section_bytes)?;
// Validates section name matches schema
// Validates each field value against type constraints
// Returns SectionBuilder for modify -> re-encode workflow

// Modify and re-encode
builder.set("iso", 1600u32)?;
let new_bytes = builder.encode()?;
```
Use when:
| Aspect | `VsfSection::parse()` | `SectionBuilder::parse()` |
|---|---|---|
| Schema required | No | Yes |
| Validates types | No | Yes |
| Field storage | Single value per field | Multi-value per field |
| Pointer control | External (`&mut usize`) | Internal |
| Returns | `VsfSection` | `SectionBuilder` |
| Use case | General parsing, tooling | Type-safe applications |
Both parse the same `[d"name"(d"field":value)...]` binary format; `SectionBuilder::parse()` adds schema enforcement on top.
```rust
use vsf::builders::build_raw_image;
use vsf::types::BitPackedTensor;

// Just image data - no metadata
let samples: Vec<u64> = vec![2048; 4096 * 3072]; // 12-bit, mid-gray
let image = BitPackedTensor::pack(12, vec![4096, 3072], &samples);
let bytes = build_raw_image(image, None, None, None)?;
// That's it! File includes mandatory BLAKE3 hash automatically
```
```rust
use vsf::builders::build_raw_image;
use vsf::types::BitPackedTensor;
use vsf::crypto_algorithms::HASH_BLAKE3;

// Create image with metadata
let samples: Vec<u64> = vec![2048; 4096 * 3072];
let image = BitPackedTensor::pack(12, vec![4096, 3072], &samples);
let bytes = build_raw_image(
    image, // Only required field
    Some(RawMetadata {
        cfa_pattern: Some(vec![b'R', b'G', b'G', b'B']),
        black_level: Some(64.),
        white_level: Some(4095.),
        // Calibration frame hashes (algorithm + bytes)
        dark_frame_hash: Some((HASH_BLAKE3, dark_hash)),
        flat_field_hash: Some((HASH_BLAKE3, flat_hash)),
        // ...
    }),
    Some(CameraSettings {
        iso_speed: Some(800.),
        shutter_time_s: Some(1. / 60.),
        aperture_f_number: Some(2.8),
        focal_length_m: Some(0.024), // 24mm in meters
        // ...
    }),
    Some(LensInfo { /* lens details */ }),
)?;
// File hash computed automatically - no additional steps needed!
```
VSF treats integrity verification and data provenance as architectural requirements, not optional add-ons. While other formats bolt on checksums as an afterthought (or skip them entirely), VSF makes verification impossible to ignore.
Every VSF file includes mandatory BLAKE3 verification - no exceptions, no opt-out:
```rust
// Just build - hash is computed automatically
let bytes = builder.build()?;

// Verify integrity later
verify_file_hash(&bytes)?; // Returns Ok(()) or Err("corruption detected")
```
How it works:
0. Header contains `hb3[32][hash]` placeholder covering entire file
1. `build()` automatically computes BLAKE3 over the complete file (using zero-out procedure)

Why this matters:
Performance: BLAKE3 is essentially free
"But won't hashing everything slow down my writes?" Nope!
BLAKE3 throughput on modern hardware:
For context, typical hardware limits:
BLAKE3 is faster than your storage. The hash computation happens while you're waiting for the disk write anyway - literally zero added latency in most cases.
This is similar to Rust's bounds checking: "But won't array bounds checks slow me down?" In practice, the optimizer eliminates most checks, and the remaining ones are drowned out by cache misses. Safety first, performance second - and you get both anyway.
Traditional formats like TIFF, PNG, and HDF5 make integrity checks optional, if even supported. VSF makes them unavoidable.
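The zero-out procedure mentioned above can be sketched in a few lines. This uses FNV-1a as a dependency-free stand-in hash and a made-up header offset - vsf itself uses BLAKE3 and its real header layout differs; only the mechanism is illustrated:

```rust
// Sketch of the zero-out hash procedure: hash the file with its own hash
// field zeroed, store the digest in that field, verify by repeating.
// FNV-1a stands in for BLAKE3; offsets are hypothetical.

const HASH_OFFSET: usize = 4; // hypothetical placeholder position
const HASH_LEN: usize = 8;    // FNV-1a digest is 8 bytes; BLAKE3 uses 32

fn fnv1a(data: &[u8]) -> [u8; 8] {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x100000001b3);
    }
    h.to_be_bytes()
}

/// Hash the file with its hash field zeroed, then write the digest in place.
fn seal(mut file: Vec<u8>) -> Vec<u8> {
    file[HASH_OFFSET..HASH_OFFSET + HASH_LEN].fill(0);
    let digest = fnv1a(&file);
    file[HASH_OFFSET..HASH_OFFSET + HASH_LEN].copy_from_slice(&digest);
    file
}

/// Re-run the same zero-out procedure and compare digests.
fn verify(file: &[u8]) -> bool {
    let mut copy = file.to_vec();
    copy[HASH_OFFSET..HASH_OFFSET + HASH_LEN].fill(0);
    fnv1a(&copy)[..] == file[HASH_OFFSET..HASH_OFFSET + HASH_LEN]
}

fn main() {
    let mut sealed = seal(vec![0u8; 64]);
    assert!(verify(&sealed));
    sealed[20] ^= 1;           // flip one payload bit
    assert!(!verify(&sealed)); // corruption detected
}
```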
Hashes, signatures, and keys aren't byte blobs - they're strongly-typed primitives:
```rust
VsfType::h(algorithm, hash)      // Integrity verification
VsfType::g(algorithm, signature) // Authentication & authorization
VsfType::k(algorithm, pubkey)    // Identity
VsfType::a(algorithm, mac_tag)   // Message authentication
```
These primitives enable:
Algorithm identifiers prevent type confusion - the compiler enforces verification.
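A minimal illustration of that enforcement, using hypothetical types rather than vsf's actual API: tagging crypto material with its algorithm means same-looking bytes from different algorithms never compare equal, and distinct Rust types keep signatures out of hash-shaped holes entirely.

```rust
// Hypothetical types (not vsf's API) showing why algorithm tags prevent
// type confusion: identical bytes under different algorithms don't match,
// and a Signature cannot be passed where a Hash is expected at all.

#[derive(Debug, PartialEq)]
enum HashAlg { Blake3, Sha256 }

struct Hash { alg: HashAlg, bytes: Vec<u8> }

#[allow(dead_code)]
struct Signature { bytes: Vec<u8> } // distinct type: not interchangeable with Hash

fn hashes_match(expected: &Hash, actual: &Hash) -> bool {
    // Cross-algorithm comparison is rejected before bytes are even compared.
    expected.alg == actual.alg && expected.bytes == actual.bytes
}

fn main() {
    let blake = Hash { alg: HashAlg::Blake3, bytes: vec![1, 2, 3] };
    let sha = Hash { alg: HashAlg::Sha256, bytes: vec![1, 2, 3] };
    assert!(!hashes_match(&blake, &sha)); // same bytes, different algorithm
    // hashes_match(&blake, &Signature { bytes: vec![] }); // would not compile
}
```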
Concrete examples:
```rust
// Hash - integrity verification
VsfType::h(HASH_BLAKE3, hash_bytes)   // Algorithm ID prevents confusion
VsfType::h(HASH_SHA256, sha256_bytes) // Type system enforces verification

// Signature - authentication and non-repudiation
VsfType::g(SIG_ED25519, signature)    // 64 bytes, Ed25519
VsfType::g(SIG_ECDSA_P256, signature) // NIST P-256

// Public key - identity
VsfType::k(KEY_ED25519, pubkey)       // 32 bytes
VsfType::k(KEY_RSA_2048, pubkey)      // 256 bytes

// MAC - message authentication
VsfType::a(MAC_HMAC_SHA256, mac_tag)  // 32 bytes
```
Algorithm identifiers prevent type confusion attacks:
Supported algorithms:
Lock specific sections with signatures while allowing other sections to be modified freely:
```rust
// Camera signs RAW sensor data at capture
let bytes = raw.build()?;
let bytes = sign_section(bytes, "raw", &camera_private_key)?;

// Later: Add thumbnail without breaking RAW signature
builder.add_section("thumbnail", thumbnail_data);

// Later: Add EXIF metadata without breaking RAW signature
builder.add_section("exif", exif_data);

// Verify original RAW data is untouched
verify_section_signature(&bytes, "raw", &camera_public_key)?;
```
Use cases:
Why per-section matters:
Traditional whole-file signatures break on any modification - even benign metadata updates. VSF's per-section signatures enable:
Camera RAW files reference external calibration frames (dark, flat, bias). VSF embeds cryptographic hashes to verify frame integrity:
```rust
RawMetadata {
    // Each hash includes algorithm ID + hash bytes
    dark_frame_hash: Some((HASH_BLAKE3, dark_hash)),
    flat_field_hash: Some((HASH_BLAKE3, flat_hash)),
    bias_frame_hash: Some((HASH_BLAKE3, bias_hash)),
    vignette_correction_hash: Some((HASH_BLAKE3, vignette_hash)),
    distortion_correction_hash: Some((HASH_BLAKE3, distortion_hash)),
}
```
Why this matters:
Other formats treat verification as optional or external:
| Format | File Integrity | Cryptographic Types | Per-Section Signing |
|---|---|---|---|
| TIFF | ❌ None | ❌ Byte blobs | ❌ Not supported |
| PNG | ⚠️ Optional CRC (can strip) | ❌ Byte blobs | ❌ Not supported |
| HDF5 | ⚠️ Optional checksums | ❌ Byte blobs | ❌ Not supported |
| JPEG | ❌ None | ❌ Byte blobs | ❌ Not supported |
| Protobuf | ❌ None | ❌ Byte blobs | ❌ Not supported |
| VSF | ✅ Mandatory BLAKE3 | ✅ First-class types | ✅ Built-in support |
VSF makes data provenance impossible to ignore:
0. Can't create unverifiable files - hash is computed automatically
For systems where data integrity matters - forensic photography, scientific measurements, medical imaging, financial records, legal documents - VSF provides cryptographic guarantees from the ground up, not as a retrofit.
Despite mandatory hashing, VSF maintains O(1) seek performance:
Traditional formats force a choice: fast seeking OR integrity checks. VSF gives you both.
VSF's cryptographic types aren't just for verification - they're the foundation for capability-based permissions:
Traditional ACLs (what UNIX does):
Capabilities (what VSF enables):
```rust
// v0.3+ will enable:
let capability = Capability {
    resource: VsfType::h(HASH_BLAKE3, file_hash),
    permission: "read",
    granted_to: VsfType::k(KEY_ED25519, editor_pubkey),
    granted_by: VsfType::k(KEY_ED25519, camera_pubkey),
    expires: EtType::f6(eagle_time + 30_days),
    location: WorldCoord::from_lat_lon(47.6062, -122.3321),
};
let signed_cap = VsfType::g(SIG_ED25519, sign(&capability, camera_private_key));
// Signature proves: camera granted permission to editor
// No central authority needed - crypto proves authorization
// Capability is self-contained, unforgeable, delegatable
```
VSF v0.2 provides the cryptographic primitives (g, k, h, a). v0.3 will add structured capability types built on these foundations.
Variable-width encoding that's provably optimal for byte-aligned systems. Small numbers use small encodings, large numbers use large encodings.
211 strongly-typed variants with complete pattern matching. Compiler verifies every case is handled.
Optional Spirix integration for two's complement floating-point. Eagle Time for physics-bounded timestamps.
Signatures, hashes, keys, and MACs as first-class types, not afterthoughts.
Each value includes its type information. Files can be parsed without external schema.
See "Core Innovation: Exponential-Width Integer Encoding" section above for complete details.
Quick summary:
- Type marker (`u`, `i`, etc.) + size marker (ASCII '3'-'Z')

Bitpacked tensor layout:
- `'p'` marker (1 byte)
- ndim (variable-length)
- bit_depth (1 byte: 0x0C for 12-bit, 0x00 for 256-bit)
- shape dimensions (each variable-length encoded)
- packed data (bits packed into bytes, MSB-first)
Efficient for non-standard bit depths common in sensors and quantized ML models.
VSF is part of a broader computational foundation:
Each component addresses fundamental problems that have irritated me for a while now.
VSF is in active development. Core encoding/decoding is stable.
Custom open-source:
See LICENSE for full terms.
VSF solves the universal integer encoding problem through exponential-width encoding with explicit size markers. This enables:
If you need efficient encoding of varied-size integers, bitpacked tensors, or cryptographic primitives with perfect type safety, VSF is your only option!
Written in Rust with ZERO wildcards.