nibblerun

Crates.io: nibblerun
lib.rs: nibblerun
version: 0.1.6
created_at: 2026-01-22 12:20:53.868184+00
updated_at: 2026-01-22 12:20:53.868184+00
description: time series compression library for temperature and humidity data
repository: https://github.com/nanit/nibblerun
id: 2061448
size: 427,895
(quatrix)

nibblerun

A high-performance time series compression library for Rust, optimized for temperature sensor data.

Features

  • Bit-packed delta encoding with variable-length codes
  • Zero-run length encoding for repeated values
  • Configurable interval-based timestamp quantization
  • Automatic averaging of multiple readings within the same interval
  • ~250M readings/second encoding throughput

Example

use nibblerun::{Encoder, decode};

fn main() {
    // Create an encoder with 5-minute (300-second) intervals (the default)
    let mut enc: Encoder<i32> = Encoder::new();

    // Append readings as (timestamp, value).
    // Timestamps are quantized to interval boundaries.
    enc.append(1761000000, 23).unwrap();  // 00:00:00 -> interval 0
    enc.append(1761000150, 25).unwrap();  // 00:02:30 -> same interval, averaged with above
    enc.append(1761000300, 24).unwrap();  // 00:05:00 -> interval 1
    enc.append(1761000600, 22).unwrap();  // 00:10:00 -> interval 2

    // Serialize to bytes
    let bytes = enc.to_bytes();
    println!("Compressed size: {} bytes", bytes.len());

    // Decode back (the interval must match the one used for encoding)
    let readings = decode::<i32, 300>(&bytes);
    for r in readings {
        println!("ts: {}, value: {}", r.ts, r.value);
    }
    // Output:
    // ts: 1761000000, value: 24  (average of 23 and 25)
    // ts: 1761000300, value: 24
    // ts: 1761000600, value: 22

    // Custom interval example (10-minute intervals)
    let mut enc_10min: Encoder<i32, 600> = Encoder::new();
    enc_10min.append(1761000000, 22).unwrap();
}

Handling Gaps

Missing intervals are preserved in the output: no placeholder readings are emitted, but the decoded timestamps keep the gap.

use nibblerun::Encoder;

fn main() {
    let mut enc: Encoder<i32> = Encoder::new();

    enc.append(1761000000, 22).unwrap();   // 00:00 - interval 0
    enc.append(1761000300, 23).unwrap();   // 00:05 - interval 1
    // No data for 00:10, 00:15, 00:20...
    enc.append(1761003000, 25).unwrap();   // 00:50 - interval 10

    let readings = enc.decode();
    assert_eq!(readings.len(), 3);
    assert_eq!(readings[2].ts - readings[1].ts, 2700); // 45-minute gap preserved
}

How It Works

Timestamp Quantization

Timestamps are quantized to configurable intervals (default: 300 seconds). The first reading's timestamp becomes the base, and all subsequent readings are mapped to interval indices:

interval_idx = (timestamp - base_ts) / interval

Multiple readings in the same interval are averaged together.
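The mapping and averaging described above can be sketched in plain Rust. This is a standalone illustration, not the crate's implementation: `interval_idx`, `average`, and `quantize` are hypothetical helpers, the input is assumed to be sorted by timestamp, and rounding the mean to the nearest integer (half away from zero) is an assumption, since the README only says readings are averaged.

```rust
/// Hypothetical helper: interval index of `ts` relative to the base timestamp.
fn interval_idx(ts: u32, base_ts: u32, interval: u32) -> u32 {
    (ts - base_ts) / interval
}

/// Rounded mean (rounding mode is an assumption, not documented behavior).
fn average(sum: i64, count: i64) -> i32 {
    (sum as f64 / count as f64).round() as i32
}

/// Hypothetical sketch: bucket (timestamp, value) pairs into intervals and
/// average each bucket. Assumes `readings` is sorted by timestamp.
fn quantize(readings: &[(u32, i32)], interval: u32) -> Vec<(u32, i32)> {
    let mut out = Vec::new();
    // The first reading's timestamp becomes the base.
    let Some(&(base_ts, _)) = readings.first() else {
        return out;
    };
    let (mut cur_idx, mut sum, mut count) = (0u32, 0i64, 0i64);
    for &(ts, value) in readings {
        let idx = interval_idx(ts, base_ts, interval);
        if idx != cur_idx {
            // Close the previous interval; skipped indices produce no output (a gap).
            out.push((base_ts + cur_idx * interval, average(sum, count)));
            cur_idx = idx;
            sum = 0;
            count = 0;
        }
        sum += i64::from(value);
        count += 1;
    }
    out.push((base_ts + cur_idx * interval, average(sum, count)));
    out
}
```

With the readings from the Example section, `quantize(&[(1761000000, 23), (1761000150, 25), (1761000300, 24), (1761000600, 22)], 300)` returns `[(1761000000, 24), (1761000300, 24), (1761000600, 22)]`, matching the output shown there.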

Delta Encoding

Temperature values are stored as deltas from the previous reading. Deltas are encoded with variable-length bit codes optimized for typical temperature data:

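| Delta | Encoding | Bits |
|---|---|---|
| 0 (repeated) | 0 | 1 |
| ±1 | 10x | 3 |
| ±2 | 1110x | 5 |
| ±3 to ±10 | 1111110xxxx | 11 |
| ±11 to ±1023 | 11111110xxxxxxxxxxx | 19 |

Each x is one payload bit carrying the sign and magnitude of the delta.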
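As a rough illustration of the tier table, the hypothetical helper below returns the bit cost of a single delta. It mirrors only the Bits column; it does not reproduce the crate's payload layout, and it ignores how the first value and zero runs are stored.

```rust
/// Hypothetical helper: bit cost of one delta under the tier table.
/// Returns `None` for deltas outside ±1023, which the encoder rejects.
fn delta_bits(delta: i32) -> Option<u32> {
    match delta.unsigned_abs() {
        0 => Some(1),           // 0
        1 => Some(3),           // 10x
        2 => Some(5),           // 1110x
        3..=10 => Some(11),     // 1111110 + 4 payload bits
        11..=1023 => Some(19),  // 11111110 + 11 payload bits
        _ => None,
    }
}

/// Sum of per-delta costs for a series, charging each delta individually
/// (zero runs and gaps, which have their own codes, are not modeled here).
fn series_delta_bits(values: &[i32]) -> Option<u32> {
    values.windows(2).map(|w| delta_bits(w[1] - w[0])).sum()
}
```

For example, the series 22, 22, 23, 25, 24 has deltas 0, +1, +2, -1 and costs 1 + 3 + 5 + 3 = 12 bits by this count.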

Zero-Run Encoding

Consecutive zero deltas (unchanged temperatures) are encoded according to the length of the run:

| Run length | Encoding | Bits | Notes |
|---|---|---|---|
| 1-7 | 0 × n | n (1-7) | Individual zeros (more efficient than run encoding) |
| 8-21 | 11110xxxx | 9 | Run-length encoding |
| 22-149 | 111110xxxxxxx | 13 | Run-length encoding |
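The zero-run table can be read the same way. `zero_run_bits` is a hypothetical helper; because the table stops at 149, it returns `None` for longer runs instead of guessing how the encoder splits them.

```rust
/// Hypothetical helper: bit cost of a run of `n` consecutive zero deltas.
/// Runs longer than 149 are not covered by the table, so they return `None`.
fn zero_run_bits(n: u32) -> Option<u32> {
    match n {
        0 => Some(0),
        1..=7 => Some(n),      // n individual 1-bit `0` codes
        8..=21 => Some(9),     // 11110 + 4-bit length
        22..=149 => Some(13),  // 111110 + 7-bit length
        _ => None,
    }
}
```

For example, 100 unchanged readings in a row (a little over 8 hours at the default 300-second interval) cost 13 bits rather than 100.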

Gap Encoding

Missing intervals (sensor offline, network issues) are encoded efficiently:

| Gap size | Encoding | Bits | Notes |
|---|---|---|---|
| 1 interval | 110 | 3 | Optimized for common single-interval gaps |
| 2-65 intervals | 11111111xxxxxx | 14 | Larger gaps |
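A matching sketch for gaps, assuming "gap size" counts the intervals skipped between two stored readings (the README does not define it precisely). Both `gap_bits` and `missing_intervals` are hypothetical helpers.

```rust
/// Hypothetical helper: bit cost of a gap of `missing` skipped intervals.
/// Gaps beyond 65 intervals are not covered by the table, so they return `None`.
fn gap_bits(missing: u32) -> Option<u32> {
    match missing {
        1 => Some(3),         // 110
        2..=65 => Some(14),   // 11111111 + 6-bit length
        _ => None,
    }
}

/// Intervals skipped between two consecutive stored interval indices.
/// Assumes `next_idx > prev_idx` (timestamps are monotonically increasing).
fn missing_intervals(prev_idx: u32, next_idx: u32) -> u32 {
    next_idx - prev_idx - 1
}
```

Under that assumption, the Handling Gaps example (interval 1 followed by interval 10) skips 8 intervals and costs 14 bits. Taken together, the delta, zero-run, and gap tables use the prefixes 0, 10, 110, 1110, 11110, 111110, 1111110, 11111110, and 11111111, none of which is a prefix of another, so a decoder can tell the codes apart by counting leading 1 bits.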

Assumptions and Limitations

Assumptions

  • Timestamps are monotonically increasing: Out-of-order readings return an error
  • Timestamps are Unix seconds: The library uses an epoch base of 1,760,000,000 (~2025)
  • Temperature changes are gradual: The encoding is optimized for small deltas (up to ±10); larger deltas cost 19 bits each

Limitations

| Limit | Value | Notes |
|---|---|---|
| Max readings per encoder | 65,535 | Count is stored as a u16 |
| Max delta between readings | ±1,023 | Larger deltas return an error |
| Max readings per interval | 1,023 | Additional readings return an error |
| Min timestamp | 1,760,000,000 | ~2025-10-09; earlier timestamps return an error |
| Interval range | 1-65,535 seconds | ~18 hours max |
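The limits above can be expressed as simple checks. This is a hypothetical sketch: `LimitError`, `check_interval`, and `check_append` are illustrative names, not the crate's error type or API, and both the order in which the real encoder applies these checks and the exact meaning of its reading count are simplified here.

```rust
/// Hypothetical error type for illustration only.
#[derive(Debug, PartialEq)]
enum LimitError {
    IntervalOutOfRange,
    TimestampTooEarly,
    DeltaTooLarge,
    EncoderFull,
    IntervalFull,
}

const MIN_TIMESTAMP: u32 = 1_760_000_000; // ~2025-10-09
const MAX_ABS_DELTA: u32 = 1_023;
const MAX_READINGS: u32 = u16::MAX as u32; // 65,535: count is a u16
const MAX_PER_INTERVAL: u32 = 1_023;

/// The interval must be 1..=65,535 seconds.
fn check_interval(interval: u32) -> Result<(), LimitError> {
    if (1..=u32::from(u16::MAX)).contains(&interval) {
        Ok(())
    } else {
        Err(LimitError::IntervalOutOfRange)
    }
}

/// `stored`: readings already in the encoder.
/// `in_interval`: readings already averaged into the current interval.
fn check_append(ts: u32, delta: i32, stored: u32, in_interval: u32) -> Result<(), LimitError> {
    if ts < MIN_TIMESTAMP {
        return Err(LimitError::TimestampTooEarly);
    }
    if delta.unsigned_abs() > MAX_ABS_DELTA {
        return Err(LimitError::DeltaTooLarge);
    }
    if stored >= MAX_READINGS {
        return Err(LimitError::EncoderFull);
    }
    if in_interval >= MAX_PER_INTERVAL {
        return Err(LimitError::IntervalFull);
    }
    Ok(())
}
```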

Performance Characteristics

  • Encoding: O(1) per reading, ~250M readings/second
  • Decoding: O(n) where n = reading count
  • Compression: ~40-50 bytes/day for typical temperature data (vs ~3.5KB raw)
  • Memory: 56-64 bytes per encoder (depends on value type) + output buffer
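To put the compression figure in perspective, the arithmetic below assumes that "raw" means an 8-byte timestamp plus a 4-byte value per reading at the default 300-second interval (288 readings/day). That gives 3,456 bytes, consistent with ~3.5KB. The raw layout is an assumption and the helper names are hypothetical.

```rust
const SECONDS_PER_DAY: u32 = 86_400;

/// Readings per day at a given interval in seconds.
fn readings_per_day(interval: u32) -> u32 {
    SECONDS_PER_DAY / interval
}

/// Assumed raw layout: 8-byte timestamp + 4-byte value per reading.
fn raw_bytes_per_day(interval: u32) -> u32 {
    readings_per_day(interval) * (8 + 4)
}

/// Average encoded bits per reading for a given compressed size per day.
fn bits_per_reading(compressed_bytes_per_day: u32, interval: u32) -> f64 {
    f64::from(compressed_bytes_per_day * 8) / f64::from(readings_per_day(interval))
}
```

At 40-50 bytes/day this works out to roughly 1.1 to 1.4 bits per reading.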

Analyzing Your Data

The analysis tool is included as nbl-analyze:

# Build and run
cargo build --release --bin nbl-analyze
./target/release/nbl-analyze /path/to/csv/directory/

# With options
./target/release/nbl-analyze /path/to/csv/ --max-files 10000 --progress 1000

CSV files should have the format:

ts,temperature
1760000000,22
1760000300,23

Values of -1000 are treated as gap markers and excluded from temperature statistics.
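A minimal std-only sketch of reading this format, assuming one header line and comma-separated integer columns. `parse_csv` is a hypothetical helper, not part of nbl-analyze; it simply drops gap-marker rows and skips malformed lines rather than reporting them.

```rust
/// Gap marker used in the CSV input.
const GAP_MARKER: i32 = -1000;

/// Hypothetical sketch: parse `ts,temperature` CSV text into (ts, temp) pairs,
/// skipping the header, malformed lines, and gap-marker rows.
fn parse_csv(text: &str) -> Vec<(u32, i32)> {
    text.lines()
        .skip(1) // header: ts,temperature
        .filter_map(|line| {
            let (ts, temp) = line.trim().split_once(',')?;
            let ts: u32 = ts.trim().parse().ok()?;
            let temp: i32 = temp.trim().parse().ok()?;
            // Keep the row only if it is not a gap marker
            (temp != GAP_MARKER).then_some((ts, temp))
        })
        .collect()
}
```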

Testing

Run unit tests:

cargo test

Run property-based tests (included in unit tests via proptest):

cargo test proptests

Property Tests

The library includes 12 property-based tests that verify invariants across random inputs:

| Property | Description |
|---|---|
| prop_size_accuracy | size() == to_bytes().len() |
| prop_count_consistency | decode().len() == count() |
| prop_roundtrip_via_bytes | decode(to_bytes()) == decode() |
| prop_monotonic_timestamps | Decoded timestamps are strictly increasing |
| prop_idempotent_serialization | Multiple to_bytes() calls return identical results |
| prop_timestamp_alignment | All timestamps align to interval boundaries |
| prop_lossy_compression_bounds | Decoded temps are within [min, max] of interval inputs |
| prop_single_reading_identity | Single reading per interval decodes exactly |
| prop_averaging_within_interval | Multiple readings per interval are averaged correctly |
| prop_timestamp_quantization | Timestamps are quantized to interval boundaries |
| prop_gap_preservation | Gaps between readings are preserved correctly |
| prop_interval_deduplication | Multiple readings in same interval produce one output |
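As an illustration of what two of these properties assert, the hypothetical checker below tests strictly increasing timestamps (prop_monotonic_timestamps) and alignment to the first timestamp plus whole intervals (prop_timestamp_alignment) on decoded (timestamp, value) pairs. It is plain Rust with no proptest dependency; in a property test, a check like this would run on decode() output for randomly generated inputs. Treating the first decoded timestamp as the alignment base is an assumption based on the quantization description.

```rust
/// Hypothetical checker for two invariants on decoded (timestamp, value) pairs.
fn check_decoded(decoded: &[(u32, i32)], interval: u32) -> bool {
    let Some(&(base_ts, _)) = decoded.first() else {
        return true; // vacuously true for empty output
    };
    // prop_monotonic_timestamps: strictly increasing
    let increasing = decoded.windows(2).all(|w| w[0].0 < w[1].0);
    // prop_timestamp_alignment: every timestamp is base_ts + k * interval
    let aligned = decoded
        .iter()
        .all(|&(ts, _)| ts.checked_sub(base_ts).map_or(false, |d| d % interval == 0));
    increasing && aligned
}
```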

Fuzzing

The library includes fuzz targets using cargo-fuzz, which requires a nightly toolchain. Install cargo-fuzz first:

cargo install cargo-fuzz

Fuzz Targets

| Target | Description |
|---|---|
| fuzz_roundtrip | Tests encode/decode invariants with arbitrary inputs |
| fuzz_decode | Tests that decode() never panics on arbitrary bytes |
| fuzz_idempotent | Tests that multiple to_bytes() calls return identical results |
| fuzz_lossy_bounds | Tests that decoded temps are within [min, max] of interval inputs |
| fuzz_single_reading | Tests that a single reading per interval decodes exactly |
| fuzz_averaging | Tests that multiple readings per interval are averaged correctly |
| fuzz_gaps | Tests that gaps between readings are preserved correctly |
| fuzz_lossless | Tests lossless compression with one reading per interval at exact boundaries, including gaps |

Run fuzz targets:

# Roundtrip fuzzing (tests encode/decode invariants)
cargo fuzz run fuzz_roundtrip

# Decode-only fuzzing (tests decode never panics on arbitrary input)
cargo fuzz run fuzz_decode

# Run with time limit (e.g., 60 seconds)
cargo fuzz run fuzz_roundtrip -- -max_total_time=60

Code Coverage

Prerequisites

Install the required tools:

# Install cargo-llvm-cov for unit test coverage
cargo install cargo-llvm-cov

# Install cargo-fuzz for fuzz testing (requires nightly)
cargo install cargo-fuzz

# Install llvm-tools for fuzz coverage reports
rustup component add llvm-tools-preview
rustup component add --toolchain nightly llvm-tools-preview

Running Coverage

Using the Makefile:

# Unit test coverage summary
make coverage

# Unit test coverage with HTML report
make coverage-html
# Report at: target/llvm-cov/html/index.html

# Run all fuzz targets (30 seconds each)
make fuzz

# Generate combined fuzz coverage report
make fuzz-coverage

Or manually:

# Unit test coverage
cargo llvm-cov --summary-only
cargo llvm-cov --html

# Fuzz coverage (single target)
cargo +nightly fuzz run fuzz_decode -- -max_total_time=30
cargo +nightly fuzz coverage fuzz_decode

CLI Tools

The crate includes three command-line utilities: nbl-gen and nbl-viz generate and visualize encoded data, and nbl-analyze analyzes raw CSV data.

nbl-gen

Generate sample nibblerun time series data:

# Generate 24 hours of data (288 readings at 5-min intervals)
nbl-gen day.nbl

# Generate with random gaps (sensor offline periods)
nbl-gen day.nbl --gaps

# Generate with occasional temperature spikes
nbl-gen day.nbl --spikes

# Customize readings count and interval
nbl-gen custom.nbl --readings 100 --interval 600 --base-temp 25

Options:

  • --readings N - Number of readings (default: 288)
  • --gaps - Include random gaps (5% chance per reading)
  • --spikes - Include occasional large temperature changes (2% chance)
  • --base-temp N - Base temperature in Celsius (default: 22)
  • --interval N - Interval in seconds (default: 300)

nbl-viz

Visualize the internal bit-level structure of encoded data as SVG:

# Generate SVG visualization
nbl-viz day.nbl -o day.svg

# Output defaults to input filename with .svg extension
nbl-viz day.nbl  # creates day.svg

nbl-analyze

Analyze CSV files to compute delta frequency distributions and encoding statistics:

# Analyze all CSV files in a directory
nbl-analyze /path/to/csv/directory/

# Limit to first N files
nbl-analyze /path/to/csv/ --max-files 10000

# Show progress every N files
nbl-analyze /path/to/csv/ --progress 5000

Options:

  • --max-files N - Maximum files to process (default: 0 = all)
  • --progress N - Show progress every N files (default: 10000)

Output includes:

  • Delta frequency distribution by encoding tier
  • Zero-run length histogram
  • Gap analysis
  • Bit cost breakdown
  • Optimization recommendations
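The first item, a delta frequency distribution by encoding tier, can be sketched as follows. This is a hypothetical illustration of the idea, not nbl-analyze's code: the tier boundaries follow the delta table in How It Works, an extra bucket counts deltas beyond ±1023, and the input is assumed to be gap-free.

```rust
/// Hypothetical sketch: count deltas per encoding tier.
/// Buckets: [0, ±1, ±2, ±3..±10, ±11..±1023, out of range].
fn tier_histogram(temps: &[i32]) -> [usize; 6] {
    let mut counts = [0usize; 6];
    for w in temps.windows(2) {
        let tier = match (w[1] - w[0]).unsigned_abs() {
            0 => 0,
            1 => 1,
            2 => 2,
            3..=10 => 3,
            11..=1023 => 4,
            _ => 5,
        };
        counts[tier] += 1;
    }
    counts
}
```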

License

MIT

Commit count: 57
