nibblerun

Crates.io	nibblerun
lib.rs	nibblerun
version	0.1.6
created_at	2026-01-22 12:20:53.868184+00
updated_at	2026-01-22 12:20:53.868184+00
description	time series compression library for temperature and humidity data
homepage
repository	https://github.com/nanit/nibblerun
max_upload_size
id	2061448
size	427,895

(quatrix)

documentation

README

nibblerun

A high-performance time series compression library for Rust, optimized for temperature sensor data.

Features

Bit-packed delta encoding with variable-length codes
Zero-run length encoding for repeated values
Configurable interval-based timestamp quantization
Automatic averaging of multiple readings within the same interval
~250M readings/second encoding throughput

Example

use nibblerun::{Encoder, decode};

// Create encoder with 5-minute (300-second) intervals (default)
let mut enc: Encoder<i32> = Encoder::new();

// Append readings (timestamp, value)
// Readings are quantized to interval boundaries
enc.append(1761000000, 23).unwrap();  // 00:00:00 -> interval 0
enc.append(1761000150, 25).unwrap();  // 00:02:30 -> same interval, averaged with above
enc.append(1761000300, 24).unwrap();  // 00:05:00 -> interval 1
enc.append(1761000600, 22).unwrap();  // 00:10:00 -> interval 2

// Serialize to bytes
let bytes = enc.to_bytes();
println!("Compressed size: {} bytes", bytes.len());

// Decode back (interval must match encoding)
let readings = decode::<i32, 300>(&bytes);
for r in readings {
    println!("ts: {}, value: {}", r.ts, r.value);
}
// Output:
// ts: 1761000000, value: 24  (average of 23 and 25)
// ts: 1761000300, value: 24
// ts: 1761000600, value: 22

// Custom interval example (10-minute intervals)
let mut enc_10min: Encoder<i32, 600> = Encoder::new();

Handling Gaps

Missing intervals are preserved in the output:

use nibblerun::Encoder;

let mut enc: Encoder<i32> = Encoder::new();

enc.append(1761000000, 22).unwrap();   // 00:00 - interval 0
enc.append(1761000300, 23).unwrap();   // 00:05 - interval 1
// No data for 00:10, 00:15, 00:20...
enc.append(1761003000, 25).unwrap();   // 00:50 - interval 10

let readings = enc.decode();
assert_eq!(readings.len(), 3);
assert_eq!(readings[2].ts - readings[1].ts, 2700); // 45-minute gap preserved

How It Works

Timestamp Quantization

Timestamps are quantized to configurable intervals (default: 300 seconds). The first reading's timestamp becomes the base, and all subsequent readings are mapped to interval indices:

interval_idx = (timestamp - base_ts) / interval

Multiple readings in the same interval are averaged together.

Delta Encoding

Temperature values are stored as deltas from the previous reading. Deltas are encoded with variable-length bit codes optimized for typical temperature data:

Delta	Encoding	Bits
0 (repeated)	`0`	1
±1	`10x`	3
±2	`1110x`	5
±3 to ±10	`1111110xxxx`	11
±11 to ±1023	`11111110xxxxxxxxxxx`	19

Zero-Run Encoding

Consecutive zero deltas (unchanged temperatures) are optimized for efficiency:

Run Length	Encoding	Bits	Notes
1-7	`0` × n	1-7	Individual zeros (more efficient than run encoding)
8-21	`11110xxxx`	9	Run-length encoding
22-149	`111110xxxxxxx`	13	Run-length encoding

Gap Encoding

Missing intervals (sensor offline, network issues) are encoded efficiently:

Gap Size	Encoding	Bits	Notes
1 interval	`110`	3	Optimized for common single-interval gaps
2-65 intervals	`11111111xxxxxx`	14	Larger gaps

Assumptions and Limitations

Assumptions

Timestamps are monotonically increasing: Out-of-order readings return an error
Timestamps are Unix seconds: The library uses an epoch base of 1,760,000,000 (~2025)
Temperature changes are gradual: The encoding is optimized for small deltas (±10)

Limitations

Limit	Value	Notes
Max readings per encoder	65,535	`count` is u16
Max delta between readings	±1,023	Larger deltas return error
Max readings per interval	1,023	Additional readings return error
Min timestamp	1,760,000,000	~2025-10-09, returns error if earlier
Interval range	1-65,535 seconds	~18 hours max

Performance Characteristics

Encoding: O(1) per reading, ~250M readings/second
Decoding: O(n) where n = reading count
Compression: ~40-50 bytes/day for typical temperature data (vs ~3.5KB raw)
Memory: 56-64 bytes per encoder (depends on value type) + output buffer

Analyzing Your Data

The analysis tool is included as nbl-analyze:

# Build and run
cargo build --release --bin nbl-analyze
./target/release/nbl-analyze /path/to/csv/directory/

# With options
./target/release/nbl-analyze /path/to/csv/ --max-files 10000 --progress 1000

CSV files should have the format:

ts,temperature
1760000000,22
1760000300,23

Values of -1000 are treated as gap markers and excluded from temperature statistics.

Testing

Run unit tests:

cargo test

Run property-based tests (included in unit tests via proptest):

cargo test proptests

Property Tests

The library includes 12 property-based tests that verify invariants across random inputs:

Property	Description
`prop_size_accuracy`	`size() == to_bytes().len()`
`prop_count_consistency`	`decode().len() == count()`
`prop_roundtrip_via_bytes`	`decode(to_bytes()) == decode()`
`prop_monotonic_timestamps`	Decoded timestamps are strictly increasing
`prop_idempotent_serialization`	Multiple `to_bytes()` calls return identical results
`prop_timestamp_alignment`	All timestamps align to interval boundaries
`prop_lossy_compression_bounds`	Decoded temps are within [min, max] of interval inputs
`prop_single_reading_identity`	Single reading per interval decodes exactly
`prop_averaging_within_interval`	Multiple readings per interval are averaged correctly
`prop_timestamp_quantization`	Timestamps are quantized to interval boundaries
`prop_gap_preservation`	Gaps between readings are preserved correctly
`prop_interval_deduplication`	Multiple readings in same interval produce one output

Fuzzing

The library includes fuzz targets using cargo-fuzz. Install cargo-fuzz first:

cargo install cargo-fuzz

Fuzz Targets

Target	Description
`fuzz_roundtrip`	Tests encode/decode invariants with arbitrary inputs
`fuzz_decode`	Tests that `decode()` never panics on arbitrary bytes
`fuzz_idempotent`	Tests that multiple `to_bytes()` calls return identical results
`fuzz_lossy_bounds`	Tests that decoded temps are within [min, max] of interval inputs
`fuzz_single_reading`	Tests that single reading per interval decodes exactly
`fuzz_averaging`	Tests that multiple readings per interval are averaged correctly
`fuzz_gaps`	Tests that gaps between readings are preserved correctly
`fuzz_lossless`	Tests lossless compression with one reading per interval at exact boundaries, including gaps

Run fuzz targets:

# Roundtrip fuzzing (tests encode/decode invariants)
cargo fuzz run fuzz_roundtrip

# Decode-only fuzzing (tests decode never panics on arbitrary input)
cargo fuzz run fuzz_decode

# Run with time limit (e.g., 60 seconds)
cargo fuzz run fuzz_roundtrip -- -max_total_time=60

Code Coverage

Prerequisites

Install the required tools:

# Install cargo-llvm-cov for unit test coverage
cargo install cargo-llvm-cov

# Install cargo-fuzz for fuzz testing (requires nightly)
cargo install cargo-fuzz

# Install llvm-tools for fuzz coverage reports
rustup component add llvm-tools-preview
rustup component add --toolchain nightly llvm-tools-preview

Running Coverage

Using the Makefile:

# Unit test coverage summary
make coverage

# Unit test coverage with HTML report
make coverage-html
# Report at: target/llvm-cov/html/index.html

# Run all fuzz targets (30 seconds each)
make fuzz

# Generate combined fuzz coverage report
make fuzz-coverage

Or manually:

# Unit test coverage
cargo llvm-cov --summary-only
cargo llvm-cov --html

# Fuzz coverage (single target)
cargo +nightly fuzz run fuzz_decode -- -max_total_time=30
cargo +nightly fuzz coverage fuzz_decode

CLI Tools

The crate includes three command-line utilities for generating, visualizing, and analyzing encoded data.

nbl-gen

Generate sample nibblerun time series data:

# Generate 24 hours of data (288 readings at 5-min intervals)
nbl-gen day.nbl

# Generate with random gaps (sensor offline periods)
nbl-gen day.nbl --gaps

# Generate with occasional temperature spikes
nbl-gen day.nbl --spikes

# Customize readings count and interval
nbl-gen custom.nbl --readings 100 --interval 600 --base-temp 25

Options:

--readings N - Number of readings (default: 288)
--gaps - Include random gaps (5% chance per reading)
--spikes - Include occasional large temperature changes (2% chance)
--base-temp N - Base temperature in Celsius (default: 22)
--interval N - Interval in seconds (default: 300)

nbl-viz

Visualize the internal bit-level structure of encoded data as SVG:

# Generate SVG visualization
nbl-viz day.nbl -o day.svg

# Output defaults to input filename with .svg extension
nbl-viz day.nbl  # creates day.svg

nbl-analyze

Analyze CSV files to compute delta frequency distributions and encoding statistics:

# Analyze all CSV files in a directory
nbl-analyze /path/to/csv/directory/

# Limit to first N files
nbl-analyze /path/to/csv/ --max-files 10000

# Show progress every N files
nbl-analyze /path/to/csv/ --progress 5000

Options:

--max-files N - Maximum files to process (default: 0 = all)
--progress N - Show progress every N files (default: 10000)

Output includes:

Delta frequency distribution by encoding tier
Zero-run length histogram
Gap analysis
Bit cost breakdown
Optimization recommendations

License

MIT

Commit count: 57

nibblerun

documentation

README

nibblerun

Features

Example

Handling Gaps

How It Works

Timestamp Quantization

Delta Encoding

Zero-Run Encoding

Gap Encoding

Assumptions and Limitations

Assumptions

Limitations

Performance Characteristics

Analyzing Your Data

Testing

Property Tests

Fuzzing

Fuzz Targets

Code Coverage

Prerequisites

Running Coverage

CLI Tools

nbl-gen

nbl-viz

nbl-analyze

License

cargo fmt