Crates.io | tsz-macro |
lib.rs | tsz-macro |
version | 1.1.6 |
source | src |
created_at | 2023-03-25 16:49:44.446481 |
updated_at | 2024-07-26 20:01:39.156084 |
description | Delta-delta, Delta compression for time series data |
homepage | https://github.com/qsib-cbie/tsz |
repository | https://github.com/qsib-cbie/tsz |
max_upload_size | |
id | 820388 |
size | 69,966 |
A portable implementation for bit-packing on space-constrained systems for time-series integral data sampled on embedded systems.
Inspiration drawn from Delta encoding, IC FIFO compression, and Gorilla timestamp compression. https://www.vldb.org/pvldb/vol8/p1816-teller.pdf
✅ tsz
is designed to lean heavily on delta and delta-delta compression. As shown by Gorilla in practice, data points that change the same way compress very well (96% of timestamps could be represented by 1 bit in the Gorilla block).
Periodic integral data, like those from embedded sensor data, that changes with low noise rates will mostly follow the same compression patterns as the timestamps generated by consitently clocked intervals.
✅ tsz
is designed to take advantage of the integral data patterns produced by ICs that generate consistent bit-width integral data over time with similar magnitudes of change.
✅ tsz
is designed to compress on 32-bit Cortex-M and emit framed packets that would be considered very small outside of the embedded ecosystem. It is compatible with (and much faster on 64-bit) and does limit the size of the compressed data.
✅ tsz
is designed to emit half-byte aligned words that work with a second-pass compression algorithm such as LZ4 or ZSTD.
❌ tsz
is not designed to handle oscillating change or irregular event time streams optimally but can encode that information about as well as uncompressed.
❌ tsz
is not designed to handle floating-point or fixed-point data. Use of fixed-point is functional but not optimal, and floating-point is not supported.
❌ tsz
is not designed to optimize perfectly predictable data. Real-life instruments have some non-zero noise that often prevents perfect linearity.
❌ tsz
is not designed to prioritize (de)compression rates over memory usage or compression ratio.
The tsz
interface is designed to be as simple and result in as much inlining and static dispatch as possible. It uses a procedural macro to generate the necessary traits and structs to compress row-major data into column-major bytes and decompress column-major bytes into column-major or row-major data.
// Use a procedural macro to generate a compressor and decompressor for a row struct
mod abcd {
use tsz_compress::prelude::*;
#[derive(Copy, Clone, CompressV2, DecompressV2)]
pub struct AbcdRow {
pub ts: i64,
pub a: i8,
pub b: i16,
pub c: i32,
pub d: i64,
}
pub use compress::AbcdRowCompressorImpl;
pub use decompress::AbcdRowDecompressorImpl;
}
// Expose the structs you want to use
use abcd::AbcdRow;
use abcd::compress::AbcdRowCompressorImpl;
use abcd::decompress::AbcdRowDecompressorImpl;
// Initialize a compressor with preallocated space, using alloc::vec::Vec internally
let mut compressor = TestRowCompressorImpl::new(32);
// Compress all the rows you want
compressor.compress(TestRow { ts: 0, a: 1, b: 2, c: 3, d: 4 });
assert!(compressor.row_count() == 1);
// Finalize the compression, flushing the compressor's pending bits
let bytes = c.finish();
// Initialize the decompressor
let mut decompressor = TestRowDecompressorImpl::new();
// Decompress the bit buffer
decompressor.decompress(&bytes).unwrap();
// Access data by column
let a = decompressor.col_a();
// Rotate the data back to row-major
let rows= decompressor.rows();
If you want to compress into an existing buffer, you can use the finish_into(&mut vec_buf)
method to avoid allocating a new buffer. If necessary, it will continue appending to the buffer by reserving exactly the extra space it needs.
let mut compressor = TestRowCompressorImpl::new(32);
let mut bytes: Vec<u8> = Vec::new();
bytes.extend([0xDE, 0xAD, 0xBE, 0xEF]);
compressor.finish_into(&mut bytes);
assert_eq!(vec_buf[0], 0xDE);
assert_eq!(vec_buf[1], 0xAD);
assert_eq!(vec_buf[2], 0xBE);
assert_eq!(vec_buf[3], 0xEF);
Check out the benchmarks for more info in tsz-bench.
This is accessible behind the CompressV2
and DecompressV2
procedural macros. The delta scheme is currently employed by default with delta-delta work in progress. Delta can be better for systems that sample some noise that make it slightly unpredictable. Delta-delta can be far more compressible with second pass compression when delta-delta is often 0.
The compression scheme includes a single bit before each word to indicate:
the following is a truncated binary encoding header indicating the number of following bits and the bits for the delta-delta from the previous delta. Each delta-delta is zigzag encoded
the following is a truncated binary encoding header indicating the number of bit-packed deltas (not delta-deltas) in the next 32-bits. Each delta is zigzag encoded
In the updated scheme, a second pass compression algorithm such as LZ4 or ZSTD greatly improve compression ratios. An space-optimized second pass algorithm would include entropy coding with the minimum word size as 4 bits. All headers and delta bit sequences are 4 bit aligned, with octets tending towards 0000 for constant slope and 1111 for 10 consecutive data points within +-3. Values in delta zigzag encoding may also include octets of leading 0s.
This is accessible behind the DeltaEncodable
, Compressible
, and Decompressible
procedural macros. This may still be an appropriate choice, if you have highly predictable data or do not intend to make a second pass with another algorithm. However, the scheme must operate over bits (using bitvec
), which is slower than the nibble-aligned optimized compression structure used in TSZ V2.
The initial compression scheme implementation included:
The following example encodes 2 timestamps and 4 values. The first timestamp is an SoC uptime ms. The second timestamp is UTC us. The values are 4 channels of int16_t data incrementing slowly and sometimes resetting. Data in this example is collected at 1 Hz.
soc (uint64_t) | utc (int64_t) | channel0 (int16_t) | channel1 (int16_t) | channel2 (int16_t) | channel3 (int16_t) |
---|---|---|---|---|---|
250 | 1675465460000000 | 0 | 100 | 200 | 300 |
1250 | 1675465461000153 | 2 | 101 | 200 | 299 |
2250 | 1675465462000512 | 4 | 103 | 201 | 301 |
3251 | 1675465463000913 | 7 | 104 | 202 | 302 |
4251 | 1675465464001300 | 9 | 105 | 203 | 303 |
Compresses down by 3.2x in example here, extrapolating to 5.9x per 251 byte packet if example continued.
| soc_bits | utc_bits | channel0_bits | channel1_bits | channel2_bits | channel3_bits | | -------- | -------- | ------------- | ------------- | ------------- | ------------- | --- | | 16 | 64 | 8 | 8 | 16 | 16 | | 17 | 40 | 6 | 6 | 6 | 1 | 6 | | 1 | 10 | 1 | 6 | 6 | 6 | | 6 | 10 | 6 | 6 | 1 | 6 | | 6 | 10 | 6 | 1 | 1 | 1 |
See the docs for more info.
For maximal compression ratio, a linear sequence of integers, such as an incrementing integer, has a delta-delta of 0. In this trivialized example, we have the smallest delta-delta, 0. A second pass with LZ4 or ZSTD would compress this down to basically nothing. Similarly, a delta-delta of 0 is equivalent to encoding a constant delta, which would also be highly compressible by a second pass.
The V2 compression scheme includes math for each row happening on the native bit-width. i64 math can be very expensive on 32-bit microcontrollers, so the internal data structure uses usize for storing bits. To use i64 on 32-bit systems, specify the bit-wdith to use that will not overflow from sample to sample. You can explicitly select the bit-width to use like so:
use tsz_compress::prelude::*;
#[derive(Copy, Clone, CompressV2, DecompressV2)]
pub struct AbcdRow {
#[tsz(delta = "i32")]
pub ts: i64,
pub a: i8,
pub b: i16,
#[tsz(delta = "i16")]
pub c: i32,
#[tsz(delta = "i8")]
pub d: i64,
}
This allows the first two rows to use the normal column width, then all delta/delta-delta instructions operate on the specified bit-width. For example, the epoch timestamp in microseconds may be 8 bytes on the first and second row, then a 50Hz analog front-end will have deltas around 20000 microseconds calculated with 32 bits for the rest of the compression.