![Rust](https://github.com/qsib-cbie/tsz/actions/workflows/rust.yml/badge.svg)
[![Crate](https://img.shields.io/crates/v/tsz-compress)](https://crates.io/crates/tsz-compress)
[![DOI](https://zenodo.org/badge/597249911.svg)](https://zenodo.org/badge/latestdoi/597249911)

# tsz :: Compact Integral Time-Series Compression

A portable implementation for bit-packing on space-constrained systems for time-series integral data sampled on embedded systems.

Inspiration drawn from Delta encoding, IC FIFO compression, and Gorilla timestamp compression. https://www.vldb.org/pvldb/vol8/p1816-teller.pdf

## Pros and Cons

✅ `tsz` is designed to lean heavily on delta and delta-delta compression. As shown by Gorilla in practice, data points that change the same way compress very well (96% of timestamps could be represented by 1 bit in the Gorilla block).
Periodic integral data, like those from embedded sensor data, that changes with low noise rates will mostly follow the same compression patterns as the timestamps generated by consitently clocked intervals.

✅ `tsz` is designed to take advantage of the integral data patterns produced by ICs that generate consistent bit-width integral data over time with similar magnitudes of change.

✅ `tsz` is designed to compress on 32-bit Cortex-M and emit framed packets that would be considered very small outside of the embedded ecosystem. It is compatible with (and much faster on 64-bit) and does limit the size of the compressed data.

✅ `tsz` is designed to emit half-byte aligned words that work with a second-pass compression algorithm such as LZ4 or ZSTD.

❌ `tsz` is not designed to handle oscillating change or irregular event time streams optimally but can encode that information about as well as uncompressed.

❌ `tsz` is not designed to handle floating-point or fixed-point data. Use of fixed-point is functional but not optimal, and floating-point is not supported.

❌ `tsz` is not designed to optimize perfectly predictable data. Real-life instruments have some non-zero noise that often prevents perfect linearity.

❌ `tsz` is not designed to prioritize (de)compression rates over memory usage or compression ratio.

## Interface

The `tsz` interface is designed to be as simple and result in as much inlining and static dispatch as possible. It uses a procedural macro to generate the necessary traits and structs to compress row-major data into column-major bytes and decompress column-major bytes into column-major or row-major data.

```rust

// Use a procedural macro to generate a compressor and decompressor for a row struct
mod abcd {
    use tsz_compress::prelude::*;
    #[derive(Copy, Clone, CompressV2, DecompressV2)]
    pub struct AbcdRow {
        pub ts: i64,
        pub a: i8,
        pub b: i16,
        pub c: i32,
        pub d: i64,
    }

    pub use compress::AbcdRowCompressorImpl;
    pub use decompress::AbcdRowDecompressorImpl;
}

// Expose the structs you want to use
use abcd::AbcdRow;
use abcd::compress::AbcdRowCompressorImpl;
use abcd::decompress::AbcdRowDecompressorImpl;


// Initialize a compressor with preallocated space, using alloc::vec::Vec internally
let mut compressor = TestRowCompressorImpl::new(32);

// Compress all the rows you want
compressor.compress(TestRow { ts: 0, a: 1, b: 2, c: 3, d: 4 });
assert!(compressor.row_count() == 1);

// Finalize the compression, flushing the compressor's pending bits
let bytes = c.finish();

// Initialize the decompressor
let mut decompressor = TestRowDecompressorImpl::new();

// Decompress the bit buffer
decompressor.decompress(&bytes).unwrap();

// Access data by column
let a = decompressor.col_a();

// Rotate the data back to row-major
let rows= decompressor.rows();
```

If you want to compress into an existing buffer, you can use the `finish_into(&mut vec_buf)` method to avoid allocating a new buffer. If necessary, it will continue appending to the buffer by reserving exactly the extra space it needs.

```rust
let mut compressor = TestRowCompressorImpl::new(32);
let mut bytes: Vec<u8> = Vec::new();
bytes.extend([0xDE, 0xAD, 0xBE, 0xEF]);
compressor.finish_into(&mut bytes);
assert_eq!(vec_buf[0], 0xDE);
assert_eq!(vec_buf[1], 0xAD);
assert_eq!(vec_buf[2], 0xBE);
assert_eq!(vec_buf[3], 0xEF);
```

## Benchmarks

Check out the benchmarks for more info in [tsz-bench](./tsz-bench/README.md).

## TSZ V2 Compression Scheme

This is accessible behind the `CompressV2` and `DecompressV2` procedural macros. The delta scheme is currently employed by default with delta-delta work in progress. Delta can be better for systems that sample some noise that make it slightly unpredictable. Delta-delta can be far more compressible with second pass compression when delta-delta is often 0.

The compression scheme includes a single bit before each word to indicate:

- the following is a truncated binary encoding header indicating the number of following bits and the bits for the delta-delta from the previous delta. Each delta-delta is zigzag encoded

  1. 0, 00, 1 bit
  1. 0, 01, 5 bits
  1. 0, 10, 9 bits
  1. 0, 110, 16 bits
  1. 0, 111, 64 bits

- the following is a truncated binary encoding header indicating the number of bit-packed deltas (not delta-deltas) in the next 32-bits. Each delta is zigzag encoded
  1. 1, 10, 1, 1 sample (64 bits)
  1. 1, 01, 1, 1 sample (32 bits)
  1. 1, 00, pad 0, 2 samples (16 bits)
  1. 1, 01, pad 000, 3 samples (10 bits)
  1. 1, 10, pad 0, 4 samples (8 bits)
  1. 1, 110, 8 samples (4 bits)
  1. 1, 111, pad 00, 10 samples (3 bits)

In the updated scheme, a second pass compression algorithm such as LZ4 or ZSTD greatly improve compression ratios. An space-optimized second pass algorithm would include entropy coding with the minimum word size as 4 bits. All headers and delta bit sequences are 4 bit aligned, with octets tending towards 0000 for constant slope and 1111 for 10 consecutive data points within +-3. Values in delta zigzag encoding may also include octets of leading 0s.

## TSZ V1 Compression Scheme

This is accessible behind the `DeltaEncodable`, `Compressible`, and `Decompressible` procedural macros. This may still be an appropriate choice, if you have highly predictable data or do not intend to make a second pass with another algorithm. However, the scheme must operate over bits (using `bitvec`), which is slower than the nibble-aligned optimized compression structure used in TSZ V2.

The initial compression scheme implementation included:

1. A VLQ encoding of the full value for each column for the first row
2. A VLQ encoding of the delta from the first row to the second row
3. For each subsequent row, a unary encoding header indicating the number of following bits and the bits for the delta-delta from the previous delta.
   1. header 0, 0 bits
   1. header 10, 4 bits
   1. header 110, 7 bits
   1. header 1110, 9 bits
   1. header 11110, 12 bits
   1. header 111110, 15 bits
   1. header 1111110, 18 bits
   1. header 11111110, 32 bits
   1. header 111111110, 64 bits

## Example for V1

The following example encodes 2 timestamps and 4 values. The first timestamp is an SoC uptime ms. The second timestamp is UTC us. The values are 4 channels of int16_t data incrementing slowly and sometimes resetting. Data in this example is collected at 1 Hz.

| soc (uint64_t) | utc (int64_t)    | channel0 (int16_t) | channel1 (int16_t) | channel2 (int16_t) | channel3 (int16_t) |
| -------------- | ---------------- | ------------------ | ------------------ | ------------------ | ------------------ |
| 250            | 1675465460000000 | 0                  | 100                | 200                | 300                |
| 1250           | 1675465461000153 | 2                  | 101                | 200                | 299                |
| 2250           | 1675465462000512 | 4                  | 103                | 201                | 301                |
| 3251           | 1675465463000913 | 7                  | 104                | 202                | 302                |
| 4251           | 1675465464001300 | 9                  | 105                | 203                | 303                |

Compresses down by 3.2x in example here, extrapolating to 5.9x per 251 byte packet if example continued.

| soc_bits | utc_bits | channel0_bits | channel1_bits | channel2_bits | channel3_bits |
| -------- | -------- | ------------- | ------------- | ------------- | ------------- | --- |
| 16       | 64       | 8             | 8             | 16            | 16            |
| 17       | 40       | 6             | 6             | 6             | 1             | 6   |
| 1        | 10       | 1             | 6             | 6             | 6             |
| 6        | 10       | 6             | 6             | 1             | 6             |
| 6        | 10       | 6             | 1             | 1             | 1             |

See the docs for more info.

### Best-case Compression Example

For maximal compression ratio, a linear sequence of integers, such as an incrementing integer, has a delta-delta of 0. In this trivialized example, we have the smallest delta-delta, 0. A second pass with LZ4 or ZSTD would compress this down to basically nothing. Similarly, a delta-delta of 0 is equivalent to encoding a constant delta, which would also be highly compressible by a second pass.

### Performance Configuration

The V2 compression scheme includes math for each row happening on the native bit-width. i64 math can be very expensive on 32-bit microcontrollers, so the internal data structure uses usize for storing bits. To use i64 on 32-bit systems, specify the bit-wdith to use that will not overflow from sample to sample. You can explicitly select the bit-width to use like so:

```rust
use tsz_compress::prelude::*;
#[derive(Copy, Clone, CompressV2, DecompressV2)]
pub struct AbcdRow {
    #[tsz(delta = "i32")]
    pub ts: i64,
    pub a: i8,
    pub b: i16,
    #[tsz(delta = "i16")]
    pub c: i32,
    #[tsz(delta = "i8")]
    pub d: i64,
}
```

This allows the first two rows to use the normal column width, then all delta/delta-delta instructions operate on the specified bit-width. For example, the epoch timestamp in microseconds may be 8 bytes on the first and second row, then a 50Hz analog front-end will have deltas around 20000 microseconds calculated with 32 bits for the rest of the compression.