# fast-smaz

Pure Rust implementation of smaz algorithm, very close to the original [antirez/smaz](https://github.com/antirez/smaz) 
C implementation with some differences.

## Example

```rust
use fast_smaz::Smaz;

fn main() {
    let my_message = "Hello world!";
    let compressed = my_message.smaz_compress();
    let decompressed = compressed.smaz_decompress().unwrap();
    assert_eq!(my_message.as_bytes(), decompressed);
}
```

## Inspiration

This implementation was inspired in both original [antirez/smaz](https://github.com/antirez/smaz) and the [peterc/rsmaz](https://github.com/peterc/rsmaz),
a Ruby implementation of smaz.

## Comparison

### fast-smaz and [antirez/smaz](https://github.com/antirez/smaz)

This implementation is very similar to the original ANSI C implementation of [SMAZ](https://github.com/antirez/smaz),
it uses the exact same hash-table and hashing algorithm.

I haven't measured how performance compares, but the ANSI C implementation can do a lot of assumptions both tied to 
ANSI C specification and correctness of the implementation, while this Rust implementation has some additional checks
to ensure the safetyness of the code, which may impact the performance, but should not be that impactful.

### fast-smaz and [silentsokolov/rust-smaz](https://github.com/silentsokolov/rust-smaz)

[silentsokolov/rust-smaz](https://github.com/silentsokolov/rust-smaz) uses `lazy_static!` and creates the codebook hash table
at first execution, instead of shipping a pre-generated hash table.

This translates into a slower encoding process (benchmarks on my computer shows 73% slower encoding), not only because of the overhead
of `lazy_static!` initialization, but also the overhead of the atomic operation for initialization check and the 
hash distribution itself.

> I've choosen to keep antirez original bucket-size and distribution because it looked well distributed,
> and performance results shows that it does have a very good performance.

Note that changing `lazy_static!` to [`OnceLock`](https://doc.rust-lang.org/nightly/std/sync/struct.OnceLock.html) does 
not help, since both are very similar.

The performance improvement only applies to the encoding process, the decoding process has a similar performance,
since both [silentsokolov/rust-smaz](https://github.com/silentsokolov/rust-smaz) and **fast-smaz** share a similar code,
which is a simple lookup to the reverse codebook.

Also, **fast-smaz** uses `const` for the codebook and reverse codebook, which allows the compiler to do aggressive optimizations,
this lowers cache-misses and RAM accesses, since `const` variables can (and probably will) be inlined at the use-site.

### Features

`custom-cookbook`: For development only, was used as a testing codebook before the proper translation of the
origional C implementation codebook. It's way more inefficient and with a poor distribution. It was constructed using
the `src/hashing.rs` algorithm, I didn't wast too much time improving it and ensuring its correctness, it was only used
to generate a correct and functional codebook for testing. Don't use this.