Crates.io | lzd |
lib.rs | lzd |
version | 0.1.1 |
source | src |
created_at | 2021-07-18 06:51:16.86337 |
updated_at | 2021-07-29 02:17:16.847988 |
description | LZ double-factor factorization |
homepage | https://github.com/kampersanda/lzd-rs |
repository | https://github.com/kampersanda/lzd-rs |
id | 424307 |
size | 792,156 |
This library provides a Rust implementation of LZ double-factor factorization, an efficient grammar-based compression algorithm, proposed in the paper:
K Goto, H Bannai, S Inenaga, and M Takeda. LZD Factorization: Simple and Practical Online Grammar Compression with Variable-to-Fixed Encoding. In CPM, 2015.
use lzd::compressor::Compressor;

fn main() {
    // Input text
    let text = "abaaabababaabbabab".as_bytes();

    // Factorization
    let mut factors = Vec::new();
    let defined_factors = Compressor::run(text, |id: usize| {
        factors.push(id);
    });

    // Output factors
    println!("factors: {:?}", factors);

    // Statistics
    println!("defined_factors: {:?}", defined_factors);
}
The output will be
factors: [97, 98, 97, 97, 256, 256, 256, 257, 98, 98, 258]
defined_factors: 261
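NOTE: In this implementation, all 256 single characters are predefined as factors with IDs 0 to 255, so the number of defined factors is 261: the 256 predefined ones plus the 5 factors (IDs 256 to 260) defined while processing this text.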
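To see where these numbers come from, here is a minimal, standard-library-only sketch of LZD factorization as described in the paper: repeatedly take the longest known factor at the current position, take the longest known factor right after it, and define their concatenation as a new factor. The names `longest_match` and `lzd_factorize` are hypothetical and are not part of this crate's API; the sketch uses a naive dictionary lookup and is meant only to illustrate the factor IDs, not how the library is implemented.

```rust
use std::collections::HashMap;

/// Finds the longest known factor that is a prefix of `rest` (must be non-empty).
/// Returns (factor ID, factor length in bytes).
fn longest_match(dict: &HashMap<Vec<u8>, usize>, max_len: usize, rest: &[u8]) -> (usize, usize) {
    // Try the multi-byte factors defined so far, longest first.
    for len in (2..=max_len.min(rest.len())).rev() {
        if let Some(&id) = dict.get(&rest[..len]) {
            return (id, len);
        }
    }
    // Otherwise use the predefined single-byte factor (IDs 0..=255).
    (rest[0] as usize, 1)
}

/// Naive LZD factorization (illustrative only).
/// Returns the emitted factor IDs and the total number of defined factors.
fn lzd_factorize(text: &[u8]) -> (Vec<usize>, usize) {
    let mut dict: HashMap<Vec<u8>, usize> = HashMap::new();
    let mut max_len = 1; // length of the longest factor defined so far
    let mut next_id = 256; // IDs below 256 are the predefined single bytes
    let mut factors = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        // First half of the pair.
        let (id1, len1) = longest_match(&dict, max_len, &text[pos..]);
        factors.push(id1);
        let mid = pos + len1;
        if mid == text.len() {
            break; // the last factor has no partner, so nothing new is defined
        }
        // Second half of the pair.
        let (id2, len2) = longest_match(&dict, max_len, &text[mid..]);
        factors.push(id2);
        let end = mid + len2;
        // Define the concatenation of both halves as a new factor.
        dict.insert(text[pos..end].to_vec(), next_id);
        max_len = max_len.max(end - pos);
        next_id += 1;
        pos = end;
    }
    (factors, next_id)
}

fn main() {
    let (factors, defined_factors) = lzd_factorize(b"abaaabababaabbabab");
    println!("factors: {:?}", factors);
    println!("defined_factors: {:?}", defined_factors);
}
```

On the example text this defines 256 = "ab", 257 = "aa", 258 = "abab", 259 = "abaa", and 260 = "bb". The final factor 258 has no partner, so it does not define a new factor.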
use lzd::decompressor::Decompressor;

fn main() {
    // Input factors
    let factors = [97, 98, 97, 97, 256, 256, 256, 257, 98, 98, 258];

    // Defactorization
    let mut text = String::new();
    Decompressor::run(&factors, |c: u8| {
        text.push(c as char);
    });

    // Decoded text
    println!("text: {:?}", text);
}
The output will be
text: "abaaabababaabbabab"
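The reverse direction can be sketched in the same spirit. Assuming that every consecutive pair of factor IDs defines the next free ID (256, 257, and so on), the decoder only needs a table of the strings defined so far. The function name `lzd_defactorize` is hypothetical and not part of this crate's API; the sketch uses only the standard library and panics on IDs that have not been defined yet.

```rust
/// Naive LZD defactorization (illustrative only).
/// Consumes factor IDs in pairs; each complete pair defines a new factor.
fn lzd_defactorize(factors: &[usize]) -> Vec<u8> {
    // table[i] holds the string of the factor with ID 256 + i.
    let mut table: Vec<Vec<u8>> = Vec::new();
    let mut text = Vec::new();
    for pair in factors.chunks(2) {
        // Expand both halves of the pair into one string.
        let mut expanded = Vec::new();
        for &id in pair {
            if id < 256 {
                expanded.push(id as u8); // predefined single byte
            } else {
                expanded.extend_from_slice(&table[id - 256]); // earlier factor
            }
        }
        text.extend_from_slice(&expanded);
        // A trailing unpaired factor does not define anything new.
        if pair.len() == 2 {
            table.push(expanded);
        }
    }
    text
}

fn main() {
    let factors = [97, 98, 97, 97, 256, 256, 256, 257, 98, 98, 258];
    let text = lzd_defactorize(&factors);
    println!("text: {:?}", String::from_utf8_lossy(&text));
}
```

Collecting bytes into a `Vec<u8>` rather than a `String` keeps the sketch valid for arbitrary binary input, not only ASCII text.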
This library provides two command line tools, one for compression and one for decompression. Each tool prints its command line options when run with the parameter -h.
In the tools, LZ factors are serialized into a binary stream in the same manner as tdc::BitCoder of tudocomp.
lzd command
It compresses input data and writes the result to a file with the extension lzd. In the following case, english.50MB.lzd will be written as the compressed file.
$ lzd english.50MB
Compressed filename will be /home/kampersanda/dataset/pizzachili/text/english/english.50MB.lzd
52428800 bytes were compressed into 16426243 bytes (31.33%)
52428800 characters were factorized into 6354129 LZD-factors (12.12%)
3177320 LZD-factors were defined
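The figures in this log can be cross-checked with a little arithmetic. The sketch below recomputes both percentages and the average number of bits spent per factor, using only numbers from the log. It also checks the factor count under the assumption, taken from the NOTE and the small example above, that every complete pair of emitted factors defines one new factor on top of the 256 predefined ones. The helper names `percent` and `defined_factors` are hypothetical.

```rust
/// Returns `part` as a percentage of `whole`.
fn percent(part: u64, whole: u64) -> f64 {
    100.0 * part as f64 / whole as f64
}

/// Number of defined factors, assuming 256 predefined single bytes
/// plus one new factor per complete pair of emitted factors.
fn defined_factors(emitted_factors: u64) -> u64 {
    256 + emitted_factors / 2
}

fn main() {
    // Values copied from the log above.
    let input_bytes: u64 = 52_428_800;
    let output_bytes: u64 = 16_426_243;
    let emitted_factors: u64 = 6_354_129;

    // Compression ratio in bytes: prints 31.33%
    println!("{:.2}%", percent(output_bytes, input_bytes));
    // Factors per input character: prints 12.12%
    println!("{:.2}%", percent(emitted_factors, input_bytes));
    // Defined factors: prints 3177320
    println!("{}", defined_factors(emitted_factors));
    // Average number of bits in the output per emitted factor.
    let bits_per_factor = (output_bytes * 8) as f64 / emitted_factors as f64;
    println!("{:.2} bits per factor", bits_per_factor);
}
```

The emitted factor count is odd, so the last factor is unpaired, and 256 + 3177064 complete pairs gives exactly the 3177320 defined factors reported by the tool.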
unlzd command
It decompresses a compressed file and writes the original data to a file without the extension lzd. In the following case, english.50MB will be written as the decompressed file.
$ ./target/release/unlzd english.50MB.lzd
This library is free software provided under the MIT License.