| Crates.io | lzr |
|---|---|
| lib.rs | lzr |
| version | 0.0.0 |
| created_at | 2025-04-26 15:47:49.713303+00 |
| updated_at | 2025-04-26 15:47:49.713303+00 |
| description | LZ77-based Compression Program/Library written in Rust. |
| homepage | |
| repository | https://github.com/ckmjreynolds/lzr |
| max_upload_size | |
| id | 1650376 |
| size | 18,568 |
This project started as my first project for learning Rust. There were no specific goals other than learning and, by extension, readability. Since this was first and foremost a learning experience, I avoided referencing any existing code. Although the format is "based on" existing algorithms, no attempt was made to faithfully follow any of them.
Note: All fields are big-endian.
| magic | sequences | length | checksum |
|---|---|---|---|
| 0x4C5A5200 | See below. | 32-bit | Adler-32 |
- magic - the magic number: ASCII "LZR" followed by a version byte, currently 0x00.
- sequences - a list of LZR sequences.
- length - the lower 32 bits of the original file's length; for example, a file of exactly 2^32 bytes stores 0 here.
- checksum - the Adler-32 checksum of the original file.
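As an illustration of the container layout, here is a minimal Rust sketch that wraps an already-encoded sequence stream with the magic number, the 32-bit length, and the Adler-32 checksum, and then splits it back apart. The names `build_container` and `split_container` are hypothetical, not the crate's API, and the sketch assumes the length and checksum are the final eight bytes of the file, as the field order above suggests. The Adler-32 routine follows the standard definition (two running sums modulo 65521).

```rust
/// Magic number: ASCII "LZR" followed by the version byte 0x00.
const MAGIC: [u8; 4] = [0x4C, 0x5A, 0x52, 0x00];

/// Standard Adler-32: two running sums modulo 65521, combined as (b << 16) | a.
fn adler32(data: &[u8]) -> u32 {
    const MOD: u32 = 65_521;
    let (mut a, mut b) = (1u32, 0u32);
    for &byte in data {
        a = (a + u32::from(byte)) % MOD;
        b = (b + a) % MOD;
    }
    (b << 16) | a
}

/// Hypothetical helper: magic | sequences | length (big-endian) | checksum (big-endian).
fn build_container(sequences: &[u8], original: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + sequences.len() + 8);
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(sequences);
    // Only the lower 32 bits of the original length are stored.
    let length = (original.len() as u64 & 0xFFFF_FFFF) as u32;
    out.extend_from_slice(&length.to_be_bytes());
    out.extend_from_slice(&adler32(original).to_be_bytes());
    out
}

/// Hypothetical helper: returns (sequences, length, checksum), or None if the
/// input is too short or does not start with the magic number.
fn split_container(data: &[u8]) -> Option<(&[u8], u32, u32)> {
    if data.len() < 12 || !data.starts_with(&MAGIC) {
        return None;
    }
    // Everything between the magic number and the 8-byte trailer is the sequence stream.
    let (body, trailer) = data[4..].split_at(data.len() - 12);
    let length = u32::from_be_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]);
    let checksum = u32::from_be_bytes([trailer[4], trailer[5], trailer[6], trailer[7]]);
    Some((body, length, checksum))
}
```

For example, `adler32(b"Wikipedia")` evaluates to 0x11E60398, and `split_container(&build_container(&seq, data))` returns `seq` together with the stored length and checksum.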
This stage is roughly LZ77, at least as described by Wikipedia; this specification assumes existing knowledge of LZ77. There are four sequence formats, identified by their first one to three bits. Every sequence contains at least one literal byte. When distance is 0, length gives the number of literal bytes that follow. When distance is not 0, length gives the number of bytes to repeat, and exactly one literal byte follows the repeat.
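The distance/length rule above can be sketched as a single decoding step. This is a hypothetical helper (`apply_sequence` is not part of the crate), and it assumes, as in many LZ77 variants, that a repeat may overlap the bytes it is producing (length greater than distance), which is why it copies one byte at a time.

```rust
/// Hypothetical decoding step for one already-parsed sequence.
/// distance == 0: `literals` holds exactly `length` bytes to append.
/// distance != 0: copy `length` bytes starting `distance` bytes back,
/// then append exactly one literal byte.
fn apply_sequence(
    out: &mut Vec<u8>,
    distance: usize,
    length: usize,
    literals: &[u8],
) -> Result<(), &'static str> {
    if distance == 0 {
        if literals.len() != length {
            return Err("literal count does not match length");
        }
        out.extend_from_slice(literals);
        return Ok(());
    }
    if distance > out.len() {
        return Err("distance points before the start of the output");
    }
    if literals.len() != 1 {
        return Err("a repeat is followed by exactly one literal");
    }
    let start = out.len() - distance;
    // Byte-by-byte copy so the source may overlap bytes pushed in this loop.
    for i in 0..length {
        let byte = out[start + i];
        out.push(byte);
    }
    out.push(literals[0]);
    Ok(())
}
```

For instance, starting from "abc", a repeat with distance 3 and length 5 followed by the literal "x" yields "abcabcabx".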
| format | distance | length | literal(s) |
|---|---|---|---|
| 1 bit (0b0) | 5 bits | 2 bits | 1-4 bytes |
- format - 0b0 for short repeat sequences.
- distance - a distance within the last 31 bytes for a repeat, or 0 for a literal sequence.
- length - a repeat or literal length of 1...4 bytes.
- literal(s) - the literal bytes to copy to the output (length bytes for a literal sequence, a single byte after a repeat).
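The short format header fits in exactly one byte (1 + 5 + 2 = 8 bits). A minimal sketch of packing and unpacking that byte follows; it assumes the fields are laid out most-significant-bit first and that the 2-bit field stores length - 1 (the natural bias that maps 1...4 onto 2 bits). Both assumptions, and the function names, are illustrative rather than taken from the implementation.

```rust
/// Hypothetical: packs a short header as 0 DDDDD LL (assumed MSB-first),
/// where LL stores length - 1 (assumed bias).
fn pack_short(distance: u8, length: u8) -> Option<u8> {
    if distance > 31 || !(1..=4).contains(&length) {
        return None;
    }
    // distance <= 31, so distance << 2 never sets the top (format) bit.
    Some((distance << 2) | (length - 1))
}

/// Hypothetical inverse of pack_short; None if the top bit is not the 0b0 prefix.
fn unpack_short(byte: u8) -> Option<(u8, u8)> {
    if byte & 0x80 != 0 {
        return None;
    }
    let distance = (byte >> 2) & 0x1F;
    let length = (byte & 0x03) + 1;
    Some((distance, length))
}
```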
| format | distance | length | literal(s) |
|---|---|---|---|
| 2 bits (0b10) | 10 bits | 4 bits | 2-17 bytes |
- format - 0b10 for medium repeat sequences.
- distance - a distance within the last 1,023 bytes for a repeat, or 0 for a literal sequence.
- length - a repeat or literal length of 2...17 bytes.
- literal(s) - the literal bytes to copy to the output (length bytes for a literal sequence, a single byte after a repeat).
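The medium header is exactly 16 bits (2 + 10 + 4), which suggests it is stored as one big-endian 16-bit word. The sketch below assumes that layout, MSB-first fields, and a stored length of length - 2; `pack_medium` and `unpack_medium` are hypothetical names.

```rust
/// Hypothetical: packs 10 DDDDDDDDDD LLLL into two big-endian bytes,
/// where LLLL stores length - 2 (assumed bias).
fn pack_medium(distance: u16, length: u16) -> Option<[u8; 2]> {
    if distance > 1_023 || !(2..=17).contains(&length) {
        return None;
    }
    let word: u16 = (0b10 << 14) | (distance << 4) | (length - 2);
    Some(word.to_be_bytes())
}

/// Hypothetical inverse; None if the top two bits are not the 0b10 prefix.
fn unpack_medium(bytes: [u8; 2]) -> Option<(u16, u16)> {
    let word = u16::from_be_bytes(bytes);
    if word >> 14 != 0b10 {
        return None;
    }
    let distance = (word >> 4) & 0x3FF;
    let length = (word & 0xF) + 2;
    Some((distance, length))
}
```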
| format | distance | length | literal(s) |
|---|---|---|---|
| 3 bits (0b110) | 13 bits | 8 bits | 3-258 bytes |
- format - 0b110 for long repeat sequences.
- distance - a distance within the last 8,191 bytes for a repeat, or 0 for a literal sequence.
- length - a repeat or literal length of 3...258 bytes.
- literal(s) - the literal bytes to copy to the output (length bytes for a literal sequence, a single byte after a repeat).
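The long header is 24 bits (3 + 13 + 8), i.e. three bytes. Under the same assumptions as a sketch (MSB-first fields, stored length = length - 3), it can be built in a `u32` whose top byte is then dropped. `pack_long` and `unpack_long` are hypothetical names.

```rust
/// Hypothetical: packs 110 D{13} L{8} into three big-endian bytes,
/// where the 8-bit field stores length - 3 (assumed bias).
fn pack_long(distance: u32, length: u32) -> Option<[u8; 3]> {
    if distance > 8_191 || !(3..=258).contains(&length) {
        return None;
    }
    let word = (0b110 << 21) | (distance << 8) | (length - 3);
    // The header occupies the low 24 bits; drop the always-zero top byte.
    let [_, b0, b1, b2] = word.to_be_bytes();
    Some([b0, b1, b2])
}

/// Hypothetical inverse; None if the top three bits are not the 0b110 prefix.
fn unpack_long(bytes: [u8; 3]) -> Option<(u32, u32)> {
    let word = u32::from_be_bytes([0, bytes[0], bytes[1], bytes[2]]);
    if word >> 21 != 0b110 {
        return None;
    }
    let distance = (word >> 8) & 0x1FFF;
    let length = (word & 0xFF) + 3;
    Some((distance, length))
}
```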
| format | distance | length | literal(s) |
|---|---|---|---|
| 3 bits (0b111) | 19 bits | 10 bits | 4-1,027 bytes |
- format - 0b111 for extended repeat sequences.
- distance - a distance within the last 524,287 bytes for a repeat, or 0 for a literal sequence.
- length - a repeat or literal length of 4...1,027 bytes.
- literal(s) - the literal bytes to copy to the output (length bytes for a literal sequence, a single byte after a repeat).
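Because every header is a whole number of bytes (8, 16, 24, or 32 bits), a decoder can read the first byte, match its prefix, and then read the rest of the header. The table-driven sketch below does this for all four formats. The field widths and minimum lengths come from the tables above; the MSB-first layout and the "stored = length - minimum" bias are assumptions, and `Format`, `FORMATS`, and `decode_header` are hypothetical names.

```rust
/// Hypothetical description of one header format.
struct Format {
    prefix: u32,
    prefix_bits: u32,
    distance_bits: u32,
    length_bits: u32,
    min_length: u32,
}

const FORMATS: [Format; 4] = [
    Format { prefix: 0b0, prefix_bits: 1, distance_bits: 5, length_bits: 2, min_length: 1 },
    Format { prefix: 0b10, prefix_bits: 2, distance_bits: 10, length_bits: 4, min_length: 2 },
    Format { prefix: 0b110, prefix_bits: 3, distance_bits: 13, length_bits: 8, min_length: 3 },
    Format { prefix: 0b111, prefix_bits: 3, distance_bits: 19, length_bits: 10, min_length: 4 },
];

/// Decodes the header at the start of `input`.
/// Returns (distance, length, header size in bytes), or None if truncated.
fn decode_header(input: &[u8]) -> Option<(u32, u32, usize)> {
    let first = u32::from(*input.first()?);
    for f in &FORMATS {
        // Compare the top `prefix_bits` bits of the first byte with the prefix.
        if first >> (8 - f.prefix_bits) != f.prefix {
            continue;
        }
        let total_bits = f.prefix_bits + f.distance_bits + f.length_bits; // 8, 16, 24 or 32
        let size = (total_bits / 8) as usize;
        let bytes = input.get(..size)?;
        // Assemble the header big-endian into a u32.
        let word = bytes.iter().fold(0u32, |acc, &b| (acc << 8) | u32::from(b));
        let length = (word & ((1u32 << f.length_bits) - 1)) + f.min_length;
        let distance = (word >> f.length_bits) & ((1u32 << f.distance_bits) - 1);
        return Some((distance, length, size));
    }
    None
}
```

The prefixes 0b0, 0b10, 0b110, and 0b111 form a prefix-free code, so exactly one format matches any first byte; in this sketch `None` only results from empty or truncated input.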