Crates.io | piz |
lib.rs | piz |
version | 0.5.1 |
source | src |
created_at | 2020-06-19 06:03:44.558217 |
updated_at | 2022-09-06 07:49:11.62152 |
description | piz (a Parallel Implementation of Zip) is a ZIP archive reader designed to concurrently decompress files using a simple API. |
repository | https://github.com/mrkline/piz-rs |
id | 255594 |
size | 97,291 |
piz is a Zip archive reader designed to decompress any number of files concurrently using a simple API:
// For smaller files,
//
//     let bytes = fs::read("foo.zip")?;
//     let archive = ZipArchive::new(&bytes)?;
//
// works just fine. Memory map larger files!
let zip_file = File::open("foo.zip")?;
let mapping = unsafe { Mmap::map(&zip_file)? };
let archive = ZipArchive::new(&mapping)?;
// We can iterate through the entries in the archive directly...
//
//     for entry in archive.entries() {
//         let mut reader = archive.read(entry)?;
//         // Read away!
//     }
//
// ...but ZIP doesn't guarantee that entries are in any particular order,
// that there aren't duplicates, that an entry has a valid file path, etc.
// Let's do some validation and organize them into a tree of files and folders.
let tree = as_tree(archive.entries())?;
// With that done, we can get a file (or directory)'s metadata from its path.
let metadata = tree.lookup("some/specific/file")?;
// And read the file out, if we'd like:
let mut reader = archive.read(metadata)?;
let mut save_to = File::create(&metadata.file_name)?;
io::copy(&mut reader, &mut save_to)?;
// Readers are `Send`, so we can read out as many as we'd like in parallel.
// Here we'll use Rayon to read out the whole archive with all cores:
tree.files()
    .par_bridge()
    .try_for_each(|entry| {
        if let Some(parent) = entry.file_name.parent() {
            // Create parent directories as needed.
            fs::create_dir_all(parent)?;
        }
        let mut reader = archive.read(entry)?;
        let mut save_to = File::create(&entry.file_name)?;
        io::copy(&mut reader, &mut save_to)?;
        Ok(())
    })?;
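The comments above note that ZIP does not guarantee an entry has a valid file path. As a rough illustration of what such a check can involve, here is a minimal sketch using only the standard library. `is_safe_entry_path` is a hypothetical helper, not part of piz's API, and the validation piz actually performs may differ.

```rust
use std::path::{Component, Path};

/// Hypothetical helper (not part of piz): returns true if `path` is a
/// non-empty relative path built only from normal components, so that
/// joining it onto a destination directory cannot escape that directory.
fn is_safe_entry_path(path: &Path) -> bool {
    let mut saw_normal_component = false;
    for component in path.components() {
        match component {
            Component::Normal(_) => saw_normal_component = true,
            // A `.` component is harmless, so skip it.
            Component::CurDir => {}
            // Reject `..`, a root (`/`), or a Windows prefix (`C:`).
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => {
                return false;
            }
        }
    }
    saw_normal_component
}
```

With this sketch, a path such as `some/specific/file` passes, while `../etc/passwd`, `/etc/passwd`, and the empty path are rejected.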
Zip is an interesting archive format: unlike compressed tarballs often seen
in Linux land (*.tar.gz, *.tar.zst, ...),
each file in a Zip archive is compressed independently,
with a central directory telling us where to find it.
This allows us to extract multiple files simultaneously so long as we can
read from multiple places at once.
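To make the role of the central directory more concrete: a reader typically starts by finding the End of Central Directory (EOCD) record near the end of the archive, which states where the central directory begins. The following is a minimal sketch of that first step, not piz's implementation. The names `find_central_directory` and `CentralDirectoryInfo` are made up for illustration, and the sketch ignores Zip64, multi-disk archives, and the extra consistency checks a robust reader would perform.

```rust
/// Signature that opens the End of Central Directory record ("PK\x05\x06").
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// Size of the record's fixed fields, excluding the trailing comment.
const EOCD_MIN_SIZE: usize = 22;

/// Where the central directory lives, as stated by the EOCD record.
/// (Illustrative type, not part of piz.)
#[derive(Debug, PartialEq)]
struct CentralDirectoryInfo {
    entry_count: u16,
    size: u32,
    offset: u32,
}

/// Scans backwards from the end of `archive` for the EOCD signature and
/// reads the central directory's entry count, size, and offset from it.
fn find_central_directory(archive: &[u8]) -> Option<CentralDirectoryInfo> {
    if archive.len() < EOCD_MIN_SIZE {
        return None;
    }
    let last_start = archive.len() - EOCD_MIN_SIZE;
    (0..=last_start).rev().find_map(|start| {
        let record = &archive[start..];
        if !record.starts_with(&EOCD_SIGNATURE) {
            return None;
        }
        // All multi-byte fields in a Zip archive are little-endian.
        let u16_at = |i: usize| u16::from_le_bytes([record[i], record[i + 1]]);
        let u32_at = |i: usize| {
            u32::from_le_bytes([record[i], record[i + 1], record[i + 2], record[i + 3]])
        };
        Some(CentralDirectoryInfo {
            entry_count: u16_at(10), // total number of entries
            size: u32_at(12),        // central directory size in bytes
            offset: u32_at(16),      // central directory offset from file start
        })
    })
}
```

Each central directory entry in turn records the offset of one file's data, which is what lets independent readers jump straight to different files.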
Users can either read the entire archive into memory, or, for larger archives, memory-map the file. (On 64-bit systems, this allows us to treat archives as a contiguous byte range even if the file is much larger than physical RAM. 32-bit systems are limited by address space to archives under 4 GB, but piz should be well-behaved if the archive is small enough.)
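Because the archive is just a shared, read-only byte range, several threads can read different parts of it at once without any locking. The sketch below illustrates that idea with the standard library alone. `checksum_ranges` is a hypothetical function, and summing bytes merely stands in for decompressing an entry; piz's own readers work differently.

```rust
use std::ops::Range;
use std::thread;

/// Sums the bytes of each range in `ranges` on its own thread, all
/// borrowing the same `data` slice. The slice could just as well be a
/// `Vec<u8>` read from disk or a memory-mapped file.
fn checksum_ranges(data: &[u8], ranges: &[Range<usize>]) -> Vec<u64> {
    thread::scope(|scope| {
        // Spawn one scoped thread per range; scoped threads may borrow
        // `data` because they are joined before `scope` returns.
        let handles: Vec<_> = ranges
            .iter()
            .map(|range| {
                let chunk = &data[range.clone()];
                scope.spawn(move || chunk.iter().map(|&b| u64::from(b)).sum::<u64>())
            })
            .collect();
        // Collect results in the same order as `ranges`.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

For example, calling it on the bytes `[1, 2, 3, 4, 5, 6]` with the ranges `0..2` and `2..6` yields the two sums computed independently.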
See unzip/ for a simple CLI example that unzips a provided file into the current directory.
test_harness/ contains some smoke tests against a few inputs.
If it doesn't find these files, it creates them with a shell script (which assumes a Unix-y environment).
Piz currently provides limited metadata for each file (path, size, CRC32, last-modified time, etc.). Additional info - like file permissions - should be added later. Support for compression algorithms besides DEFLATE (like Bzip2) could also be added.
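One use for the CRC32 in an entry's metadata is checking that extracted bytes are intact. Here is a bit-by-bit sketch of the CRC-32 variant used by Zip, written for clarity rather than speed. The `crc32` function is illustrative and is not how piz or its dependencies compute checksums; real implementations usually use lookup tables or hardware instructions.

```rust
/// Computes the CRC-32 checksum used by Zip (reflected polynomial
/// 0xEDB88320, initial value and final XOR of 0xFFFFFFFF).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // `mask` is all ones if the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}
```

A caller could compare `crc32(&extracted_bytes)` against the checksum stored in the entry's metadata and treat a mismatch as corruption.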
Many thanks to:
- Hans Wennborg for their fantastic article, Zip Files: History, Explanation and Implementation
- Mathijs van de Nes's zip-rs, the main inspiration of this project and a great example of a Zip decoder in Rust