| Crates.io | safe_unzip |
| lib.rs | safe_unzip |
| version | 0.1.6 |
| created_at | 2025-12-31 20:26:59.145631+00 |
| updated_at | 2026-01-06 04:08:04.858901+00 |
| description | Secure zip extraction. Prevents Zip Slip and Zip Bombs. |
| homepage | https://tenuo.dev |
| repository | https://github.com/tenuo-ai/safe_unzip |
| max_upload_size | |
| id | 2015296 |
| size | 346,422 |
Secure archive extraction. Supports ZIP (core), TAR, and 7z (optional features).
Zip files can contain malicious paths that escape the extraction directory:
import zipfile
zipfile.ZipFile("evil.zip").extractall("/var/uploads")
# Extracts ../../etc/cron.d/pwned → /etc/cron.d/pwned
This is Zip Slip, and Python's default behavior is still vulnerable.
Sort of. Python added warnings and ZipInfo.filename sanitization in 2014. In Python 3.12+, there's a filter parameter on tarfile's extractall:
# The "safe" way, but who knows this exists?
import tarfile
tarfile.open("evil.tar").extractall("/var/uploads", filter="data")
The problem: the safe option is opt-in. The default is still vulnerable. Most developers don't read the docs carefully enough to discover filter="data".
safe_unzip makes security the default, not an afterthought.
use safe_unzip::extract_file;
extract_file("/var/uploads", "evil.zip")?;
// Err(PathEscape { entry: "../../etc/cron.d/pwned", ... })
# Python bindings: same safety
from safe_unzip import extract_file
extract_file("/var/uploads", "evil.zip")
# Raises: PathEscapeError
Security is the default. No special flags, no opt-in safety. Every path is validated. Malicious archives are rejected, not extracted.
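To make "every path is validated" concrete, here is a minimal Python sketch of a lexical containment check. It is an illustration only, not safe_unzip's implementation (the crate relies on path_jail for this); the helper name `is_within` is hypothetical, and the check does not account for symlinks that already exist inside the destination.

```python
import os

def is_within(dest: str, entry_name: str) -> bool:
    """Return True if entry_name, joined onto dest, stays inside dest.

    Hypothetical helper for illustration; purely lexical, no filesystem access.
    """
    # Treat backslashes as separators so "..\\x" cannot slip past the check.
    name = entry_name.replace("\\", "/")
    if name.startswith("/") or os.path.isabs(name):
        return False
    root = os.path.abspath(dest)
    target = os.path.abspath(os.path.join(root, name))
    # commonpath compares whole components, so "/var/uploads-evil"
    # is not mistaken for a child of "/var/uploads".
    return os.path.commonpath([root, target]) == root

# Usage: the entry from the example above is rejected, a normal one is accepted.
escapes = is_within("/var/uploads", "../../etc/cron.d/pwned")
stays = is_within("/var/uploads", "docs/readme.txt")
```

A plain string-prefix test would wrongly accept sibling directories that share a prefix with the destination, which is why the sketch compares path components instead.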
Why not zip / tar / zipfile?
Because archive extraction is a security boundary, and most libraries treat it as a convenience function.
| Library | Default Behavior | Safe Option |
|---|---|---|
| Python zipfile | Vulnerable | filter="data" (opt-in, obscure) |
| Python tarfile | Vulnerable | filter="data" (opt-in, Python 3.12+) |
| Rust zip | Vulnerable | Manual path validation |
| Rust tar | Vulnerable | Manual path validation |
| safe_unzip | Safe by default | N/A (always safe) |
If you're extracting untrusted archives, you need a library designed for that threat model.
If your zip files only come from trusted sources you control, the standard zip crate is fine. If users can upload archives, use safe_unzip.
- CLI: safe_unzip command with --list, --verify, limits, and filtering
- Partial extraction: only() or glob patterns
- Atomic file creation: O_EXCL
CLI:
cargo install safe_unzip --features cli
Rust:
[dependencies]
safe_unzip = "0.1"
Python:
pip install safe-unzip
| Feature | Default | Description |
|---|---|---|
| tar | No | TAR/TAR.GZ extraction |
| async | No | Tokio-based async API |
| sevenz | No | 7z extraction (heavier deps) |
# ZIP only (smallest, ~30 deps)
safe_unzip = "0.1"
# With TAR support (~40 deps)
safe_unzip = { version = "0.1", features = ["tar"] }
# With async API
safe_unzip = { version = "0.1", features = ["async"] }
# Kitchen sink (~85 deps)
safe_unzip = { version = "0.1", features = ["tar", "async", "sevenz"] }
Note: Python bindings always include TAR support.
The Python bindings are thin wrappers over the Rust implementation via PyO3. This means:
Security reviewers: the Python API is a direct binding, not a port.
# Extract archive to destination
safe_unzip archive.zip -d /var/uploads
# List contents without extracting
safe_unzip archive.zip --list
# Verify integrity (CRC32 check)
safe_unzip archive.zip --verify
# With limits
safe_unzip archive.zip -d /var/uploads --max-size 100M --max-files 1000
# Glob filtering
safe_unzip archive.zip -d /var/uploads --include "**/*.py" --exclude "**/test_*"
# Partial extraction
safe_unzip archive.zip -d /var/uploads --only README.md --only LICENSE
# Verbose output
safe_unzip archive.zip -d /var/uploads -v
use safe_unzip::extract_file;
// Extract with safe defaults
let report = extract_file("/var/uploads", "archive.zip")?;
println!("Extracted {} files ({} bytes)",
    report.files_extracted,
    report.bytes_written
);
use safe_unzip::Extractor;
let report = Extractor::new("/var/uploads")?
    .extract_file("archive.zip")?;
use safe_unzip::Extractor;
// Extractor::new() errors if destination doesn't exist (catches typos)
// Extractor::new_or_create() creates it automatically
let report = Extractor::new_or_create("/var/uploads/new_folder")?
    .extract_file("archive.zip")?;
// The convenience functions (extract_file, extract) also create automatically
use safe_unzip::extract_file;
extract_file("/var/uploads/new_folder", "archive.zip")?;
use safe_unzip::{Extractor, Limits};
let report = Extractor::new("/var/uploads")?
    .limits(Limits {
        max_total_bytes: 500 * 1024 * 1024, // 500 MB total
        max_file_count: 1_000,              // Max 1000 files
        max_single_file: 50 * 1024 * 1024,  // 50 MB per file
        max_path_depth: 10,                 // No deeper than 10 levels
    })
    .extract_file("archive.zip")?;
use safe_unzip::Extractor;
// Only extract images
let report = Extractor::new("/var/uploads")?
    .filter(|entry| {
        entry.name.ends_with(".png") ||
        entry.name.ends_with(".jpg") ||
        entry.name.ends_with(".gif")
    })
    .extract_file("archive.zip")?;
println!("Extracted {} images, skipped {} other files",
    report.files_extracted,
    report.entries_skipped
);
Extract specific files by name or glob pattern:
use safe_unzip::Extractor;
// Extract only specific files
let report = Extractor::new("/var/uploads")?
    .only(&["README.md", "LICENSE"])
    .extract_file("archive.zip")?;
// Include by glob pattern
let report = Extractor::new("/var/uploads")?
    .include_glob(&["**/*.py", "**/*.rs"])
    .extract_file("archive.zip")?;
// Exclude by glob pattern
let report = Extractor::new("/var/uploads")?
    .exclude_glob(&["**/__pycache__/**", "**/*.pyc"])
    .extract_file("archive.zip")?;
Python:
from safe_unzip import Extractor
# Extract only specific files
report = Extractor("/var/uploads").only(["README.md", "LICENSE"]).extract_file("archive.zip")
# Include by pattern
report = Extractor("/var/uploads").include_glob(["**/*.py"]).extract_file("archive.zip")
# Exclude by pattern
report = Extractor("/var/uploads").exclude_glob(["**/__pycache__/**"]).extract_file("archive.zip")
Monitor extraction progress:
use safe_unzip::Extractor;
let report = Extractor::new("/var/uploads")?
    .on_progress(|p| {
        println!("[{}/{}] {} ({} bytes)",
            p.entry_index + 1,
            p.total_entries,
            p.entry_name,
            p.entry_size
        );
    })
    .extract_file("archive.zip")?;
Python:
from safe_unzip import Extractor
def show_progress(p):
    print(f"[{p['entry_index']+1}/{p['total_entries']}] {p['entry_name']}")
Extractor("/var/uploads").on_progress(show_progress).extract_file("archive.zip")
# Or with tqdm for a progress bar
from tqdm import tqdm
from safe_unzip import list_zip_entries  # assumed to be exported by the bindings
entries = list_zip_entries("archive.zip")
pbar = tqdm(total=len(entries))
def update_bar(p):
    pbar.update(1)
    pbar.set_description(p['entry_name'])
Extractor("/var/uploads").on_progress(update_bar).extract_file("archive.zip")
pbar.close()
Check archive integrity without extracting:
use safe_unzip::verify_file;
// Verify CRC32 for all entries
let report = verify_file("archive.zip")?;
println!("Verified {} entries ({} bytes)",
    report.entries_verified,
    report.bytes_verified
);
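As a rough picture of what a CRC32 verification pass involves, the following Python sketch streams each entry, recomputes its checksum, and compares it with the value stored in the archive. It uses only the standard library; `verify_crc32` is a hypothetical helper and says nothing about how safe_unzip implements `verify_file` internally.

```python
import io
import zipfile
import zlib

def verify_crc32(archive) -> tuple[int, int]:
    """Check every entry's CRC32 without writing anything to disk.

    Returns (entries_verified, bytes_verified). Raises ValueError on a
    mismatch; the stdlib may raise zipfile.BadZipFile first, because it
    performs its own CRC check when an entry is read to the end.
    """
    entries = 0
    total = 0
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            crc = 0
            with zf.open(info) as fh:
                # Read in chunks so large entries are never held in memory.
                for chunk in iter(lambda: fh.read(64 * 1024), b""):
                    crc = zlib.crc32(chunk, crc)
                    total += len(chunk)
            if crc != info.CRC:
                raise ValueError(f"CRC mismatch in {info.filename!r}")
            entries += 1
    return entries, total

# Usage: build a small archive in memory and verify it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", b"hello")
summary = verify_crc32(buf)
```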
Or with the Extractor (useful if you want to verify then extract):
use safe_unzip::Extractor;
let extractor = Extractor::new("/var/uploads")?;
// First verify
extractor.verify_file("archive.zip")?;
// Then extract (re-reads archive, but guarantees integrity)
extractor.extract_file("archive.zip")?;
use safe_unzip::{Extractor, OverwritePolicy};
// Skip files that already exist
let report = Extractor::new("/var/uploads")?
    .overwrite(OverwritePolicy::Skip)
    .extract_file("archive.zip")?;
// Or overwrite them
let report = Extractor::new("/var/uploads")?
    .overwrite(OverwritePolicy::Overwrite)
    .extract_file("archive.zip")?;
// Default: Error if file exists
let report = Extractor::new("/var/uploads")?
    .overwrite(OverwritePolicy::Error) // This is the default
    .extract_file("archive.zip")?;
use safe_unzip::{Extractor, SymlinkPolicy};
// Default: silently skip symlinks
let report = Extractor::new("/var/uploads")?
    .symlinks(SymlinkPolicy::Skip)
    .extract_file("archive.zip")?;
// Or reject archives containing symlinks
let report = Extractor::new("/var/uploads")?
    .symlinks(SymlinkPolicy::Error)
    .extract_file("archive.zip")?;
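For context on how a symlink shows up inside a ZIP archive at all, here is a small standard-library Python sketch. Archives created on Unix store the file mode in the upper 16 bits of each entry's external attributes, so a symlink can be recognized from those bits. The helper names are hypothetical, and this is an illustration rather than a description of safe_unzip's detection logic.

```python
import io
import stat
import zipfile

def is_symlink_entry(info: zipfile.ZipInfo) -> bool:
    """Return True if a ZIP entry carries Unix symlink mode bits.

    Entries written on other systems may leave these bits at zero.
    """
    return stat.S_ISLNK(info.external_attr >> 16)

def regular_entries(zf: zipfile.ZipFile) -> list[str]:
    """Names of entries that a 'skip symlinks' policy would keep."""
    return [i.filename for i in zf.infolist() if not is_symlink_entry(i)]

# Usage: an in-memory archive with one regular file and one symlink entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("notes.txt", b"hello")
    link = zipfile.ZipInfo("passwd_link")
    link.create_system = 3  # mark the entry as created on Unix
    link.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(link, "/etc/passwd")  # the link target is stored as the data

with zipfile.ZipFile(buf) as zf:
    kept = regular_entries(zf)
```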
| Mode | Speed | On Failure | Use When |
|---|---|---|---|
| Streaming (default) | Fast (1 pass) | Partial files remain | Speed matters; you'll clean up on error |
| ValidateFirst | Slower (2 passes) | No files if validation fails | Can't tolerate partial state |
⚠️ Neither mode is truly atomic. If extraction fails mid-write (e.g., disk full), partial files remain regardless of mode. ValidateFirst only prevents writes when validation fails (bad paths, limits exceeded), not when I/O fails during extraction.
use safe_unzip::{Extractor, ExtractionMode};
// Two-pass extraction:
// 1. Validate ALL entries (no disk writes)
// 2. Extract (only if validation passed)
let report = Extractor::new("/var/uploads")?
    .mode(ExtractionMode::ValidateFirst)
    .extract_file("untrusted.zip")?;
Use ValidateFirst when you can't tolerate partial state from malicious archives. Use Streaming (default) when speed matters and you can clean up on error.
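The two-pass idea can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not safe_unzip's code: `validate_then_extract` is a hypothetical helper, it checks only path containment and declared total size, and a real second pass would still need to enforce limits while streaming because declared sizes can lie.

```python
import io
import os
import tempfile
import zipfile

def validate_then_extract(archive, dest: str, max_total_bytes: int) -> int:
    """Two-pass extraction: inspect every entry first, write only if all pass.

    Hypothetical helper; returns the number of files written.
    """
    root = os.path.abspath(dest)
    with zipfile.ZipFile(archive) as zf:
        infos = zf.infolist()
        # Pass 1: no disk writes. Reject on the first bad entry.
        declared = 0
        for info in infos:
            target = os.path.abspath(os.path.join(root, info.filename))
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"path escapes destination: {info.filename!r}")
            declared += info.file_size
            if declared > max_total_bytes:
                raise ValueError("declared size exceeds limit")
        # Pass 2: every entry passed validation, so start writing.
        written = 0
        for info in infos:
            if info.is_dir():
                continue
            target = os.path.join(root, info.filename)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # "xb" fails if the file already exists instead of overwriting it.
            with zf.open(info) as src, open(target, "xb") as dst:
                dst.write(src.read())
            written += 1
    return written

# Usage: one good entry followed by a traversal entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ok.txt", b"fine")
    zf.writestr("../evil.txt", b"nope")

dest = tempfile.mkdtemp()
try:
    validate_then_extract(buf, dest, max_total_bytes=1024)
except ValueError as err:
    rejected = str(err)
leftover = os.listdir(dest)  # stays empty: validation failed before any write
```

In a single-pass design, `ok.txt` would already be on disk by the time the bad entry is reached; the extra pass trades speed for that guarantee.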
use safe_unzip::Extractor;
use std::io::Cursor;
let zip_bytes: Vec<u8> = download_zip_somehow();
let cursor = Cursor::new(zip_bytes);
let report = Extractor::new("/var/uploads")?
    .extract(cursor)?;
use safe_unzip::{Driver, TarAdapter};
// Extract a .tar file
let report = Driver::new("/var/uploads")?
    .extract_tar_file("archive.tar")?;
// Extract a .tar.gz file
let report = Driver::new("/var/uploads")?
    .extract_tar_gz_file("archive.tar.gz")?;
// With options
let report = Driver::new("/var/uploads")?
    .filter(|entry| entry.name.ends_with(".txt"))
    .validation(safe_unzip::ValidationMode::ValidateFirst)
    .extract_tar_file("archive.tar")?;
The new Driver API provides a unified interface for all archive formats with the same security guarantees.
7z Extraction (sevenz Feature)
Enable the sevenz feature:
[dependencies]
safe_unzip = { version = "0.1", features = ["sevenz"] }
use safe_unzip::{Driver, SevenZAdapter};
// Extract a .7z file
let report = Driver::new("/var/uploads")?
    .extract_7z_file("archive.7z")?;
// Or from bytes
let report = Driver::new("/var/uploads")?
    .extract_7z_bytes(&seven_z_bytes)?;
Note: 7z archives are fully decompressed into memory before extraction, so large archives may use significant RAM.
Python:
from safe_unzip import extract_7z_file, Extractor
# Simple extraction
report = extract_7z_file("/var/uploads", "archive.7z")
# With options
report = Extractor("/var/uploads").extract_7z_file("archive.7z")
Enable the async feature for tokio-based async extraction:
[dependencies]
safe_unzip = { version = "0.1", features = ["async"] }
use safe_unzip::r#async::{extract_file, extract_tar_file, AsyncExtractor};
#[tokio::main]
async fn main() -> Result<(), safe_unzip::Error> {
    // Simple async extraction
    let report = extract_file("/var/uploads", "archive.zip").await?;
    // TAR extraction
    let report = extract_tar_file("/var/uploads", "archive.tar").await?;
    // With options
    let report = AsyncExtractor::new("/var/uploads")?
        .max_total_bytes(500 * 1024 * 1024)
        .max_file_count(1000)
        .extract_file("archive.zip")
        .await?;
    Ok(())
}
Concurrent extraction of multiple archives:
use safe_unzip::r#async::{extract_file, extract_tar_bytes};
let (zip_result, tar_result) = tokio::join!(
    extract_file("/uploads/a", "first.zip"),
    extract_tar_bytes("/uploads/b", tar_data),
);
The async API uses spawn_blocking internally, so extraction runs in a thread pool without blocking the async runtime.
Python async support uses asyncio.to_thread() to run extraction in a thread pool:
import asyncio
from safe_unzip import async_extract_file, AsyncExtractor
async def main():
    # Simple async extraction
    report = await async_extract_file("/var/uploads", "archive.zip")
    # TAR extraction
    from safe_unzip import async_extract_tar_file
    report = await async_extract_tar_file("/var/uploads", "archive.tar")
    # With options
    report = await (
        AsyncExtractor("/var/uploads")
        .max_total_mb(500)
        .max_files(1000)
        .extract_file("archive.zip")
    )
asyncio.run(main())
Concurrent extraction:
async def extract_all(archives):
    tasks = [
        async_extract_file(f"/uploads/{i}", path)
        for i, path in enumerate(archives)
    ]
    return await asyncio.gather(*tasks)
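The thread-pool pattern behind the Python async functions can be illustrated without the library. In this sketch `blocking_extract` is a hypothetical stand-in for a synchronous extraction call; the wrapper hands it to `asyncio.to_thread()` so the event loop stays responsive while several calls run concurrently.

```python
import asyncio
import time

def blocking_extract(dest: str, archive: str) -> str:
    """Stand-in for a synchronous extraction call (hypothetical)."""
    time.sleep(0.05)  # simulate blocking I/O
    return f"{archive} -> {dest}"

async def async_extract(dest: str, archive: str) -> str:
    # Run the blocking call in the default thread pool; the event loop
    # stays free to schedule other tasks in the meantime.
    return await asyncio.to_thread(blocking_extract, dest, archive)

async def extract_all(archives: list[str]) -> list[str]:
    tasks = [async_extract(f"/uploads/{i}", path) for i, path in enumerate(archives)]
    # gather() returns results in the same order as the tasks were passed.
    return await asyncio.gather(*tasks)

results = asyncio.run(extract_all(["a.zip", "b.zip"]))
```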
| Threat | Attack Vector | Defense |
|---|---|---|
| Zip Slip | Entry named ../../etc/cron.d/pwned | path_jail validates every path |
| Zip Bomb (size) | 42KB → 4PB expansion | max_total_bytes limit + streaming enforcement |
| Zip Bomb (count) | 1 million empty files | max_file_count limit |
| Zip Bomb (lying) | Declared 1KB, decompresses to 1GB | Strict size reader detects mismatch |
| Symlink Escape | Symlink to /etc/passwd | Skip or reject symlinks |
| Symlink Overwrite | Create symlink, then overwrite target | Symlinks removed before overwrite |
| Path Depth | a/b/c/.../1000levels | max_path_depth limit |
| Invalid Filename | Control chars, CON, NUL | Filename sanitization |
| Overwrite | Replace sensitive files | OverwritePolicy::Error default |
| Setuid | Create setuid executables | Permission bits stripped |
| Encrypted Archives | Password handling complexity | Rejected (see Encrypted Archives) |
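The "lying" zip bomb row is easiest to understand with a bounded decompressor. The Python sketch below inflates a raw deflate stream in fixed-size steps and aborts as soon as the output exceeds the size the entry declared. It is a generic illustration of the technique using `zlib`; the function names are hypothetical and this is not the crate's strict size reader.

```python
import zlib

def deflate_raw(data: bytes) -> bytes:
    """Compress to a raw deflate stream, the encoding used inside ZIP entries."""
    c = zlib.compressobj(9, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()

def inflate_bounded(compressed: bytes, declared_size: int, chunk: int = 64 * 1024) -> bytes:
    """Inflate raw deflate data, failing once output exceeds declared_size.

    At most one chunk beyond the declared size is ever produced.
    """
    d = zlib.decompressobj(-15)
    out = bytearray()
    data = compressed
    while not d.eof:
        # The second argument caps how many bytes this call may return.
        block = d.decompress(data, chunk)
        out += block
        if len(out) > declared_size:
            raise ValueError("entry is larger than its declared size")
        data = d.unconsumed_tail
        if not block and not data and not d.eof:
            raise ValueError("truncated deflate stream")
    return bytes(out)

# Usage: 10 MB of zeros compresses to a few kilobytes but "declares" 1 KB.
bomb = deflate_raw(b"\0" * 10_000_000)
try:
    inflate_bounded(bomb, declared_size=1024)
    outcome = "accepted"
except ValueError:
    outcome = "rejected"
```

The key design point is that the limit is enforced during decompression rather than trusted from the archive header, so memory and disk use stay bounded even when the metadata is false.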
| Limit | Default | Description |
|---|---|---|
| max_total_bytes | 1 GB | Total uncompressed size |
| max_file_count | 10,000 | Number of files |
| max_single_file | 100 MB | Largest single file |
| max_path_depth | 50 | Directory nesting depth |
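To show how limits like these can be applied to an archive's declared metadata, here is a hedged Python sketch. The field names follow the table, but the `Limits` dataclass and `check_declared` helper are hypothetical, and the use of binary units for the defaults (1 GB as 1024**3 bytes) is an assumption. Declared sizes can be forged, so a check like this complements enforcement during decompression rather than replacing it.

```python
import io
import zipfile
from dataclasses import dataclass

@dataclass(frozen=True)
class Limits:
    # Defaults mirror the table; binary units are an assumption.
    max_total_bytes: int = 1024 ** 3
    max_file_count: int = 10_000
    max_single_file: int = 100 * 1024 ** 2
    max_path_depth: int = 50

def check_declared(zf: zipfile.ZipFile, limits: Limits = Limits()) -> None:
    """Raise ValueError if the archive's declared metadata exceeds a limit."""
    total = 0
    count = 0
    for info in zf.infolist():
        depth = len([part for part in info.filename.split("/") if part])
        if depth > limits.max_path_depth:
            raise ValueError(f"path too deep: {info.filename!r}")
        if info.is_dir():
            continue
        count += 1
        if count > limits.max_file_count:
            raise ValueError("too many files")
        if info.file_size > limits.max_single_file:
            raise ValueError(f"file too large: {info.filename!r}")
        total += info.file_size
        if total > limits.max_total_bytes:
            raise ValueError("archive too large")

# Usage: three small files pass the defaults but fail a two-file limit.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(3):
        zf.writestr(f"file{i}.txt", b"x" * 10)

with zipfile.ZipFile(buf) as zf:
    check_declared(zf)  # passes under the default limits
    try:
        check_declared(zf, Limits(max_file_count=2))
        verdict = "accepted"
    except ValueError as err:
        verdict = str(err)
```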
use safe_unzip::{extract_file, Error};
match extract_file("/var/uploads", "archive.zip") {
    Ok(report) => {
        println!("Success: {} files", report.files_extracted);
    }
    Err(Error::PathEscape { entry, detail }) => {
        eprintln!("Blocked path traversal in '{}': {}", entry, detail);
    }
    Err(Error::TotalSizeExceeded { limit, would_be }) => {
        eprintln!("Archive too large: {} bytes (limit: {})", would_be, limit);
    }
    Err(Error::FileTooLarge { entry, size, limit }) => {
        eprintln!("File '{}' too large: {} bytes (limit: {})", entry, size, limit);
    }
    Err(Error::FileCountExceeded { limit }) => {
        eprintln!("Too many files (limit: {})", limit);
    }
    Err(Error::AlreadyExists { path }) => {
        eprintln!("File already exists: {}", path);
    }
    Err(Error::InvalidFilename { entry, .. }) => {
        eprintln!("Invalid filename: {}", entry);
    }
    Err(Error::EncryptedEntry { entry }) => {
        eprintln!("Encrypted entry not supported: {}", entry);
    }
    Err(e) => {
        eprintln!("Extraction failed: {}", e);
    }
}
- ValidateFirst mode caches entries in memory
safe_unzip does not support password-protected zip files. Encrypted entries are rejected with Error::EncryptedEntry.
If you need to extract encrypted archives:
- zip crate directly
- safe_unzip
This is intentional: encryption handling is outside our security scope. Password management, key derivation, and cryptographic validation are complex domains that deserve dedicated tooling.
- ExtractionMode::ValidateFirst to validate before writing.
- In ValidateFirst mode, limits are checked against ALL entries. Filtered entries still count toward limits. This is conservative: validation may reject archives that would succeed with filtering.
These threats are not fully addressed (by design or complexity):
| Limitation | Reason |
|---|---|
| Case-insensitive collisions | On Windows/macOS, File.txt and file.txt map to the same file. We don't track extracted names to detect this. |
| Unicode normalization | café (NFC) vs café (NFD) appear identical but are different bytes. Full normalization requires ICU. |
| Concurrent extraction | If multiple threads/processes extract to the same destination, race conditions can occur. Use file locking or separate destinations. |
| Sparse file attacks | Not applicable to zip format. |
| Hard links | Zip format doesn't support hard links. |
| Device files | Zip format doesn't support special device files. |
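Since the first two rows are left to the caller, it may help to see how an application could pre-screen entry names itself. The Python sketch below folds each name with NFC normalization plus case folding and reports pairs that collapse to the same key. It is an approximation (real filesystems apply their own rules), the helper names are hypothetical, and it is not part of safe_unzip.

```python
import unicodedata

def collision_key(name: str) -> str:
    """Fold a name to a key that is equal for names a case-insensitive,
    normalizing filesystem would likely treat as the same file."""
    return unicodedata.normalize("NFC", name).casefold()

def find_collisions(names: list[str]) -> list[tuple[str, str]]:
    """Return (first_seen, later_name) pairs whose folded keys collide."""
    seen: dict[str, str] = {}
    clashes = []
    for name in names:
        key = collision_key(name)
        if key in seen and seen[key] != name:
            clashes.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return clashes

# Usage: a case clash and an NFC/NFD clash ("caf\u00e9" vs "cafe\u0301").
clashes = find_collisions(["File.txt", "file.txt", "caf\u00e9", "cafe\u0301", "other"])
```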
For OverwritePolicy::Error and OverwritePolicy::Skip, we use atomic file creation (O_CREAT | O_EXCL) instead of check-then-create. This eliminates race conditions between checking if a file exists and creating it.
For OverwritePolicy::Overwrite, symlinks are removed before writing to prevent symlink-following attacks, but there's a brief window between removal and creation.
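The difference between atomic creation and check-then-create can be demonstrated with the same flags from Python. In this sketch `create_exclusive` is a hypothetical helper: the operating system performs the existence check and the creation as one step, so no other process can slip a file (or a symlink) into place between them.

```python
import os
import tempfile

def create_exclusive(path: str, data: bytes) -> bool:
    """Create path atomically; return False if something already exists there."""
    # O_BINARY only exists on Windows; it is 0 elsewhere.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
    try:
        fd = os.open(path, flags, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return True

# Usage: the second attempt is refused and the first content survives.
target = os.path.join(tempfile.mkdtemp(), "report.txt")
first = create_exclusive(target, b"original")
second = create_exclusive(target, b"replacement")
with open(target, "rb") as fh:
    content = fh.read()
```

A check-then-create version (`os.path.exists` followed by `open`) has a gap between the two calls; the exclusive flag removes that gap.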
These filenames are rejected for security:
- Backslashes (\): prevents Windows path separator confusion
- Windows reserved names: CON, PRN, AUX, NUL, COM1-9, LPT1-9
We use cargo-fuzz with two targets:
# Install cargo-fuzz (requires nightly)
cargo install cargo-fuzz
# Run the main extraction fuzzer
cargo +nightly fuzz run fuzz_extract
# Run the adapter fuzzer (tests parsing layer)
cargo +nightly fuzz run fuzz_zip_adapter
Fuzzing targets are in fuzz/fuzz_targets/. Run fuzzing before releases to catch parsing edge cases.
MIT OR Apache-2.0