| Crates.io | archive-pdf-urls |
| lib.rs | archive-pdf-urls |
| version | 0.5.1 |
| created_at | 2024-03-27 11:34:47.355992+00 |
| updated_at | 2025-03-17 12:25:52.630554+00 |
| description | Extract all links from a PDF and archive the URLs in the Internet Archive's Wayback Machine |
| homepage | |
| repository | https://github.com/thoth-pub/archive-pdf-urls/ |
| max_upload_size | |
| id | 1187811 |
| size | 77,486 |
This command-line tool extracts URLs from a PDF file and archives them using the Wayback Machine.
You can build and install the tool using Cargo:
cargo install archive-pdf-urls
The tool takes the path to a PDF file as its argument, extracts the URLs it contains, and archives them using the Wayback Machine.
Example usage:
archive-pdf-urls file.pdf --exclude https://some.pattern/\*
To run the tool with Docker instead, mount the PDF into the container:
docker run --rm -v ./file.pdf:/file.pdf ghcr.io/thoth-pub/archive-pdf-urls file.pdf
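To make the extract, filter, archive flow more concrete, here is a minimal, standard-library-only sketch of the last two steps. It is not the crate's actual code. The helper names `is_excluded`, `wayback_save_url`, and `plan_archive_requests` are hypothetical, and the assumption that a trailing `*` in an `--exclude` pattern matches any suffix is a guess at the flag's semantics. The only external fact used is the Wayback Machine's public "Save Page Now" address form, `https://web.archive.org/save/<url>`. PDF parsing and the HTTP request itself are left out.

```rust
/// Hypothetical helper: returns true when `url` matches `pattern`.
/// Assumption: a trailing `*` means "any suffix"; otherwise the match is exact.
fn is_excluded(url: &str, pattern: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => url.starts_with(prefix),
        None => url == pattern,
    }
}

/// Hypothetical helper: builds the "Save Page Now" address for `url`.
fn wayback_save_url(url: &str) -> String {
    format!("https://web.archive.org/save/{url}")
}

/// Drops every URL that matches an exclusion pattern and maps the rest
/// to the address that would be requested to archive it.
fn plan_archive_requests(urls: &[&str], exclude: &[&str]) -> Vec<String> {
    urls.iter()
        .copied()
        .filter(|url| !exclude.iter().any(|pattern| is_excluded(url, pattern)))
        .map(wayback_save_url)
        .collect()
}

fn main() {
    // Stand-ins for URLs that would have been extracted from the PDF.
    let urls = ["https://example.org/paper", "https://some.pattern/skip-me"];
    let exclude = ["https://some.pattern/*"];

    for request in plan_archive_requests(&urls, &exclude) {
        println!("{request}");
    }
}
```

Running the sketch prints a single line, `https://web.archive.org/save/https://example.org/paper`, because the second URL falls under the exclusion pattern.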