archive-pdf-urls

Crates.io: archive-pdf-urls
lib.rs: archive-pdf-urls
version: 0.4.1
source: src
created_at: 2024-03-27 11:34:47.355992
updated_at: 2024-06-17 10:52:50.643153
description: Extract all links from a PDF and archive the URLs in the Internet Archive's Wayback Machine
repository: https://github.com/thoth-pub/archive-pdf-urls/
id: 1187811
size: 77,326
Javier Arias (ja573)

Archive PDF URLs

This command-line tool extracts URLs from a PDF file and archives them using the Wayback Machine.
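As a rough illustration of the archiving step, the sketch below builds a request URL for the Wayback Machine's public "Save Page Now" endpoint (`https://web.archive.org/save/<url>`). This is an assumption about how archiving could be triggered, not a description of the tool's internals; the function name `save_page_now_url` and the sample URLs are hypothetical.

```rust
/// Build a Wayback Machine "Save Page Now" request URL for `target`.
/// Assumption: the public `https://web.archive.org/save/<url>` endpoint;
/// the tool itself may use a different API or client.
fn save_page_now_url(target: &str) -> String {
    // Trim stray whitespace that text extraction can leave around a link.
    format!("https://web.archive.org/save/{}", target.trim())
}

fn main() {
    // Hypothetical URLs standing in for links extracted from a PDF.
    let extracted = ["https://example.org/a", " https://example.org/b "];
    for url in extracted {
        println!("{}", save_page_now_url(url));
    }
}
```

Requesting each printed URL with an HTTP client would ask the Wayback Machine to capture the page.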

Installation

You can build and install the tool using Cargo:

cargo install archive-pdf-urls

Usage

The tool takes the path to a PDF file as its argument, extracts the URLs it contains, and archives them using the Wayback Machine. The --exclude option skips URLs that match the given pattern.

Example usage:

archive-pdf-urls file.pdf --exclude https://some.pattern/\*
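In the command above, the backslash before `*` stops the shell from expanding it as a glob, so the tool receives the literal pattern `https://some.pattern/*`. The exact matching rules are not documented here; the sketch below assumes the simplest reading, where a trailing `*` matches any suffix and any other pattern must match exactly. The helpers `matches_pattern` and `filter_excluded` are hypothetical names, not part of the crate.

```rust
/// Return true if `url` matches `pattern`.
/// Assumption: a trailing `*` means "any suffix"; otherwise exact match.
fn matches_pattern(url: &str, pattern: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => url.starts_with(prefix),
        None => url == pattern,
    }
}

/// Keep only the URLs that match none of the exclude patterns.
fn filter_excluded<'a>(urls: &[&'a str], exclude: &[&str]) -> Vec<&'a str> {
    urls.iter()
        .copied()
        .filter(|u| !exclude.iter().any(|p| matches_pattern(u, p)))
        .collect()
}

fn main() {
    // Hypothetical URLs standing in for links extracted from a PDF.
    let urls = ["https://some.pattern/page", "https://example.org/paper"];
    let exclude = ["https://some.pattern/*"];
    for url in filter_excluded(&urls, &exclude) {
        println!("{}", url);
    }
}
```

Under this assumed rule, only URLs outside the excluded prefix would be passed on for archiving.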

Docker usage

docker run --rm -v ./file.pdf:/file.pdf ghcr.io/thoth-pub/archive-pdf-urls file.pdf