Crates.io | cc-downloader |
lib.rs | cc-downloader |
version | 0.1.0 |
source | src |
created_at | 2024-06-15 21:23:06.243621 |
updated_at | 2024-06-15 21:23:06.243621 |
description | A polite and user-friendly downloader for Common Crawl data. |
id | 1273133 |
size | 73,540 |
This is an experimental, polite downloader for Common Crawl data, written in Rust. Currently it downloads Common Crawl data through CloudFront rather than directly from S3.
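Here is a rough illustration of what downloading "through CloudFront" means. Common Crawl serves its data over HTTPS at https://data.commoncrawl.org, a CloudFront-backed endpoint, and each entry in a Common Crawl paths file is a key relative to that host. The sketch below assumes (the README does not say so) that cc-downloader builds its URLs the same way. `cloudfront_url` and `CLOUDFRONT_BASE` are hypothetical names, not part of the crate.

```rust
/// Common Crawl's HTTPS endpoint, served through CloudFront.
const CLOUDFRONT_BASE: &str = "https://data.commoncrawl.org";

/// Hypothetical helper (not part of cc-downloader): converts one entry of a
/// Common Crawl paths file into a CloudFront download URL.
fn cloudfront_url(path: &str) -> String {
    // Trim the trailing newline and any leading '/' so the join has exactly one '/'.
    let key = path.trim().trim_start_matches('/');
    format!("{}/{}", CLOUDFRONT_BASE, key)
}

fn main() {
    // Prints https://data.commoncrawl.org/crawl-data/CC-MAIN-2024-22/warc.paths.gz
    println!("{}", cloudfront_url("crawl-data/CC-MAIN-2024-22/warc.paths.gz\n"));
}
```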
Usage: cc-downloader [COMMAND]
Commands:
  download-paths  Download paths for a given snapshot
  download        Download files from a crawl
  help            Print this message or the help of the given subcommand(s)
Options:
  -h, --help     Print help
  -V, --version  Print version
------
cc-downloader download -h
Download files from a crawl
Usage: cc-downloader download --path-file <PATHS> --output <OUTPUT> [PROGRESS]
Arguments:
  [PROGRESS]  Print progress [possible values: true, false]
Options:
      --path-file <PATHS>  Path file
  -o, --output <OUTPUT>    Output folder
  -h, --help               Print help
------
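The `download` command takes a path file and an output folder. The sketch below shows the bookkeeping this implies, under two assumptions the help text does not confirm: the path file has already been decompressed to one key per line, and downloads mirror the remote directory layout under the output folder. `plan_downloads` is a hypothetical helper. The HTTP transfer itself is left out to keep the example dependency-free.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of cc-downloader): pairs each entry of an
/// already-decompressed paths file with the URL to fetch and a local target.
fn plan_downloads(paths_file: &str, output: &Path) -> Vec<(String, PathBuf)> {
    paths_file
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty()) // skip blank lines
        .map(|line| {
            let key = line.trim_start_matches('/');
            // Assumed URL scheme: CloudFront endpoint + relative key.
            let url = format!("https://data.commoncrawl.org/{}", key);
            // Assumed layout: mirror the remote key under the output folder.
            let local = key
                .split('/')
                .filter(|part| !part.is_empty())
                .fold(output.to_path_buf(), |dir, part| dir.join(part));
            (url, local)
        })
        .collect()
}

fn main() {
    // Placeholder listing; real entries come from a Common Crawl paths file.
    let listing = "crawl-data/EXAMPLE/warc/example-00000.warc.gz\n\n";
    for (url, local) in plan_downloads(listing, Path::new("out")) {
        println!("{} -> {}", url, local.display());
    }
}
```

The Usage line also lists `[PROGRESS]` as a positional argument with possible values `true` and `false`, not as a flag. Progress output would therefore presumably be requested with a trailing `true`, e.g. `cc-downloader download --path-file warc.paths.gz --output ./data true` (the file and folder names are placeholders).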
cc-downloader download-paths -h
Download paths for a given snapshot
Usage: cc-downloader download-paths --snapshot <SNAPSHOT> --data-type <PATHS> --output <OUTPUT> [PROGRESS]
Arguments:
  [PROGRESS]  Print progress [possible values: true, false]
Options:
      --snapshot <SNAPSHOT>  Crawl reference
      --data-type <PATHS>    Data type
  -o, --output <OUTPUT>      Output folder
  -h, --help                 Print help
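For `download-paths`, the snapshot is a crawl identifier such as CC-MAIN-2024-22. Common Crawl publishes one gzipped listing per data type under `crawl-data/<snapshot>/`. The sketch below makes two assumptions that the help text does not confirm:

- The command fetches `https://data.commoncrawl.org/crawl-data/<snapshot>/<data-type>.paths.gz`.
- `--data-type` accepts the listing names Common Crawl uses: `segment`, `warc`, `wat`, `wet`, `robotstxt`, `non200responses`, `cc-index` and `cc-index-table`.

`paths_url` and `DATA_TYPES` are hypothetical names.

```rust
/// Listing names Common Crawl publishes per crawl (assumed to match --data-type).
const DATA_TYPES: [&str; 8] = [
    "segment", "warc", "wat", "wet", "robotstxt", "non200responses", "cc-index", "cc-index-table",
];

/// Hypothetical helper (not part of cc-downloader): builds the URL of the
/// paths listing for one snapshot and data type.
fn paths_url(snapshot: &str, data_type: &str) -> Result<String, String> {
    // Reject unknown data types before making any request.
    if !DATA_TYPES.iter().any(|t| *t == data_type) {
        return Err(format!("unknown data type: {}", data_type));
    }
    Ok(format!(
        "https://data.commoncrawl.org/crawl-data/{}/{}.paths.gz",
        snapshot, data_type
    ))
}

fn main() {
    match paths_url("CC-MAIN-2024-22", "warc") {
        Ok(url) => println!("{}", url),
        Err(e) => eprintln!("{}", e),
    }
}
```

Validating the data type up front gives a clear local error instead of a failed request against CloudFront.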