Crates.io | suckit |
lib.rs | suckit |
version | 0.2.0 |
source | src |
created_at | 2020-08-16 14:54:31.565415 |
updated_at | 2022-04-28 20:39:11.809324 |
description | SuckIT, Suck the InTernet |
homepage | https://github.com/skallwar/suckit |
repository | https://github.com/skallwar/suckit |
max_upload_size | |
id | 277254 |
size | 112,062 |
SuckIT allows you to recursively visit and download a website's content to your disk.
USAGE:
suckit [FLAGS] [OPTIONS] <url>
FLAGS:
-c, --continue-on-error Flag to enable or disable exit on error
--dry-run Do everything without saving the files to the disk
-h, --help Prints help information
-V, --version Prints version information
-v, --verbose Enable more information regarding the scraping process
--visit-filter-is-download-filter Use the download filter in/exclude regexes for visiting as well
OPTIONS:
-a, --auth <auth>...
HTTP basic authentication credentials space-separated as "username password host". Can be repeated for
multiple credentials as "u1 p1 h1 u2 p2 h2"
--delay <delay>
Add a delay in seconds between downloads to reduce the likelihood of getting banned [default: 0]
-d, --depth <depth>
Maximum recursion depth to reach when visiting. Default is -1 (infinity) [default: -1]
-e, --exclude-download <exclude-download>
Regex filter to exclude saving pages that match this expression [default: $^]
--exclude-visit <exclude-visit>
Regex filter to exclude visiting pages that match this expression [default: $^]
--ext-depth <ext-depth>
Maximum recursion depth to reach when visiting external domains. Default is 0. -1 means infinity [default: 0]
-i, --include-download <include-download>
Regex filter to limit to only saving pages that match this expression [default: .*]
--include-visit <include-visit>
Regex filter to limit to only visiting pages that match this expression [default: .*]
-j, --jobs <jobs> Maximum number of threads to use concurrently [default: 1]
-o, --output <output> Output directory
--random-range <random-range>
Generate an extra random delay between downloads, from 0 to this number. This is added to the base delay seconds [default: 0]
-t, --tries <tries> Maximum amount of retries on download failure [default: 20]
-u, --user-agent <user-agent> User agent to be used for sending requests [default: suckit]
ARGS:
<url> Entry point of the scraping
A common use case could be the following:
suckit http://books.toscrape.com -j 8 -o /path/to/downloaded/pages/
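For finer control, the filter, delay and authentication options documented above can be combined. The following commands are only illustrative sketches: the regex pattern, delay values and credentials are placeholders, not values taken from the project.
suckit http://books.toscrape.com -o /path/to/downloaded/pages/ --include-download "\.(html|png)$" --delay 1 --random-range 2
suckit http://books.toscrape.com -o /path/to/downloaded/pages/ -a "user password books.toscrape.com" -d 2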
As of right now, SuckIT does not work on Windows.
To install it, you need to have Rust installed. See the official Rust installation instructions if you don't have it yet.
If you just want to install the suckit executable, you can simply run
cargo install --git https://github.com/skallwar/suckit
Now, run it from anywhere with the suckit command.
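To check that the binary is available on your PATH, you can use either of the documented flags, for example:
suckit --version
suckit --help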
suckit can be installed from the available AUR packages using an AUR helper. For example,
yay -S suckit
Want to contribute? Feel free to open an issue or submit a PR!
SuckIT is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).
See LICENSE-APACHE and LICENSE-MIT for details.