Crates.io | archiveis |
lib.rs | archiveis |
version | 0.4.0 |
source | src |
created_at | 2018-07-29 20:46:56.836379 |
updated_at | 2020-03-10 13:31:52.059264 |
description | Archive websites online using the archive.is capturing service. |
homepage | |
repository | https://github.com/mattsse/archiveis-rs |
max_upload_size | |
id | 76522 |
size | 75,945 |
Provides simple access to the Archive.is Capturing Service. Archive any URL and get the corresponding archive.is link in return.
The ArchiveClient is built with hyper and uses futures for capturing archive.is links.
use archiveis::ArchiveClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ArchiveClient::default();
    // submit the url to the archive.is capturing service
    let archived = client.capture("http://example.com/").await?;
    println!("targeted url: {}", archived.target_url);
    println!("url of archived site: {}", archived.archived_url);
    println!("archive.is submit token: {}", archived.submit_token);
    Ok(())
}
archive.is uses a temporary token to validate an archive request. The ArchiveClient's capture function first obtains a new submit token via a GET request. The token is usually valid for several minutes, and even if archive.is switches to a new token in the meantime, the older ones remain valid. So if we need to archive multiple links, we only need to obtain the token once and can then invoke the capturing service directly with capture_with_token for each URL. capture_all returns a Vec of the Results of every capture request, so every single capture request gets executed regardless of the success of prior requests.
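A minimal sketch of that token-reuse flow (the get_unique_token helper and the exact parameter types are assumptions here; consult the crate docs for the precise API):

use archiveis::ArchiveClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ArchiveClient::default();
    // Obtain a submit token once up front (get_unique_token is an
    // assumed helper; the crate may name this differently).
    let token = client.get_unique_token().await?;
    for &url in ["http://example.com/", "https://crates.io"].iter() {
        // Reuse the same token for each capture request.
        let archived = client.capture_with_token(url, token.as_str()).await?;
        println!("url of archived site: {}", archived.archived_url);
    }
    Ok(())
}

To capture several URLs in one call, capture_all can be used instead: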
use archiveis::ArchiveClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ArchiveClient::default();
    // the urls to capture
    let urls = vec![
        "http://example.com/",
        "https://github.com/MattsSe/archiveis-rs",
        "https://crates.io",
    ];
    // split the results into successful captures and failures
    let (archived, failures): (Vec<_>, Vec<_>) = client
        .capture_all(urls)
        .await?
        .into_iter()
        .partition(Result::is_ok);
    let archived: Vec<_> = archived.into_iter().map(Result::unwrap).collect();
    let failures: Vec<_> = failures.into_iter().map(Result::unwrap_err).collect();
    if failures.is_empty() {
        println!("all links successfully archived.");
    } else {
        for err in &failures {
            if let archiveis::Error::MissingUrl(url) | archiveis::Error::ServerError(url) = err {
                println!("Failed to archive url: {}", url);
            }
        }
    }
    Ok(())
}
Archive links using the archiveis command line application:
cargo install archiveis --features cli
SUBCOMMANDS:
    file     Archive all the links in the line-separated text file
    links    Archive all links provided as arguments
The file and links subcommands take the same flags and options (apart from their primary input: links or a file).
USAGE:
    archiveis links [FLAGS] [OPTIONS] -i <links>...

FLAGS:
    -a, --append             if the output file already exists, append instead of overwriting the file
        --archives-only      save only the archive urls
    -h, --help               Prints help information
        --ignore-failures    continue anyway if after all retries some links are not successfully archived
    -s, --silent             do not print anything
    -t, --text               save output as line separated text instead of json
    -V, --version            Prints version information

OPTIONS:
    -i <links>...              all links that should be archived via archive.is
    -o <output>                save all archived elements
    -r, --retries <retries>    how many times failed archive attempts should be tried again [default: 0]
Archive a set of links:
archiveis links -i "http://example.com/" "https://github.com/MattsSe/archiveis-rs"
Archive a set of links, save the result to archived.json, and retry failed attempts twice:
archiveis links -i "http://example.com/" "https://github.com/MattsSe/archiveis-rs" -o archived.json --retries 2
Archive all line-separated links in the file links.txt and save only the archive urls, line-separated, to archived.txt:
archiveis file -i links.txt -o archived.txt --text --archives-only
By default, archiveis aborts and doesn't write any output if there are still failed archive attempts after all retries. To write the output without the failures, add the --ignore-failures flag:
archiveis file -i links.txt -o archived.json --ignore-failures
Licensed under either of these:
Apache License, Version 2.0
MIT license