Crates.io | hyraigne |
lib.rs | hyraigne |
version | 0.1.4 |
source | src |
created_at | 2021-03-18 09:33:47.800708 |
updated_at | 2021-04-07 00:42:11.858286 |
description | Web spiders to scrap various man{ga,hua;hwa}s websites |
homepage | |
repository | https://github.com/TehUncleDolan/spiders |
max_upload_size | |
id | 370468 |
size | 1,722,214 |
Hyraigne is a library that provides web spiders (a.k.a. web crawlers) to scrape
websites like webtoons.com or mangadex.org and helps you download chapters
from there.
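Because each website needs its own scraping logic, a library of this kind has to decide which spider handles a given URL. The sketch below shows one way such a dispatch could look, using only the standard library. It is not Hyraigne's actual implementation: `Site`, `host_of`, and `site_for` are hypothetical names, and the list of accepted host names is an assumption made for illustration.

```rust
/// Hypothetical list of supported sites (illustration only).
#[derive(Debug, PartialEq)]
enum Site {
    Webtoons,
    MangaDex,
}

/// Extract the host part of a URL with plain string handling.
/// Simplified on purpose: ports and user info are not handled.
fn host_of(url: &str) -> Option<&str> {
    // Everything after "scheme://".
    let rest = url.split_once("://")?.1;
    // The host ends at the first path, query, or fragment delimiter.
    let host = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Map a URL to the site that should handle it, or `None` if unsupported.
/// The host names matched here are assumptions.
fn site_for(url: &str) -> Option<Site> {
    match host_of(url)? {
        "www.webtoons.com" | "webtoons.com" => Some(Site::Webtoons),
        "www.mangadex.org" | "mangadex.org" => Some(Site::MangaDex),
        _ => None,
    }
}

fn main() {
    let url = "https://www.webtoons.com/en/fantasy/tower-of-god/list?title_no=95";
    match site_for(url) {
        Some(site) => println!("{} is handled by {:?}", url, site),
        None => println!("{} is not supported", url),
    }
}
```

Returning `None` for an unknown host lets the caller report an unsupported URL instead of attempting a scrape that cannot succeed.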
Here's a simple example that downloads a series from webtoons.com:
    use url::Url;

    fn main() {
        // URL of the series to download.
        let url = Url::parse("https://www.webtoons.com/en/fantasy/tower-of-god/list?title_no=95")
            .expect("invalid URL");

        // Spider options and chapter filter.
        let opts = hyraigne::Options::new(1000, 3, "/home/me/Webtoons".into());
        let filter = hyraigne::Filter::new(0..=u16::MAX, None, Vec::new());

        // Pick the spider that supports this URL.
        let spider = hyraigne::get_spider_for(&url, opts).expect("unsupported URL");

        // Scrape the series info, then its chapter list.
        let series = spider.get_series(&url)
            .expect("failed to scrape series info");
        let chapters = spider.get_chapters(&series, filter)
            .expect("failed to scrape chapter list");

        // Set up the working directory, then download each chapter's pages.
        spider.mkdir(&chapters).expect("failed to setup workdir");
        for chapter in chapters {
            let pages = spider.get_pages(&chapter)
                .expect("failed to scrape page list");
            spider.download(&pages)
                .expect("failed to download pages");
        }
    }
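The filter in the example above is built from the range `0..=u16::MAX`, which covers every value a `u16` can hold, so presumably no chapter is excluded by number. As a minimal sketch of how an inclusive range can select chapters, here is a standard-library-only illustration. It assumes the range is matched against chapter numbers; `select_chapters` is a hypothetical helper, not part of Hyraigne, and the other `Filter::new` arguments are not covered.

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper: keep only the chapter numbers that fall inside `range`.
fn select_chapters(available: &[u16], range: &RangeInclusive<u16>) -> Vec<u16> {
    available
        .iter()
        .copied()
        .filter(|number| range.contains(number))
        .collect()
}

fn main() {
    // Made-up chapter numbers, for illustration only.
    let available = [1, 2, 3, 10, 11, 250];

    // The full range keeps every chapter.
    let all = select_chapters(&available, &(0..=u16::MAX));
    println!("full range: {:?}", all);

    // A narrower range keeps only the chapters whose number lies within it.
    let some = select_chapters(&available, &(3..=11));
    println!("3..=11: {:?}", some);
}
```

Both bounds of an inclusive range are part of the selection, so `3..=11` keeps chapters 3 and 11 as well as everything between them.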
“Hyraigne” is an old word, from Middle French, for “spider”.