voyager
=========================
[GitHub](https://github.com/mattsse/voyager)
[crates.io](https://crates.io/crates/voyager)
[docs.rs](https://docs.rs/voyager)
[CI](https://github.com/mattsse/voyager/actions?query=branch%3Amain)
With voyager you can easily extract structured data from websites.
Write your own crawler/scraper with voyager following a state machine model.
## Example
The examples use [tokio](https://tokio.rs/) as their runtime, so your `Cargo.toml` could look like this:
```toml
[dependencies]
voyager = { version = "0.1" }
tokio = { version = "1.8", features = ["full"] }
```
### Declare your own Scraper and model
```rust
// Declare your scraper, with all the selectors etc.
struct HackernewsScraper {
    post_selector: Selector,
    author_selector: Selector,
    title_selector: Selector,
    comment_selector: Selector,
    max_page: usize,
}

/// The state model
#[derive(Debug)]
enum HackernewsState {
    Page(usize),
    Post,
}

/// The output the scraper should eventually produce
#[derive(Debug)]
struct Entry {
    author: String,
    url: Url,
    link: Option<String>,
    title: String,
}
```
### Implement the `voyager::Scraper` trait
A `Scraper` consists of two associated types:

* `Output`, the type the scraper eventually produces
* `State`, the type the scraper can attach to requests and carry along over several requests that eventually lead to an `Output`

and the `scrape` callback, which is invoked after each received response.

Based on the state attached to the `response`, you can supply the crawler with new urls to visit, with or without a state attached.
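This request/state loop can be illustrated as a plain state machine, with no networking involved. The sketch below is a toy model, not voyager's actual API: the `Response`, `Crawler`, and `scrape` names here are simplified stand-ins, and the urls are made up.

```rust
// Toy model of the state-machine flow: each response carries an optional
// state; `scrape` matches on it and either queues follow-up urls (with a
// new state) or produces an output.

#[derive(Debug, PartialEq)]
enum State {
    Page(usize),
    Post,
}

#[derive(Debug, PartialEq)]
struct Entry {
    title: String,
}

struct Response {
    state: Option<State>,
    body: &'static str,
}

// Stand-in for the crawler: it just records the urls queued for later visits.
struct Crawler {
    queue: Vec<(String, State)>,
}

impl Crawler {
    fn visit_with_state(&mut self, url: &str, state: State) {
        self.queue.push((url.to_string(), state));
    }
}

fn scrape(response: Response, crawler: &mut Crawler) -> Option<Entry> {
    match response.state? {
        State::Page(page) => {
            // On an index page: queue the next page and a post, each with a state.
            crawler.visit_with_state(
                &format!("https://example.com/news?p={}", page + 1),
                State::Page(page + 1),
            );
            crawler.visit_with_state("https://example.com/item?id=1", State::Post);
            None // no output yet, only follow-up requests
        }
        // On a post page: produce the final output.
        State::Post => Some(Entry { title: response.body.to_string() }),
    }
}

fn main() {
    let mut crawler = Crawler { queue: Vec::new() };
    // an index page queues follow-ups but yields no output yet
    assert!(scrape(Response { state: Some(State::Page(0)), body: "" }, &mut crawler).is_none());
    assert_eq!(crawler.queue.len(), 2);
    // a post page yields an Entry
    let entry = scrape(Response { state: Some(State::Post), body: "A title" }, &mut crawler);
    assert_eq!(entry, Some(Entry { title: "A title".into() }));
}
```

The point of dragging a state along is exactly this dispatch: the same callback handles every response, and the attached state tells it which kind of page it is looking at.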
Scraping is done with [causal-agent/scraper](https://github.com/causal-agent/scraper).
```rust
impl Scraper for HackernewsScraper {
    type Output = Entry;
    type State = HackernewsState;

    /// do your scraping
    fn scrape(
        &mut self,
        response: Response<Self::State>,
        crawler: &mut Crawler<Self>,
    ) -> Result<Option<Self::Output>>