extrablatt

Crates.io	extrablatt
lib.rs	extrablatt
version	0.1.1
source	src
created_at	2020-08-22 17:33:13.161346
updated_at	2020-08-22 17:57:42.41447
description	News, articles and text scraper
homepage
repository	https://github.com/mattsse/extrablatt
max_upload_size
id	279581
size	423,251

Matthias Seitz (mattsse)

documentation

https://docs.rs/extrablatt/

README

extrablatt

Customizable article scraping & curation library and CLI. Also runs in Wasm.

Basic Wasm example with some CORS limitations: https://mattsse.github.io/extrablatt/

Inspired by newspaper.

HTML scraping is done via select.rs.

Features

News url identification
Text extraction
Top image extraction
All image extraction
Keyword extraction
Author extraction
Publishing date
References
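
To make the keyword extraction feature more concrete, here is a minimal, frequency-based sketch using only the standard library. It is an assumption about how such a feature can work in general, not extrablatt's actual algorithm; the function name `top_keywords` and the tiny stopword list are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns up to `n` of the most frequent
/// non-stopword terms in `text`, most frequent first.
fn top_keywords(text: &str, n: usize) -> Vec<String> {
    // Tiny illustrative stopword list; real lists are much longer.
    const STOPWORDS: [&str; 8] = ["the", "a", "an", "and", "of", "to", "in", "is"];

    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
    {
        let word = word.to_lowercase();
        if !STOPWORDS.contains(&word.as_str()) {
            *counts.entry(word).or_insert(0) += 1;
        }
    }

    let mut ranked: Vec<(String, usize)> = counts.into_iter().collect();
    // Sort by descending count, then alphabetically so ties are deterministic.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.into_iter().take(n).map(|(word, _)| word).collect()
}

fn main() {
    let text = "The rocket launch was delayed. The launch window for the rocket is short.";
    println!("{:?}", top_keywords(text, 2));
}
```

Ties are broken alphabetically only to keep the output stable; a real implementation would likely weight terms differently.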

Customizable for specific news sites/layouts via the Extractor trait.
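
The idea of customizing extraction per site can be illustrated with a trait whose methods have default implementations, so a site-specific type overrides only what differs. The sketch below shows that pattern only: `TitleExtractor`, `DefaultTitle`, `HeadlineFirst`, and `between` are hypothetical names, and the signatures do not reflect extrablatt's real Extractor trait (see the documentation on docs.rs for that).

```rust
/// Hypothetical trait, not extrablatt's real `Extractor`: the method has a
/// default body, so implementors override it only when a site needs it.
trait TitleExtractor {
    /// Default strategy: take the text between `<title>` and `</title>`.
    fn title<'a>(&self, html: &'a str) -> Option<&'a str> {
        between(html, "<title>", "</title>")
    }
}

/// Returns the trimmed text between the first `open` and the next `close`.
fn between<'a>(html: &'a str, open: &str, close: &str) -> Option<&'a str> {
    let start = html.find(open)? + open.len();
    let end = html[start..].find(close)? + start;
    Some(html[start..end].trim())
}

/// Uses the default behavior unchanged.
struct DefaultTitle;
impl TitleExtractor for DefaultTitle {}

/// A site whose real headline lives in an `<h1>`; falls back to `<title>`.
struct HeadlineFirst;
impl TitleExtractor for HeadlineFirst {
    fn title<'a>(&self, html: &'a str) -> Option<&'a str> {
        between(html, "<h1>", "</h1>").or_else(|| between(html, "<title>", "</title>"))
    }
}

fn main() {
    let html = "<html><title>Site | Section</title><h1> Real headline </h1></html>";
    println!("{:?}", DefaultTitle.title(html));
    println!("{:?}", HeadlineFirst.title(html));
}
```

Plain string search stands in for real HTML parsing here to keep the sketch dependency-free; the library itself relies on select.rs for that.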

Documentation

Full Documentation https://docs.rs/extrablatt

Example

Extract all Articles from news outlets.

use extrablatt::Extrablatt;
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build the scraper for a news site; the URL is a placeholder.
    let site = Extrablatt::builder("https://some-news.com/")?.build().await?;

    // Turn the site into a stream that yields one result per article.
    let mut stream = site.into_stream();

    while let Some(article) = stream.next().await {
        match article {
            Ok(article) => println!("article '{:?}'", article.content.title),
            // Report failures on stderr and continue with the next article.
            Err(err) => eprintln!("{:?}", err),
        }
    }

    Ok(())
}

Command Line

Install

cargo install extrablatt --features="cli"

Usage

USAGE:
    extrablatt <SUBCOMMAND>

SUBCOMMANDS:
    article     Extract a set of articles
    category    Extract all articles found on the page
    help        Prints this message or the help of the given subcommand(s)
    site        Extract all articles from a news source.

Extract a set of specific articles and store the result as JSON:

extrablatt article "https://www.example.com/article1.html", "https://www.example.com/article2.html" -o "articles.json"

License

Licensed under either of these:

Apache License, Version 2.0, (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)

Commit count: 119
