sitescraper

Crates.io: sitescraper
lib.rs: sitescraper
version: 0.2.1
source: src
created_at: 2021-11-01 21:57:41.105288
updated_at: 2023-12-04 18:12:25.553569
description: Scraping Websites in Rust!
repository: https://github.com/floscodes/rust-sitescraper
id: 475450
size: 33,705
flopetautschnig (floscodes)


Scraping Websites!

Examples:

Get InnerHTML:

let html = "<html><body><div>Hello World!</div></body></html>";
     
let dom = sitescraper::parse_html(html).unwrap();
     
let filtered_dom = dom.filter("body");
     
println!("{}", filtered_dom.get_inner_html());
//Output: <div>Hello World!</div>

Get Text:

let html = "<html><body><div>Hello World!</div></body></html>";

let dom = sitescraper::parse_html(html).unwrap();

let filtered_dom = dom.filter("body");

println!("{}", filtered_dom.get_text());
//Output: Hello World!
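
To make the difference between the two outputs above concrete, here is a minimal sketch of how text can be derived from inner HTML by dropping everything between `<` and `>`. The function `strip_tags` is a hypothetical helper written for illustration only; it is not part of sitescraper and is not necessarily how `get_text()` works internally. It also ignores comments, scripts and entities.

```rust
// Hypothetical helper, NOT part of sitescraper: a naive tag stripper.
fn strip_tags(html: &str) -> String {
    let mut text = String::new();
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            // Entering a tag: stop collecting characters.
            '<' => in_tag = true,
            // Leaving a tag: resume collecting characters.
            '>' => in_tag = false,
            // Keep only characters that sit outside of tags.
            _ if !in_tag => text.push(c),
            _ => {}
        }
    }
    text
}

fn main() {
    let inner_html = "<div>Hello World!</div>";
    println!("{}", strip_tags(inner_html));
    // Prints: Hello World!
}
```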

Get Text from single Tags:

use sitescraper;

let html = "<html><body><div>Hello World!</div></body></html>";

let dom = sitescraper::parse_html(html).unwrap();

let filtered_dom = dom.filter("div");

println!("{}", filtered_dom.tag[0].get_text());
//Output: Hello World!

This also works with get_inner_html().

Filter by tag-name, attribute-name and attribute-value using a tuple:

use sitescraper;
 
let html = "<html><body><div id='hello'>Hello World!</div></body></html>";
 
let dom = sitescraper::parse_html(html).unwrap();
 
let filtered_dom = dom.filter(("div", "id", "hello"));
 
println!("{}", filtered_dom.tag[0].get_text());
//Output: Hello World!

This also works with a tuple of two string literals (tag name and attribute name):

let filtered_dom = dom.filter(("div", "id"));

You can also leave arguments out by passing "*" or "":

use sitescraper;

let html = "<html><body><div id='hello'>Hello World!</div></body></html>";

let dom = sitescraper::parse_html(html).unwrap();

let filtered_dom = dom.filter(("*", "id", "hello"));

println!("{}", filtered_dom.tag[0].get_text());
//Output: Hello World!

or

use sitescraper;

let html = "<html><body><div id='hello'>Hello World!</div></body></html>";

let dom = sitescraper::parse_html(html).unwrap();

let filtered_dom = dom.filter(("", "", "hello"));

println!("{}", filtered_dom.tag[0].get_text());
//Output: Hello World!
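
The wildcard behaviour shown above can be pictured as a small matching rule: a pattern part that is "*" or "" accepts any value, and any other part must match exactly. The sketch below uses only the standard library; `Element`, `is_wildcard`, `part_matches` and `element_matches` are hypothetical names made up for this illustration and are not sitescraper's types or its actual implementation. In particular, how the crate treats tags without attributes is an assumption here.

```rust
// Hypothetical stand-in for a parsed tag; NOT a sitescraper type.
struct Element {
    name: &'static str,
    attrs: Vec<(&'static str, &'static str)>,
}

// A pattern part counts as "left out" when it is "*" or "".
fn is_wildcard(pattern: &str) -> bool {
    pattern.is_empty() || pattern == "*"
}

// A part matches if it is a wildcard or equals the actual value.
fn part_matches(pattern: &str, actual: &str) -> bool {
    is_wildcard(pattern) || pattern == actual
}

// Checks one element against a (tag, attribute name, attribute value) pattern.
fn element_matches(el: &Element, pattern: (&str, &str, &str)) -> bool {
    let (tag, attr_name, attr_value) = pattern;
    if !part_matches(tag, el.name) {
        return false;
    }
    // Assumption: with both attribute parts left out, the tag name alone decides.
    if is_wildcard(attr_name) && is_wildcard(attr_value) {
        return true;
    }
    // Otherwise at least one attribute must satisfy both remaining parts.
    el.attrs
        .iter()
        .any(|(n, v)| part_matches(attr_name, n) && part_matches(attr_value, v))
}

fn main() {
    let div = Element { name: "div", attrs: vec![("id", "hello")] };
    println!("{}", element_matches(&div, ("*", "id", "hello"))); // true
    println!("{}", element_matches(&div, ("", "", "hello"))); // true
    println!("{}", element_matches(&div, ("span", "id", "hello"))); // false
}
```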

Get Website-Content:

use sitescraper;

// Uses .await, so this must run inside an async function.
let html = sitescraper::http::get("http://example.com/").await.unwrap();

let dom = sitescraper::parse_html(html).unwrap();

let filtered_dom = sitescraper::filter!(dom, "div");

println!("{}", filtered_dom.get_inner_html());
