scraper_query

Crates.io: scraper_query
lib.rs: scraper_query
version: 0.4.0
source: src
created_at: 2024-10-22 11:40:21.716065
updated_at: 2024-11-08 07:23:47.298118
description: Ergonomic Query for HTML with Scraper
homepage: https://github.com/ifsheldon/scraper_query
repository: https://github.com/ifsheldon/scraper_query
max_upload_size:
id: 1418545
size: 23,493
(ifsheldon)

scraper_query is a small library for querying elements in HTML documents parsed with scraper. It makes simple HTML manipulations easy, the kind that are common in web crawling, web scraping, and data cleaning.

Usage

use markup5ever::interface::tree_builder::TreeSink; // brings `remove_from_parent` into scope
use scraper::Html;
use scraper_query::*; // use `HTMLIndex`, `Tag`, `class`, `id`

// Sample input (an assumption for illustration); any HTML string works here.
const HTML: &str = r#"<h1 class="foo bar">A</h1><h1 id="foo">B</h1>"#;

fn main() {
    let mut document = Html::parse_document(HTML);
    let index = HTMLIndex::new(&document);
    // find all nodes with class "foo" and "bar"
    let node_ids = index.query(class("foo") & class("bar"));
    // find all nodes with id "foo"
    let node_ids = index.query(id("foo"));
    // find all nodes with tag "h1" and class "foo"
    let node_ids = index.query(Tag::H1 & class("foo")); // same as `Tag::H1.and(class("foo"))`
    // find all nodes with tag "h1" and not class "foo"
    let node_ids = index.query(Tag::H1 & (!class("foo")));
    // simple manipulation: detach every matched node from the tree
    for id in node_ids {
        document.remove_from_parent(&id);
    }
}
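For readers curious how queries can be combined with `&` and `!`, here is a minimal, std-only sketch of that operator-overloading pattern. It is not the implementation of scraper_query: the `Query` enum, the `Node` struct, and the `tag`, `class`, `query`, and `sample_nodes` helpers are all hypothetical names made up for this illustration, and the toy `Node` list stands in for a parsed HTML tree.

```rust
use std::ops::{BitAnd, Not};

// Hypothetical query type for illustration; not the crate's actual definition.
#[derive(Debug, Clone)]
enum Query {
    Class(String),
    Tag(String),
    And(Box<Query>, Box<Query>),
    Not(Box<Query>),
}

// Hypothetical constructors mirroring the style of the usage example.
fn class(name: &str) -> Query {
    Query::Class(name.to_string())
}

fn tag(name: &str) -> Query {
    Query::Tag(name.to_string())
}

// `a & b` builds a conjunction of two queries.
impl BitAnd for Query {
    type Output = Query;
    fn bitand(self, rhs: Query) -> Query {
        Query::And(Box::new(self), Box::new(rhs))
    }
}

// `!a` builds the negation of a query.
impl Not for Query {
    type Output = Query;
    fn not(self) -> Query {
        Query::Not(Box::new(self))
    }
}

// Toy stand-in for a parsed element: a tag name and its class list.
struct Node {
    tag: &'static str,
    classes: &'static [&'static str],
}

impl Query {
    // Recursively evaluate the query tree against one node.
    fn matches(&self, node: &Node) -> bool {
        match self {
            Query::Class(c) => node.classes.iter().any(|k| *k == c.as_str()),
            Query::Tag(t) => node.tag == t.as_str(),
            Query::And(a, b) => a.matches(node) && b.matches(node),
            Query::Not(q) => !q.matches(node),
        }
    }
}

// Return the positions of all nodes matching the query.
fn query(nodes: &[Node], q: &Query) -> Vec<usize> {
    nodes
        .iter()
        .enumerate()
        .filter(|(_, n)| q.matches(n))
        .map(|(i, _)| i)
        .collect()
}

// Three toy nodes: <h1 class="foo bar">, <h1 class="bar">, <p class="foo">.
fn sample_nodes() -> Vec<Node> {
    vec![
        Node { tag: "h1", classes: &["foo", "bar"] },
        Node { tag: "h1", classes: &["bar"] },
        Node { tag: "p", classes: &["foo"] },
    ]
}
```

With these toy definitions, `query(&sample_nodes(), &(tag("h1") & !class("foo")))` selects only the second node. Because `!` binds more tightly than `&` in Rust, the parentheses around the negated term are optional.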

License

MIT