| Crates.io | web-parser |
| lib.rs | web-parser |
| version | 0.1.3 |
| created_at | 2025-06-11 22:01:01.454705+00 |
| updated_at | 2025-11-29 16:17:12.427264+00 |
| description | This website parser library allows asynchronous search, fetching and extracting data from web-pages in multiple formats. |
| homepage | |
| repository | https://github.com/fuderis/rs-web-parser |
| max_upload_size | |
| id | 1709163 |
| size | 20,733,040 bytes |
This web page parser library supports asynchronous search, fetching, and extraction of data from web pages in multiple formats.

Web search is provided through the `SearchEngine` trait (feature `search`). The library is well-suited for web scraping and data extraction tasks, offering flexible parsing of HTML, plain text, and JSON to enable comprehensive data gathering from various web sources.

The web search functionality requires the chromedriver tool to be installed!
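To use the search API, the `search` feature has to be enabled in `Cargo.toml`. A minimal sketch of the manifest, assuming the feature name matches the `search` feature mentioned above (check crates.io for current versions):

```toml
[dependencies]
# "search" enables the SearchEngine API (feature name taken from the text above)
web-parser = { version = "0.1", features = ["search"] }
# async runtime used by the examples below
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
# provides the `path!` macro used in the search example; version not pinned here
macron = "*"
```

With the feature enabled and chromedriver in place, a search session looks like this: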
```rust
use web_parser::prelude::*;
use macron::path;

#[tokio::main]
async fn main() -> Result<()> {
    // WEB SEARCH:
    let chrome_path = path!("bin/chromedriver/chromedriver.exe");
    let session_path = path!("%/ChromeDriver/WebSearch");

    // start search engine:
    let mut engine = SearchEngine::<Duck>::new(
        chrome_path,
        Some(session_path),
        false,
    ).await?;

    println!("Searching results..");

    // send search query:
    let results = engine.search(
        "Rust (programming language)",           // query
        &["support.google.com", "youtube.com"],  // black list
        1000                                     // sleep in millis
    ).await;

    // handle search results:
    match results {
        Ok(cites) => {
            println!("Result cites list: {:#?}", cites.get_urls());

            /*
            println!("Reading result pages..");

            let contents = cites.read(
                5,    // cites count to read
                &[    // tag name black list
                    "header", "footer", "style", "script", "noscript",
                    "iframe", "button", "img", "svg"
                ]
            ).await?;

            println!("Results: {contents:#?}");
            */
        }
        Err(e) => eprintln!("Search error: {e}")
    }

    // stop search engine:
    engine.stop().await?;

    Ok(())
}
```
```rust
use web_parser::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // READ PAGE AS HTML DOCUMENT:

    // read website page:
    let mut doc = Document::read("https://example.com/", User::random()).await?;

    // select title:
    let title = doc.select("h1")?.expect("No elements found");
    println!("Title: '{}'", title.text());

    // select descriptions:
    let mut descrs = doc.select_all("p")?.expect("No elements found");
    while let Some(descr) = descrs.next() {
        println!("Description: '{}'", descr.text())
    }

    // READ PAGE AS PLAIN TEXT:
    let text: String = Document::text("https://example.com/", User::random()).await?;
    println!("Text: {text}");

    // READ PAGE AS JSON:
    let json: serde_json::Value = Document::json("https://example.com/", User::random())
        .await?
        .expect("Failed to parse JSON");
    println!("Json: {json}");

    Ok(())
}
```
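Since `Document::json` returns a plain `serde_json::Value`, the standard serde_json accessors can be used to drill into the parsed result. A minimal sketch, assuming a hypothetical page whose JSON body carries a top-level `title` field:

```rust
use serde_json::Value;

// Pull a string field out of a parsed JSON value, if it exists.
// The "title" key is a hypothetical example; real pages will differ.
fn extract_title(json: &Value) -> Option<&str> {
    json.get("title")?.as_str()
}

fn main() {
    // stand-in for a value returned by Document::json
    let json: Value = serde_json::json!({ "title": "Example Domain" });

    match extract_title(&json) {
        Some(title) => println!("Title: {title}"),
        None => println!("No 'title' field found"),
    }
}
```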
Distributed under the MIT license.
You can find me here, and also check out my channel. I welcome your suggestions and feedback!
Copyright (c) 2025 Bulat Sh. (fuderis)