Crates.io | stream_crawler |
lib.rs | stream_crawler |
version | 0.1.1 |
source | src |
created_at | 2024-07-06 18:23:01.57854 |
updated_at | 2024-07-13 09:10:23.670489 |
description | A crate for scraping web pages and extracting URLs and endpoints. |
homepage | https://github.com/KenmogneThimotee/rust-crawler |
repository | https://github.com/KenmogneThimotee/rust-crawler |
id | 1294274 |
size | 15,492 |
stream_crawler is a Rust crate that provides an asynchronous web crawling utility. It processes URLs, extracts content and child URLs, and handles retry attempts for failed requests. It uses the tokio runtime for asynchronous operations and the reqwest library for HTTP requests.
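To make the retry behaviour mentioned above concrete, here is a minimal, synchronous sketch of a bounded retry loop using only the standard library. It is not the crate's implementation: `retry` and `flaky_fetch` are hypothetical names introduced for illustration, and the real crate performs its requests asynchronously with reqwest.

```rust
/// Runs `op` until it succeeds or `max_attempts` attempts have been made.
/// `op` receives the 1-based attempt number. At least one attempt is always made.
/// (Hypothetical helper, not part of stream_crawler.)
fn retry<T, E, F>(max_attempts: u32, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 1;
    loop {
        match op(attempt) {
            // Success: stop retrying and hand back the value.
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Otherwise try again.
            Err(_) => attempt += 1,
        }
    }
}

/// Hypothetical stand-in for an HTTP request: fails until the third attempt.
fn flaky_fetch(attempt: u32) -> Result<String, String> {
    if attempt < 3 {
        Err(format!("attempt {} failed", attempt))
    } else {
        Ok(String::from("<html></html>"))
    }
}
```

Calling `retry(3, flaky_fetch)` succeeds on the third attempt, while `retry(2, flaky_fetch)` gives up and returns the error from the second attempt. A production crawler would typically also wait between attempts.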
The crawler runs asynchronously on tokio and extracts child URLs from <a> tags in HTML.

Add this to your Cargo.toml:
[dependencies]
stream_crawler = "0.1.0"
tokio = { version = "1", features = ["full"] }
tokio-stream = "0.1"
reqwest = { version = "0.11", features = ["json"] }
scraper = "0.12"
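
use stream_crawler::scrape;
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() {
    // Seed URLs to start from.
    let urls = vec![
        String::from("https://www.google.com"),
        String::from("https://www.twitter.com"),
    ];

    // The three numeric arguments are crawl settings; see the crate's
    // inline documentation for their meaning.
    let mut result_stream = scrape(urls, 3, 5, 10).await;

    // Print each result as it arrives on the stream.
    while let Some(data) = result_stream.next().await {
        println!("Processed URL: {:?}", data);
    }
}
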
The scrape function takes a list of URLs plus three numeric arguments and returns a stream of ProcessedUrl structures. Each ProcessedUrl structure holds the page URL, its optional parent, the page content, and the child URLs found in <a> tags.

The usage example above demonstrates how to use the scrape function to process a list of URLs.
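As a rough illustration of what extracting child URLs from <a> tags involves, the sketch below scans a string for opening `<a ...>` tags and collects double-quoted `href` values. It is a deliberately naive, standard-library-only approximation: `extract_hrefs` is a hypothetical name, and the crate itself lists scraper as a dependency, so its own extraction is presumably based on a real HTML parser rather than string scanning.

```rust
/// Collects the double-quoted `href` values of `<a ...>` opening tags.
/// Naive by design: it ignores single-quoted or unquoted attributes,
/// uppercase tags, and `>` characters inside attribute values.
/// (Hypothetical helper, not part of stream_crawler.)
fn extract_hrefs(html: &str) -> Vec<String> {
    let mut links = Vec::new();
    let mut rest = html;

    // Jump from one "<a " to the next until none are left.
    while let Some(start) = rest.find("<a ") {
        let tag = &rest[start..];

        // The opening tag ends at the first '>'; stop on truncated input.
        let end = match tag.find('>') {
            Some(end) => end,
            None => break,
        };
        let opening = &tag[..end];

        // Pull out the text between `href="` and the next `"`.
        if let Some(pos) = opening.find("href=\"") {
            let value = &opening[pos + "href=\"".len()..];
            if let Some(quote) = value.find('"') {
                links.push(value[..quote].to_string());
            }
        }

        // Continue scanning after this opening tag.
        rest = &tag[end + 1..];
    }

    links
}
```

Links are returned exactly as written in the markup, so relative values such as `/about` would still need to be resolved against the page URL before they can be crawled.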
Refer to the inline documentation for detailed usage and examples.
ProcessedUrl
#[derive(Debug, PartialEq)]
pub struct ProcessedUrl {
    pub parent: Option<String>,
    pub url: String,
    pub content: String,
    pub children: Vec<String>,
}
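One way to work with these fields is to compute which child URLs have been discovered but not yet processed. The sketch below assumes a batch of results has already been collected from the stream. It redefines the struct locally so that it compiles without the crate, and `unvisited_children` is a hypothetical helper, not part of the crate's API.

```rust
use std::collections::HashSet;

// Local copy of the struct shown above, so the sketch compiles on its own.
#[derive(Debug, PartialEq)]
pub struct ProcessedUrl {
    pub parent: Option<String>,
    pub url: String,
    pub content: String,
    pub children: Vec<String>,
}

/// Returns child URLs that do not appear as a processed `url` in `pages`,
/// in first-seen order and without duplicates.
/// (Hypothetical helper, not part of stream_crawler.)
fn unvisited_children(pages: &[ProcessedUrl]) -> Vec<String> {
    // URLs that were already processed.
    let visited: HashSet<&str> = pages.iter().map(|p| p.url.as_str()).collect();
    // Children already added to the output.
    let mut seen: HashSet<&str> = HashSet::new();
    let mut pending = Vec::new();

    for page in pages {
        for child in &page.children {
            let child = child.as_str();
            // `insert` returns false if the child was already recorded.
            if !visited.contains(child) && seen.insert(child) {
                pending.push(child.to_string());
            }
        }
    }

    pending
}
```

Comparison here is by exact string, so `https://example.com` and `https://example.com/` count as different URLs. Normalising URLs first would be a natural refinement.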
Contributions are welcome! Please open an issue or submit a pull request.
This project is licensed under the MIT License.