| Crates.io | rust_web_crawler |
| lib.rs | rust_web_crawler |
| version | 0.1.2 |
| created_at | 2025-07-23 16:12:06.354054+00 |
| updated_at | 2025-07-25 08:20:25.404172+00 |
| description | A short summary of what your crate does |
| homepage | |
| repository | |
| max_upload_size | |
| id | 1764987 |
| size | 30,869 |
A high-performance, concurrent web crawler implemented in Rust that demonstrates different approaches to web crawling with varying levels of concurrency and synchronization.
```
rust_web_crawler/
├── src/
│   └── main.rs     # Main implementation with three crawler variants
├── Cargo.toml      # Project dependencies and metadata
├── Cargo.lock      # Locked dependency versions
└── README.md       # This file
```
The project provides three crawler variants:

- `serial_crawler()` function
- `concurrent_mutex_crawler()` function
- `concurrent_channel_crawler()` function

They are built around a `Fetcher` trait:

```rust
trait Fetcher: Send + Sync + 'static {
    fn fetch(&self, url: &str) -> Result<Vec<String>, String>;
}
```
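For illustration, a fetcher backed by an in-memory map is enough to drive the crawlers without touching the network. The `FakeFetcher` name and layout below are assumptions for this sketch, not necessarily the crate's own test double:

```rust
use std::collections::HashMap;

// Hypothetical in-memory fetcher: maps a URL to the links "found" on that page.
struct FakeFetcher {
    pages: HashMap<String, Vec<String>>,
}

impl Fetcher for FakeFetcher {
    fn fetch(&self, url: &str) -> Result<Vec<String>, String> {
        self.pages
            .get(url)
            .cloned()
            .ok_or_else(|| format!("URL not found: {url}"))
    }
}
```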
Clone the repository:

```bash
git clone https://github.com/nabil-Tounarti/rust-web-crawler.git
cd rust-web-crawler
```

Build the project:

```bash
cargo build
```

Run the crawler:

```bash
cargo run
```
```bash
# Run all tests
cargo test

# Run tests with output
cargo test -- --nocapture

# Run specific test
cargo test test_serial_crawler_happy_path
```
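For a sense of what these tests can look like, here is a minimal sketch against the hypothetical `FakeFetcher` above; it exercises only the `Fetcher` trait, since the crawler functions' exact signatures are not shown here:

```rust
#[test]
fn fake_fetcher_returns_seeded_links() {
    use std::collections::HashMap;

    // Seed one fake page that links to a single child URL.
    let mut pages = HashMap::new();
    pages.insert(
        "http://example.com/".to_string(),
        vec!["http://example.com/a".to_string()],
    );
    let fetcher = FakeFetcher { pages };

    // A known URL yields its links; an unknown URL yields an error.
    assert_eq!(
        fetcher.fetch("http://example.com/"),
        Ok(vec!["http://example.com/a".to_string()])
    );
    assert!(fetcher.fetch("http://missing.example/").is_err());
}
```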
The project demonstrates three different crawling approaches:
| Approach | Concurrency | Thread Safety | Performance | Complexity |
|---|---|---|---|---|
| Serial | None | N/A | Slow | Low |
| Mutex | High | Mutex-protected | Medium | Medium |
| Channel | High | Channel-based | Fast | High |
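To make the middle row concrete, here is a rough sketch of a mutex-based variant. The function name, the single level of depth, and the use of `std::thread` are simplifications and assumptions for illustration, not the crate's actual `concurrent_mutex_crawler()`:

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

// Sketch: each discovered link is fetched on its own thread, while a shared,
// mutex-protected set records which URLs were already seen. A real crawler
// would recurse or use a work queue instead of stopping after one level.
fn mutex_crawler_sketch(start: String, fetcher: Arc<dyn Fetcher>) -> HashSet<String> {
    let visited = Arc::new(Mutex::new(HashSet::new()));
    let mut handles = Vec::new();

    visited.lock().unwrap().insert(start.clone());
    if let Ok(links) = fetcher.fetch(&start) {
        for link in links {
            let visited = Arc::clone(&visited);
            let fetcher = Arc::clone(&fetcher);
            handles.push(thread::spawn(move || {
                // Only fetch a link the first time it is seen.
                if visited.lock().unwrap().insert(link.clone()) {
                    let _ = fetcher.fetch(&link);
                }
            }));
        }
    }

    for handle in handles {
        handle.join().unwrap();
    }
    Arc::try_unwrap(visited).unwrap().into_inner().unwrap()
}
```

The channel-based variant typically replaces the shared `Mutex<HashSet>` with worker threads that send discovered URLs back to a coordinating loop over an `mpsc` channel, trading lock contention for more plumbing.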
The project includes comprehensive tests covering:

- Basic Functionality Tests: `test_serial_crawler_happy_path`, `test_concurrent_mutex_crawler_happy_path`, `test_concurrent_channel_crawler_happy_path`
- Error Handling Tests: `test_start_with_nonexistent_url`, `test_crawler_with_fetch_error`
- Edge Case Tests: `test_single_url_no_links`

To implement a real web crawler, create a new fetcher:
```rust
// Note: the synchronous `fetch` signature calls for reqwest's blocking client
// (the "blocking" feature) plus an HTML parser such as the `scraper` crate.
use reqwest::blocking::Client;
use scraper::{Html, Selector};

struct HttpFetcher {
    client: Client,
}

impl Fetcher for HttpFetcher {
    fn fetch(&self, url: &str) -> Result<Vec<String>, String> {
        // Fetch the page body, mapping any network error to a String.
        let body = self.client.get(url).send().map_err(|e| e.to_string())?
            .text().map_err(|e| e.to_string())?;
        // Parse the HTML and return every href as a discovered URL.
        let document = Html::parse_document(&body);
        let selector = Selector::parse("a[href]").map_err(|_| "bad selector".to_string())?;
        Ok(document.select(&selector)
            .filter_map(|a| a.value().attr("href").map(String::from))
            .collect())
    }
}
```
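Both `reqwest` (with its `blocking` feature) and `scraper` would need to be added to Cargo.toml for the sketch above to compile. A hypothetical call site might look like this:

```rust
fn example_usage() -> Result<(), String> {
    let fetcher = HttpFetcher { client: Client::new() };
    // Fetch one page and print the links it contains.
    for link in fetcher.fetch("https://www.rust-lang.org/")? {
        println!("{link}");
    }
    Ok(())
}
```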
Extend the project with configuration options:
```rust
use std::time::Duration;

struct CrawlerConfig {
    max_depth: usize,
    max_concurrent_requests: usize,
    request_timeout: Duration,
    user_agent: String,
}
```
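One way to give such a config sensible defaults; the specific values and the `Default` impl are illustrative assumptions, not part of the crate:

```rust
impl Default for CrawlerConfig {
    fn default() -> Self {
        CrawlerConfig {
            max_depth: 3,
            max_concurrent_requests: 8,
            request_timeout: Duration::from_secs(10),
            user_agent: "rust_web_crawler/0.1".to_string(),
        }
    }
}
```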
1. Create your feature branch (`git checkout -b feature/amazing-feature`)
2. Commit your changes (`git commit -m 'Add amazing feature'`)
3. Push to the branch (`git push origin feature/amazing-feature`)

This project is licensed under the MIT License - see the LICENSE file for details.
This project serves as a learning resource for Rust concurrency patterns, including mutex-protected shared state, channel-based message passing, and trait-based abstraction.
Note: This is currently a demonstration project using a fake fetcher. For production use, implement a real HTTP fetcher with proper error handling and rate limiting.