| Crates.io | xml_collected |
| lib.rs | xml_collected |
| version | 0.1.0 |
| created_at | 2025-01-26 10:07:00.877769+00 |
| updated_at | 2025-01-26 10:07:00.877769+00 |
| description | A Rust library to fetch and parse XML sitemaps, discover sub-sitemaps, and save extracted URLs to a file. |
| homepage | https://github.com/MarkoDragnic/xml_collected |
| repository | https://github.com/MarkoDragnic/xml_collected |
| max_upload_size | |
| id | 1531221 |
| size | 20,371 |
A Rust library to fetch and parse XML sitemaps from URLs, discover sub-sitemaps, and store the extracted URLs into a specified file.
URLs and sub-sitemap locations are read from <loc> tags.
To use this library in your Rust project, add the following to your Cargo.toml:
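[dependencies]
xml_collected = "0.1.0"
# The async usage example needs a Tokio runtime for #[tokio::main]
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
quick-xml = "0.37.2"
reqwest = { version = "0.12", features = ["blocking", "json"] }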
Below is an example of how to use the library in your project.
use xml_collected::fetch_and_parse_sitemap;

#[tokio::main]
async fn main() {
    let sitemap_url = "https://example.com/sitemap.xml";
    let output_path = "urls.txt";

    // Fetch and parse the sitemap, saving URLs to a file
    if let Err(err) = fetch_and_parse_sitemap(sitemap_url, output_path).await {
        eprintln!("Error: {}", err);
    } else {
        println!("URLs have been saved to {}", output_path);
    }
}
fetch_and_parse_sitemap is the main function of the library. It fetches the sitemap XML from a given URL, parses it, and stores all discovered URLs in a specified file.
Arguments:
sitemap_url: A string containing the URL to the sitemap XML.
output_file_path: A string containing the path where the URLs should be saved.
Return:
Returns a Result<(), Box<dyn std::error::Error>>, where Ok(()) means the operation was successful, and Err contains an error message if something goes wrong.
async fn fetch_and_parse_sitemap(sitemap_url: &str, output_file_path: &str) -> Result<(), Box<dyn std::error::Error>> {
    // Implementation omitted; only the signature is shown here
}
This function fetches the sitemap from the given URL, recursively parses all URLs (including sub-sitemaps), and stores the URLs in the provided output file.
How It Works:
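The fetch, recurse, and store flow can be illustrated with a minimal, standard-library-only sketch. It is not the crate's implementation: the helper names `extract_locs`, `collect_urls`, and `write_urls` are hypothetical, the `fetch` closure stands in for the HTTP request, and a naive string scan replaces a real XML parser such as quick-xml. It assumes the sitemaps.org convention that a `<sitemapindex>` document lists sub-sitemaps while a `<urlset>` document lists page URLs.

```rust
use std::collections::HashSet;
use std::io::{self, Write};

/// Return the trimmed text of every `<loc>...</loc>` element in `xml`.
/// Naive string scan (assumption): no namespaces, CDATA, or entity decoding.
fn extract_locs(xml: &str) -> Vec<String> {
    let mut locs = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find("<loc>") {
        let after_open = &rest[start + "<loc>".len()..];
        match after_open.find("</loc>") {
            Some(end) => {
                locs.push(after_open[..end].trim().to_string());
                rest = &after_open[end + "</loc>".len()..];
            }
            None => break, // unterminated tag: stop scanning
        }
    }
    locs
}

/// Walk a sitemap and any sub-sitemaps it references, collecting page URLs.
/// `fetch` is a hypothetical stand-in for the HTTP request; it returns the
/// document body, or `None` if the sitemap could not be retrieved.
fn collect_urls<F>(url: &str, fetch: &F, seen: &mut HashSet<String>, out: &mut Vec<String>)
where
    F: Fn(&str) -> Option<String>,
{
    // Skip sitemaps already visited so that cyclic indexes terminate.
    if !seen.insert(url.to_string()) {
        return;
    }
    let xml = match fetch(url) {
        Some(body) => body,
        None => return,
    };
    // A sitemap index points at further sitemaps; a urlset holds page URLs.
    let is_index = xml.contains("<sitemapindex");
    for loc in extract_locs(&xml) {
        if is_index {
            collect_urls(&loc, fetch, seen, out);
        } else {
            out.push(loc);
        }
    }
}

/// Write one URL per line to any writer (a `File`, a `Vec<u8>`, ...).
fn write_urls<W: Write>(mut writer: W, urls: &[String]) -> io::Result<()> {
    for url in urls {
        writeln!(writer, "{url}")?;
    }
    Ok(())
}
```

To save to disk, `write_urls` can be given `std::fs::File::create("urls.txt")?` as its writer. Passing the fetcher as a closure keeps the traversal testable without network access.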
If any error occurs during fetching or parsing, the function prints an error message to standard error.
Contributing
If you find bugs or have ideas for improvements, feel free to open an issue or submit a pull request. Contributions are welcome!
This project is licensed under the MIT License - see the LICENSE file for details.