| Crates.io | robotxt |
| lib.rs | robotxt |
| version | 0.6.1 |
| created_at | 2023-03-11 04:55:53.438591+00 |
| updated_at | 2024-03-07 15:59:56.707101+00 |
| description | The implementation of the Robots.txt (or URL exclusion) protocol with the support of crawl-delay, sitemap and universal match extensions. |
| homepage | https://github.com/spire-rs/kit/exclusion |
| repository | https://github.com/spire-rs/kit/exclusion |
| max_upload_size | |
| id | 807035 |
| size | 70,519 |
Also check out other spire-rs projects here.
The implementation of the robots.txt (or URL exclusion) protocol in the Rust
programming language, with support for the crawl-delay, sitemap, and universal
`*` match extensions (according to RFC 9309).
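For instance, the universal `*` match lets a wildcard appear anywhere inside a
rule path, not only at the end. A minimal sketch of the effect, assuming RFC
9309 wildcard semantics (the pattern and paths below are illustrative, not
taken from the crate's documentation):

use robotxt::Robots;

fn main() {
    // `*` matches any sequence of characters, so a `hidden`
    // directory at any depth is covered by this single rule.
    let txt = "User-Agent: foobot\nDisallow: /*/hidden/";
    let r = Robots::from_bytes(txt.as_bytes(), "foobot");
    assert!(!r.is_relative_allowed("/a/hidden/file.txt"));
    assert!(r.is_relative_allowed("/a/visible/file.txt"));
}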
Features:

- `parser` to enable `robotxt::{Robots}`. Enabled by default.
- `builder` to enable `robotxt::{RobotsBuilder, GroupBuilder}`. Enabled by default.
- `optimal` to optimize overlapping and global rules, potentially improving
  matching speed at the cost of longer parsing times.
- `serde` to enable the `serde::{Deserialize, Serialize}` implementations,
  allowing the caching of related rules (see the caching sketch after the
  first example below).

Parse the most specific matching user-agent in the provided robots.txt file:

use robotxt::Robots;
fn main() {
    let txt = r#"
      User-Agent: foobot
      Disallow: *
      Allow: /example/
      Disallow: /example/nope.txt
    "#;

    // Rules from the most specific matching group (here `foobot`) apply.
    let r = Robots::from_bytes(txt.as_bytes(), "foobot");
    assert!(r.is_relative_allowed("/example/yeah.txt"));
    assert!(!r.is_relative_allowed("/example/nope.txt"));
    assert!(!r.is_relative_allowed("/invalid/path.txt"));
}
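With the optional `serde` feature, the parsed rules can be serialized and
cached instead of being re-parsed on every request. A minimal sketch, assuming
the `serde` feature is enabled and `serde_json` is added as an extra
dependency (the JSON cache format here is illustrative):

use robotxt::Robots;

fn main() -> serde_json::Result<()> {
    let txt = "User-Agent: foobot\nDisallow: /private/";
    let r = Robots::from_bytes(txt.as_bytes(), "foobot");

    // Serialize the parsed rules once...
    let cached = serde_json::to_string(&r)?;

    // ...and restore them later without touching the original file.
    let restored: Robots = serde_json::from_str(&cached)?;
    assert!(!restored.is_relative_allowed("/private/key.txt"));
    Ok(())
}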
Build the new robots.txt file in a declarative manner:

use robotxt::RobotsBuilder;
fn main() -> Result<(), url::ParseError> {
    let txt = RobotsBuilder::default()
        .header("Robots.txt: Start")
        .group(["foobot"], |u| {
            u.crawl_delay(5)
                .header("Rules for Foobot: Start")
                .allow("/example/yeah.txt")
                .disallow("/example/nope.txt")
                .footer("Rules for Foobot: End")
        })
        .group(["barbot", "nombot"], |u| {
            u.crawl_delay(2)
                .disallow("/example/yeah.txt")
                .disallow("/example/nope.txt")
        })
        .sitemap("https://example.com/sitemap_1.xml".try_into()?)
        .sitemap("https://example.com/sitemap_2.xml".try_into()?)
        .footer("Robots.txt: End");

    println!("{}", txt);
    Ok(())
}
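Since the `parser` and `builder` features are both enabled by default, the
generated file can be fed straight back into the parser. A small round-trip
sketch (the group and paths are illustrative):

use robotxt::{Robots, RobotsBuilder};

fn main() {
    // Build a file with a single disallow rule...
    let txt = RobotsBuilder::default()
        .group(["foobot"], |u| u.disallow("/private/"))
        .to_string();

    // ...then parse it again to check that the rule survives.
    let r = Robots::from_bytes(txt.as_bytes(), "foobot");
    assert!(!r.is_relative_allowed("/private/secret.txt"));
    assert!(r.is_relative_allowed("/public/index.html"));
}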
Note: the Host directive is not supported.