| Crates.io | robotxt |
| lib.rs | robotxt |
| version | 0.6.1 |
| source | src |
| created_at | 2023-03-11 04:55:53.438591 |
| updated_at | 2024-03-07 15:59:56.707101 |
| description | The implementation of the Robots.txt (or URL exclusion) protocol with the support of crawl-delay, sitemap and universal match extensions. |
| homepage | https://github.com/spire-rs/kit/exclusion |
| repository | https://github.com/spire-rs/kit/exclusion |
| id | 807035 |
| size | 70,519 |
Also check out other spire-rs projects.
The implementation of the robots.txt (or URL exclusion) protocol in the Rust programming language with the support of `crawl-delay`, `sitemap` and universal `*` match extensions (according to the RFC specification).
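As a sketch, the crate and its optional features listed below can be enabled in `Cargo.toml` (the version pin is taken from the metadata above; adjust the feature set to your needs):

```toml
[dependencies]
# `parser` and `builder` are enabled by default; `optimal` and `serde` are opt-in.
robotxt = { version = "0.6.1", features = ["optimal", "serde"] }
```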
Features:

- `parser` to enable `robotxt::{Robots}`. Enabled by default.
- `builder` to enable `robotxt::{RobotsBuilder, GroupBuilder}`. Enabled by default.
- `optimal` to optimize overlapping and global rules, potentially improving matching speed at the cost of longer parsing times.
- `serde` to enable the `serde::{Deserialize, Serialize}` implementation, allowing the caching of related rules.

Parse the most specific `user-agent` in the provided `robots.txt` file:

```rust
use robotxt::Robots;

fn main() {
    let txt = r#"
    User-Agent: foobot
    Disallow: *
    Allow: /example/
    Disallow: /example/nope.txt
    "#;

    let r = Robots::from_bytes(txt.as_bytes(), "foobot");
    assert!(r.is_relative_allowed("/example/yeah.txt"));
    assert!(!r.is_relative_allowed("/example/nope.txt"));
    assert!(!r.is_relative_allowed("/invalid/path.txt"));
}
```
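The assertions in the example above follow the RFC 9309 matching rules: a pattern matches as a prefix, `*` matches any sequence of characters, a trailing `$` anchors the pattern to the end of the path, and when several rules match, the longest pattern wins (with `Allow` winning ties). This is a stdlib-only sketch of those rules for illustration, not the crate's actual implementation:

```rust
/// Matches `path` against a robots.txt `pattern`:
/// patterns are prefixes, `*` matches any character sequence,
/// and a trailing `$` anchors the pattern to the end of the path.
fn wildcard_match(pattern: &str, path: &str) -> bool {
    let (pat, anchored) = match pattern.strip_suffix('$') {
        Some(stripped) => (stripped, true),
        None => (pattern, false),
    };

    let parts: Vec<&str> = pat.split('*').collect();
    let last = parts.len() - 1;
    let mut rest = path;

    for (i, &part) in parts.iter().enumerate() {
        if i == 0 {
            // The first segment must match at the start of the path.
            if !rest.starts_with(part) {
                return false;
            }
            rest = &rest[part.len()..];
        } else if anchored && i == last {
            // The final segment of an end-anchored pattern must be a suffix.
            return rest.ends_with(part);
        } else if let Some(at) = rest.find(part) {
            // A middle segment may appear anywhere after the previous one.
            rest = &rest[at + part.len()..];
        } else {
            return false;
        }
    }

    // Unanchored patterns match as prefixes; anchored ones must consume the path.
    !anchored || rest.is_empty()
}

/// Picks the most specific matching rule: longest pattern wins,
/// and `Allow` (true) wins a tie. No matching rule means allowed.
fn is_allowed(rules: &[(bool, &str)], path: &str) -> bool {
    rules
        .iter()
        .filter(|(_, pattern)| wildcard_match(pattern, path))
        .max_by_key(|(allow, pattern)| (pattern.len(), *allow))
        .map_or(true, |(allow, _)| *allow)
}

fn main() {
    // The same group of rules as in the parsing example above.
    let rules = [
        (false, "*"),
        (true, "/example/"),
        (false, "/example/nope.txt"),
    ];

    assert!(is_allowed(&rules, "/example/yeah.txt"));
    assert!(!is_allowed(&rules, "/example/nope.txt"));
    assert!(!is_allowed(&rules, "/invalid/path.txt"));
}
```

Here `/example/nope.txt` (17 characters, disallow) outranks `/example/` (9 characters, allow), while `/example/yeah.txt` is matched only by `/example/` and the global `*`, so the longer allow rule wins.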
Build the new `robots.txt` file in a declarative manner:

```rust
use robotxt::RobotsBuilder;

fn main() -> Result<(), url::ParseError> {
    let txt = RobotsBuilder::default()
        .header("Robots.txt: Start")
        .group(["foobot"], |u| {
            u.crawl_delay(5)
                .header("Rules for Foobot: Start")
                .allow("/example/yeah.txt")
                .disallow("/example/nope.txt")
                .footer("Rules for Foobot: End")
        })
        .group(["barbot", "nombot"], |u| {
            u.crawl_delay(2)
                .disallow("/example/yeah.txt")
                .disallow("/example/nope.txt")
        })
        .sitemap("https://example.com/sitemap_1.xml".try_into()?)
        .sitemap("https://example.com/sitemap_1.xml".try_into()?)
        .footer("Robots.txt: End");

    println!("{}", txt.to_string());
    Ok(())
}
```
The `Host` directive is not supported.