![](https://img.shields.io/crates/l/robots_txt.svg) [![crates.io](https://img.shields.io/crates/v/robots_txt)](https://crates.io/crates/robots_txt) [![crates.io](https://img.shields.io/crates/dv/robots_txt)](https://crates.io/crates/robots_txt) [![Build Status](https://travis-ci.org/alexander-irbis/robots_txt.svg)](https://travis-ci.org/alexander-irbis/robots_txt) ![Minimal rust version 1.36](https://img.shields.io/badge/stable-1.36+-green.svg) ![Nightly rust version from March 30, 2020](https://img.shields.io/badge/nightly-2020--03--30-yellow.svg)

# robots_txt

**robots_txt is a lightweight robots.txt parser and generator written in Rust.** Nothing extra.

* [Documentation](https://docs.rs/robots_txt)

### Unstable

The implementation is a work in progress.

## Installation

robots_txt is [available on crates.io](https://crates.io/crates/robots_txt) and can be added to your Cargo-enabled project like this:

Cargo.toml:

```toml
[dependencies]
robots_txt = "0.7"
```

### Parsing & matching paths against rules

```rust
// `SimpleMatcher` is assumed to be exposed via the crate's `matcher` module.
use robots_txt::{matcher::SimpleMatcher, Robots};

static ROBOTS: &'static str = r#"
# robots.txt for http://www.site.com

User-Agent: *
Disallow: /cyberworld/map/ # this is an infinite virtual URL space

# Cybermapper knows where to go
User-Agent: cybermapper
Disallow:
"#;

fn main() {
    let robots = Robots::from_str(ROBOTS);

    // No dedicated section matches "NoName Bot", so the `*` section applies.
    let matcher = SimpleMatcher::new(&robots.choose_section("NoName Bot").rules);
    assert!(matcher.check_path("/some/page"));
    assert!(matcher.check_path("/cyberworld/welcome.html"));
    assert!(!matcher.check_path("/cyberworld/map/object.html"));

    // The "cybermapper" section allows everything.
    let matcher = SimpleMatcher::new(&robots.choose_section("Mozilla/5.0; CyberMapper v. 3.14").rules);
    assert!(matcher.check_path("/some/page"));
    assert!(matcher.check_path("/cyberworld/welcome.html"));
    assert!(matcher.check_path("/cyberworld/map/object.html"));
}
```

### Building & rendering

main.rs:

```rust
extern crate robots_txt;
extern crate url;

use robots_txt::Robots;
// `Url` comes from the `url` crate, assumed here as an additional dependency.
use url::Url;

fn main() {
    let robots1 = Robots::builder()
        .start_section("cybermapper")
        .disallow("")
        .end_section()
        .start_section("*")
        .disallow("/cyberworld/map/")
        .end_section()
        .build();

    let conf_base_url: Url = "https://example.com/".parse().expect("parse domain");
    let robots2 = Robots::builder()
        .host(conf_base_url.domain().expect("domain"))
        .start_section("*")
        .disallow("/private")
        .disallow("")
        .crawl_delay(4.5)
        .request_rate(9, 20)
        .sitemap("http://example.com/sitemap.xml".parse().unwrap())
        .end_section()
        .build();

    println!("# robots.txt for http://cyber.example.com/\n\n{}", robots1);
    println!("# robots.txt for http://example.com/\n\n{}", robots2);
}
```

As a result we get:

```
# robots.txt for http://cyber.example.com/

User-agent: cybermapper
Disallow:

User-agent: *
Disallow: /cyberworld/map/

# robots.txt for http://example.com/

User-agent: *
Disallow: /private
Disallow:
Crawl-delay: 4.5
Request-rate: 9/20
Sitemap: http://example.com/sitemap.xml
Host: example.com
```

## Alternatives

* [messense/robotparser-rs](https://github.com/messense/robotparser-rs): a robots.txt parser for Rust

## License

Licensed under either of

* Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
* MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

### Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.