nom-psl

Crates.io: nom-psl
lib.rs: nom-psl
version: 1.2.0
source: src
created_at: 2018-10-25 16:35:52.630271
updated_at: 2019-06-03 16:45:03.008851
description: Fast public suffix list domain parsing, written in nom
homepage: https://github.com/dwerner/nom-psl
repository: https://github.com/dwerner/nom-psl
max_upload_size:
id: 92608
size: 232,645
Daniel N. Werner (dwerner)

README

Faster public suffix domain parsing.

The scope of this library is limited to finding the TLD+1 of a given domain (the public suffix plus one more label, e.g. example.co.uk) using the public suffix list.

Approach:

  • Load public suffix list entries into memory
  • Match immutable, owned values of domains to be parsed
  • Leverage an LRU cache for entries, sized by the user
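
The first step above can be sketched with the standard library alone. This is a minimal illustration, not nom-psl's implementation (which uses nom): the `Rules` type and `from_text` function are hypothetical names, and wildcard (`*.`) and exception (`!`) rules are stored verbatim rather than interpreted.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory rule set; not nom-psl's actual data structure.
struct Rules {
    suffixes: HashSet<String>,
}

impl Rules {
    /// Load rules from public-suffix-list text: one rule per line,
    /// read up to the first whitespace; blank lines and `//` comments are skipped.
    fn from_text(text: &str) -> Rules {
        let suffixes = text
            .lines()
            // Take the first whitespace-delimited token; blank lines yield None.
            .filter_map(|line| line.split_whitespace().next())
            // Comment lines start with `//`.
            .filter(|token| !token.starts_with("//"))
            .map(String::from)
            .collect();
        Rules { suffixes }
    }
}
```

All string work happens once, at load time, so later lookups only need to hash slices of the input domain.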

Goals:

  • Provide (mostly) compliant public suffix domain parsing
  • Avoid allocations during domain parsing
  • Offload as much work as possible to the list-parsing stage
  • Avoid dependencies that might themselves bring unwanted baggage
  • Inputs are not mutated; outputs are slices of inputs
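
To make the "no allocations, outputs are slices of inputs" goals concrete, here is a rough sketch of an allocation-free lookup. It assumes a pre-built set of plain suffixes (the hypothetical `suffixes` parameter), ignores wildcard and exception rules, and returns `None` for unlisted TLDs, so it is a simplification rather than the crate's actual algorithm.

```rust
use std::collections::HashSet;

/// Return the public suffix plus one label as a slice of `domain`.
/// `suffixes` is a hypothetical pre-parsed rule set of plain suffixes.
fn tld_plus_one<'a>(suffixes: &HashSet<&str>, domain: &'a str) -> Option<&'a str> {
    // Byte offsets at which each label starts, left to right.
    let starts = std::iter::once(0).chain(domain.match_indices('.').map(|(i, _)| i + 1));
    let mut prev_start: Option<usize> = None;
    for start in starts {
        // Scanning left to right, the first hit is the longest matching suffix.
        if suffixes.contains(&domain[start..]) {
            // The label just left of the suffix completes the result; if there
            // is none, the input is itself a public suffix.
            return prev_start.map(|p| &domain[p..]);
        }
        prev_start = Some(start);
    }
    None
}
```

Because the return value borrows from `domain` with the same lifetime, the compiler enforces that the result is a view into the caller's input and not a copy.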

Caveats:

  • Still relies on the idna crate for punycode parsing
  • Nothing is lower-cased (skipped for performance), so mixed-case input may not match list entries
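
Given the lower-casing caveat, a caller that cannot guarantee lower-case input may want to normalize it first. The helper below is a hypothetical caller-side sketch, not part of nom-psl; it handles ASCII letters only and allocates only when an upper-case byte is actually present.

```rust
use std::borrow::Cow;

/// Lower-case ASCII letters in `domain`, borrowing the input when it is
/// already lower-case. Non-ASCII (IDN) input is passed through untouched.
fn lowercase_if_needed(domain: &str) -> Cow<'_, str> {
    if domain.bytes().any(|b| b.is_ascii_uppercase()) {
        Cow::Owned(domain.to_ascii_lowercase())
    } else {
        Cow::Borrowed(domain)
    }
}
```

The normalized value can then be passed to the parser by reference, so the common already-lower-case path stays allocation-free.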

Environment Variables

PUBLIC_SUFFIX_LIST_FILE=somefile: overrides which file is loaded in place of public_suffix_list.dat
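
An override of this kind is commonly resolved as sketched below. The README does not say when or how nom-psl itself reads the variable, so the function names here are hypothetical and the code only illustrates the fallback behaviour described above.

```rust
use std::env;

/// Default file name used when no override is given.
const DEFAULT_LIST_FILE: &str = "public_suffix_list.dat";

/// Pick the override if present, otherwise the default file name.
fn resolve_list_path(override_value: Option<String>) -> String {
    override_value.unwrap_or_else(|| DEFAULT_LIST_FILE.to_string())
}

/// Read PUBLIC_SUFFIX_LIST_FILE from the environment and resolve the path.
/// An unset or non-UTF-8 variable falls back to the default.
fn list_path_from_env() -> String {
    resolve_list_path(env::var("PUBLIC_SUFFIX_LIST_FILE").ok())
}
```

Splitting the lookup from the fallback keeps the decision testable without mutating the process environment.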

Example:

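use lazy_static::lazy_static;
use nom_psl::List; // assumed import path; adjust if List lives in a submodule

lazy_static! {
    // Parse the list once, on first use.
    static ref LIST: List = {
        let list = List::parse_source_file("public_suffix_list.dat", 10_000_000);
        list.expect("unable to parse PSL file")
    };
}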

...

fn foo() {
    let domain = "abc.one.two.example.co.uk";
    let tldp1 = LIST.parse_domain(domain);
    
    assert_eq!(tldp1, Some("example.co.uk"));
}

TODO:

  • benchmarks