html-filter

Crates.io	html-filter
lib.rs	html-filter
version	0.2.1
created_at	2025-03-13 22:12:40.798683+00
updated_at	2025-11-08 10:33:44.741259+00
description	Crate to parse, filter, search and edit an HTML file.
homepage
repository	https://github.com/t-webber/html-filter
max_upload_size
id	1591531
size	122,637

Tom Webber (t-webber)

documentation

README

HTML filter

This is a rust library that parses HTML source files and allows you to search in and filter this HTML with a specific set of rules.

This is a simple lightweight HTML parser, that converts an HTML file (in the &str format) to a tree representing the HTML tags and text.

Why use this crate ?

For HTML parsing and filtering
Lightweight and no dependencies
Public and accessible HTML tree representation
Easy interface to filter HTML

Getting started

You can install it with

cargo add html_filter

then us it like this:

use html_filter::*;

let html: &str = r#"
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>Html sample</title>
    </head>
    <body>
        <p>This is an html sample.</p>
    </body>
</html>
"#;

// Parse your HTML
let tree: Html = Html::parse(html).expect("Invalid HTML");

// Now you can use it!
// Beware, this doesn't always work as you can have
// difference ways to write the same HTML.
assert!(format!("{tree}") == html);

Find & filter

You can also use the find and filter methods to manage this HTML. To do this, you need to create your filtering options with the Filter type.

Filter

use html_filter::*;

let html: &str = r##"
  <section>
    <h1>Welcome to My Random Page</h1>
    <nav>
      <ul>
        <li><a href="/home">Home</a></li>
        <li><a href="/about">About</a></li>
        <li><a href="/services">Services</a></li>
        <li><a href="/contact">Contact</a></li>
      </ul>
    </nav>
  </section>
"##;

// Create your filter
let filter = Filter::new().tag_name("li");

// Parse your HTML
let filtered: Html = Html::parse(html).expect("Invalid HTML").filter(&filter);

// `filtered` contains the 4 `<li>`s from the above HTML string
if let Html::Vec(links) = filtered {
    assert!(links.len() == 4)
} else {
    unreachable!()
}

Find

The finder returns the first element that respects the filter:

use html_filter::*;

let html: &str = r##"
  <section>
    <h1>Welcome to My Random Page</h1>
    <nav>
      <ul>
        <li><a href="/home">Home</a></li>
        <li><a href="/about">About</a></li>
        <li><a href="/services">Services</a></li>
        <li><a href="/contact">Contact</a></li>
      </ul>
    </nav>
  </section>
"##;

// Create your filter
let filter = Filter::new().tag_name("a");

// Parse your HTML
let link: Html = Html::parse(html).expect("Invalid HTML").find(&filter);

// Check the result: link contains `<a href="/home">Home</a>`
if let Html::Tag { tag, child, .. } = link && let Html::Text(text) = *child {
    assert!(tag.as_name() == "a" && text == "Home");
} else {
    unreachable!()
}

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Disclaimer

[!IMPORTANT] Do not use this parser to check the syntax of your HTML code. Many HTML files are parsed without any errors by this parser, as the sole objective is to get a parsed version. Only breaking syntax errors raises errors.

Obviously, all valid HTML files work fine with this crate.

Commit count: 53