# parsercher
[![Crate](https://img.shields.io/crates/v/parsercher.svg)](https://crates.io/crates/parsercher)
[![API](https://img.shields.io/badge/api-3.1.6-green.svg)](https://docs.rs/parsercher)
**Parses and searches tag documents (e.g. HTML, XML).**

parsercher parses documents written in tag-based markup such as HTML and XML. It can:

- Create a Dom structure tree from a tag document.
- Search the Dom structure tree for tags and text.
- Search the Dom structure tree for subtrees.
## Usage
Add this to your `Cargo.toml`:
```toml
[dependencies]
parsercher = "3.1.6"
```
## License
[MIT](./LICENSE-MIT) OR [Apache-2.0](./LICENSE-APACHE)
## Examples
**Example of getting text from HTML.**
Create a Dom structure tree from HTML and get the text of every `li` tag whose `class` attribute is `target`.
```rust
use parsercher;
use parsercher::dom::Tag;
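let html = r#"
<html>
  <head>
    <title>sample html</title>
  </head>
  <body>
    <ol>
      <li class="target">first</li>
      <li>second</li>
      <li class="target">therd</li>
    </ol>
  </body>
</html>
"#;

if let Ok(root_dom) = parsercher::parse(&html) {
    // The needle describes the tag to look for: <li class="target">.
    let mut needle = Tag::new("li");
    needle.set_attr("class", "target");

    if let Some(texts) = parsercher::search_text_from_tag_children(&root_dom, &needle) {
        assert_eq!(texts.len(), 2);
        assert_eq!(texts[0], "first".to_string());
        assert_eq!(texts[1], "therd".to_string());
    }
}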
```
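To make the idea of a needle more concrete, here is a minimal, standalone sketch of the kind of matching the example above relies on. It does not use parsercher and is not its implementation: the `Node` type, the `tag` helper, and the rule that a tag matches when it carries every attribute of the needle are all assumptions made for illustration.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a parsed node; not parsercher's own `Dom` type.
enum Node {
    Tag {
        name: String,
        attrs: HashMap<String, String>,
        children: Vec<Node>,
    },
    Text(String),
}

// Helper invented for this sketch: builds a tag node from string slices.
fn tag(name: &str, attrs: &[(&str, &str)], children: Vec<Node>) -> Node {
    Node::Tag {
        name: name.to_string(),
        attrs: attrs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
        children,
    }
}

// Assumed rule: a tag matches when the names are equal and every needle
// attribute is present on the tag with the same value.
fn matches(
    name: &str,
    attrs: &HashMap<String, String>,
    needle_name: &str,
    needle_attrs: &[(&str, &str)],
) -> bool {
    name == needle_name
        && needle_attrs
            .iter()
            .all(|(k, v)| attrs.get(*k).map(String::as_str) == Some(*v))
}

// Depth-first walk that collects the direct text children of matching tags.
fn collect_texts(
    node: &Node,
    needle_name: &str,
    needle_attrs: &[(&str, &str)],
    out: &mut Vec<String>,
) {
    if let Node::Tag { name, attrs, children } = node {
        if matches(name, attrs, needle_name, needle_attrs) {
            for child in children {
                if let Node::Text(t) = child {
                    out.push(t.clone());
                }
            }
        }
        for child in children {
            collect_texts(child, needle_name, needle_attrs, out);
        }
    }
}

// Builds the sample list and returns the texts of `li` tags with class "target".
fn demo_target_texts() -> Vec<String> {
    let list = tag("ol", &[], vec![
        tag("li", &[("class", "target")], vec![Node::Text("first".to_string())]),
        tag("li", &[], vec![Node::Text("second".to_string())]),
        tag("li", &[("class", "target")], vec![Node::Text("therd".to_string())]),
    ]);
    let mut texts = Vec::new();
    collect_texts(&list, "li", &[("class", "target")], &mut texts);
    texts
}

fn main() {
    println!("{:?}", demo_target_texts());
}
```

Under these assumptions the second `li` is skipped because it has no `class` attribute, so only `first` and `therd` are collected.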
**Example of searching a subtree from the Dom structure tree.**
Find every subtree that consists of a `ul` tag whose `class` attribute is `targetList` with
two `li` tags under it. The `class` attributes of the `li` tags must be
`key1` and `key2`, respectively.
Looking for:
```text
<ul class="targetList">
  <li class="key1"></li>
  <li class="key2"></li>
</ul>
```
```rust
use parsercher;
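let doc = r#"
<ul class="targetList">
  <li class="key1">1-1</li>
  <li class="key2">1-2</li>
</ul>
<ul class="targetList">
  <li class="key1">3-1</li>
  <li class="item">3-2</li>
  <li class="key2">3-3</li>
</ul>
"#;
let root_dom = parsercher::parse(&doc).unwrap();

// The needle is itself a tag document describing the subtree to look for.
let needle = r#"
<ul class="targetList">
  <li class="key1"></li>
  <li class="key2"></li>
</ul>
"#;
let result = root_dom.search(&needle).unwrap().unwrap();
for dom in result.iter() {
    parsercher::print_dom_tree(&dom);
}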
```
output:
```text
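<ul class="targetList">
  <li class="key1">
    TEXT: "1-1"
  <li class="key2">
    TEXT: "1-2"
<ul class="targetList">
  <li class="key1">
    TEXT: "3-1"
  <li class="item">
    TEXT: "3-2"
  <li class="key2">
    TEXT: "3-3"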
```
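The output above includes a list with three items even though the needle names only two, which suggests that extra children are tolerated. The following standalone sketch shows one plausible reading of that behaviour. It is not parsercher's algorithm: the `Elem` type, the `elem` helper, and the rule that needle children must appear in order among the candidate's children are assumptions for illustration only.

```rust
// Hypothetical element type for this sketch; not parsercher's `Dom`.
#[derive(Debug)]
struct Elem {
    name: String,
    class: Option<String>,
    children: Vec<Elem>,
}

// Helper invented for this sketch: builds an element from string slices.
fn elem(name: &str, class: Option<&str>, children: Vec<Elem>) -> Elem {
    Elem {
        name: name.to_string(),
        class: class.map(str::to_string),
        children,
    }
}

// Assumed rule: names must agree, and a class set on the needle must be
// equal on the candidate. A needle without a class accepts any class.
fn tag_matches(needle: &Elem, candidate: &Elem) -> bool {
    needle.name == candidate.name
        && (needle.class.is_none() || needle.class == candidate.class)
}

// Assumed rule: every needle child must match some candidate child, in
// order, while unrelated candidate children may sit in between.
fn subtree_matches(needle: &Elem, candidate: &Elem) -> bool {
    if !tag_matches(needle, candidate) {
        return false;
    }
    let mut rest = candidate.children.iter();
    needle
        .children
        .iter()
        .all(|n| rest.any(|c| subtree_matches(n, c)))
}

// Collects every node of the tree whose subtree satisfies the needle.
fn find_all<'a>(needle: &Elem, root: &'a Elem, out: &mut Vec<&'a Elem>) {
    if subtree_matches(needle, root) {
        out.push(root);
    }
    for child in &root.children {
        find_all(needle, child, out);
    }
}

// Returns, for each matching list, the classes of its children.
fn demo_matches() -> Vec<Vec<String>> {
    let needle = elem("ul", Some("targetList"), vec![
        elem("li", Some("key1"), vec![]),
        elem("li", Some("key2"), vec![]),
    ]);
    let doc = elem("body", None, vec![
        elem("ul", Some("targetList"), vec![
            elem("li", Some("key1"), vec![]),
            elem("li", Some("key2"), vec![]),
        ]),
        // No class on the list, so this one is not expected to match.
        elem("ul", None, vec![
            elem("li", Some("key1"), vec![]),
            elem("li", None, vec![]),
        ]),
        elem("ul", Some("targetList"), vec![
            elem("li", Some("key1"), vec![]),
            elem("li", Some("item"), vec![]),
            elem("li", Some("key2"), vec![]),
        ]),
    ]);
    let mut found = Vec::new();
    find_all(&needle, &doc, &mut found);
    let classes: Vec<Vec<String>> = found
        .iter()
        .map(|m| {
            m.children
                .iter()
                .map(|c| c.class.clone().unwrap_or_default())
                .collect()
        })
        .collect();
    classes
}

fn main() {
    println!("{:?}", demo_matches());
}
```

With these assumed rules the first and third lists match, and the third is returned whole, including its `item` entry.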
**A more complex example of a Dom structure tree**
```rust
use parsercher;
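let html = r#"
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>sample html</title>
  </head>
  <body>
    <h1>Hello, world!</h1>
    <div id="content"></div>
    <ol>
      <li>first</li>
      <li>second</li>
      <li>therd</li>
    </ol>
    <!-- All script code becomes one text -->
<script>
 let content = document.getElementById('content');
 content.textContent = 'content';
</script>
  </body>
</html>
"#;

if let Ok(dom) = parsercher::parse(&html) {
    // Pretty-print the whole tree using the derived `Debug` output.
    println!("{:#?}", dom);
}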
```
output:
```text
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "root",
attr: None,
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: Some(
[
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "!DOCTYPE",
attr: Some(
{
"html": "",
},
),
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: None,
},
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "html",
attr: None,
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: Some(
[
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "head",
attr: None,
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: Some(
[
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "meta",
attr: Some(
{
"charset": "UTF-8",
},
),
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: None,
},
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "title",
attr: None,
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: Some(
[
Dom {
dom_type: Text,
tag: None,
text: Some(
Text {
text: "sample html",
},
),
comment: None,
children: None,
},
],
),
},
],
),
},
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "body",
attr: None,
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: Some(
[
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "h1",
attr: None,
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: Some(
[
Dom {
dom_type: Text,
tag: None,
text: Some(
Text {
text: "Hello, world!",
},
),
comment: None,
children: None,
},
],
),
},
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "div",
attr: Some(
{
"id": "content",
},
),
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: None,
},
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "ol",
attr: None,
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: Some(
[
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "li",
attr: None,
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: Some(
[
Dom {
dom_type: Text,
tag: None,
text: Some(
Text {
text: "first",
},
),
comment: None,
children: None,
},
],
),
},
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "li",
attr: None,
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: Some(
[
Dom {
dom_type: Text,
tag: None,
text: Some(
Text {
text: "second",
},
),
comment: None,
children: None,
},
],
),
},
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "li",
attr: None,
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: Some(
[
Dom {
dom_type: Text,
tag: None,
text: Some(
Text {
text: "therd",
},
),
comment: None,
children: None,
},
],
),
},
],
),
},
Dom {
dom_type: Comment,
tag: None,
text: None,
comment: Some(
Comment {
comment: " All script code becomes one text ",
},
),
children: None,
},
Dom {
dom_type: Tag,
tag: Some(
Tag {
name: "script",
attr: None,
terminated: false,
terminator: false,
},
),
text: None,
comment: None,
children: Some(
[
Dom {
dom_type: Text,
tag: None,
text: Some(
Text {
text: "\n let content = document.getElementById(\'content\');\n content.textContent = \'content\';\n",
},
),
comment: None,
children: None,
},
],
),
},
],
),
},
],
),
},
],
),
}
```
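The dump above shows three kinds of node: tags, text, and comments, with children nested under tags. As a rough illustration of how such a tree can be walked, here is a standalone sketch that renders a compact outline. `SketchDom` is a hypothetical, simplified mirror of that shape and not the crate's `Dom` definition, and the `COMMENT:` label is an invented convention for this sketch.

```rust
// Hypothetical, simplified mirror of the node kinds visible in the dump.
enum SketchDom {
    Tag { name: String, children: Vec<SketchDom> },
    Text(String),
    Comment(String),
}

// Helper invented for this sketch: builds a tag node.
fn tag(name: &str, children: Vec<SketchDom>) -> SketchDom {
    SketchDom::Tag {
        name: name.to_string(),
        children,
    }
}

// Appends one line per node, indented by two spaces per nesting level.
fn outline(node: &SketchDom, depth: usize, out: &mut Vec<String>) {
    let pad = "  ".repeat(depth);
    match node {
        SketchDom::Tag { name, children } => {
            out.push(format!("{}<{}>", pad, name));
            for child in children {
                outline(child, depth + 1, out);
            }
        }
        SketchDom::Text(t) => out.push(format!("{}TEXT: {:?}", pad, t)),
        SketchDom::Comment(c) => out.push(format!("{}COMMENT: {:?}", pad, c)),
    }
}

// Builds a small tree and returns its outline, one string per line.
fn demo_outline() -> Vec<String> {
    let tree = tag("root", vec![tag("html", vec![tag("body", vec![
        tag("h1", vec![SketchDom::Text("Hello, world!".to_string())]),
        SketchDom::Comment(" note ".to_string()),
    ])])]);
    let mut lines = Vec::new();
    outline(&tree, 0, &mut lines);
    lines
}

fn main() {
    for line in demo_outline() {
        println!("{}", line);
    }
}
```

A recursive walk like this is usually easier to read than the full `{:#?}` dump when only the nesting of tags matters.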