# web-grep

## What this?

Grep for HTML or XML.

```bash
$ echo '<a>Hello</a>' | web-grep '<a>{}</a>'
Hello
```

```bash
$ echo '<a>Hello</a>' | web-grep '<a>{html}</a>' --json
{"html":"Hello"}
```

```bash
# List the innerHTML of every <p>
$ cat << EOM | web-grep '<p>{}</p>'
<p>hello</p>
<p>world</p>
EOM
hello
world
```

```bash
# Filtering with attributes
$ cat << EOM | web-grep '<p class="here">{}</p>'
<p>hello</p>
<p class="here">world</p>
EOM
world
```

```bash
# The placeholder {} can also stand for an attribute value
$ cat << EOM | web-grep '<p class="{}">world</p>'
<p>hello</p>
<p class="here">world</p>
EOM
here
```

## How this?

This is just a CLI for an awesome library, [tanakh/easy-scraper](https://github.com/tanakh/easy-scraper).

## Installation

1. Install cargo
    - Recommended way: install [rustup](https://rustup.rs/)
1. Then,
    - `cargo install web-grep`

## Usage

```bash
$ web-grep <QUERY> [INPUT]
```

The `QUERY` is an HTML (XML) pattern. A pattern is a valid HTML structure that contains placeholders for innerHTML or attribute values. `web-grep` offers several kinds of placeholders for different cases.

## Placeholders

### Anonymous Placeholder `{}`

If you need exactly one placeholder in the pattern, use `{}`.

```html
<p>{}</p>
```

```html
<p class="here">{}</p>
```

`web-grep` outputs every text that matches `{}`.

```bash
$ echo '<p>1</p><p>2</p><p>3</p>' | web-grep '<p>{}</p>'
1
2
3
```

### Numbered Placeholders `{n}`

```html
<a href="{1}">{2}</a>
```

`web-grep` outputs the matched texts for `{1}`, `{2}`, ... in order, separated by `\t`.

```bash
$ echo '<a href="hoge">fuga</a>' | web-grep '<a href="{2}">{1}</a>'
fuga	hoge
```

The delimiter can be specified with `-F`.

```bash
$ echo '<a href="hoge">fuga</a>' | web-grep '<a href="{2}">{1}</a>' -F ' '
fuga hoge
```

### Named Placeholders `{xxx}`

```html
<a href="{href}">{innerHTML}</a>
```

The output can be formatted as JSON with `--json`.

```bash
$ echo '<a href="hoge">fuga</a>' | web-grep '<a href="{href}">{html}</a>' --json
{"href":"hoge","html":"fuga"}
```