| Crates.io | scrape-cli |
| lib.rs | scrape-cli |
| version | 0.2.0 |
| created_at | 2026-01-16 16:22:07.933538+00 |
| updated_at | 2026-01-20 01:47:39.842896+00 |
| description | Command-line HTML extraction tool powered by scrape-rs |
| homepage | |
| repository | https://github.com/bug-ops/scrape-rs |
| max_upload_size | |
| id | 2048870 |
| size | 79,426 |
10-50x faster HTML extraction from the command line. Rust-powered, shell-friendly.
cargo install scrape-cli
Download from GitHub Releases:
# macOS (Apple Silicon)
curl -L https://github.com/bug-ops/scrape-rs/releases/latest/download/scrape-darwin-aarch64.tar.gz | tar xz
# macOS (Intel)
curl -L https://github.com/bug-ops/scrape-rs/releases/latest/download/scrape-darwin-x86_64.tar.gz | tar xz
# Linux (x86_64)
curl -L https://github.com/bug-ops/scrape-rs/releases/latest/download/scrape-linux-x86_64.tar.gz | tar xz
# Linux (ARM64)
curl -L https://github.com/bug-ops/scrape-rs/releases/latest/download/scrape-linux-aarch64.tar.gz | tar xz
[!IMPORTANT] Requires Rust 1.88 or later when building from source.
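For building from source, here is a minimal sketch; the workspace layout and the location of the resulting binary are assumptions rather than documented details.
# Clone and build with a Rust 1.88+ toolchain
git clone https://github.com/bug-ops/scrape-rs
cd scrape-rs
cargo build --release
# The release binary is written under target/release/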
# Extract h1 text from file
scrape 'h1' page.html
# Extract from stdin
curl -s example.com | scrape 'title'
# Extract links as JSON
scrape -o json 'a[href]' page.html
# Extract text content
scrape 'h1' page.html
# Output: Welcome to Our Site
# Extract attribute value
scrape -a href 'a.nav-link' page.html
# Output: /home
# /about
# /contact
# First match only
scrape -1 'p' page.html
# Output: First paragraph text
# Plain text (default)
scrape 'h1' page.html
# Output: Hello World
# JSON
scrape -o json 'a[href]' page.html
# Output: ["Link 1","Link 2"]
# Pretty JSON
scrape -o json -p 'a' page.html
# HTML fragments
scrape -o html 'div.content' page.html
# CSV (requires named selectors)
scrape -o csv -s name='td:nth-child(1)' -s price='td:nth-child(2)' table.html
# Output: name,price
# "Product A","$10.00"
# Extract multiple fields
scrape -o json \
-s title='h1' \
-s links='a[href]' \
-s images='img[src]' \
page.html
# Output: {"title":["Page Title"],"links":[...],"images":[...]}
# Process multiple files (parallel by default)
scrape 'h1' pages/*.html
# Output: page1.html: Welcome
# page2.html: About Us
# page3.html: Contact
# Control parallelism
scrape -j 4 'h1' pages/*.html
[!TIP] Batch processing uses all CPU cores by default. Use `-j N` to limit threads.
# NUL delimiter for xargs
scrape -0 -a href 'a' page.html | xargs -0 -I{} curl {}
# Suppress errors
scrape -q 'h1' *.html 2>/dev/null
# Disable filename prefix
scrape --no-filename 'h1' *.html
| Option | Short | Description |
|---|---|---|
| `--output FORMAT` | `-o` | Output format: text, json, html, csv |
| `--select NAME=SEL` | `-s` | Named selector extraction |
| `--attribute ATTR` | `-a` | Extract attribute instead of text |
| `--first` | `-1` | Return only first match |
| `--pretty` | `-p` | Pretty-print JSON output |
| `--null` | `-0` | Use NUL delimiter (for xargs) |
| `--color MODE` | `-c` | Colorize: auto, always, never |
| `--parallel N` | `-j` | Parallel threads for batch |
| `--quiet` | `-q` | Suppress error messages |
| `--with-filename` | `-H` | Always show filename prefix |
| `--no-filename` | | Never show filename prefix |
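As an illustration, several of these flags can be combined in a single invocation; this sketch assumes the flags compose freely and uses placeholder filenames.
# Force filename prefixes, disable color, and pretty-print JSON
scrape -H -c never -o json -p 'a[href]' pages/*.html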
v0.2.0 includes significant improvements.
| Code | Meaning |
|---|---|
| 0 | Success, matches found |
| 1 | No matches found |
| 2 | Runtime error (invalid selector, I/O error) |
| 4 | Argument validation error |
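These codes make it straightforward to branch in shell scripts. An illustrative sketch (page.html is a placeholder):
# Branch on the documented exit codes
scrape -q 'h1' page.html
case $? in
  0) echo "matches found" ;;
  1) echo "no matches" ;;
  2) echo "runtime error" ;;
  4) echo "invalid arguments" ;;
esac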
Powered by battle-tested libraries from the Servo browser engine: html5ever (HTML5 parser) and selectors (CSS selector engine).
| Platform | Package |
|---|---|
| Rust | scrape-core |
| Python | fast-scrape |
| Node.js | @fast-scrape/node |
| WASM | @fast-scrape/wasm |
MIT OR Apache-2.0