scrape-cli

Crates.io: scrape-cli
lib.rs: scrape-cli
version: 0.2.0
created_at: 2026-01-16 16:22:07.933538+00
updated_at: 2026-01-20 01:47:39.842896+00
description: Command-line HTML extraction tool powered by scrape-rs
repository: https://github.com/bug-ops/scrape-rs
id: 2048870
size: 79,426
owner: Andrei G (bug-ops)

README

scrape-cli


10-50x faster HTML extraction from the command line. Rust-powered, shell-friendly.

Installation

cargo install scrape-cli
Pre-built binaries

Download from GitHub Releases:

# macOS (Apple Silicon)
curl -L https://github.com/bug-ops/scrape-rs/releases/latest/download/scrape-darwin-aarch64.tar.gz | tar xz

# macOS (Intel)
curl -L https://github.com/bug-ops/scrape-rs/releases/latest/download/scrape-darwin-x86_64.tar.gz | tar xz

# Linux (x86_64)
curl -L https://github.com/bug-ops/scrape-rs/releases/latest/download/scrape-linux-x86_64.tar.gz | tar xz

# Linux (ARM64)
curl -L https://github.com/bug-ops/scrape-rs/releases/latest/download/scrape-linux-aarch64.tar.gz | tar xz

[!IMPORTANT] Requires Rust 1.88 or later when building from source.
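
You can also have cargo build straight from the repository; the command below assumes the scrape-cli package lives in the scrape-rs workspace, which may not match the exact layout:

# Build and install from the git repository (package name in the workspace is an assumption)
cargo install --git https://github.com/bug-ops/scrape-rs scrape-cli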

Quick start

# Extract h1 text from file
scrape 'h1' page.html

# Extract from stdin
curl -s example.com | scrape 'title'

# Extract links as JSON
scrape -o json 'a[href]' page.html

Usage

Basic extraction
# Extract text content
scrape 'h1' page.html
# Output: Welcome to Our Site

# Extract attribute value
scrape -a href 'a.nav-link' page.html
# Output: /home
#         /about
#         /contact

# First match only
scrape -1 'p' page.html
# Output: First paragraph text
Output formats
# Plain text (default)
scrape 'h1' page.html
# Output: Hello World

# JSON
scrape -o json 'a[href]' page.html
# Output: ["Link 1","Link 2"]

# Pretty JSON
scrape -o json -p 'a' page.html

# HTML fragments
scrape -o html 'div.content' page.html

# CSV (requires named selectors)
scrape -o csv -s name='td:nth-child(1)' -s price='td:nth-child(2)' table.html
# Output: name,price
#         "Product A","$10.00"
Named selectors
# Extract multiple fields
scrape -o json \
  -s title='h1' \
  -s links='a[href]' \
  -s images='img[src]' \
  page.html
# Output: {"title":["Page Title"],"links":[...],"images":[...]}
Batch processing
# Process multiple files (parallel by default)
scrape 'h1' pages/*.html
# Output: page1.html: Welcome
#         page2.html: About Us
#         page3.html: Contact

# Control parallelism
scrape -j 4 'h1' pages/*.html

[!TIP] Batch processing uses all CPU cores by default. Use -j N to limit threads.
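
For deeper directory trees, one common shell pattern is to feed the file list in via find and xargs; a small sketch, where the site/ directory is hypothetical:

# Recursively scrape every HTML file under site/ (batch mode handles the file list)
find site/ -name '*.html' -print0 | xargs -0 scrape 'h1'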

Pipeline integration
# NUL delimiter for xargs
scrape -0 -a href 'a' page.html | xargs -0 -I{} curl {}

# Suppress errors
scrape -q 'h1' *.html 2>/dev/null

# Disable filename prefix
scrape --no-filename 'h1' *.html
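
JSON output also pairs naturally with jq for downstream filtering. A small sketch, assuming jq is installed (it is not part of scrape-cli) and page.html contains the elements used earlier:

# Count extracted links
scrape -o json 'a[href]' page.html | jq 'length'

# Pull one field out of a named-selector extraction
scrape -o json -s title='h1' -s links='a[href]' page.html | jq -r '.title[0]'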

Options

| Option | Short | Description |
|---|---|---|
| --output FORMAT | -o | Output format: text, json, html, csv |
| --select NAME=SEL | -s | Named selector extraction |
| --attribute ATTR | -a | Extract attribute instead of text |
| --first | -1 | Return only first match |
| --pretty | -p | Pretty-print JSON output |
| --null | -0 | Use NUL delimiter (for xargs) |
| --color MODE | -c | Colorize: auto, always, never |
| --parallel N | -j | Parallel threads for batch processing |
| --quiet | -q | Suppress error messages |
| --with-filename | -H | Always show filename prefix |
| --no-filename | | Never show filename prefix |
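
A couple of these flags in combination, reusing the hypothetical files from the examples above:

# Always show the filename prefix, even for a single input file
scrape -H 'title' page.html

# Attribute extraction without filename prefixes, ready for piping
scrape --no-filename -a href 'a' pages/*.html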

Performance

v0.2.0 includes significant improvements:

  • SIMD-accelerated parsing — 2-10x faster class selector matching on large documents
  • Batch parallelization — Scales near-linearly with thread count when processing multiple files (see the timing sketch below)
  • Zero-copy serialization — 50-70% memory reduction in output generation
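
One way to gauge the batch parallelization gain on your own corpus is to compare a single-threaded run against the default of one thread per core; the pages/*.html glob below is hypothetical:

# Single-threaded baseline
time scrape -j 1 'h1' pages/*.html > /dev/null

# Default: all CPU cores
time scrape 'h1' pages/*.html > /dev/null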

Exit codes

| Code | Meaning |
|---|---|
| 0 | Success, matches found |
| 1 | No matches found |
| 2 | Runtime error (invalid selector, I/O error) |
| 4 | Argument validation error |
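
The distinct codes make shell scripting straightforward; a minimal sketch, assuming a hypothetical page.html:

# Distinguish "no matches" (1) from runtime or argument errors (2, 4)
scrape -q 'meta[name="description"]' page.html > /dev/null
status=$?
if [ "$status" -eq 0 ]; then
  echo "description present"
elif [ "$status" -eq 1 ]; then
  echo "no description found"
else
  echo "scrape failed (exit code $status)" >&2
fi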

Built on Servo

Powered by battle-tested libraries from the Servo browser engine: html5ever (HTML5 parser) and selectors (CSS selector engine).

Related packages

| Platform | Package |
|---|---|
| Rust | scrape-core |
| Python | fast-scrape |
| Node.js | @fast-scrape/node |
| WASM | @fast-scrape/wasm |

License

MIT OR Apache-2.0
