| Crates.io | html2json |
| lib.rs | html2json |
| version | 0.5.7 |
| created_at | 2026-01-01 05:56:41.365683+00 |
| updated_at | 2026-01-25 17:32:17.511429+00 |
| description | HTML to JSON extractor |
| homepage | |
| repository | https://github.com/qretaio/html2json |
| max_upload_size | |
| id | 2015808 |
| size | 188,749 |
A Rust port of cheerio-json-mapper.
npm install @qretaio/html2json
cargo install html2json --features cli
cargo install --path . --features cli
# or from a git repository
cargo install --git https://github.com/qretaio/html2json --features cli
just install
import { extract } from "@qretaio/html2json";
const html = `
<article class="post">
<h2>My Article</h2>
<p class="author">John Doe</p>
<div class="tags">
<span>rust</span>
<span>wasm</span>
</div>
</article>
`;
const spec = {
title: "h2",
author: ".author",
tags: [
{
$: ".tags span",
name: "$",
},
],
};
const result = await extract(html, spec);
console.log(result);
// {
// "title": "My Article",
// "author": "John Doe",
// "tags": [{"name": "rust"}, {"name": "wasm"}]
// }
# Extract from file
html2json examples/hn.html --spec examples/hn.json
# Extract from stdin (pipe from curl)
curl -s https://news.ycombinator.com/ | html2json --spec examples/hn.json
# Extract from stdin (pipe from cat)
cat examples/hn.html | html2json --spec examples/hn.json
# Check output matches expected JSON (useful for testing/CI)
html2json examples/hn.html --spec examples/hn.json --check expected.json
--spec, -s <FILE> - Path to JSON extractor spec file (required)--check, -c <FILE> - Compare output against expected JSON file. Exits with 0 if match, 1 if differ (with colored diff).The spec is a JSON object where each key defines an output field and each value defines a CSS selector to extract that field.
{
"title": "h1",
"description": "p.description"
}
{
"link": "a.main | attr:href",
"image": "img.hero | attr:src"
}
{
"title": "h1 | trim",
"slug": "h1 | lower | regex:\\s+-",
"price": ".price | regex:\\$(\\d+\\.\\d+) | parseAs:int"
}
Available pipes:
trim - Trim whitespacelower - Convert to lowercaseupper - Convert to uppercasesubstr:start:end - Extract substringregex:pattern - Regex capture (first group)parseAs:int - Parse as integerparseAs:float - Parse as floatattr:name - Get attribute valuevoid - Extract from void elements, useful for extracting xml{
"items": [
{
"$": ".item",
"title": "h2",
"description": "p"
}
]
}
$ selector){
"$": "article",
"title": "h1",
"paragraphs": ["p"]
}
||){
"title": "h1.main || h1.fallback || h1"
}
?){
"title": "h1",
"description?": "p.description"
}
Optional fields that return null are removed from the output.
MIT