| Crates.io | scrapelect |
| lib.rs | scrapelect |
| version | 0.3.2 |
| created_at | 2024-08-04 00:35:17.231798+00 |
| updated_at | 2024-08-07 22:52:56.438707+00 |
| description | Interpreter for scrapelect, a CSS-inspired web scraping DSL |
| homepage | |
| repository | https://github.com/suaviloquence/scrapelect |
| max_upload_size | |
| id | 1324607 |
| size | 301,749 |
scrapelect is a web scraping language inspired by CSS that turns
a web page into structured JSON data. Select elements with CSS
selectors, apply filters to extract and modify the data you want from
a web page, and get the output in a structured, machine-readable,
interoperable format.
Install the Rust toolchain. Using cargo,
run:
$ cargo install scrapelect
to install the scrapelect interpreter.
Write a scrapelect program into a .scrp file. Documentation
for the language can be found in the scrapelect book.
A quick example, title.scrp, retrieves the title of a Wikipedia article:
title: .mw-page-title-main {
  content: $element | text();
};
Run the program with the URL of the web page to scrape:
$ scrapelect title.scrp "https://en.wikipedia.org/wiki/Cat"
It will output:
{
  "title": {
    "content": "Cat"
  }
}
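Because the result is plain JSON, any JSON parser can consume it. As a minimal sketch, the Python snippet below parses the sample output shown above and pulls out the nested content field. The helper names extract_title and run_scrapelect are hypothetical. run_scrapelect also assumes that the scrapelect binary is on PATH and writes its JSON result to standard output, which the example above suggests but does not state outright.

```python
import json
import subprocess


def extract_title(scrapelect_output: str) -> str:
    """Return title.content from the JSON text produced by title.scrp."""
    data = json.loads(scrapelect_output)
    return data["title"]["content"]


def run_scrapelect(program: str, url: str) -> str:
    """Run `scrapelect <program> <url>` and return its captured stdout.

    Assumption: the interpreter is installed and prints its JSON to stdout.
    """
    result = subprocess.run(
        ["scrapelect", program, url],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Sample output copied from the example above; no network access is needed.
SAMPLE_OUTPUT = '{"title": {"content": "Cat"}}'
print(extract_title(SAMPLE_OUTPUT))  # prints: Cat
```

With the interpreter installed, extract_title(run_scrapelect("title.scrp", "https://en.wikipedia.org/wiki/Cat")) would be expected to return the same string, under the assumptions stated above.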
The scrapelect book contains documentation on language concepts and how to
write a scrapelect program.
See the scrapelect book for more information on contributing to scrapelect.
scrapelect is available under the MIT or Apache 2 licenses, at your
option. Copies of these licenses are included as LICENSE-MIT and
LICENSE-APACHE in the root directory.
scrapelect: scrape + select, also -lect