Crates.io | scrapelect |
lib.rs | scrapelect |
version | 0.3.2 |
source | src |
created_at | 2024-08-04 00:35:17.231798 |
updated_at | 2024-08-07 22:52:56.438707 |
description | Interpreter for scrapelect, a CSS-inspired web scraping DSL |
homepage | |
repository | https://github.com/suaviloquence/scrapelect |
max_upload_size | |
id | 1324607 |
size | 301,749 |
scrapelect is a web scraping language inspired by CSS that turns
a web page into structured JSON data. Select elements with CSS
selectors, apply filters to extract and modify the data you want from
a web page, and get the output in a structured, machine-readable,
interoperable format.
Install the Rust toolchain. Using cargo, run:
$ cargo install scrapelect
to install the scrapelect interpreter.
Write a scrapelect program into a .scrp file. Documentation
for the language can be found in the scrapelect book.
A quick example, title.scrp, retrieves the title of a Wikipedia article:
title: .mw-page-title-main {
  content: $element | text();
};
Run the .scrp file with the URL of the web page to scrape:
$ scrapelect title.scrp "https://en.wikipedia.org/wiki/Cat"
It will output:
{
  "title": {
    "content": "Cat"
  }
}
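Because the output is plain JSON, it can be passed straight to other tools. The following is a minimal Python sketch of one way to consume it, not part of scrapelect itself. The helper names run_scrapelect and extract_title are hypothetical, and the sketch assumes that the scrapelect binary is on PATH and writes its JSON result to standard output.

```python
import json
import subprocess


def run_scrapelect(program_path: str, url: str) -> dict:
    """Run the scrapelect CLI on a .scrp file and a URL, then parse the result.

    Assumption: `scrapelect` is on PATH and prints the JSON result to stdout.
    """
    completed = subprocess.run(
        ["scrapelect", program_path, url],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout)


def extract_title(result: dict) -> str:
    """Return the `content` field nested under `title` in the parsed result."""
    return result["title"]["content"]


# The sample output of title.scrp, so the parsing step can be tried
# without installing scrapelect or touching the network.
SAMPLE_OUTPUT = """
{
  "title": {
    "content": "Cat"
  }
}
"""

if __name__ == "__main__":
    print(extract_title(json.loads(SAMPLE_OUTPUT)))
```

With scrapelect installed, extract_title(run_scrapelect("title.scrp", "https://en.wikipedia.org/wiki/Cat")) would apply the same lookup to a live run.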
The scrapelect book contains documentation on language concepts and
how to write a scrapelect program.
See the scrapelect book for more information on contributing to scrapelect.
scrapelect is available under the MIT or Apache 2 licenses, at your
option. Copies of these licenses are included in LICENSE-MIT and
LICENSE-APACHE at the root directory.
scrapelect: scrape + select, also -lect