Crates.io | spider_transformations |
lib.rs | spider_transformations |
version | 2.21.20 |
source | src |
created_at | 2024-09-21 11:37:35.50831 |
updated_at | 2024-12-12 19:25:13.885604 |
description | Transformation utils to use for Spider Web Crawler. |
homepage | |
repository | https://github.com/spider-rs/spider-transformations |
max_upload_size | |
id | 1382144 |
size | 207,067 |
A Rust transformation library for Spider Cloud, built for performance, AI, and multiple locales. Spider Cloud uses it for data cleaning.
Add the crate to Cargo.toml:

    [dependencies]
    spider_transformations = "0"
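Transform a crawled page:

    use spider_transformations::transformation::content;

    fn main() {
        // `page` is not defined in this snippet: it is the page received
        // from the spider object when streaming a crawl.
        let conf = content::TransformConfig::default();
        // Apply the default transformation settings to the page.
        let transformed = content::transform_content(&page, &conf, &None, &None);
    }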
The transformation module also provides several chunking utilities.
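To illustrate what chunking means in this context, here is a minimal standard-library sketch that splits text into fixed-size word chunks. It does not use the crate's API: `chunk_by_words` and its `max_words` parameter are hypothetical names chosen for this example, and the crate's own chunking utilities may work differently.

```rust
/// Hypothetical helper, not part of spider_transformations: splits `text`
/// into chunks of at most `max_words` whitespace-separated words.
pub fn chunk_by_words(text: &str, max_words: usize) -> Vec<String> {
    // A chunk size of zero would make `chunks` panic, so return early.
    if max_words == 0 {
        return Vec::new();
    }
    // Collapse all runs of whitespace and collect the individual words.
    let words: Vec<&str> = text.split_whitespace().collect();
    // Group the words and rejoin each group with single spaces.
    words
        .chunks(max_words)
        .map(|chunk| chunk.join(" "))
        .collect()
}
```

Calling `chunk_by_words("a b c d e", 2)` yields three chunks: `"a b"`, `"c d"`, and `"e"`. Splitting on word boundaries keeps each chunk readable, which can matter when chunks are passed to a downstream model with a size limit.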
This project includes rewritten forks of html2md and html2text, modified for performance and bug fixes.
MIT