spider_transformations

version: 2.37.109
created_at: 2024-09-21 11:37:35.50831+00
updated_at: 2025-07-08 18:54:26.262879+00
description: Transformation utils to use for spider
homepage: https://github.com/spider-rs/spider_transformations
repository: https://github.com/spider-rs/spider_transformations
id: 1382144
size: 297,066
Jeff Mendez (j-mendez)


A high-performance transformation library for Rust, used by Spider Cloud for AI-powered content cleaning across multiple locales.

This project depends on the spider crate.

Usage

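```toml
[dependencies]
spider_transformations = "2"
```

```rust
// Also requires the `spider` crate, which provides the `Page` type.
use spider::page::Page;
use spider_transformations::transformation::content;

/// Convert a crawled page to Markdown.
/// `page` comes from the spider object when streaming.
fn page_to_markdown(page: &Page) -> String {
    let mut conf = content::TransformConfig::default();
    conf.return_format = content::ReturnFormat::Markdown;
    // The two trailing optional settings are left as `None` (defaults).
    content::transform_content(page, &conf, &None, &None)
}
```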

Transform types

  1. Markdown
  2. Commonmark
  3. Text
  4. Markdown (Text Map) or HTML2Text
  5. WIP: HTML2XML
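To give a rough idea of what the plain Text output involves, here is a deliberately naive sketch that drops tags and collapses whitespace. The function `naive_html_to_text` is hypothetical and is not part of this crate. The crate's own transformers, which are forks of html2md and html2text, also handle entities, block structure, scripts, and malformed markup. This sketch does none of that.

```rust
/// Hypothetical illustration only (not this crate's API): strip tags and
/// collapse whitespace. It does not decode entities or skip <script>/<style>.
fn naive_html_to_text(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => {
                in_tag = true;
                out.push(' '); // every tag acts as a word separator
            }
            '>' if in_tag => in_tag = false,
            _ if !in_tag => out.push(c),
            _ => {} // characters inside a tag are dropped
        }
    }
    // Collapse runs of whitespace into single spaces.
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Because every tag becomes a separator, inline markup such as `he<b>ll</b>o` is split into separate words. Avoiding this kind of artifact is one reason real HTML-to-text converters track element types.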

Enhancements

  1. Readability
  2. Encoding
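Encoding matters because a page's bytes must be decoded with the right charset before any transformation runs. The minimal sketch below supports only two charsets: UTF-8, and pages that declare ISO-8859-1 (Latin-1). In Latin-1, each byte maps to the Unicode code point with the same value. The helper `decode_body` is hypothetical and does not show how this crate handles encodings.

```rust
/// Hypothetical helper (not this crate's API): decode a response body
/// according to a declared charset label. Only Latin-1 and UTF-8 are handled.
fn decode_body(bytes: &[u8], charset: &str) -> String {
    match charset.to_ascii_lowercase().as_str() {
        // Latin-1: every byte value is the same Unicode code point.
        "iso-8859-1" | "latin1" => bytes.iter().map(|&b| b as char).collect(),
        // Anything else: treat as UTF-8, replacing invalid sequences with U+FFFD.
        _ => String::from_utf8_lossy(bytes).into_owned(),
    }
}
```

Decoding the Latin-1 bytes for "café" as UTF-8 produces a replacement character in place of "é". Also, browsers following the WHATWG Encoding Standard treat the `iso-8859-1` label as windows-1252, which differs in the 0x80 to 0x9F range.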

Chunking

The `transformation` module provides several chunking utilities.
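As an illustration of the kind of work a chunking utility does (for example, splitting transformed text into pieces sized for a language model), here is a minimal sketch that packs whole words into chunks of at most `max_chars` characters. The name `chunk_by_chars` and its behavior are assumptions made for this example. See the `transformation` module for the crate's actual chunking functions and their signatures.

```rust
/// Hypothetical chunker (not this crate's API): packs whole words into chunks
/// of at most `max_chars` characters (counted as Unicode scalar values).
/// A single word longer than `max_chars` becomes its own oversized chunk.
fn chunk_by_chars(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0; // length of `current` in chars
    for word in text.split_whitespace() {
        let word_len = word.chars().count();
        // Account for the joining space when `current` already has words.
        let needed = if current.is_empty() {
            word_len
        } else {
            current_len + 1 + word_len
        };
        if needed > max_chars && !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }
        if !current.is_empty() {
            current.push(' ');
            current_len += 1;
        }
        current.push_str(word);
        current_len += word_len;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Splitting on word boundaries keeps words intact at the cost of slightly uneven chunk sizes. Token-based limits would need a tokenizer, which this sketch avoids.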

This project includes rewritten forks of html2md and html2text, with performance improvements and bug fixes.

License

MIT
