A high-performance transformation library for Rust, used by Spider Cloud for AI-powered content cleaning across multiple locales.
This project depends on the spider crate.
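```toml
[dependencies]
spider_transformations = "2"
```

Then transform a crawled page:

```rust
use spider::page::Page;
use spider_transformations::transformation::content;

/// Transform a crawled page into Markdown.
/// `page` comes from the spider object when streaming.
fn page_to_markdown(page: &Page) -> String {
    let mut conf = content::TransformConfig::default();
    conf.return_format = content::ReturnFormat::Markdown;
    // The remaining optional arguments are left as `None`.
    content::transform_content(page, &conf, &None, &None, &None)
}
```

Supported return formats:

- Markdown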
- CommonMark
- Text
- Markdown (Text Map) or HTML2Text
- HTML2XML
Convert Office documents directly to markdown with zero panics and zero locks:
- Excel (.xlsx)
- Word (.docx)
- PowerPoint (.pptx)
Enable with:
```toml
[dependencies]
spider_transformations = { version = "2", features = ["document"] }
```

Document conversion is automatic: binary files matching Office formats are detected and converted to Markdown tables and text. No configuration is needed beyond enabling the feature.
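As a hypothetical illustration of what detection and table output could involve (this is not the crate's code; `OfficeKind`, `detect_office_kind`, and `rows_to_markdown_table` are invented names), the sketch below assumes detection by file signature. Office Open XML files are ZIP archives whose entry names are stored uncompressed, so the top-level folder (`word/`, `xl/`, `ppt/`) hints at the format, and spreadsheet rows map naturally onto a Markdown table.

```rust
// Hypothetical sketch only: these names are invented for illustration and
// are not part of spider_transformations.

#[derive(Debug, PartialEq)]
enum OfficeKind {
    Word,
    Excel,
    PowerPoint,
}

/// True if `needle` (non-empty) occurs anywhere in `haystack`.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.windows(needle.len()).any(|w| w == needle)
}

/// Rough heuristic: require the ZIP local-header signature "PK\x03\x04",
/// then look for the folder name each OOXML format uses for its entries.
fn detect_office_kind(bytes: &[u8]) -> Option<OfficeKind> {
    if !bytes.starts_with(b"PK\x03\x04") {
        return None;
    }
    if contains(bytes, b"word/") {
        Some(OfficeKind::Word)
    } else if contains(bytes, b"xl/") {
        Some(OfficeKind::Excel)
    } else if contains(bytes, b"ppt/") {
        Some(OfficeKind::PowerPoint)
    } else {
        None
    }
}

/// Render rows as a Markdown table; the first row is the header.
/// Short rows are padded with empty cells, long rows are truncated.
fn rows_to_markdown_table(rows: &[Vec<String>]) -> String {
    let Some(header) = rows.first() else {
        return String::new();
    };
    let width = header.len();
    let render = |row: &Vec<String>| {
        let cells: Vec<String> = (0..width)
            // Escape pipes so cell text cannot split a column.
            .map(|i| row.get(i).map_or(String::new(), |c| c.replace('|', "\\|")))
            .collect();
        format!("| {} |\n", cells.join(" | "))
    };
    let mut out = render(header);
    out.push_str(&format!("|{}\n", " --- |".repeat(width)));
    for row in &rows[1..] {
        out.push_str(&render(row));
    }
    out
}

fn main() {
    // A fake byte prefix standing in for a real .docx file.
    let docx_like = b"PK\x03\x04\x14\x00word/document.xml";
    println!("{:?}", detect_office_kind(docx_like)); // prints Some(Word)

    let rows = vec![
        vec!["Name".to_string(), "Qty".to_string()],
        vec!["a|b".to_string(), "2".to_string()],
    ];
    print!("{}", rows_to_markdown_table(&rows));
}
```

Scanning raw bytes is cheap but approximate: compressed data can contain these byte sequences by chance, so a robust implementation would read the ZIP central directory instead.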
Other features:

- Readability
- Encoding
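As a hypothetical illustration of the encoding item, assuming you need to find a page's declared charset yourself, a simplified version of the HTML prescan (which inspects only the first 1024 bytes) could look like this. `sniff_charset` is an invented name, not a spider_transformations API, and it only handles the common `charset=` forms.

```rust
// Hypothetical helper, not a spider_transformations API: a simplified take
// on the HTML encoding prescan, which only inspects the first 1024 bytes.
// It finds the first "charset=" and returns the label that follows.

/// Return the lowercase charset label declared in `html`, if any.
fn sniff_charset(html: &[u8]) -> Option<String> {
    let head = &html[..html.len().min(1024)];
    // Lossy decoding is enough here because charset labels are ASCII.
    let text = String::from_utf8_lossy(head).to_ascii_lowercase();
    let start = text.find("charset=")? + "charset=".len();
    let label: String = text[start..]
        .trim_start_matches(|c: char| c == '"' || c == '\'')
        .chars()
        .take_while(|&c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | ':' | '.'))
        .collect();
    if label.is_empty() {
        None
    } else {
        Some(label)
    }
}

fn main() {
    let page = b"<html><head><meta charset=\"windows-1252\"></head></html>";
    println!("{:?}", sniff_charset(page)); // prints Some("windows-1252")
}
```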
The `transformation` module also provides several chunking utilities.
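As a hypothetical illustration of what a chunking utility does (this is not one of the crate's own functions; `chunk_by_chars` is an invented name), the sketch below splits text into chunks of at most `max_chars` characters, breaking on whitespace so words stay intact, for example to keep pieces of a transformed page under a size limit.

```rust
// Hypothetical sketch, not one of the crate's chunking utilities.
// Words are joined with single spaces, so original line breaks are not kept.
// A single word longer than `max_chars` becomes its own oversized chunk.

fn chunk_by_chars(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0; // length in chars, not bytes

    for word in text.split_whitespace() {
        let word_len = word.chars().count();
        // Size of `current` if this word were appended (+1 for the space).
        let needed = if current.is_empty() {
            word_len
        } else {
            current_len + 1 + word_len
        };
        // Flush the current chunk when the word would overflow it.
        if needed > max_chars && !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }
        if !current.is_empty() {
            current.push(' ');
            current_len += 1;
        }
        current.push_str(word);
        current_len += word_len;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    for chunk in chunk_by_chars("the quick brown fox jumps", 10) {
        println!("{chunk}");
    }
}
```

Counting characters rather than bytes keeps chunk sizes predictable for non-ASCII text; a token-based limit would need a tokenizer instead.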
This project includes rewrites and forks of html2md and html2text, made for performance and bug fixes.
License: MIT