| Crates.io | vectradb-chunkers |
| lib.rs | vectradb-chunkers |
| version | 0.1.0 |
| created_at | 2025-10-26 12:51:29.482983+00 |
| updated_at | 2025-10-26 12:51:29.482983+00 |
| description | Chunking utilities for VectraDB in Rust |
| homepage | https://github.com/Amrithesh-Kakkoth/VectraDB |
| repository | https://github.com/Amrithesh-Kakkoth/VectraDB |
| max_upload_size | |
| id | 1901372 |
| size | 95,885 |
A Rust library for text chunking, with separate strategies for general documents, source code, and markdown, plus a production chunker that adds quality scoring.
Optimized for general text documents with paragraph and sentence-based chunking.
use vectradb_chunkers::{create_chunker, ChunkingConfig};
let chunker = create_chunker("document");
let config = ChunkingConfig {
    max_chunk_size: 1000,
    overlap_size: 100,
    preserve_semantics: true,
    include_metadata: true,
    custom_delimiters: None,
};
let chunks = chunker.chunk(text, &config)?;
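To make "paragraph-based chunking" concrete, here is a minimal sketch that uses only the standard library. It is an illustration of the general idea, not the crate's implementation: the function name `chunk_paragraphs` is hypothetical, paragraphs are assumed to be separated by blank lines, and sizes are counted in characters.

```rust
/// Greedily pack paragraphs into chunks of at most `max_chunk_size` characters.
/// Hypothetical helper for illustration; not part of vectradb-chunkers.
fn chunk_paragraphs(text: &str, max_chunk_size: usize) -> Vec<String> {
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();

    // Assumption: a blank line ("\n\n") marks a paragraph boundary.
    for para in text.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        let para_len = para.chars().count();
        // +2 accounts for the blank line re-inserted between paragraphs.
        let needed = if current.is_empty() {
            para_len
        } else {
            current.chars().count() + 2 + para_len
        };

        // Close the current chunk if this paragraph would overflow it.
        if needed > max_chunk_size && !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        // A single paragraph longer than the limit is kept whole here;
        // a fuller version would fall back to sentence splitting.
        current.push_str(para);
    }

    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let text = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph.";
    for (i, chunk) in chunk_paragraphs(text, 40).iter().enumerate() {
        println!("--- chunk {i} ---\n{chunk}");
    }
}
```

Packing whole paragraphs keeps each chunk readable on its own, at the cost of uneven chunk sizes.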
Features:
Specialized for source code with structure-aware chunking.
let chunker = create_chunker("code");
let chunks = chunker.chunk(code_text, &config)?;
Features:
Optimized for markdown documents with heading hierarchy preservation.
let chunker = create_chunker("markdown");
let chunks = chunker.chunk(markdown_text, &config)?;
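The idea of preserving the heading hierarchy can be sketched with the standard library alone. This is an assumption-laden illustration rather than the crate's code: `Section`, `parse_heading`, and `split_by_headings` are hypothetical names, only ATX headings (`#` through `######`) are recognized, and `#` lines inside fenced code blocks are not handled.

```rust
/// A body of text together with the headings that lead to it.
/// Hypothetical type for illustration; not part of vectradb-chunkers.
#[derive(Debug, PartialEq)]
struct Section {
    /// Headings from the document root down to this section.
    heading_path: Vec<String>,
    body: String,
}

/// Return (level, title) if `line` is an ATX heading such as "## Title".
fn parse_heading(line: &str) -> Option<(usize, &str)> {
    let level = line.chars().take_while(|&c| c == '#').count();
    if level == 0 || level > 6 {
        return None;
    }
    // '#' is one byte, so `level` is also a valid byte offset.
    let rest = &line[level..];
    // ATX headings need a space after the hashes.
    rest.strip_prefix(' ').map(|title| (level, title.trim()))
}

/// Split markdown into sections, recording the heading path of each one.
fn split_by_headings(markdown: &str) -> Vec<Section> {
    let mut sections = Vec::new();
    let mut path: Vec<String> = Vec::new();
    let mut body = String::new();

    for line in markdown.lines() {
        if let Some((level, title)) = parse_heading(line) {
            // Flush the text collected under the previous heading.
            if !body.trim().is_empty() {
                sections.push(Section {
                    heading_path: path.clone(),
                    body: body.trim().to_string(),
                });
            }
            body.clear();
            // Drop headings at the same or a deeper level, then descend.
            path.truncate(level - 1);
            path.push(title.to_string());
        } else {
            body.push_str(line);
            body.push('\n');
        }
    }

    if !body.trim().is_empty() {
        sections.push(Section {
            heading_path: path,
            body: body.trim().to_string(),
        });
    }
    sections
}

fn main() {
    let doc = "# Guide\nIntro text.\n## Install\nRun cargo.\n## Usage\nCall it.\n";
    for section in split_by_headings(doc) {
        println!("{} => {}", section.heading_path.join(" > "), section.body);
    }
}
```

Carrying the heading path with each chunk lets a retrieval step show where a passage came from even after the document has been split.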
Features:
Advanced chunking for production environments with quality optimization.
use vectradb_chunkers::{ProductionChunker, ProductionConfig, ChunkingStrategy};
let chunker = ProductionChunker::new();
let config = ProductionConfig {
    strategy: ChunkingStrategy::Adaptive,
    min_chunk_size: 200,
    max_chunk_size: 2000,
    quality_threshold: 0.7,
    enable_quality_scoring: true,
    enable_dynamic_sizing: true,
    preserve_context: true,
    context_window_size: 100,
    ..Default::default()
};
let chunks = chunker.chunk_with_production_config(text, &config)?;
Features:
let config = ChunkingConfig {
    max_chunk_size: 1000,     // Maximum characters per chunk
    overlap_size: 100,        // Overlap between chunks
    preserve_semantics: true, // Respect content boundaries
    include_metadata: true,   // Include rich metadata
    custom_delimiters: None,  // Custom boundary delimiters
};
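How `max_chunk_size` and `overlap_size` interact is easiest to see in a plain sliding window. The sketch below is a standalone illustration, not the crate's algorithm: `chunk_with_overlap` is a hypothetical function, both sizes are assumed to be measured in characters, and content boundaries (what `preserve_semantics` refers to) are ignored.

```rust
/// Split `text` into windows of at most `max_chunk_size` characters, where
/// each window repeats the last `overlap_size` characters of the previous one.
/// Hypothetical helper for illustration; not part of vectradb-chunkers.
fn chunk_with_overlap(text: &str, max_chunk_size: usize, overlap_size: usize) -> Vec<String> {
    assert!(max_chunk_size > 0, "max_chunk_size must be positive");
    assert!(
        overlap_size < max_chunk_size,
        "overlap_size must be smaller than max_chunk_size"
    );

    // Work on chars so multi-byte characters are never split.
    let chars: Vec<char> = text.chars().collect();
    // Each new window starts this many characters after the previous one.
    let step = max_chunk_size - overlap_size;

    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + max_chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // The last window reached the end of the text.
        }
        start += step;
    }
    chunks
}

fn main() {
    for (i, chunk) in chunk_with_overlap("abcdefghij", 4, 1).iter().enumerate() {
        println!("chunk {i}: {chunk}");
    }
}
```

With a window of 4 and an overlap of 1, the ten-character input yields `abcd`, `defg`, and `ghij`: each chunk begins with the final character of the one before it.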
let config = ProductionConfig {
    base_config: ChunkingConfig::default(),
    strategy: ChunkingStrategy::Adaptive,
    min_chunk_size: 200,
    max_chunk_size: 2000,
    quality_threshold: 0.7,
    enable_quality_scoring: true,
    enable_dynamic_sizing: true,
    preserve_context: true,
    context_window_size: 100,
};
Each chunk contains:
Production chunking includes comprehensive quality scoring:
use vectradb_chunkers::{create_chunker, ChunkingConfig};
let chunker = create_chunker("document");
let config = ChunkingConfig::default();
let chunks = chunker.chunk("Your text here...", &config)?;
for chunk in chunks {
    println!("Chunk: {}", chunk.content);
    println!("Metadata: {:?}", chunk.metadata);
}
use vectradb_chunkers::{ProductionChunker, ProductionConfig, ChunkingStrategy};
let chunker = ProductionChunker::new();
let config = ProductionConfig {
    strategy: ChunkingStrategy::Adaptive,
    quality_threshold: 0.8,
    enable_quality_scoring: true,
    ..Default::default()
};
let chunks = chunker.chunk_with_production_config(text, &config)?;
// Filter chunks by quality
let high_quality_chunks: Vec<_> = chunks.iter()
    .filter(|chunk| {
        chunk.metadata.get("overall_quality")
            .and_then(|q| q.parse::<f64>().ok())
            .map(|score| score >= 0.8)
            .unwrap_or(false)
    })
    .collect();
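The filter above can be tried without the crate by substituting a stand-in chunk type. In the sketch below, `Chunk` is a hypothetical struct that assumes metadata is a string-to-string map (which the `parse::<f64>()` call above suggests but does not confirm); `quality_of` and `filter_by_quality` are likewise illustrative names. It shows that chunks with a missing or unparsable score are dropped rather than causing an error.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's chunk type.
/// Assumption: metadata values are stored as strings.
struct Chunk {
    content: String,
    metadata: HashMap<String, String>,
}

/// Read the "overall_quality" score, if present and parsable.
fn quality_of(chunk: &Chunk) -> Option<f64> {
    chunk
        .metadata
        .get("overall_quality")
        .and_then(|q| q.parse::<f64>().ok())
}

/// Keep only chunks whose score is at least `threshold`.
fn filter_by_quality(chunks: &[Chunk], threshold: f64) -> Vec<&Chunk> {
    chunks
        .iter()
        .filter(|chunk| quality_of(chunk).map_or(false, |score| score >= threshold))
        .collect()
}

/// Build a chunk with an optional quality score (demo helper).
fn make_chunk(content: &str, quality: Option<&str>) -> Chunk {
    let mut metadata = HashMap::new();
    if let Some(q) = quality {
        metadata.insert("overall_quality".to_string(), q.to_string());
    }
    Chunk {
        content: content.to_string(),
        metadata,
    }
}

fn main() {
    let chunks = vec![
        make_chunk("well-formed paragraph", Some("0.92")),
        make_chunk("fragment", Some("0.41")),
        make_chunk("no score recorded", None),
        make_chunk("corrupt score", Some("n/a")),
    ];

    for chunk in filter_by_quality(&chunks, 0.8) {
        println!("kept: {}", chunk.content);
    }
}
```

Passing the threshold as a parameter keeps the filter and the configured `quality_threshold` from drifting apart, which is easy to do when the value is written out twice.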
regex: Pattern matching for content analysis
pulldown-cmark: Markdown parsing
tree-sitter: Code parsing (optional, for advanced code analysis)
serde: Serialization support
anyhow: Error handling

Run the examples to see all chunking methods in action:
cargo run --example chunking_examples
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.