| Crates.io | tessera-embeddings |
| lib.rs | tessera-embeddings |
| version | 0.1.0 |
| created_at | 2025-10-23 03:07:29.739631+00 |
| updated_at | 2025-10-23 03:07:29.739631+00 |
| description | Multi-paradigm embedding library: ColBERT, dense, sparse, vision-language, and time series models |
| homepage | https://github.com/tomWhiting/tessera |
| repository | https://github.com/tomWhiting/tessera |
| max_upload_size | |
| id | 1896535 |
| size | 4,222,758 |
tessera (noun, plural: tesserae) — A small block of stone, tile, glass, or other material used in the creation of a mosaic. From Latin tessera, meaning "a square tablet or die."
A multi-paradigm embedding library that combines five distinct approaches to semantic representation into a unified, production-ready framework.
Tessera provides state-of-the-art text and document embeddings through five complementary paradigms: dense single-vector embeddings for semantic similarity, multi-vector token embeddings for precise phrase matching, sparse learned representations for interpretable keyword search, vision-language embeddings for OCR-free document understanding, and probabilistic time series forecasting. The library supports 23+ production models with native GPU acceleration on Metal (Apple Silicon) and CUDA (NVIDIA GPUs), comprehensive batch processing, binary quantization for 32x compression, and seamless Rust + Python support via PyO3.
cargo add tessera
pip install tessera-embeddings
Or with UV:
uv add tessera-embeddings
use tessera::TesseraDense;
// Create embedder and encode text
let embedder = TesseraDense::new("bge-base-en-v1.5")?;
let embedding = embedder.encode("Machine learning is a subset of artificial intelligence")?;
// Compute semantic similarity
let score = embedder.similarity(
"What is machine learning?",
"Machine learning is a subset of artificial intelligence"
)?;
println!("Similarity: {:.4}", score);
from tessera import TesseraDense
# Create embedder and encode text
embedder = TesseraDense("bge-base-en-v1.5")
embedding = embedder.encode("Machine learning is a subset of artificial intelligence")
# Compute semantic similarity
score = embedder.similarity(
"What is machine learning?",
"Machine learning is a subset of artificial intelligence"
)
print(f"Similarity: {score:.4f}")
Tessera implements five fundamentally different approaches to semantic representation, each optimized for specific use cases.
Dense embeddings compress text into a single fixed-size vector through pooling operations over transformer hidden states. This approach excels at capturing broad semantic meaning and enables efficient similarity search through cosine distance or dot product. Tessera includes models from BGE, Nomic, GTE, Qwen, and Jina with dimensions ranging from 384 to 4096.
Use cases: Semantic search, clustering, topic modeling, recommendation systems.
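For reference, the cosine similarity used for dense search is just a normalized dot product over the two pooled vectors. The sketch below is a minimal, standalone illustration of that computation, not Tessera's internal code; in practice you call `embedder.similarity(...)` as in the quick-start example.
/// Cosine similarity between two pooled dense embeddings of equal length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}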
Multi-vector embeddings preserve token-level granularity by representing each token as an independent vector. Similarity is computed through late interaction using MaxSim: for each query token, take its maximum similarity over all document tokens, then sum those maxima across the query. This approach enables precise phrase matching and is particularly effective for information retrieval tasks.
Use cases: Precise search, question answering, passage retrieval, academic search.
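To make the scoring rule concrete, here is a minimal MaxSim sketch over plain slices of token vectors, assuming the vectors are L2-normalized so a dot product acts as cosine similarity. It illustrates the late-interaction math only; Tessera's multi-vector embedders expose this through their own similarity methods.
/// MaxSim late interaction: for each query token, take the maximum dot product
/// against all document tokens, then sum those maxima over the query.
fn maxsim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| q.iter().zip(d).map(|(x, y)| x * y).sum::<f32>())
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}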
Sparse embeddings map text to the vocabulary space, producing interpretable keyword-like representations with 99% sparsity. Each dimension corresponds to a token in the vocabulary, enabling efficient inverted index search while maintaining learned semantic expansion through contextualized term weights.
Use cases: Interpretable search, hybrid retrieval, keyword expansion, legal/medical search.
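Because sparse embeddings live in vocabulary space, scoring reduces to a dot product over the few non-zero term weights, which is what makes inverted-index search cheap. The sketch below uses `HashMap<u32, f32>` term-weight maps purely for illustration; it is not the library's internal representation.
use std::collections::HashMap;

/// Dot product between two sparse term-weight maps keyed by vocabulary id.
/// Only terms present in both the query and the document contribute to the score.
fn sparse_dot(query: &HashMap<u32, f32>, doc: &HashMap<u32, f32>) -> f32 {
    query
        .iter()
        .filter_map(|(term, wq)| doc.get(term).map(|wd| wq * wd))
        .sum()
}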
Vision-language embeddings enable OCR-free document understanding by encoding images and PDFs directly at the patch level. The model processes visual content through a vision transformer and projects patches into the same embedding space as text queries, enabling late interaction search over visual documents containing tables, figures, and handwriting.
Use cases: Document search, invoice processing, diagram retrieval, visual question answering.
Chronos Bolt provides zero-shot probabilistic forecasting through continuous-time embeddings of time series data. The model generates forecasts with nine quantile levels, enabling uncertainty quantification and risk-aware decision making without requiring task-specific fine-tuning.
Use cases: Demand forecasting, anomaly detection, capacity planning, financial prediction.
| Feature | Dense | Multi-Vector | Sparse | Vision | Time Series |
|---|---|---|---|---|---|
| Representation | Single vector | Token vectors | Vocabulary weights | Patch vectors | Temporal quantiles |
| Similarity Metric | Cosine/Dot | MaxSim | Dot product | MaxSim | N/A |
| Interpretability | Low | Medium | High | Medium | High |
| Speed | Fastest | Fast | Medium | Slow | Medium |
| Memory | Smallest | Small | Large | Large | Small |
| Precision | Good | Excellent | Good | Excellent | N/A |
| Quantization | No | Yes (32x) | No | No | No |
| Best For | Broad semantics | Exact phrases | Keywords | Visual docs | Forecasting |
Tessera provides 23 production-ready models across five paradigms:
Multi-Vector (9 models)
Dense (8 models)
Sparse (4 models)
Vision-Language (2 models)
Time Series (1 model, more coming)
Note: Additional models are being added regularly. Check models.json for the current list.
Tessera achieves competitive performance with state-of-the-art embedding libraries while providing unique capabilities through its multi-paradigm approach.
| Operation | Time | Throughput |
|---|---|---|
| Dense encoding (batch=1) | 8ms | 125 docs/sec |
| Dense encoding (batch=32) | 45ms | 711 docs/sec |
| ColBERT encoding (batch=1) | 12ms | 83 docs/sec |
| ColBERT encoding (batch=32) | 78ms | 410 docs/sec |
| Sparse encoding (batch=1) | 15ms | 67 docs/sec |
| Quantization (binary) | 0.3ms | 3,333 ops/sec |
Tessera models achieve strong performance on standard benchmarks.
Binary quantization for multi-vector embeddings provides 32x compression with minimal quality degradation.
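The 32x figure comes from keeping only the sign of each value: 32 `f32` components (128 bytes) collapse into a single 32-bit word (4 bytes). The rough sketch below shows sign binarization and Hamming-distance scoring for illustration; in Tessera you enable this via `QuantizationConfig::Binary` on the builder (see Advanced Configuration) rather than rolling your own.
/// Pack the signs of a float vector into 32-bit words: each f32 becomes one bit.
fn binarize(v: &[f32]) -> Vec<u32> {
    v.chunks(32)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u32, |acc, (i, &x)| {
                if x > 0.0 { acc | (1 << i) } else { acc }
            })
        })
        .collect()
}

/// Hamming distance between two packed codes; fewer differing bits means more similar.
fn hamming(a: &[u32], b: &[u32]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}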
Tessera automatically selects the best available compute device, preferring GPU acceleration (Metal on Apple Silicon, CUDA on NVIDIA GPUs) and falling back to CPU when no accelerator is available.
Models are loaded once and cached for efficient repeated encoding. Enable GPU support with Cargo features:
cargo add tessera --features metal # Apple Silicon
cargo add tessera --features cuda # NVIDIA GPUs
cargo add tessera # CPU only (default)
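For most workloads the automatic selection described above is all you need. If you want to replicate or override the fallback explicitly, the sketch below shows the idea, assuming the `Device` re-export used in the Advanced Configuration example behaves like Candle's `Device` with `new_metal` / `new_cuda` constructors; treat it as an illustration rather than a guaranteed API.
use tessera::backends::candle::Device;

/// Illustrative fallback chain: prefer Metal, then CUDA, then CPU.
/// Each constructor is assumed to return an error when that backend was not
/// compiled in or no matching hardware is present.
fn pick_device() -> Device {
    Device::new_metal(0)
        .or_else(|_| Device::new_cuda(0))
        .unwrap_or(Device::Cpu)
}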
All embedders support batch operations that provide 5-10x throughput improvements over sequential encoding:
let embedder = TesseraDense::new("bge-base-en-v1.5")?;
let texts = vec!["text1", "text2", "text3"];
let embeddings = embedder.encode_batch(&texts)?; // Much faster than individual encodes
Selected models support variable embedding dimensions without model reloading, enabling trade-offs between quality and storage:
let embedder = TesseraMultiVector::builder()
.model("jina-colbert-v2")
.dimension(96) // Flexible dimension selection
.build()?;
Tessera uses the factory pattern with type-safe builders that prevent mismatched operations at compile time:
let dense_embedder = TesseraDense::new("bge-base-en-v1.5")?; // Dense embeddings
let multi_embedder = TesseraMultiVector::new("colbert-v2")?; // Multi-vector embeddings
let sparse_embedder = TesseraSparse::new("splade-cocondenser")?; // Sparse embeddings
// Type system prevents accidental mixing of different embedding types
Seamless NumPy interoperability for Python users without loss of performance:
from tessera import TesseraDense
import numpy as np
embedder = TesseraDense("bge-base-en-v1.5")
embedding = embedder.encode("text") # Returns NumPy array
embeddings = embedder.encode_batch(["text1", "text2"]) # Batch processing
Tessera includes two comprehensive Marimo notebooks that demonstrate the library's capabilities through interactive visualizations:
Compare dense, multi-vector, and sparse embeddings on the same dataset with UMAP dimensionality reduction. The notebook includes interactive query search showing how different paradigms represent and retrieve similar documents.
uv run marimo edit examples/notebooks/embedding_comparison.py
Explore zero-shot time series forecasting with Chronos Bolt through interactive controls for dataset selection and context length. The notebook visualizes prediction intervals and quantile distributions for uncertainty-aware forecasting.
uv run marimo edit examples/notebooks/timeseries_forecasting.py
All embedders support a builder pattern for advanced configuration:
use tessera::TesseraMultiVector;
use tessera::backends::candle::Device;
use tessera::quantization::QuantizationConfig;
let embedder = TesseraMultiVector::builder()
.model("jina-colbert-v2")
.device(Device::Cpu)
.dimension(96)
.quantization(QuantizationConfig::Binary)
.build()?;
Search across PDF documents and images without OCR:
use tessera::TesseraVision;
let vision = TesseraVision::new("colpali-v1.3-hf")?;
let score = vision.search_document(
"What is the total amount?",
"invoice.pdf"
)?;
Generate forecasts with uncertainty quantification:
from tessera import TesseraTimeSeries
import numpy as np
forecaster = TesseraTimeSeries("chronos-bolt-small")
context = np.random.randn(1, 2048).astype(np.float32)
# Point forecast (median)
forecast = forecaster.forecast(context)
# Full quantile distribution (nine levels, 0.1 through 0.9, along the last axis)
quantiles = forecaster.forecast_quantiles(context)
q10, q50, q90 = quantiles[0, :, 0], quantiles[0, :, 4], quantiles[0, :, 8]
Tessera is built on Candle, a minimalist ML framework for Rust that provides efficient tensor operations and model inference. The library uses zero-copy operations where possible, implements comprehensive error handling with structured error types, and maintains a clear separation between model loading, encoding, and similarity computation.
All embeddings use float32 precision by default, with optional binary quantization for multi-vector embeddings. Models are downloaded automatically from HuggingFace Hub on first use and cached locally for subsequent runs.
The library includes 103 Rust tests covering all embedding paradigms, model loading, quantization, and error handling. Python bindings include comprehensive integration tests validating NumPy interoperability and error propagation.
# Run Rust tests
cargo test --all-features
# Run Python tests
uv run tests/python/test_python_bindings.py
Licensed under the Apache License, Version 2.0. You may obtain a copy of the license at http://www.apache.org/licenses/LICENSE-2.0.
If you use Tessera in your research, please cite:
@software{tessera2025,
  title={Tessera: Multi-Paradigm Embedding Library},
  author={Tessera Contributors},
  year={2025},
  url={https://github.com/tomWhiting/tessera}
}
Contributions are welcome. Please open an issue to discuss proposed changes before submitting pull requests. All contributions will be licensed under Apache 2.0.
Tessera builds on research and models from: ColBERT (Stanford NLP), BGE (BAAI), Nomic AI, Alibaba GTE, Qwen, Jina AI, SPLADE (Naver Labs), ColPali (Illuin/Vidore), and Chronos (Amazon Science).