| Crates.io | similarity |
| lib.rs | similarity |
| version | 0.2.0 |
| created_at | 2024-01-22 08:46:30.311098+00 |
| updated_at | 2025-07-02 20:46:51.793782+00 |
| description | A comprehensive Rust library for calculating similarity metrics between vectors, collections, and spectral data with both functional and trait-based APIs |
| homepage | |
| repository | https://github.com/um-univie/similarity |
| max_upload_size | |
| id | 1108474 |
| size | 156,701 |
A comprehensive Rust library for calculating similarity metrics between vectors, collections, and spectral data. It offers both functional and trait-based APIs, with optional parallel processing and FFT optimizations.
The library is organized into three main trait categories:
Similarity<InputType, OutputType>: for comparing multiple entities and computing similarity or distance metrics.
EntropyMeasure<InputType, OutputType>: for analyzing single entities with information-theoretic measures.
DataTransform<InputType, OutputType>: for preprocessing and transforming data.
Add to your Cargo.toml:
[dependencies]
similarity = "0.2.0"
use similarity::*;
use similarity::similarity_traits::*;
use similarity::entropy_traits::*;
use similarity::transform_traits::*;
use std::collections::HashSet;
// Similarity between vectors
let a = [1.0, 2.0, 3.0];
let b = [2.0, 4.0, 6.0];
let cosine_sim = CosineSimilarity::similarity((&a, &b));
let euclidean_dist = EuclideanDistance::similarity((&a, &b));
// Set similarity
let mut set1 = HashSet::new();
set1.extend([1, 2, 3]);
let mut set2 = HashSet::new();
set2.extend([2, 3, 4]);
let jaccard = JaccardIndex::similarity((&set1, &set2));
// Entropy of spectral data
let spectrum = Spectrum::from_peaks(vec![
Peak { mz: 100.0, intensity: 0.6 },
Peak { mz: 200.0, intensity: 0.4 },
]);
let shannon_entropy = ShannonEntropy::entropy(&spectrum);
let tsallis_entropy = TsallisEntropy::entropy((&spectrum, 2.0));
// Data transformation
let mzs = [100.0, 200.0, 300.0];
let intensities = [0.5, 0.3, 0.2];
let transformed = WeightFactorTransformation::transform((&mzs, &intensities, 0.5, 2.0));
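For readers who want to check the numbers, here is a minimal standard-library sketch of what the vector and set metrics in the example compute. The `*_ref` function names are hypothetical and are not part of the crate; the crate's own implementations may handle edge cases (zero vectors, empty sets) differently.

```rust
use std::collections::HashSet;

/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns 0.0 if either vector has zero norm (a choice made for this sketch).
pub fn cosine_similarity_ref(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Euclidean distance: square root of the summed squared differences.
pub fn euclidean_distance_ref(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

/// Jaccard index: |A ∩ B| / |A ∪ B|.
/// Returns 0.0 for two empty sets (a choice made for this sketch).
pub fn jaccard_index_ref(a: &HashSet<i32>, b: &HashSet<i32>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}
```

With the inputs from the example, `a = [1.0, 2.0, 3.0]` and `b = [2.0, 4.0, 6.0]` are parallel, so the cosine similarity is 1 up to floating-point rounding and the Euclidean distance is the square root of 14. The sets `{1, 2, 3}` and `{2, 3, 4}` share two of four distinct elements, giving a Jaccard index of 0.5.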
use similarity::*;
// All original functions remain available
// (`a`, `b`, and `spectrum` are the values defined in the trait-based example)
let cosine_sim = cosine_similarity(&a, &b);
let euclidean_dist = euclidean_distance(&a, &b);
let entropy = calculate_entropy(&spectrum);
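As background for the entropy measures, the sketch below shows the textbook definitions of Shannon and Tsallis entropy applied to a list of peak intensities. It assumes that intensities are first normalized to sum to 1 and that the natural logarithm is used; the crate may make different choices, and the function names here are hypothetical.

```rust
/// Normalize positive intensities so they sum to 1.
/// Non-positive entries are dropped; an all-zero input yields an empty vector.
fn normalize(intensities: &[f64]) -> Vec<f64> {
    let total: f64 = intensities.iter().filter(|x| **x > 0.0).sum();
    if total <= 0.0 {
        return Vec::new();
    }
    intensities
        .iter()
        .filter(|x| **x > 0.0)
        .map(|x| x / total)
        .collect()
}

/// Shannon entropy H = -sum(p_i * ln(p_i)), in nats.
pub fn shannon_entropy_ref(intensities: &[f64]) -> f64 {
    normalize(intensities)
        .iter()
        .map(|p| -p * p.ln())
        .sum()
}

/// Tsallis entropy S_q = (1 - sum(p_i^q)) / (q - 1).
/// As q approaches 1 this converges to the Shannon entropy,
/// so that case is delegated explicitly.
pub fn tsallis_entropy_ref(intensities: &[f64], q: f64) -> f64 {
    if (q - 1.0).abs() < 1e-12 {
        return shannon_entropy_ref(intensities);
    }
    let sum_pq: f64 = normalize(intensities).iter().map(|p| p.powf(q)).sum();
    (1.0 - sum_pq) / (q - 1.0)
}
```

Under these assumptions, intensities of 0.6 and 0.4 give a Shannon entropy of about 0.673 nats and a Tsallis entropy (q = 2) of 1 - (0.36 + 0.16) = 0.48.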
[dependencies]
similarity = { version = "0.2.0", features = ["parallel", "fft"] }
parallel: Enables Rayon-based parallel processing for large datasets
fft: Enables FFT-based optimizations for cross-correlation and convolution
The trait-based API has zero performance overhead compared to direct function calls:
// These are equivalent in performance:
let result1 = cosine_similarity(&a, &b);
let result2 = CosineSimilarity::similarity((&a, &b));
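One way such a zero-overhead wrapper can be built is sketched below: the trait method is an associated function with no `self`, it is statically dispatched, and it simply forwards to the free function, so the compiler is free to inline it. The trait definition is copied from this README; `dot_product` and `DotProduct` are hypothetical names used only for illustration, and this is not necessarily how the crate is implemented.

```rust
/// Trait shape as given in this README.
pub trait Similarity<InputType, OutputType> {
    fn similarity(input: InputType) -> OutputType;
}

/// Hypothetical free function (functional API style).
pub fn dot_product(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical zero-sized marker type (trait API style).
pub struct DotProduct;

impl<'a> Similarity<(&'a [f64], &'a [f64]), f64> for DotProduct {
    // No `self`, no dynamic dispatch: the call resolves at compile time
    // and forwards straight to the free function.
    #[inline]
    fn similarity((a, b): (&'a [f64], &'a [f64])) -> f64 {
        dot_product(a, b)
    }
}
```

In this sketch, `DotProduct::similarity((&a[..], &b[..]))` and `dot_product(&a, &b)` run the same code path and return identical results.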
For large datasets (10K+ elements), use the optimized variants:
let large_a: Vec<f64> = (0..100_000).map(|i| i as f64).collect();
let large_b: Vec<f64> = (0..100_000).map(|i| i as f64 * 1.1).collect();
// 2-3x faster for large vectors
let result = CosineSimilarityOptimized::similarity((&large_a, &large_b));
// Even faster with parallel processing (enable the `parallel` feature)
let result = CosineSimilarityParallel::similarity((&large_a, &large_b));
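To illustrate why a parallel variant can help on large inputs, the sketch below splits both vectors into chunks, computes partial sums on separate threads, and combines them at the end. It deliberately uses `std::thread::scope` from the standard library rather than Rayon, which the crate's `parallel` feature is described as using, so it should be read as an illustration of the general technique and not as the crate's implementation. The function names are hypothetical.

```rust
use std::thread;

/// Partial sums (dot product, |a|^2, |b|^2) for one pair of chunks.
fn partial_sums(a: &[f64], b: &[f64]) -> (f64, f64, f64) {
    a.iter()
        .zip(b)
        .fold((0.0_f64, 0.0_f64, 0.0_f64), |(d, na, nb), (x, y)| {
            (d + x * y, na + x * x, nb + y * y)
        })
}

/// Cosine similarity computed over `n_threads` chunks in parallel.
/// Returns 0.0 for empty or zero-norm input (a choice made for this sketch).
pub fn cosine_similarity_chunked(a: &[f64], b: &[f64], n_threads: usize) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    if a.is_empty() {
        return 0.0;
    }
    let n = n_threads.max(1);
    // Ceiling division so every element lands in some chunk.
    let chunk_len = (a.len() + n - 1) / n;

    let (dot, na, nb) = thread::scope(|s| {
        // Spawn one scoped worker per chunk pair; borrows of `a` and `b`
        // are allowed because the scope joins all threads before returning.
        let handles: Vec<_> = a
            .chunks(chunk_len)
            .zip(b.chunks(chunk_len))
            .map(|(ca, cb)| s.spawn(move || partial_sums(ca, cb)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .fold((0.0_f64, 0.0_f64, 0.0_f64), |(d, x, y), (pd, px, py)| {
                (d + pd, x + px, y + py)
            })
    });

    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na.sqrt() * nb.sqrt())
    }
}
```

Because the chunked version adds the terms in a different order than a single sequential loop, results can differ in the last few floating-point digits. Thread startup costs also mean this approach only pays off once the vectors are large.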
Run the comprehensive demo:
cargo run --example trait_demo
This demonstrates all available trait implementations.
// For comparing two entities
pub trait Similarity<InputType, OutputType> {
fn similarity(input: InputType) -> OutputType;
}
// For analyzing single entities
pub trait EntropyMeasure<InputType, OutputType> {
fn entropy(input: InputType) -> OutputType;
}
// For transforming data
pub trait DataTransform<InputType, OutputType> {
fn transform(input: InputType) -> OutputType;
}
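Because each trait takes its whole input as a single type parameter, a custom metric can be plugged in by implementing the trait for a marker type with a tuple input. The sketch below re-declares the `Similarity` trait locally so that it compiles on its own. `ManhattanDistance` and `score_all` are hypothetical names and are not part of the crate.

```rust
/// Trait shape as given in this README (re-declared to keep the sketch standalone).
pub trait Similarity<InputType, OutputType> {
    fn similarity(input: InputType) -> OutputType;
}

/// Hypothetical custom metric: Manhattan (L1) distance.
pub struct ManhattanDistance;

impl<'a> Similarity<(&'a [f64], &'a [f64]), f64> for ManhattanDistance {
    fn similarity((a, b): (&'a [f64], &'a [f64])) -> f64 {
        // Sum of absolute element-wise differences.
        a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
    }
}

/// Hypothetical helper that scores a query against every candidate
/// using any metric `M` that implements the trait for slice pairs.
pub fn score_all<'a, M>(query: &'a [f64], candidates: &'a [Vec<f64>]) -> Vec<f64>
where
    M: Similarity<(&'a [f64], &'a [f64]), f64>,
{
    candidates
        .iter()
        .map(|c| M::similarity((query, c.as_slice())))
        .collect()
}
```

A call such as `score_all::<ManhattanDistance>(&query, &candidates)` selects the metric at compile time through the type parameter, so swapping metrics requires changing only that one type.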
Similarity Traits:
CosineSimilarity, CosineSimilarityOptimized, CosineSimilarityParallel
CosineDistance, CosineDistanceOptimized, CosineDistanceParallel
EuclideanDistance, SquaredEuclideanDistance
PearsonCorrelationDistance, PearsonCorrelationDistanceOptimized, PearsonCorrelationDistanceParallel
JaccardIndex
HitRate, OvershootRate
CrossCorrelationOptimized, CrossCorrelationParallel, CrossCorrelationFFTOptimized
TimeShiftFinder, TimeShiftFinderFFT
EntropySimilarity, EntropySimilarityOptimized
Entropy Traits:
ShannonEntropy, ShannonEntropyOptimized
TsallisEntropy, TsallisEntropyOptimized
Transform Traits:
WeightFactorTransformation, WeightFactorTransformationOptimized
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome. Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.