Crates.io | fast-distances |
lib.rs | fast-distances |
version | 0.0.1 |
source | src |
created_at | 2024-12-13 00:00:21.614558 |
updated_at | 2024-12-13 00:00:21.614558 |
description | A rust library to provide distances for multidimensional arrays |
homepage | https://github.com/eugenehp/fast-distances |
repository | https://github.com/eugenehp/fast-distances |
id | 1481753 |
size | 158,933 |
Rust Similarity and Distance Metrics Library
This Rust package provides functions for computing distance and similarity metrics between vectors or points in a high-dimensional space. These metrics are widely used in machine learning, statistics, data science, and computational biology.
Each module in this package implements a specific distance or similarity measure, some with gradient computations for optimization tasks. Below is a list of available modules:
approx_log_gamma: Approximation of the logarithm of the Gamma function.
bray_curtis: Bray-Curtis dissimilarity, a measure for ecological distance.
bray_curtis_grad: Gradient of the Bray-Curtis dissimilarity.
canberra: Canberra distance, a weighted variant of the Manhattan distance in which each coordinate difference is normalized by the sum of the coordinates' absolute values.
canberra_grad: Gradient of the Canberra distance.
chebyshev: Chebyshev distance (L∞ distance), the maximum absolute difference along any coordinate axis.
chebyshev_grad: Gradient of the Chebyshev distance.
correlation: Pearson correlation coefficient, a measure of linear correlation between two vectors.
cosine: Cosine similarity, measuring the cosine of the angle between two vectors.
cosine_grad: Gradient of the cosine similarity.
dice: Dice coefficient, a similarity measure often used in bioinformatics.
euclidean: Euclidean distance, the straight-line distance between two points.
euclidean_grad: Gradient of the Euclidean distance.
hamming: Hamming distance, the number of differing positions between two strings of equal length.
haversine: Haversine distance, used to calculate the great-circle distance between two points on a sphere.
haversine_grad: Gradient of the Haversine distance.
hellinger: Hellinger distance, a measure for comparing probability distributions.
hellinger_grad: Gradient of the Hellinger distance.
hyperboloid_grad: Gradient of the hyperboloid distance, a metric on hyperbolic spaces.
jaccard: Jaccard similarity coefficient, the size of the intersection of two sets divided by the size of their union.
kulsinski: Kulsinski similarity coefficient, a distance measure for binary vectors.
ll_dirichlet: Log-likelihood of the Dirichlet distribution, used for probabilistic comparison of Dirichlet-distributed data.
log_beta: Logarithm of the Beta function, used in statistical modeling.
log_single_beta: Logarithm of the Beta function for a single argument.
mahalanobis: Mahalanobis distance, a distance metric that accounts for correlations between variables.
mahalanobis_grad: Gradient of the Mahalanobis distance.
manhattan: Manhattan distance (L1 distance), the sum of the absolute differences between coordinates.
manhattan_grad: Gradient of the Manhattan distance.
matching: Matching distance, a similarity measure based on matching elements in two sets.
minkowski: Minkowski distance, a generalization of both Euclidean and Manhattan distances.
minkowski_grad: Gradient of the Minkowski distance.
poincare: Poincaré distance, used for hyperbolic spaces and geometries.
rogers_tanimoto: Rogers-Tanimoto similarity, a distance measure for binary data.
russellrao: Russell-Rao similarity, a measure for binary vectors.
sokal_michener: Sokal-Michener similarity, a metric for categorical data.
sokal_sneath: Sokal-Sneath similarity, another metric for categorical data.
standardised_euclidean: Standardized Euclidean distance, in which each squared coordinate difference is divided by the variance of that dimension.
standardised_euclidean_grad: Gradient of the standardized Euclidean distance.
weighted_minkowski: Weighted Minkowski distance, a variant of Minkowski with weightings for each dimension.
weighted_minkowski_grad: Gradient of the weighted Minkowski distance.
yule: Yule's coefficient, used to measure association between two binary vectors.
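To make a few of the definitions above concrete, here is a minimal, self-contained sketch of the Minkowski family and of a distance paired with its gradient. It is written against plain `f64` slices and follows the textbook formulas. The function names (`minkowski_sketch`, `chebyshev_sketch`, `euclidean_with_grad_sketch`) are hypothetical and are not part of this crate's API; the crate's own signatures and return conventions may differ.

```rust
// Illustrative textbook definitions only; NOT the fast-distances implementation.

/// Minkowski distance of order `p` (p >= 1): (sum_i |x_i - y_i|^p)^(1/p).
/// p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
fn minkowski_sketch(x: &[f64], y: &[f64], p: f64) -> f64 {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    x.iter()
        .zip(y)
        .map(|(a, b)| (a - b).abs().powf(p))
        .sum::<f64>()
        .powf(1.0 / p)
}

/// Chebyshev distance: the largest absolute coordinate difference,
/// i.e. the limit of the Minkowski distance as p grows without bound.
fn chebyshev_sketch(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    x.iter()
        .zip(y)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0, f64::max)
}

/// Euclidean distance d = ||x - y|| together with its gradient with
/// respect to `x`, whose i-th component is (x_i - y_i) / d.
fn euclidean_with_grad_sketch(x: &[f64], y: &[f64]) -> (f64, Vec<f64>) {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    let dist = x
        .iter()
        .zip(y)
        .map(|(a, b)| (a - b).powi(2))
        .sum::<f64>()
        .sqrt();
    // Assumption of this sketch: return a zero gradient when the points
    // coincide, where the true gradient is undefined.
    let grad: Vec<f64> = if dist == 0.0 {
        vec![0.0; x.len()]
    } else {
        x.iter().zip(y).map(|(a, b)| (a - b) / dist).collect()
    };
    (dist, grad)
}
```

Returning the distance and its gradient together is one possible design for the `*_grad` variants, since optimization loops usually need both values at the same point; whether this crate does so is not confirmed here.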
Add this package to your Cargo.toml to use it in your project:
[dependencies]
fast-distances = "0.0.1"
Usage
To use one of the available distance or similarity metrics, import the respective module in your Rust code:
// The crate `fast-distances` is imported as `fast_distances`
// (Cargo maps hyphens in package names to underscores).
use fast_distances::{cosine, euclidean, manhattan};

fn main() {
    let vector1 = vec![1.0, 2.0, 3.0];
    let vector2 = vec![4.0, 5.0, 6.0];

    // Compute cosine similarity
    let cosine_sim = cosine(&vector1, &vector2);
    println!("Cosine Similarity: {}", cosine_sim);

    // Compute Euclidean distance
    let euclidean_dist = euclidean(&vector1, &vector2);
    println!("Euclidean Distance: {}", euclidean_dist);

    // Compute Manhattan distance
    let manhattan_dist = manhattan(&vector1, &vector2);
    println!("Manhattan Distance: {}", manhattan_dist);
}
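As a sanity check for the example above, the following sketch computes textbook reference values for the same two vectors without using the crate. The helper names (`cosine_similarity_sketch`, `reference_values_sketch`) are hypothetical. The values follow the standard definitions; whether the crate's `cosine` returns the similarity or one minus the similarity is not assumed here.

```rust
// Illustrative reference computation; NOT the fast-distances implementation.

/// Cosine similarity from the textbook definition: dot(x, y) / (||x|| * ||y||).
fn cosine_similarity_sketch(x: &[f64], y: &[f64]) -> f64 {
    let dot: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let norm_x = x.iter().map(|a| a * a).sum::<f64>().sqrt();
    let norm_y = y.iter().map(|b| b * b).sum::<f64>().sqrt();
    dot / (norm_x * norm_y)
}

/// Reference values for [1, 2, 3] and [4, 5, 6], returned as
/// (cosine similarity, Euclidean distance, Manhattan distance).
fn reference_values_sketch() -> (f64, f64, f64) {
    let v1: [f64; 3] = [1.0, 2.0, 3.0];
    let v2: [f64; 3] = [4.0, 5.0, 6.0];

    let cosine_sim = cosine_similarity_sketch(&v1, &v2);
    // Every coordinate differs by 3, so the squared differences sum to 27.
    let euclidean_dist = v1
        .iter()
        .zip(v2.iter())
        .map(|(a, b)| (a - b).powi(2))
        .sum::<f64>()
        .sqrt();
    // The absolute differences sum to 3 + 3 + 3 = 9.
    let manhattan_dist = v1
        .iter()
        .zip(v2.iter())
        .map(|(a, b)| (a - b).abs())
        .sum::<f64>();

    (cosine_sim, euclidean_dist, manhattan_dist)
}
```

Under these definitions the cosine similarity is 32 / sqrt(14 * 77), roughly 0.9746, the Euclidean distance is sqrt(27), roughly 5.196, and the Manhattan distance is 9.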
Contributions are welcome! If you'd like to contribute a new metric or improve an existing one, feel free to open an issue or a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
This package draws from many well-established distance and similarity metrics commonly used in data analysis, machine learning, and information retrieval.