| Field | Value |
|---|---|
| Crates.io | rag-umap |
| lib.rs | rag-umap |
| version | 0.0.0 |
| created_at | 2025-09-14 23:22:31.21629+00 |
| updated_at | 2025-09-14 23:22:31.21629+00 |
| description | A simple Rust implementation of UMAP for dimensionality reduction |
| repository | https://github.com/richardanaya/rag-umap |
| id | 1839252 |
| size | 84,829 |
A pure Rust implementation of UMAP (Uniform Manifold Approximation and Projection) for dimension reduction, based on the paper by Leland McInnes, John Healy, and James Melville.
Add this to your `Cargo.toml`:

```toml
[dependencies]
rag-umap = "0.1.0"
```
```rust
use rag_umap::{convert_to_2d, convert_to_3d};

// `?` needs a function that returns Result; Box<dyn Error> accepts most error types.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Your high-dimensional embeddings data
    let embeddings = vec![
        vec![1.0, 2.0, 3.0, 4.0, 5.0],
        vec![2.0, 3.0, 4.0, 5.0, 6.0],
        vec![3.0, 4.0, 5.0, 6.0, 7.0],
        // ... more data points
    ];

    // Convert to 2D using UMAP
    let embedding_2d = convert_to_2d(embeddings.clone())?;
    println!("2D embedding: {:?}", embedding_2d);

    // Convert to 3D using UMAP
    let embedding_3d = convert_to_3d(embeddings)?;
    println!("3D embedding: {:?}", embedding_3d);

    Ok(())
}
```
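The conversion functions accept any element type that implements `Into<f64> + Copy`:

```rust
use rag_umap::convert_to_2d;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Works with integers (i32 here)
    let int_embeddings = vec![vec![1, 2, 3], vec![4, 5, 6]];
    let int_result = convert_to_2d(int_embeddings)?;
    // Works with floats (f32 here)
    let float_embeddings = vec![vec![1.0f32, 2.0f32], vec![3.0f32, 4.0f32]];
    let float_result = convert_to_2d(float_embeddings)?;
    println!("{:?}\n{:?}", int_result, float_result);
    Ok(())
}
```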
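The `Into<f64> + Copy` bound is what lets both calls compile: `i32` and `f32` each convert losslessly into `f64`. The sketch below shows one way a function with that bound can normalize its input. `to_f64_matrix` is a hypothetical helper written for illustration, not part of rag-umap's API, and the README does not say how the crate performs this conversion internally.

```rust
/// Hypothetical helper (not part of rag-umap): converts a matrix whose
/// elements implement `Into<f64> + Copy` into rows of `f64`.
fn to_f64_matrix<T: Into<f64> + Copy>(data: &[Vec<T>]) -> Vec<Vec<f64>> {
    data.iter()
        // `Copy` lets `|&x|` read each element by value without moving it out.
        .map(|row| row.iter().map(|&x| x.into()).collect::<Vec<f64>>())
        .collect()
}

fn main() {
    let ints: Vec<Vec<i32>> = vec![vec![1, 2, 3], vec![4, 5, 6]];
    let floats: Vec<Vec<f32>> = vec![vec![1.5, 2.5], vec![3.5, 4.5]];

    println!("{:?}", to_f64_matrix(&ints)); // [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    println!("{:?}", to_f64_matrix(&floats)); // [[1.5, 2.5], [3.5, 4.5]]
}
```

Note that `i64`, `u64`, and `usize` do not implement `Into<f64>` in the standard library (the conversion can lose precision), so data of those types needs an explicit `as f64` cast first.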
The library uses the following default parameters internally:

| Parameter | Description | Default |
|---|---|---|
| `n_neighbors` | Number of nearest neighbors | 15 (or number of data points - 1, if smaller) |
| `n_components` | Target embedding dimension | 2 for `convert_to_2d`, 3 for `convert_to_3d` |
| `min_dist` | Effective minimum distance between points in the embedding | 0.1 |
| `n_epochs` | Number of optimization epochs | 200 |
| `negative_sample_rate` | Negative samples per positive sample | 5 |
| `spread` | Effective scale of the embedded points (works together with `min_dist`) | 1.0 |
| `local_connectivity` | Number of nearest neighbors assumed to be locally connected | 1.0 |
| `repulsion_strength` | Weight applied to negative samples during optimization | 1.0 |
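The table can be read as a plain configuration record. The struct below is a hypothetical sketch that assumes nothing about how rag-umap actually stores these values; `UmapParams` and `defaults_for` are illustrative names. It encodes the defaults above, including the clamp of `n_neighbors` to the number of data points minus one.

```rust
/// Hypothetical record of the documented defaults (illustrative only;
/// this README does not document a type like this in rag-umap).
#[derive(Debug, Clone, PartialEq)]
struct UmapParams {
    n_neighbors: usize,
    n_components: usize,
    min_dist: f64,
    n_epochs: usize,
    negative_sample_rate: usize,
    spread: f64,
    local_connectivity: f64,
    repulsion_strength: f64,
}

impl UmapParams {
    /// Documented defaults for `n_samples` input points and a target
    /// dimension of `n_components` (2 for convert_to_2d, 3 for convert_to_3d).
    fn defaults_for(n_samples: usize, n_components: usize) -> Self {
        UmapParams {
            // 15 neighbors, or n_samples - 1 when the dataset is smaller.
            n_neighbors: n_samples.saturating_sub(1).min(15),
            n_components,
            min_dist: 0.1,
            n_epochs: 200,
            negative_sample_rate: 5,
            spread: 1.0,
            local_connectivity: 1.0,
            repulsion_strength: 1.0,
        }
    }
}

fn main() {
    let small = UmapParams::defaults_for(3, 2);
    let large = UmapParams::defaults_for(1000, 3);
    println!("{:?}", small); // n_neighbors is clamped to 2
    println!("{:?}", large); // n_neighbors stays at 15
}
```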
See the `examples/` directory for complete examples:

```sh
cargo run --example basic_usage
```
This implementation includes several optimizations:
UMAP is based on:
The algorithm constructs a high-dimensional fuzzy topological representation of the data, then optimizes a low-dimensional representation to match this structure using cross-entropy minimization.
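The two stages in that sentence can be sketched for a single point and a single edge. This is a minimal illustration following the formulas in the UMAP paper, not rag-umap's internal code: the function names are hypothetical, `rho` is taken as the distance to the nearest neighbor (matching `local_connectivity = 1.0`), the curve parameters `a` and `b` are fixed at 1.0 for simplicity (UMAP normally fits them from `min_dist` and `spread`), and the nearest-neighbor search and negative-sampling gradient descent are omitted.

```rust
/// Fuzzy membership strength of a directed kNN edge: exp(-max(0, d - rho) / sigma).
fn membership(d: f64, rho: f64, sigma: f64) -> f64 {
    (-(d - rho).max(0.0) / sigma).exp()
}

/// Binary-searches `sigma` so the memberships of the k neighbor distances in
/// `dists` (self excluded) sum to log2(k).
fn smooth_knn_sigma(dists: &[f64], rho: f64) -> f64 {
    let target = (dists.len() as f64).log2();
    let (mut lo, mut hi) = (0.0_f64, f64::INFINITY);
    let mut sigma = 1.0_f64;
    for _ in 0..64 {
        let psum: f64 = dists.iter().map(|&d| membership(d, rho, sigma)).sum();
        if (psum - target).abs() < 1e-5 {
            break;
        }
        if psum > target {
            // Total membership too large: shrink sigma.
            hi = sigma;
            sigma = (lo + hi) / 2.0;
        } else {
            // Total membership too small: grow sigma, doubling until bounded.
            lo = sigma;
            sigma = if hi.is_infinite() { sigma * 2.0 } else { (lo + hi) / 2.0 };
        }
    }
    sigma
}

/// Fuzzy set union used to symmetrize the directed graph: a + b - a*b.
fn fuzzy_union(a: f64, b: f64) -> f64 {
    a + b - a * b
}

/// Low-dimensional similarity curve 1 / (1 + a * d^(2b)).
fn low_dim_similarity(d: f64, a: f64, b: f64) -> f64 {
    1.0 / (1.0 + a * d.powf(2.0 * b))
}

/// Cross-entropy contribution of one edge with high-dimensional weight `w`
/// and low-dimensional similarity `q`.
fn edge_cross_entropy(w: f64, q: f64) -> f64 {
    let eps = 1e-12;
    let q = q.clamp(eps, 1.0 - eps);
    // Guards avoid 0 * ln(0) = NaN at the endpoints w = 0 and w = 1.
    let attract = if w > 0.0 { w * (w / q).ln() } else { 0.0 };
    let repel = if w < 1.0 { (1.0 - w) * ((1.0 - w) / (1.0 - q)).ln() } else { 0.0 };
    attract + repel
}

fn main() {
    // Distances from one point to its k = 4 nearest neighbors (self excluded).
    let dists = [1.0, 2.0, 3.0, 4.0];
    let rho = dists[0]; // local_connectivity = 1.0: nearest-neighbor distance
    let sigma = smooth_knn_sigma(&dists, rho);
    let w: Vec<f64> = dists.iter().map(|&d| membership(d, rho, sigma)).collect();
    println!("sigma = {sigma:.4}, weights = {w:?}");

    // Symmetrize one edge (0.3 stands in for the reverse-direction weight),
    // then score two candidate embedding distances for it.
    let w_sym = fuzzy_union(w[1], 0.3);
    let near = edge_cross_entropy(w_sym, low_dim_similarity(0.5, 1.0, 1.0));
    let far = edge_cross_entropy(w_sym, low_dim_similarity(5.0, 1.0, 1.0));
    println!("loss if placed near: {near:.4}, far: {far:.4}");
}
```

Because an edge's loss grows as its low-dimensional similarity drifts away from its high-dimensional weight, minimizing this quantity pulls strongly connected points together and, through the `(1 - w)` term, pushes weakly connected ones apart.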
| Method | Local Structure | Global Structure | Scalability | Embedding Dimension |
|---|---|---|---|---|
| UMAP | ✓ | ✓ | High | Any |
| t-SNE | ✓ | Limited | Medium | Typically 2-3 |
| PCA | Limited | ✓ | High | Any |
| Isomap | ✓ | ✓ | Low | Any |
This implementation is based on the UMAP paper:
McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426.
Contributions are welcome!