| Crates.io | topolog-embed |
| lib.rs | topolog-embed |
| version | 0.1.1 |
| created_at | 2025-12-23 10:41:06.168626+00 |
| updated_at | 2025-12-23 10:58:33.544594+00 |
| description | Topology-preserving embeddings in Rust: whitening, parametric UMAP, ... with Lance storage. |
| homepage | |
| repository | https://github.com/tuned-org-uk/topolog-embeddings |
| max_upload_size | |
| id | 2001291 |
| size | 259,620 |
Topology-preserving embeddings in Rust, built on top of the Burn 0.18 deep learning framework, with first-class Lance/genegraph-storage support for efficient columnar persistence. The crate focuses on preserving both the rough and the smooth structure of the original space (via whitening and parametric UMAP), rather than flattening it with operations such as aggressive normalization.

Input data can come from either of:

- `.lance` or `.parquet` files, via genegraph-storage's `load_dense_from_file`
- `Vec<Vec<f64>>`, for in-memory experimentation

In `Cargo.toml`:
```toml
[dependencies]
topolog-embeddings = "0.1"
burn = { version = "0.18", default-features = false, features = ["std", "train"] }
log = "0.4"
env_logger = "0.11"
```
Enable the backend you want (CPU by default).
This usage example generates synthetic data, applies ZCA whitening, and logs statistics before and after the transform so you can verify mean-centering and variance normalization.
```rust
use burn::tensor::{Distribution, ElementConversion, Tensor};
use log::info;
use topolog_embeddings::{
    backend::{AutoBackend, get_device},
    topolog::{Whitening, WhiteningConfig, WhiteningMethod},
};

fn main() {
    // Initialize logger (RUST_LOG=info cargo run --example 02_whitening_demo)
    env_logger::Builder::from_env(
        env_logger::Env::default().default_filter_or("info"),
    )
    .init();

    info!("🎨 Whitening Transform Demonstration");

    let device = get_device();
    let n = 2048;
    let d = 128;
    info!("Generating random input: {} samples, {} dims", n, d);

    let x: Tensor<AutoBackend, 2> =
        Tensor::random([n, d], Distribution::Default, &device);
    info!("Input shape: {:?}", x.dims());

    // Pre-whitening global mean
    let mean_before = x.clone().mean_dim(0);
    let mean_scalar = mean_before.mean().into_scalar().elem::<f32>();
    info!("Global mean before whitening: {:.6}", mean_scalar);

    // Apply ZCA whitening
    let whitener = Whitening::new(WhiteningConfig {
        eps: 1e-5,
        method: WhiteningMethod::Zca,
    });
    info!("Applying ZCA whitening…");
    let xw = whitener.forward(x);
    info!("Whitened shape: {:?}", xw.dims());

    // Post-whitening global mean
    let mean_after = xw.mean_dim(0);
    let mean_scalar_after = mean_after.mean().into_scalar().elem::<f32>();
    info!("Global mean after whitening: {:.6}", mean_scalar_after);
}
```
Whitening uses a symmetric inverse square-root of the covariance matrix (via a Newton–Schulz–style iteration) to decorrelate features while keeping the embedding in the original coordinate system, rather than compressing to a latent basis. This matches the intent described in the crate’s whitening module.
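For intuition, ZCA whitening amounts to `x_w = (x - mean) * C^(-1/2)`, where `C` is the (eps-regularised) covariance matrix. The sketch below shows how a Newton–Schulz-style iteration can approximate `C^(-1/2)` using only matrix multiplications; it is a self-contained illustration with ad-hoc helper names (`inv_sqrt_newton_schulz`, `matmul`, etc.), not the crate's internal code.

```rust
// Illustrative sketch only: Newton–Schulz iteration for the inverse matrix
// square root of a symmetric positive-definite matrix, as used conceptually
// by ZCA whitening. Helper names here are not part of topolog-embeddings.

type Mat = Vec<Vec<f64>>;

fn eye(n: usize) -> Mat {
    (0..n)
        .map(|i| (0..n).map(|j| if i == j { 1.0 } else { 0.0 }).collect())
        .collect()
}

fn matmul(a: &Mat, b: &Mat) -> Mat {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..n {
        for p in 0..k {
            for j in 0..m {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

fn scale(a: &Mat, s: f64) -> Mat {
    a.iter().map(|r| r.iter().map(|v| v * s).collect()).collect()
}

fn sub(a: &Mat, b: &Mat) -> Mat {
    a.iter()
        .zip(b)
        .map(|(ra, rb)| ra.iter().zip(rb).map(|(x, y)| x - y).collect())
        .collect()
}

/// Approximates C^(-1/2) for a symmetric positive-definite matrix `c`.
fn inv_sqrt_newton_schulz(c: &Mat, iters: usize) -> Mat {
    let n = c.len();
    // Normalise so the spectrum lies in (0, 1]; the trace bounds the largest eigenvalue.
    let trace: f64 = (0..n).map(|i| c[i][i]).sum();
    let mut y = scale(c, 1.0 / trace);
    let mut z = eye(n);
    for _ in 0..iters {
        // T = (3I - Z*Y) / 2;  Y <- Y*T;  Z <- T*Z
        let t = scale(&sub(&scale(&eye(n), 3.0), &matmul(&z, &y)), 0.5);
        y = matmul(&y, &t);
        z = matmul(&t, &z);
    }
    // Undo the normalisation: (C / trace)^(-1/2) / sqrt(trace) = C^(-1/2)
    scale(&z, 1.0 / trace.sqrt())
}

fn main() {
    // Tiny 2x2 SPD example: covariance of two correlated features.
    let c = vec![vec![2.0, 0.8], vec![0.8, 1.0]];
    let w = inv_sqrt_newton_schulz(&c, 20);
    // W * C * W should be close to the identity if W ≈ C^(-1/2).
    let check = matmul(&matmul(&w, &c), &w);
    println!("W*C*W ≈ I: {:?}", check);
}
```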
This example applies whitening and then feeds the whitened features into a parametric UMAP MLP to obtain a low-dimensional embedding.
```rust
use burn::tensor::{Distribution, Tensor};
use log::info;
use topolog_embeddings::{
    backend::{AutoBackend, get_device},
    topolog::{ParametricUmap, ParametricUmapConfig, Whitening, WhiteningConfig},
};

fn main() {
    env_logger::Builder::from_env(
        env_logger::Env::default().default_filter_or("info"),
    )
    .init();

    info!("🚀 Topological Embedding Pipeline (Whitening + Parametric UMAP)");

    let device = get_device();

    // Synthetic input
    let n = 1024;
    let d = 256;
    info!("Generating input data: {} samples, {} dims", n, d);
    let x: Tensor<AutoBackend, 2> =
        Tensor::random([n, d], Distribution::Default, &device);
    info!("Input shape: {:?}", x.dims());

    // Whitening
    let whitener = Whitening::new(WhiteningConfig::default());
    info!("Applying whitening…");
    let xw = whitener.forward(x);
    info!("Whitened shape: {:?}", xw.dims());

    // Parametric UMAP model
    let cfg = ParametricUmapConfig {
        in_dim: d,
        hidden_dim: 512,
        out_dim: 16,
    };
    let model = ParametricUmap::<AutoBackend>::init(&cfg, &device);
    info!(
        "Parametric UMAP: Input({}) -> Hidden({}) -> Output({})",
        cfg.in_dim, cfg.hidden_dim, cfg.out_dim
    );

    // Low-dimensional embedding
    info!("Projecting to low-dimensional space…");
    let z = model.forward(xw);
    info!("Embedding shape: {:?}", z.dims());
    info!("First embedding row: {}", z.slice([0..1, 0..cfg.out_dim]));
}
```
This layout (whitening → parametric encoder) matches the purpose of the crate: preserve topology in the original feature space and then learn a parametric mapping that retains these structures in a lower-dimensional embedding.
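One informal way to check that such a mapping keeps local topology intact is to compare k-nearest-neighbour sets before and after projection. The sketch below illustrates that check on plain `Vec` data with a toy projection; the helpers (`knn_indices`, `knn_overlap`) are hypothetical and not part of the topolog-embeddings API.

```rust
// Illustrative only: measure how well a mapping preserves k-NN neighbourhoods.
// `knn_indices` and `knn_overlap` are ad-hoc helpers, not topolog-embeddings APIs.

fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Indices of the k nearest neighbours of each point (excluding the point itself).
fn knn_indices(points: &[Vec<f64>], k: usize) -> Vec<Vec<usize>> {
    points
        .iter()
        .enumerate()
        .map(|(i, p)| {
            let mut dists: Vec<(usize, f64)> = points
                .iter()
                .enumerate()
                .filter(|(j, _)| *j != i)
                .map(|(j, q)| (j, euclidean(p, q)))
                .collect();
            dists.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
            dists.into_iter().take(k).map(|(j, _)| j).collect()
        })
        .collect()
}

/// Mean fraction of shared neighbours between two spaces (1.0 = identical k-NN graphs).
fn knn_overlap(a: &[Vec<usize>], b: &[Vec<usize>]) -> f64 {
    let k = a[0].len() as f64;
    a.iter()
        .zip(b)
        .map(|(na, nb)| na.iter().filter(|&i| nb.contains(i)).count() as f64 / k)
        .sum::<f64>()
        / a.len() as f64
}

fn main() {
    // Toy high-dimensional data and a toy "embedding" (keep the first two coordinates).
    let high: Vec<Vec<f64>> = (0..50)
        .map(|i| {
            let t = i as f64 / 10.0;
            vec![t.cos(), t.sin(), 0.01 * t, 0.01 * (2.0 * t).cos()]
        })
        .collect();
    let low: Vec<Vec<f64>> = high.iter().map(|p| p[..2].to_vec()).collect();

    let overlap = knn_overlap(&knn_indices(&high, 5), &knn_indices(&low, 5));
    println!("k-NN overlap between original and embedded space: {:.3}", overlap);
}
```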
The crate also exposes helpers to build Burn tensors either from `Vec<Vec<f64>>` or from `.lance`/`.parquet` files, again via genegraph-storage's `load_dense_from_file`, which supports both Lance and Parquet layouts. Note that the async example below additionally needs `tokio` and `anyhow` in `Cargo.toml`.
```rust
use std::path::Path;
use topolog_embeddings::{
    backend::{AutoBackend, get_device},
    data::{load_from_file, load_from_vec},
};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let device = get_device();

    // 1) From file (.lance or .parquet)
    let path = Path::new("data/my_dataset.lance");
    let x_file = load_from_file::<AutoBackend>(path, &device).await?;
    println!("Loaded from file: {:?}", x_file.dims());

    // 2) From Vec<Vec<f64>>
    let dense: Vec<Vec<f64>> = vec![
        vec![0.1, 0.4, 0.5],
        vec![0.4, 0.5, 0.2],
        vec![0.03, 0.8, 0.56],
    ];
    let x_vec = load_from_vec::<AutoBackend>(dense, &device)?;
    println!("Loaded from vec: {:?}", x_vec.dims());

    Ok(())
}
```
Internally, the file loader constructs a temporary `LanceStorage` and uses `load_dense_from_file` to support both Lance and Parquet formats, then converts the resulting column-major `DenseMatrix<f64>` into the row-major `Tensor<f32>` used by Burn.
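A rough sketch of that last step, assuming a plain column-major `f64` buffer in place of genegraph-storage's actual `DenseMatrix` type (whose API is not shown here):

```rust
use burn::tensor::{backend::Backend, Tensor, TensorData};

/// Hypothetical helper: turn a column-major f64 buffer of shape (rows, cols)
/// into a row-major f32 Burn tensor. Not the crate's actual implementation.
fn column_major_to_tensor<B: Backend>(
    values: &[f64],
    rows: usize,
    cols: usize,
    device: &B::Device,
) -> Tensor<B, 2> {
    // Element (r, c) sits at index c * rows + r in a column-major buffer,
    // but Burn expects row-major order, i.e. index r * cols + c.
    let mut row_major = vec![0.0f32; rows * cols];
    for c in 0..cols {
        for r in 0..rows {
            row_major[r * cols + c] = values[c * rows + r] as f32;
        }
    }
    Tensor::from_data(TensorData::new(row_major, [rows, cols]), device)
}
```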