| Crates.io | latentdirichletallocation |
| lib.rs | latentdirichletallocation |
| version | 0.1.0 |
| created_at | 2025-12-19 00:56:15.388972+00 |
| updated_at | 2025-12-19 00:56:15.388972+00 |
| description | A Rust implementation of Latent Dirichlet Allocation (LDA) using collapsed Gibbs sampling for topic modeling. |
| homepage | |
| repository | https://github.com/vulogov/LatentDirichletAllocation |
| max_upload_size | |
| id | 1993939 |
| size | 120,661 |
A Rust implementation of Latent Dirichlet Allocation (LDA) using collapsed Gibbs sampling for topic modeling.
This crate provides:

- Configurable hyperparameters: `k` (topics), `alpha`, and `beta`
- Document-topic distributions (θ)
- Topic-word distributions (φ)

The overall pipeline:

```mermaid
flowchart TD
A[Raw Documents] --> B[Tokenization & Stopword Removal]
B --> C[Vocabulary Construction]
C --> D[Convert Docs to Word IDs]
D --> E[LDA Model Initialization]
E --> F[Gibbs Sampling Iterations]
F --> G[Compute θ and φ]
    G --> H[Top Words per Topic]
```
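The crate performs these preprocessing stages internally when building a model from raw documents. As a rough sketch of what the first three stages involve (whitespace tokenization and a caller-supplied stopword list are simplifying assumptions here, and `preprocess` is illustrative, not part of the crate's API):

```rust
use std::collections::HashMap;

/// Tokenize, drop stopwords, build a vocabulary, and map each
/// document to a sequence of word IDs.
fn preprocess(docs: &[&str], stopwords: &[&str]) -> (Vec<String>, Vec<Vec<usize>>) {
    let mut vocab: Vec<String> = Vec::new();
    let mut index: HashMap<String, usize> = HashMap::new();
    let mut corpus: Vec<Vec<usize>> = Vec::new();

    for doc in docs {
        let mut ids = Vec::new();
        for token in doc.split_whitespace() {
            let word = token.to_lowercase();
            if stopwords.contains(&word.as_str()) {
                continue; // stopword removal
            }
            // Assign a fresh ID the first time a word is seen.
            let id = *index.entry(word.clone()).or_insert_with(|| {
                vocab.push(word.clone());
                vocab.len() - 1
            });
            ids.push(id);
        }
        corpus.push(ids);
    }
    (vocab, corpus)
}

fn main() {
    let (vocab, corpus) = preprocess(&["Rust is fast", "LDA in Rust"], &["is", "in"]);
    println!("vocab = {:?}, corpus = {:?}", vocab, corpus);
}
```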
LDA assumes:

- Each document is a mixture over $K$ latent topics, with mixing weights $\theta_d$ drawn from a symmetric Dirichlet prior with parameter $\alpha$.
- Each topic is a distribution over the $V$ vocabulary words, with weights $\phi_t$ drawn from a symmetric Dirichlet prior with parameter $\beta$.
- Each word is generated by first sampling a topic assignment $z$ from $\theta_d$, then sampling the word from $\phi_z$.
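In equation form, this is the standard LDA generative process (restating the assumptions above, not anything crate-specific):

$$ \theta_d \sim \mathrm{Dirichlet}(\alpha), \qquad \phi_t \sim \mathrm{Dirichlet}(\beta) $$

$$ z_{di} \sim \mathrm{Categorical}(\theta_d), \qquad w_{di} \sim \mathrm{Categorical}(\phi_{z_{di}}) $$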
We integrate out $ \theta $ and $ \phi $, sampling topic assignments $ z $ using:
$$ p(z_{di} = t \mid \mathbf{z}_{-di}, \mathbf{w}) \propto \big( n_{d,t}^{-di} + \alpha \big) \cdot \frac{ n_{t,w}^{-di} + \beta }{ n_t^{-di} + V \beta } $$
Where:

- $n_{d,t}^{-di}$ — number of tokens in document $d$ assigned to topic $t$, excluding the current token
- $n_{t,w}^{-di}$ — number of times word $w$ is assigned to topic $t$, excluding the current token
- $n_t^{-di}$ — total number of tokens assigned to topic $t$, excluding the current token
- $V$ — vocabulary size
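A minimal sketch of how one collapsed Gibbs update can implement this conditional, assuming count tables `n_dt`, `n_tw`, `n_t` that mirror the counts above (all names are illustrative, not the crate's internals; the uniform draw `u` stands in for whatever RNG the implementation uses):

```rust
/// Sample a new topic for one token (word `w` of document `d`) from the
/// collapsed conditional above. The count tables must already exclude
/// the current token (the `-di` superscript in the formula).
fn sample_topic(
    n_dt: &[Vec<usize>], // n_dt[d][t]: tokens in doc d assigned to topic t
    n_tw: &[Vec<usize>], // n_tw[t][w]: times word w is assigned to topic t
    n_t: &[usize],       // n_t[t]: total tokens assigned to topic t
    d: usize,
    w: usize,
    alpha: f64,
    beta: f64,
    v: usize, // vocabulary size V
    u: f64,   // uniform draw in [0, 1)
) -> usize {
    let k = n_t.len();
    // Unnormalized weight for each topic t, per the conditional above.
    let weights: Vec<f64> = (0..k)
        .map(|t| {
            (n_dt[d][t] as f64 + alpha) * (n_tw[t][w] as f64 + beta)
                / (n_t[t] as f64 + v as f64 * beta)
        })
        .collect();
    // Inverse-CDF sampling over the unnormalized weights.
    let total: f64 = weights.iter().sum();
    let mut acc = 0.0;
    for (t, wt) in weights.iter().enumerate() {
        acc += *wt;
        if u * total < acc {
            return t;
        }
    }
    k - 1 // numerical fallback
}

fn main() {
    // Toy counts: one document, two topics, three vocabulary words.
    let n_dt = vec![vec![2, 2]];
    let n_tw = vec![vec![1, 0, 1], vec![0, 2, 0]];
    let n_t = vec![2, 2];
    let t = sample_topic(&n_dt, &n_tw, &n_t, 0, 1, 0.1, 0.01, 3, 0.5);
    println!("sampled topic = {}", t);
}
```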
Document-topic distribution: $$ \theta_{d,t} = \frac{ n_{d,t} + \alpha }{ N_d + K \alpha } $$

Topic-word distribution: $$ \phi_{t,w} = \frac{ n_{t,w} + \beta }{ n_t + V \beta } $$
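Both estimates come straight from the final count tables. A sketch under the same illustrative names as above (again, not the crate's API):

```rust
/// Point estimates of θ and φ from the final count tables.
fn estimate(
    n_dt: &[Vec<usize>], // n_dt[d][t]: tokens in doc d assigned to topic t
    n_tw: &[Vec<usize>], // n_tw[t][w]: times word w is assigned to topic t
    n_t: &[usize],       // n_t[t]: total tokens assigned to topic t
    alpha: f64,
    beta: f64,
) -> (Vec<Vec<f64>>, Vec<Vec<f64>>) {
    let k = n_t.len();
    let v = n_tw[0].len();

    // θ_{d,t} = (n_{d,t} + α) / (N_d + Kα)
    let theta: Vec<Vec<f64>> = n_dt
        .iter()
        .map(|row| {
            let n_d: usize = row.iter().sum();
            row.iter()
                .map(|&c| (c as f64 + alpha) / (n_d as f64 + k as f64 * alpha))
                .collect()
        })
        .collect();

    // φ_{t,w} = (n_{t,w} + β) / (n_t + Vβ)
    let phi: Vec<Vec<f64>> = n_tw
        .iter()
        .enumerate()
        .map(|(t, row)| {
            row.iter()
                .map(|&c| (c as f64 + beta) / (n_t[t] as f64 + v as f64 * beta))
                .collect()
        })
        .collect();

    (theta, phi)
}

fn main() {
    let n_dt = vec![vec![2, 2]];
    let n_tw = vec![vec![1, 1, 0], vec![0, 1, 1]];
    let n_t = vec![2, 2];
    let (theta, phi) = estimate(&n_dt, &n_tw, &n_t, 0.1, 0.01);
    println!("theta = {:?}", theta);
    println!("phi = {:?}", phi);
}
```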
```rust
use lda::Lda;

fn main() {
    let docs = [
        "Rust is a systems programming language",
        "Topic modeling with LDA in Rust",
        "Gibbs sampling for probabilistic models",
    ];

    // 3 topics, alpha = 0.1, beta = 0.01, and 42 as the seed.
    let mut model = Lda::from_documents(3, 0.1, 0.01, &docs, 42);

    // Run 500 Gibbs sampling iterations.
    model.train(500);

    println!("Top words per topic:");
    for (topic_id, words) in model.topics(5) {
        println!("Topic {}: {:?}", topic_id, words);
    }
}
```
Tips:

- Increase `iters` for better convergence.
- Tune `alpha` and `beta` for your dataset.