Crates.io | yake-rust |
lib.rs | yake-rust |
version | 1.0.3 |
created_at | 2022-09-25 20:20:02.623387+00 |
updated_at | 2025-02-19 14:09:08.06752+00 |
description | Yake (Yet Another Keyword Extractor) in Rust |
repository | https://github.com/quesurifn/yake-rust |
id | 673792 |
size | 404,985 |
Yake is a statistical keyword extractor. It weighs several factors, such as acronyms, position in the paragraph, capitalization, the number of sentences a keyword appears in, stopwords, punctuation, and more.
For Yake, a ✨keyphrase✨ is an n-gram (1-, 2-, or 3-gram) that neither starts nor ends with a stopword, contains no numbers or punctuation, excludes terms that are too long or too short, etc.
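The candidate rules above can be illustrated with a minimal sketch. This is not yake-rust's internal code: the function names `is_stopword`, `is_candidate`, and `candidates`, and the length bounds `MIN_LEN` and `MAX_LEN`, are hypothetical choices made for the example, and the real crate may apply different or additional checks.

```rust
use std::collections::HashSet;

// Hypothetical length bounds for a single term (assumed for illustration only).
const MIN_LEN: usize = 3;
const MAX_LEN: usize = 40;

/// Case-insensitive stopword lookup.
fn is_stopword(word: &str, stopwords: &HashSet<&str>) -> bool {
    stopwords.contains(word.to_lowercase().as_str())
}

/// Checks one n-gram against the rules described above: it must not start or
/// end with a stopword, must contain only alphabetic characters, and its
/// non-stopword terms must not be too short or too long.
fn is_candidate(words: &[&str], stopwords: &HashSet<&str>) -> bool {
    let (first, last) = match (words.first(), words.last()) {
        (Some(f), Some(l)) => (*f, *l),
        _ => return false, // an empty n-gram is never a candidate
    };
    if is_stopword(first, stopwords) || is_stopword(last, stopwords) {
        return false;
    }
    words.iter().all(|w| {
        let clean = w.chars().all(char::is_alphabetic);
        let len = w.chars().count();
        let length_ok = is_stopword(w, stopwords) || (MIN_LEN..=MAX_LEN).contains(&len);
        clean && length_ok
    })
}

/// Collects every n-gram of 1..=max_ngram consecutive words that passes the filter.
fn candidates<'a>(
    sentence: &[&'a str],
    stopwords: &HashSet<&str>,
    max_ngram: usize,
) -> Vec<Vec<&'a str>> {
    let mut out = Vec::new();
    for n in 1..=max_ngram {
        for window in sentence.windows(n) {
            if is_candidate(window, stopwords) {
                out.push(window.to_vec());
            }
        }
    }
    out
}
```

In this sketch a stopword may still appear in the middle of a 3-gram, so "extraction in rust" is kept while "in rust" and "extraction in" are rejected.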
The input text is split into sentences and terms via the segtok crate. Yake assigns an importance score to each term in the text.
Finally, the most important terms are selected:
✨Keyphrases✨ are ranked in order of importance (most important first).
Duplicates are then detected by Levenshtein distance and removed.
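The deduplication step can be sketched as follows. This is only an illustration, not the crate's implementation: the helper names `levenshtein`, `similarity`, and `dedup_ranked`, the normalization by the longer string's length, and the `threshold` parameter are all assumptions made for this example.

```rust
/// Levenshtein distance over Unicode scalar values, using two rolling rows.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1)   // deletion
                .min(curr[j] + 1);      // insertion
            curr.push(best);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Similarity in [0, 1]: 1.0 means identical strings (assumed normalization).
fn similarity(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / longest as f64
}

/// Walks keyphrases in ranked order (most important first) and keeps one only
/// if it is not too similar to any keyphrase that was already kept.
fn dedup_ranked<'a>(ranked: &[&'a str], threshold: f64) -> Vec<&'a str> {
    let mut kept: Vec<&'a str> = Vec::new();
    for &candidate in ranked {
        if kept.iter().all(|k| similarity(k, candidate) < threshold) {
            kept.push(candidate);
        }
    }
    kept
}
```

Because the input is already ranked, the higher-ranked member of each near-duplicate pair is the one that survives.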
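```rust
use yake_rust::{get_n_best, Config, StopWords};

fn main() {
    let text = include_str!("input.txt");
    // Consider keyphrases of up to 3 words; keep the other defaults.
    let config = Config { ngrams: 3, ..Config::default() };
    // Load the predefined English stopword list.
    let ignored = StopWords::predefined("en").unwrap();
    // Extract the 10 best keyphrases.
    let keywords = get_n_best(10, &text, &ignored, &config);
    println!("{:?}", keywords);
}
```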
By default, stopwords for all languages are included. However, you can choose to include only specific ones:
```toml
[dependencies]
yake-rust = { version = "*", default-features = false, features = ["en", "de"] }
```