alphabet_detector

Crates.io: alphabet_detector
lib.rs: alphabet_detector
version: 0.9.1
created_at: 2025-01-31 11:48:47.163386+00
updated_at: 2025-07-29 15:46:50.82148+00
description: Natural language alphabet detection library
repository: https://github.com/RoDmitry/alphabet_detector
id: 1537452
size: 347,618
Dmitry Rodionov (RoDmitry)

documentation

https://docs.rs/alphabet_detector/

README

Alphabet Detector


Detects 402 alphabets of 325 languages in 170 scripts

A language can be written in more than one script, so each combination is detected as a separate ScriptLanguage (language + script).
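The language + script pairing can be pictured with a small sketch. Every type and name below (ToyLanguage, ToyScript, ToyScriptLanguage, KNOWN, scripts_of) is hypothetical and written only for illustration; the crate's real ScriptLanguage definition may look quite different.

```rust
// Hypothetical stand-ins for illustration only; not the crate's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyLanguage {
    English,
    Russian,
    Serbian,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyScript {
    Cyrillic,
    Latin,
}

/// One detectable unit: a language written in a particular script.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToyScriptLanguage {
    language: ToyLanguage,
    script: ToyScript,
}

/// A toy table of known pairs. Serbian appears twice, once per script.
static KNOWN: [ToyScriptLanguage; 4] = [
    ToyScriptLanguage { language: ToyLanguage::English, script: ToyScript::Latin },
    ToyScriptLanguage { language: ToyLanguage::Russian, script: ToyScript::Cyrillic },
    ToyScriptLanguage { language: ToyLanguage::Serbian, script: ToyScript::Cyrillic },
    ToyScriptLanguage { language: ToyLanguage::Serbian, script: ToyScript::Latin },
];

/// Returns every script in which the toy table lists `language`.
fn scripts_of(language: ToyLanguage) -> Vec<ToyScript> {
    KNOWN
        .iter()
        .filter(|pair| pair.language == language)
        .map(|pair| pair.script)
        .collect()
}
```

In this sketch, scripts_of(ToyLanguage::Serbian) yields both Cyrillic and Latin, which is why one language can account for more than one detectable pair.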

It ships no models and only matches the characters of each word against known alphabets. It is not recommended as a standalone detector: it works as a word separator and language prefilter for an actual language detector (Langram).

Splits text (a CharIndices iterator) into words and detects the ScriptLanguages (language + script) of each word from the letters (chars) it uses.
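The general idea of splitting on char indices and then filtering by alphabet can be sketched with the standard library alone. This is not the crate's API: split_words, candidates, and the three toy alphabets are assumptions made for illustration, and treating a word as a run of alphabetic chars is a simplification.

```rust
/// Splits `text` into words, returning (byte offset, word) pairs.
/// A "word" here is a maximal run of alphabetic chars (a simplification).
fn split_words(text: &str) -> Vec<(usize, &str)> {
    let mut words = Vec::new();
    let mut start: Option<usize> = None;
    for (i, ch) in text.char_indices() {
        if ch.is_alphabetic() {
            // Remember where the current word began.
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(s) = start.take() {
            // A non-letter closes the current word.
            words.push((s, &text[s..i]));
        }
    }
    // Flush a word that runs to the end of the text.
    if let Some(s) = start {
        words.push((s, &text[s..]));
    }
    words
}

/// Toy alphabets (lowercase letters only); hypothetical, not the crate's data.
static ALPHABETS: [(&str, &str); 3] = [
    ("English-Latin", "abcdefghijklmnopqrstuvwxyz"),
    ("German-Latin", "abcdefghijklmnopqrstuvwxyzäöüß"),
    ("Russian-Cyrillic", "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
];

/// Keeps every toy alphabet that contains all letters of `word`.
fn candidates(word: &str) -> Vec<&'static str> {
    ALPHABETS
        .iter()
        .filter(|(_, letters)| {
            word.chars()
                .flat_map(char::to_lowercase)
                .all(|c| letters.contains(c))
        })
        .map(|(name, _)| *name)
        .collect()
}
```

With these toy tables, "Hello" keeps both Latin candidates while "Straße" keeps only the German one. That shows why alphabet matching narrows the candidates rather than picking a single language, and why a model-based detector is still needed afterwards.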

Extras

See alphabets.rs for the languages whose alphabets are already defined. Some of them still need validation.

Warning: returned words can contain chars from the Unicode Private Use Area (for example in Lingala, Nuer or Yoruba), because char normalization composes letters with Inherited-script marks and Unicode defines no chars for some of these combinations.
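Callers that pass returned words on to other tools may want to notice such chars. A minimal sketch, using only the standard library and the three Private Use ranges defined by Unicode; the helper names is_private_use and has_private_use are hypothetical and not part of the crate.

```rust
/// Returns true if `ch` lies in one of the Unicode Private Use ranges:
/// U+E000..=U+F8FF (BMP), U+F0000..=U+FFFFD (plane 15),
/// U+100000..=U+10FFFD (plane 16).
fn is_private_use(ch: char) -> bool {
    matches!(
        ch,
        '\u{E000}'..='\u{F8FF}'
            | '\u{F0000}'..='\u{FFFFD}'
            | '\u{100000}'..='\u{10FFFD}'
    )
}

/// Returns true if any char of `word` is a Private Use code point.
fn has_private_use(word: &str) -> bool {
    word.chars().any(is_private_use)
}
```

A word flagged by has_private_use is only meaningful inside a pipeline that knows how those code points were assigned, so it is safer not to display or store it as ordinary text.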

Commit count: 104
