Crates.io | whatlang |
lib.rs | whatlang |
version | 0.16.4 |
source | src |
created_at | 2016-12-15 21:01:33.872021 |
updated_at | 2024-01-04 10:28:17.66793 |
description | Fast and lightweight language identification library for Rust. |
homepage | https://github.com/greyblake/whatlang-rs |
repository | https://github.com/greyblake/whatlang-rs |
id | 7608 |
size | 675,528 |
Natural language detection for Rust with a focus on simplicity and performance.
Example:

    use whatlang::{detect, Lang, Script};

    fn main() {
        let text = "Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!";
        let info = detect(text).unwrap();
        assert_eq!(info.lang(), Lang::Epo);
        assert_eq!(info.script(), Script::Latin);
        assert_eq!(info.confidence(), 1.0);
        assert!(info.is_reliable());
    }

For more details (e.g. how to blacklist some languages) please check the documentation.
Whatlang is used as a direct or indirect dependency for language recognition in several large projects, so you will be in good company using it.
Feature | Description |
---|---|
enum-map | Lang and Script implement the Enum trait from enum-map |
arbitrary | Supports Arbitrary |
serde | Implements Serialize and Deserialize for Lang and Script |
dev | Enables the whatlang::dev module, which provides some internal API. It exists for profiling purposes, and normal users are discouraged from relying on this API. |
The algorithm is based on trigram language models, which are a particular case of n-grams. To understand the idea, see the original paper by Cavnar and Trenkle ('94), "N-Gram-Based Text Categorization".
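To make the trigram idea concrete, here is a minimal sketch of rank-order classification in the spirit of the Cavnar and Trenkle paper. It is not Whatlang's actual implementation: the function names (`trigram_profile`, `out_of_place_distance`, `classify`), the profile size of 300, and the tiny training sentences are all assumptions chosen for illustration.

```rust
use std::collections::HashMap;

/// Build a ranked trigram profile: the most frequent character trigrams first.
/// `max_len` caps the profile size (an illustrative parameter).
fn trigram_profile(text: &str, max_len: usize) -> Vec<String> {
    // Lowercase and pad with spaces so that word boundaries form trigrams too.
    let normalized: Vec<char> = format!(" {} ", text.to_lowercase()).chars().collect();

    let mut counts: HashMap<String, usize> = HashMap::new();
    for window in normalized.windows(3) {
        *counts.entry(window.iter().collect::<String>()).or_insert(0) += 1;
    }

    let mut ranked: Vec<(String, usize)> = counts.into_iter().collect();
    // Descending by count; ties are broken alphabetically to keep the order deterministic.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.into_iter().take(max_len).map(|(trigram, _)| trigram).collect()
}

/// "Out-of-place" distance: sum of rank differences between two profiles.
/// A trigram missing from the language profile costs the maximum penalty.
fn out_of_place_distance(text_profile: &[String], lang_profile: &[String]) -> usize {
    let max_penalty = lang_profile.len();
    text_profile
        .iter()
        .enumerate()
        .map(|(rank, trigram)| match lang_profile.iter().position(|t| t == trigram) {
            Some(lang_rank) => rank.abs_diff(lang_rank),
            None => max_penalty,
        })
        .sum()
}

/// Pick the language whose profile is closest to the profile of `text`.
fn classify<'a>(text: &str, profiles: &'a [(&'a str, Vec<String>)]) -> Option<&'a str> {
    let text_profile = trigram_profile(text, 300);
    profiles
        .iter()
        .min_by_key(|(_, profile)| out_of_place_distance(&text_profile, profile))
        .map(|(name, _)| *name)
}

fn main() {
    // Toy "training data": real language models are built from large corpora.
    let profiles = vec![
        (
            "eng",
            trigram_profile("the quick brown fox jumps over the lazy dog and then the other dog", 300),
        ),
        (
            "spa",
            trigram_profile("el rápido zorro marrón salta sobre el perro perezoso y luego el otro perro", 300),
        ),
    ];
    println!("{:?}", classify("the dog and the fox", &profiles));
}
```

The linear `position` lookup keeps the sketch short; a production implementation would typically precompute a map from trigram to rank for each language.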
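How is is_reliable() calculated?

It is based on the following factors:

- How many unique trigrams are in the given text.
- How big the difference is between the first and the second (not returned) detected languages. This metric is called rate in the code base.

Therefore, it can be presented as a 2D space with a threshold function that splits it into "Reliable" and "Not reliable" areas. This function is a hyperbola.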
For more details, see the blog article Introduction to Rust Whatlang Library and Natural Language Identification Algorithms.
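The reliability threshold described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes the two axes are the number of unique trigrams and the rate, that rate is the relative gap between the two best scores, and that the hyperbola has the form `a / n + b`. The constants `HYPERBOLA_A` and `HYPERBOLA_B` and all function names are hypothetical.

```rust
// Hypothetical constants: the real values live in the Whatlang code base.
const HYPERBOLA_A: f64 = 12.0;
const HYPERBOLA_B: f64 = 0.05;

/// Relative gap between the best and the second-best language score.
/// Assumes both scores are positive and `best >= second`.
fn rate(best: f64, second: f64) -> f64 {
    (best - second) / second
}

/// Minimum rate required for a text with `unique_trigrams` distinct trigrams.
/// The curve is a hyperbola: short texts need a much larger gap.
fn threshold(unique_trigrams: usize) -> f64 {
    HYPERBOLA_A / unique_trigrams as f64 + HYPERBOLA_B
}

/// A detection counts as reliable when its point lies above the threshold curve.
fn is_reliable(unique_trigrams: usize, rate: f64) -> bool {
    // No trigrams means no evidence at all.
    unique_trigrams > 0 && rate > threshold(unique_trigrams)
}

fn main() {
    let r = rate(0.9, 0.6);
    for n in [5, 20, 100] {
        println!(
            "unique trigrams = {:3}, threshold = {:.3}, rate = {:.3}, reliable = {}",
            n,
            threshold(n),
            r,
            is_reliable(n, r)
        );
    }
}
```

With this shape, the same rate can be unreliable for a short text and reliable for a long one, which matches the intuition that more unique trigrams provide more evidence.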
- make bench - run performance benchmarks
- make doc - generate and open doc
- make test - run tests
- make watch - watch changes and run tests

Property | Whatlang | CLD2 | CLD3 |
---|---|---|---|
Implementation language | Rust | C++ | C++ |
Languages | 68 | 83 | 107 |
Algorithm | trigrams | quadgrams | neural network |
Supported Encoding | UTF-8 | UTF-8 | ? |
HTML support | no | yes | ? |
You can support the project by donating NEAR tokens.
Our NEAR wallet address is whatlang.near
Whatlang is a derivative work from Franc (JavaScript, MIT) by Titus Wormer.