Crates.io | nlprule |
lib.rs | nlprule |
version | 0.6.4 |
source | src |
created_at | 2021-01-17 10:41:28.588653 |
updated_at | 2021-04-24 09:41:23.226048 |
description | A fast, low-resource Natural Language Processing and Error Correction library. |
homepage | |
repository | https://github.com/bminixhofer/nlprule |
max_upload_size | 104857600 |
id | 343082 |
size | 349,311 |
A fast, low-resource Natural Language Processing and Error Correction library written in Rust. nlprule implements a rule- and lookup-based approach to NLP using resources from LanguageTool.
Install: pip install nlprule
Use:
from nlprule import Tokenizer, Rules
tokenizer = Tokenizer.load("en")
rules = Rules.load("en", tokenizer)
rules.correct("He wants that you send him an email.")
# returns: 'He wants you to send him an email.'
rules.correct("I can due his homework.")
# returns: 'I can do his homework.'
for s in rules.suggest("She was not been here since Monday."):
print(s.start, s.end, s.replacements, s.source, s.message)
# prints:
# 4 16 ['was not', 'has not been'] WAS_BEEN.1 Did you mean was not or has not been?
for sentence in tokenizer.pipe("A brief example is shown."):
for token in sentence:
print(
repr(token.text).ljust(10),
repr(token.span).ljust(10),
repr(token.tags).ljust(24),
repr(token.lemmas).ljust(24),
repr(token.chunks).ljust(24),
)
# prints:
# 'A' (0, 1) ['DT'] ['A', 'a'] ['B-NP-singular']
# 'brief' (2, 7) ['JJ'] ['brief'] ['I-NP-singular']
# 'example' (8, 15) ['NN:UN'] ['example'] ['E-NP-singular']
# 'is' (16, 18) ['VBZ'] ['be', 'is'] ['B-VP']
# 'shown' (19, 24) ['VBN'] ['show', 'shown'] ['I-VP']
# '.' (24, 25) ['.', 'PCT', 'SENT_END'] ['.'] ['O']
Recommended setup:
Cargo.toml
[dependencies]
nlprule = "<version>"
[build-dependencies]
nlprule-build = "<version>" # must be the same as the nlprule version!
build.rs
fn main() {
println!("cargo:rerun-if-changed=build.rs");
nlprule_build::BinaryBuilder::new(
&["en"],
std::env::var("OUT_DIR").expect("OUT_DIR is set when build.rs is running"),
)
.build()
.validate();
}
src/main.rs
use nlprule::{Rules, Tokenizer, tokenizer_filename, rules_filename};
fn main() {
let mut tokenizer_bytes: &'static [u8] = include_bytes!(concat!(
env!("OUT_DIR"),
"/",
tokenizer_filename!("en")
));
let mut rules_bytes: &'static [u8] = include_bytes!(concat!(
env!("OUT_DIR"),
"/",
rules_filename!("en")
));
let tokenizer = Tokenizer::from_reader(&mut tokenizer_bytes).expect("tokenizer binary is valid");
let rules = Rules::from_reader(&mut rules_bytes).expect("rules binary is valid");
assert_eq!(
rules.correct("She was not been here since Monday.", &tokenizer),
String::from("She was not here since Monday.")
);
}
nlprule
and nlprule-build
versions are kept in sync.
|Disambiguation rules| | |Grammar rules| | LT version | nlprule time | LanguageTool time | |
---|---|---|---|---|---|
English | 843 (100%) | 3725 (~ 85%) | 5.2 | 1 | 1.7 - 2.0 |
German | 486 (100%) | 2970 (~ 90%) | 5.2 | 1 | 2.4 - 2.8 |
Spanish | Experimental support. Not fully tested yet. |
See the benchmark issue for details.
Please submit a PR to add your project!
All credit for the resources used in nlprule goes to LanguageTool who have made a Herculean effort to create high-quality resources for Grammatical Error Correction and broader NLP.
nlprule is licensed under the MIT license or Apache-2.0 license, at your option.
The nlprule binaries (*.bin
) are derived from LanguageTool v5.2 and licensed under the LGPLv2.1 license. nlprule statically and dynamically links to these binaries. Under LGPLv2.1 ยง6(a) this does not have any implications on the license of nlprule itself.