| Field | Value |
|---|---|
| Crates.io | n_gram |
| lib.rs | n_gram |
| version | 0.1.12 |
| source | src |
| created_at | 2024-02-26 07:39:02.504758 |
| updated_at | 2024-06-10 13:23:01.885122 |
| description | Simple library for training n-gram language models. |
| homepage | |
| repository | https://github.com/georgiyozhegov/n_gram.git |
| max_upload_size | |
| id | 1153133 |
| size | 23,016 |
A simple tool for training n-gram language models. Inspired by this course.
```rust
use n_gram::*;

fn main() {
    // Initialize the model
    let config = Config::default();
    let mut model = Model::new(config);

    // Load and tokenize the corpus
    let corpus = tiny_corpus()
        .iter()
        .map(|t| sos(eos(tokenize(t.to_owned()))))
        .collect::<Vec<_>>();

    model.train(corpus);

    // Now you are ready to generate something
    let mut tokens = sos(tokenize("The quick".to_owned()));
    let max = 10; // maximum number of generated tokens
    model.generate(&mut tokens, max);

    // Save the model
    model.save("model.json").unwrap();

    // Reset the model
    model.reset();

    // Load the model back
    model.load("model.json").unwrap();
}
```
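For intuition, an n-gram model boils down to a lookup table of counts: each context of n−1 tokens maps to counts of the tokens that follow it. The sketch below is a hypothetical illustration of that idea for trigrams (it is not the crate's internal implementation; `train` and `next_token` are names invented here), using greedy generation that always picks the most frequent continuation:

```rust
use std::collections::HashMap;

// Context of two tokens -> counts of each possible next token.
type Counts = HashMap<(String, String), HashMap<String, u32>>;

// Count every trigram (a, b, c) in the corpus: context (a, b), next token c.
fn train(corpus: &[Vec<&str>]) -> Counts {
    let mut counts: Counts = HashMap::new();
    for sentence in corpus {
        for w in sentence.windows(3) {
            let context = (w[0].to_string(), w[1].to_string());
            *counts
                .entry(context)
                .or_default()
                .entry(w[2].to_string())
                .or_insert(0) += 1;
        }
    }
    counts
}

// Greedy decoding: return the most frequent continuation of (a, b), if any.
fn next_token(counts: &Counts, a: &str, b: &str) -> Option<String> {
    counts
        .get(&(a.to_string(), b.to_string()))?
        .iter()
        .max_by_key(|(_, c)| *c)
        .map(|(t, _)| t.clone())
}

fn main() {
    let corpus = vec![
        vec!["<sos>", "the", "quick", "brown", "fox", "<eos>"],
        vec!["<sos>", "the", "quick", "red", "fox", "<eos>"],
        vec!["<sos>", "the", "quick", "brown", "dog", "<eos>"],
    ];
    let model = train(&corpus);
    // ("the", "quick") is followed by "brown" twice and "red" once,
    // so the greedy choice is "brown".
    assert_eq!(next_token(&model, "the", "quick"), Some("brown".into()));
}
```

A real model would smooth these counts and sample from the resulting distribution rather than always taking the argmax, but the counting step is the same.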
I've trained a trigram model on 20,000 samples from the TinyStories dataset. Here are some examples of generated text:
```shell
cargo add n_gram
```

If you want to save & load your models:

```shell
cargo add n_gram --features=saveload
```

If you want to load the tiny corpus for training:

```shell
cargo add n_gram --features=corpus
```
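Equivalently, both optional features can be enabled directly in `Cargo.toml` (the version below is taken from the crate metadata above):

```toml
[dependencies]
n_gram = { version = "0.1.12", features = ["saveload", "corpus"] }
```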