Crates.io | markov_str |
lib.rs | markov_str |
source | src |
created_at | 2024-08-14 14:13:52.00757+00 |
updated_at | 2025-01-03 06:55:36.115899+00 |
description | Markov Chain implementation optimized for text generation. |
repository | https://github.com/Brogolem35/markov_str |
id | 1337387 |
markov_str is a fast and memory-efficient Markov chain implementation, optimized for text generation.
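The sketch below illustrates what a Markov chain for text generation records. It is a minimal, std-only, order-`n` word-level chain: each state is the last `n` tokens, and the chain counts how often each token follows that state. This is not markov_str's implementation or API. The `Chain` type, its `train` and `count` methods, and the whitespace tokenizer are all hypothetical. The library's own example passes a `Regex` for tokenizing instead.

```rust
use std::collections::HashMap;

/// Hypothetical order-n word-level Markov chain (illustration only).
struct Chain {
    order: usize,
    // Maps a state (the last `order` tokens) to counts of the tokens that followed it.
    transitions: HashMap<Vec<String>, HashMap<String, u32>>,
}

impl Chain {
    fn new(order: usize) -> Self {
        Chain { order, transitions: HashMap::new() }
    }

    /// Splits text on whitespace (assumed tokenizer) and counts every
    /// (state -> next token) transition using a sliding window.
    fn train(&mut self, text: &str) {
        let tokens: Vec<String> = text.split_whitespace().map(String::from).collect();
        for window in tokens.windows(self.order + 1) {
            let state = window[..self.order].to_vec();
            let next = window[self.order].clone();
            *self
                .transitions
                .entry(state)
                .or_default()
                .entry(next)
                .or_insert(0) += 1;
        }
    }

    /// Returns how often `next` followed the given state during training.
    fn count(&self, state: &[&str], next: &str) -> u32 {
        let key: Vec<String> = state.iter().map(|s| s.to_string()).collect();
        self.transitions
            .get(&key)
            .and_then(|m| m.get(next))
            .copied()
            .unwrap_or(0)
    }
}

fn main() {
    let mut chain = Chain::new(2);
    chain.train("the cat sat on the mat and the cat ran");
    // The state "the cat" was followed once by "sat" and once by "ran".
    println!("{}", chain.count(&["the", "cat"], "sat"));
    println!("{}", chain.count(&["the", "cat"], "ran"));
}
```

Storing counts instead of a list of every observed successor keeps repeated transitions cheap. A production implementation would likely go further, for example by interning tokens as integers.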
… the `serialize` flag is used.

```rust
use std::fs::{self, read_to_string};

use markov_str::MarkovChain;
use regex::Regex;

// Tokenizer pattern. Assumed here for illustration; examples/main.rs defines its own WORD_REGEX.
const WORD_REGEX: &str = r"\w+|[^\w\s]+";

fn main() {
    let training_path = "data";

    // Gets the paths of every file and directory in the training_path.
    let tpaths = fs::read_dir(training_path)
        .unwrap_or_else(|_| panic!("Can't read files from: {}", training_path));

    // Only the files remain
    let files = tpaths
        .filter_map(|f| f.ok())
        .filter(|f| match f.file_type() {
            Err(_) => false,
            Ok(f) => f.is_file(),
        });

    // Reads every file into a string, skipping unreadable ones
    let contents = files.filter_map(|f| read_to_string(f.path()).ok());

    // Creating the Markov Chain and training it on every file
    let markov_chain = contents.fold(
        MarkovChain::with_capacity(2, 8_000_000, Regex::new(WORD_REGEX).unwrap()),
        |mut a, s| {
            a.add_text(&s);
            a
        },
    );

    // Number of tokens
    println!("{}", markov_chain.len());

    // Generation
    for _ in 0..10 {
        println!("{}", markov_chain.generate_start("among the ", 25).unwrap());
    }
}
```
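Judging by its name and arguments, `generate_start("among the ", 25)` continues a prompt. The std-only sketch below shows how prompt-seeded generation commonly works:

1. Take the prompt's last `order` tokens as the current state.
2. Pick the next token with probability proportional to how often it followed that state.
3. Slide the window and repeat until a length limit (assumed here to be a token count) or a state with no recorded successors.

This is an assumption about the general technique, not markov_str's code. `Transitions`, `generate_from`, `pick`, and the small xorshift generator are hypothetical. A real program would more likely use a crate such as `rand`.

```rust
use std::collections::HashMap;

// State (last `order` tokens) -> list of (next token, count). Hypothetical layout.
type Transitions = HashMap<Vec<String>, Vec<(String, u32)>>;

/// Tiny xorshift64 PRNG so the sketch needs no external crates (seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Picks a successor with probability proportional to its count (counts assumed > 0).
fn pick<'a>(choices: &'a [(String, u32)], rng: &mut XorShift) -> &'a str {
    let total: u64 = choices.iter().map(|(_, c)| *c as u64).sum();
    let mut roll = rng.next_u64() % total;
    for (token, count) in choices {
        if roll < *count as u64 {
            return token;
        }
        roll -= *count as u64;
    }
    unreachable!("roll is always below the total count")
}

/// Continues `prompt` for up to `max_tokens` new tokens, stopping early
/// when the current state has no recorded successors.
fn generate_from(
    t: &Transitions,
    order: usize,
    prompt: &str,
    max_tokens: usize,
    rng: &mut XorShift,
) -> Option<String> {
    let mut tokens: Vec<String> = prompt.split_whitespace().map(String::from).collect();
    if tokens.len() < order {
        return None; // The prompt is too short to form a state.
    }
    for _ in 0..max_tokens {
        let state = tokens[tokens.len() - order..].to_vec();
        match t.get(&state) {
            Some(choices) if !choices.is_empty() => {
                let next = pick(choices, rng).to_string();
                tokens.push(next);
            }
            _ => break, // Dead end: no token ever followed this state.
        }
    }
    Some(tokens.join(" "))
}

fn main() {
    // A small hand-built table standing in for a trained order-2 chain.
    let mut t: Transitions = HashMap::new();
    let key = |a: &str, b: &str| vec![a.to_string(), b.to_string()];
    t.insert(key("among", "the"), vec![("trees".to_string(), 3), ("stars".to_string(), 1)]);
    t.insert(key("the", "trees"), vec![("stood".to_string(), 1)]);
    t.insert(key("the", "stars"), vec![("shone".to_string(), 1)]);

    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    if let Some(text) = generate_from(&t, 2, "among the ", 25, &mut rng) {
        println!("{}", text);
    }
}
```

Sampling by count means frequent continuations ("trees", 3 of 4 observations) are chosen more often without ever ruling out rarer ones.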
This example is taken from `examples/main.rs`. You can run it with:

```sh
./get_data.sh
cargo run --release --example=main
```
`./get_data.sh` downloads the first 200 books from Project Gutenberg, which together total more than 100 MB of text.
markov_str is licensed under the MPL-2.0. You can use it both in open-source software under other licenses and in proprietary software, as long as changes to the original code are shared under the same license.
Feel free to open issues and pull requests. If you want to help with what I am currently working on, take a look at the Stuff left to do section.