# Vaporetto

Vaporetto is a fast and lightweight tokenizer based on pointwise prediction.

## Examples

```rust
use std::fs::File;

use vaporetto::{Model, Predictor, Sentence};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load a trained model from disk.
    let f = File::open("../resources/model.bin")?;
    let model = Model::read(f)?;
    // The second argument enables tag prediction.
    let predictor = Predictor::new(model, true)?;

    let mut buf = String::new();

    // A single `Sentence` can be reused for multiple inputs.
    let mut s = Sentence::default();

    s.update_raw("まぁ社長は火星猫だ")?;
    predictor.predict(&mut s);
    // Fill in the predicted tags (part of speech and pronunciation here).
    s.fill_tags();
    // Each token is written as surface/tag/tag, separated by spaces.
    s.write_tokenized_text(&mut buf);
    assert_eq!(
        "まぁ/名詞/マー 社長/名詞/シャチョー は/助詞/ワ 火星/名詞/カセー 猫/名詞/ネコ だ/助動詞/ダ",
        buf,
    );

    s.update_raw("まぁ良いだろう")?;
    predictor.predict(&mut s);
    s.fill_tags();
    s.write_tokenized_text(&mut buf);
    assert_eq!(
        "まぁ/副詞/マー 良い/形容詞/ヨイ だろう/助動詞/ダロー",
        buf,
    );

    Ok(())
}
```

## Feature flags

The following features are disabled by default:

* `kytea` - Enables the reader for models generated by KyTea.
* `train` - Enables the trainer.
* `portable-simd` - Uses the [portable SIMD API](https://github.com/rust-lang/portable-simd) instead of our SIMD-conscious data layout. (Requires nightly Rust.)

The following features are enabled by default:

* `std` - Uses the standard library. If disabled, it uses the core library instead.
* `cache-type-score` - Enables caching type scores for faster processing. If disabled, type scores are calculated in a straightforward manner.
* `fix-weight-length` - Uses fixed-size arrays for storing scores to facilitate optimization. If disabled, vectors are used instead.
* `tag-prediction` - Enables tag prediction.
* `charwise-pma` - Uses the [Charwise Daachorse](https://docs.rs/daachorse/latest/daachorse/charwise/index.html) instead of the standard version for faster prediction, although it can make loading a model file slower.

## Notes for distributed models

The distributed models are compressed in the zstd format. `Model::read` does not decompress them, so you must decompress these models yourself before loading them.
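If your program may receive both compressed and uncompressed model files, one way to tell them apart is to check for the zstd frame magic number, which appears on disk as the bytes `28 B5 2F FD` at the start of a standard zstd frame (RFC 8878). The following is a minimal sketch using only the standard library. The helper names `has_zstd_magic` and `is_zstd_file` are hypothetical and not part of Vaporetto, and the sketch assumes the file begins with a standard zstd frame rather than a skippable frame.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Magic number of a standard zstd frame (0xFD2FB528),
/// stored little-endian on disk as 28 B5 2F FD.
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xB5, 0x2F, 0xFD];

/// Hypothetical helper: returns true if `bytes` starts with the zstd magic number.
fn has_zstd_magic(bytes: &[u8]) -> bool {
    bytes.starts_with(&ZSTD_MAGIC)
}

/// Hypothetical helper: reads the first four bytes of the file at `path`
/// and reports whether it looks like a zstd-compressed file.
fn is_zstd_file(path: &Path) -> io::Result<bool> {
    let mut header = [0u8; 4];
    let mut file = File::open(path)?;
    match file.read_exact(&mut header) {
        Ok(()) => Ok(has_zstd_magic(&header)),
        // A file shorter than four bytes cannot hold a zstd frame.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // Hypothetical path; replace it with the model file you downloaded.
    let path = Path::new("path/to/model.bin.zst");
    if path.exists() {
        if is_zstd_file(path)? {
            println!("zstd-compressed: decompress before calling Model::read");
        } else {
            println!("not zstd-compressed: pass the file to Model::read directly");
        }
    }
    Ok(())
}
```

This only decides whether decompression is needed; the decompression itself still requires a zstd decoder.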
To decompress a model while loading it, you can wrap the file in a streaming decoder:

```rust
// Requires the zstd crate (ruzstd is a pure-Rust alternative with a different API).
use std::fs::File;
use vaporetto::Model;

let reader = zstd::Decoder::new(File::open("path/to/model.bin.zst")?)?;
let model = Model::read(reader)?;
```

You can also decompress the file with the *unzstd* command, which is included in the zstd package available on most Linux distributions.

## License

Licensed under either of

* Apache License, Version 2.0
  ([LICENSE-APACHE](../LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
* MIT license
  ([LICENSE-MIT](../LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

## Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.