//! # wordfreq-model //! //! This crate provides a loader for pre-compiled wordfreq models, //! allowing you to easily create [`WordFreq`] instances for various languages. //! //! ## Instructions //! //! The provided models are the same as those distributed in the [original Python package](https://doi.org/10.5281/zenodo.7199437). //! See the [original documentation](https://github.com/rspeer/wordfreq/tree/v3.0.2#sources-and-supported-languages) //! for the supported languages and their sources. //! //! You need to specify models you want to use with `features`. //! The feature names are in the form of `large-xx` or `small-xx`, where `xx` is the language code. //! For example, if you want to use the large-English and small-Japanese models, //! specify `large-en` and `small-ja` as follows: //! //! ```toml //! # Cargo.toml //! //! [dependencies.wordfreq-model] //! version = "0.2" //! features = ["large-en", "small-ja"] //! ``` //! //! There is no default feature. //! **Be sure to specify features you want to use.** //! //! ## Examples //! //! [`load_wordfreq`] can create a [`WordFreq`] instance from a [`ModelKind`] enum value. //! [`ModelKind`] will have the specified feature names in CamelCase, such as `LargeEn` or `SmallJa`. //! //! By default, only [`ModelKind::ExampleEn`] appears for tests. //! //! ``` //! use approx::assert_relative_eq; //! use wordfreq_model::load_wordfreq; //! use wordfreq_model::ModelKind; //! //! let wf = load_wordfreq(ModelKind::ExampleEn).unwrap(); //! assert_relative_eq!(wf.word_frequency("las"), 0.25); //! assert_relative_eq!(wf.word_frequency("vegas"), 0.75); //! assert_relative_eq!(wf.word_frequency("Las"), 0.25); // Standardized //! ``` //! //! ## Standardization //! //! As the above example shows, the model automatically standardizes words before looking them up (i.e., `Las` is handled as `las`). //! This is done by an instance [`Standardizer`] set up in the [`WordFreq`] instance. //! [`load_wordfreq`] automatically sets up an appropriate [`Standardizer`] instance for each language. //! //! ## Notes //! //! This crate downloads specified model files and embeds the models directly into the source code. //! **Specify as many models as you need** to avoid extra downloads and bloating the resulting binary. //! //! The actual model files to be used are placed [here](https://github.com/kampersanda/wordfreq-rs/releases/tag/models-v1) together with the credits. //! If you do not desire automatic model downloads and binary embedding, you can create instances from these files directly. //! See the instructions in [wordfreq]. use std::env; use anyhow::Result; use wordfreq::Standardizer; use wordfreq::WordFreq; /// Supported model kinds. /// /// Since only specified feature names are available, /// only [`ModelKind::ExampleEn`] will be displayed in docs.rs. /// Normally, those you specify will be available such as `LargeEn` or `SmallJa`. /// /// If models you want to use are not available, /// specify the features following the Instructions. pub enum ModelKind {{ /// Example data for tests. ExampleEn, {model_kind_block} }} const DATA_EXAMPLE_EN: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/example_en.bin")); {const_block} /// Loads a pre-compiled [`WordFreq`] model, setting up an appropriate [`Standardizer`] instance. pub fn load_wordfreq(kind: ModelKind) -> Result {{ match kind {{ ModelKind::ExampleEn => {{ Ok(WordFreq::deserialize(DATA_EXAMPLE_EN)?.standardizer(Standardizer::new("en")?)) }} {match_block} }} }}