tiniestsegmenter

Crates.io: tiniestsegmenter
lib.rs: tiniestsegmenter
version: 0.3.0
source: src
created_at: 2024-05-11 16:08:44.078953
updated_at: 2024-09-24 13:27:02.203242
description: Compact Japanese segmenter
homepage: https://github.com/jwnz/tiniestsegmenter
repository: https://github.com/jwnz/tiniestsegmenter
max_upload_size:
id: 1236896
size: 60,236
owner: jwnz

README

TiniestSegmenter

A port of TinySegmenter written in pure, safe Rust with no dependencies. Bindings are available for both Rust and Python.

TinySegmenter is an n-gram word tokenizer for Japanese text originally built by Taku Kudo (2008).

Usage

Add the crate to your project: cargo add tiniestsegmenter.

use tiniestsegmenter as ts;

fn main() {
    // Tokenize a Japanese sentence; the tokens borrow from the input string.
    let tokens: Vec<&str> = ts::tokenize("ジャガイモが好きです。");
    println!("{:?}", tokens);
}
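Since tokenize returns a plain Vec<&str>, the result composes with ordinary Rust collection code. As one sketch, the tokens can be tallied into a frequency map; the token vector below is a hand-written stand-in for tokenizer output (not a claim about the crate's exact segmentation), so the example compiles without the crate.

```rust
use std::collections::HashMap;

// Count occurrences of each token in a slice shaped like the
// Vec<&str> that tiniestsegmenter::tokenize returns.
fn token_counts<'a>(tokens: &[&'a str]) -> HashMap<&'a str, usize> {
    let mut counts = HashMap::new();
    for tok in tokens {
        *counts.entry(*tok).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Hypothetical token list standing in for tokenizer output.
    let tokens = vec!["ジャガイモ", "が", "好き", "です", "。"];
    let counts = token_counts(&tokens);
    println!("{} tokens, {} unique", tokens.len(), counts.len());
}
```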
