| Field | Value |
|---|---|
| Crates.io | tiniestsegmenter |
| lib.rs | tiniestsegmenter |
| version | 0.3.0 |
| source | src |
| created_at | 2024-05-11 16:08:44.078953 |
| updated_at | 2024-09-24 13:27:02.203242 |
| description | Compact Japanese segmenter |
| homepage | https://github.com/jwnz/tiniestsegmenter |
| repository | https://github.com/jwnz/tiniestsegmenter |
| max_upload_size | |
| id | 1236896 |
| size | 60,236 |
A port of TinySegmenter written in pure, safe Rust with no dependencies. Bindings are available for both Rust and Python.
TinySegmenter is an n-gram word tokenizer for Japanese text, originally built by Taku Kudo (2008).
## Usage
Add the crate to your project: `cargo add tiniestsegmenter`.
```rust
use tiniestsegmenter as ts;

fn main() {
    // Segment a Japanese sentence into word-level tokens.
    let tokens: Vec<&str> = ts::tokenize("ジャガイモが好きです。");
}
```
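
Since `tokenize` returns a plain `Vec<&str>`, the tokens can be inspected like any other slice of string references. A minimal sketch (the segmentation shown in the comment is illustrative of how the TinySegmenter model typically splits this sentence, not guaranteed output):

```rust
use tiniestsegmenter as ts;

fn main() {
    let tokens: Vec<&str> = ts::tokenize("ジャガイモが好きです。");

    // Join the tokens with a separator to inspect the segmentation.
    // Typical result: ジャガイモ / が / 好き / です / 。
    println!("{}", tokens.join(" / "));
}
```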