| Crates.io | kizame |
| lib.rs | kizame |
| version | 0.1.0 |
| created_at | 2026-01-05 23:52:54.0205+00 |
| updated_at | 2026-01-05 23:52:54.0205+00 |
| description | KizaMe (刻め!) - CLI for MeCrab morphological analyzer and data pipeline |
| homepage | |
| repository | https://github.com/cool-japan/mecrab |
| max_upload_size | |
| id | 2024825 |
| size | 206,414 |
CLI for MeCrab morphological analyzer.
MeCab → KizaMe (刻め = "Chop up!")
# Default (lightweight)
cargo install kizame
# With Wikidata builder
cargo install kizame --features builder
# Basic parsing
echo "すもももももももものうち" | kizame
kizame -d /var/lib/mecab/dic/ipadic-utf8 parse
# Wakati (space-separated)
echo "日本語" | kizame -w
# JSON output
echo "東京都" | kizame -O json
# With IPA pronunciation (requires | cat for terminal display)
echo "こんにちは" | kizame parse --with-ipa | cat
# With word embeddings
echo "東京に行く" | kizame parse --with-vector -v vectors.bin | cat
# With both IPA and vectors
echo "私は学生です" | kizame parse --with-ipa --with-vector -v vectors.bin | cat
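Because wakati mode (`-w`) emits space-separated surface forms, its output can be post-processed with standard Unix tools. The following is a minimal sketch, not part of kizame itself: `top_tokens` is a hypothetical helper name, and it assumes the input file holds wakati output with tokens separated by single spaces.

```shell
#!/bin/sh
# top_tokens FILE [N]: print the N most frequent tokens in a wakati-style file
# as "token<TAB>count". Hypothetical helper, not a kizame subcommand.
# Assumes tokens are separated by spaces (as produced by `kizame -w`).
top_tokens() {
    file=$1
    n=${2:-10}
    (
        # Bytewise collation keeps sort/uniq behaviour predictable for UTF-8 text.
        LC_ALL=C
        export LC_ALL
        tr -s ' ' '\n' < "$file" \
            | grep -v '^$' \
            | sort \
            | uniq -c \
            | sort -k1,1nr -k2,2 \
            | head -n "$n" \
            | awk '{ print $2 "\t" $1 }'
    )
}

# Usage (assumes wakati.txt was produced with `kizame -w`):
#   top_tokens wakati.txt 20
```

Ties are broken by byte order of the token, which is stable but not linguistically meaningful for Japanese text.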
kizame dict init # Find IPADIC
kizame dict info # Show stats
kizame dict dump -d /path # Inspect
kizame dict dump -d /path --vocab > vocab.txt # Extract vocabulary
# Train Word2Vec embeddings (Pure Rust!)
kizame vectors train \
-i corpus_ids.txt \
-o vectors.bin \
-f mcv1 \
--max-word-id 392126 \
--size 100 \
--window 5 \
--negative 5 \
--epochs 3 \
--threads 8
# Convert word2vec text format to MCV1
kizame vectors convert \
-i word2vec.txt \
-o vectors.bin \
-f word2vec-text \
-v vocab.txt
# Show vector file info
kizame vectors info -v vectors.bin
# Launch interactive lattice debugger
kizame explore "東京に行く"
# With custom dictionary
kizame explore -d /path/to/dict "テキスト"
# With semantic pool
kizame explore -s /path/to/semantic.bin "東京都"
Features:
? for help, q to quit
kizame build \
--source ipadic.csv \
--wikidata latest-all.json.gz \
--output ./semantic-dic
| Flag | Description |
|---|---|
| -d, --dicdir | Dictionary path |
| -O, --output-format | default/wakati/dump/json/jsonld/turtle/ntriples/nquads |
| -w, --wakati | Space-separated output |
| -n, --nbest | N-best output count |
| --with-ipa | Include IPA pronunciation (requires \| cat) |
| --with-vector | Include word embeddings |
| -v, --vector-pool | Path to vector file (MCV1 format) |
| --with-semantic | Include Wikidata URIs (semantic formats) |
| Format | Description |
|---|---|
| default | MeCab-compatible output |
| wakati | Space-separated surface forms |
| dump | Full token details |
| json | JSON array output |
| jsonld | JSON-LD with semantic URIs |
| turtle | Turtle (TTL) RDF format |
| ntriples | N-Triples RDF format |
| nquads | N-Quads RDF format |
# Basic parsing
echo "東京都庁で会議" | kizame
# JSON-LD with semantic information
echo "東京都" | kizame -O jsonld --with-semantic
# Turtle (TTL) RDF output
echo "東京に行く" | kizame -O turtle
# N-Triples RDF output
echo "東京に行く" | kizame -O ntriples
# N-Quads RDF output
echo "東京に行く" | kizame -O nquads
# N-best paths
echo "すもも" | kizame -n 5
# Streaming large files
cat large_corpus.txt | kizame -w > tokenized.txt
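When a corpus is split across many files, the streaming pattern above can be wrapped in a small loop. This is a sketch under stated assumptions rather than a kizame feature: `tokenize_dir` is a hypothetical helper name, inputs are assumed to be `.txt` files, and `kizame` is assumed to be on `PATH` with a dictionary it can find by default. Only the `-w` flag documented above is used.

```shell
#!/bin/sh
# tokenize_dir SRC DST: run `kizame -w` over every .txt file in SRC and
# write one tokenized file per input into DST (created if missing).
# Hypothetical helper; stops at the first file that fails.
tokenize_dir() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/*.txt; do
        # If the glob matched nothing, "$f" is the literal pattern; skip it.
        [ -e "$f" ] || continue
        cat "$f" | kizame -w > "$dst/$(basename "$f")" || return 1
    done
}

# Usage (directory names are placeholders):
#   tokenize_dir ./corpus ./tokenized
```

Each output file keeps the name of its input, so existing files with the same name in the destination directory are overwritten.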
MIT OR Apache-2.0