mecrab

version: 0.1.0
created_at: 2026-01-05 23:41:51.877195+00
updated_at: 2026-01-05 23:41:51.877195+00
description: A high-performance, thread-safe morphological analyzer compatible with MeCab, written in pure Rust
repository: https://github.com/cool-japan/mecrab
id: 2024805
size: 383,883
KitaSan (cool-japan)

mecrab

Core runtime library for the MeCrab morphological analyzer.

Features

  • MeCab Compatible: Works with IPADIC/UniDic dictionaries
  • High Performance: Memory-mapped dictionaries, SIMD-optimized Viterbi (AVX2)
  • Thread-safe: Safe concurrent access
  • Live Updates: Add/remove words at runtime
  • Semantic Linking: Wikidata URI attachment with JSON-LD/RDF export
  • N-best Search: A* algorithm for multiple path analysis
  • Streaming: Sentence boundary detection for large text processing
  • Phonetic Transduction: Kana/Romaji/X-SAMPA/IPA conversion
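
To make the "Thread-safe" and "Live Updates" items concrete, here is a minimal sketch of one common way to let many threads read a word overlay while words are added or removed at runtime. It uses only the standard library. The `Overlay` type and its methods are hypothetical names chosen for illustration, and nothing here is taken from MeCrab's actual implementation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical overlay of user-added words (not MeCrab's real type).
/// Maps a surface form to its word cost.
#[derive(Default)]
struct Overlay {
    words: RwLock<HashMap<String, i32>>,
}

impl Overlay {
    /// Takes `&self`: the write lock provides the mutation, so callers
    /// can share the overlay behind an `Arc` without `&mut` access.
    fn add_word(&self, surface: &str, cost: i32) {
        self.words.write().unwrap().insert(surface.to_string(), cost);
    }

    /// Returns true if the word was present and has been removed.
    fn remove_word(&self, surface: &str) -> bool {
        self.words.write().unwrap().remove(surface).is_some()
    }

    /// Any number of readers may hold the read lock at the same time.
    fn cost(&self, surface: &str) -> Option<i32> {
        self.words.read().unwrap().get(surface).copied()
    }
}

/// Adds a word, reads it from several threads, then removes it.
fn demo() -> (Option<i32>, Option<i32>) {
    let overlay = Arc::new(Overlay::default());
    overlay.add_word("ChatGPT", 5000);

    let readers: Vec<_> = (0..4)
        .map(|_| {
            let shared = Arc::clone(&overlay);
            thread::spawn(move || shared.cost("ChatGPT"))
        })
        .collect();
    for reader in readers {
        assert_eq!(reader.join().unwrap(), Some(5000));
    }

    let before = overlay.cost("ChatGPT");
    overlay.remove_word("ChatGPT");
    (before, overlay.cost("ChatGPT"))
}

fn main() {
    let (before, after) = demo();
    println!("before removal: {:?}, after removal: {:?}", before, after);
}
```

A read-write lock favors the lookup-heavy workload of an analyzer: lookups proceed in parallel and only dictionary updates take exclusive access.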

Installation

[dependencies]
mecrab = "0.1"

Feature Flags

Feature    Description
json       JSON output format
parallel   Parallel batch processing (rayon)
simd       SIMD optimizations (AVX2)
wasm       WebAssembly bindings
python     Python bindings (PyO3)

Usage

use mecrab::MeCrab;
use mecrab::phonetic::PhoneticTransducer;
use mecrab::viterbi::NbestSearch;

// `?` needs an enclosing function that returns a Result, so the
// snippet is wrapped in `main`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Basic analysis
    let mecrab = MeCrab::new()?;
    let result = mecrab.parse("すもももももももものうち")?;
    println!("{}", result);

    // Add custom words
    mecrab.add_word("ChatGPT", "チャットジーピーティー", "チャットジーピーティー", 5000);

    // N-best paths: request up to 5 candidate analyses
    let nbest = NbestSearch::new(&mecrab);
    for path in nbest.search("東京", 5)? {
        println!("Cost: {}", path.total_cost);
    }

    // Phonetic conversion
    let transducer = PhoneticTransducer::new();
    println!("{}", transducer.to_romaji("こんにちは")); // konnichiha

    Ok(())
}
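
For readers unfamiliar with kana-to-romaji transduction, the idea can be sketched as a per-character table lookup. This is a simplified illustration, not `PhoneticTransducer`'s implementation: the function names are hypothetical, the table covers only a handful of hiragana, and digraphs, long vowels, and the particle reading of は are ignored.

```rust
/// Hypothetical lookup for a few hiragana; a full table would cover
/// the whole syllabary plus digraphs such as きゃ.
fn kana_to_romaji(c: char) -> Option<&'static str> {
    Some(match c {
        'あ' => "a",
        'い' => "i",
        'う' => "u",
        'え' => "e",
        'お' => "o",
        'こ' => "ko",
        'ち' => "chi",
        'に' => "ni",
        'は' => "ha",
        'ん' => "n",
        _ => return None,
    })
}

/// Transliterates character by character; anything outside the table
/// is passed through unchanged.
fn to_romaji(text: &str) -> String {
    let mut out = String::new();
    for c in text.chars() {
        match kana_to_romaji(c) {
            Some(romaji) => out.push_str(romaji),
            None => out.push(c),
        }
    }
    out
}

fn main() {
    println!("{}", to_romaji("こんにちは")); // konnichiha
}
```

Because the lookup is purely orthographic, は maps to "ha" even where it is pronounced "wa", which is consistent with the `konnichiha` output shown in the usage example.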

Module Structure

mecrab/src/
├── lib.rs           # Public API
├── dict/            # Dictionary loading
│   ├── mod.rs       # Token, SysDic, OverlayDictionary
│   └── user_dict.rs # User dictionary persistence
├── lattice/         # Lattice construction
├── viterbi/         # Viterbi algorithm
│   ├── mod.rs       # Core Viterbi
│   ├── simd.rs      # AVX2 acceleration
│   ├── nbest.rs     # N-best A* search
│   └── analysis.rs  # Cost analysis
├── semantic/        # Semantic enrichment
│   ├── mod.rs       # SemanticEntry, EntityType
│   ├── pool.rs      # SemanticPool (5-byte entries)
│   ├── jsonld.rs    # JSON-LD export
│   ├── rdf.rs       # RDF/Turtle/N-Triples export
│   ├── disambiguation.rs  # Disambiguation strategies
│   └── extension.rs # TokenExtension
├── phonetic/        # Phonetic processing
│   ├── mod.rs       # Reading extraction
│   └── transducer.rs # Kana/Romaji/X-SAMPA/IPA
├── stream.rs        # Streaming text processing
├── normalize.rs     # Text normalization
├── bench.rs         # Benchmarking utilities
└── error.rs         # Error types
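
As background for the `lattice/` and `viterbi/` modules, the following is a minimal sketch of minimum-cost path search over a word lattice. It is a simplified illustration rather than MeCrab's code: `Entry` and `viterbi` are hypothetical names, the dictionary is a toy slice, and the connection costs between adjacent words that MeCab-style analyzers add to the path cost are omitted.

```rust
/// Toy dictionary entry (hypothetical): surface form and word cost.
/// Lower cost is better.
struct Entry {
    surface: &'static str,
    cost: i64,
}

/// Returns the cheapest segmentation of `text` into dictionary words,
/// or None if the words cannot cover the whole text.
fn viterbi(text: &str, dict: &[Entry]) -> Option<(i64, Vec<&'static str>)> {
    let n = text.len();
    // best[i] = (cheapest cost to reach byte offset i, start offset of
    // the last word on that path, surface of that last word).
    let mut best: Vec<Option<(i64, usize, &'static str)>> = vec![None; n + 1];
    best[0] = Some((0, 0, ""));

    for start in 0..n {
        // Skip offsets no path reaches. Reachable offsets are always
        // character boundaries, so slicing below cannot panic.
        let Some((cost_so_far, _, _)) = best[start] else { continue };
        for entry in dict {
            if entry.surface.is_empty() || !text[start..].starts_with(entry.surface) {
                continue;
            }
            let end = start + entry.surface.len();
            let candidate = cost_so_far + entry.cost;
            if best[end].map_or(true, |(cost, _, _)| candidate < cost) {
                best[end] = Some((candidate, start, entry.surface));
            }
        }
    }

    // Follow the back pointers from the end of the text to the start.
    let (total, _, _) = best[n]?;
    let mut path = Vec::new();
    let mut pos = n;
    while pos > 0 {
        let (_, prev, surface) = best[pos]?;
        path.push(surface);
        pos = prev;
    }
    path.reverse();
    Some((total, path))
}

fn main() {
    let dict = [
        Entry { surface: "東京", cost: 3 },
        Entry { surface: "都", cost: 2 },
        Entry { surface: "東", cost: 4 },
        Entry { surface: "京都", cost: 3 },
        Entry { surface: "京", cost: 4 },
    ];
    // 東京 + 都 costs 5, 東 + 京都 costs 7, 東 + 京 + 都 costs 10.
    println!("{:?}", viterbi("東京都", &dict));
}
```

Each offset keeps only its cheapest incoming path, which is what makes the search linear in the number of lattice edges. An N-best search has to retain more than one candidate per offset.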

License

MIT OR Apache-2.0
