| Crates.io | opencc-fmmseg |
| lib.rs | opencc-fmmseg |
| version | 0.8.1 |
| created_at | 2025-07-10 07:45:27.833722+00 |
| updated_at | 2025-08-28 08:44:25.809638+00 |
| description | High-performance OpenCC-based Chinese conversion using FMM (Forward Maximum Matching) segmentation. |
| homepage | https://github.com/laisuk/opencc-fmmseg |
| repository | https://github.com/laisuk/opencc-fmmseg |
| max_upload_size | |
| id | 1746008 |
| size | 5,046,914 |
opencc-fmmseg is a high-performance Rust-based engine for Chinese text conversion.
It combines OpenCC's lexicons with an optimized Forward Maximum Matching (FMM) algorithm.
```rust
use opencc_fmmseg::OpenCC;

fn main() {
    let input = "汉字转换测试";
    let opencc = OpenCC::new();
    let output = opencc.convert(input, "s2t", false);
    println!("{}", output); // 漢字轉換測試
}
```
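To make the Forward Maximum Matching idea concrete, here is a minimal, self-contained sketch. It is not the crate's implementation: the `fmm_convert` and `toy_dict` names, the seven-entry dictionary, and the `max_len` parameter are all assumptions made for illustration. At each position it tries the longest candidate phrase first and falls back to shorter ones, copying unmatched characters through unchanged.

```rust
use std::collections::HashMap;

/// Scan left to right; at each position replace the longest dictionary
/// phrase that matches. Characters with no match are copied unchanged.
fn fmm_convert(input: &str, dict: &HashMap<&str, &str>, max_len: usize) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len());
    let mut i = 0;
    while i < chars.len() {
        let longest = max_len.min(chars.len() - i);
        let mut matched = false;
        // Try the longest candidate first, then shrink one character at a time.
        for len in (1..=longest).rev() {
            let candidate: String = chars[i..i + len].iter().collect();
            if let Some(replacement) = dict.get(candidate.as_str()) {
                out.push_str(replacement);
                i += len;
                matched = true;
                break;
            }
        }
        if !matched {
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}

/// Toy Simplified-to-Traditional dictionary (hypothetical, not the crate's lexicon data).
fn toy_dict() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("汉", "漢"),
        ("转", "轉"),
        ("换", "換"),
        ("测", "測"),
        ("试", "試"),
        ("发", "發"),
        ("头发", "頭髮"),
    ])
}

fn main() {
    let dict = toy_dict();
    println!("{}", fmm_convert("汉字转换测试", &dict, 2)); // 漢字轉換測試
    println!("{}", fmm_convert("头发", &dict, 2)); // 頭髮
}
```

With `max_len` set to 1 the second call would produce 头發 instead, because only the single-character entry for 发 can match. Preferring the longest phrase is what lets a phrase-level entry override a character-level one.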
Grab the latest version for your platform from the Releases page:
| Platform | Download Link |
|---|---|
| 🪟 Windows | opencc-fmmseg-windows.zip |
| 🐧 Linux | opencc-fmmseg-linux.zip |
| 🍎 macOS | opencc-fmmseg-macos.zip |
Each archive contains:
```
README.txt
version.txt
bin/      # Command-line tools
lib/      # Shared library (.dll / .so / .dylib)
include/  # C API header + C++ helper header
```
```bash
git clone https://github.com/laisuk/opencc-fmmseg
cd opencc-fmmseg
cargo build --release --workspace
```

The CLI tools will be located in:

```
target/release/
  opencc-rs      # CLI plain text and Office document text converter
  opencc-clip    # Convert from clipboard, auto detect config
  dict-generate  # Generate dictionary ZSTD, CBOR or JSON files
```
opencc-rs convert

```
Convert plain text using OpenCC

Usage: opencc-rs.exe convert [OPTIONS] --config <config>

Options:
  -i, --input <file>       Input file (use stdin if omitted for non-office documents)
  -o, --output <file>      Output file (use stdout if omitted for non-office documents)
  -c, --config <config>    Conversion configuration [possible values: s2t, t2s, s2tw, tw2s, s2twp, tw2sp, s2hk, hk2s, t2tw, t2twp, t2hk, tw2t, tw2tp, hk2t, t2jp, jp2t]
  -p, --punct              Enable punctuation conversion
      --in-enc <in_enc>    Encoding for input [default: UTF-8]
      --out-enc <out_enc>  Encoding for output [default: UTF-8]
  -h, --help               Print help
```
opencc-rs office

```
Convert Office or EPUB documents using OpenCC

Usage: opencc-rs.exe office [OPTIONS] --config <config>

Options:
  -i, --input <file>     Input file (use stdin if omitted for non-office documents)
  -o, --output <file>    Output file (use stdout if omitted for non-office documents)
  -c, --config <config>  Conversion configuration [possible values: s2t, t2s, s2tw, tw2s, s2twp, tw2sp, s2hk, hk2s, t2tw, t2twp, t2hk, tw2t, tw2tp, hk2t, t2jp, jp2t]
  -p, --punct            Enable punctuation conversion
  -f, --format <ext>     Force document format: docx, odt, epub...
      --keep-font        Preserve original font styles
      --auto-ext         Infer format from file extension
  -h, --help             Print help
```
```bash
./opencc-rs convert -c s2t -i text_simplified.txt -o text_traditional.txt
```

Supported formats: .docx, .xlsx, .pptx, .odt, .ods, .odp, .epub

```bash
./opencc-rs office -c s2t --punct --format docx -i doc_simplified.docx -o doc_traditional.docx
```
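When driving `opencc-rs convert` from another program, it can help to validate the config name before spawning the process. The following sketch assumes a hypothetical helper, `convert_args`, which is not part of the project. It uses only the flags and config values shown in the help output above.

```rust
/// Config names listed under `--config` in the help output.
const CONFIGS: [&str; 16] = [
    "s2t", "t2s", "s2tw", "tw2s", "s2twp", "tw2sp", "s2hk", "hk2s",
    "t2tw", "t2twp", "t2hk", "tw2t", "tw2tp", "hk2t", "t2jp", "jp2t",
];

/// Build the argument list for `opencc-rs convert` (hypothetical helper).
/// Returns `None` when `config` is not one of the documented values.
fn convert_args(config: &str, input: &str, output: &str, punct: bool) -> Option<Vec<String>> {
    if !CONFIGS.iter().any(|c| *c == config) {
        return None;
    }
    let mut args: Vec<String> = ["convert", "-c", config, "-i", input, "-o", output]
        .iter()
        .map(|s| s.to_string())
        .collect();
    if punct {
        // Optional flag: enable punctuation conversion.
        args.push("--punct".to_string());
    }
    Some(args)
}

fn main() {
    match convert_args("s2t", "text_simplified.txt", "text_traditional.txt", false) {
        Some(args) => println!("opencc-rs {}", args.join(" ")),
        None => eprintln!("unsupported config"),
    }
}
```

The resulting vector can be handed to `std::process::Command::new("opencc-rs").args(&args)` if the binary is on the `PATH`.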
- s2t – Simplified to Traditional
- s2tw – Simplified to Traditional Taiwan
- s2twp – Simplified to Traditional Taiwan with idioms
- t2s – Traditional to Simplified
- tw2s – Traditional Taiwan to Simplified
- tw2sp – Traditional Taiwan to Simplified with idioms

By default, it uses OpenCC's built-in lexicon paths. You can also provide your own lexicon dictionary generated by the dict-generate CLI tool.
You can also use opencc-fmmseg as a library. To do so, add this to your project's Cargo.toml:

```toml
[dependencies]
opencc-fmmseg = "0.8.0" # or latest version
```

Then use it in your code:
```rust
use opencc_fmmseg::OpenCC;

fn main() {
    let input = "这是一个测试";
    let opencc = OpenCC::new();
    let output = opencc.convert(input, "s2t", false);
    println!("{}", output); // -> "這是一個測試"
}
```
📦 Crate: opencc-fmmseg on crates.io
📄 Docs: docs.rs/opencc-fmmseg
You can also use opencc-fmmseg via a C API (opencc_fmmseg_capi) for integration with C/C++ projects.
The zip includes:

- opencc_fmmseg_capi.{so,dylib,dll}
- opencc_fmmseg_capi.h
- OpenccFmmsegHelper.hpp

You can link against the shared library and call the segmentation/convert functions from any C or C++ project.
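```c
#include "opencc_fmmseg_capi.h"

void* handle = opencc_new();
const char* config = "s2t";
char* result = opencc_convert(handle, "汉字", config, false);
/* ... use result ... */
opencc_string_free(result);  /* the returned string must be freed by the caller */
opencc_delete(handle);
```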
```c
#include <stdio.h>
#include "opencc_fmmseg_capi.h"

int main(int argc, char **argv) {
    void *opencc = opencc_new();
    bool is_parallel = opencc_get_parallel(opencc);
    printf("OpenCC is_parallel: %d\n", is_parallel);
    const char *config = u8"s2twp";
    const char *text = u8"意大利邻国法兰西罗浮宫里收藏的“蒙娜丽莎的微笑”画像是旷世之作。";
    printf("Text: %s\n", text);
    int code = opencc_zho_check(opencc, text);
    printf("Text Code: %d\n", code);
    char *result = opencc_convert(opencc, text, config, true);
    code = opencc_zho_check(opencc, result);
    char *last_error = opencc_last_error();
    printf("Converted: %s\n", result);
    printf("Text Code: %d\n", code);
    printf("Last Error: %s\n", last_error == NULL ? "No error" : last_error);
    if (last_error != NULL) {
        opencc_error_free(last_error);
    }
    if (result != NULL) {
        opencc_string_free(result);
    }
    if (opencc != NULL) {
        opencc_delete(opencc);
    }
    return 0;
}
```
Output:

```
OpenCC is_parallel: 1
Text: 意大利邻国法兰西罗浮宫里收藏的“蒙娜丽莎的微笑”画像是旷世之作。
Text Code: 2
Converted: 義大利鄰國法蘭西羅浮宮裡收藏的「蒙娜麗莎的微笑」畫像是曠世之作。
Text Code: 1
Last Error: No error
```
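The text codes printed by the example (2 before conversion, 1 after) are the integers returned by `opencc_zho_check`: 1 for zh-Hant, 2 for zh-Hans, and 0 for anything else. A caller wrapping the C API might prefer a typed value over a bare integer. The sketch below assumes a hypothetical `ZhoScript` enum and a hypothetical `suggested_config` policy. Neither is part of the library, and the choice of `s2t` / `t2s` as defaults is illustrative only.

```rust
/// Script classification mirroring the integer codes of `opencc_zho_check`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZhoScript {
    Other,       // code 0
    Traditional, // code 1 (zh-Hant)
    Simplified,  // code 2 (zh-Hans)
}

impl ZhoScript {
    /// Map a raw code to a variant; unknown codes are treated as `Other`.
    fn from_code(code: i32) -> ZhoScript {
        match code {
            1 => ZhoScript::Traditional,
            2 => ZhoScript::Simplified,
            _ => ZhoScript::Other,
        }
    }

    /// Hypothetical policy: convert towards the opposite script.
    fn suggested_config(self) -> Option<&'static str> {
        match self {
            ZhoScript::Simplified => Some("s2t"),
            ZhoScript::Traditional => Some("t2s"),
            ZhoScript::Other => None,
        }
    }
}

fn main() {
    // Codes 2 and 1 are the ones shown in the example output; 0 means "other".
    for code in [2, 1, 0] {
        let script = ZhoScript::from_code(code);
        println!("{} -> {:?} -> {:?}", code, script, script.suggested_config());
    }
}
```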
Notes:

- opencc_new() initializes the engine.
- opencc_convert(...) performs the conversion with the specified config (e.g., s2t, t2s, s2twp).
- opencc_string_free(...) must be called to free the returned string.
- opencc_delete(...) must be called to free the OpenCC object.
- Use opencc_zho_check(...) to detect zh-Hant (1), zh-Hans (2), others (0).
- Use opencc_get_parallel() to query whether parallel processing is enabled.
- Use opencc_last_error() to retrieve the last error message, and free it with opencc_error_free(...).

Project structure:

- src/lib.rs – Main library with segmentation logic.
- capi/opencc-fmmseg-capi – C API source and demo.
- tools/opencc-rs/src/main.rs – CLI tool (opencc-rs) implementation.
- dicts/ – OpenCC text lexicons, converted into Zstd-compressed CBOR format.

opencc-fmmseg Conversion Speed

Tested using Criterion.rs on 12,000-character text with punctuation disabled (punctuation = false).
Results from v0.8.0:
| Input Size (chars) | s2t Mean Time | t2s Mean Time |
|---|---|---|
| 100 | 46.47 µs | 50.40 µs |
| 1,000 | 134.18 µs | 135.72 µs |
| 10,000 | 393.05 µs | 375.40 µs |
| 100,000 | 1.664 ms | 1.397 ms |
| 1,000,000 | 16.034 ms | 13.466 ms |
📊 Throughput Interpretation
At this scale, performance is so high that I/O (disk or network), not the converter, becomes the bottleneck.
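As a rough check of that claim, the mean times in the table can be turned into throughput figures. The sketch below assumes the "Input Size" column counts characters. The `chars_per_second` helper is illustrative and not part of the crate.

```rust
/// Characters per second, given an input size and a mean time in milliseconds.
fn chars_per_second(chars: u64, mean_ms: f64) -> f64 {
    chars as f64 / (mean_ms / 1000.0)
}

fn main() {
    // Mean times from the 1,000,000 row of the benchmark table.
    let s2t = chars_per_second(1_000_000, 16.034);
    let t2s = chars_per_second(1_000_000, 13.466);
    println!("s2t: {:.1} M chars/s", s2t / 1e6); // s2t: 62.4 M chars/s
    println!("t2s: {:.1} M chars/s", t2s / 1e6); // t2s: 74.3 M chars/s
}
```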
