ragit-korean

Crates.io: ragit-korean
lib.rs: ragit-korean
version: 0.4.3
created_at: 2024-12-30 14:17:57.361309+00
updated_at: 2025-09-15 15:33:47.895117+00
description: korean tokenizer for ragit
repository: https://github.com/baehyunsol/ragit
id: 1499190
size: 65,959
(baehyunsol)

documentation

https://docs.rs/ragit-korean

README

ragit-korean

Ragit-korean is a very simple Korean tokenizer.

Ragit used to use charabia to tokenize CJK documents, but charabia has too many issues.

  1. Charabia bundles CJK dictionaries in the binary, which makes the file 70MiB bigger.
  2. It silently converts 완성형 (precomposed) Korean to 조합형 (decomposed) Korean, which messes up TF-IDF searches.
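
To see why the second issue hurts term matching, here is a minimal sketch. It assumes the conversion amounts to decomposing precomposed Hangul syllables (U+AC00 to U+D7A3) into conjoining jamo, as Unicode NFD does. The function name `decompose_hangul` is hypothetical and is not part of ragit-korean or charabia; it only applies the standard Unicode Hangul decomposition arithmetic.

```rust
/// Hypothetical helper (not part of ragit-korean or charabia).
/// Decomposes precomposed Hangul syllables into conjoining jamo using the
/// arithmetic from the Unicode Standard. Other characters pass through unchanged.
fn decompose_hangul(s: &str) -> String {
    const S_BASE: u32 = 0xAC00; // first precomposed syllable, '가'
    const L_BASE: u32 = 0x1100; // first leading consonant
    const V_BASE: u32 = 0x1161; // first vowel
    const T_BASE: u32 = 0x11A7; // one before the first trailing consonant
    const V_COUNT: u32 = 21;
    const T_COUNT: u32 = 28;
    const S_COUNT: u32 = 19 * V_COUNT * T_COUNT; // 11172 syllables

    let mut out = String::new();
    for c in s.chars() {
        let code = c as u32;
        if (S_BASE..S_BASE + S_COUNT).contains(&code) {
            let s_index = code - S_BASE;
            let l = L_BASE + s_index / (V_COUNT * T_COUNT);
            let v = V_BASE + (s_index % (V_COUNT * T_COUNT)) / T_COUNT;
            let t = T_BASE + s_index % T_COUNT;
            out.push(char::from_u32(l).unwrap());
            out.push(char::from_u32(v).unwrap());
            // t == T_BASE means the syllable has no trailing consonant.
            if t != T_BASE {
                out.push(char::from_u32(t).unwrap());
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

The two forms render identically but are different code point sequences, so `decompose_hangul("한글") == "한글"` is false. If an index stores terms in one form and a query arrives in the other, an exact term lookup finds nothing, which is one plausible way such a conversion could distort TF-IDF results.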