cang-jie

Crates.io: cang-jie
lib.rs: cang-jie
version: 0.18.0
source: src
created_at: 2018-09-18 14:35:08.671671
updated_at: 2023-11-04 12:49:28.873055
description: A Chinese tokenizer for tantivy
repository: https://github.com/DCjanus/cang-jie
id: 85364
size: 13,506
DCjanus (DCjanus)

README

cang-jie(仓颉)

A Chinese tokenizer for tantivy, based on jieba-rs.

Currently, only UTF-8 input is supported.
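Because only UTF-8 is accepted, text arriving as raw bytes (for example from a file in a legacy encoding) has to be checked or converted before it is indexed. The following is a minimal sketch using only the Rust standard library; the helper names `decode_utf8` and `decode_utf8_lossy` are hypothetical and are not part of cang-jie or tantivy.

```rust
/// Hypothetical helper: interpret raw bytes as UTF-8 before indexing.
/// Returns `None` when the input is not valid UTF-8.
fn decode_utf8(raw: &[u8]) -> Option<&str> {
    std::str::from_utf8(raw).ok()
}

/// Hypothetical lossy alternative: every invalid sequence is replaced
/// with U+FFFD (the Unicode replacement character) instead of failing.
fn decode_utf8_lossy(raw: &[u8]) -> String {
    String::from_utf8_lossy(raw).into_owned()
}
```

Rejecting invalid input with `decode_utf8` is usually the safer choice for a search index, since replacement characters produced by the lossy variant would end up in the stored text.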

Example

    let mut schema_builder = SchemaBuilder::default();
    let text_indexing = TextFieldIndexing::default()
        .set_tokenizer(CANG_JIE) // Set custom tokenizer
        .set_index_option(IndexRecordOption::WithFreqsAndPositions);
    let text_options = TextOptions::default()
        .set_indexing_options(text_indexing)
        .set_stored();
    // ... Some code
    let index = Index::create(RAMDirectory::create(), schema.clone())?;
    let tokenizer = CangJieTokenizer {
        worker: Arc::new(Jieba::empty()), // empty dictionary
        option: TokenizerOption::Unicode,
    };
    // Register the tokenizer under the same name the schema refers to
    index.tokenizers().register(CANG_JIE, tokenizer);
    // ... Some code
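The example pairs an empty dictionary with `TokenizerOption::Unicode`. Assuming, as the option name and the empty dictionary suggest, that this mode emits one token per Unicode character, its effect can be sketched with the standard library alone. The function `unicode_tokens` below is a hypothetical illustration, not the crate's implementation; it returns byte offsets alongside each token because tantivy tokens carry byte positions into the original text.

```rust
/// Hypothetical sketch: split `text` into one token per Unicode scalar
/// value, returning (byte_start, byte_end, token) triples.
fn unicode_tokens(text: &str) -> Vec<(usize, usize, &str)> {
    text.char_indices()
        .map(|(start, ch)| {
            // `len_utf8` gives the encoded width of this character,
            // so `start..end` is always a valid slice boundary.
            let end = start + ch.len_utf8();
            (start, end, &text[start..end])
        })
        .collect()
}
```

Per-character tokens need no dictionary, at the cost of a larger index and less precise phrase matching than dictionary-based segmentation.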

Full example
