Crates.io | tokengeex |
lib.rs | tokengeex |
version | 1.1.0 |
source | src |
created_at | 2024-02-18 01:49:41.790073 |
updated_at | 2024-06-03 03:08:24.274661 |
description | TokenGeeX is an efficient tokenizer for code based on UnigramLM and TokenMonster. |
homepage | https://codegeex.cn |
repository | https://github.com/rojas-diego/tokengeex/ |
id | 1143684 |
size | 208,803 |
This repository holds the code for the TokenGeeX Rust crate and Python package. TokenGeeX is a tokenizer for CodeGeeX, designed for source code and Chinese text. It is based on UnigramLM (Taku Kudo, 2018).
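
For readers unfamiliar with UnigramLM, the following is a minimal sketch of how a unigram tokenizer picks a segmentation. Each vocabulary entry has a log-probability, and the tokenizer chooses the split whose summed log-probability is highest, found with Viterbi dynamic programming. This is a generic illustration, not the TokenGeeX API. The function name `viterbi_segment` and the toy vocabulary `TOY_VOCAB` are hypothetical and exist only for this example.

```python
import math


def viterbi_segment(text, log_probs):
    """Return the highest-scoring segmentation of `text`.

    `log_probs` maps each vocabulary token to its log-probability.
    Generic unigram-model sketch; not taken from the TokenGeeX source.
    """
    n = len(text)
    max_len = max(len(token) for token in log_probs)

    # best[i] is the best score of any segmentation of text[:i].
    best = [-math.inf] * (n + 1)
    # back[i] is the start index of the last token in that segmentation.
    back = [0] * (n + 1)
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece not in log_probs:
                continue
            score = best[start] + log_probs[piece]
            if score > best[end]:
                best[end] = score
                back[end] = start

    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")

    # Walk the back-pointers from the end of the text to recover the tokens.
    tokens = []
    pos = n
    while pos > 0:
        tokens.append(text[back[pos]:pos])
        pos = back[pos]
    return tokens[::-1]


# Hypothetical toy vocabulary with made-up probabilities.
TOY_VOCAB = {
    "def": math.log(0.2),
    " ": math.log(0.2),
    "foo": math.log(0.1),
    "d": math.log(0.05),
    "e": math.log(0.05),
    "f": math.log(0.05),
    "o": math.log(0.05),
}

tokens = viterbi_segment("def foo", TOY_VOCAB)
```

A production tokenizer would also need a fallback for characters outside the vocabulary (for example byte-level tokens), which this sketch replaces with an exception.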