emdb_lib

Crates.io: emdb_lib
lib.rs: emdb_lib
version: 0.1.3
source: src
created_at: 2024-11-26 02:02:51.134689
updated_at: 2024-11-26 04:29:54.520748
description: Orthographic token compression
homepage: https://dearborn.cc
repository: https://github.com/gsspdev/emdb_lib
max_upload_size:
id: 1461096
size: 7,691
(gsspdev)

README

Memory-efficient English language tokenizer

Applying Dearborn orthography to make English easier for machines to understand.

Dearborn orthography allows for lossless compression of English. It reduces the number of tokens required to encode meaning and removes tokens that are informationally "distracting". It also removes confusing inconsistencies of standard English while retaining its structure, and the text can be converted back to its standard English equivalent at any stage. This compression and standardization of language down to meaning-carrying tokens is ideal for training large language models.
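The "lossless" and "convertible back" properties can be illustrated with a minimal Rust sketch. Everything in it is an assumption made for illustration: the substitution table is invented and is not the Dearborn rule set, and `encode` and `decode` are hypothetical names, not the `emdb_lib` API. It shows only the round-trip guarantee, using a one-to-one word table and rejecting input that would be ambiguous to decode.

```rust
/// Hypothetical one-to-one table of (standard spelling, shorter stand-in).
/// These pairs are invented for illustration; they are NOT the Dearborn rules.
const TABLE: &[(&str, &str)] = &[
    ("through", "thru"),
    ("though", "tho"),
    ("enough", "enuf"),
];

/// Replace each space-separated word that has a table entry with its stand-in.
/// Returns `None` if the input already contains a stand-in, because such
/// input could not be restored unambiguously.
fn encode(text: &str) -> Option<String> {
    let mut out = Vec::new();
    for word in text.split(' ') {
        if TABLE.iter().any(|&(_, short)| short == word) {
            return None;
        }
        let mapped = TABLE
            .iter()
            .find(|&&(long, _)| long == word)
            .map_or(word, |&(_, short)| short);
        out.push(mapped);
    }
    Some(out.join(" "))
}

/// Invert `encode`: map every stand-in back to its standard spelling.
fn decode(text: &str) -> String {
    text.split(' ')
        .map(|word| {
            TABLE
                .iter()
                .find(|&&(_, short)| short == word)
                .map_or(word, |&(long, _)| long)
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

For any input that `encode` accepts, `decode(&encode(text).unwrap())` returns the original text unchanged, because every word is either mapped through the one-to-one table or passed through untouched. A real scheme would also need to handle capitalization and punctuation, which this sketch leaves as-is.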

Commit count: 38

cargo fmt