Crates.io | emdb_lib |
lib.rs | emdb_lib |
version | 0.1.3 |
source | src |
created_at | 2024-11-26 02:02:51.134689 |
updated_at | 2024-11-26 04:29:54.520748 |
description | Orthographic token compression |
homepage | https://dearborn.cc |
repository | https://github.com/gsspdev/emdb_lib |
max_upload_size | |
id | 1461096 |
size | 7,691 |
Applying Dearborn orthography to make English easier for machines to understand.
Dearborn orthography allows for lossless compression of English. It reduces the number of tokens required to encode meaning and removes tokens that are informationally "distracting". It also removes confusing inconsistencies of standard English while retaining its structure, and the text can be converted back to its standard English equivalent at any stage. Compressing and standardizing language down to meaning-carrying tokens in this way is ideal for training large language models.
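The "lossless" and "convertible at any stage" properties can be illustrated with a small round-trip sketch. This is not the emdb_lib API and the substitution table is not the actual Dearborn rule set: `TABLE`, `substitute`, `encode`, and `decode` are hypothetical names chosen for illustration, and the toy mapping is only reversible under the assumption that the short forms never appear in the input text.

```rust
use std::collections::HashMap;

// Hypothetical substitution table for illustration only.
// These pairs are NOT the real Dearborn orthography rules.
const TABLE: [(&str, &str); 3] = [
    ("through", "thru"),
    ("though", "tho"),
    ("enough", "enuf"),
];

// Replace each space-separated word that has an entry in `map`;
// words without an entry pass through unchanged.
fn substitute(text: &str, map: &HashMap<&str, &str>) -> String {
    text.split(' ')
        .map(|word| map.get(word).copied().unwrap_or(word))
        .collect::<Vec<_>>()
        .join(" ")
}

// Standard spelling -> compact spelling.
fn encode(text: &str) -> String {
    let map: HashMap<&str, &str> = TABLE.iter().copied().collect();
    substitute(text, &map)
}

// Compact spelling -> standard spelling (the inverted table).
fn decode(text: &str) -> String {
    let map: HashMap<&str, &str> = TABLE.iter().map(|&(std, short)| (short, std)).collect();
    substitute(text, &map)
}

fn main() {
    let original = "walk through the park though it is cold enough";
    let compact = encode(original);

    // Round trip: decoding the compact form restores the input exactly,
    // assuming the input contained none of the short forms to begin with.
    assert_eq!(decode(&compact), original);
    println!("{compact}");
}
```

The point of the sketch is the invariant `decode(encode(x)) == x`: as long as the mapping is one-to-one and its outputs cannot collide with ordinary input words, the compact form carries the same information as the original.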