Crates.io | alkale |
lib.rs | alkale |
version | 1.0.3 |
source | src |
created_at | 2024-09-13 18:50:37.419267 |
updated_at | 2024-10-14 16:27:47.87422 |
description | A simple LL(1) tokenizer library for Rust. |
repository | https://codeberg.org/AshliKatt/Alkale |
id | 1374067 |
size | 106,988 |
This is the repository for Alkale, a Rust library to assist in making hand-written LL(1) tokenizers.
Alkale has three specific goals in mind for its design.
Alkale should natively handle common code sources, strings and files in particular.
General-purpose parsers usually need to either operate on files' bytes alone or read entire files into memory, neither of which is ideal. Because Alkale doesn't need to support extensive lookahead, it can read characters directly from file buffers and treat them the same as if a regular string were being tokenized.
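The idea of tokenizing straight from a file buffer can be sketched with the standard library alone. This is an illustration, not Alkale's implementation: `BufferedChars` and `count_leading_digits` are hypothetical names, and the sketch assumes it is acceptable to hold one line in memory at a time and to stop silently on I/O or UTF-8 errors.

```rust
use std::io::BufRead;

/// Hypothetical adapter (not part of Alkale): yields the characters of any
/// buffered reader while holding only one line in memory at a time.
pub struct BufferedChars<R: BufRead> {
    reader: R,
    current: std::vec::IntoIter<char>,
}

impl<R: BufRead> BufferedChars<R> {
    pub fn new(reader: R) -> Self {
        Self { reader, current: Vec::new().into_iter() }
    }
}

impl<R: BufRead> Iterator for BufferedChars<R> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        loop {
            // Drain the characters of the line that is already decoded.
            if let Some(c) = self.current.next() {
                return Some(c);
            }
            // Otherwise pull the next line (newline included) from the buffer.
            let mut line = String::new();
            match self.reader.read_line(&mut line) {
                // End of input; this sketch also stops on I/O or UTF-8 errors.
                Ok(0) | Err(_) => return None,
                Ok(_) => self.current = line.chars().collect::<Vec<char>>().into_iter(),
            }
        }
    }
}

/// Counts leading ASCII digits using a single character of lookahead.
pub fn count_leading_digits<I: Iterator<Item = char>>(chars: I) -> usize {
    let mut chars = chars.peekable();
    let mut count = 0;
    while chars.next_if(|c| c.is_ascii_digit()).is_some() {
        count += 1;
    }
    count
}
```

Because `BufferedChars` is just an iterator of `char`, the same `count_leading_digits` function works unchanged on `"123abc".chars()` or on `BufferedChars::new(BufReader::new(file))`, which is the property the paragraph above describes.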
Span information is annoying to keep track of manually, so Alkale automatically keeps track of spans for its tokens.
Due to the avoidance of in-memory source loading, Alkale's spans store index, line, and column information. This may lead to higher-than-average memory usage for non-iterator tokenizers. An iterator-like tokenizer that creates tokens as they're needed will avoid this problem.
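A minimal sketch of what tracking index, line, and column might look like is shown here. The `Position`, `Span`, and `Tracked` types are hypothetical and are not Alkale's actual types; the sketch assumes that the index counts characters, that lines and columns are 1-based, and that only `'\n'` starts a new line.

```rust
/// Hypothetical position record: character index plus 1-based line and column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub index: usize,
    pub line: usize,
    pub column: usize,
}

/// Hypothetical span: where a token starts and where it ends (exclusive).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: Position,
    pub end: Position,
}

/// Wraps a character iterator and updates the position as characters are read.
pub struct Tracked<I: Iterator<Item = char>> {
    chars: I,
    pos: Position,
}

impl<I: Iterator<Item = char>> Tracked<I> {
    pub fn new(chars: I) -> Self {
        Self { chars, pos: Position { index: 0, line: 1, column: 1 } }
    }

    /// Position of the next unread character.
    pub fn position(&self) -> Position {
        self.pos
    }

    /// Consumes up to `n` characters and returns the span they covered.
    pub fn take_span(&mut self, n: usize) -> Span {
        let start = self.pos;
        for _ in 0..n {
            if self.next().is_none() {
                break;
            }
        }
        Span { start, end: self.pos }
    }
}

impl<I: Iterator<Item = char>> Iterator for Tracked<I> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        let c = self.chars.next()?;
        self.pos.index += 1;
        if c == '\n' {
            // A newline moves to the first column of the next line.
            self.pos.line += 1;
            self.pos.column = 1;
        } else {
            self.pos.column += 1;
        }
        Some(c)
    }
}
```

In this sketch every `Span` carries six `usize` values, which illustrates why collecting all tokens into a list costs more memory than producing them one at a time from an iterator.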
Many aspects of tokenizers are extremely common and repetitive, such as string parsing, number tokenization, and error recovery.
These common elements should come pre-packaged with Alkale by default. You may find a list of these in COMMON.md.
Because I have roots in esolangs, there may be some odd built-ins to assist with non-standard languages.
The core of Alkale operates on the TokenizerContext type. It is created using a BufReader<File> (for convenience), or more generally, any type that implements IntoIterator<Item = char>.
The TokenizerContext provides LL(1) access into the underlying string with the next and peek methods, as well as tons of helper methods.
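To make the `next` and `peek` access pattern concrete, here is a small stand-in built on `std::iter::Peekable`. `SketchContext` and `read_identifier` are hypothetical names used only for illustration; the real TokenizerContext may have different constructors and signatures, so treat this as a sketch of LL(1) access over any `IntoIterator<Item = char>` rather than as Alkale's API.

```rust
use std::iter::Peekable;

/// Hypothetical stand-in for a tokenizer context (not Alkale's type).
pub struct SketchContext<I: Iterator<Item = char>> {
    chars: Peekable<I>,
}

impl<I: Iterator<Item = char>> SketchContext<I> {
    /// Accepts anything that can be turned into an iterator of characters.
    pub fn new<S: IntoIterator<Item = char, IntoIter = I>>(source: S) -> Self {
        Self { chars: source.into_iter().peekable() }
    }

    /// Looks at the next character without consuming it.
    pub fn peek(&mut self) -> Option<char> {
        self.chars.peek().copied()
    }

    /// Consumes and returns the next character.
    pub fn next(&mut self) -> Option<char> {
        self.chars.next()
    }
}

/// Reads an identifier, deciding whether to continue from one peeked character.
pub fn read_identifier<I: Iterator<Item = char>>(ctx: &mut SketchContext<I>) -> String {
    let mut ident = String::new();
    while let Some(c) = ctx.peek() {
        if !(c.is_alphanumeric() || c == '_') {
            break;
        }
        ident.push(c);
        ctx.next();
    }
    ident
}
```

The loop never looks more than one character ahead: it peeks, decides, and only then consumes, so the character that ends the identifier stays available for the next token.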
Other methods range from peek_is, a general-purpose method to check if the next character is equal to some characters, all the way to try_parse_simple_string, which attempts to parse an entire Rust-like string, including character escapes.
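The two ends of that range can be illustrated with standard-library code. The functions below are hypothetical stand-ins, not the signatures of Alkale's peek_is or try_parse_simple_string: `peek_is_any` assumes "some characters" means a slice of candidates, and `parse_simple_string` assumes a small escape set (`\n`, `\t`, `\r`, `\0`, `\\`, `\"`) and returns `None` on any malformed input.

```rust
use std::iter::Peekable;

/// Returns true if the next character is one of `options`, without consuming it.
pub fn peek_is_any<I: Iterator<Item = char>>(chars: &mut Peekable<I>, options: &[char]) -> bool {
    chars.peek().map_or(false, |c| options.contains(c))
}

/// Parses a double-quoted string with a few Rust-like escapes.
/// Returns None if there is no opening quote, the string is unterminated,
/// or an unknown escape is found. Nothing is consumed when the opening
/// quote is missing.
pub fn parse_simple_string<I: Iterator<Item = char>>(chars: &mut Peekable<I>) -> Option<String> {
    if !peek_is_any(chars, &['"']) {
        return None;
    }
    chars.next(); // consume the opening quote

    let mut out = String::new();
    loop {
        match chars.next()? {
            '"' => return Some(out), // closing quote
            '\\' => out.push(match chars.next()? {
                'n' => '\n',
                't' => '\t',
                'r' => '\r',
                '0' => '\0',
                '\\' => '\\',
                '"' => '"',
                _ => return None, // unknown escape in this sketch
            }),
            c => out.push(c),
        }
    }
}
```

A real helper would likely report why parsing failed and where; this sketch collapses every failure into `None` to keep the control flow visible.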