lachs

version: 0.1.3
source: src
created_at: 2024-10-27 23:47:04.172339
updated_at: 2024-11-10 21:56:06.817163
description: Crate for automatically creating a lexer based on a given enum
repository: https://github.com/H1ghBre4k3r/lachs
id: 1425113
size: 9,834
author: Louis Meyer (H1ghBre4k3r)
documentation: https://docs.rs/lachs

README

Lachs

A tool to automatically generate a lexer based on a given enum.

Usage

To generate a lexer from a given enum, annotate it with #[token]:

use lachs::token;

#[token]
pub enum Token {
    #[terminal("+")]
    Plus,
    #[literal("[0-9]+")]
    Integer
}

As you can see, we also annotated the variants Token::Plus and Token::Integer with #[terminal("+")] and #[literal("[0-9]+")], respectively.

The helper #[terminal(...)] takes a string literal that the input must match exactly to produce the decorated token, while #[literal(...)] takes a regular expression and extracts the matching sequence from the text.

These helper attributes are evaluated by #[token] and describe the two kinds of tokens the lexer understands:

  • terminals (without a value of their own)
  • literals (with a value of their own)
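To make the distinction concrete, here is a small, std-only sketch of the two matching strategies. It is not the code lachs generates: the function names are hypothetical, and because the sketch avoids external crates, a hand-written digit scan stands in for the regular expression [0-9]+ that the Integer literal uses.

```rust
/// Terminal matching (hypothetical helper): the input must start with
/// exactly `terminal`. Returns the number of bytes consumed on success.
pub fn match_terminal(input: &str, terminal: &str) -> Option<usize> {
    if input.starts_with(terminal) {
        Some(terminal.len())
    } else {
        None
    }
}

/// Literal matching (hypothetical helper): return the longest run of ASCII
/// digits at the start of `input`. This scan is only a stand-in for the
/// regular expression `[0-9]+`; lachs literals take real regular expressions.
pub fn match_integer_literal(input: &str) -> Option<&str> {
    let end = input
        .char_indices()
        .find(|&(_, c)| !c.is_ascii_digit())
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(&input[..end])
    }
}
```

A terminal match only tells you which token was found, while a literal match also yields the matched text, which is why only literals need to store a value.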

Under the hood, the proc macro expands the enum to roughly the following:

pub enum Token {
    Plus {
        position: lachs::Span,
    },
    Integer {
        value: String,
        position: lachs::Span,
    }
}

Both terminals and literals have a field named position that stores their position in the source text. Literals additionally have a field named value, which stores the text matched by the regular expression.
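Since every variant is a struct-like variant, position and value are reached by pattern matching. The sketch below writes the expanded enum by hand so that it compiles without lachs. The Span type and its start/end fields are hypothetical placeholders for lachs::Span, and the position() and value() helpers are illustrative only; they are not claimed to be generated by the macro.

```rust
/// Hypothetical stand-in for `lachs::Span`: byte offsets into the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hand-written equivalent of the expansion shown above.
#[derive(Debug, Clone)]
pub enum Token {
    Plus { position: Span },
    Integer { value: String, position: Span },
}

impl Token {
    /// Hypothetical helper: every token, terminal or literal, has a position.
    pub fn position(&self) -> Span {
        match self {
            Token::Plus { position } => *position,
            Token::Integer { position, .. } => *position,
        }
    }

    /// Hypothetical helper: only literals carry the matched text.
    pub fn value(&self) -> Option<&str> {
        match self {
            Token::Plus { .. } => None,
            Token::Integer { value, .. } => Some(value.as_str()),
        }
    }
}
```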

Additionally, the Token enum gets an associated function, lex, which takes a string and returns the result of lexing it:

use lachs::token;

#[token]
pub enum Token {
    #[terminal("+")]
    Plus,
    #[literal("[0-9]+")]
    Integer
}

fn main() {
    // `lex` returns a Result<Vec<Token>, LexError>
    let result = Token::lex("2 + 2");
}
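For intuition about what Token::lex has to do, here is a hand-written, std-only lexer for the same two-variant Token. It is a sketch under assumptions, not the code lachs generates: Span is a hypothetical byte-range stand-in for lachs::Span, LexError is a hypothetical struct recording the offending byte offset, whitespace is assumed to be skipped between tokens, and a digit scan replaces the regular expression.

```rust
/// Hypothetical stand-in for `lachs::Span`: byte offsets into the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone)]
pub enum Token {
    Plus { position: Span },
    Integer { value: String, position: Span },
}

/// Hypothetical error type; the real `LexError` may look different.
#[derive(Debug, PartialEq)]
pub struct LexError {
    pub offset: usize,
}

impl Token {
    /// Scan `input` left to right, skipping whitespace (an assumption) and
    /// trying the terminal `+` before the integer literal at each position.
    pub fn lex(input: &str) -> Result<Vec<Token>, LexError> {
        let mut tokens = Vec::new();
        let mut pos = 0;
        while pos < input.len() {
            let rest = &input[pos..];
            // `pos` always sits on a char boundary, so this cannot fail.
            let c = rest.chars().next().unwrap();
            if c.is_whitespace() {
                pos += c.len_utf8();
            } else if rest.starts_with('+') {
                tokens.push(Token::Plus { position: Span { start: pos, end: pos + 1 } });
                pos += 1;
            } else if c.is_ascii_digit() {
                // Stand-in for the regex `[0-9]+`: take the longest digit run.
                let len = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
                tokens.push(Token::Integer {
                    value: rest[..len].to_string(),
                    position: Span { start: pos, end: pos + len },
                });
                pos += len;
            } else {
                // No terminal or literal matches here.
                return Err(LexError { offset: pos });
            }
        }
        Ok(tokens)
    }
}
```

In this sketch, lexing "2 + 2" yields Integer, Plus, Integer with spans 0..1, 2..3 and 4..5, while an unknown character such as - produces an error at its byte offset.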

Caveats

The macro also generates an implementation of PartialEq for the decorated enum. However, this implementation does not take the position into account.

To check whether two tokens are exactly equal, including their positions, use the Token::does_equal(...) function.
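The difference between the two comparisons can be sketched by hand. This std-only example assumes that the generated PartialEq compares the variant and, for literals, the value while ignoring position, and that does_equal takes another token by reference and returns bool; both are assumptions about the macro's output, and Span is again a hypothetical stand-in for lachs::Span.

```rust
/// Hypothetical stand-in for `lachs::Span`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone)]
pub enum Token {
    Plus { position: Span },
    Integer { value: String, position: Span },
}

/// Loose equality: same variant and (for literals) same value; `position`
/// is ignored, as the caveat describes.
impl PartialEq for Token {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (Token::Plus { .. }, Token::Plus { .. }) => true,
            (Token::Integer { value: a, .. }, Token::Integer { value: b, .. }) => a == b,
            _ => false,
        }
    }
}

impl Token {
    fn position(&self) -> Span {
        match self {
            Token::Plus { position } | Token::Integer { position, .. } => *position,
        }
    }

    /// Strict equality (assumed signature): loose equality plus equal positions.
    pub fn does_equal(&self, other: &Self) -> bool {
        self == other && self.position() == other.position()
    }
}
```

With this design, == is convenient for comparing token streams in tests where offsets do not matter, while does_equal answers the stricter question.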

Generated Stuff

The macro generates additional structs that perform the actual lexing. Avoid using them directly where possible; note, however, that their names can collide with other names in the same scope.

Commit count: 16
