leqx

Crates.io: leqx
lib.rs: leqx
version: 0.0.1
created_at: 2025-08-04 17:06:43.715714+00
updated_at: 2025-08-04 17:06:43.715714+00
description: Simple regex-automata based lexer, as a proc macro
repository: https://github.com/thequux/leqx
id: 1780956
size: 7,524
TQ Hirsch (thequux)

README

A simple lexer generator based on regex_automata

Warning: this is alpha-quality code (as the 0.0.1 version number suggests); it has not been exhaustively tested or documented. Unless you wish to help with either, I recommend waiting for version 0.2 (though feel free to prod me to get there).

In particular, the API is likely to change to support the following:

  1. Support for non-'static tokens
  2. Support for lexing non-UTF-8 input (&[u8] tokens)
  3. Built-in support for skipping a token
  4. Better error messages (as in, any error messages at all)
  5. Hooks to be called before and after each token is lexed (e.g., for position tracking)
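
To make item 5 concrete: the bookkeeping that a position-tracking hook would perform can be sketched independently of leqx. This is not a preview of the planned hook API; `Position` and `advance` are hypothetical names invented for this sketch. It assumes zero-based lines and columns, counts columns in bytes, and recognizes the same line endings as the `\r?\n|\r` rule in the example.

```rust
/// Zero-based line/column position; the column is counted in bytes.
/// (Hypothetical helper for illustration, not part of leqx.)
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub line: usize,
    pub column: usize,
}

impl Position {
    /// Advance past `text`, counting "\r\n", "\n", and a lone "\r"
    /// as one line break each.
    pub fn advance(&mut self, text: &str) {
        let bytes = text.as_bytes();
        let mut i = 0;
        while i < bytes.len() {
            match bytes[i] {
                b'\r' => {
                    // Consume a directly following '\n' so "\r\n" counts once.
                    if bytes.get(i + 1) == Some(&b'\n') {
                        i += 1;
                    }
                    self.line += 1;
                    self.column = 0;
                }
                b'\n' => {
                    self.line += 1;
                    self.column = 0;
                }
                _ => self.column += 1,
            }
            i += 1;
        }
    }
}
```

A hook would call `advance` with the text of every matched token, skipped or not. One limitation of this sketch: a "\r\n" pair split across two calls is counted as two line breaks.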

Example

use leqx::leqxer;
use regex_automata::{Anchored, Input};

pub enum Token<'a> {
	Word(&'a str),
	Number(isize),
}

leqxer! {
	#[derive(Default)]
	struct State {
		line: usize,
		column: usize,
	}

	#[leqxer(dfa=sparse, embed=true)]
	mode lex_raw(&mut self, tok) -> Option<(usize, usize, Token)> {
		"[ \t]+" => {
			self.column += tok.len();
			None
		},
		"\r?\n|\r" => {
			self.column = 0;
			self.line += 1;
			None
		},
		"[a-z]+" => {
			let col = self.column;
			self.column += tok.len();
			Some((self.line, col, Token::Word(tok)))
		},
		"[0-9]+" => {
			let col = self.column;
			self.column += tok.len();
			// Parse the digits; this panics if the value does not fit in an isize.
			Some((self.line, col, Token::Number(tok.parse().unwrap())))
		}
	}
}

pub struct Lexer<'a> {
	state: State,
	input: regex_automata::Input<'a>,
}

impl<'a> Lexer<'a> {
	pub fn new(input: &'a str) -> Self {
		Self {
			state: State::default(),
			input: regex_automata::Input::new(input).anchored(Anchored::Yes),
		}
	}
}

impl<'a> Iterator for Lexer<'a> {
	type Item = (usize, usize, Token<'a>);

	fn next(&mut self) -> Option<Self::Item> {
		loop {
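			// `lex_raw` takes its name from the mode declared in the `leqxer!` block above.
			// `?` ends iteration when it returns None; an inner None is a skipped token.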
			let tok = self.state.lex_raw(&mut self.input)?;
			if let Some(tok) = tok {
				return Some(tok);
			}
		}
	}
}
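
The `Iterator` impl above relies on a nested `Option`: the outer layer says whether anything matched, the inner layer whether the match produced a token. The std-only sketch below illustrates that control flow with a hand-written stand-in. It is not the code that `leqxer!` generates: this `lex_raw` is a hypothetical free function, `take_run` is an invented helper, it operates on a plain `&str` instead of a `regex_automata::Input`, and position tracking is left out to keep it short.

```rust
/// Token type mirroring the example above.
#[derive(Debug, PartialEq)]
pub enum Token<'a> {
    Word(&'a str),
    Number(isize),
}

/// Split the longest prefix whose bytes satisfy `pred` off the front of `rest`.
/// Only used with ASCII predicates, so the split is always on a char boundary.
fn take_run<'a>(rest: &mut &'a str, pred: impl Fn(u8) -> bool) -> &'a str {
    let input: &'a str = *rest;
    let len = input.bytes().take_while(|&b| pred(b)).count();
    let (tok, tail) = input.split_at(len);
    *rest = tail;
    tok
}

/// Hypothetical stand-in for a generated mode method.
/// Outer `None`: nothing matched (end of input or an unknown byte).
/// Inner `None`: a rule matched but produced no token (whitespace).
pub fn lex_raw<'a>(rest: &mut &'a str) -> Option<Option<Token<'a>>> {
    let first = rest.bytes().next()?;
    if first.is_ascii_lowercase() {
        let tok = take_run(rest, |b| b.is_ascii_lowercase());
        Some(Some(Token::Word(tok)))
    } else if first.is_ascii_digit() {
        let tok = take_run(rest, |b| b.is_ascii_digit());
        // A number too large for isize stops lexing (outer None).
        Some(Some(Token::Number(tok.parse().ok()?)))
    } else if matches!(first, b' ' | b'\t' | b'\r' | b'\n') {
        take_run(rest, |b| matches!(b, b' ' | b'\t' | b'\r' | b'\n'));
        Some(None)
    } else {
        None
    }
}

pub struct Lexer<'a> {
    pub rest: &'a str,
}

impl<'a> Iterator for Lexer<'a> {
    type Item = Token<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // `?` ends iteration on the outer None; an inner None loops again.
            if let Some(tok) = lex_raw(&mut self.rest)? {
                return Some(tok);
            }
        }
    }
}
```

With this stand-in, `Lexer { rest: "abc 42\n x" }` yields `Word("abc")`, `Number(42)`, and `Word("x")`; the whitespace runs are consumed inside the loop and never reach the caller.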