| Crates.io | runmat-lexer |
| lib.rs | runmat-lexer |
| version | 0.2.8 |
| created_at | 2025-10-15 03:38:43.531786+00 |
| updated_at | 2025-12-22 21:35:35.423766+00 |
| description | Lexer for the RunMat language (MATLAB/Octave syntax) built with logos |
| homepage | https://runmat.org |
| repository | https://github.com/runmat-org/runmat |
| max_upload_size | |
| id | 1883621 |
| size | 53,354 |
This crate tokenizes MATLAB/Octave source code into a stream of tokens for the parser.
It uses the logos library to define a fast, zero-copy DFA with a small amount of
context via LexerExtras to handle MATLAB-specific ambiguities.
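As an illustration of the kind of ambiguity involved, here is a minimal std-only sketch of apostrophe handling. It is not the crate's logos-based implementation: `Extras`, `Tok`, and `lex` are hypothetical names, only the `last_was_value` flag mirrors a field this README describes, and treating whitespace as context-neutral and a string as a value are simplifying assumptions.

```rust
/// Hypothetical stand-in for the crate's `LexerExtras`; only the flag
/// relevant to apostrophe handling is modelled.
#[derive(Default)]
struct Extras {
    last_was_value: bool,
}

#[derive(Debug, PartialEq)]
enum Tok {
    Ident(String),
    Transpose,
    Str(String),
    Other(char),
}

/// Tokenize a tiny subset: identifiers, apostrophes, single-char tokens.
fn lex(src: &str) -> Vec<Tok> {
    let chars: Vec<char> = src.chars().collect();
    let mut extras = Extras::default();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c == ' ' {
            i += 1; // assumption: whitespace leaves the context unchanged
            continue;
        }
        if c.is_ascii_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            out.push(Tok::Ident(chars[start..i].iter().collect()));
            extras.last_was_value = true;
        } else if c == '\'' && extras.last_was_value {
            // Previous token was a value, so this apostrophe is a transpose.
            // The flag stays true: a transposed value is still a value.
            out.push(Tok::Transpose);
            i += 1;
        } else if c == '\'' {
            // Otherwise the apostrophe opens a string; '' is an escaped quote.
            i += 1;
            let mut s = String::new();
            while i < chars.len() {
                if chars[i] == '\'' {
                    if i + 1 < chars.len() && chars[i + 1] == '\'' {
                        s.push('\'');
                        i += 2;
                        continue;
                    }
                    break;
                }
                s.push(chars[i]);
                i += 1;
            }
            i += 1; // step past the closing quote, if there was one
            out.push(Tok::Str(s));
            extras.last_was_value = true; // assumption: a string is a value
        } else {
            out.push(Tok::Other(c));
            // Closing brackets end a value; other punctuation does not.
            extras.last_was_value = matches!(c, ')' | ']' | '}');
            i += 1;
        }
    }
    out
}
```

With this sketch, `lex("a'")` yields an identifier followed by `Transpose`, while `lex("x = 'it''s'")` ends in a single `Str` holding `it's`, because the `=` token resets the flag before the apostrophe is seen.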
We track two pieces of context in LexerExtras:
- last_was_value (bool): true if the previous emitted token forms a value.
  Used to disambiguate ' as transpose vs string start.
- line_start (bool): true if we are at the beginning of a logical line.
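  Used for %% section markers.

The lexer recognizes the following tokens:
- Keywords: function if elseif else for while break continue return end switch case otherwise try catch global persistent true false
- OOP keywords: classdef properties methods events enumeration arguments
- Import keyword: import
- Identifiers: [A-Za-z_][A-Za-z0-9_]*
- Single-quoted strings: '...' with doubled quotes '' inside
- Double-quoted strings: "..." with doubled quotes "" inside
- Arithmetic operators: + - * / \ ^
- Element-wise operators: .* ./ .\ .^
- Comparison operators: == ~= < <= > >=
- Logical operators: && || & | ~
- Transpose: ' (contextual)
- Colon: :
- Dot: .
- Function handle: @
- Metaclass query: ? (e.g., ?MyClass)
- Assignment and separators: = , ;
- Delimiters: () [] {}
- Line comment: % to end of line
- Section marker: %% at start of line
- Block comment: %{ ... %} (non-nesting)
- Continuation: ... (skips remainder of physical line)

Contextual rules (see last_was_value and line_start):
- ': after a value (for example, after ) ] }), emit Transpose; otherwise it starts a string.
- %%: recognized as a section marker only when line_start == true; otherwise % starts a normal line comment.
- ...: emits Ellipsis and consumes the remainder of the physical line, including any % comment following it.

The lexer purposefully does not encode high-level semantics: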
- Type names such as int8/uint64 are identifiers
- varargin/varargout/ans are identifiers
- Class semantics (handle inheritance, method attributes) are parsed/handled later

See tests/ for comprehensive coverage, organized by topic:
- lexer.rs: core tokens, operators, single-quoted strings, comments, ellipsis
- transpose.rs: detailed diagnostics and assertions for apostrophe (') transpose cases
- comments_continuation.rs: % line comments, %{...%} block comments, %% section markers, ... continuation
- operators.rs: logical and element-wise operators (e.g., .* ./ .\ .^ && || & | ~)
- namespaces.rs: import paths (including wildcard) and metaclass ?ClassName
- oop_tokens.rs: OOP keywords (classdef, properties, methods, events, enumeration, arguments) and function handles @
- strings_chars.rs: double-quoted string scalars and apostrophe disambiguation exercises
- tokens_basic.rs: identifiers, numbers, separators (; ,), and simple keyword smoke tests

All lexer tests pass when running the crate tests on their own.
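The section-marker and continuation behaviour that comments_continuation.rs exercises can be sketched in plain Rust as follows. This is an illustrative model under stated assumptions, not the crate's code: `Tok` and `lex_lines` are hypothetical names, non-comment text is kept as an opaque `Code` chunk, `%` inside string literals is ignored, and a line following a `...` continuation is assumed not to count as a logical line start.

```rust
#[derive(Debug, PartialEq)]
enum Tok {
    Section,      // `%%` at the start of a logical line
    Comment,      // `%` through the end of the line
    Ellipsis,     // `...` plus the rest of the physical line
    Code(String), // anything else, kept as an opaque chunk
}

fn lex_lines(src: &str) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut line_start = true; // mirrors the `line_start` flag
    for line in src.lines() {
        let mut rest = line.trim_start();
        let mut continued = false;
        while !rest.is_empty() {
            if rest.starts_with("%%") && line_start {
                out.push(Tok::Section);
                rest = "";
            } else if rest.starts_with('%') {
                // Includes `%%` that is not at a logical line start.
                out.push(Tok::Comment);
                rest = "";
            } else if rest.starts_with("...") {
                // The continuation swallows everything after it,
                // including a trailing `%` comment.
                out.push(Tok::Ellipsis);
                continued = true;
                rest = "";
            } else {
                // Take code up to the next `%` or `...`, whichever is first.
                let end = match (rest.find('%'), rest.find("...")) {
                    (Some(a), Some(b)) => a.min(b),
                    (Some(a), None) | (None, Some(a)) => a,
                    (None, None) => rest.len(),
                };
                out.push(Tok::Code(rest[..end].trim_end().to_string()));
                rest = &rest[end..];
                line_start = false;
            }
        }
        // Assumption: a continued line does not begin a new logical line.
        line_start = !continued;
    }
    out
}
```

For the input `"%% Setup\nx = 1 %% trailing\ny = [1, ... % note\n  2]\n"`, only the first `%%` becomes `Section`; the second is an ordinary `Comment`, and the `% note` after `...` produces no token of its own.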
Notes:
- Context-dependent rules read LexerExtras and use a Logos callback to Emit or Skip appropriately.
- The .' operator is tokenized as Dot then Transpose. The parser should interpret this pair as the non-conjugating transpose.
- Block comments %{...%} are treated as non-nesting by design.
- A well-formed single-quoted string yields a single Str token, while malformed single-quoted sequences may be split to allow downstream error reporting.
- Combinations of ... continuation and % comments are covered by tests; a few rare permutations may still be added as seeds (parser semantics unaffected).
- Later stages (runmat-parser, runmat-hir, runmat-ignition, runmat-turbine) are responsible for structure and semantics.
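Since .' arrives as two tokens, a consumer has to pair them up. The following is a minimal sketch of how a parser-side pass might do that; `Token`, `Postfix`, and `postfix_ops` are hypothetical names for illustration and do not reflect the actual runmat-parser API. A lone Transpose is mapped to the conjugate transpose, which is what ' means in MATLAB.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Ident,
    Dot,
    Transpose,
}

#[derive(Debug, PartialEq)]
enum Postfix {
    ConjugateTranspose,    // a'
    NonConjugateTranspose, // a.'
}

/// Scan a token slice and report the transpose operators it contains,
/// folding each `Dot, Transpose` pair into a single operator.
fn postfix_ops(tokens: &[Token]) -> Vec<Postfix> {
    let mut ops = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        match (tokens[i], tokens.get(i + 1)) {
            (Token::Dot, Some(Token::Transpose)) => {
                ops.push(Postfix::NonConjugateTranspose);
                i += 2; // consume both tokens of the pair
            }
            (Token::Transpose, _) => {
                ops.push(Postfix::ConjugateTranspose);
                i += 1;
            }
            _ => i += 1,
        }
    }
    ops
}
```

Keeping the pairing in the parser means the lexer needs no extra lookahead for `.`; a `Dot` that is not followed by `Transpose` is left for whatever other handling the parser gives it.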