Remarks about lexer implementation
==================================

Any terminal symbol of the language should be presented to the parser as a single token recognized by the lexer. The lexer should accept white space anywhere between tokens. If two consecutive tokens aren't separated by white space, the lexer should recognize them separately, unless their concatenation is also a valid token: as a rule, the longer token always wins.

## Token categories

_Ascesis_ terminal symbols are keywords, identifiers, literals, operators, separators, delimiters and modifiers.

- Keywords are `ces`, `vis`, `caps`, `unbounded`, `weights`, `inhibit`, `hold`, `Node`, `CES`, `Size` and `String`.
- Identifiers are unquoted strings of alphanumeric characters (plus underscore) that do not start with a digit and differ from every keyword.
- Literals are nonnegative integers and double-quoted strings.
- Operators are the thin arrows (`->`, `<-`), the fat arrows (`=>`, `<=`, `<=>`) and the plus sign.
- Separators are the colon and the comma.
- Delimiters are parentheses and curly braces.
- The exclamation mark is the only modifier.

## Rules for token recognition

```bnf
keyword = "ces" | "vis" | "caps" | "unbounded" | "weights" | "inhibit"
        | "hold" | "Node" | "CES" | "Size" | "String" ;

identifier = r"[a-zA-Z_][a-zA-Z0-9_]*" - keyword ;

literal = size | string ;
size = r"[0-9]+" ;
string = r#""[^"]*""# ;

operator = "->" | "<-" | "=>" | "<=" | "<=>" | "+" ;
separator = ":" | "," ;
delimiter = "(" | ")" | "{" | "}" ;
modifier = "!" ;
```
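The recognition rules can be illustrated with a minimal Python sketch. It is not the Ascesis lexer itself: the names `KEYWORDS`, `TOKEN_RE` and `tokenize` are hypothetical, and two implementation choices are assumptions. First, the `- keyword` exclusion is handled by matching a whole word and then checking it against the keyword set. Second, the "longer token wins" rule is obtained by listing `<=>` before `<=` in the ordered alternation.

```python
import re

# Keyword set taken from the `keyword` rule.
KEYWORDS = {"ces", "vis", "caps", "unbounded", "weights", "inhibit",
            "hold", "Node", "CES", "Size", "String"}

# Python tries alternatives left to right, so "<=>" is listed before "<="
# to make the longer operator win. The greedy quantifiers give the longest
# possible word and the longest possible run of digits.
TOKEN_RE = re.compile(r"""
    (?P<word>      [a-zA-Z_][a-zA-Z0-9_]* )
  | (?P<size>      [0-9]+ )
  | (?P<string>    "[^"]*" )
  | (?P<operator>  <=> | -> | <- | => | <= | \+ )
  | (?P<separator> [:,] )
  | (?P<delimiter> [(){}] )
  | (?P<modifier>  ! )
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (category, lexeme) pairs for `text`."""
    tokens = []
    pos = 0
    while pos < len(text):
        # White space is accepted anywhere between tokens and is skipped.
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme = m.lastgroup, m.group()
        # A word is a keyword only if it equals one exactly; otherwise it is
        # an identifier (so "cesx" is a single identifier, not "ces" + "x").
        if kind == "word":
            kind = "keyword" if lexeme in KEYWORDS else "identifier"
        tokens.append((kind, lexeme))
        pos = m.end()
    return tokens
```

With this sketch, `tokenize("a<=>b")` yields an identifier, the single operator `<=>`, and another identifier, while `tokenize("42x")` yields the size `42` followed by the identifier `x`, because their concatenation is not a valid token.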