- GrammarBuilder should detect when a non-terminal is defined but never used, and store a set of such symbols in the resulting Grammar
    - and maybe not bother adding those rules to the resulting grammar
- Instead of GrammarBuilder having a set of nullable symbols that it maintains as rules are added, it can just build up the set in the `.build()` method while it's building up the "called by" map
- Look into Aycock & Horspool's e-DFA scheme for precalculating item sets.
- Add a way to clear rules, so we can have defaults
- Relative numbering for start numbers, so we can make the state array a rope instead of a Vec?
- Make grammars and parsers generic over the atom type, so we can parse both char-streams and byte-streams.
- Let terminal symbols be char-classes, not just specific chars. Should reduce memory requirements for grammars, but makes it harder to build a parse-tree from just items?
- Keep items produced by "scan" steps separate from items produced by "predict" and "complete" steps, so if the input is modified at offset X we can keep the results of the "scan" at offset X-1 while tossing the "predict" and "complete" results.
- InvalidInput parse errors should also mention what non-terminals they're looking for.
- Ruby Slippers parsing, where we hallucinate invisible non-terminal symbols when we need them to match a rule?
- A way to "shrink" a parse tree, removing nodes where one non-terminal immediately calls another, with no padding or wrapping.
    - makes for prettier trees, maybe easier to process?
- A way to "flatten" a parse tree to a sequence of non-overlapping ranges, where each range is associated with the innermost non-terminal to match that range
    - for syntax highlighting
- A way to find all the character ranges that were matched by a rule for a given symbol, even if other symbols matched within that range
    - for finding (for example) "all comments"
- Non-terminals within rules can have "tags" which get stored in the resulting parse-tree, so code that deals with the parse-tree can pull out the important parts of a rule without having to hard-code their exact offsets in the `non_terminals` vec.
- A way to mark rules as ignorable in the parse-tree, so that they affect what gets parsed/recognised but don't appear in the resulting parse-tree.
    - Nobody cares about whitespace nodes.
- It's cool we can handle ambiguous grammars, but maybe it's more useful to deterministically pick an alternative? Given two nodes for the same symbol, covering the same span, we could pick the one that was defined first (assuming we track that in the grammar) so the grammar author can control how ambiguity is resolved.
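
A rough sketch of the first two items: computing the nullable set and the "defined but never used" set in one place, after all rules are known, rather than maintaining them incrementally. The `Symbol` and `Rule` types and both function names are made up for illustration and are not the crate's real types. "Used" here just means "appears on the right-hand side of some rule", which is a simplification.

```rust
use std::collections::HashSet;

/// Hypothetical symbol type: a terminal char or a named non-terminal.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Symbol {
    Terminal(char),
    NonTerminal(String),
}

/// Hypothetical rule: `lhs -> rhs[0] rhs[1] ...` (an empty `rhs` matches nothing).
struct Rule {
    lhs: String,
    rhs: Vec<Symbol>,
}

/// Collect nullable non-terminals by repeating passes until nothing changes.
/// A symbol is nullable if some rule for it has only nullable symbols on the
/// right-hand side (an empty right-hand side trivially qualifies).
fn nullable_symbols(rules: &[Rule]) -> HashSet<String> {
    let mut nullable: HashSet<String> = HashSet::new();
    loop {
        let mut changed = false;
        for rule in rules {
            if nullable.contains(&rule.lhs) {
                continue;
            }
            let all_nullable = rule.rhs.iter().all(|s| match s {
                Symbol::Terminal(_) => false,
                Symbol::NonTerminal(name) => nullable.contains(name),
            });
            if all_nullable {
                nullable.insert(rule.lhs.clone());
                changed = true;
            }
        }
        if !changed {
            return nullable;
        }
    }
}

/// Non-terminals that have rules but never appear on any right-hand side.
/// The start symbol is excluded, since nothing needs to reference it.
fn unused_symbols(rules: &[Rule], start: &str) -> HashSet<String> {
    let used: HashSet<&str> = rules
        .iter()
        .flat_map(|r| r.rhs.iter())
        .filter_map(|s| match s {
            Symbol::NonTerminal(name) => Some(name.as_str()),
            Symbol::Terminal(_) => None,
        })
        .collect();
    rules
        .iter()
        .map(|r| r.lhs.as_str())
        .filter(|lhs| *lhs != start && !used.contains(lhs))
        .map(String::from)
        .collect()
}
```

The repeated-pass loop is the simplest thing that works; driving it from the "called by" map instead would avoid rescanning every rule on each pass.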
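
For the "flatten" idea, something like the following might do, assuming a tree where each node records the half-open input range it matched and its children are in input order, non-overlapping, and inside the parent's range. The `Node` type and `flatten` function are hypothetical, not what the parser currently produces.

```rust
/// Hypothetical parse-tree node: the non-terminal's name, the half-open
/// input range `start..end` it matched, and its children in input order.
struct Node {
    symbol: String,
    start: usize,
    end: usize,
    children: Vec<Node>,
}

/// Append non-overlapping `(start, end, symbol)` ranges covering `node` to
/// `out`. Input covered by a child is attributed to the child (recursively);
/// the gaps between children are attributed to `node` itself, so each range
/// ends up tagged with the innermost non-terminal that matched it.
fn flatten(node: &Node, out: &mut Vec<(usize, usize, String)>) {
    let mut pos = node.start;
    for child in &node.children {
        // Input between the previous child and this one belongs to `node`.
        if child.start > pos {
            out.push((pos, child.start, node.symbol.clone()));
        }
        flatten(child, out);
        pos = child.end;
    }
    // Trailing input after the last child (or all of it, for a leaf).
    if node.end > pos {
        out.push((pos, node.end, node.symbol.clone()));
    }
}
```

A syntax highlighter could then walk the output in order and map each symbol name to a style, without knowing anything about the tree's shape.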