.lr
An LR(1) parser generator and visualizer created for educational purposes.
[![crates.io](https://img.shields.io/crates/v/dotlr)](https://crates.io/crates/dotlr)
[![docs.rs](https://img.shields.io/docsrs/dotlr)](https://docs.rs/dotlr)
[![ci](https://img.shields.io/github/actions/workflow/status/umut-sahin/dotlr/ci.yml)](https://github.com/umut-sahin/dotlr/actions/workflows/ci.yml)
[![license](https://img.shields.io/crates/l/dotlr)](https://crates.io/crates/dotlr)
## Table of Contents
* [What is an LR(1) parser?](#what-is-an-lr1-parser)
* [Why did you make this?](#why-did-you-make-this)
* [How can I use the CLI in the picture?](#how-can-i-use-the-cli-in-the-picture)
* [Can I use it as a standalone library?](#can-i-use-it-as-a-standalone-library)
* [How it works?](#how-does-it-work)
* [1) Parsing the grammar](#1-parsing-the-grammar)
* [2) Computing FIRST sets](#2-computing-first-sets)
* [3) Computing FOLLOW sets](#3-computing-follow-sets)
* [4) Constructing the LR(1) automaton](#4-constructing-the-lr1-automaton)
* [5) Constructing ACTION and GOTO tables](#5-constructing-action-and-goto-tables)
* [6) Tokenizing the input](#6-tokenizing-the-input)
* [7) Parsing the tokens](#7-parsing-the-tokens)
* [Can I have symbols that can match to empty string?](#can-i-have-symbols-that-can-match-to-empty-string)
* [Can I have an LALR(1) parser instead of an LR(1) parser?](#can-i-have-an-lalr1-parser-instead-of-an-lr1-parser)
* [Any benchmarks?](#any-benchmarks)
* [Can I modify it?](#can-i-modify-it)
* [Which resources did you use when creating this?](#which-resources-did-you-use-when-creating-this)
* [How is it licensed?](#how-is-it-licensed)
## What is an LR(1) parser?
[LR(1) parser](https://en.wikipedia.org/wiki/Canonical_LR_parser) is a type of
[bottom-up parser](https://en.wikipedia.org/wiki/Bottom-up_parsing) for parsing a subset of
[context free languages](https://en.wikipedia.org/wiki/Context-free_language).
`1` indicates that the parser will use a single lookahead token when making parsing decisions.
LR(1) parsers are powerful because they can parse a wide range of context free languages,
including most programming languages!
## Why did you make this?
To learn and to help others learn! This project allows you to visualize LR(1) parsers, from
construction to parsing, which makes it easier to understand how they work.
## How can I use the CLI in the picture?
If you want to play around with `dotlr` to visualize parser construction and step-by-step parsing of
different grammars, you can use the main executable of the crate.
### Installation
You can install the `dotlr` cli from [crates.io](https://crates.io/crates/dotlr).
```shell
cargo install dotlr
```
### Usage
Paste the following into a file called `grammar.lr`:
```
P -> E
E -> E '+' T
E -> T
T -> %id '(' E ')'
T -> %id
%id -> /[A-Za-z][A-Za-z0-9]+/
```
Run `dotlr` cli with the grammar file and an input:
```shell
dotlr grammar.lr "foo(bar + baz)"
```
It'll print:
```
+--------------------------------+
| Grammar |
+--------------------------------+
| P -> E |
| E -> E '+' T |
| E -> T |
| T -> %id '(' E ')' |
| T -> %id |
| |
| %id -> /^[A-Za-z][A-Za-z0-9]+/ |
+--------------------------------+
+--------+-----------+-----------------+
| Symbol | First Set | Follow Set |
+--------+-----------+-----------------+
| T | { %id } | { $, '+', ')' } |
+--------+-----------+-----------------+
| E | { %id } | { $, '+', ')' } |
+--------+-----------+-----------------+
| P | { %id } | { $ } |
+--------+-----------+-----------------+
+-------+------------------------+--------------+---------------+
| State | Items | Lookaheads | Transitions |
+-------+------------------------+--------------+---------------+
| 0 | P -> . E | { $ } | E -> 1 |
| | E -> . E '+' T | { $, '+' } | T -> 2 |
| | E -> . T | { $, '+' } | %id -> 3 |
| | T -> . %id '(' E ')' | { $, '+' } | |
| | T -> . %id | { $, '+' } | |
+-------+------------------------+--------------+---------------+
| 1 | P -> E . | { $ } | '+' -> 14 |
| | E -> E . '+' T | { $, '+' } | |
+-------+------------------------+--------------+---------------+
| 2 | E -> T . | { $, '+' } | |
+-------+------------------------+--------------+---------------+
| 3 | T -> %id . '(' E ')' | { $, '+' } | '(' -> 4 |
| | T -> %id . | { $, '+' } | |
+-------+------------------------+--------------+---------------+
| 4 | T -> %id '(' . E ')' | { $, '+' } | E -> 5 |
| | E -> . E '+' T | { ')', '+' } | %id -> 6 |
| | E -> . T | { ')', '+' } | T -> 9 |
| | T -> . %id '(' E ')' | { ')', '+' } | |
| | T -> . %id | { ')', '+' } | |
+-------+------------------------+--------------+---------------+
| 5 | T -> %id '(' E . ')' | { $, '+' } | '+' -> 11 |
| | E -> E . '+' T | { ')', '+' } | ')' -> 13 |
+-------+------------------------+--------------+---------------+
| 6 | T -> %id . '(' E ')' | { ')', '+' } | '(' -> 7 |
| | T -> %id . | { ')', '+' } | |
+-------+------------------------+--------------+---------------+
| 7 | T -> %id '(' . E ')' | { ')', '+' } | %id -> 6 |
| | E -> . E '+' T | { ')', '+' } | E -> 8 |
| | E -> . T | { ')', '+' } | T -> 9 |
| | T -> . %id '(' E ')' | { ')', '+' } | |
| | T -> . %id | { ')', '+' } | |
+-------+------------------------+--------------+---------------+
| 8 | T -> %id '(' E . ')' | { ')', '+' } | ')' -> 10 |
| | E -> E . '+' T | { ')', '+' } | '+' -> 11 |
+-------+------------------------+--------------+---------------+
| 9 | E -> T . | { ')', '+' } | |
+-------+------------------------+--------------+---------------+
| 10 | T -> %id '(' E ')' . | { ')', '+' } | |
+-------+------------------------+--------------+---------------+
| 11 | E -> E '+' . T | { ')', '+' } | %id -> 6 |
| | T -> . %id '(' E ')' | { ')', '+' } | T -> 12 |
| | T -> . %id | { ')', '+' } | |
+-------+------------------------+--------------+---------------+
| 12 | E -> E '+' T . | { ')', '+' } | |
+-------+------------------------+--------------+---------------+
| 13 | T -> %id '(' E ')' . | { $, '+' } | |
+-------+------------------------+--------------+---------------+
| 14 | E -> E '+' . T | { $, '+' } | %id -> 3 |
| | T -> . %id '(' E ')' | { $, '+' } | T -> 15 |
| | T -> . %id | { $, '+' } | |
+-------+------------------------+--------------+---------------+
| 15 | E -> E '+' T . | { $, '+' } | |
+-------+------------------------+--------------+---------------+
+-------+---------------------------------------+----------------------+
| | Action | Goto |
| State | ------------------------------------- | -------------------- |
| | '+' '(' ')' %id $ | P E T |
+-------+---------------------------------------+----------------------+
| 0 | - - - s3 - | - 1 2 |
+-------+---------------------------------------+----------------------+
| 1 | s14 - - - a1 | - - - |
+-------+---------------------------------------+----------------------+
| 2 | r3 - - - r3 | - - - |
+-------+---------------------------------------+----------------------+
| 3 | r5 s4 - - r5 | - - - |
+-------+---------------------------------------+----------------------+
| 4 | - - - s6 - | - 5 9 |
+-------+---------------------------------------+----------------------+
| 5 | s11 - s13 - - | - - - |
+-------+---------------------------------------+----------------------+
| 6 | r5 s7 r5 - - | - - - |
+-------+---------------------------------------+----------------------+
| 7 | - - - s6 - | - 8 9 |
+-------+---------------------------------------+----------------------+
| 8 | s11 - s10 - - | - - - |
+-------+---------------------------------------+----------------------+
| 9 | r3 - r3 - - | - - - |
+-------+---------------------------------------+----------------------+
| 10 | r4 - r4 - - | - - - |
+-------+---------------------------------------+----------------------+
| 11 | - - - s6 - | - - 12 |
+-------+---------------------------------------+----------------------+
| 12 | r2 - r2 - - | - - - |
+-------+---------------------------------------+----------------------+
| 13 | r4 - - - r4 | - - - |
+-------+---------------------------------------+----------------------+
| 14 | - - - s3 - | - - 15 |
+-------+---------------------------------------+----------------------+
| 15 | r2 - - - r2 | - - - |
+-------+---------------------------------------+----------------------+
> foo(bar + baz)
P
└─ E
└─ T
├─ foo
├─ (
├─ E
│ ├─ E
│ │ └─ T
│ │ └─ bar
│ ├─ +
│ └─ T
│ └─ baz
└─ )
+------+---------------+-------------------+---------------------------+-------------------------------+
| Step | State Stack | Symbol Stack | Remaining Input | Action Taken |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 0 | 0 | | %id '(' %id '+' %id ')' $ | Shift 3 |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 1 | 0 3 | %id | '(' %id '+' %id ')' $ | Shift 4 |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 2 | 0 3 4 | %id '(' | %id '+' %id ')' $ | Shift 6 |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 3 | 0 3 4 6 | %id '(' %id | '+' %id ')' $ | Reduce 4 (T -> %id) |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 4 | 0 3 4 9 | %id '(' T | '+' %id ')' $ | Reduce 2 (E -> T) |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 5 | 0 3 4 5 | %id '(' E | '+' %id ')' $ | Shift 11 |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 6 | 0 3 4 5 11 | %id '(' E '+' | %id ')' $ | Shift 6 |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 7 | 0 3 4 5 11 6 | %id '(' E '+' %id | ')' $ | Reduce 4 (T -> %id) |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 8 | 0 3 4 5 11 12 | %id '(' E '+' T | ')' $ | Reduce 1 (E -> E '+' T) |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 9 | 0 3 4 5 | %id '(' E | ')' $ | Shift 13 |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 10 | 0 3 4 5 13 | %id '(' E ')' | $ | Reduce 3 (T -> %id '(' E ')') |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 11 | 0 2 | T | $ | Reduce 2 (E -> T) |
+------+---------------+-------------------+---------------------------+-------------------------------+
| 12 | 0 1 | E | $ | Accept 0 (P -> E) |
+------+---------------+-------------------+---------------------------+-------------------------------+
```
You can also enter REPL mode if you omit the input:
```shell
dotlr grammar.lr
```
## Can I use it as a standalone library?
Yes, you can depend on the `dotlr` crate from [crates.io](https://crates.io/crates/dotlr).
### Setup
Paste the following to your `dependencies` section of your `Cargo.toml`:
```toml
dotlr = { version = "0.4", default-features = false }
```
### Example
Here is a basic example that shows the primary operations of the `dotlr` crate:
```rust
use dotlr::{
Grammar,
Parser,
ParserError,
};
const GRAMMAR: &str = r#"
P -> E
E -> E '+' T
E -> T
T -> %id '(' E ')'
T -> %id
%id -> /[A-Za-z][A-Za-z0-9]+/
"#;
const INPUT: &str = r#"
foo(bar + baz)
"#;
fn main() {
let grammar = match Grammar::parse(GRAMMAR.trim()) {
Ok(grammar) => grammar,
Err(error) => {
eprintln!("grammar error: {}", error);
return;
}
};
let parser = match Parser::lr(grammar) {
Ok(parser) => parser,
Err(error) => {
eprintln!("parser error: {}", error);
if let ParserError::Conflict { parser, .. } = error {
parser.dump();
}
return;
}
};
let tokens = match parser.tokenize(INPUT.trim()) {
Ok(tokens) => tokens,
Err(error) => {
eprintln!("tokenization error: {}", error);
return;
}
};
let (parse_trace, parse_tree) = match parser.trace(tokens) {
Ok(result) => result,
Err(error) => {
eprintln!("tokenization error: {}", error);
return;
}
};
parser.dump();
println!();
parse_tree.dump();
println!();
parse_trace.dump(parser.grammar());
}
```
## How does it work?
Let's go over a step-by-step construction of the parser for the following grammar:
```
E -> E '+' F
E -> F
F -> F '*' T
F -> T
T -> %b
%b -> /[0-1]/
```
And then do a step-by-step explanation of the parsing steps for the following input:
```
1 + 0 * 1
```
Few notes before starting:
- `$` represents the end of input token
- `ε` represents the empty token
### 1) Parsing the grammar
First, we need to parse the grammar string to a grammar object that we can work with.
The grammar object will consist of:
- **symbols (HashSet