Parser

Source

parser.rs

Code Walk Through

The job of the parser is to take the vector of tokens and convert it into a recursive list structure (mentioned in the introduction). This recursive list structure is an in-memory representation of the Lisp program.

Object model

Before diving into the details of how the recursive list structure is created, we need to first define the objects that will make up the elements of the list. This is done in the object Rust module with an enum defined as follows

pub enum Object {
    Void,
    Integer(i64),
    Bool(bool),
    Symbol(String),
    Lambda(Vec<String>, Vec<Object>),
    List(Vec<Object>),
}
  • Void: An empty object
  • Integer: A signed 64-bit integer
  • Bool: A boolean value
  • Symbol: A Lisp symbol, similar to the Symbol token
  • Lambda: Function object, with the first vector representing the parameter labels and the second vector representing the body of the function. This object is not used during parsing but will be used during the evaluation phase of the interpreter.
  • List: List object

Parser

The parser for the Lisp dialect is a simple recursive function. It takes a vector of tokens (in reverse) generated by the lexer and generates a single List Object representing the entire Lisp program. The core logic of the parser is implemented by the recursive parse_list function as shown below. It expects a vector of tokens, with the first token of the vector being a left parenthesis indicating the start of the list. The function then proceeds to process the elements of the list one at a time. If it encounters atomic tokens such as an integer or symbol it creates corresponding atomic objects and adds them to the list. If it encounters another left parenthesis it recurses with the remaining tokens. Finally, the function returns with the list object when it encounters the right parenthesis.

let token = tokens.pop();
if token != Some(Token::LParen) {
    return Err(ParseError {
        err: format!("Expected LParen, found {:?}", 
                     token),
    });
}
   
let mut list: Vec<Object> = Vec::new(); 
while !tokens.is_empty() {
    let token = tokens.pop();
    if token == None {
        return Err(ParseError {
            err: format!("Insufficient tokens"),
        });
    }
    let t = token.unwrap();
    match t {
        Token::Integer(n) => 
            list.push(Object::Integer(n)),
        Token::Symbol(s) => 
            list.push(Object::Symbol(s)),
        Token::LParen => {
            tokens.push(Token::LParen);
            let sub_list = parse_list(tokens)?;
            list.push(sub_list);
        }
        Token::RParen => {
            return Ok(Object::List(list));
        }
    }
}

Testing

The above parsing code can be unit tested as follows

let list = parse("(+ 1 2)").unwrap();
assert_eq!(
    list,
    Object::List(vec![
        Object::Symbol("+".to_string()),
        Object::Integer(1),
        Object::Integer(2),
    ])
);

To cement your understanding of the parsing process please go through the remaining tests in parser.rs