Crates.io | bappy-script |
lib.rs | bappy-script |
version | 0.1.0 |
source | src |
created_at | 2023-05-12 15:40:32.85639 |
updated_at | 2023-05-12 15:40:32.85639 |
description | Gankra's toy compiler |
homepage | |
repository | https://github.com/Gankra/bappy-script |
max_upload_size | |
id | 863084 |
size | 246,917 |
Just messing around with a toy compiler.
bappy-script is primarily a playground for learning how static type systems work (checker.rs). The parser and interpreter work but are an after-thought by comparison.
This program roughly demonstrates everything the language can do, including:
Basics:
()
)Types:
Analysis:
()
continue
outside a loop)fn print_1d_point() {
struct Point {
x: Int
}
let x = Point { x: 1 }
print x
ret ()
}
let _ = print_1d_point()
let print_point: fn() -> () = print_1d_point
let _ = print_point()
let tuple = (1, (true, "hello"), false)
if tuple.1.0 {
struct Point {
x: Int
y: Int
}
let captured_point = Point { x: 2, y: 4 }
fn print_2d_point() {
print captured_point
ret ()
}
let _ = print_2d_point();
set print_point = print_2d_point
}
struct Point {
x: Int
y: Int
z: Int
}
fn print_3d_point() -> Int {
let pt: Point = Point { x: 3, y: 5, z: 7 }
print pt
ret add(add(pt.x, pt.y), pt.z)
}
fn print_many() {
print "3 more times!!!"
let counter = 3
loop {
if eq(counter, 0) {
break
}
set counter = sub(counter, 1)
let _ = print_3d_point()
}
ret ()
}
let _ = print_1d_point()
let _ = print_point()
let res = print_3d_point()
print res
let _ = print_many()
ret res
The parser is bad (brittle) because I just wanted something simple that was easy to extend. I think it's technically "recursive descent" but I don't like, tokenize so idk. The parser has the worst errors and no recovery because I just didn't care about it.
Syntax is largely based on Rust's because it's a fairly clean and unambiguous syntax I'm comfortable with.
Why didn't you make it a lisp variant if you hate parsing?
I am really bad at reading/writing lispy things so this felt like a good tradeoff in personal effort and comfort. It's also easier for me to intuit how something "should" work when it looks like Rust code, because that's the language I understand the best.
Also if I tell something that it's Rust code, the syntax highlighting basically just works lol
Notable deviations:
Newlines are very significant. It sucks, but most of the time you don't really notice. It only really hurts for really complicated expressions and function declarations IMO. As a consolation I at least let you not write semicolons, since newlines basically are semicolons?
Many things must all be contained to their own line:
let x: MyType = func1(func2(a, b, c), d)
ret x
struct MyStruct {
fn (a: Int, b: Bool) -> Int {
if x {
} else {
x: Int
)}
)Some things are extra verbose because I wanted the parser and interpretter to be trivial:
let _ = func()
is the easiest way to call a function just for its side-effects.set
set x = y
set
and ret
values.There are no infix operators:
We've got builtins like add(x, y)
and eq(x, y)
, final offer
This is basically the place I'm most interested in, so you'll find src/checker.rs
will have the most comments and discussion.
Currently all analysis is done directly by recursively walking the AST. All the variables and types at every point in the program are tracked at every point in the program, but this information is mutable and transient (sort of like we're "executing" the program).
Ideally I'd like to write something that can create an (SSA?) Control Flow Graph to facilitate other analyses like Definite Initialization (a major piece of the ownership system in Rust, and just a staple of any good compiler because it lets you report things like "value assigned to variable is never used").
I'm potentially also interested in faffing around with Generics and maybe Higher-Rank Types? But I'm genuinely unsure about how to best represent and implement some of that stuff (or rather, am saddened that type comparison might have to be more complex than comparing type ids?).
To reiterate the notes in the example, the type system currently supports:
Basics:
()
)Types:
Analysis:
Optional Static type checking
()
Statically validates variable accesses are in scope
Statically validates control flow (can't continue
outside a loop)
Statically computes closure captures
It's a pretty simple "untyped" interpreter of the AST. Runtime values do have basic type tagging so we can check if something is a function or boolean for the places where only those types make sense.
It almost doesn't depend at all on the checker, but the closure capture analysis adds some required information to the AST.
Not depending on the checker helps catch bugs in the checker.
Nothing is terribly optimized. Any time a value is semantically moved it's Cloned (and things like tuples, closures, and structs all contain Vecs!). But even given that, compiling and evaluating the ~100 programs that are currently in the tests is basically instantaneous on my beefy work machine.
There's a lot more lifetimes in the interpreter than there really should be. I make a point of keeping all string literals as pointers into the original program text, because that was aesthetically important to me.