created_at2021-12-13 19:49:21.780826
updated_at2024-04-11 19:52:15.118506
descriptionISLE: Instruction Selection and Lowering Expressions. A domain-specific language for instruction selection in Cranelift.



# ISLE: Instruction Selection/Lowering Expressions DSL See also: [Language Reference](./docs/ ## Table of Contents * [Introduction](#introduction) * [Example Usage](#example-usage) * [Tutorial](#tutorial) * [Implementation](#implementation) ## Introduction ISLE is a DSL that allows one to write instruction-lowering rules for a compiler backend. It is based on a "term-rewriting" paradigm in which the input -- some sort of compiler IR -- is, conceptually, a tree of terms, and we have a set of rewrite rules that turn this into another tree of terms. This repository contains a prototype meta-compiler that compiles ISLE rules down to an instruction selector implementation in generated Rust code. The generated code operates efficiently in a single pass over the input, and merges all rules into a decision tree, sharing work where possible, while respecting user-configurable priorities on each rule. The ISLE language is designed so that the rules can both be compiled into an efficient compiler backend and can be used in formal reasoning about the compiler. The compiler in this repository implements the former. The latter use-case is future work and outside the scope of this prototype, but at a high level, the rules can be seen as simple equivalences between values in two languages, and so should be translatable to formal constraints or other logical specification languages. Some more details and motivation are in [BA RFC #15]( Reference documentation can be found [here](docs/ Details on ISLE's integration into Cranelift can be found [here](../docs/ ## Example Usage Build `islec`, the ISLE compiler: ```shell $ cargo build --release ``` Compile a `.isle` source file into Rust code: ```shell $ target/release/islec -i isle_examples/test.isle -o isle_examples/ ``` Include that Rust code in your crate and compile it: ```shell $ rustc isle_examples/ ``` ## Tutorial This tutorial walks through defining an instruction selection and lowering pass for a simple, RISC-y, high-level IR down to low-level, CISC-y machine instructions. It is intentionally somewhat similar to CLIF to MachInst lowering, although it restricts the input and output languages to only adds, loads, and constants so that we can focus on ISLE itself. > The full ISLE source code for this tutorial is available at > `isle_examples/tutorial.isle`. The ISLE language is based around rules for translating a term (i.e. expression) into another term. Terms are typed, so before we can write rules for translating some type of term into another type of term, we have to define those types: ```lisp ;; Declare that we are using the `i32` primitive type from Rust. (type i32 (primitive i32)) ;; Our high-level, RISC-y input IR. (type HighLevelInst (enum (Add (a Value) (b Value)) (Load (addr Value)) (Const (c i32)))) ;; A value in our high-level IR is a Rust `Copy` type. Values are either defined ;; by an instruction, or are a basic block argument. (type Value (primitive Value)) ;; Our low-level, CISC-y machine instructions. (type LowLevelInst (enum (Add (mode AddrMode)) (Load (offset i32) (addr Reg)) (Const (c i32)))) ;; Different kinds of addressing modes for operands to our low-level machine ;; instructions. (type AddrMode (enum ;; Both operands in registers. (RegReg (a Reg) (b Reg)) ;; The destination/first operand is a register; the second operand is in ;; memory at `[b + offset]`. (RegMem (a Reg) (b Reg) (offset i32)) ;; The destination/first operand is a register, second operand is an ;; immediate. (RegImm (a Reg) (imm i32)))) ;; The register type is a Rust `Copy` type. (type Reg (primitive Reg)) ``` Now we can start writing some basic lowering rules! We declare the top-level lowering function (a "constructor term" in ISLE terminology) and attach rules to it. The simplest case is matching a high-level `Const` instruction and lowering that to a low-level `Const` instruction, since there isn't any translation we really have to do. ```lisp ;; Declare our top-level lowering function. We will attach rules to this ;; declaration for lowering various patterns of `HighLevelInst` inputs. (decl lower (HighLevelInst) LowLevelInst) ;; Simple rule for lowering constants. (rule (lower (HighLevelInst.Const c)) (LowLevelInst.Const c)) ``` Each rule has the form `(rule )`. The left-hand side (LHS) is a *pattern* and the right-hand side (RHS) is an *expression*. When the LHS pattern matches the input, then we evaluate the RHS expression. The LHS pattern can bind variables from the input that are then available in the right-hand side. For example, in our `Const`-lowering rule, the variable `c` is bound from the LHS and then reused in the RHS. Now we can compile this code by running ```shell $ islec isle_examples/tutorial.isle ``` and we'll get the following output (ignoring any minor code generation changes in the future): ```rust // GENERATED BY ISLE. DO NOT EDIT! // // Generated automatically from the instruction-selection DSL code in: // - isle_examples/tutorial.isle // [Type and `Context` definitions removed for brevity...] // Generated as internal constructor for term lower. pub fn constructor_lower(ctx: &mut C, arg0: &HighLevelInst) -> Option { let pattern0_0 = arg0; if let &HighLevelInst::Const { c: pattern1_0 } = pattern0_0 { // Rule at isle_examples/tutorial.isle line 45. let expr0_0 = LowLevelInst::Const { c: pattern1_0, }; return Some(expr0_0); } return None; } ``` There are a few things to notice about this generated Rust code: * The `lower` constructor term becomes the `constructor_lower` function in the generated code. * The function returns a value of type `Option` and returns `None` when it doesn't know how to lower an input `HighLevelInst`. This is useful for incrementally porting hand-written lowering code to ISLE. * There is a helpful comment documenting where in the ISLE source code a rule was defined. The goal is to make ISLE more transparent and less magical. * The code is parameterized by a type that implements a `Context` trait. Implementing this trait is how you glue the generated code into your compiler. Right now this is an empty trait; more on `Context` later. * Lastly, and most importantly, this generated Rust code is basically what we would have written by hand to do the same thing, other than things like variable names. It checks if the input is a `Const`, and if so, translates it into a `LowLevelInst::Const`. Okay, one rule isn't very impressive, but in order to start writing more rules we need to be able to put the result of a lowered instruction into a `Reg`. This might internally have to do arbitrary things like update use counts or anything else that Cranelift's existing `LowerCtx::put_input_in_reg` does for different target architectures. To allow for plugging in this kind of arbitrary logic, ISLE supports *external constructors*. These end up as methods of the `Context` trait in the generated Rust code, and you can implement them however you want with custom Rust code. Here is how we declare an external helper to put a value into a register: ```lisp ;; Declare an external constructor that puts a high-level `Value` into a ;; low-level `Reg`. (decl put_in_reg (Value) Reg) (extern constructor put_in_reg put_in_reg) ``` If we rerun `islec` on our ISLE source, instead of an empty `Context` trait, now we will get this trait definition: ```rust pub trait Context { fn put_in_reg(&mut self, arg0: Value) -> (Reg,); } ``` With the `put_in_reg` helper available, we can define rules for lowering loads and adds: ```lisp ;; Simple rule for lowering adds. (rule (lower (HighLevelInst.Add a b)) (LowLevelInst.Add (AddrMode.RegReg (put_in_reg a) (put_in_reg b)))) ;; Simple rule for lowering loads. (rule (lower (HighLevelInst.Load addr)) (LowLevelInst.Load 0 (put_in_reg addr))) ``` If we compile our ISLE source into Rust code once again, the generated code for `lower` now looks like this: ```rust // Generated as internal constructor for term lower. pub fn constructor_lower(ctx: &mut C, arg0: &HighLevelInst) -> Option { let pattern0_0 = arg0; match pattern0_0 { &HighLevelInst::Const { c: pattern1_0 } => { // Rule at isle_examples/tutorial.isle line 45. let expr0_0 = LowLevelInst::Const { c: pattern1_0, }; return Some(expr0_0); } &HighLevelInst::Load { addr: pattern1_0 } => { // Rule at isle_examples/tutorial.isle line 59. let expr0_0: i32 = 0; let expr1_0 = C::put_in_reg(ctx, pattern1_0); let expr2_0 = LowLevelInst::Load { offset: expr0_0, addr: expr1_0, }; return Some(expr2_0); } &HighLevelInst::Add { a: pattern1_0, b: pattern1_1 } => { // Rule at isle_examples/tutorial.isle line 54. let expr0_0 = C::put_in_reg(ctx, pattern1_0); let expr1_0 = C::put_in_reg(ctx, pattern1_1); let expr2_0 = AddrMode::RegReg { a: expr0_0, b: expr1_0, }; let expr3_0 = LowLevelInst::Add { mode: expr2_0, }; return Some(expr3_0); } _ => {} } return None; } ``` As you can see, each of our rules was collapsed into a single, efficient `match` expression. Just like we would have otherwise written by hand. And wherever we need to get a high-level operand as a low-level register, there is a call to the `Context::put_in_reg` trait method, allowing us to hook whatever arbitrary logic we need to when putting a value into a register when we implement the `Context` trait. Things start to get more interesting when we want to do things like sink a load into the add's addressing mode. This is only desirable when our add is the only use of the loaded value. Furthermore, it is only valid to do when there isn't any store that might write to the same address we are loading from in between the load and the add. Otherwise, moving the load across the store could result in a miscompilation where we load the wrong value to add: ```text x = load addr store 42 -> addr y = add x, 1 ==/==> store 42 -> addr x = load addr y = add x, 1 ``` We can encode these kinds of preconditions in an *external extractor*. An extractor is like our regular constructor functions, but it is used inside LHS patterns, rather than RHS expressions, and its arguments and results flipped around: instead of taking arguments and producing results, it takes a result and (fallibly) produces the arguments. This allows us to write custom preconditions for matching code. Let's make this more clear with a concrete example. Here is the declaration of an external extractor to match on the high-level instruction that defined a given operand `Value`, along with a new rule to sink loads into adds: ```lisp ;; Declare an external extractor for extracting the instruction that defined a ;; given operand value. (decl inst_result (HighLevelInst) Value) (extern extractor inst_result inst_result) ;; Rule to sink loads into adds. (rule (lower (HighLevelInst.Add a (inst_result (HighLevelInst.Load addr)))) (LowLevelInst.Add (AddrMode.RegMem (put_in_reg a) (put_in_reg addr) 0))) ``` Note that the operand `Value` passed into this extractor might be a basic block parameter, in which case there is no such instruction. Or there might be a store or function call instruction in between the current instruction and the instruction that defines the given operand value, in which case we want to "hide" the instruction so that we don't illegally sink loads into adds they shouldn't be sunk into. So this extractor might fail to return an instruction for a given operand `Value`. If we recompile our ISLE source into Rust code once again, we see a new `inst_result` method defined on our `Context` trait, we notice that its arguments and returns are flipped around from the `decl` in the ISLE source because it is an extractor, and finally that it returns an `Option` because it isn't guaranteed that we can extract a defining instruction for the given operand `Value`: ```rust pub trait Context { fn put_in_reg(&mut self, arg0: Value) -> (Reg,); fn inst_result(&mut self, arg0: Value) -> Option<(HighLevelInst,)>; } ``` And if we look at the generated code for our `lower` function, there is a new, nested case for sinking loads into adds that uses the `Context::inst_result` trait method to see if our new rule can be applied: ```rust // Generated as internal constructor for term lower. pub fn constructor_lower(ctx: &mut C, arg0: &HighLevelInst) -> Option { let pattern0_0 = arg0; match pattern0_0 { &HighLevelInst::Const { c: pattern1_0 } => { // [...] } &HighLevelInst::Load { addr: pattern1_0 } => { // [...] } &HighLevelInst::Add { a: pattern1_0, b: pattern1_1 } => { if let Some((pattern2_0,)) = C::inst_result(ctx, pattern1_1) { if let &HighLevelInst::Load { addr: pattern3_0 } = &pattern2_0 { // Rule at isle_examples/tutorial.isle line 68. let expr0_0 = C::put_in_reg(ctx, pattern1_0); let expr1_0 = C::put_in_reg(ctx, pattern3_0); let expr2_0: i32 = 0; let expr3_0 = AddrMode::RegMem { a: expr0_0, b: expr1_0, offset: expr2_0, }; let expr4_0 = LowLevelInst::Add { mode: expr3_0, }; return Some(expr4_0); } } // Rule at isle_examples/tutorial.isle line 54. let expr0_0 = C::put_in_reg(ctx, pattern1_0); let expr1_0 = C::put_in_reg(ctx, pattern1_1); let expr2_0 = AddrMode::RegReg { a: expr0_0, b: expr1_0, }; let expr3_0 = LowLevelInst::Add { mode: expr2_0, }; return Some(expr3_0); } _ => {} } return None; } ``` Once again, this is pretty much the code you would have otherwise written by hand to sink the load into the add. At this point we can start defining a whole bunch of even-more-complicated lowering rules that do things like take advantage of folding static offsets into loads into adds: ```lisp ;; Rule to sink a load of a base address with a static offset into a single add. (rule (lower (HighLevelInst.Add a (inst_result (HighLevelInst.Load (inst_result (HighLevelInst.Add base (inst_result (HighLevelInst.Const offset)))))))) (LowLevelInst.Add (AddrMode.RegMem (put_in_reg a) (put_in_reg base) offset))) ;; Rule for sinking an immediate into an add. (rule (lower (HighLevelInst.Add a (inst_result (HighLevelInst.Const c)))) (LowLevelInst.Add (AddrMode.RegImm (put_in_reg a) c))) ;; Rule for lowering loads of a base address with a static offset. (rule (lower (HighLevelInst.Load (inst_result (HighLevelInst.Add base (inst_result (HighLevelInst.Const offset)))))) (LowLevelInst.Load offset (put_in_reg base))) ``` I'm not going to show the generated Rust code for these new rules here because it is starting to get a bit too big. But you can compile `isle_examples/tutorial.isle` and verify yourself that it generates the code you expect it to. In conclusion, adding new lowering rules is easy with ISLE. And you still get that efficient, compact tree of `match` expressions in the generated Rust code that you would otherwise write by hand. ## Implementation This is an overview of `islec`'s passes and data structures: ```text +------------------+ | ISLE Source Text | +------------------+ | | Lex V +--------+ | Tokens | +--------+ | | Parse V +----------------------+ | Abstract Syntax Tree | +----------------------+ | | Semantic Analysis V +----------------------------+ | Term and Type Environments | +----------------------------+ | | Trie Construction V +-----------+ | Term Trie | +-----------+ | | Code Generation V +------------------+ | Rust Source Code | +------------------+ ``` ### Lexing Lexing breaks up the input ISLE source text into a stream of tokens. Our lexer is pull-based, meaning that we don't eagerly construct the full stream of tokens. Instead, we wait until the next token is requested, at which point we lazily lex it. Relevant source files: * `isle/src/` ### Parsing Parsing translates the stream of tokens into an abstract syntax tree (AST). Our parser is a simple, hand-written, recursive-descent parser. Relevant source files: * `isle/src/` * `isle/src/` ### Semantic Analysis Semantic analysis performs type checking, figures out which rules apply to which terms, etc. It creates a type environment and a term environment that we can use to get information about our terms throughout the rest of the pipeline. Relevant source files: * `isle/src/` ### Trie Construction The trie construction phase linearizes each rule's LHS pattern and inserts them into a trie that maps LHS patterns to RHS expressions. This trie is the skeleton of the decision tree that will be emitted during code generation. Relevant source files: * `isle/src/` * `isle/src/` ### Code Generation Code generation takes in the term trie and emits Rust source code that implements it. Relevant source files: * `isle/src/`
Commit count: 13025

cargo fmt