1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
//! # lasm, a minimal and portable assembly langauge
//!
//! The spirit of this crate is to make the most small and correct
//! assembly language as possible. A reduced instruction set is valued
//! above all else. If possible, speed is also an admirable trait.
//!
//! # purpose
//!
//! Writing a compiler is very **_very_** hard. A lot of that
//! difficulty comes from trying to manage memory and trying to represent
//! high level concepts in terms of low level instructions.
//!
//! So, with these problems in mind, I wrote this assembly language.
//!
//! # features
//!
//! The most high level feature is the infinite number of registers. This allows
//! the compiler to declare and use variables **significantly** easier.
//! The last time I wrote a compiler, the absolute hardest part was managing
//! when variables were allocated and freed. As a result, I wrote this assembly
//! language to take care of that!
//!
//! ### procedures
//!
//! Another high level feature is managing procedure declarations. When the
//! assembly is parsed, the procedures are each defined before they are checked
//! for semantic errors. So, procedures can be defined in any order.
//!
//! ### portability
//!
//! The final, and best feature is portability. lasm is _extremely_ compact:
//! the entire C implementation of lasm's instruction set is nearly 150 lines.
//! Writing an implementation for lasm is extremely simple, and so compiling to
//! lasm allows the compiler to target several different programming languages
//! and platforms.
//!
//! # basic instructions
//!
//! | Stack Instruction | Description |
//! |-------------------|-------------|
//! | `push LITERAL` | Push the LITERAL argument onto the stack. The LITERAL argument MUST be a character or float |
//! | `pop` | Pop a value off of the stack and into the ACC register |
//! | `ld REGISTER` | Push the value stored in REGISTER onto the stack. The REGISTER being loaded MUST be defined before being loaded |
//! | `st REGISTER` | Pop a value off of the stack into REGISTER. The REGISTER being stored to MUST be declared before being stored |
//! | `dup` | Duplicate the top item on the stack |
//!
//! | Pointer Instruction | Description |
//! |---------------------|-------------|
//! | `refer REGISTER` | Push a pointer to REGISTER onto the stack |
//! | `deref_ld` | Pop a pointer off of the stack, and push the value stored at where the pointer points. This will only push a single cell onto the stack, not more than one cell |
//! | `deref_st` | Pop a pointer and a cell off of the stack, and store the cell at the pointer |
//! | `alloc REGISTER` | Pop a SIZE value off of the stack, and store the address to SIZE free cells in REGISTER |
//! | `free REGISTER` | Pop a SIZE value off of the stack, and free the memory stored at the pointer stored in REGISTER |
//!
//!
//! | Math Instruction | Description |
//! |------------------|-------------|
//! | `add` | Pop two cells off of the stack, and push their sum |
//! | `sub` | Pop two cells off of the stack, and push the first minus the second |
//! | `div` | Pop two cells off of the stack, and push their product |
//! | `mul` | Pop two cells off of the stack, and push the first divided by the second |
//! | `cmp` | Pop two cells off of the stack, and push -1 if the first is less than the second, 0 if they are equal, and 1 otherwise |
//!
//! | IO Instruction | Description |
//! |----------------|-------------|
//! | `outc` | Pop a cell off of the stack and print it as a character |
//! | `outn` | Pop a cell off of the stack and print it as a float |
//! | `inc` | Get a character from STDIN and push it into the stack |
//! | `inn` | Get a float from STDIN and push it into the stack |
//!
//! | Control Instruction | Description |
//! |---------------------|-------------|
//! | `loop` | Marks the start of a loop. At the start of each iteration, a test value is popped from the stack. While the value is not zero, the loop continues. Else, the loop jumps to the matching `endloop` |
//! | `endloop` | Marks the end of a loop |
//!
//! # examples
//!
//! This assembly language is a bit simpler than most others because
//! portability and compactness are the two largest goals in mind.
//! As a result, examples are pretty simple.
//!
//! ## fibonacci
//! 
//! This simply implements fibonacci by doing arithmetic on three variables `a`, `b`, and `c`.
//! To simplify outputing the numbers, a few helper procedures are defined.
//! 
//! ```rust,ignore,no_run
//! // comments are C-style
//! // The `stack_size` flag can ONLY be used at the top of the file.
//! // Anywhere else, this flag will show up as a syntax error.
//! // The purpose of the flag is to set the size of memory used
//! // outside of the statically determined memory. Any loads,
//! // pushes, allocs, etc. require a bit of memory on the stack.
//!
//! // If this flag is not present, 256 cells are used by default.
//! stack_size 1024
//!
//! // The start procedure is the entry point
//! proc start
//!     // Declare the registers we will use
//!     define a, 1
//!     // Push 0 and store it in 'a'
//!     push 0 st a
//!     define b, 1
//!     // Push 1 and store it in 'b'
//!     push 1 st b
//!     define c, 1
//!     // Push 0 and store it in 'c'
//!     push 0 st c
//!
//!     // This will determine the number of times to iterate
//!     define n, 1
//!     // Push 10 and store it in 'n'
//!     push 10 st n
//!
//!     // loop while n is not zero
//!     ld n
//!     loop
//!         ld a st c // c = a
//!         ld b st a // a = b
//!         ld a call print_num // print a
//!         ld c ld b add st b // b = c + b
//!
//!         push 1
//!         ld n
//!         // subtract 1 from n
//!         sub
//!         // store the result in n again
//!         st n
//!
//!         // Load n again for the loop test
//!         ld n
//!     endloop
//! endproc
//!
//!
//! proc print_num
//!     // the define keyword takes two arguments,
//!     // the name of the register and the size of
//!     // the newly created register.
//!
//!     // This simply tells the assembler to allocate permanent
//!     // space for a register with a given size. It also tells
//!     // the assembler how many cells to pop off of the stack when
//!     // storing a value in this register.
//!     define n, 1
//!     
//!     // When we call print_num, we expect a single argument on the
//!     // stack. So, we store this argument in the register n for later
//!     // usage.
//!     st n
//!
//!     // Now we load the value stored in n back onto the stack
//!     // and print the value as a number
//!     ld n outn
//!
//!     // Now we print a newline using the newline procedure
//!     call nl
//! endproc
//!
//!
//! proc nl
//!     // 10 (the character code for '\n') is pushed onto the stack
//!     // and printed out as a character
//!     push 10 outc
//! endproc
//! ```
//!
//! # implementation
//! 
//! lasm's implementation is very simple: there are very few instructions to implement
//! when targeting a new programming language. Additionally, lasm's structure is very
//! simple to implement in low level languages.
//! 
//! There are a few **very** important notes for lasm's implementation
//! 1. lasm's memory is implemented using an array of double precision floats, or 64 bit floats
//! 2. lasm tracks allocs and frees for each individual cell of the memory array. This is most
//! simply done using an array of booleans with identical length to the data tape
//! 3. allocating more than the available amount of memory is undefined behavior (if possible, this should cause the program to exit)
//! 4. the implementation should _always_ mark memory reserved for registers as allocated (so that alloc may not return a pointer to register memory)
//! 5. memory reserved for registers always lies **immediately** before the stack
//! 6. the accumulator register always lies at address `0`
//! 7. the stack pointer register always lies at address `1`
//! 8. user defined registers lie between the stack pointer register and the stack
//! 9. the `inn` and `inc` instructions return `0` on `EOF` and on other input errors

#![no_std]
#[macro_use]
extern crate alloc;
#[macro_use]
extern crate lazy_static;

pub mod asm;
pub use asm::{Instruct, Register};
pub(crate) mod ast;
pub mod target;
pub use target::Target;
pub mod error;
pub use error::{Error, Result};
pub(crate) mod parser;
pub(crate) use parser::program;

use alloc::{string::String, vec::Vec, collections::BTreeMap};

/// assemble takes an assembly target, and the assembly code an object that implements display
/// 
/// This function will 
/// 1. parse the assembly code and convert it into an abstract syntax tree
/// 2. convert the abstract syntax tree into a list of executable assembly instructions
/// 3. transform the list of assembly instructions into output code using the assembly target
pub fn assemble(target: impl Target, asm_code: impl core::fmt::Display) -> Result<String> {
    use asm::{REGISTER_POINTER, NAMED_REGISTERS, PREDEFINED_REGISTERS};

    let (code, stack_size) = compile(asm_code)?;
    let initial_stack_ptr = *REGISTER_POINTER.lock();

    // Reset the global variables used for assembling an input file
    *REGISTER_POINTER.lock() = PREDEFINED_REGISTERS;
    *NAMED_REGISTERS.lock() = BTreeMap::new();
    
    // Assemble using the targets assembly method
    Ok(target.assemble(initial_stack_ptr, stack_size, code))
}


fn compile(s: impl core::fmt::Display) -> Result<(Vec<Instruct>, usize)> {
    let (ast, stack_size) = program(
        &comment::c::strip(s)
            .unwrap()
            .split_whitespace()
            .collect::<Vec<&str>>()
            .join(" "),
    )?;

    let code = ast.lower()?;

    let mut loop_count = 0;
    for instruct in &code {
        match instruct {
            Instruct::WhileNotZero => loop_count += 1,
            Instruct::EndWhile => loop_count -= 1,
            _ => {}
        }
    }

    if loop_count != 0 {
        Err(Error::UnmatchedLoop)
    } else {
        Ok((code, stack_size))
    }
}