# Ideas for compiler

Assorted ideas.

## Types

It's not possible to define neat types for builtin commands like `times`, `...`, or `do!`. There are at least a few challenges:

1. The exact effect on the stack may be impossible to represent exactly. An easy example is a value coming from standard input that influences an argument to one of these words. The same goes for any IO.
2. Some words allow for side effects like dynamically defining new words. Again, this is complicated if input comes from standard input, or from some other IO unknown until runtime.

First, stack issues: while it's not possible to _know_ an exact stack effect, it appears trivial to me to know failure cases. `[pop pl] 3 times` is safe to perform when the immediately preceding value is a quote of size 3 or greater; it has the side effect of printing 3 lines, and it results in that quote being reduced in size by three items from the right.

What I think this means is that `times` (and other higher-order procedures) can be represented as a **lazy type** that can either be filled in at compile time and completely reduced, or be represented as a range of possibilities when an argument is unknown. Like Haskell, we can know and trace anything that comes from IO as an unknown value. Anything else we can opportunistically treat as a Zig-like `comptime` value. I believe the latter should open up many compiler optimizations that may not be possible in other programming paradigms.

We can also statically analyze a program's unknown values to know that, for example, the exact program `stdin ... [pl] 5 times` will succeed "perfectly" for 5 lines of standard input, succeed with "waste" for more lines, and result in stack underflow for fewer lines. Exactly how to represent exceptional code (Result types, try/catch with exceptions, halt and catch fire...) can be decided independently.

_P.S. A scarier problem arises if stdin could be read as commands. Consider a program like `stdin pop string-to-commands do` that allows arbitrary execution. The language is Turing-complete, so such a `string-to-commands` could be envisioned -- it could also be intentionally added to the language later. This suggests [no dynamic execution in compiled mode](#no-dynamic-execution-in-compiled-mode)._

Second, side effects: types can know when they have side effects, and even propagate them. Again, there should be a way to distinguish potential side effects that are unknown-from-IO from those that are known-at-compile-time. TODO: this could be expanded.

Lastly: I believe this distinction between unknown (IO) and known (comptime) can be modeled in a type system. The types can start out lazy and be resolved to more specific types during interpretation or compilation.

## No dynamic execution in compiled mode?

It will be useful to have a compiled mode that disallows the ability to execute or define unknown code. In fact, this should probably be the default "safe" mode of compilation. When interpreting (e.g. via `railsh`), this is worth a warning. Perhaps also a "safe" mode that completely halts when `do` receives arguments from an IO source.

## Represent all subprocedures as jump addresses

GOTO will never die... but we also don't have to expose it. Calling a function in a compiled program shouldn't require a string-equality lookup, and ideally shouldn't even require a lookup table. (Thanks [Travis](https://github.com/travisbhartwell))

See also: https://en.wikipedia.org/wiki/Threaded_code#Indirect_threading
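As a rough sketch of what this could look like (not rail's actual implementation -- the primitive words `2`, `dup`, `+`, `pl` and their Rust stand-ins here are only illustrative), word names can be resolved to code addresses once, up front, so that runtime dispatch never compares strings or consults a table:

```rust
use std::collections::HashMap;

/// In this sketch a "jump address" is just a Rust function pointer.
type Op = fn(&mut Vec<i64>);

// Hypothetical primitives operating on an integer stack.
fn lit_two(stack: &mut Vec<i64>) { stack.push(2); }
fn dup(stack: &mut Vec<i64>) {
    let top = *stack.last().expect("stack underflow");
    stack.push(top);
}
fn add(stack: &mut Vec<i64>) {
    let b = stack.pop().expect("stack underflow");
    let a = stack.pop().expect("stack underflow");
    stack.push(a + b);
}
fn pl(stack: &mut Vec<i64>) {
    println!("{}", stack.pop().expect("stack underflow"));
}

fn main() {
    // Name -> address resolution happens once, at compile time...
    let mut dictionary: HashMap<&str, Op> = HashMap::new();
    dictionary.insert("2", lit_two);
    dictionary.insert("dup", dup);
    dictionary.insert("+", add);
    dictionary.insert("pl", pl);

    // ...so the "compiled" program is just a flat sequence of addresses.
    let source = ["2", "dup", "+", "pl"];
    let compiled: Vec<Op> = source.iter().map(|word| dictionary[word]).collect();

    // Runtime dispatch: call straight through each stored address. Prints "4".
    let mut stack: Vec<i64> = Vec::new();
    for op in compiled {
        op(&mut stack);
    }
}
```

Threaded-code variants differ mainly in where those addresses live (inline in each compiled word for direct threading, behind one more indirection for indirect threading), but in all of them the name lookup is paid only once, before the program runs.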
Like an index map, I believe we can represent program state, including enough information to (for example) create a stack trace, independently from the flow of control. If we can know failure cases (by tracking unknown/IO input), then we can sparsely populate enough information for _only reachable error paths._

This may not be a huge savings for a "large" application, like a web service that talks to humans, other web services, and databases. However, for the embedded case it may strike a great balance between meaningful error logs and small program size.

## Memory handling

I really enjoy the handled memory of Rust. (Is that ok to say?) Garbage collection is hard to think about, and very difficult to reason about in larger systems operating at scale. When the runtime behavior is consistent, it's much simpler to reason about.

Secondly, a concatenative approach means we can statically know exactly how long a memory blob must last, and when it goes completely out of scope and is safe to free.

Finally, if we know how long all (or most) memory blobs must last, we can also opportunistically determine memory allocation strategies.

So I think we can "handle" memory silently in a way that looks like garbage collection, but get away with potentially less book-keeping than something like C's (c|m)alloc and free. In addition, we can divide up memory pages in whatever way makes the most sense for the program under compilation, and consider this an area for heavy optimization.

This also suggests [no dynamic execution in compiled mode](#no-dynamic-execution-in-compiled-mode).

## In-lining

Ahead-of-time and just-in-time compiling can both choose how to lay out code. For consideration: https://ratfactor.com/meow5/

> Its unique feature is its "inline all the things!" compiling method. A Meow5 function is the recursively concatenated machine code of every function it calls. A departure from Forth's pragmatic call/ret REPL approach.
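To make the "inline all the things" idea concrete, here is a minimal sketch (the `Ir` type and the `square`/`cube` words are hypothetical, not rail's or Meow5's actual representation) in which a word's compiled body is the recursively concatenated body of every word it calls:

```rust
use std::collections::HashMap;

/// A compiled word is a sequence of ops: either a primitive
/// (standing in for a chunk of machine code) or a call to another word.
#[derive(Clone, Copy, Debug)]
enum Ir {
    Prim(&'static str),
    Call(&'static str),
}

/// Meow5-style inlining: flatten a word by recursively splicing in the
/// bodies of everything it calls, so nothing is called by name at runtime.
/// (A real compiler would need a policy for recursive definitions.)
fn inline_all(name: &str, defs: &HashMap<&str, Vec<Ir>>) -> Vec<Ir> {
    let mut out = Vec::new();
    for &op in &defs[name] {
        match op {
            Ir::Prim(_) => out.push(op),
            Ir::Call(callee) => out.extend(inline_all(callee, defs)),
        }
    }
    out
}

fn main() {
    let mut defs: HashMap<&str, Vec<Ir>> = HashMap::new();
    defs.insert("square", vec![Ir::Prim("dup"), Ir::Prim("*")]);
    defs.insert("cube", vec![Ir::Prim("dup"), Ir::Call("square"), Ir::Prim("*")]);

    // `cube` flattens into one straight-line body: dup dup * *
    println!("{:?}", inline_all("cube", &defs));
}
```

The obvious trade-off is code size versus call overhead; whether full inlining is worth it probably depends on the target (embedded versus a larger server-style program).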