Made with ♡ by Isaac Clayton and the Passerine Community – a cornerstone of the Veritable Computation Initiative.
## Why Passerine? [Passerine](https://www.passerine.io) is a small, concise, extensible functional scripting language, powered by a VM† written in [Rust](https://www.rust-lang.org). Here's a small taste: Passerine has roots in Scheme and ML-flavored languages — it's the culmination of everything I expect from a programming language, including the desire to keep everything as minimalistic yet concise as possible. At its core, Passerine is lambda-calculus with pattern-matching, structural types, fiber-based concurrency, and syntactic extension. > † It's a bytecode VM with a few optimizations, so I'd say it's fast enough to be useful. ### Who started this? This is first project of The Veritable Computation Initiative. Our goal is to improve the tools developers use to write software. We're planning release a site with more information about Veritable soon. Passerine is currently being developed by [Isaac Clayton](https://github.com/slightknack), a high-school student with too much free time on his hands. A few people have offered feedback and suggestions from time to time. Huge thanks to [Raul](https://github.com/spaceface777), [Hamza](https://github.com/hhhapz), [Rishi](htps://github.com/rishiosaur), [Lúcás](https://github.com/cronokirby), [Anton](https://github.com/jesyspa/), [Yasser](https://github.com/realnegate), [Shaw](https://github.com/shawsumma)†, [Plecra](https://github.com/plecra), [IFcoltransG](https://github.com/IFcoltransG), [Jack](https://github.com/nivpgir), [Keith](https://github.com/Kethku)‡, Xal, and others! > † Shaw is writing an [alternative implementation of Passerine](https://github.com/ShawSumma/purr/tree/main/ext/passerine), and it's *super* fast. It's part of a wider effort of his to develop [an efficient language-agnostic VM](https://github.com/ShawSumma/purr). > ‡ Keith is currently [sponsoring](https://www.patreon.com/slightknack) the development of Passerine — I'm deeply grateful for his support! ## An Overview Where this overview gets really exciting is when we dive into [macros](#macros). If you're here to give Passerine a try, [skip to Installation](#installation). > **⚠️ Note that Passerine is a *work in progress*: features mentioned in this overview may not be implemented yet.** ### Syntax The goal of Passerine's syntax is to make all expressions as *concise* as possible while still conserving the 'feel' of *different types* of expressions. We'll start simple; here's a function that squares a number: ```elm square = x -> x * x square 4 -- is 16 ``` > By the way: Passerine uses `-- comment` for single-line comments and `-{ comment }-` for nestable multi-line comments. There are already some important things we can learn about Passerine from this short example: Like other programming languages, Passerine uses `=` for assignment. On the left hand side is a *pattern* – in this case, just the variable `square` – which destructures an expression into a set of bindings. On the right hand side is an *expression*; in this case the expression is a *function definition*. > Because Passerine is *expression-oriented*, the distinction between statements and expressions isn't made. In the case that an expression produces no useful value, it should return the Unit type, `()`. Assignment, for instance, returns Unit. The function call `square 4` may look a bit alien to you; this is because Passerine uses whitespace for function calls. A function call takes the form `l e₀ ... eₙ`, where `l` is a function and `e` is an expression. `square 4` is a simple case because `square` only takes one argument, `x`... let's try writing a function that takes two arguments! Using our definition of `square`, here's a function that returns the Euclidean distance between two points: ```elm distance = (x1, y1) (x2, y2) -> { sqrt (square (x1 - x2) + square (y1 - y2)) } origin = (0, 0) distance origin (3, 4) -- is 5 ``` Passerine is an *expression-oriented* language, because of this, it makes sense that *all functions are anonymous*. All functions take the form `p₀ ... pₙ -> e`, where `p` is a pattern and `e` is an expression. If a function takes multiple arguments, they can be written one after another; `a b c -> d` is equivalent to `a -> (b -> (c -> d))`. The function `distance` is a bit more complex that `square`, because the two arguments are bound using *tuple destructuring*. As you can see, `distance` takes two pairs, `(x1, y1)` and `(x2, y2)`, and *destructures* each pair into its component parts. For instance, when we call `distance` in `distance origin (3, 4)` the function pulls out the numbers that make up the pair: - `origin`, i.e. `(0, 0)`, is matched against `(x1, y1)`, creating the bindings `x1 = 0` and `y1 = 0`. - the tuple `(3, 4)` is matched against `(x2, y2)`, creating the bindings `x2 = 3` and `y2 = 4`. The body of `distance`, `sqrt (...)` is then evaluated in a new scope where the variables defined about are bound. In the case of the above example: ```elm -- call and bind distance origin (3, 4) distance (0, 0) (3, 4) -- substitute and evaluate sqrt (square (0 - 3) + square (0 - 4)) sqrt (9 + 5) sqrt 25 5 ``` Now, you may have noticed that `distance` is actually two functions. It may be more obvious it we remove some syntactic sugar rewrite it like so: ```elm distance = (x1, y1) -> { (x2, y2) -> { ... } } ``` The first function binds the first argument, then returns a new function that binds the second argument, which evaluates to a value. This is known as *currying*, and can be really useful when writing functional code. > To leverage currying, function calls are *left-associative*. The call `a b c d` is equivalent to `((a b) c) d`, not `a (b (c d))`. This syntax comes from functional languages like Haskell and OCaml, and makes currying (partial application) quite intuitive. In the above example, we used `distance` to measure how far away `(3, 4)` was from the origin. Coincidentally, this is known as the *length* of a vector. Wouldn't it be nice if we could define length in terms of distance? ```elm length = distance origin length (5, 12) -- is 13 ``` Because distance is curried, we can call it with only one of its arguments. For this reason, `length` is a function that takes a pair and returns its distance from the `origin`. In essence, we can read the above definition of `length` as: > `length` is the `distance` from `origin`. Transforming data through the use of and functions and pattern matching is a central paradigm of Passerine. In the following sections, we'll dive deep and show how this small core language is enough to build a powerful and flexible language. #### A Quick(-sort) Example Here's another slightly more complex example – a recursive quick-sort with questionable pivot selection: ```elm sort = list -> match list { -- base case [] -> [] -- pivot is head, tail is remaining [pivot, ..tail] -> { higher = filter { x -> x >= pivot } tail lower = filter { x -> x < pivot } tail (sorted_lower, sorted_higher) = (sort lower, sort higher) sorted_lower + [pivot] + sorted_higher } } ``` The first thing that you should notice is the use of a `match` expression. Like ML-style languages, Passerine makes extensive use of *pattern matching* and *destructuring* as a driver of computation. A match expression takes the form: ```elm match value { pattern₀ -> expression₀ ... patternₙ -> expressionₙ } ``` Each `pattern -> expression` pair is a *branch* – each `value` is against each branch in order until a branch successfully matches and evaluates – the match expression takes on the value of the matching branch. We'll take a deep dive into match statements [later](#building-a-match-expression), so keep this in mind. You might've also noticed the use of curly braces `{ ... }` after `[head, ..tail]`. This is a *block*, a group of expressions executed one after another. Each expression in a block is separated by a newline or semicolon; the block takes on the value of its last expression. The next thing to notice is this line: ```elm (sorted_lower, sorted_higher) = (sort lower, sort higher) ``` This is a more complex assignment than the first one we saw. In this example, the pattern `(sorted_lower, sorted_higher)` is being matched against the expression `(sort lower, sort higher)`. This pattern is a *tuple* destructuring, if you've ever used Python or Rust, I'm sure you're familiar with it. This assignment is equivalent to: ```elm sorted_lower = sort lower sorted_higher = sort higher ``` There's no real reason to use tuple destructuring here – idiomatically, just using `sort lower` and `sort higher` is the better solution. We discuss pattern matching in depth in the [next section](#pattern-matching). Passerine also supports higher order functions (this should come as no surprise): ```elm filter { x -> x >= pivot } tail ``` `filter` takes a predicate (a function) and an iterable (like a list), and produces a new iterable where the predicate is true for all items. Although parenthesis could be used to group the inline function definition after `filter`, it's stylistically more coherent to use blocks for *regions of computation*. What's a region of computation? A region of computation is a series of multiple expressions, or a single expression that creates new bindings, like an assignment or a function definition. Passerine also allows lines to be split around operators to break up long expressions: ```elm sorted_lower + [pivot] + sorted_higher ``` Although this is not a particularly long expression, splitting up lines by operations can help improve the legibility of some expressions. #### Function Application Before we move on, here's a clever implementation of FizzBuzz in Passerine: ```elm fizzbuzz = n -> { test = d s x -> if n % d == 0 { _ -> s + x "" } else { x } fizz = test 3 "Fizz" buzz = test 5 "Buzz" "{n}" . fizz (buzz (i -> i)) } 1..100 . fizzbuzz . print ``` `.` is the function application operator: ```elm -- the composition a . b c . d -- is left-associative (a . (b c)) . d -- and equivalent to d ((b c) a) ``` ### Pattern Matching In the last section, we touched on pattern matching a little. I hope to now go one step further and build a strong argument as to why pattern matching in Passerine is such a powerful construct. Patterns are used in in three places: 1. Assignments, 2. Functions, 3. and Type Definitions. We'll briefly discuss each type of pattern and the context in which they are used. #### What are patterns? *Patterns* extract *data* from *types* by mirroring the structure of those types. The act of applying a pattern to a type is called *matching* or *destructuring* – when a pattern matches some data successfully, a number of *bindings* are produced. Passerine supports algebraic data types, and all of these types can be matched and destructured against. Here is a table of Passerine's patterns: > In the following table, `p` is a nested sub-pattern. | pattern | example | destructures | | -------- | ----------------- | ------------ | | variable | `foo` | Terminal pattern, binds an variable to a value. | | data | `420.69` | Terminal pattern, data that *must* match, raises an error otherwise. See the following section on fibers and concurrency to get an idea of how errors work in Passerine. | | discard | `_` | Terminal pattern, matches any data, does not produce a binding. | | label | `Baz p` | Matches a label, i.e. a named *type* in Passerine. | | tuple | `(p₀, ...)` | Matches each element of a tuple, which is a group of elements, of potentially different types. Unit `()` is the empty tuple. | | list | `[]`/`[p₀, ..p₁]` | `[]` Matches an empty list - `p₀` matches the head of a list, `..p₁` matches the tail. | record | `{f₀: p₀, ...}` | A record, i.e. a struct. This is a series of field-pattern pairs. If a field does not exist in the target record, an error is raised. | | enum | `{p₀; ...}` | An enumeration, i.e. a union. Matches if any of the patterns hold. | | is | `p₀: p₁` | A type annotation. Matches against `p₀` only if `p₁` holds, errors otherwise. | | where | `p \| e` | A bit different from the other patterns so far. Matches `p` only if the expression `e` is true. | That's quite a lot of information, so let's work through it. The simplest case is a standard assignment: ```elm a = b -- produces the binding a = b ``` This is very straightforward and we've already covered this, so let's begin by discussing matching against *data*. The following function will return the second argument if the first argument passed to the function is `true`. ```elm true second -> second ``` If the first argument is not true, say `false`, Passerine will yell at us: ``` Fatal Traceback, most recent call last: In src/main.pn:1:1 | 1 | (true second -> second) false "Bananas!" | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | In src/main.pn:1:2 | 1 | (true second -> second) false "Bananas!" | ^^^^ | Runtime Pattern Matching Error: The data 'false' does not match the expected data 'true' ``` *Discard* is another simple pattern – it does nothing. It's most useful when used in conjunction with other patterns: ```elm -- to ensure an fruit has the type Banana: banana: Banana _ = mystery_fruit -- to ignore an item in a tuple: (_, plays_tennis, height_in_feet) = ("Juan Milan", true, 27.5) -- to ignore a field on a record: { name: "Isaac Clayton", age: _, skill } = isaac ``` A *label* is a name given to a type. Of course, names do not imply type safety, but they do do a darn good job most of the time: ```elm -- make a soft yellow banana: banana = Banana ("yellow", "soft") -- check that the banana flesh is soft: if { Banana (_, flesh) = banana; flesh == "soft" } { print "Delicioso!" } ``` Pattern matching on labels is used to *extract* the raw data that is used to construct that label. *Tuples* are fairly simple – we already covered them – so we'll cover records next. A record is a set of fields: ```elm -- define the Person type type Person { name: String, age: Natural, skill, -- polymorphic over skill } -- Make a person. It's me! isaac = Person { name: "Isaac Clayton", age: 16, skill: Wizard "High enough ¯\_(ツ)_/¯", } ``` Here's how you can pattern match on `isaac`: ```elm Person { -- aliasing field `name` as `full_name` name: full_name, -- `age` is ignored age: _, -- short for `skill: skill` skill, } = isaac ``` Of course, the pattern after the field is a full pattern, and can be matched against further. *Is* is a type annotation: ``` Banana (color, _): Banana (_, "soft") = fruit ``` In this example, `color` will be bound if `fruit` is a `Banana` whose 1nd† tuple item is `"soft"`. > † Read as 'firnd', corresponds to the 1-indexed *second* item. Zerost, firnd, secord, thirth, fourth, fifth... Finally, we'll address my favorite pattern, *where*. Where allows for arbitrary code check the validity of a pattern. This can go a long way. For example, let's define natural numbers in terms of integers: ```elm type Natural n: Integer | n >= 0 ``` This should be interpreted as: > The type `Natural` is an `Integer` `n` where `n` is greater than `0`. With this definition in place: ```elm -- this is valid seven = Natural 7 -- this is not negative_six = Natural -6 ``` Where clauses in patterns ensure that the underlying data of a type can never break an invariant. As you can imagine, this is more powerful than ensuring name-safety through type constructors. > TODO: traits and impls. Pattern matching and algebraic data types allow for quickly building up and tearing down expressive data schemas. As data (and the transformation applied to it) are the core of any program, constructs for quickly building up and tearing down complex datatypes are an incredible tool for scripting expressive applications. ### Fibers How does passerine handle errors? What about concurrency? What do prime sieves, exceptions, and for-loops have in common? If you guessed concurrency, you won a bajillion points! Structured concurrency is a difficult problem to tackle, given how pervasive it is in the field language design. It's important to point out that concurrency is *not* the same thing as parallelism. Concurrent systems may be parallel, but that's not always the case. Passerine subscribes to the coroutine model of structured concurrency – more succinctly, Passerine uses *fibers* – as exemplified by [Wren](https://wren.io). A fiber is a lightweight process of execution that is cooperatively scheduled with other fibers. Each fiber is like a little system unto itself that can pass messages to other fibers. > TODO: Algebraic Effects? #### Error handling Passerine uses a combination of *exceptions* and algebraic data types to handle errors. Errors that are expected to happen should be wrapped in a `Result` type: ```elm validate_length = n -> { if length n < 5 { Result.Error "It's too short!" } else { Result.Ok n } } ``` Some errors, however, are unexpected. There are an uncountable number of ways software can fail; to account for all external circumstances is flat-out impossible in some cases. For this reason, in the case that something that isn't expected to fail fails, an exception is raised. For example, trying to open a file that *should* exist may throw an error if it has been lost, moved, corrupted, or otherwise has incorrect permissions. ```elm config = Config.parse (open "config.toml") ``` > `.` is the indexing operator. On a list or tuple, `items.0` returns the zerost item; on a record, `record.field` returns the specified field; **on a label, `Label.method` returns the specified associated method**. The reason we don't always need to handle these errors is because Passerine follows a fail-fast, fail-safe philosophy. In this regard, Passerine subscribes to the philosophy of Erlang/Elixir: > "Keep calm and let it crash." The good news is that crashes are local to the fiber they occur in – a single fiber crashing does *not* bring down the whole system. The idiomatic way to handle an operation that we know may fail is to try it. `try` performs the operation in a new fiber and converts any exceptions that may occur into a `Result`: ```elm config = match try (open "config.toml") { Result.Ok file -> Config.parse file Result.Error error -> Config.default () } ``` We know that some functions may raise errors, but how can *we* signal that something exceptionally bad has happened? We use the `error` keyword! ```elm doof = platypus -> { if platypus == "Perry" { -- crashes the current fiber error "What!? Perry the platypus!?" } else { -- oh, it's just a platypus... work_on_inator () } } -- oh no! inator = doof "Perry" ``` Note that the value of raised errors can be any data. This allows for programmatic recovery from errors: ```elm -- socket must not be disconnected send_data = ( socket data ) -> match socket.connection { -- `Disconnected` is a labeled record Option.None -> error Disconnected { socket, message: "Could not send data; disconnected", } -- if the connection is open, we send the data Option.Some connection -> connection.write data } ``` Let's say the above code tries to send some data through a socket. To handle a disconnection, we can try the error: ```elm ping = socket -> try (send_data socket "ping") socket = Socket.new "isaac@passerine.io:42069" -- whatever socket.disconnect () -- oh no! result = ping socket match result { Result.Ok "pong" -> () Result.Error Disconnected socket -> socked.connect } ``` Why make the distinction between expected errors (`Result`) and unexpected errors (fiber crashes)? Programs only produce valid results if the environments they run in are valid. When a fiber crashes, it's signaling that something about the environment it's running in is not valid. This is very useful to *developers* during development, and very useful to *programs* in contexts where complex long-running applications may fail for any number of reasons. Why not only use exceptions then? Because it's perfectly possible for an error to occur that is not exceptional at all. Malformed input, incorrect permissions, missing items – these are all things that can occur and do occur on a regular basis. It's always important to use the right tool for the job; prefer expected errors over unexpected errors. #### Concurrency Fibers are for more than just isolating the context of errors. As mentioned earlier: > A fiber is a lightweight process of execution that is cooperatively scheduled with other fibers. Each fiber is like a little system unto itself that can pass messages to other fibers. Passerine leverages fibers to handle errors, but fibers are full *coroutines*. To create a fiber, use the fiber keyword: ```elm counter = fiber { i = 0 loop { yield i; i = i + 1 } } print counter () -> prints 0 print counter () -> prints 1 print counter () -> ... ``` The `yield` keyword suspends the current fiber and returns a value to the calling fiber. `yield` can also be used to pass data *into* a fiber. ```elm passing_yield = fiber { print "hello" result = yield "banana" print result "yes" } passing_yield "first" -- prints "hello" then evaluates to "banana" passing_yield "second" -- prints "second" then evaluates to "yes" passing_yield "uh oh" -- raises an error, fiber is done ``` To build more complex systems, you can build fibers with functions: ```elm -- a function that returns a fiber flopper = this that -> fiber { loop { yield this yield that } } apple_banana = flopper "Apple" "Banana" apple_banana () -- evaluates to "Apple" apple_banana () -- evaluates to "Banana" apple_banana () -- evaluates to "Apple" apple_banana () -- ... you get the point ``` Of course, the possibilities are endless. There's one last thing I'd like to discuss before we start talking about macros. Fibers, while usually being ran in the context of another, all act as peers to each-other. If you have a reference to a fiber, it's possible to transfer to it forgetting the context in which it was called. To switch to a fiber, use `switch`. ```elm banana_and_end = fiber { print "banana ending!" } print "the beginning..." switch banana_and_end print "the end." ``` `"the end."` is never displayed. > 'Tis not the end, 'tis but the beginning... 'tis hackin' time! ### Macros Passerine has a rich hygienic* syntactic macro system that extends the language itself. *Syntactic macros*, quite simply, are bits of code that *hygienically* produce more code when invoked at compile time. Macros use a small, simple-yet-powerful set of rules to transform code. > \* Having read Doug Hoyte's exellent [Let Over Lambda](https://letoverlambda.com/), I understand the raw power of a rich *unhygenic* macro system. However, such systems are hard to comprehend, and harder to master. Passerine aims to be as simple and powerful as possible without losing *transparency*: hygienic macro systems are much more transparent then their opaque unhygenic counterparts. #### Hygiene Extensions are defined with the `syntax` keyword, followed by some *argument patterns*, followed by the code that the captured arguments will be spliced into. Here's a simple example: we're using a macro to define `swap` operator: ```elm syntax a 'swap b { tmp = a a = b b = tmp } x = 7 y = 3 x swap y ``` Note that the above code is completely hygienic. the expanded macro looks something like this: ```elm _tmp = x x = y x = _tmp ``` Because `tmp` was not passed as a macro pattern parameter, all uses of `tmp` in the macro body are unique unrepresentable variables that do not collide with any other variables currently bound in scope. Essentially: ```elm tmp = 1 x = 2 y = 3 x swap y ``` Will not affect the value of `tmp`; `tmp` will still be `1`. #### Argument Patterns So, what is an argument pattern (an *arg-pat*)? Arg-pats are what go between: ```elm syntax ... { } ``` Each item between `syntax` and the macro body is an arg-pat. Arg-pats can be: - *Syntactic variables*, like `foo` and `bar`. - Literal *syntactic identifiers*, which are prefixed with a quote (`'`), like `'let`. - Nested argument patterns, followed by optional *modifiers*. Let's start with *syntactic identifiers*. Identifiers are literal names that must be present for the pattern to match. Each syntactic extension is required to have at least one. For example, here's a macro that matches a *for loop*: ```elm syntax 'for binding 'in values do { ... } ``` In this case, `'for` and `'in` are syntactic identifiers. This definition could be used as follows: ```elm for a in [1, 2, 3] { print a } ``` *Syntactic variables* are the other identifiers in the pattern that are bound to actual values. In the above example, `a` → `binding`, `[1, 2, 3]` → `values`, and `{ print a }` → `do`. Macros can also be used to define operators†: ```elm syntax sequence 'contains value { c = seq -> match seq { [] -> False [head, ..] | head == value -> True [_, ..tail] -> c tail } c sequence } ``` This defines a `contains` operator that could be used as follows: ```elm print { if [1, 2, 3] contains 2 { "It contains 2" } else { "It does not contain 2" } } ``` Evidently, `It contains 2` would be printed. > † Custom operators defined in this manner will always have the lowest precedence, and must be explicitly grouped when ambiguous. For this reason, Passerine already has a number of built-in operators (with proper precedence) which can be overloaded. It's important to note that macros serve to introduce new constructs that just *happen* to be composable – syntactic macros can be used to make custom operators, but they can be used for *so much more*. I think this is a fair trade-off to make. *Modifiers* are postfix symbols that allow for flexibility within argument patterns. Here are some modifiers: - Zero or more (`...`) - Optional (`?`) Additionally, parenthesis are used for grouping, and `{ ... }` are used to match expressions within blocks. Let's construct some syntactic arguments that match an `if else` statement, like this one: ```elm if x == 0 { print "Zero" } else if x % 2 == 0 { print "Even" } else { print "Not even or zero" } ``` The arg-pat must match a beginning `if`: ```elm syntax 'if { ... } ``` Then, a condition: ```elm syntax 'if condition { ... } ``` Then, the first block: ```elm syntax 'if condition then { ... } ``` Next, we'll need a number of `else if