# Vector Remap Language: Guiding Design Principles This document describes the high-level goals and directions of the Vector Remap Language (VRL). It is intended to help guide discussions and decisions made during the development of VRL itself. The document is less relevant to _users_ of VRL, the [language documentation][docs] exists for that audience. [docs]: https://vrl.dev ## Table of Contents * [The Zen of VRL](#the-zen-of-vrl) * [Why VRL](#why-vrl) * [Target Audience](#target-audience) * [Language Limits](#language-limits) * [Conventions](#conventions) * [Performance](#performance) * [Copying Data](#copying-data) * [Program Optimizations](#program-optimizations) * [Diagnostics](#diagnostics) * [Syntax](#syntax) * [Fallibility](#fallibility) * [Fallible Expressions](#fallible-expressions) * [Type Checking](#type-checking) * [Progressive Type Checking](#progressive-type-checking) * [Errors](#errors) * [Functions](#functions) * [Composition](#composition) * [Naming](#naming) * [Return Types](#return-types) * [Mutability](#mutability) * [Fallibility](#fallibility-1) * [Signatures](#signatures) * [Patterns](#patterns) * [Error Chaining](#error-chaining) ## The Zen of VRL * **Safety and performance over ease of use.** VRL programs facilitate data processing for mission critical production systems. They are run millions of times per day for large Vector users and usually written once. Therefore, safety and performance are prioritized over developer ease of use. * **The best VRL program is the one that most clearly expresses its output.** VRL is an expression-oriented DSL designed to express data transformations. It is not a programming language. Users should not sacrifice the clarity of their data transformation for things like performance. The best VRL program is the one that most clearly describes the intended output. There should be no "tricks" to making VRL fast that take away from readability. ## Why VRL VRL exists to solve the need for a _safe_ and _performant_ **domain specific language** (DSL) to remap observability data. Its purpose is to hit a sweet spot between Turing complete languages such as Lua, and static transforms such as Vector's old `rename_fields`. It should be flexible enough to cover most remapping use-cases, without requiring the flexibility and downsides of a full programming language, such as poor readability and performance. See the [introduction blog post][blog] for more details on the **why**. [blog]: https://vector.dev/blog/vector-remap-language/#preamble ## Target Audience VRL has a specialized purpose of solving observability data remapping needs. Because of this purpose, the language is mostly used by people who manage infrastructure at their organisations. The role of this group is usually referred to as "operator", "devops" (Development and Operations) or "SRE" (Site Reliability Engineer). One common generalization of this group is that they are focused on maintaining infrastructure within an organization, and often write and maintain their own software to achieve this goal. They are adept enough at programming to achieve their goals, but have no need or desire to be as skilled in programming as dedicated software engineers, because their time is best spent elsewhere. As with everything, there are exceptions to the rule, and many people in this group _are_ highly skilled software engineers, but VRL must capture the largest possible segment of this group, and should therefore be limited in complexity. Therefore, when extending the feature set of VRL, design **the feature for the intended target audience**, which will likely mean choosing different trade-offs than you'd make if you were to design the feature for your personal needs. ## Language Limits There are a number of features that we've so far rejected to implement in the language. This might change in the future, but there should be a good reason to do so. * modules (see: [#5507][]) So far, we've had no need to split up functions over multiple modules. The function naming rules make it so that most functions are already grouped by their logical usage patterns. * classes Given that VRL is a simple DSL, and that any indirection in a program's source can lead to confusion, we've decided against introducing the concept of classes, and instead focused on the usage of function calls to solve operator needs. * user-defined functions User-defined functions again produce indirection. While it _might_ be useful to some extremely large use-cases, in most cases, allowing people to read a program from top to bottom without having to jump around is more clear in the context within which VRL is used. * network calls (see [#4517][] and [#8717][]) In order to avoid performance foot guns, we want to ensure each function is as performant as it can be, and there's no way to use functions in such a way that performance of a VRL program tanks. We might introduce network calls at some point, if we find a good caching solution to solve most of our concerns, but so far we've avoided any network calls inside our stdlib. * assignable closures (see [#9001]) While we _do_ support closures, they are tied to function calls, and cannot be used elsewhere. This also means closures cannot be assigned to variables, and re-used between function-calls. This decision was made because it can lead to poor-performing code (to the extend of introducing infinite loops), and makes code less clear to reason about. While this is undoubtedly a powerful feature to have, the cons do not outweigh the pro's. [#5507]: https://github.com/vectordotdev/vector/issues/5507 [#4517]: https://github.com/vectordotdev/vector/issues/4517#issuecomment-754160338 [#8717]: https://github.com/vectordotdev/vector/pull/8717 [#9001]: https://github.com/vectordotdev/vector/pull/9001#discussion_r701830595 ## Conventions ### Performance The performance of VRL is a corner-stone of the language. However, given the target audience, and the goal to make the language as simple and straight-forward as possible, there are always trade-offs to be made when considering the performance implications of a feature or design decision. #### Copying Data For example, VRL is an expression-oriented language, and we favor returning a new copy of a piece of manipulated data over mutating the underlying data. That is, you write this: ```coffee # explicitly assign the parsed JSON to `.message` .message = parse_json(.message) ``` Instead of this: ```coffee # mutate `.message` in place parse_json(.message) ``` While this results in less performance, it makes VRL programs easier to reason about, which _in this particular case_ weigh more heavily than the performance implications. #### Program Optimizations We plan on introducing an "optimization step" to the compiler in the future, that would allow us to rewrite parts of a program into a more optimized variant, without having the operator having to worry about these transformations. This means that if there's a reasonable path forward towards optimizing certain language constructs internally, we should not burden the operator with using/applying to optimizations manually. For example, we favor composition over single-purpose functions, and while each individual function call adds extra performance overhead, the thought is that in the future, we can optimize multiple function calls into a single call (inlining), without having to expose this optimization technique to operators. ### Diagnostics Diagnostic messages shown by the compiler during compilation are one of the biggest tools we expose to operators to help them write correct programs. Adding diagnostic messages to new and existing features in VRL should never be an afterthought. ### Syntax Keep VRL as syntax-light as possible. The less symbols, the more readable a VRL program is. Use functions whenever possible, and only introduce new syntax if a common pattern warrants a more convenient syntax-based solution, or functions are too limited in their capability to solve the required need. ### Fallibility A VRL program must be **infallible by default** once the compiler generates a program from the provided source. This means that any expression that can result in an error (adding a string and an object, dividing by zero, calling a fallible function) must be explicitly handled before the compiler accepts the input source. The fallibility system is an important part of the language and its goals, and is also the part that most often trips people up. This chapter tries to shed some light on its inner workings. #### Fallible Expressions When the VRL compiler compiles a source to a valid program, it queries each expression whether the expression itself can fail at runtime or not. If it can, the compiler refuses to compile the program, until the operator handles the failure case. For example: ```coffee . = parse_json(.message) ``` The above program can fail at runtime, because there's no guarantee the `message` field contains a JSON-encoded string. The operator needs to handle the failure case using one of the available [failure-handling features][fail] in VRL. #### Type Checking In addition to expressions being fallible, the type checker also considers a program fallible if the type expected by an expression cannot be guaranteed at compile-time. For example: ```coffee .message = "log message: " + .log ``` In this case, the `log` field type cannot be determined at compile-time, and thus concatenating the field value with a string might fail (e.g. if it's an array, or any other type that cannot be combined with a string). This too needs to be handled at compile-time by the operator. #### Progressive Type Checking A quirk in the type checker is that arguments passed to functions are _not_ type checked individually. Instead, the function call itself can be marked fallible if any of its arguments do not adhere to the expected type. For example: ```coffee upcase(.message) ``` Upcasing a string is an infallible operation, however, because we can't guarantee that the `message` field will actually be a string, the function call is still invalid, as the function is marked as fallible. This decision [was made][#6507] for ergonomics purposes. [fail]: https://vrl.dev/errors/#runtime-errors [#6507]: https://github.com/vectordotdev/vector/issues/6507 ### Errors * All errors are caught at compile-time by the compiler (see "fallibility" chapter). * The only exception to this rule is if the operator explicitly allows a function to fail the program at runtime (e.g. `safe_call()` vs `unsafe_call!()`). * A function should be marked as "fallible" if its internal implementation can fail. * A function _should not_ be marked as fallible if it receives the wrong argument type. This is handled by the compiler. * Errors should contain explicit messages detailing what went wrong, and how the operator can solve the problem. ### Functions #### Composition * Favor function composition over single-purpose functions. * If a problem needs to be solved in multiple steps, consider adding single-purpose functions for each individual step. * If usability or readability is hurt by composition, favor single-purpose functions. #### Naming * Function names are lower-cased (e.g. `round`). * Multi-name functions use underscore (`_`) as a separator (e.g. `encode_key_value`). * Favor explicit verbose names over terse ones (e.g. `parse_aws_cloudwatch_log_subscription_message` over `parse_aws_cwl_msg`). * Functions should be preceded with their function category for organization and discovery. * Use `parse_*` for functions that decode a string to another type of data (e.g. `parse_json` and `parse_grok`). * Use `decode_*` for string to string decoding functions (e.g. `decode_base64`). * Use `encode_*` for string encoding functions (e.g. `encode_base64`). * Use `to_*` to convert from one type to another (e.g. `to_bool`). * Use `is_*` to determine if the provided value is of a given type (e.g. `is_string` or `is_json`). * Use `format_*` for string formatting functions (e.g. `format_timestamp` and `format_number`). * Use `get_*` for functions that return a single result, or error if zero or more results are found. * Use `find_*` when multiple possible results are returned in an array. #### Return Types * Return boolean from `is_*` functions (e.g. `is_string`). * Return a string from `encode_*` functions (e.g. `encode_base64`). * Return an error whenever the function can fail at runtime. #### Mutability As a general rule, functions never mutate values. Favor this: ```coffee # explicitly assign the parsed JSON to `.message` .message = parse_json(.message) ``` Over this: ```coffee # mutate `.message` in place parse_json(.message) ``` There are exceptions to this rule (such as the `del` function), but they are limited, and additional exceptions should be well reasoned. #### Fallibility * A function must be marked as fallible if it can fail in any way. * A function should be designed with the goal of making it infallible. * A function must not hide fallibility for the sake of pursuing the previous rule. * A function implementation may assume it receives the argument type it has defined. * `parse_*` functions should almost always error when used incorrectly. * `get_*` functions should fail when it can't find a single result to return. * `to_*` functions must never fail. #### Signatures * Functions can have zero or more parameters (e.g. `now()` and `assert(true)`). * Argument naming follows the same conventions as function naming. * For one or more parameters, the first parameter must be the "value" the function is acting on (e.g. `parse_regex(value: , pattern: )`). * The first parameter must therefore almost always be named `value`. * The exception to this is when you're dealing with an actual VRL path (e.g. `del(path)`) or in special cases such as `assert`. ## Patterns The following is a list of patterns we've experienced while writing and using VRL. This section of the document is intended to be updated frequently with new insights. These insights are meant to guide future design decisions, and shape our thinking of the language as it matures and we learn more about its strengths and weaknesses from our users. ### Error Chaining Observability data can be structured in unexpected ways. A common pattern is to try to decode the data in one way, only to try a different decoder if the first one failed. This pattern uses [function calls][] and [error coalescing][] to achieve its goal: ```coffee data = parse_json(.message) ?? parse_nginx_log(.message) ?? parse_apache_log(.message) ?? { "error": "invalid data format" } ```