# frawk Builtin Functions and Commands

This document lists all of the builtin functions and commands supported by
frawk. For those interested in a source of truth on these components, check out
the "builtins" module in
[`src/builtins.rs`](https://github.com/ezrosent/frawk/blob/master/src/builtins.rs).

Unlike Awk, builtin functions must have parentheses directly following the
function name. Awk supports C-style syntax like `length (s)`, but only with
builtin functions: user-defined functions must still be called like `foo(x)`. In
frawk, builtin and user-defined functions are called with the same syntax: with
no spaces allowed.

## Operators

_Binary operators:_
* Arithmetic: `+`, `-`, `/`, `*`, `*`, `^` (which is exponentiation), and `%`
* Comparison (which also work on strings): `<`, `>`, `<=`, `>=`, `==`, `!=`.

_Unary Operators:_

* `$x`: Get column `x`.
* `+`, `-`: Unary "positive" and negation.
* `!`: logical negation.

## Math

* Floating-point operations: `sin`, `cos`, `atan`, `atan2`, `log`, `log2`,
  `log10`, `sqrt`, `exp` are delegated to the Rust standard library, or LLVM
  intrinsics where available.
* `rand()`: Returns a uniform random floating-point number between 0 and 1.
* `srand(x)`: Seeds the random number generator used by `rand`, returns the old
  seed.
* Bitwise operations. All of these operations coerce their operands to integers
  before being evaluated.
  * `compl(x)`: Bitwise complement.
  * `and(x, y)`: Bitwise and.
  * `or(x, y)`: Bitwise or.
  * `xor(x, y)`: Bitwise xor.
  * `lshift(x, y)`: Shift `x` left by `y` bits.
  * `rshift(x, y)`: Arithmetic right shift of `x` by `y` bits.
  * `rshiftl(x, y)`: Logical right shift of `x` by `y` bits.

## String Operations

* `s ~ re`: 1 if string `s` matches regular expression in `re`.
* `s !~ re`: Equivalent to negating the result of `s ~ re`.
* `match(s, re)`: 1 if string `s` matches the regular expression in `re`. If `s`
  matches, the `RSTART` variable is set with the start of the leftmost match of
  `re`, and `RLENGTH` is set with the length of this match.
* `substr(s, i[, j])`: The 1-indexed substring of string `s` starting from index `i`
  and continuing for the next `j` characters or until the end of `s` if `i+j`
  exceeds the length of `s` or if `s` is not provided.
* `sub(re, t, s)`: Substitutes `t` for the first matching occurrence of regular
  expression `re` in the string `s`.
* `gsub(re, t, s)`: Like `sub`, but with all occurrences substituted, not just
  the first.
* `index(haystack, needle)`: The first index within `haystack` in which the
  string `needle` occurs, 0 if `needle` does not appear.
* `split(s, m[, fs])`: Splits the string `s` according to `fs`, placing the
  results in the array `m`. If `fs` is not specified then the `FS` variable is
  used to split `s`.
* `sprintf(fmt, s, ...)`: Returns a string formatted according to `fmt` and
  provided arguments. The goal is to provide the semantics of the libc `sprintf`
  function.
* `print(s, ...) [>[>] out]`: Print the arguments `s` separated by `OFS`. If `>>
  out` is provided then the output is appended to the file `out`, if `> out` is
  provided then any data in `out` is overwritten. Parentheses are optional in
  `print`, but parsing of non-parenthesized arguments proceeds differently to
  avoid potential ambiguities.
* `printf(fmt, s, ...) [>[>] out]`: Like `sprintf` but the result of the
  operation is written to standard output, or to `out` according to the append
  or overwrite semantics specified by `>` or `>>`. Like `print`, `printf` can be
  called without parentheses around its arguments, though arguments are parsed
  differently in this mode to avoid ambiguities.
* `hex(s)`: Returns the hexadecimal integer (e.g. `0x123abc`) encoded in `s`, or
  `0` otherwise.
* `join_fields(i, j[, sep])`: Returns columns `i` through `j` (1-indexed,
  inclusive) concatenated together, joined by `sep`, or by `OFS` if `sep` is not
  provided.
* `escape_csv(s)`: Returns `s` escaped as a CSV column, adding quotes if
  necessary, replacing quotes with double-quotes, and escaping other whitespace.
* `escape_tsv(s)`: Returns `s` escaped as a TSV column. There is less to do with
  CSV, but tab and newline characters are replaced with `\t` and `\n`.
* `join_csv(i, j)`: Like `join_fields` but with columns joined by `,` and
  escaped using `escape_csv`.
* `join_tsv(i, j)`: Like `join_fields` but with columns joined by tabs and
  escaped using `escape_tsv`.
* `int(s)`: Convert `s` to an integer. Floating-point numbers are also converted
  (rounded down), potentially without a round-trip through a string
  representation.
* `tolower(s)`: Returns a copy of `s` where all uppercase ASCII characters are
  replaced with their lowercase counterparts; other characters are unchanged.
* `toupper(s)`: Returns a copy of `s` where all lowercase ASCII characters are
  replaced with their uppercase counterparts; other characters are unchanged.
* `exit [code]`: Exits the current process with the given code. `exit` attempts
  to flush any open file buffers. For parallel scripts, other worker threads
  have inputs cut off. Once those threads exit their main loop the process
  exits with the given exit code. This means that scripts with long loop
  iterations may not exit immediately. `exit` can be called with and without
  parentheses.

# Other Functions

* `close(s)` flushes all pending output to file `s` and then closes it.
* `length(x)` returns the length of `x`, where `x` can be either a string or an
  array.
* `system(s)` runs the command contained in the string `s` in a subshell,
  returning the error code, or the integer `1` if an error code was
  unavailable. The string `s` is subject to taint analysis by default.