Crates.io | jaq-core |
lib.rs | jaq-core |
version | 2.0.0-delta |
source | src |
created_at | 2021-05-19 11:50:17.713798 |
updated_at | 2024-11-08 10:21:42.926686 |
description | Interpreter for the jaq language |
repository | https://github.com/01mf02/jaq |
id | 399529 |
size | 187,930 |
jaq (pronounced /ʒaːk/, like Jacques[1]) is a clone of the JSON data processing tool jq. jaq aims to support a large subset of jq's syntax and operations.
You can try jaq online on the jaq playground. Instructions for the playground can be found here.
jaq focuses on three goals: correctness, performance, and simplicity.
I drew inspiration from another Rust program, namely jql. However, unlike jql, jaq aims to closely imitate jq's syntax and semantics. This should allow users proficient in jq to easily use jaq.
You can download binaries for Linux, Mac, and Windows on the releases page.
You may also install jaq using homebrew on macOS or Linux:
$ brew install jaq
$ brew install --HEAD jaq # latest development version
Or using scoop on Windows:
$ scoop install main/jaq
To compile jaq, you need a Rust toolchain. See https://rustup.rs/ for instructions. (Note that Rust compilers shipped with Linux distributions may be too outdated to compile jaq.)
Any of the following commands install jaq:
$ cargo install --locked jaq
$ cargo install --locked --git https://github.com/01mf02/jaq # latest development version
On my system, both commands place the executable at ~/.cargo/bin/jaq.
If you have cloned this repository, you can also build jaq by executing one of the following commands in the cloned repository:
$ cargo build --release # places binary into target/release/jaq
$ cargo install --locked --path jaq # installs binary
jaq should work on any system supported by Rust. If it does not, please file an issue.
The following examples should give an impression of what jaq can currently do. You should obtain the same outputs by replacing jaq with jq. If not, your filing an issue would be appreciated. :) The syntax is documented in the jq manual.
Access a field:
$ echo '{"a": 1, "b": 2}' | jaq '.a'
1
Add values:
$ echo '{"a": 1, "b": 2}' | jaq 'add'
3
Construct an array from an object in two ways and show that they are equal:
$ echo '{"a": 1, "b": 2}' | jaq '[.a, .b] == [.[]]'
true
Apply a filter to all elements of an array and filter the results:
$ echo '[0, 1, 2, 3]' | jaq 'map(.*2) | [.[] | select(. < 5)]'
[0, 2, 4]
Read (slurp) input values into an array and get the average of its elements:
$ echo '1 2 3 4' | jaq -s 'add / length'
2.5
Repeatedly apply a filter to itself and output the intermediate results:
$ echo '0' | jaq '[recurse(.+1; . < 3)]'
[0, 1, 2]
Lazily fold over inputs and output intermediate results:
$ seq 1000 | jaq -n 'foreach inputs as $x (0; . + $x)'
1 3 6 10 15 [...]
The following evaluation consists of several benchmarks that allow comparing the performance of jaq, jq, and gojq. The empty benchmark runs the filter empty n times with null input, serving to measure the startup time. The bf-fib benchmark runs a Brainfuck interpreter written in jq, interpreting a Brainfuck script that produces n Fibonacci numbers. The other benchmarks evaluate various filters with n as input; see bench.sh for details.
I generated the benchmark data with
bench.sh target/release/jaq jq-1.7 gojq-0.12.13 jq-1.6 | tee bench.json
on a Linux system with an AMD Ryzen 5 5500U.[2]
I then processed the results with a "one-liner" (stretching the term and the line a bit):
jq -rs '.[] | "|`\(.name)`|\(.n)|" + ([.time[] | min | (.*1000|round)? // "N/A"] | min as $total_min | map(if . == $total_min then "**\(.)**" else "\(.)" end) | join("|"))' bench.json
(Of course, you can also use jaq here instead of jq.)
Finally, I concatenated the table header with the output and piped it through pandoc -t gfm.
Table: Evaluation results in milliseconds ("N/A" if error or more than 10 seconds).
Benchmark | n | jaq-2.0 | jq-1.7.1 | gojq-0.12.16 | jq-1.6 |
---|---|---|---|---|---|
empty | 512 | 280 | 420 | 250 | 8270 |
bf-fib | 13 | 510 | 1230 | 560 | 1450 |
defs | 100000 | 40 | N/A | 1030 | N/A |
reverse | 1048576 | 80 | 690 | 270 | 640 |
sort | 1048576 | 150 | 550 | 580 | 680 |
group-by | 1048576 | 540 | 1900 | 1530 | 2820 |
min-max | 1048576 | 250 | 320 | 260 | 340 |
add | 1048576 | 520 | 640 | 1290 | 740 |
kv | 131072 | 130 | 150 | 220 | 200 |
kv-update | 131072 | 150 | 530 | 470 | N/A |
kv-entries | 131072 | 600 | 1170 | 720 | 1120 |
ex-implode | 1048576 | 530 | 1110 | 590 | 1090 |
reduce | 1048576 | 810 | 900 | N/A | 850 |
try-catch | 1048576 | 330 | 320 | 370 | 670 |
tree-contains | 23 | 70 | 600 | 210 | 1770 |
tree-flatten | 17 | 870 | 360 | 10 | 490 |
tree-update | 17 | 700 | 970 | 1370 | 1190 |
tree-paths | 17 | 450 | 280 | 880 | 480 |
to-fromjson | 65536 | 40 | 360 | 100 | 390 |
ack | 7 | 540 | 710 | 1270 | 620 |
range-prop | 128 | 380 | 320 | 240 | 590 |
cumsum | 1048576 | 320 | 380 | 450 | 360 |
cumsum-xy | 1048576 | 490 | 470 | 720 | 520 |
The results show that jaq-2.0 is fastest on 17 benchmarks, whereas jq-1.7.1 is fastest on 3 benchmarks and gojq-0.12.16 is fastest on 3 benchmarks. gojq is much faster on tree-flatten because it implements the filter flatten natively instead of by definition.
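The "fastest on" counts can be re-derived from the table. The following Python sketch is not part of jaq's benchmark tooling; it simply copies the timings from the table (with None standing for "N/A") and counts, per tool, the rows on which that tool has the lowest time. The names TOOLS, RESULTS, and fastest_counts are made up for this illustration.

```python
# Timings in milliseconds, copied from the evaluation table.
# None stands for "N/A" (error or more than 10 seconds).
TOOLS = ["jaq-2.0", "jq-1.7.1", "gojq-0.12.16", "jq-1.6"]
RESULTS = {
    "empty": [280, 420, 250, 8270],
    "bf-fib": [510, 1230, 560, 1450],
    "defs": [40, None, 1030, None],
    "reverse": [80, 690, 270, 640],
    "sort": [150, 550, 580, 680],
    "group-by": [540, 1900, 1530, 2820],
    "min-max": [250, 320, 260, 340],
    "add": [520, 640, 1290, 740],
    "kv": [130, 150, 220, 200],
    "kv-update": [150, 530, 470, None],
    "kv-entries": [600, 1170, 720, 1120],
    "ex-implode": [530, 1110, 590, 1090],
    "reduce": [810, 900, None, 850],
    "try-catch": [330, 320, 370, 670],
    "tree-contains": [70, 600, 210, 1770],
    "tree-flatten": [870, 360, 10, 490],
    "tree-update": [700, 970, 1370, 1190],
    "tree-paths": [450, 280, 880, 480],
    "to-fromjson": [40, 360, 100, 390],
    "ack": [540, 710, 1270, 620],
    "range-prop": [380, 320, 240, 590],
    "cumsum": [320, 380, 450, 360],
    "cumsum-xy": [490, 470, 720, 520],
}


def fastest_counts(results):
    """Count, for every tool, the benchmarks on which it has the lowest time."""
    counts = {tool: 0 for tool in TOOLS}
    for times in results.values():
        best = min(t for t in times if t is not None)
        for tool, t in zip(TOOLS, times):
            if t == best:
                counts[tool] += 1
    return counts


print(fastest_counts(RESULTS))
# {'jaq-2.0': 17, 'jq-1.7.1': 3, 'gojq-0.12.16': 3, 'jq-1.6': 0}
```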
Here is an overview that summarises:
Contributions to extend jaq are highly welcome.
Identity (.)
Recursion (..)
Basic data types (null, boolean, number, string, array, object)
if-then-else (if .a < .b then .a else .b end)
Folding (reduce .[] as $x (0; . + $x), foreach .[] as $x (0; . + $x; . + .))
Error handling (try ... catch ...)
Breaking (label $x | f | ., break $x)
String interpolation ("The successor of \(.) is \(.+1).")
Format strings (@json, @text, @csv, @tsv, @html, @sh, @base64, @base64d)
Indexing of arrays/objects (.[0], .a, .["a"])
Iterating over arrays/objects (.[])
Optional indexing/iteration (.a?, .[]?)
Array slices (.[3:7], .[0:-1])
String slices
Composition (|)
Variable binding (. as $x | $x)
Pattern binding (. as {a: [$x, {("b", "c"): $y, $z}]} | $x, $y, $z)
Concatenation (,)
Plain assignment (=)
Update assignment (|=)
Arithmetic update assignment (+=, -=, ...)
Alternation (//)
Logic (or, and)
Equality and comparison (.a == .b, .a < .b)
Arithmetic (+, -, *, /, %)
Negation (-)
Error suppression (?)
Basic definitions (def map(f): [.[] | f];)
Recursive definitions (def r: r; r)
empty
error
inputs
length, utf8bytelength
floor, round, ceil
fromjson, tojson
explode, implode
ascii_downcase, ascii_upcase
startswith, endswith, ltrimstr, rtrimstr
trim, ltrim, rtrim
split("foo")
reverse, sort, sort_by(-.), group_by, min_by, max_by
first, last, range, fold
range, recurse
now, fromdateiso8601, todateiso8601
sqrt, sin, log, pow, ... (list of numeric filters)
strptime, strftime, strflocaltime, mktime, gmtime, and localtime
These filters are defined via more basic filters. Their definitions are at std.jq.
null
true, false, not
nan, infinite, isnan, isinfinite, isfinite, isnormal
type
select(. >= 0)
values, nulls, booleans, numbers, strings, arrays, objects, iterables, scalars
tostring, tonumber
map(.+1), map_values(.+1), add, join("a")
transpose, first, last, nth(10), flatten, min, max
to_entries, from_entries, with_entries
all, any
walk
input
test, scan, match, capture, splits, sub, gsub
fromdate, todate
jaq imports many filters from libm and follows their type signature.
Zero-argument filters:
acos
acosh
asin
asinh
atan
atanh
cbrt
cos
cosh
erf
erfc
exp
exp10
exp2
expm1
fabs
frexp, which returns pairs of (float, integer).
gamma
ilogb, which returns integers.
j0
j1
lgamma
log
log10
log1p
log2
logb
modf, which returns pairs of (float, float).
nearbyint
pow10
rint
significand
sin
sinh
sqrt
tan
tanh
tgamma
trunc
y0
y1
Two-argument filters that ignore .:
atan2
copysign
drem
fdim
fmax
fmin
fmod
hypot
jn, which takes an integer as first argument.
ldexp, which takes an integer as second argument.
nextafter
nexttoward
pow
remainder
scalb
scalbln, which takes an integer as second argument.
yn, which takes an integer as first argument.
Three-argument filters that ignore .:
fma
include "path";
import "path" as mod;
import "path" as $data;
jaq currently does not aim to support several features of jq, such as:
SQL-style operators
Streaming
jq uses 64-bit floating-point numbers (floats) for any number. By contrast, jaq interprets numbers such as 0 or -42 as machine-sized integers and numbers such as 0.0 or 3e8 as 64-bit floats. Many operations in jaq, such as array indexing, check whether the passed numbers are indeed integers. The motivation behind this is to avoid rounding errors that may silently lead to wrong results. For example:
$ jq -n '[0, 1, 2] | .[1.0000000000000001]'
1
$ jaq -n '[0, 1, 2] | .[1.0000000000000001]'
Error: cannot use 1.0 as integer
$ jaq -n '[0, 1, 2] | .[1]'
1
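The rounding issue behind this example can be reproduced outside of jq. The following Python sketch is only an illustration of the idea, not jaq's implementation: the helper name index is made up, and the error message merely imitates the one shown above. It shows that the literal 1.0000000000000001 is silently rounded to 1.0 when parsed as a 64-bit float, and how an indexing function can refuse floats instead of truncating them.

```python
def index(array, i):
    """Index `array` at `i`, refusing floats instead of rounding them."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(i, bool) or not isinstance(i, int):
        raise TypeError(f"cannot use {i!r} as integer")
    return array[i]


# 1.0000000000000001 is not representable as a 64-bit float;
# parsing it silently yields 1.0.
assert 1.0000000000000001 == 1.0

print(index([0, 1, 2], 1))
try:
    index([0, 1, 2], 1.0000000000000001)
except TypeError as err:
    print("Error:", err)
```

The design choice illustrated here is to reject a float index outright, so that a value that only looks like an integer after rounding cannot select an element by accident.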
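The rules of jaq are:
The sum, difference, product, and remainder of two integers is an integer.
Any other operation between two numbers yields a float.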
Examples:
$ jaq -n '1 + 2'
3
$ jaq -n '10 / 2'
5.0
$ jaq -n '1.0 + 2'
3.0
You can convert an integer to a floating-point number e.g. by adding 0.0, by multiplying by 1.0, or by dividing by 1. You can convert a floating-point number to an integer by round, floor, or ceil:
$ jaq -n '1.2 | [floor, round, ceil]'
[1, 1, 2]
In jq, division by 0 yields an error, whereas in jaq, n / 0 yields nan if n == 0, infinite if n > 0, and -infinite if n < 0.
jaq's behaviour is closer to the IEEE standard for floating-point arithmetic (IEEE 754).
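As a rough model of the rule for n / 0, consider the following Python sketch. The function name divide is made up for this illustration, Python's own / would raise ZeroDivisionError here, and two details go beyond the text and are assumptions: a nan numerator yields nan, and the sign of a zero divisor is ignored.

```python
import math


def divide(n, d):
    """Model of `n / d` following the rule for division by zero described above."""
    if d != 0:
        return n / d  # ordinary division always yields a float in Python
    if n == 0 or math.isnan(n):
        # 0 / 0 is nan; treating a nan numerator the same way is an assumption.
        return math.nan
    # Assumption: the sign of the zero divisor is ignored in this sketch.
    return math.inf if n > 0 else -math.inf


print(divide(10, 2), divide(1, 0), divide(-1, 0), divide(0, 0))
```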
jaq implements a total ordering on floating-point numbers to allow sorting values. Therefore, it unfortunately has to enforce that nan == nan. (jq gets around this by enforcing that nan < nan is true, yet nan > nan is false, which breaks basic laws about total orders.)
Like jq, jaq prints nan and infinite as null in JSON, because JSON does not support encoding these values as numbers.
Like jq, jaq allows for assignments of the form p |= f. However, jaq interprets these assignments differently. Fortunately, in most cases, the result is the same.
In jq, an assignment p |= f first constructs paths to all values that match p. Only then, it applies the filter f to these values.
In jaq, an assignment p |= f applies f immediately to any value matching p. Unlike in jq, assignment does not explicitly construct paths.
jaq's implementation of assignment likely yields higher performance, because it does not construct paths. Furthermore, this allows jaq to use multiple outputs of the right-hand side, whereas jq uses only the first. For example, 0 | (., .) |= (., .+1) yields 0 1 1 2 in jaq, whereas it yields only 0 in jq. However, {a: 1} | .a |= (2, 3) yields {"a": 2} in both jaq and jq, because an object can only associate a single value with any given key, so we cannot use multiple outputs in a meaningful way here.
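To make the two examples above concrete, here is a toy model in Python of path-free updates. It is a sketch under assumptions, not jaq's actual code: filters are modelled as Python functions returning a list of outputs, and the helper names update_identity, update_comma, and update_field are hypothetical.

```python
def update_identity(value, f):
    """`. |= f`: every output of f becomes an output of the update."""
    yield from f(value)


def update_comma(value, left, right, f):
    """`(l, r) |= f`: update through l, then update every result through r."""
    for intermediate in left(value, f):
        yield from right(intermediate, f)


def update_field(obj, key, f):
    """`.key |= f`: a key holds a single value, so only f's first output is used."""
    for new_value in f(obj[key]):
        yield {**obj, key: new_value}
        return
    # The case where f yields no output is not modelled in this sketch.


# 0 | (., .) |= (., .+1)
print(list(update_comma(0, update_identity, update_identity, lambda v: [v, v + 1])))
# [0, 1, 1, 2]

# {a: 1} | .a |= (2, 3)
print(list(update_field({"a": 1}, "a", lambda v: [2, 3])))
# [{'a': 2}]
```

In this model, no list of paths is ever built: each update function hands the matched value directly to f.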
Because jaq does not construct paths, it does not allow some filters on the left-hand side of assignments, such as first, last, and limit. For example, [1, 2, 3] | first(.[]) |= .-1 yields [0, 2, 3] in jq, but is invalid in jaq. Similarly, [1, 2, 3] | limit(2; .[]) |= .-1 yields [0, 1, 3] in jq, but is invalid in jaq. (Inconsequentially, jq also does not allow for last.)
Like jq, jaq allows for the definition of filters, such as:
def map(f): [.[] | f];
Arguments can also be passed by value, such as:
def cartesian($f; $g): [$f, $g];
Filter definitions can be nested and recursive, i.e. refer to themselves. That is, a filter such as recurse can be defined in jaq:
def recurse(f): def r: ., (f | r); r;
Since jaq 1.2, jaq optimises tail calls, like jq. Since jaq 1.1, recursive filters can also have non-variable arguments, like in jq. For example:
def f(a): a, f(1+a);
Recursive filters with non-variable arguments can yield surprising effects; for example, a call f(0) builds up calls of the shape f(1+(..(1+0)...)), which leads to exponential execution times.
Recursive filters with non-variable arguments can very frequently be alternatively implemented by either:
A nested filter: for example, instead of
def walk(f): (.[]? |= walk(f)) | f;
you can use
def walk(f): def rec: (.[]? |= rec) | f; rec;
A filter with variable arguments: for example, instead of
def f(a): a, f(1+a);
you can equally well write
def f($a): $a, f(1+$a);
A filter with recurse: for example, you may write
def f(a): a | recurse(1+.);
If you expect your filter to recurse deeply, it is advised to implement it using recurse, because jaq has an optimised implementation of recurse.
All of these options are supported by jaq.
Like jq, jaq allows defining arguments via the command line, in particular by the options --arg, --rawfile, --slurpfile. This binds variables to values, and for every variable $x bound to v this way, $ARGS.named contains an entry with key x and value v.
For example:
$ jaq -n --arg x 1 --arg y 2 '$x, $y, $ARGS.named'
"1"
"2"
{
  "x": "1",
  "y": "2"
}
jq and jaq provide filters reduce xs as $x (init; update), foreach xs as $x (init; update), and foreach xs as $x (init; update; project), where foreach xs as $x (init; update) is equivalent to foreach xs as $x (init; update; .).
In jaq, the output of these filters is defined very simply: Assuming that xs evaluates to x0, x1, ..., xn, reduce xs as $x (init; update) evaluates to
init
| x0 as $x | update
| ...
| xn as $x | update
and foreach xs as $x (init; update; project) evaluates to
init |
( x0 as $x | update | project,
( ...
( xn as $x | update | project,
( empty )...)
The interpretation of reduce/foreach in jaq has the following advantages over jq:
It deals very naturally with filters that yield multiple outputs. In contrast, jq discriminates between the outputs of update, because it recurses only on the last of them, although it outputs all of them. For example, foreach (5, 10) as $x (1; .+$x, -.) yields 6, -1, 9, 1 in jq, whereas it yields 6, 16, -6, -1, 9, 1 in jaq. We can see that both jq and jaq yield the values 6 and -1 resulting from the first iteration (where $x is 5), namely 1 | 5 as $x | (.+$x, -.). However, jq performs the second iteration (where $x is 10) only on the last value returned from the first iteration, namely -1, yielding the values 9 and 1 resulting from -1 | 10 as $x | (.+$x, -.). jaq yields these values too, but it also performs the second iteration on all other values returned from the first iteration, namely 6, yielding the values 16 and -6 that result from 6 | 10 as $x | (.+$x, -.).
It makes the implementation of reduce and foreach special cases of the same code, reducing the potential for bugs.
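The expansion of reduce and foreach given above, and the difference to jq on the example foreach (5, 10) as $x (1; .+$x, -.), can be modelled in a few lines of Python. This is a sketch of the described semantics, not code from either implementation: filters are modelled as functions returning a list of outputs, the function names are invented, and foreach_jq follows only the behaviour described in the text (output everything, continue from the last output).

```python
def foreach_jaq(xs, init, update, project=lambda v: [v]):
    """Model of jaq's foreach: continue from every output of update."""
    def step(state, i):
        if i == len(xs):
            return  # corresponds to the innermost `empty`
        for new_state in update(state, xs[i]):
            yield from project(new_state)
            yield from step(new_state, i + 1)
    yield from step(init, 0)


def reduce_jaq(xs, init, update):
    """Model of jaq's reduce: init | x0 as $x | update | ... | xn as $x | update."""
    states = [init]
    for x in xs:
        states = [new for state in states for new in update(state, x)]
    return states


def foreach_jq(xs, init, update):
    """Model of jq as described above: output all values, recurse on the last."""
    state = init
    for x in xs:
        outputs = list(update(state, x))
        yield from outputs
        if not outputs:
            return  # assumption: nothing to continue from
        state = outputs[-1]


def update(state, x):
    return [state + x, -state]  # models the filter (.+$x, -.)


print(list(foreach_jaq([5, 10], 1, update)))  # [6, 16, -6, -1, 9, 1]
print(list(foreach_jq([5, 10], 1, update)))   # [6, -1, 9, 1]
print(reduce_jaq([5, 10], 1, update))         # [16, -6, 9, 1]
```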
Slurping: When files are slurped in (via the -s / --slurp option), jq combines the inputs of all files into one single array, whereas jaq yields an array for every file. This is motivated by the -i / --in-place option, which could not work with the behaviour implemented by jq. The behaviour of jq can be approximated in jaq; for example, to achieve the output of jq -s . a b, you may use jaq -s . <(cat a b).
Cartesian products: In jq, [(1,2) * (3,4)] yields [3, 6, 4, 8], whereas [{a: (1,2), b: (3,4)} | .a * .b] yields [3, 4, 6, 8]. jaq yields [3, 4, 6, 8] in both cases.
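The two orders correspond to which operand is iterated in the outer loop. The following Python sketch only reproduces the orders quoted above; the function names are made up, and it says nothing about how jq or jaq evaluate products internally.

```python
def product_left_outer(lefts, rights):
    """Left operand in the outer loop: the order jaq yields in both cases."""
    return [l * r for l in lefts for r in rights]


def product_right_outer(lefts, rights):
    """Right operand in the outer loop: the order jq yields for (1,2) * (3,4)."""
    return [l * r for r in rights for l in lefts]


print(product_left_outer([1, 2], [3, 4]))   # [3, 4, 6, 8]
print(product_right_outer([1, 2], [3, 4]))  # [3, 6, 4, 8]
```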
Indexing null: In jq, when given null input, .["a"] and .[0] yield null, but .[] yields an error. jaq yields an error in all cases to prevent accidental indexing of null values. To obtain the same behaviour in jq and jaq, you can use .["a"]? // null or .[0]? // null instead.
List updating: In jq, [0, 1] | .[3] = 3 yields [0, 1, null, 3]; that is, jq fills up the list with nulls if we update beyond its size. In contrast, jaq fails with an out-of-bounds error in such a case.
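The difference can be sketched in Python as two variants of an index assignment. Both functions are hypothetical models of the behaviour described above, restricted to non-negative indices; None plays the role of null, and the error message is invented for the illustration.

```python
def set_index_jq(array, i, value):
    """Model of jq: pad with null (None) when updating beyond the end."""
    result = list(array)
    if i >= len(result):
        result.extend([None] * (i + 1 - len(result)))
    result[i] = value
    return result


def set_index_jaq(array, i, value):
    """Model of jaq: refuse to update beyond the end."""
    if i >= len(array):
        raise IndexError("index out of bounds")
    result = list(array)
    result[i] = value
    return result


print(set_index_jq([0, 1], 3, 3))  # [0, 1, None, 3]
try:
    set_index_jaq([0, 1], 3, 3)
except IndexError as err:
    print("Error:", err)
```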
Input reading: When there is no more input value left, in jq, input yields an error, whereas in jaq, it yields no output value.
Joining: When given an array [x0, x1, ..., xn], in jq, join(x) converts all elements of the input array to strings and intersperses them with x, whereas in jaq, join(x) simply calculates x0 + x + x1 + x + ... + xn. When all elements of the input array and x are strings, jq and jaq yield the same output.
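The following Python sketch contrasts the two definitions of join. It is a model under assumptions rather than either tool's code: Python's + stands in for the + of jq/jaq, the treatment of null as an empty string in join_jq follows the jq manual rather than the text above, and returning "" for an empty array in join_jaq is an assumption.

```python
import json
from functools import reduce


def join_jq(items, sep):
    """Model of jq: convert every element to a string, then intersperse sep."""
    def to_text(value):
        if value is None:
            return ""  # per the jq manual, null is treated as an empty string
        if isinstance(value, str):
            return value
        return json.dumps(value)
    return sep.join(to_text(value) for value in items)


def join_jaq(items, sep):
    """Model of jaq: compute x0 + sep + x1 + sep + ... + xn with plain +."""
    if not items:
        return ""  # assumption: the empty case is not covered by the text
    return reduce(lambda acc, item: acc + sep + item, items)


print(join_jq(["a", "b", "c"], ", "))   # a, b, c
print(join_jaq(["a", "b", "c"], ", "))  # a, b, c
print(join_jq([1, "b"], "-"))           # 1-b
print(join_jaq([1, 2], 0))              # 3
```

With strings only, both functions agree; with other types, join_jaq inherits whatever + does for those types, which is the point of the simpler definition.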
Modules: If the -L command-line option is not given, the search path for modules and data files in jq is ["~/.jq", "$ORIGIN/../lib/jq", "$ORIGIN/../lib"], whereas in jaq, it is []. However, this can be emulated in jaq by setting an alias such as
alias jaq="jaq -L ~ -L `which jaq`/../lib/jq -L `which jaq`/../lib"
Contributions to jaq are welcome. Please make sure that after your change, cargo test runs successfully.
This project was funded through the NGI0 Entrust Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101069594.
jaq has also profited from:
[1] I wanted to create a tool that should be discreet and obliging, like a good waiter. And when I think of a typical name for a (French) waiter, to my mind comes "Jacques". Later, I found out about the old French word jacquet, meaning "squirrel", which makes for a nice ex post inspiration for the name.
[2] The binaries for jq-1.7.1 and gojq-0.12.16 were retrieved from their GitHub release pages, the binary for jq-1.6 was installed from the standard Ubuntu repository.