jtxt
A JavaScript syntax text processing tool, an awk alternative.
使用JavaScript语法进行文本处理的工具,来替换awk
[![Rust CI](https://github.com/sunwu51/jtxt/actions/workflows/rust.yml/badge.svg)](https://github.com/sunwu51/jtxt/actions/workflows/rust.yml)
```bash
$ jtxt 'var it = l.split("]")[2]; console.log(it)' gc.log
```
# Quick Start
Download binary from release or install with `cargo`
从release页下载或者直接用`cargo`安装
```bash
$ cargo install jtxt
```
Usage
```bash
$ jtxt -h
Processes lines of input with JavaScript
Usage: jtxt [OPTIONS] [file]
Arguments:
JavaScript code to process each line, l is the origin string
[file] Path to the input file. If not provided, reads from stdin.
Options:
-b, --begin JavaScript code to execute before processing any lines
-e, --end JavaScript code to execute after processing all lines
-h, --help Print help
-V, --version Print version
```
Example with [1.txt](./1.txt)
```bash
# 1 filter the lines that only contains numbers from 1.txt
# 1 从1.txt中过滤全是数字的行
$ jtxt 'if (l.match(/^\d+$/)) console.log(l)' 1.txt
200
404
200
# 2 compute the character number of 1.txt (exclude the [\r]\n)
# 2 计算1.txt中的字符个数(除了换行符[\r]\n)
$ jtxt -b 'var s=0' 's += l.length' -e 'console.log(s)' 1.txt
144
# 3 filter the lines that contains `Request`, and process the time text, then analyze the distribution of quantity per hour
# 3 过滤含有Request的行,并将时间部分进行处理,分析每个小时的请求数量分布
$ jtxt 'if(l.indexOf("Request")>=0) console.log(l)' 1.txt \
| jtxt -b 'var m={}' 'var k = "h" + new Date(l.substring(0,19)).getHours(); if(!m[k]) m[k]=0; m[k]++' -e 'console.log(m)' 1.txt
{
"h13": 2,
"h14": 1
}
```
Well, to simplify the command, you can also use the preset variable: `var ctx = {}, n1=0, n2=0, n3=0, s='', arr=[];`
你也可以用内置的变量来简化指令,这是几个内置变量和他们的默认值`var ctx = {}, n1=0, n2=0, n3=0, s='', arr=[];`
# Performance
The performance is about 4 times slower in `simple scenarios` compared to awk, because of using `deno_core` to interpret code. For example, count the character. But in the other hand, you get the JavaScript support.
简单场景下,例如字符计数,性能比`awk`差了4倍,这是因为使用了`deno_core`来解释运行,但反过来讲,你获得了JavaScript语法的支持。
![img](https://i.imgur.com/DydkAan.png)
`gc.log` is a jvm gc log 32M size.
![img](https://i.imgur.com/1HoVOLX.png)
In `complex scenarios`, the performance gap will narrow, and even reverse. For example, compute the gc count and gc time in every second, and finally return the top 1 item (the longest).
复杂场景下,性能差距会缩小甚至反超。例如计算每秒的gc总数量和gc总时间,最终返回最长的那一秒
```bash
$ jtxt 'var re=l.match(/(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d).*Real=([0-9\.]+)/);
if (re) {
if (!ctx[re[1]]) ctx[re[1]] = 0;
ctx[re[1]]+=parseFloat(re[2])
}' -e 'for (var k in ctx) {
if (ctx[k]>n1) { n1 = ctx[k]; n2 = k}
}
console.log(`最大时间: ${n2}, 最大值: ${n1}`)' gc.log
$ awk '{ if (match($0, /([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2}).*Real=([0-9\.]+)/, arr)) {
time = arr[1] "T" arr[2];
value = arr[3];
sum[time] += value;}
} END {
max_value = 0;
max_time = "";
for (t in sum) {
if (sum[t] > max_value) {
max_value = sum[t];
max_time = t;
}
}
print "最大值时间: " max_time ", 最大值: " max_value;
}' gc.log
```
![image](https://i.imgur.com/14sChtK.png)