| Crates.io | aggrs |
| lib.rs | aggrs |
| version | 0.3.0 |
| created_at | 2026-01-11 15:20:01.595819+00 |
| updated_at | 2026-01-11 15:20:01.595819+00 |
| description | A fast command-line tool for building aggregation trees from JSON and CSV data |
| homepage | https://github.com/awgn/aggrs |
| repository | https://github.com/awgn/aggrs |
| max_upload_size | |
| id | 2035954 |
| size | 41,738 |
A fast, multi-threaded command-line tool for building hierarchical aggregation trees from JSON and CSV data. Perfect for analyzing log files, network traffic data, and any structured data where you need to understand the distribution of values across multiple dimensions.
git clone https://github.com/awgn/aggrs.git
cd aggrs
cargo build --release
The binary will be available at ./target/release/aggrs.
cargo install aggrs
aggrs [OPTIONS] [FILE]
If no file is specified, aggrs reads from stdin.
| Option | Description |
|---|---|
| `-k, --keys <KEYS>` | Specify the JSON/CSV keys to aggregate (can be repeated) |
| `-l, --level <LEVEL>` | Specify the aggregation level depth |
| `-c, --colors` | Enable colored output |
| `-v, --verbose` | Enable verbose mode (shows percentages) |
| `--counters-to-right` | Display counters to the right of bucket names |
| `-t, --tokenize` | Tokenize lines by whitespace (for plain-text input) |
| `-f, --filter <FILTER>` | Filter buckets by regular expression |
| `-d, --discovery <DISCOVERY>` | Discover keys whose values match a regex |
| `-j, --num-threads <NUM_THREADS>` | Specify the number of threads |
| `--file-format <FILE_FORMAT>` | Specify the file format (json or csv) |
| `-h, --help` | Print help information |
| `-V, --version` | Print version information |
Given a JSON file with network flow data (flows.json):
{"transport":"tcp","application":"http","service":"google","src_ip":"192.168.1.100"}
{"transport":"tcp","application":"https","service":"facebook","src_ip":"192.168.1.101"}
{"transport":"udp","application":"dns","service":"cloudflare","src_ip":"192.168.1.100"}
Build a hierarchical aggregation tree:
aggrs -k transport -k application -k service flows.json
Output:
1: "udp"
  1: "dns"
    1: "cloudflare"
2: "tcp"
  1: "http"
    1: "google"
  1: "https"
    1: "facebook"
buckets : 2
total entries: 3
time elapsed : 0.42ms
With colored output and percentages:
aggrs -k transport -k application flows.json -c -v
Output:
1 (33.33%): "udp"
  1 (100.00%): "dns"
2 (66.67%): "tcp"
  1 (50.00%): "http"
  1 (50.00%): "https"
Display counters to the right:
aggrs -k transport -k application flows.json --counters-to-right -v
Output:
"udp" -> 1 (33.33%)
  "dns" -> 1 (100.00%)
"tcp" -> 2 (66.67%)
  "http" -> 1 (50.00%)
  "https" -> 1 (50.00%)
Given a CSV file (traffic.csv):
transport,application,service,country,category
tcp,http,google,US,search
tcp,https,facebook,US,social
udp,dns,cloudflare,US,infrastructure
tcp,ssh,github,US,development
Aggregate by category and application:
aggrs -k category -k application traffic.csv
Output:
1: "development"
  1: "ssh"
1: "infrastructure"
  1: "dns"
1: "search"
  1: "http"
1: "social"
  1: "https"
Discover which keys contain a specific value pattern:
aggrs -d 'google' flows.json
Output:
service: 2
sni: 3
http_host: 1
buckets : 3
time elapsed : 0.85ms
This is useful when you need to find which fields in your data contain a certain value.
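The idea behind discovery can be sketched as follows — a toy model, not aggrs's internals, using a plain substring match where `-d` actually accepts a regular expression, with made-up records for illustration:

```rust
use std::collections::BTreeMap;

// For each key, count how many records have a value containing the pattern.
fn discover<'a>(records: &[Vec<(&'a str, &str)>], pattern: &str) -> BTreeMap<&'a str, u64> {
    let mut hits = BTreeMap::new();
    for rec in records {
        for (key, value) in rec {
            if value.contains(pattern) {
                *hits.entry(*key).or_insert(0) += 1;
            }
        }
    }
    hits
}

fn main() {
    // Hypothetical flow records as (key, value) pairs.
    let records = vec![
        vec![("service", "google"), ("sni", "www.google.com")],
        vec![("service", "facebook"), ("sni", "google-analytics.com")],
    ];
    for (key, n) in discover(&records, "google") {
        println!("{}: {}", key, n);
    }
}
```

Here `service` matches once and `sni` twice, so the sketch reports which fields carry the value and how often, just as the `aggrs -d` output above does.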
Filter results to only show entries matching a pattern:
aggrs -k transport -k application -k service flows.json -f "https"
This will only aggregate entries where the combined key values match "https".
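The matching logic can be sketched like this — a toy model under assumed behavior, with a substring check standing in for the regex that `-f` actually accepts, and a hypothetical `/`-joined key path:

```rust
fn main() {
    // Hypothetical leaf buckets keyed by their combined key values.
    let buckets = [
        ("tcp/http/google", 1u64),
        ("tcp/https/facebook", 1),
        ("udp/dns/cloudflare", 1),
    ];

    // Keep only buckets whose combined key matches the pattern.
    let kept: Vec<_> = buckets
        .iter()
        .filter(|(path, _)| path.contains("https"))
        .collect();

    for (path, n) in &kept {
        println!("{}: {}", path, n);
    }
}
```

Only the `tcp/https/facebook` bucket survives, which is why filtering early is an effective way to shrink large datasets.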
For non-JSON/CSV files, use tokenize mode:
cat access.log | aggrs -t
This splits each line by whitespace and treats each token as a separate level.
aggrs also reads from stdin when no file is given:
cat data.json | aggrs -k field1 -k field2
Or pipe from other commands:
zcat compressed.json.gz | aggrs -k type -k status
For large files, specify the number of threads:
aggrs -k transport -k application large_file.json -j 8
The default output shows, for each bucket, its count followed by its value, indented by aggregation level. Results are sorted by count in ascending order within each level.
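The tree-building and per-level sorting can be sketched in Rust — a toy model of the idea, not aggrs's actual implementation, using hard-coded (transport, application) pairs from the flows.json example:

```rust
use std::collections::BTreeMap;

// A bucket counts the records that passed through it and holds child buckets.
#[derive(Default)]
struct Node {
    count: u64,
    children: BTreeMap<String, Node>,
}

impl Node {
    // Walk the chosen key values of one record, bumping a counter per level.
    fn insert(&mut self, path: &[&str]) {
        if let Some((head, rest)) = path.split_first() {
            let child = self.children.entry(head.to_string()).or_default();
            child.count += 1;
            child.insert(rest);
        }
    }
}

// Print each level sorted by count in ascending order.
fn print_tree(node: &Node, depth: usize) {
    let mut kids: Vec<_> = node.children.iter().collect();
    kids.sort_by_key(|(_, n)| n.count);
    for (name, child) in kids {
        println!("{}{}: {:?}", "  ".repeat(depth), child.count, name);
        print_tree(child, depth + 1);
    }
}

fn main() {
    let records = [("tcp", "http"), ("tcp", "https"), ("udp", "dns")];
    let mut root = Node::default();
    for (t, a) in records {
        root.insert(&[t, a]);
    }
    print_tree(&root, 0);
}
```

With this input, `"udp"` (count 1) prints before `"tcp"` (count 2), mirroring the ascending order in the example output above.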
At the end, aggrs displays a summary:
- buckets: number of top-level buckets
- total entries: total number of processed entries
- time elapsed: processing time

aggrs automatically detects the file format:
- files with a .csv extension are treated as CSV
- use --file-format to override automatic detection

Performance tips:
- use multiple threads (-j) for large files
- use filters (-f) to reduce the dataset early

MIT License
Nicola Bonelli nicola.bonelli@larthia.com