aggrs

Crate metadata (crates.io):
  Version:       0.3.0
  Published:     2026-01-11
  Description:   A fast command-line tool for building aggregation trees from JSON and CSV data
  Homepage:      https://github.com/awgn/aggrs
  Repository:    https://github.com/awgn/aggrs
  Author:        Nicola Bonelli (awgn)
  Documentation: https://github.com/awgn/aggrs#readme

README
aggrs

A fast, multi-threaded command-line tool for building hierarchical aggregation trees from JSON and CSV data. Perfect for analyzing log files, network traffic data, and any structured data where you need to understand the distribution of values across multiple dimensions.

Features

  • 🚀 Fast & Parallel: Built with Rust and Rayon for multi-threaded processing
  • 📊 Hierarchical Aggregation: Build nested aggregation trees with multiple keys
  • 🔍 Key Discovery: Find which keys contain specific values in your data
  • 📁 Multiple Formats: Supports JSON (newline-delimited), CSV, and plain text
  • 🎨 Colored Output: Optional colored output for better readability
  • 🔎 Filtering: Filter results using regular expressions
  • 📈 Statistics: Optional percentage display for each bucket

Installation

From source

git clone https://github.com/awgn/aggrs.git
cd aggrs
cargo build --release

The binary will be available at ./target/release/aggrs.

From crates.io

cargo install aggrs

Usage

aggrs [OPTIONS] [FILE]

If no file is specified, aggrs reads from stdin.

Options

Option                              Description
------                              -----------
-k, --keys <KEYS>                   JSON/CSV keys to aggregate (can be repeated)
-l, --level <LEVEL>                 Aggregation level depth
-c, --colors                        Enable colored output
-v, --verbose                       Enable verbose mode (shows percentages)
--counters-to-right                 Display counters to the right of bucket names
-t, --tokenize                      Tokenize lines by whitespace (for plain-text input)
-f, --filter <FILTER>               Filter buckets by regular expression
-d, --discovery <DISCOVERY>         Discover keys whose values match a regex
-j, --num-threads <NUM_THREADS>     Number of threads
--file-format <FILE_FORMAT>         File format (json or csv), overriding detection
-h, --help                          Print help information
-V, --version                       Print version information

Examples

JSON Aggregation

Given a JSON file with network flow data (flows.json):

{"transport":"tcp","application":"http","service":"google","src_ip":"192.168.1.100"}
{"transport":"tcp","application":"https","service":"facebook","src_ip":"192.168.1.101"}
{"transport":"udp","application":"dns","service":"cloudflare","src_ip":"192.168.1.100"}

Build a hierarchical aggregation tree:

aggrs -k transport -k application -k service flows.json

Output:

1: "udp"
    1: "dns"
        1: "cloudflare"
2: "tcp"
    1: "http"
        1: "google"
    1: "https"
        1: "facebook"
buckets      : 2
total entries: 3
time elapsed : 0.42ms

With colored output and percentages:

aggrs -k transport -k application flows.json -c -v

Output:

1 (33.33%): "udp"
    1 (100.00%): "dns"
2 (66.67%): "tcp"
    1 (50.00%): "http"
    1 (50.00%): "https"

Display counters to the right:

aggrs -k transport -k application flows.json --counters-to-right -v

Output:

"udp" -> 1 (33.33%)
    "dns" -> 1 (100.00%)
"tcp" -> 2 (66.67%)
    "http" -> 1 (50.00%)
    "https" -> 1 (50.00%)

CSV Aggregation

Given a CSV file (traffic.csv):

transport,application,service,country,category
tcp,http,google,US,search
tcp,https,facebook,US,social
udp,dns,cloudflare,US,infrastructure
tcp,ssh,github,US,development

Aggregate by category and application:

aggrs -k category -k application traffic.csv

Output:

1: "development"
    1: "ssh"
1: "infrastructure"
    1: "dns"
1: "search"
    1: "http"
1: "social"
    1: "https"

Key Discovery

Discover which keys contain a specific value pattern:

aggrs -d 'google' flows.json

Output:

service: 2
sni: 3
http_host: 1
buckets      : 3
time elapsed : 0.85ms

This is useful when you need to find which fields in your data contain a certain value.
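Conceptually, discovery scans every key/value pair in each record and counts, per key, how many values match the regex. A minimal Python sketch of that idea (illustrative only, not the tool's Rust implementation):

```python
import json
import re
from collections import Counter

def discover(lines, pattern):
    """Count, for each key, how many records have a value matching the pattern."""
    rx = re.compile(pattern)
    hits = Counter()
    for line in lines:
        for key, value in json.loads(line).items():
            if rx.search(str(value)):
                hits[key] += 1
    return hits

records = [
    '{"service": "google", "src_ip": "10.0.0.1"}',
    '{"service": "google-ads", "src_ip": "10.0.0.2"}',
]
print(discover(records, "google"))  # -> Counter({'service': 2})
```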

Filtering

Filter results to only show entries matching a pattern:

aggrs -k transport -k application -k service flows.json -f "https"

This will only aggregate entries where the combined key values match "https".
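In outline, the filter tests the selected key values of each entry against the regex and keeps only the matching ones. A simplified Python sketch of the effect (how aggrs combines keys internally is an implementation detail; this only illustrates the behavior):

```python
import json
import re

def keep(line, keys, pattern):
    """Keep an entry if any of its selected key values match the pattern."""
    rec = json.loads(line)
    combined = " ".join(str(rec.get(k, "")) for k in keys)
    return re.search(pattern, combined) is not None

line = '{"transport":"tcp","application":"https","service":"facebook"}'
print(keep(line, ["transport", "application", "service"], "https"))  # True
print(keep(line, ["transport", "application", "service"], "^dns$"))  # False
```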

Plain Text Mode

For non-JSON/CSV files, use tokenize mode:

cat access.log | aggrs -t

This splits each line by whitespace and treats each token as a separate level.
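For example, the line "GET /index.html 200" aggregates as GET -> /index.html -> 200, one tree level per token. A small Python sketch of that tokenized aggregation (illustrative only, not the tool's Rust code):

```python
from collections import defaultdict

def tree():
    # Each node carries a count and a map of child tokens.
    return {"count": 0, "children": defaultdict(tree)}

root = defaultdict(tree)
log_lines = [
    "GET /index.html 200",
    "GET /index.html 404",
    "POST /login 200",
]
for line in log_lines:
    node = root
    for token in line.split():
        node[token]["count"] += 1
        node = node[token]["children"]

print(root["GET"]["count"])  # 2
```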

Reading from stdin

cat data.json | aggrs -k field1 -k field2

Or pipe from other commands:

zcat compressed.json.gz | aggrs -k type -k status

Multi-threaded Processing

For large files, specify the number of threads:

aggrs -k transport -k application large_file.json -j 8

Output Format

The default output format shows:

  • Count: Number of entries for each bucket
  • Bucket name: The value of the key
  • Indentation: Hierarchical level (4 spaces per level)

Results are sorted by count in ascending order within each level.

At the end, aggrs displays:

  • buckets: Number of top-level buckets
  • total entries: Total number of processed entries
  • time elapsed: Processing time
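The shape of this output can be sketched with a few lines of Python that print a nested count map sorted by ascending count, four spaces per level (the tree structure here is hypothetical; it only mirrors the printed format, not the tool's internals):

```python
def print_tree(node, level=0):
    # Sort buckets by count, ascending, within each level.
    for name, (count, children) in sorted(node.items(), key=lambda kv: kv[1][0]):
        print(f"{' ' * 4 * level}{count}: \"{name}\"")
        print_tree(children, level + 1)

tree = {
    "tcp": (2, {"http": (1, {}), "https": (1, {})}),
    "udp": (1, {"dns": (1, {})}),
}
print_tree(tree)
```

Running this reproduces the indentation and ascending sort order of the JSON aggregation example above.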

File Format Detection

aggrs automatically detects the file format:

  • Files ending in .csv are treated as CSV
  • All other files are treated as JSON (newline-delimited)
  • Use --file-format to override automatic detection

Performance Tips

  1. Use multiple threads (-j) for large files
  2. Specify only needed keys to reduce memory usage
  3. Use filters (-f) to reduce the dataset early
  4. The tool processes files in parallel using Rayon for optimal performance

Use Cases

  • Network Traffic Analysis: Aggregate flows by protocol, application, and service
  • Log Analysis: Group log entries by severity, source, and type
  • Data Exploration: Discover patterns and distributions in structured data
  • Security Analysis: Identify anomalies by aggregating event types and sources
  • Business Intelligence: Summarize transactions by category, region, and time

License

MIT License

Author

Nicola Bonelli nicola.bonelli@larthia.com
