Crates.io | ucloud-cdn-log-parser |
lib.rs | ucloud-cdn-log-parser |
version | 0.1.3 |
source | src |
created_at | 2024-03-16 14:48:24.444885 |
updated_at | 2024-03-20 04:49:29.616547 |
description | Parse UCloud CDN log files to CSV |
repository | https://github.com/yinheli/ucloud-cdn-log-parser |
id | 1175686 |
size | 20,790 |
Parse UCloud CDN logs into CSV/TSV, with or without a header row, so you can analyze them with duckdb, clickhouse local, or similar tools.
Download a prebuilt binary from the release page (built with GitHub Actions), or install via cargo or build from source:
cargo install ucloud-cdn-log-parser --locked
# download logs
# parse & convert to csv / parquet
zcat *.gz | \
ucloud-cdn-log-parser > log.csv
## use duckdb
zcat *.gz | \
ucloud-cdn-log-parser | \
pv | \
duckdb -c "
copy
(select * from read_csv('/dev/stdin'))
to 'log.parquet.zst'
(format parquet, compression 'zstd');"
# now you can use duckdb / clickhouse local / etc to query the log
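The pipeline above depends on several external commands, and a missing one only shows up as a failure partway through the pipe. As a minimal sketch (the `missing_tools` helper is hypothetical and not part of this project), a preflight check could look like this:

```shell
# Hypothetical helper: print the name of each given command
# that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
  return 0
}

# Commands used by the pipeline above. pv only displays progress,
# so it can be dropped from the pipe if it is not installed.
missing=$(missing_tools zcat ucloud-cdn-log-parser pv duckdb)
if [ -n "$missing" ]; then
  # $missing is intentionally unquoted so the names print on one line.
  echo "missing tools:" $missing >&2
fi
```

Nothing is printed when every command is available.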
For example, to analyze the log with duckdb:
-- get top 100 client_ip by sent_bytes_incl_header in last 6 hours
select
client_ip,
format_bytes(sum(sent_bytes_incl_header)::bigint) sent_bytes,
count(*) as n
from 'log.parquet.zst'
where date_time > (now() - interval 6 hours)::timestamp
group by client_ip
order by sum(sent_bytes_incl_header) desc
limit 100;
-- get top 100 request_method_url_protocol by sent_bytes_incl_header in last 6 hours
select
request_method_url_protocol,
format_bytes(sum(sent_bytes_incl_header)::bigint) sent_bytes,
count(*) as n
from 'log.parquet.zst'
where date_time > (now() - interval 6 hours)::timestamp
group by request_method_url_protocol
order by sum(sent_bytes_incl_header) desc, n desc
limit 100;
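The two queries above differ only in the column they group by. As a rough sketch, a small shell function can generate the query text for any column; the name `top_by_query` and its optional hours/limit arguments are assumptions made for illustration, while the column names and the `log.parquet.zst` file come from the examples above.

```shell
# Hypothetical helper: print a "top N by bytes sent" query for one column.
# Usage: top_by_query COLUMN [HOURS] [LIMIT]
# The arguments are pasted into the SQL as-is, so pass trusted values only.
top_by_query() {
  column=$1
  hours=${2:-6}
  limit=${3:-100}
  cat <<EOF
select
  ${column},
  format_bytes(sum(sent_bytes_incl_header)::bigint) sent_bytes,
  count(*) as n
from 'log.parquet.zst'
where date_time > (now() - interval ${hours} hours)::timestamp
group by ${column}
order by sum(sent_bytes_incl_header) desc, n desc
limit ${limit};
EOF
}

# Print the query for client_ip with the defaults (6 hours, 100 rows).
# To run it instead of printing it:
#   duckdb -c "$(top_by_query client_ip)"
top_by_query client_ip
```

Calling `top_by_query request_method_url_protocol 24 20` would produce the second query, widened to 24 hours and limited to 20 rows.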