ucloud-cdn-log-parser

Crates.io: ucloud-cdn-log-parser
lib.rs: ucloud-cdn-log-parser
version: 0.1.3
source: src
created_at: 2024-03-16 14:48:24.444885
updated_at: 2024-03-20 04:49:29.616547
description: Parse UCloud CDN log files to CSV
repository: https://github.com/yinheli/ucloud-cdn-log-parser
id: 1175686
size: 20,790
yinheli (yinheli)


Parses UCloud CDN logs into CSV/TSV, with or without a header row, so that you can analyze them with DuckDB, clickhouse-local, or similar tools.

Install

Download a prebuilt binary from the release page (built by GitHub Actions), or install via cargo or build from source.

cargo install ucloud-cdn-log-parser --locked

Usage

# download logs

# parse & convert to csv / parquet

zcat *.gz | \
  ucloud-cdn-log-parser > log.csv

# or convert to parquet with duckdb (pv shows pipe throughput)
zcat *.gz | \
  ucloud-cdn-log-parser | \
  pv | \
  duckdb -c "
    copy 
      (select * from read_csv('/dev/stdin')) 
      to 'log.parquet.zst' 
      (format parquet, compression 'zstd');"

# now you can use duckdb / clickhouse local / etc to query the log
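
Before writing queries, it can help to confirm which column names the parser actually emitted. The following is a minimal sketch, assuming `log.csv` was written with a header row and that no header name contains a comma or quote; `list_columns` is a hypothetical helper name used only for illustration, not part of the tool:

```shell
# list_columns FILE: print each column name from the CSV header on its own line.
# Assumption: FILE starts with a header row and no header name contains a comma.
list_columns() {
  # Take the first line, drop a possible trailing carriage return,
  # then split on commas so each column name lands on its own line.
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n'
}

# Usage: list_columns log.csv
```

The names it prints (for example `client_ip` or `sent_bytes_incl_header`, as used in the queries in this README) are the ones to reference in SQL.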

For example, to analyze the log with DuckDB:

-- get top 100 client_ip by sent_bytes_incl_header in last 6 hours
select
    client_ip,
    format_bytes(sum(sent_bytes_incl_header)::bigint) sent_bytes,
    count(*) as n
from 'log.parquet.zst'
where date_time > (now() - interval 6 hours)::timestamp
group by client_ip
order by sum(sent_bytes_incl_header) desc
limit 100;

-- get top 100 request_method_url_protocol by sent_bytes_incl_header in last 6 hours
select
    request_method_url_protocol,
    format_bytes(sum(sent_bytes_incl_header)::bigint) sent_bytes,
    count(*) as n
from 'log.parquet.zst'
where date_time > (now() - interval 6 hours)::timestamp
group by request_method_url_protocol
order by sum(sent_bytes_incl_header) desc, n desc
limit 100;
