Crates.io | alog |
lib.rs | alog |
version | 0.8.0 |
source | src |
created_at | 2020-01-05 12:04:10.317803 |
updated_at | 2024-09-11 09:53:05.839937 |
description | Anonymize 'Combined Log Format' data |
homepage | https://crates.io/crates/alog |
repository | https://github.com/thyrc/alog |
id | 195397 |
size | 34,180 |
alog is a simple log file anonymizer.
In fact, by default alog just replaces the first word on every line of any input stream with a customizable string.
So "log file anonymizer" might be a bit of an overstatement, but alog can be used to (very efficiently) replace the $remote_addr part in many access log formats, e.g. Nginx's default combined log format:
log_format combined '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent"';
By default any parseable $remote_addr is replaced by its localhost representation. Lines without a $remote_addr part will remain unchanged (but can be skipped).
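The behaviour described above can be sketched in a few lines of Rust. This is only an illustration of the idea, not alog's actual implementation: the function name `anonymize_line` is hypothetical, and the replacement values `127.0.0.1` and `::1` are an assumption about what "localhost representation" means here.

```rust
use std::net::IpAddr;

/// Hypothetical helper: replace the first space-delimited word of `line`
/// if it parses as an IP address. Other lines are returned unchanged.
fn anonymize_line(line: &str) -> String {
    // Split into the first word and the remainder (which keeps its leading space).
    let (first, rest) = match line.find(' ') {
        Some(idx) => line.split_at(idx),
        None => (line, ""),
    };

    match first.parse::<IpAddr>() {
        // Assumed localhost representations for IPv4 and IPv6.
        Ok(IpAddr::V4(_)) => format!("127.0.0.1{}", rest),
        Ok(IpAddr::V6(_)) => format!("::1{}", rest),
        // No parseable address: leave the line as it is.
        Err(_) => line.to_string(),
    }
}

fn main() {
    let line = "8.8.8.8 - frank [10/Oct/2000:13:55:36 -0700] \"GET / HTTP/1.0\" 200 2326";
    println!("{}", anonymize_line(line));
}
```

Only the first word is ever inspected, which is why this kind of replacement can be fast: the rest of the line is copied through untouched.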
With version 0.7 ASCII whitespace characters are removed from the beginning of each line by default.
With version 0.6 $remote_user can be replaced with '-' as well, in addition to $remote_addr.
With version 0.3 [features] were added, so that the library crate won't pull in unneeded dependencies anymore.
To build the alog command-line tool you now have to explicitly add --features.
cargo build --features alog-cli
or
cargo build --all-features
Run the CLI tool with --help.
./target/release/alog --help
Calling run()
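fn main() {
    // Start from the default I/O and replacement settings.
    let mut io_conf = alog::IOConfig::default();
    let mut conf = alog::Config::default();

    // Read from /tmp/test.log and use "0.0.0.0" as the IPv4 replacement value.
    io_conf.push_input("/tmp/test.log");
    conf.set_ipv4_value("0.0.0.0");

    // Process the input and report any error on stderr.
    if let Err(e) = alog::run(&conf, &io_conf) {
        eprintln!("{}", e);
    }
}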
or run_raw()
use std::io::Cursor;

fn main() {
    // The output is collected in an in-memory buffer.
    let mut buffer = vec![];

    if let Err(e) = alog::run_raw(
        // Use "XXX" as the IPv4 replacement value; keep all other defaults.
        &alog::Config {
            ipv4: "XXX",
            ..Default::default()
        },
        Cursor::new(b"8.8.8.8 test line"),
        &mut buffer,
    ) {
        eprintln!("{}", e);
    }

    assert_eq!(buffer, b"XXX test line");
}
Config::authuser
With version 0.6 alog can be used to replace the $remote_user field with '-', but this feature comes with a couple of peculiarities.
This feature should work fine with standard Common / Combined Log formatted files, but...
There will be a significant hit on performance (synthetic benchmarking suggests ~625MB/s instead of ~1100MB/s on my machine, but still better than Perl's ~115MB/s ;)
When used with Config::trim set to false on malformed files, the performance hit will be even worse, and removal of the $remote_user field will fail altogether if no $time_local field is found.
The $time_local field is expected to start with '[' followed by a decimal number. E.g.:
"[10/Oct/2000:13:55:36 -0700]"
There is an optimization in place to reduce the performance hit with real-life log files, but this leads to $remote_user fields starting with "- [" not being replaced! So in
"8.8.8.8 - - [frank] [10/Oct/2000:13:55:36 -0700] GET /apache_pb.gif HTTP/1.0 200 2326"
"frank" will still be "frank". This optimization can be disabled.
alog started as a replacement for a <10 line Perl script running on an old backup host.
So nothing shiny... but it helped me learn some Rust (and crates.io) basics.
With version 0.6 alog is feature complete. It doesn't do much, but it does it quite well.
At some point I might re-use this crate and try harder to actually anonymize data. But for now, this is it.
I will still fix bugs when (and if) I find them, so alog is now passively maintained.