| Crates.io | lflog |
| lib.rs | lflog |
| version | 0.1.3 |
| created_at | 2026-01-17 04:52:26.343323+00 |
| updated_at | 2026-01-17 14:55:44.179134+00 |
| description | Query log files with SQL using DataFusion and regex pattern macros. |
| homepage | |
| repository | https://github.com/WeiNyn/lflog |
| max_upload_size | |
| id | 2049934 |
| size | 11,238,527 |
Query log files with SQL using DataFusion and regex pattern macros.
Write readable patterns like `{{timestamp:datetime("%Y-%m-%d")}}` instead of raw regex, query multiple files at once with glob patterns (`logs/*.log`), and keep track of source files (`__FILE__`) and raw log lines (`__RAW__`).

How does lflog compare to the usual tools? The feature comparison is below, and a command-level sketch follows it.
| Feature | lflog | awk/grep | DuckDB |
|---|---|---|---|
| Pattern syntax | `{{level:var_name}}` | Raw regex | Raw regex |
| Named fields | ✅ Built-in | ❌ Manual indexing | ❌ `regexp_extract()` per field |
| SQL queries | ✅ Full SQL | ❌ Not available | ✅ Full SQL |
| Type inference | ✅ Automatic | ❌ All strings | ❌ Manual |
| Multi-file glob | ✅ `'logs/*.log'` | ⚠️ Shell expansion | ✅ Supported |
| Source tracking | ✅ `__FILE__` column | ❌ Manual | ❌ Manual |
| Aggregations | ✅ SQL GROUP BY | ⚠️ Complex piping | ✅ SQL GROUP BY |
| Joins | ✅ Supported | ❌ Not available | ✅ Supported |
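For a concrete feel, here is roughly how counting error lines might look with each tool. The grep command is an illustrative approximation (it assumes an Apache-style `[error]` level marker), not output of the included demo script:

```bash
# lflog: named field + SQL
lflog app.log \
  --pattern '^\[{{time:any}}\] \[{{level:var_name}}\] {{message:any}}$' \
  --query "SELECT count(*) FROM log WHERE level = 'error'"

# grep/awk: match on the raw text and count manually
grep -c '\[error\]' app.log

# DuckDB: read the raw lines, then regexp_extract() each field before filtering
# (the exact ingestion call depends on your DuckDB version, so it is omitted here)
```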
Run the comparison demo:

```bash
./examples/duckdb_comparison.sh
```

Run the complex analysis demo, which showcases multi-source analysis, security log inspection, and advanced SQL queries:

```bash
./examples/complex_analysis_demo.sh
```
Build from source:

```bash
cargo build --release
```

Basic usage:

```bash
lflog <log_file> [OPTIONS]
```
| Option | Description |
|---|---|
| `-c, --config <path>` | Config file (default: `~/.config/lflog/config.toml` or `LFLOG_CONFIG` env) |
| `-p, --profile <name>` | Use profile from config |
| `--pattern <regex>` | Inline pattern (overrides profile) |
| `-t, --table <name>` | Table name for SQL (default: `log`) |
| `-q, --query <sql>` | Execute SQL query (omit for interactive mode) |
| `-f, --add-file-path` | Add `__FILE__` column with source file path |
| `-r, --add-raw` | Add `__RAW__` column with raw log line |
| `-n, --num-threads <N>` | Number of threads (default: 8, or `LFLOGTHREADS` env) |
```bash
# Query with inline pattern
lflog loghub/Apache/Apache_2k.log \
  --pattern '^\[{{time:any}}\] \[{{level:var_name}}\] {{message:any}}$' \
  --query "SELECT * FROM log WHERE level = 'error' LIMIT 10"

# Query multiple files with glob pattern
lflog 'logs/*.log' \
  --pattern '{{ts:datetime}} [{{level:var_name}}] {{msg:any}}' \
  --query "SELECT * FROM log"

# Include file path and raw line in results
lflog 'logs/*.log' --pattern '...' \
  --add-file-path --add-raw \
  --query 'SELECT level, "__FILE__", "__RAW__" FROM log'

# Query with config profile
lflog /var/log/apache.log --profile apache --query "SELECT * FROM log LIMIT 5"

# Interactive REPL mode
lflog server.log --pattern '{{ts:datetime}} [{{level:var_name}}] {{msg:any}}'
> SELECT * FROM log WHERE level = 'error'
> SELECT level, COUNT(*) FROM log GROUP BY level
> .exit
```
lflog includes a comprehensive set of demos using the Loghub dataset collection. These demos showcase how to query 16 different types of system logs (Android, Apache, Hadoop, HDFS, Linux, Spark, etc.).
To run a demo:
```bash
# 1. Go to the demo scripts directory
cd examples/loghub_demos/scripts

# 2. Run the demo for a specific dataset (e.g., Apache)
./run_demo.sh apache
```
See examples/loghub_demos/README.md for the full list of available datasets and more details.
Create `~/.config/lflog/config.toml`:

```toml
# Global custom macros
[[custom_macros]]
name = "timestamp"
pattern = '\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
type_hint = "DateTime"

# Apache log profile
[[profiles]]
name = "apache"
description = "Apache error log format"
pattern = '^\[{{time:datetime("%a %b %d %H:%M:%S %Y")}}\] \[{{level:var_name}}\] {{message:any}}$'

# Nginx access log profile
[[profiles]]
name = "nginx"
pattern = '{{ip:ip}} - - \[{{time:any}}\] "{{method:var_name}} {{path:any}}" {{status:number}} {{bytes:number}}'
```
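Assuming a custom macro defined in the config can be referenced by name just like the built-ins (that is how the `timestamp` macro above is intended to be used), a pattern and a profile-based invocation might look like this:

```bash
# Reference the custom 'timestamp' macro from the config (assumed syntax)
lflog app.log \
  --pattern '{{ts:timestamp}} [{{level:var_name}}] {{msg:any}}' \
  --query "SELECT ts, level, msg FROM log LIMIT 5"

# Use the 'apache' profile defined above
lflog /var/log/apache.log --profile apache \
  --query "SELECT level, COUNT(*) FROM log GROUP BY level"
```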
| Macro | Description | Type |
|---|---|---|
| `{{field:number}}` | Integer (digits) | Int32 |
| `{{field:float}}` | Floating point number | Float64 |
| `{{field:string}}` | Non-greedy string | String |
| `{{field:any}}` | Non-greedy match all | String |
| `{{field:var_name}}` | Identifier (`[A-Za-z_][A-Za-z0-9_]*`) | String |
| `{{field:datetime("%fmt")}}` | Datetime with strftime format | String |
| `{{field:enum(a,b,c)}}` | One of the listed values | String |
| `{{field:uuid}}` | UUID format | String |
| `{{field:ip}}` | IPv4 address | String |
You can also use raw regex with named capture groups:
```text
^(?P<ip>\d+\.\d+\.\d+\.\d+) - (?P<method>\w+)
```
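To make the two styles concrete, here is a sketch of the same access-log line parsed first with macros and then with a roughly equivalent raw regex. The sample line and field names are illustrative, and it assumes raw regexes are passed through the same `--pattern` flag:

```bash
# Sample line:  192.168.0.1 - GET /index.html

# Macro pattern
lflog access.log \
  --pattern '{{ip:ip}} - {{method:var_name}} {{path:any}}' \
  --query "SELECT ip, method, path FROM log LIMIT 5"

# Roughly equivalent raw regex with named capture groups
lflog access.log \
  --pattern '^(?P<ip>\d+\.\d+\.\d+\.\d+) - (?P<method>\w+) (?P<path>.*)$' \
  --query "SELECT ip, method, path FROM log LIMIT 5"
```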
When enabled, lflog adds special metadata columns to your query results:
| Column | Flag | Description |
|---|---|---|
| `__FILE__` | `-f, --add-file-path` | Absolute path of the source log file |
| `__RAW__` | `-r, --add-raw` | The original, unparsed log line |
These are useful when querying multiple files or when you need to see the original log line alongside parsed fields:
```bash
# Find errors across all log files with their source
lflog 'logs/*.log' --pattern '...' --add-file-path \
  --query 'SELECT "__FILE__", level, message FROM log WHERE level = '\''error'\'''
```
Note: Use double quotes around `__FILE__` and `__RAW__` in SQL to preserve case.
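The library examples below also rely on tokio and anyhow. A minimal manifest might look like this (version numbers are illustrative; check crates.io for current ones):

```toml
[dependencies]
# Versions are illustrative, not pinned by the lflog docs
lflog = "0.1"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
anyhow = "1"
```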
```rust
use lflog::{LfLog, QueryOptions};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // With inline pattern
    let lflog = LfLog::new();
    lflog.register(
        QueryOptions::new("access.log")
            .with_pattern(r#"^\[{{time:any}}\] \[{{level:var_name}}\] {{message:any}}$"#)
    )?;

    lflog.query_and_show("SELECT * FROM log WHERE level = 'error'").await?;
    Ok(())
}
```
With glob patterns and metadata columns:
```rust
use lflog::{LfLog, QueryOptions};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let lflog = LfLog::new();

    // Query multiple files with metadata columns
    lflog.register(
        QueryOptions::new("logs/*.log") // Glob pattern
            .with_pattern(r#"^\[{{time:any}}\] \[{{level:var_name}}\] {{message:any}}$"#)
            .with_add_file_path(true)   // Add __FILE__ column
            .with_add_raw(true)         // Add __RAW__ column
            .with_num_threads(Some(4))  // Use 4 threads
    )?;

    lflog.query_and_show(r#"SELECT level, "__FILE__" FROM log WHERE level = 'error'"#).await?;
    Ok(())
}
```
Or with config profiles:
```rust
use lflog::{LfLog, QueryOptions};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let lflog = LfLog::from_config("~/.config/lflog/config.toml")?;
    lflog.register(
        QueryOptions::new("/var/log/apache.log")
            .with_profile("apache")
    )?;

    let df = lflog.query("SELECT level, COUNT(*) FROM log GROUP BY level").await?;
    df.show().await?;
    Ok(())
}
```
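If you need the results in memory rather than printed, the value returned by `query` is used like a DataFusion DataFrame in the example above, so collecting it into Arrow record batches should work. This is a sketch under that assumption, not a documented API guarantee:

```rust
use lflog::{LfLog, QueryOptions};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let lflog = LfLog::new();
    lflog.register(
        QueryOptions::new("server.log")
            .with_pattern(r#"{{ts:any}} [{{level:var_name}}] {{msg:any}}"#)
    )?;

    // Assumes `query` returns a DataFusion DataFrame, so `collect` yields Vec<RecordBatch>.
    let df = lflog.query("SELECT level, COUNT(*) AS n FROM log GROUP BY level").await?;
    let batches = df.collect().await?;
    for batch in &batches {
        println!("{} rows, {} columns", batch.num_rows(), batch.num_columns());
    }
    Ok(())
}
```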
```text
src/
├── lib.rs            # Public API
├── app.rs            # LfLog application struct
├── types.rs          # FieldType enum
├── scanner.rs        # Pattern matching
├── macros/           # Macro expansion
│   ├── parser.rs     # Config & macro parsing
│   └── expander.rs   # Macro to regex expansion
├── datafusion/       # DataFusion integration
│   ├── builder.rs
│   ├── provider.rs
│   └── exec.rs
└── bin/
    ├── lflog.rs      # Main CLI
    └── lf_run.rs     # Simple runner (deprecated)
```
lflog is designed for high performance, leveraging zero-copy parsing and DataFusion's vectorized execution engine.
Parsing an Apache error log (168MB, 2 million lines):
| Query | Time | Throughput |
|---|---|---|
| `SELECT count(*) FROM log WHERE level = 'error'` | ~450ms | ~370 MB/s (4.4M lines/s) |
| `SELECT count(*) FROM log WHERE message LIKE '%error%'` | ~450ms | ~370 MB/s |
Tested on Linux, single-threaded execution (default).
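To get a rough feel for throughput on your own machine, you can time a query against a large log yourself. The file path below is a placeholder; the flags are the documented ones:

```bash
# Time a count query over a large Apache-style error log (illustrative path)
time lflog big_apache_error.log \
  --pattern '^\[{{time:any}}\] \[{{level:var_name}}\] {{message:any}}$' \
  --query "SELECT count(*) FROM log WHERE level = 'error'"

# Try more threads to see how parsing scales on your hardware
time lflog big_apache_error.log \
  --pattern '^\[{{time:any}}\] \[{{level:var_name}}\] {{message:any}}$' \
  --num-threads 8 \
  --query "SELECT count(*) FROM log WHERE level = 'error'"
```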
Parsing can run on multiple threads (configurable with `-n/--num-threads` or the `LFLOGTHREADS` environment variable).

License: MIT