lflog

Query log files with SQL using DataFusion and regex pattern macros.

Features

  • 🔍 SQL Queries - Query log files using familiar SQL syntax via DataFusion
  • 🧩 Pattern Macros - Use intuitive macros like {{timestamp:datetime("%Y-%m-%d")}} instead of raw regex
  • 📊 Type Inference - Automatic schema generation with proper types (Int32, Float64, String)
  • ⚡ Fast - Leverages DataFusion's optimized query engine with parallel processing
  • 📁 Glob Patterns - Query multiple files at once with patterns like logs/*.log
  • 🏷️ Metadata Columns - Access file path (__FILE__) and raw log lines (__RAW__)
  • 📝 Config Profiles - Define reusable log profiles in TOML config files
  • 💻 Interactive REPL - Query logs interactively with command history

Why lflog?

Comparison: Count errors by log level

lflog:
lflog access.log \
  --pattern '^\[{{time:any}}\] \[{{level:var_name}}\] {{msg:any}}$' \
  --query "SELECT level, COUNT(*) FROM log GROUP BY level"

awk:
awk -F'[][]' '{print $4}' access.log | sort | uniq -c | sort -rn
# Or with proper parsing (the three-argument match requires gawk):
awk 'match($0, /\[[^\]]+\] \[([^\]]+)\]/, m) {count[m[1]]++} 
     END {for (l in count) print l, count[l]}' access.log

DuckDB:
SELECT 
    regexp_extract(line, '\[[^\]]+\] \[([^\]]+)\]', 1) as level,
    COUNT(*) as count
FROM read_csv('access.log', columns={'line': 'VARCHAR'}, 
              header=false, delim=E'\x1F')
GROUP BY level;

Key Advantages

| Feature | lflog | awk/grep | DuckDB |
|---|---|---|---|
| Pattern syntax | {{level:var_name}} | Raw regex | Raw regex |
| Named fields | ✅ Built-in | ❌ Manual indexing | regexp_extract() per field |
| SQL queries | ✅ Full SQL | ❌ Not available | ✅ Full SQL |
| Type inference | ✅ Automatic | ❌ All strings | ❌ Manual |
| Multi-file glob | 'logs/*.log' | ⚠️ Shell expansion | ✅ Supported |
| Source tracking | __FILE__ column | ❌ Manual | ❌ Manual |
| Aggregations | ✅ SQL GROUP BY | ⚠️ Complex piping | ✅ SQL GROUP BY |
| Joins | ✅ Supported | ❌ Not available | ✅ Supported |

Run the comparison demo: ./examples/duckdb_comparison.sh

Run the complex analysis demo: ./examples/complex_analysis_demo.sh (showcases multi-source analysis, security log inspection, and advanced SQL queries)

Installation

cargo build --release
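
Since lflog is published on crates.io and ships CLI binaries (see Project Structure below), it should presumably also be installable directly with Cargo:

cargo install lflog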

CLI Usage

lflog <log_file> [OPTIONS]

Options

| Option | Description |
|---|---|
| -c, --config <path> | Config file (default: ~/.config/lflog/config.toml, or the LFLOG_CONFIG env var) |
| -p, --profile <name> | Use profile from config |
| --pattern <regex> | Inline pattern (overrides profile) |
| -t, --table <name> | Table name for SQL (default: log) |
| -q, --query <sql> | Execute SQL query (omit for interactive mode) |
| -f, --add-file-path | Add __FILE__ column with source file path |
| -r, --add-raw | Add __RAW__ column with raw log line |
| -n, --num-threads <N> | Number of threads (default: 8, or the LFLOGTHREADS env var) |

Examples

# Query with inline pattern
lflog loghub/Apache/Apache_2k.log \
  --pattern '^\[{{time:any}}\] \[{{level:var_name}}\] {{message:any}}$' \
  --query "SELECT * FROM log WHERE level = 'error' LIMIT 10"

# Query multiple files with glob pattern
lflog 'logs/*.log' \
  --pattern '{{ts:datetime}} [{{level:var_name}}] {{msg:any}}' \
  --query "SELECT * FROM log"

# Include file path and raw line in results
lflog 'logs/*.log' --pattern '...' \
  --add-file-path --add-raw \
  --query 'SELECT level, "__FILE__", "__RAW__" FROM log'

# Query with config profile
lflog /var/log/apache.log --profile apache --query "SELECT * FROM log LIMIT 5"

# Interactive REPL mode
lflog server.log --pattern '{{ts:datetime}} [{{level:var_name}}] {{msg:any}}'
> SELECT * FROM log WHERE level = 'error'
> SELECT level, COUNT(*) FROM log GROUP BY level
> .exit

Demos (Loghub)

lflog includes a comprehensive set of demos using the Loghub dataset collection. These demos showcase how to query 16 different types of system logs (Android, Apache, Hadoop, HDFS, Linux, Spark, etc.).

To run a demo:

# 1. Go to the demo scripts directory
cd examples/loghub_demos/scripts

# 2. Run the demo for a specific dataset (e.g., Apache)
./run_demo.sh apache

See examples/loghub_demos/README.md for the full list of available datasets and more details.

Config File

Create ~/.config/lflog/config.toml:

# Global custom macros
[[custom_macros]]
name = "timestamp"
pattern = '\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
type_hint = "DateTime"

# Apache log profile
[[profiles]]
name = "apache"
description = "Apache error log format"
pattern = '^\[{{time:datetime("%a %b %d %H:%M:%S %Y")}}\] \[{{level:var_name}}\] {{message:any}}$'

# Nginx access log profile
[[profiles]]
name = "nginx"
pattern = '{{ip:ip}} - - \[{{time:any}}\] "{{method:var_name}} {{path:any}}" {{status:number}} {{bytes:number}}'
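
With these profiles in place, a run looks like this (the log path is illustrative):

# Parse an Nginx access log using the profile defined above
lflog /var/log/nginx/access.log --profile nginx \
  --query "SELECT status, COUNT(*) AS hits FROM log GROUP BY status"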

Pattern Macros

| Macro | Description | Type |
|---|---|---|
| {{field:number}} | Integer (digits) | Int32 |
| {{field:float}} | Floating-point number | Float64 |
| {{field:string}} | Non-greedy string | String |
| {{field:any}} | Non-greedy match-all | String |
| {{field:var_name}} | Identifier ([A-Za-z_][A-Za-z0-9_]*) | String |
| {{field:datetime("%fmt")}} | Datetime with strftime format | String |
| {{field:enum(a,b,c)}} | One of the listed values | String |
| {{field:uuid}} | UUID format | String |
| {{field:ip}} | IPv4 address | String |

You can also use raw regex with named capture groups:

^(?P<ip>\d+\.\d+\.\d+\.\d+) - (?P<method>\w+)
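
To make the correspondence concrete, here is a macro pattern next to a hand-written near-equivalent. The exact regex lflog generates is internal, so this is an approximation derived from the macro table above:

# Macro form
^\[{{time:any}}\] \[{{level:var_name}}\] {{msg:any}}$

# Approximate raw-regex equivalent
^\[(?P<time>.*?)\] \[(?P<level>[A-Za-z_][A-Za-z0-9_]*)\] (?P<msg>.*?)$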

Metadata Columns

When enabled, lflog adds special metadata columns to your query results:

| Column | Flag | Description |
|---|---|---|
| __FILE__ | -f, --add-file-path | Absolute path of the source log file |
| __RAW__ | -r, --add-raw | The original, unparsed log line |

These are useful when querying multiple files or when you need to see the original log line alongside parsed fields:

# Find errors across all log files with their source
lflog 'logs/*.log' --pattern '...' --add-file-path \
  --query 'SELECT "__FILE__", level, message FROM log WHERE level = '\''error'\'''

Note: Use double quotes around __FILE__ and __RAW__ in SQL to preserve case.

Library Usage

use lflog::{LfLog, QueryOptions};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // With inline pattern
    let lflog = LfLog::new();
    
    lflog.register(
        QueryOptions::new("access.log")
            .with_pattern(r#"^\[{{time:any}}\] \[{{level:var_name}}\] {{message:any}}$"#)
    )?;
    
    lflog.query_and_show("SELECT * FROM log WHERE level = 'error'").await?;
    Ok(())
}

With glob patterns and metadata columns:

use lflog::{LfLog, QueryOptions};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let lflog = LfLog::new();
    
    // Query multiple files with metadata columns
    lflog.register(
        QueryOptions::new("logs/*.log")  // Glob pattern
            .with_pattern(r#"^\[{{time:any}}\] \[{{level:var_name}}\] {{message:any}}$"#)
            .with_add_file_path(true)    // Add __FILE__ column
            .with_add_raw(true)          // Add __RAW__ column
            .with_num_threads(Some(4))   // Use 4 threads
    )?;
    
    lflog.query_and_show(r#"SELECT level, "__FILE__" FROM log WHERE level = 'error'"#).await?;
    Ok(())
}

Or with config profiles:

use lflog::{LfLog, QueryOptions};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let lflog = LfLog::from_config("~/.config/lflog/config.toml")?;
    
    lflog.register(
        QueryOptions::new("/var/log/apache.log")
            .with_profile("apache")
    )?;
    
    let df = lflog.query("SELECT level, COUNT(*) FROM log GROUP BY level").await?;
    df.show().await?;
    Ok(())
}
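
query() returns what appears to be a DataFusion DataFrame (df.show() above suggests as much), so results can presumably also be collected into Arrow RecordBatches for programmatic use. A minimal sketch under that assumption:

use lflog::LfLog;

// Assumes query() yields a DataFusion DataFrame, as in the example above.
async fn count_by_level(lflog: &LfLog) -> anyhow::Result<()> {
    let df = lflog.query("SELECT level, COUNT(*) AS n FROM log GROUP BY level").await?;
    // DataFusion DataFrames collect into a Vec<RecordBatch>.
    let batches = df.collect().await?;
    for batch in &batches {
        println!("{} rows x {} columns", batch.num_rows(), batch.num_columns());
    }
    Ok(())
}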

Project Structure

src/
├── lib.rs              # Public API
├── app.rs              # LfLog application struct
├── types.rs            # FieldType enum
├── scanner.rs          # Pattern matching
├── macros/             # Macro expansion
│   ├── parser.rs       # Config & macro parsing
│   └── expander.rs     # Macro to regex expansion
├── datafusion/         # DataFusion integration
│   ├── builder.rs
│   ├── provider.rs
│   └── exec.rs
└── bin/
    ├── lflog.rs        # Main CLI
    └── lf_run.rs       # Simple runner (deprecated)

Performance

lflog is designed for high performance, leveraging zero-copy parsing and DataFusion's vectorized execution engine.

Benchmarks

Parsing an Apache error log (168 MB, 2 million lines):

| Query | Time | Throughput |
|---|---|---|
| SELECT count(*) FROM log WHERE level = 'error' | ~450 ms | ~370 MB/s (4.4M lines/s) |
| SELECT count(*) FROM log WHERE message LIKE '%error%' | ~450 ms | ~370 MB/s |

Tested on Linux, single-threaded execution (default).

Optimizations

  • Zero-Copy Parsing: Parses log lines directly from memory-mapped files without intermediate String allocations.
  • Pre-calculated Regex Indices: Resolves capture group indices once at startup, avoiding repeated string lookups in the hot loop (see the sketch after this list).
  • Parallel Execution: Automatically partitions files for parallel processing (configurable via LFLOGTHREADS).
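
As a sketch of the pre-calculated-indices idea (illustrative only, not lflog's actual implementation): resolve each named capture group to its numeric index once at startup, then use index-based access in the per-line loop:

use regex::Regex;

fn main() {
    let re = Regex::new(r"^\[(?P<time>[^\]]+)\] \[(?P<level>[^\]]+)\] (?P<msg>.*)$").unwrap();

    // Resolve the group name to a numeric index once, outside the hot loop.
    let level_idx = re
        .capture_names()
        .position(|name| name == Some("level"))
        .expect("pattern defines a 'level' group");

    let lines = ["[Sun Dec 04 04:47:44 2005] [error] mod_jk child workerEnv in error state 6"];
    for line in lines {
        if let Some(caps) = re.captures(line) {
            // Index-based lookup: no per-line string comparison of group names.
            if let Some(level) = caps.get(level_idx) {
                println!("level = {}", level.as_str());
            }
        }
    }
}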

License

MIT
