Crates.io | lawkit |
lib.rs | lawkit |
version | 2.5.15 |
created_at | 2025-07-02 06:06:14.676639+00 |
updated_at | 2025-07-16 16:45:02.034558+00 |
description | Statistical law analysis CLI toolkit with international number support |
homepage | https://github.com/kako-jun/lawkit |
repository | https://github.com/kako-jun/lawkit |
id | 1734556 |
size | 318,530 |
Multi-law statistical analysis toolkit - Uncover hidden patterns and continuously detect anomalies automatically
English README | Japanese README (日本語版) | Chinese README (中文版)
Traditional tools analyze one pattern at a time. lawkit analyzes multiple statistical laws together to give you the complete picture. It automatically detects conflicts, runs faster with parallel processing, and provides clear insights.
Built for automation: JSON, CSV, and other structured outputs can be consumed directly by AI tools and automated workflows. Well suited to fraud detection, data quality checks, and business intelligence.
# Single law analysis - Benford Law fraud detection with visual charts
$ lawkit benf financial_data.csv
Benford Law Analysis Results
Dataset: financial_data.csv
Numbers analyzed: 2500
Risk Level: Low [LOW]
First Digit Distribution:
1: 35.2% (expected: 30.1%)
2: 14.8% (expected: 17.6%)
3: 10.3% (expected: 12.5%)
4: 12.1% (expected: 9.7%)
5: 5.2% (expected: 7.9%)
6: 11.7% (expected: 6.7%)
7: 6.8% (expected: 5.8%)
8: 2.9% (expected: 5.1%)
9: 1.0% (expected: 4.6%)
Statistical Tests:
Chi-square: 1.34 (p-value: 0.995)
Mean Absolute Deviation: 0.8%
# Pareto Analysis with Lorenz curve visualization
$ lawkit pareto sales_data.csv
Pareto Principle (80/20 Rule) Analysis Results
Dataset: sales_data.csv
Numbers analyzed: 1000
[LOW] Dataset analysis
Lorenz Curve (Cumulative Distribution):
10%: 5.2% cumulative
20%: 20.1% cumulative
30%: 35.4% cumulative
40%: 48.9% cumulative
50%: 61.7% cumulative
80/20 Rule: Top 20% owns 79.2% of total wealth (Ideal: 80.0%, Ratio: 0.99)
# Multi-law integration analysis
$ lawkit analyze --laws all data.csv
Statistical Laws Integration Analysis
Dataset: data.csv
Numbers Analyzed: 1000
Laws Executed: 5 (benf, pareto, zipf, normal, poisson)
Integration Metrics:
Overall Quality Score: 0.743
Consistency Score: 0.823
Conflicts Detected: 2
Recommendation Confidence: 0.892
Real benchmark results on AMD Ryzen 5 PRO 4650U:
# Traditional tools analyze one pattern at a time
$ other-tool data.csv # Single analysis: ~2.1s
$ lawkit benf data.csv # Same analysis: ~180ms (11.7x faster)
$ lawkit analyze data.csv # Multi-law analysis: ~850ms
graph TB
A[Input Data<br/>CSV, JSON, Excel, PDF...] --> B[Parse & Validate<br/>5 Language Support]
B --> C1[Benford Law<br/>Fraud Detection]
B --> C2[Pareto Analysis<br/>80/20 Rule]
B --> C3[Zipf Law<br/>Frequency Analysis]
B --> C4[Normal Distribution<br/>Quality Control]
B --> C5[Poisson Distribution<br/>Rare Events]
C1 --> D1[Statistical Scores]
C2 --> D2[Gini Coefficient]
C3 --> D3[Correlation Analysis]
C4 --> D4[Normality Tests]
C5 --> D5[Event Modeling]
D1 --> E[Integration Engine<br/>Conflict Detection]
D2 --> E
D3 --> E
D4 --> E
D5 --> E
E --> F1[Risk Assessment<br/>Critical/High/Medium/Low]
E --> F2[Smart Recommendations<br/>Primary/Secondary Laws]
E --> F3[Advanced Outliers<br/>LOF, Isolation Forest, DBSCAN]
E --> F4[Time Series Analysis<br/>Trends, Seasonality, Anomalies]
F1 --> G[Comprehensive Report<br/>lawkit/JSON/CSV/YAML/XML]
F2 --> G
F3 --> G
F4 --> G
graph LR
subgraph "Stage 1: Basic Analysis"
A[lawkit analyze<br/>Multi-law Integration] --> A1[Overall Quality Score<br/>Law Compatibility<br/>Initial Insights]
end
subgraph "Stage 2: Validation"
A1 --> B[lawkit validate<br/>Data Quality Checks]
B --> B1[Consistency Analysis<br/>Cross-validation<br/>Reliability Assessment]
end
subgraph "Stage 3: Deep Diagnosis"
B1 --> C[lawkit diagnose<br/>Conflict Detection]
C --> C1[Detailed Root Cause<br/>Resolution Strategies<br/>Risk Assessment]
end
style A stroke:#2196f3,stroke-width:2px
style B stroke:#9c27b0,stroke-width:2px
style C stroke:#ff9800,stroke-width:2px
analyze → validate → diagnose: Start with a broad overview, then check data quality, and finally investigate any specific problems.
lawkit looks at your data from multiple angles at once, then combines what it finds to give you clear insights and practical recommendations.
Benford Law: the first digit of naturally occurring numbers follows a specific distribution (1 appears ~30% of the time, 2 appears ~18%, and so on). Deviations often indicate data manipulation, which makes it useful for fraud detection.
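The expected shares quoted for Benford Law follow from the formula P(d) = log10(1 + 1/d). Below is a minimal Rust sketch of that formula and of tallying observed first digits. The function names are illustrative assumptions, not part of the lawkit-core API, and lawkit's own implementation may differ.

```rust
/// Expected Benford probability of leading digit `d` (1..=9): log10(1 + 1/d).
fn benford_expected(d: u32) -> f64 {
    (1.0 + 1.0 / d as f64).log10()
}

/// Leading decimal digit of a nonzero finite number, ignoring sign and scale.
fn first_digit(x: f64) -> Option<u32> {
    if x == 0.0 || !x.is_finite() {
        return None;
    }
    let mut v = x.abs();
    // Scale into the range [1, 10) so the integer part is the leading digit.
    while v >= 10.0 {
        v /= 10.0;
    }
    while v < 1.0 {
        v *= 10.0;
    }
    // Clamp guards against floating-point rounding at the range edges.
    Some((v as u32).clamp(1, 9))
}

/// Observed share of each leading digit; index 0 holds digit 1.
fn first_digit_distribution(data: &[f64]) -> [f64; 9] {
    let mut counts = [0usize; 9];
    let mut total = 0usize;
    for d in data.iter().filter_map(|&x| first_digit(x)) {
        counts[(d - 1) as usize] += 1;
        total += 1;
    }
    let mut shares = [0.0; 9];
    if total > 0 {
        for (s, &c) in shares.iter_mut().zip(counts.iter()) {
            *s = c as f64 / total as f64;
        }
    }
    shares
}
```

For example, `benford_expected(1)` is about 0.301 and `benford_expected(2)` about 0.176, which matches the expected column in the sample output.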
Pareto Principle: the famous "80/20 rule", where roughly 80% of effects come from 20% of causes. lawkit reports it with a Lorenz curve and the share of the total held by the top 20%.
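To make the 80/20 figure concrete, the sketch below computes the share of a total held by the largest fraction of values. It is a minimal illustration under stated assumptions: `top_share` is a hypothetical helper, not a lawkit-core function, and lawkit may round or select the top group differently.

```rust
/// Share of the total held by the largest `top_fraction` of values.
/// Returns None for empty input, a non-positive or NaN total,
/// or a fraction outside (0, 1].
fn top_share(values: &[f64], top_fraction: f64) -> Option<f64> {
    if values.is_empty() || !(top_fraction > 0.0 && top_fraction <= 1.0) {
        return None;
    }
    let mut sorted = values.to_vec();
    // Sort descending; total_cmp gives a total order even if NaN is present.
    sorted.sort_by(|a, b| b.total_cmp(a));
    let total: f64 = sorted.iter().sum();
    if total.is_nan() || total <= 0.0 {
        return None;
    }
    // Number of items in the top group: at least one, at most all of them.
    let k = ((sorted.len() as f64 * top_fraction).ceil() as usize)
        .max(1)
        .min(sorted.len());
    let top: f64 = sorted[..k].iter().sum();
    Some(top / total)
}
```

Calling `top_share(&sales, 0.2)` returns the fraction held by the top 20%. The Ratio figure in the sample output appears consistent with dividing that share by the ideal 0.8 (for example, 79.2 / 80.0 ≈ 0.99), though that reading is an inference from the output rather than documented behavior.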
Zipf Law: word frequencies follow a predictable pattern where the nth most common word appears about 1/n as often as the most common word. Useful for frequency analysis.
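The 1/n relationship can be sketched in a few lines of Rust. The first function gives the ideal share at a given rank, and the second estimates the exponent as the negated least-squares slope of ln(frequency) against ln(rank). Both names are hypothetical, and the fitting method is an assumption for illustration; the source does not state how lawkit computes its Zipf exponent.

```rust
/// Expected share at `rank` (1-based) under an ideal Zipf law with exponent 1.
fn zipf_expected(rank1_share: f64, rank: usize) -> f64 {
    rank1_share / rank as f64
}

/// Estimated Zipf exponent from frequencies sorted in descending order.
/// Fits ln(freq) = a - s * ln(rank) by least squares and returns s.
fn zipf_exponent(freqs_desc: &[f64]) -> Option<f64> {
    // Collect (ln rank, ln frequency) pairs, skipping non-positive frequencies.
    let mut pts: Vec<(f64, f64)> = Vec::new();
    for (i, &f) in freqs_desc.iter().enumerate() {
        if f > 0.0 {
            pts.push((((i + 1) as f64).ln(), f.ln()));
        }
    }
    if pts.len() < 2 {
        return None;
    }
    let n = pts.len() as f64;
    let mean_x = pts.iter().map(|p| p.0).sum::<f64>() / n;
    let mean_y = pts.iter().map(|p| p.1).sum::<f64>() / n;
    let sxy: f64 = pts.iter().map(|p| (p.0 - mean_x) * (p.1 - mean_y)).sum();
    let sxx: f64 = pts.iter().map(|p| (p.0 - mean_x).powi(2)).sum();
    Some(-sxy / sxx)
}
```

With a top-ranked share of 1.74%, `zipf_expected(1.74, 2)` gives 0.87, the expected value shown for rank 2 in the sample Zipf output.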
Normal Distribution: the bell-curve distribution that appears throughout nature and human behavior. Used for quality control and normality testing.
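For normally distributed data, about 68.3%, 95.4%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean. The sketch below computes the mean, the sample standard deviation, and the observed share within k standard deviations. The function names are illustrative, and using the n - 1 denominator is an assumption; lawkit's own choice is not stated in the source.

```rust
/// Sample mean and standard deviation (n - 1 denominator).
fn mean_std(data: &[f64]) -> Option<(f64, f64)> {
    if data.len() < 2 {
        return None;
    }
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    let var = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    Some((mean, var.sqrt()))
}

/// Fraction of values lying within `k` standard deviations of the mean.
fn within_sigma(data: &[f64], k: f64) -> Option<f64> {
    let (mean, sd) = mean_std(data)?;
    let inside = data
        .iter()
        .filter(|&&x| (x - mean).abs() <= k * sd)
        .count();
    Some(inside as f64 / data.len() as f64)
}
```

Comparing `within_sigma(&data, 1.0)`, `within_sigma(&data, 2.0)`, and `within_sigma(&data, 3.0)` against the theoretical shares gives a quick check of how bell-shaped a dataset is.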
Poisson Distribution: models the probability of rare events occurring in fixed intervals of time or space. Used for rare-event modeling.
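The Poisson probability of observing exactly k events is P(X = k) = e^(-λ) λ^k / k!. Here is a minimal Rust sketch of that formula; the function names are illustrative and not taken from lawkit-core.

```rust
/// P(X = k) for a Poisson distribution with rate `lambda`:
/// e^(-lambda) * lambda^k / k!.
fn poisson_pmf(lambda: f64, k: u32) -> f64 {
    // Build the term iteratively to avoid computing large factorials directly.
    let mut p = (-lambda).exp();
    for i in 1..=k {
        p *= lambda / i as f64;
    }
    p
}

/// P(X >= k), computed as one minus the sum of the lower terms.
fn poisson_at_least(lambda: f64, k: u32) -> f64 {
    1.0 - (0..k).map(|i| poisson_pmf(lambda, i)).sum::<f64>()
}
```

With λ = 2.27, `poisson_pmf(2.27, 0)` is approximately 0.103 and `poisson_at_least(2.27, 2)` approximately 0.662, in line with the key probabilities in the sample Poisson output.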
lawkit outputs results in multiple formats for different use cases: its own lawkit format, JSON, CSV, YAML, and XML.
# From crates.io (recommended)
cargo install lawkit
# From releases
wget https://github.com/kako-jun/lawkit/releases/latest/download/lawkit-linux-x86_64.tar.gz
tar -xzf lawkit-linux-x86_64.tar.gz
# In your Cargo.toml
[dependencies]
lawkit-core = "2.1"
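use lawkit_core::common::input::parse_text_input;
use lawkit_core::laws::benford::analyze_benford;
// Assumes the crate's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let numbers = parse_text_input("123 456 789")?;
    let result = analyze_benford(&numbers, "data.txt", false)?;
    println!("Chi-square: {}", result.chi_square);
    Ok(())
}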
# Node.js integration
npm install lawkit-js
# Python integration
pip install lawkit-python # CLI binary automatically included
# Benford Law - Fraud detection with digit distribution chart
$ lawkit benf financial_data.csv
First Digit Distribution:
1: 13.6% (expected: 30.1%)
2: 14.6% (expected: 17.6%)
3: 14.6% (expected: 12.5%)
4: 13.6% (expected: 9.7%)
5: 12.6% (expected: 7.9%)
6: 13.6% (expected: 6.7%)
7: 7.8% (expected: 5.8%)
8: 4.9% (expected: 5.1%)
9: 4.9% (expected: 4.6%)
# Pareto Analysis - 80/20 Rule with Lorenz curve visualization
$ lawkit pareto sales_data.csv
Lorenz Curve (Cumulative Distribution):
8%: 59.7% cumulative
17%: 85.3% cumulative (80/20 point)
27%: 94.8% cumulative
35%: 98.2% cumulative
46%: 99.3% cumulative
80/20 Rule: Top 20% owns 90.0% of total wealth (Ideal: 80.0%, Ratio: 1.13)
# Normal Distribution - Quality control with histogram
$ lawkit normal measurements.csv
Distribution Histogram:
97.73- 98.26: 2.7%
98.26- 98.79: 11.5%
98.79- 99.32: 34.0%
99.32- 99.85: 69.8%
99.85-100.39: 100.0%
Distribution: μ=100.39, σ=0.89, Range: [97.73, 103.04]
1σ: 60.0%, 2σ: 98.0%, 3σ: 100.0%
# Zipf Law - Rank-frequency distribution with power law analysis
$ lawkit zipf word_frequencies.csv
Rank-Frequency Distribution:
# 1: 1.74% (expected: 1.74%)
# 2: 1.22% (expected: 0.87%)
# 3: 1.04% (expected: 0.58%)
# 4: 0.87% (expected: 0.43%)
# 5: 0.87% (expected: 0.35%)
# 6: 0.70% (expected: 0.29%)
# 7: 0.70% (expected: 0.25%)
# 8: 0.70% (expected: 0.22%)
# 9: 0.70% (expected: 0.19%)
#10: 0.70% (expected: 0.17%)
Zipf Exponent: 0.142 (ideal: 1.0), Correlation: 0.950
# Poisson Distribution - Rare events with probability chart
$ lawkit poisson event_counts.csv
Probability Distribution:
P(X= 0): 0.103
P(X= 1): 0.234
P(X= 2): 0.266
P(X= 3): 0.201
P(X= 4): 0.114
Key Probabilities: P(X=0)=0.103, P(X=1)=0.234, P(X≥2)=0.662
λ=2.27, Variance/Mean=0.774 (ideal: 1.0), Fit Score=0.682
We recommend the analyze → validate → diagnose approach for thorough data analysis:
# Stage 1: Basic multi-law analysis
$ lawkit analyze --laws all data.csv
Statistical Laws Integration Analysis
Dataset: data.csv
Numbers analyzed: 1000
Laws executed: 5 (benford, pareto, zipf, normal, poisson)
Integration Metrics:
Overall Quality: 0.743
Consistency: 0.823
Conflicts Detected: 2
Recommendation Confidence: 0.892
Law Results:
Benford Law: 0.652
Pareto Principle: 0.845
Zipf Law: 0.423
Normal Distribution: 0.912
Poisson Distribution: 0.634
Conflicts:
[CONFLICT] Benford Law score 0.652 significantly deviates from expected 0.500 - deviation 30.4%
Likely Cause: Different distribution assumptions
Suggestion: Focus on Zipf analysis for frequency data
Risk Assessment: [MEDIUM]
# Stage 2: Data validation with consistency checks
$ lawkit validate --laws benf,pareto,normal transactions.csv --consistency-check
Data Validation and Consistency Analysis
Dataset: transactions.csv
Numbers analyzed: 2500
Laws validated: 3 (benford, pareto, normal)
Validation Results:
Data Quality Score: 0.891
Cross-validation Consistency: 0.943
Statistical Reliability: HIGH
Individual Law Validation:
[PASS] Benford Law validation (Score: 0.834, p-value: 0.023)
[PASS] Pareto Principle validation (Gini: 0.78, Alpha: 2.12)
[WARNING] Normal Distribution validation (Shapiro-Wilk: 0.032)
Consistency Analysis:
Benford-Pareto Agreement: 0.912 (HIGH)
Benford-Normal Agreement: 0.643 (MEDIUM)
Pareto-Normal Agreement: 0.587 (MEDIUM)
Data Quality Assessment: RELIABLE (Validation Score: 0.891)
# Stage 3: Deep conflict analysis and recommendations
$ lawkit diagnose --laws all suspicious_data.csv --report detailed
Detailed Conflict Detection and Diagnostic Report
Dataset: suspicious_data.csv
Numbers analyzed: 1500
Laws analyzed: 5 (benford, pareto, zipf, normal, poisson)
[CONFLICT] 3 Critical Issues Detected
Critical Conflict #1: Score Deviation
Laws: Benford Law vs Normal Distribution
Conflict Score: 0.847 (HIGH)
Description: Benford Law and Normal Distribution show significantly different
evaluations (difference: 0.623) with structural differences in:
confidence_level ("high" → "low"), score_category ("good" → "poor")
Root Cause: Benford Law indicates potential data manipulation while Normal
suggests legitimate natural distribution pattern
Resolution: Investigate data source integrity; consider temporal analysis
to identify manipulation periods
Critical Conflict #2: Distribution Mismatch
Laws: Pareto Principle vs Poisson Distribution
Conflict Score: 0.793 (HIGH)
Description: Power law distribution conflicts with discrete event modeling
Root Cause: Data contains mixed patterns (continuous wealth distribution
and discrete event counts)
Resolution: Segment data by type before analysis; apply Pareto Principle to amounts,
Poisson Distribution to frequencies
Critical Conflict #3: Methodological Conflict
Laws: Zipf Law vs Normal Distribution
Conflict Score: 0.651 (MEDIUM)
Description: Frequency-based analysis conflicts with continuous distribution
Root Cause: Dataset may contain both textual frequency data and numerical measurements
Resolution: Separate frequency analysis from statistical distribution testing
Risk Assessment: [CRITICAL] (Multiple fundamental conflicts detected)
Recommendation: Manual data review required before automated decision-making
# Generate test data
lawkit generate pareto --samples 1000 > test_data.txt
lawkit generate normal --mean 100 --stddev 15 --samples 500
# Built-in time series analysis
lawkit normal monthly_sales.csv --enable-timeseries --timeseries-window 12
# Returns: trend analysis, seasonality detection, changepoints, forecasts
# Advanced filtering and analysis
lawkit analyze --laws all --filter ">=1000" financial_data.xlsx
lawkit benf sales_data.csv --format xml
# Pipeline usage
cat raw_numbers.txt | lawkit benf -
lawkit generate zipf --samples 10000 | lawkit analyze --laws all -
# Meta-chaining with diffx for time series analysis
lawkit benf sales_2023.csv > analysis_2023.txt
lawkit benf sales_2024.csv > analysis_2024.txt
diffx analysis_2023.txt analysis_2024.txt # Detect changes in statistical patterns
# Continuous monitoring pipeline
for month in {01..12}; do
lawkit analyze --laws all sales_2024_${month}.csv > analysis_${month}.txt
done
diffx analysis_*.txt --chain # Visualize pattern evolution over time
Meta-chaining combines lawkit's built-in time series analysis with diffx for long-term pattern tracking:
graph LR
A[Jan Data] -->|lawkit| B[Jan Analysis]
C[Feb Data] -->|lawkit| D[Feb Analysis]
E[Mar Data] -->|lawkit| F[Mar Analysis]
B -->|diffx| G[Period Differences<br/>Jan→Feb]
D -->|diffx| G
D -->|diffx| H[Period Differences<br/>Feb→Mar]
F -->|diffx| H
G -->|long-term trend| I[Pattern<br/>Evolution]
H -->|long-term trend| I
style I stroke:#0288d1,stroke-width:3px
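Built-in Time Series Analysis (single dataset): trend analysis, seasonality detection, changepoints, and forecasts, enabled with --enable-timeseries.
Meta-chaining with diffx (multiple time periods): compare saved analyses from different periods to detect changes in statistical patterns and follow how they evolve over time.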
For comprehensive guides, examples, and API documentation:
User Guide - Installation, usage, and examples
CLI Reference - Complete command documentation
Statistical Laws Guide - Detailed analysis examples
Performance Guide - Optimization and large datasets
International Support - Multi-language number parsing
We welcome contributions! Please see our Contributing Guide for details.
This project is licensed under the MIT License - see the LICENSE for details.