| Crates.io | greeners |
| lib.rs | greeners |
| version | 1.3.2 |
| created_at | 2025-12-24 21:24:21.211973+00 |
| updated_at | 2026-01-01 04:20:43.689582+00 |
| description | High-performance econometrics with R/Python formulas. Two-Way Clustering, Marginal Effects (AME/MEM), HC1-4, IV Predictions, Categorical C(var), Polynomial I(x^2), Interactions, Diagnostics. OLS, IV/2SLS, DiD, Logit/Probit, Panel (FE/RE), Time Series (VAR/VECM), Quantile! |
| homepage | |
| repository | https://github.com/sheep-farm/Greeners |
| max_upload_size | |
| id | 2003869 |
| size | 1,549,325 |
Greeners is a lightning-fast, type-safe econometrics library written in pure Rust. It provides a comprehensive suite of estimators for Cross-Sectional, Time-Series, and Panel Data analysis, leveraging linear algebra backends (LAPACK/BLAS) for maximum performance.
Designed for academic research, heavy simulations, and production-grade economic modeling.
Greeners v1.3.2 adds flexible statistical inference, allowing users to choose between Student's t-distribution and Normal (z) distribution for hypothesis testing - bringing statsmodels-compatible inference while maintaining exact finite-sample theory!
Two inference modes are available: StudentT (default, exact finite-sample) and Normal (asymptotic, statsmodels-compatible), selected via the with_inference() method on all linear models:
use greeners::{OLS, CovarianceType, InferenceType};
// Default: Student's t (exact finite-sample)
let result = OLS::fit(&y, &x, CovarianceType::HC1)?;
println!("{}", result); // Shows "t | P>|t|"
// Switch to Normal/z (large-sample asymptotics, like statsmodels)
let result_z = result.with_inference(InferenceType::Normal)?;
println!("{}", result_z); // Shows "z | P>|z|"
When to use each: Student's t for exact finite-sample inference (the default, preferred for small samples); Normal (z) for large-sample asymptotics or to match statsmodels output.
Greeners v1.3.1 enhances the automatic type detection system with smart Int vs Float distinction, DateTime support, revolutionary Binary Boolean Detection, and adds Stata-like automatic collinearity detection across ALL models!
- Binary Boolean Detection: any column with exactly two unique values (['casado', 'solteiro'], ['M', 'F'], etc.)
- Int vs Float distinction: 1 is kept apart from 1.0 and 1.5
- Recognized boolean variants: 1/0, yes/no, true/false, t/f

Why Binary Boolean Detection is Revolutionary: Unlike pandas, R, polars, or Stata, which require manual conversion of binary variables (married/single, male/female, treated/control), Greeners automatically recognizes and converts them - saving you from repetitive data preprocessing and potential errors!
Greeners v1.3.0 brings pandas-like DataFrame capabilities and essential time series operations for econometric analysis - all while maintaining 100% backward compatibility with v1.0.2!
Store free-form text data alongside numerical columns:
use greeners::DataFrame;
let customers = DataFrame::builder()
.add_int("id", vec![1, 2, 3])
.add_string("name", vec![
"Alice Johnson".to_string(),
"Bob Smith".to_string(),
"Charlie Brown".to_string(),
])
.add_string("email", vec![
"alice@example.com".to_string(),
"bob@example.com".to_string(),
"charlie@example.com".to_string(),
])
.add_column("purchase_amount", vec![150.0, 200.0, 75.0])
.build()?;
// Access string data
let names = customers.get_string("name")?;
println!("First customer: {}", names[0]); // "Alice Johnson"
String vs Categorical:
See examples/string_features.rs for a comprehensive demonstration.
Complete toolkit for handling missing values - just like pandas!
use greeners::DataFrame;
// Detect missing values
let mask = df.isna("temperature")?; // Boolean mask
let n_missing = df.count_na("temperature"); // Count
// Remove missing data
let clean = df.dropna()?; // Drop any row with NaN
let clean_subset = df.dropna_subset(&["price", "quantity"])?; // Drop if specific cols missing
// Fill missing values
let filled = df.fillna("price", 100.0)?; // Fill with constant
let forward = df.fillna_ffill("price")?; // Forward fill (carry last valid)
let backward = df.fillna_bfill("price")?; // Backward fill (carry next valid)
let smooth = df.interpolate("temperature")?; // Linear interpolation
Comprehensive workflow:
- isna(), notna(), count_na() for investigation
- dropna() for complete-case analysis
- fillna(), ffill(), bfill(), interpolate() for treatment

See examples/missing_data_features.rs for the complete workflow.
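The fill strategies above can be sketched in plain Rust. These are hypothetical helpers operating on a NaN-coded slice, not the crate's implementation:

```rust
/// Forward fill: carry the last valid (non-NaN) value forward.
fn ffill(x: &[f64]) -> Vec<f64> {
    let mut last = f64::NAN;
    x.iter()
        .map(|&v| {
            if !v.is_nan() {
                last = v;
            }
            last
        })
        .collect()
}

/// Linear interpolation between the nearest valid neighbours;
/// leading/trailing NaNs are left untouched.
fn interpolate(x: &[f64]) -> Vec<f64> {
    let mut out = x.to_vec();
    let valid: Vec<usize> = (0..x.len()).filter(|&i| !x[i].is_nan()).collect();
    for w in valid.windows(2) {
        let (a, b) = (w[0], w[1]);
        let step = (x[b] - x[a]) / (b - a) as f64;
        for i in (a + 1)..b {
            out[i] = x[a] + step * (i - a) as f64;
        }
    }
    out
}
```

Backward fill is the mirror image of ffill; dropna simply filters out rows containing any NaN.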
Essential operations for econometric time series analysis:
use greeners::DataFrame;
// Stock price data
let df = DataFrame::builder()
.add_column("date", vec![1.0, 2.0, 3.0, 4.0, 5.0])
.add_column("price", vec![100.0, 102.0, 101.0, 105.0, 103.0])
.build()?;
// Lag operator - create lagged variables
let with_lag = df.lag("price", 1)?; // Previous day's price -> price_lag_1
// Essential for AR models: y_t = β₀ + β₁·y_{t-1} + ε_t
// Lead operator - forward-looking variables
let with_lead = df.lead("price", 1)?; // Next day's price -> price_lead_1
// Essential for lead-lag analysis and Granger causality
// First differences - achieve stationarity
let stationary = df.diff("price", 1)?; // Δprice_t = price_t - price_{t-1} -> price_diff_1
// Essential for unit root tests and I(1) processes
// Percentage changes - returns calculation
let returns = df.pct_change("price", 1)?; // (price_t - price_{t-1}) / price_{t-1} -> price_pct_1
// Standard in finance for asset returns
// Chain operations for complete analysis
let analysis = df
.lag("price", 1)?
.diff("price", 1)?
.pct_change("price", 1)?;
// Creates: price_lag_1, price_diff_1, price_pct_1
Use cases:
- Finance: asset returns (pct_change), momentum strategies
- Macroeconomics: AR models (lag), stationarity testing (diff), GDP growth

Mathematical relationships:
- lag(x, n)[t] = x[t-n]
- lead(x, n)[t] = x[t+n]
- diff(x, n)[t] = x[t] - x[t-n]
- pct_change(x, n)[t] = (x[t] - x[t-n]) / x[t-n]

See examples/time_series_features.rs for 11 practical examples.
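These relationships can be sketched directly in plain Rust. The helpers below are illustrative, not the crate's internals; undefined positions are padded with NaN:

```rust
/// lag(x, n)[t] = x[t-n]
fn lag(x: &[f64], n: usize) -> Vec<f64> {
    (0..x.len())
        .map(|t| if t >= n { x[t - n] } else { f64::NAN })
        .collect()
}

/// lead(x, n)[t] = x[t+n]
fn lead(x: &[f64], n: usize) -> Vec<f64> {
    (0..x.len())
        .map(|t| if t + n < x.len() { x[t + n] } else { f64::NAN })
        .collect()
}

/// diff(x, n)[t] = x[t] - x[t-n]
fn diff(x: &[f64], n: usize) -> Vec<f64> {
    (0..x.len())
        .map(|t| if t >= n { x[t] - x[t - n] } else { f64::NAN })
        .collect()
}

/// pct_change(x, n)[t] = (x[t] - x[t-n]) / x[t-n]
fn pct_change(x: &[f64], n: usize) -> Vec<f64> {
    (0..x.len())
        .map(|t| if t >= n { (x[t] - x[t - n]) / x[t - n] } else { f64::NAN })
        .collect()
}
```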
Before v1.3.0, lags and differences required manual shift() logic and hand-rolled calculations. Now v1.3.0 provides them as built-in, chainable DataFrame operations.
100% backward compatible - zero breaking changes!
All v1.0.2 code works unchanged. New capabilities are purely additive:
// Your existing v1.0.2 code
let df = DataFrame::from_csv("data.csv")?;
let formula = Formula::parse("y ~ x1 + x2")?;
let result = OLS::from_formula(&formula, &df, CovarianceType::HC3)?;
// ✅ Still works perfectly!
// New v1.3.0 capabilities (additive)
let df_with_strings = df.add_string("region", regions)?; // NEW
let clean_df = df.dropna()?; // NEW
let with_lags = df.lag("y", 1)?; // NEW
Greeners v1.0.2 brings human-readable variable names in regression output and flexible data loading from multiple sources!
Load data from CSV, JSON, URLs, or use the Builder pattern - just like pandas/polars!
// 1. CSV from URL (reproducible research!)
let df = DataFrame::from_csv_url(
"https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv"
)?;
// 2. JSON from local file (column or record oriented)
let df = DataFrame::from_json("data.json")?;
// 3. JSON from URL (API integration)
let df = DataFrame::from_json_url("https://api.example.com/data.json")?;
// 4. Builder pattern (most convenient!)
let df = DataFrame::builder()
.add_column("wage", vec![30000.0, 40000.0, 50000.0])
.add_column("education", vec![12.0, 16.0, 18.0])
.build()?;
// 5. CSV from local file (classic)
let df = DataFrame::from_csv("data.csv")?;
Why this matters:
See examples/dataframe_loading.rs for all loading methods.
No more generic x0, x1, x2 in regression output! All models now display actual variable names from your Formula:
use greeners::{OLS, DataFrame, Formula, CovarianceType};
let formula = Formula::parse("wage ~ education + experience + female")?;
let result = OLS::from_formula(&formula, &df, CovarianceType::HC3)?;
println!("{}", result);
Before (v1.0.1):
OLS Regression Results
====================================
Variable Coef Std Err t P>|t|
const 5.23 0.45 11.62 0.000
x0 2.15 0.12 17.92 0.000 <- Generic names
x1 0.08 0.02 4.00 0.000
x2 -1.20 0.25 -4.80 0.000
Now (v1.0.2):
OLS Regression Results
====================================
Variable Coef Std Err t P>|t|
const 5.23 0.45 11.62 0.000
education 2.15 0.12 17.92 0.000 <- Actual variable names!
experience 0.08 0.02 4.00 0.000
female -1.20 0.25 -4.80 0.000
Applies to ALL models that accept a Formula (OLS, WLS, DiD, IV/2SLS, Logit/Probit, Quantile, Panel).
v1.3.0 includes 102 unit tests covering all major functionality:
Run tests locally:
cargo test # Run all 102 tests
cargo test --lib # Library tests only
cargo test dataframe # DataFrame-specific tests
Internal code-quality improvements:
- Replaced .iter().cloned().collect() with .to_vec() for better performance
- Used .contains() instead of manual comparisons

Greeners reaches production stability with comprehensive specification tests for diagnosing regression assumptions!
Diagnose violations of classical regression assumptions and identify appropriate remedies:
use greeners::{OLS, SpecificationTests, Formula, DataFrame, CovarianceType};
// Estimate model
let model = OLS::from_formula(&Formula::parse("wage ~ education + experience")?, &df, CovarianceType::NonRobust)?;
let (y, x) = df.to_design_matrix(&formula)?;
let residuals = model.residuals(&y, &x);
let fitted = model.fitted_values(&x);
// 1. White Test for Heteroskedasticity
let (lm_stat, p_value, df) = SpecificationTests::white_test(&residuals, &x)?;
if p_value < 0.05 {
println!("Heteroskedasticity detected! Use CovarianceType::HC3");
}
// 2. RESET Test for Functional Form Misspecification
let (f_stat, p_value, _, _) = SpecificationTests::reset_test(&y, &x, &fitted, 3)?;
if p_value < 0.05 {
println!("Misspecification detected! Add polynomials or interactions");
}
// 3. Breusch-Godfrey Test for Autocorrelation
let (lm_stat, p_value, df) = SpecificationTests::breusch_godfrey_test(&residuals, &x, 1)?;
if p_value < 0.05 {
println!("Autocorrelation detected! Use CovarianceType::NeweyWest(4)");
}
// 4. Goldfeld-Quandt Test for Heteroskedasticity
let (f_stat, p_value, _, _) = SpecificationTests::goldfeld_quandt_test(&residuals, 0.25)?;
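As a sketch of what the Goldfeld-Quandt statistic computes (a hypothetical helper, not the crate's API): residuals are assumed already ordered by the variable suspected of driving the variance, a middle fraction is dropped, and the ratio of the two tail sums of squared residuals is compared to an F distribution:

```rust
/// Goldfeld-Quandt F ratio (illustrative). `resid_sorted` must already be
/// ordered by the suspected variance-driving variable.
fn goldfeld_quandt_f(resid_sorted: &[f64], drop_frac: f64) -> f64 {
    let n = resid_sorted.len();
    let n_drop = (n as f64 * drop_frac) as usize; // middle observations dropped
    let n_side = (n - n_drop) / 2;
    let ssr = |s: &[f64]| s.iter().map(|e| e * e).sum::<f64>();
    let ssr_low = ssr(&resid_sorted[..n_side]);
    let ssr_high = ssr(&resid_sorted[n - n_side..]);
    // Under homoskedasticity this ratio follows an F distribution
    ssr_high / ssr_low
}
```

A ratio far above 1 suggests the error variance grows with the sorting variable.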
When to Use:
Remedies:
- Heteroskedasticity: CovarianceType::HC3 or HC4
- Autocorrelation: CovarianceType::NeweyWest(lags)
- Misspecification: add I(x^2) terms or x1*x2 interactions

Stata/R/Python Equivalents:
- Stata: estat hettest, estat ovtest, estat bgodfrey
- R: lmtest::bptest(), lmtest::resettest(), lmtest::bgtest()
- Python: statsmodels.stats.diagnostic.het_white()

See examples/specification_tests.rs for a comprehensive demonstration.
Greeners now supports R/Python-style formula syntax (like statsmodels and lm()), making model specification intuitive and concise:
use greeners::{OLS, DataFrame, Formula, CovarianceType};
// Python equivalent: smf.ols('y ~ x1 + x2', data=df).fit(cov_type='HC1')
let formula = Formula::parse("y ~ x1 + x2")?;
let result = OLS::from_formula(&formula, &df, CovarianceType::HC1)?;
All estimators support formulas: OLS, WLS, DiD, IV/2SLS, Logit/Probit, Quantile Regression, Panel Data (FE/RE/Between), and more!
See FORMULA_API.md for complete documentation and examples.
Greeners now provides comprehensive tools for panel data model selection and information criteria-based model comparison - essential for rigorous empirical research!
Compare multiple models using AIC/BIC with automatic ranking and Akaike weights for model averaging:
use greeners::{OLS, ModelSelection, DataFrame, Formula, CovarianceType};
// Estimate competing models
let model1 = OLS::from_formula(&Formula::parse("y ~ x1 + x2 + x3")?, &df, CovarianceType::NonRobust)?;
let model2 = OLS::from_formula(&Formula::parse("y ~ x1 + x2")?, &df, CovarianceType::NonRobust)?;
let model3 = OLS::from_formula(&Formula::parse("y ~ x1")?, &df, CovarianceType::NonRobust)?;
// Compare models
let models = vec![
("Full Model", model1.log_likelihood, 4, n_obs),
("Restricted", model2.log_likelihood, 3, n_obs),
("Simple", model3.log_likelihood, 2, n_obs),
];
let comparison = ModelSelection::compare_models(models);
ModelSelection::print_comparison(&comparison);
// Calculate Akaike weights for model averaging
let aic_values: Vec<f64> = comparison.iter().map(|(_, aic, _, _, _)| *aic).collect();
let (delta_aic, weights) = ModelSelection::akaike_weights(&aic_values);
Output:
=============================== Model Comparison ===============================
Model | AIC | BIC | Rank(AIC) | Rank(BIC)
--------------------------------------------------------------------------------
Full Model | 183.83 | 191.48 | 1 | 1
Restricted | 184.77 | 190.50 | 2 | 2
Simple | 188.19 | 192.01 | 3 | 3
AKAIKE WEIGHTS:
Δ_AIC < 2: Substantial support
Δ_AIC 4-7: Considerably less support
Δ_AIC > 10: Essentially no support
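The AIC and Akaike-weight formulas behind this table can be sketched directly. These are the textbook formulas; the helper names are illustrative, not the crate's API:

```rust
/// AIC = 2k - 2 ln L
fn aic(log_likelihood: f64, k: usize) -> f64 {
    2.0 * k as f64 - 2.0 * log_likelihood
}

/// Akaike weights: Δ_i = AIC_i - min AIC,
/// w_i = exp(-Δ_i / 2) / Σ_j exp(-Δ_j / 2)
fn akaike_weights(aics: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let min = aics.iter().copied().fold(f64::INFINITY, f64::min);
    let delta: Vec<f64> = aics.iter().map(|a| a - min).collect();
    let unnorm: Vec<f64> = delta.iter().map(|d| (-d / 2.0).exp()).collect();
    let total: f64 = unnorm.iter().sum();
    let weights: Vec<f64> = unnorm.iter().map(|u| u / total).collect();
    (delta, weights)
}
```

The weights sum to 1 and can be read as the relative support for each model, which is what makes them useful for model averaging.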
Test whether pooled OLS is appropriate or if panel data methods (Fixed/Random Effects) are needed:
use greeners::{PanelDiagnostics, OLS, Formula};
// Estimate pooled OLS
let model_pooled = OLS::from_formula(&formula, &df, CovarianceType::NonRobust)?;
let (y, x) = df.to_design_matrix(&formula)?;
let residuals = model_pooled.residuals(&y, &x);
// Test for random effects
let (lm_stat, p_value) = PanelDiagnostics::breusch_pagan_lm(&residuals, &firm_ids)?;
// Interpretation:
// H₀: σ²_u = 0 (no panel effects, pooled OLS adequate)
// H₁: σ²_u > 0 (random effects needed)
// If p < 0.05 -> use Random Effects or Fixed Effects
// Test if firm fixed effects are significant
let (f_stat, p_value) = PanelDiagnostics::f_test_fixed_effects(
ssr_pooled,
ssr_fe,
n_obs,
n_firms,
k_params,
)?;
// Interpretation:
// H₀: All firm effects are zero (pooled OLS adequate)
// H₁: Firm effects exist (use fixed effects)
// If p < 0.05 -> use Fixed Effects model
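For a balanced panel, the Breusch-Pagan LM statistic reduces to a short textbook formula, sketched here in plain Rust (illustrative helper, not the crate's internals); under H₀ it is asymptotically χ²(1):

```rust
/// LM = N·T / (2·(T-1)) · ( Σ_i (Σ_t e_it)² / Σ_it e_it² - 1 )²
/// Balanced panel assumed: every entity has T = n_obs / n_entities periods.
fn breusch_pagan_lm(resid: &[f64], entity_ids: &[usize], n_entities: usize) -> f64 {
    let t = resid.len() / n_entities; // periods per entity
    let mut entity_sums = vec![0.0; n_entities];
    for (&e, &id) in resid.iter().zip(entity_ids) {
        entity_sums[id] += e; // Σ_t e_it per entity
    }
    let num: f64 = entity_sums.iter().map(|s| s * s).sum();
    let den: f64 = resid.iter().map(|e| e * e).sum();
    let ratio = num / den - 1.0;
    (n_entities * t) as f64 / (2.0 * (t - 1) as f64) * ratio * ratio
}
```

Intuition: if residuals within an entity are correlated (panel effects), the per-entity sums are large relative to the total sum of squares, pushing the ratio away from zero.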
Quick descriptive statistics for initial data exploration:
use greeners::SummaryStats;
let stats = SummaryStats::describe(&data);
// Returns: (mean, std, min, Q25, median, Q75, max, n_obs)
// Pretty-print summary table
let summary_data = vec![
("investment", stats_inv),
("profit", stats_profit),
("cash_flow", stats_cf),
];
SummaryStats::print_summary(&summary_data);
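A minimal sketch of what describe() computes, restricted to mean, sample standard deviation, min, median, and max (quartile conventions vary, so Q25/Q75 are omitted; illustrative helper, not the crate's code):

```rust
/// Returns (mean, sample std, min, median, max).
fn describe(data: &[f64]) -> (f64, f64, f64, f64, f64) {
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    // Sample variance uses the n-1 (Bessel) correction
    let var = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = if sorted.len() % 2 == 1 {
        sorted[sorted.len() / 2]
    } else {
        (sorted[sorted.len() / 2 - 1] + sorted[sorted.len() / 2]) / 2.0
    };
    (mean, var.sqrt(), sorted[0], median, sorted[sorted.len() - 1])
}
```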
Stata/R/Python Equivalents:
- Stata: estat ic (AIC/BIC), xttest0 (BP LM), testparm (F-test)
- R: AIC(), BIC(), plm::plmtest(), plm::pFtest()
- Python: statsmodels information criteria, linearmodels.panel diagnostics

See examples/panel_model_selection.rs for a comprehensive demonstration with a panel data workflow.
After estimating Logit/Probit models, coefficients alone are hard to interpret (they're on log-odds/z-score scale). Marginal effects translate these to probability changes - essential for policy analysis and substantive interpretation!
use greeners::{Logit, Formula, DataFrame};
// Estimate Logit model
let formula = Formula::parse("admitted ~ gpa + sat + legacy")?;
let result = Logit::from_formula(&formula, &df)?;
// Get design matrix
let (_, x) = df.to_design_matrix(&formula)?;
// Calculate Average Marginal Effects (AME)
let ame = result.average_marginal_effects(&x)?;
// Interpretation: AME[gpa] = 0.15 means:
// "A 1-point increase in GPA increases admission probability by 15 percentage points"
// (averaged across all students in the sample)
Why AME?
// Calculate Marginal Effects at Means (MEM)
let mem = result.marginal_effects_at_means(&x)?;
// Interpretation: Effect for "average" student
// ⚠️ Less robust than AME - can evaluate at impossible values (e.g., the average of dummies)
// Predict admission probabilities for new students
let probs = result.predict_proba(&x_new);
// Example: probs[0] = 0.85 -> 85% chance of admission
// Both models give similar marginal effects
let logit_result = Logit::from_formula(&formula, &df)?;
let probit_result = Probit::from_formula(&formula, &df)?;
let ame_logit = logit_result.average_marginal_effects(&x)?;
let ame_probit = probit_result.average_marginal_effects(&x)?;
// Typically: ame_logit ≈ ame_probit (differences < 1-2 percentage points)
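For continuous regressors, the logit AME has a closed form, AME_j = β_j · mean_i[Λ(x_i'β)(1-Λ(x_i'β))], sketched here in plain Rust (illustrative, not the crate's implementation):

```rust
/// Logit AME for continuous regressors.
/// Λ(z) = 1 / (1 + exp(-z)); λ(z) = Λ(z)·(1 - Λ(z)) is the logistic density.
fn logit_ame(x: &[Vec<f64>], beta: &[f64]) -> Vec<f64> {
    let n = x.len() as f64;
    let mean_density: f64 = x
        .iter()
        .map(|row| {
            let z: f64 = row.iter().zip(beta).map(|(xi, b)| xi * b).sum();
            let p = 1.0 / (1.0 + (-z).exp());
            p * (1.0 - p) // λ(x_i'β)
        })
        .sum::<f64>()
        / n;
    // Each coefficient is scaled by the same average density
    beta.iter().map(|b| b * mean_density).collect()
}
```

Binary (dummy) regressors are usually handled differently, as the discrete change in predicted probability between the two values; the sketch above covers only the continuous case.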
Stata/R/Python Equivalents:
- Stata: margins, dydx(*) (AME) or margins, dydx(*) atmeans (MEM)
- R: mfx::logitmfx() or margins::margins()
- Python: statsmodels.discrete.discrete_model.Logit(...).get_margeff()

See examples/marginal_effects.rs for a comprehensive demonstration with college admission data.
For panel data with clustering along two dimensions (e.g., firms Γ time):
// Panel data: 4 firms Γ 6 time periods
let firm_ids = vec![0,0,0,0,0,0, 1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3];
let time_ids = vec![0,1,2,3,4,5, 0,1,2,3,4,5, 0,1,2,3,4,5, 0,1,2,3,4,5];
// Two-way clustering (Cameron-Gelbach-Miller, 2011)
let result = OLS::from_formula(
&formula,
&df,
CovarianceType::ClusteredTwoWay(firm_ids, time_ids)
)?;
// Formula: V = V_firm + V_time - V_intersection
// Accounts for BOTH within-firm AND within-time correlation
When to use: panel data where errors are correlated both within firms over time and across firms within a period.
Stata equivalent: reghdfe y x, vce(cluster firm_id time_id)
See examples/two_way_clustering.rs for a complete comparison of non-robust vs one-way vs two-way clustering.
Automatic dummy variable creation with R/Python syntax:
// Categorical variable: creates dummies, drops first level
let formula = Formula::parse("sales ~ advertising + C(region)")?;
let result = OLS::from_formula(&formula, &df, CovarianceType::HC3)?;
// If region has values [0, 1, 2, 3] -> creates 3 dummies (drops 0 as reference)
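The dummy expansion C(var) performs can be sketched in plain Rust (hypothetical helper, not the crate's internals):

```rust
/// Expand a categorical column into k-1 dummy columns:
/// sorted unique levels, first level dropped as the reference category.
fn dummy_expand(values: &[i64]) -> Vec<Vec<f64>> {
    let mut levels: Vec<i64> = values.to_vec();
    levels.sort();
    levels.dedup();
    levels[1..] // drop the first level (reference)
        .iter()
        .map(|&lvl| {
            values
                .iter()
                .map(|&v| if v == lvl { 1.0 } else { 0.0 })
                .collect()
        })
        .collect()
}
```

Dropping one level avoids the dummy-variable trap (perfect collinearity with the intercept).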
How it works:
- C(var) detects the unique values in the variable and creates one dummy per level, dropping the first as the reference category.

Non-linear relationships made easy:
// Quadratic model: captures diminishing returns
let formula = Formula::parse("output ~ input + I(input^2)")?;
// Cubic model: more flexible
let formula = Formula::parse("y ~ x + I(x^2) + I(x^3)")?;
// Alternative syntax (Python-style)
let formula = Formula::parse("y ~ x + I(x**2)")?;
Use cases: capturing diminishing returns and other non-linear relationships.
Combine with interactions:
// Region-specific quadratic effects
let formula = Formula::parse("sales ~ C(region) * I(advertising^2)")?;
Critical for panel data and hierarchical structures where observations are grouped:
// Panel data: firms over time
let cluster_ids = vec![0,0,0, 1,1,1, 2,2,2]; // Firm IDs
let result = OLS::from_formula(&formula, &df, CovarianceType::Clustered(cluster_ids))?;
Use clustered SE when observations are correlated within groups - e.g., repeated observations per firm, school, or region in panel or hierarchical data.
New diagnostic tools for model validation:
use greeners::Diagnostics;
// Multicollinearity detection
let vif = Diagnostics::vif(&x)?; // Variance Inflation Factor
let cond_num = Diagnostics::condition_number(&x)?; // Condition Number
// Influential observations
let leverage = Diagnostics::leverage(&x)?; // Hat values
let cooks_d = Diagnostics::cooks_distance(&residuals, &x, mse)?; // Cook's Distance
// Assumption testing (already available)
let (jb_stat, jb_p) = Diagnostics::jarque_bera(&residuals)?; // Normality
let (bp_stat, bp_p) = Diagnostics::breusch_pagan(&residuals, &x)?; // Heteroskedasticity
let dw_stat = Diagnostics::durbin_watson(&residuals); // Autocorrelation
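For the two-regressor case the VIF has a simple closed form, VIF = 1/(1-r²), sketched here (illustrative; the crate's Diagnostics::vif is assumed to handle the general k-regressor case via auxiliary regressions):

```rust
/// VIF for two predictors: R_j² is just the squared correlation r²,
/// so VIF = 1 / (1 - r²) for both columns.
fn vif_two_regressors(x1: &[f64], x2: &[f64]) -> f64 {
    let n = x1.len() as f64;
    let m1 = x1.iter().sum::<f64>() / n;
    let m2 = x2.iter().sum::<f64>() / n;
    let cov: f64 = x1.iter().zip(x2).map(|(a, b)| (a - m1) * (b - m2)).sum();
    let v1: f64 = x1.iter().map(|a| (a - m1).powi(2)).sum();
    let v2: f64 = x2.iter().map(|b| (b - m2).powi(2)).sum();
    let r2 = (cov * cov) / (v1 * v2); // squared correlation
    1.0 / (1.0 - r2)
}
```

VIF = 1 means no collinearity; values above 10 are a common (rule-of-thumb) warning threshold, and perfect collinearity sends the VIF to infinity.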
Model interaction effects with R/Python syntax:
// Full interaction: x1 * x2 expands to x1 + x2 + x1:x2
let formula = Formula::parse("wage ~ education * female")?;
let result = OLS::from_formula(&formula, &df, CovarianceType::HC3)?;
// Interaction only: just the product term
let formula2 = Formula::parse("wage ~ education + female + education:female")?;
Use cases: group-specific slopes (e.g., returns to education differing by gender) and treatment-effect heterogeneity.
// HC2: Leverage-adjusted (more efficient with small samples)
let result_hc2 = OLS::from_formula(&formula, &df, CovarianceType::HC2)?;
// HC3: Jackknife (most robust - RECOMMENDED for small samples)
let result_hc3 = OLS::from_formula(&formula, &df, CovarianceType::HC3)?;
Comparison:
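The difference between the HC variants comes down to how each squared residual is weighted before entering the sandwich estimator. The textbook weights can be sketched as (illustrative helper, not the crate's internals):

```rust
/// Squared-residual weights for the HC family, given residual e,
/// leverage (hat value) h, n observations, and k parameters:
///   HC1: e² · n / (n - k)      (degrees-of-freedom correction)
///   HC2: e² / (1 - h)          (leverage adjustment)
///   HC3: e² / (1 - h)²         (jackknife-style adjustment)
fn hc_weights(e: f64, h: f64, n: usize, k: usize) -> (f64, f64, f64) {
    let e2 = e * e;
    (
        e2 * n as f64 / (n - k) as f64,
        e2 / (1.0 - h),
        e2 / (1.0 - h).powi(2),
    )
}
```

HC3's (1-h)² denominator inflates high-leverage observations the most, which is why it is the recommended choice for small samples.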
// Out-of-sample predictions
let x_new = Array2::from_shape_vec((3, 2), vec![1.0, 12.0, 1.0, 16.0, 1.0, 20.0])?;
let predictions = result.predict(&x_new);
// In-sample fitted values
let fitted = result.fitted_values(&x);
// Residuals
let resid = result.residuals(&y, &x);
sudo apt-get update
sudo apt-get install gfortran libopenblas-dev liblapack-dev pkg-config build-essential
sudo dnf install gcc-gfortran openblas-devel lapack-devel pkg-config
sudo pacman -S gcc-fortran openblas lapack base-devel
brew install openblas lapack
Greeners automatically detects column types when loading data from CSV or JSON - including a unique feature not found in pandas, R, polars, or Stata!
Greeners is the only econometrics library that automatically detects any column with exactly 2 unique values as Boolean, regardless of the actual values:
// CSV with binary variables in ANY language:
// id,estado_civil,sexo,aprovado,status
// 1,casado,M,sim,ativo
// 2,solteiro,F,não,inativo
// 3,casado,M,sim,ativo
let df = DataFrame::from_csv("survey.csv")?;
// ✨ ALL binary columns automatically detected as Bool!
let civil = df.get_bool("estado_civil")?; // ['casado', 'solteiro'] -> Bool ✓
let gender = df.get_bool("sexo")?; // ['M', 'F'] -> Bool ✓
let approved = df.get_bool("aprovado")?; // ['sim', 'não'] -> Bool ✓
let status = df.get_bool("status")?; // ['ativo', 'inativo'] -> Bool ✓
// Mapping is alphabetical: first -> false, second -> true
// 'casado' -> false, 'solteiro' -> true
// 'F' -> false, 'M' -> true
// 'não' -> false, 'sim' -> true
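The alphabetical mapping can be sketched in plain Rust (hypothetical helper, not the crate's detection code):

```rust
/// If a column has exactly two unique values, map the alphabetically
/// first to false and the second to true; otherwise report "not binary".
fn binary_to_bool(values: &[&str]) -> Option<Vec<bool>> {
    let mut uniques: Vec<&str> = values.to_vec();
    uniques.sort();
    uniques.dedup();
    if uniques.len() != 2 {
        return None; // not a binary column
    }
    Some(values.iter().map(|v| *v == uniques[1]).collect())
}
```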
Why this matters for econometrics:
How other tools handle this:
| Tool | Binary Detection | Example |
|---|---|---|
| pandas | ❌ Manual conversion required | df['civil'] = df['civil'].map({'casado': 0, 'solteiro': 1}) |
| R | ❌ Manual conversion required | df$civil <- as.numeric(df$civil == 'solteiro') |
| polars | ❌ Manual conversion required | df.with_columns(pl.col('civil').cast(pl.Boolean)) # Fails! |
| Stata | ❌ Manual encoding required | encode estado_civil, gen(civil_dummy) |
| Greeners | ✅ AUTOMATIC! | df.get_bool("estado_civil")? # Just works! |
See examples/test_binary_bool_detection.rs for a comprehensive demonstration.
Detection rules:
- true/false, yes/no, t/f, 1/0 -> Bool
- Any two-value column (['casado', 'solteiro'], ['M', 'F'], etc.) -> Bool
- 1, 42, -10 -> Int (i64)
- 1.5, 3.14, 1.0 -> Float (f64), or Int if no fractional part
- 2024-01-15 10:30:00, 2024-01-15T10:30:00 -> DateTime
- Repeated string values -> Categorical
- High-uniqueness strings -> String

// CSV with mixed types
// id,name,created_at,amount,active,region
// 1,Alice,2024-01-15 10:30:00,100.50,true,North
// 2,Bob,2024-01-16 14:45:00,200.75,false,South
let df = DataFrame::from_csv("data.csv")?;
// Automatic type detection:
let id = df.get_int("id")?; // Int (not Float!)
let name = df.get_string("name")?; // String (high uniqueness)
let timestamp = df.get_datetime("created_at")?; // DateTime
let amount = df.get("amount")?; // Float (has decimals)
let active = df.get_bool("active")?; // Bool
let region = df.get_categorical("region")?; // Categorical (repeated values)
// These CSV values:
// pure_int: 1, 2, 3 -> Int (parsed as integers)
// as_float: 1.0, 2.0 -> Int (no fractional part)
// decimal: 1.5, 2.7 -> Float (has fractional part)
Supported DateTime formats:
- YYYY-MM-DD HH:MM:SS (e.g., 2024-01-15 10:30:00)
- YYYY-MM-DDTHH:MM:SS (ISO-8601, e.g., 2024-01-15T10:30:00)
- YYYY-MM-DD HH:MM:SS.fff (with milliseconds)
- YYYY-MM-DDTHH:MM:SS.fff

See examples/test_improved_type_detection.rs for comprehensive examples.
Add this to your Cargo.toml:
[dependencies]
greeners = "1.3.2"
ndarray = "0.17"
# Note: You must have a BLAS/LAPACK provider installed on your system
ndarray-linalg = { version = "0.18", features = ["openblas-system"] }
Greeners provides flexible data loading similar to pandas/polars - from local files, URLs, or manual construction:
use greeners::{DataFrame, Formula, OLS, CovarianceType};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Load data from CSV file with headers (just like pandas!)
let df = DataFrame::from_csv("data.csv")?;
// Specify model using formula
let formula = Formula::parse("y ~ x1 + x2")?;
// Estimate with robust standard errors
let result = OLS::from_formula(&formula, &df, CovarianceType::HC1)?;
println!("{}", result);
Ok(())
}
// Load data directly from GitHub or any URL
let df = DataFrame::from_csv_url(
"https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv"
)?;
// Perfect for reproducible research and shared datasets!
// Column-oriented JSON (like pandas.to_json(orient='columns'))
// { "x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0] }
let df = DataFrame::from_json("data_columns.json")?;
// Or record-oriented JSON (like pandas.to_json(orient='records'))
// [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}]
let df = DataFrame::from_json("data_records.json")?;
// Load JSON directly from APIs or URLs
let df = DataFrame::from_json_url("https://api.example.com/data.json")?;
// Most convenient for manual data construction
let df = DataFrame::builder()
.add_column("wage", vec![30000.0, 40000.0, 50000.0])
.add_column("education", vec![12.0, 16.0, 18.0])
.add_column("experience", vec![5.0, 7.0, 10.0])
.build()?;
let formula = Formula::parse("wage ~ education + experience")?;
let result = OLS::from_formula(&formula, &df, CovarianceType::HC3)?;
Supported formats:
See examples/dataframe_loading.rs for a comprehensive demonstration of all loading methods.
use greeners::{OLS, DataFrame, Formula, CovarianceType};
use ndarray::Array1;
use std::collections::HashMap;
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Create data manually (like a pandas DataFrame)
let mut data = HashMap::new();
data.insert("y".to_string(), Array1::from(vec![1.0, 2.0, 3.0, 4.0, 5.0]));
data.insert("x1".to_string(), Array1::from(vec![1.0, 2.0, 3.0, 4.0, 5.0]));
data.insert("x2".to_string(), Array1::from(vec![2.0, 2.5, 3.0, 3.5, 4.0]));
let df = DataFrame::new(data)?;
// Specify model using formula (just like Python/R!)
let formula = Formula::parse("y ~ x1 + x2")?;
// Estimate with robust standard errors
let result = OLS::from_formula(&formula, &df, CovarianceType::HC1)?;
println!("{}", result);
Ok(())
}
use greeners::{OLS, CovarianceType};
use ndarray::{Array1, Array2};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let y = Array1::from(vec![1.0, 2.0, 3.0, 4.0, 5.0]);
let x = Array2::from_shape_vec((5, 2), vec![
1.0, 2.0,
2.0, 2.5,
3.0, 3.0,
4.0, 3.5,
5.0, 4.0,
])?;
let result = OLS::fit(&y, &x, CovarianceType::HC1)?;
println!("{}", result);
Ok(())
}
use greeners::{DiffInDiff, DataFrame, Formula, CovarianceType};
// Python: smf.ols('outcome ~ treated + post + treated:post', data=df).fit(cov_type='HC1')
let formula = Formula::parse("outcome ~ treated + post")?;
let result = DiffInDiff::from_formula(&formula, &df, "treated", "post", CovarianceType::HC1)?;
use greeners::{IV, Formula, CovarianceType};
// Endogenous equation: y ~ x1 + x_endog
// Instruments: z1, z2
let endog_formula = Formula::parse("y ~ x1 + x_endog")?;
let instrument_formula = Formula::parse("~ z1 + z2")?;
let result = IV::from_formula(&endog_formula, &instrument_formula, &df, CovarianceType::HC1)?;
use greeners::{Logit, Probit, Formula};
// Binary choice models
let formula = Formula::parse("binary_outcome ~ x1 + x2 + x3")?;
let logit_result = Logit::from_formula(&formula, &df)?;
let probit_result = Probit::from_formula(&formula, &df)?;
use greeners::{FixedEffects, Formula};
let formula = Formula::parse("y ~ x1 + x2")?;
let result = FixedEffects::from_formula(&formula, &df, &entity_ids)?;
use greeners::{QuantileReg, Formula};
// Median regression
let formula = Formula::parse("y ~ x1 + x2")?;
let result = QuantileReg::from_formula(&formula, &df, 0.5, 200)?;
- y ~ x1 + x2 + x3 (with intercept)
- No intercept: y ~ x1 + x2 - 1 or y ~ 0 + x1 + x2
- Intercept only: y ~ 1

All formulas follow R/Python syntax for familiarity and ease of use.
- string_features.rs - String column support (NEW v1.3.0!)
- missing_data_features.rs - Missing data toolkit (NEW v1.3.0!)
- time_series_features.rs - Time series operations: lag, lead, diff, pct_change (NEW v1.3.0!)
- dataframe_loading.rs - Load data from CSV, JSON, URLs, or Builder pattern
- csv_formula_example.rs - Load CSV files and run regressions
- formula_example.rs - General formula API demonstration
- did_formula_example.rs - Difference-in-Differences with formulas
- quickstart_formula.rs - Quick start example
- marginal_effects.rs - Logit/Probit marginal effects (AME/MEM)
- specification_tests.rs - White, RESET, Breusch-Godfrey, Goldfeld-Quandt tests
- panel_model_selection.rs - Panel diagnostics and model comparison

Run examples:
# NEW v1.3.0 examples
cargo run --example string_features # String columns
cargo run --example missing_data_features # Missing data handling
cargo run --example time_series_features # Time series operations
# Other examples
cargo run --example dataframe_loading
cargo run --example csv_formula_example
cargo run --example formula_example
cargo run --example marginal_effects
cargo run --example specification_tests