| Crates.io | fiasto |
| lib.rs | fiasto |
| version | 0.2.7 |
| created_at | 2025-09-04 02:07:09.176522+00 |
| updated_at | 2025-09-10 01:32:56.307933+00 |
| description | High-performance modern Wilkinson's formula parsing for statistical models. Parses R-style formulas into structured JSON metadata supporting linear models, mixed effects, and complex statistical specifications. |
| homepage | https://github.com/alexhallam/fiasto |
| repository | https://github.com/alexhallam/fiasto |
| max_upload_size | |
| id | 1823530 |
| size | 418,077 |
Pronounced like fiasco, but with a t instead of a c.
(F)ormulas (I)n (AST) (O)ut
A language-agnostic, modern Wilkinson's formula parser and lexer.
This library is in testing and actively changing.
Formula parsing and materialization are normally done in a single library.
Python, for example, has patsy/formulaic/formulae, which all do both parsing and materialization.
R's model.matrix also handles formula parsing and design matrix creation.
There is nothing wrong with this coupling, but I wanted to try decoupling parsing from materialization.
I thought this would allow a focused library that could be used across multiple languages and dataframe libraries.
This package has a clear purpose: parse and/or lex formulas and return structured JSON metadata.
Note: technically, an AST is not returned. A simplified, structured intermediate representation (IR) in the form of JSON is returned. This JSON IR ought to be easy for many language bindings to use.
The library exposes a clean, focused API:
parse_formula() - Takes a Wilkinson's formula string and returns structured JSON metadata
lex_formula() - Tokenizes a formula string and returns JSON describing each token
"Only two functions?! What kind of library is this?!"
An easy-to-maintain library with a small surface area. The best kind.
The parser returns a variable-centric JSON structure where each variable is described with its roles, transformations, interactions, and random effects. This makes it easy to understand the complete model structure and generate appropriate design matrices. wayne is a Python package that can take this JSON and generate design matrices for use in statistical modeling.
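A minimal usage sketch of the two functions, assuming both take a &str and return a Result whose success value prints as JSON (the real signatures may differ; see the crate docs):

```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A formula with an interaction, a smooth term, and a random intercept.
    let formula = "y ~ x1*x2 + s(z) + (1|g)";

    // Assumed signature: parse_formula(&str) -> Result<_, _> with JSON metadata.
    let parsed = fiasto::parse_formula(formula)?;
    println!("parsed metadata:\n{parsed}");

    // Assumed signature: lex_formula(&str) -> Result<_, _> describing each token.
    let tokens = fiasto::lex_formula(formula)?;
    println!("tokens:\n{tokens}");

    Ok(())
}
```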
- y ~ 1 and y ~ 0 formulas with proper metadata generation
- bind(y1, y2) ~ x formulas with multiple response variables
- Uses &str where possible
- Formula Validation: Check if formulas are valid against datasets before expensive computation (see the sketch after this list)
- Cross-Platform Model Specs: Define models once, implement in multiple statistical frameworks
- Intercept-Only Models: Support for null models like y ~ 1 and y ~ 0 for baseline comparisons
- Multivariate Models: Support for multiple response variables like bind(y1, y2) ~ x for joint modeling
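The formula-validation use case can live entirely downstream of the JSON: collect the variable names the parser reports and check them against the dataset's columns before any design matrix is built. A minimal sketch, assuming an illustrative JSON layout with a top-level "variables" object keyed by variable name (not the crate's documented schema):

```rust
use std::collections::HashSet;
use serde_json::Value;

/// Return the formula variables that are missing from the dataset's columns.
/// Assumes the parsed metadata exposes a top-level "variables" object keyed
/// by variable name; adjust the lookup to match fiasto's actual schema.
fn missing_variables(parsed: &Value, columns: &HashSet<&str>) -> Vec<String> {
    parsed["variables"]
        .as_object()
        .map(|vars| {
            vars.keys()
                .filter(|name| !columns.contains(name.as_str()))
                .cloned()
                .collect()
        })
        .unwrap_or_default()
}

fn main() {
    let columns: HashSet<&str> = ["y", "x1", "g"].into_iter().collect();
    // Stand-in for the JSON returned for a formula like y ~ x1 + x2 + (1|g).
    let parsed: Value = serde_json::json!({
        "variables": { "y": {}, "x1": {}, "x2": {}, "g": {} }
    });
    // x2 is not in the dataset, so it is reported before any expensive work.
    println!("missing: {:?}", missing_variables(&parsed, &columns));
}
```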
I can't think of every kind of formula that could be parsed. I do have a checklist to start with.
To my knowledge, the brms formula syntax is the most complex and possibly the most complete.
I would like to start with it as a baseline and then extend as needed.
I also offer a clean_name for each parameter. This allows a materializer to use a simpler name for the parameter.
Polynomials, for example, result in names like x1_poly_1 or x1_poly_2 as opposed to [s]^2. I keep clean_names in snake case.
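A materializer only needs the clean names, so a small helper that walks the parsed JSON and collects them is enough. A minimal sketch, assuming an illustrative layout with a "terms" array carrying a "clean_name" string on each entry (field names here are assumptions, not the documented schema):

```rust
use serde_json::{json, Value};

/// Collect the clean_name of every term in the parsed metadata.
/// The "terms"/"clean_name" layout is illustrative, not fiasto's documented schema.
fn clean_names(parsed: &Value) -> Vec<String> {
    parsed["terms"]
        .as_array()
        .map(|terms| {
            terms
                .iter()
                .filter_map(|t| t["clean_name"].as_str().map(str::to_owned))
                .collect()
        })
        .unwrap_or_default()
}

fn main() {
    // Stand-in for parsed metadata of a formula with a degree-2 polynomial on x1.
    let parsed = json!({
        "terms": [
            { "clean_name": "x1_poly_1" },
            { "clean_name": "x1_poly_2" }
        ]
    });
    assert_eq!(clean_names(&parsed), vec!["x1_poly_1", "x1_poly_2"]);
}
```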
y ~ 1 -> y ~ 1 (null model with intercept)
y ~ 0 -> y ~ 0 (null model without intercept)
bind(y1, y2) ~ x -> bind(y1, y2) ~ x (multivariate response model)
y ~ x1*x2 + s(z) + (1+x1|1) + (1|g2) - 1 -> y ~ x1 * x2 + s(z) + (1 + x1 | 1) + (1 | g2) - 1
sigma:y ~ x1*x2 + s(z) + (1+x1|1) + (1|g2), sigma ~ x1 + (1|g2) -> y ~ x1 * x2 + s(z) + (1 + x1 | 1) + (1 | g2) and sigma ~ x1 + (1 | g2)
bf(y ~ a1 - a2^x, a1 + a2 ~ 1, nl = TRUE) ->
y ~ a1 - a2^x
a1 ~ 1
a2 ~ 1
bf(y ~ a1 - a2^x, a1 ~ 1, a2 ~ x + (x|g), nl = TRUE) ->
y ~ a1 - a2^x
a1 ~ 1
a2 ~ x + (x | g)
bf(y ~ a1 - a2^x, a1 ~ 1 + (1 |2| g), a2 ~ x + (x |2| g), nl = TRUE) ->
y ~ a1 - a2^x
a1 ~ 1 + (1 | 2 | g)
a2 ~ x + (x | 2 | g)
bf(y ~ a1 - a2^x, a1 ~ 1 + (1 | gr(g, id = 2)), a2 ~ x + (x | gr(g, id = 2)), nl = TRUE) ->
y ~ a1 - a2^x
a1 ~ 1 + (1 | gr(g, id = 2))
a2 ~ x + (x | gr(g, id = 2))
mvbind(y1, y2) ~ x * z + (1|g) ->
y1 ~ x * z + (1 | g)
y2 ~ x * z + (1 | g)
bf(y ~ x * z + (1+x|ID1|g), zi ~ x + (1|ID1|g)) ->
y ~ x * z + (1 + x | ID1 | g)
zi ~ x + (1 | ID1 | g)
bf(y ~ mo(x) + more_predictors) ->
y ~ mo(x) + more_predictors
bf(y ~ cs(x) + more_predictors) ->
y ~ cs(x) + more_predictors
bf(y ~ cs(x) + (cs(1)|g)) ->
y ~ cs(x) + (cs(1) | g)
bf(y ~ person + item, disc ~ item) ->
y ~ person + item
disc ~ item
bf(y ~ me(x, sdx)) ->
y ~ me(x, sdx)
bf(rt | dec(decision) ~ x, bs ~ x, ndt ~ x, bias ~ x) ->
rt | dec(decision) ~ x
bs ~ x
ndt ~ x
bias ~ x
bf(rt | dec(decision) ~ x, bias = 0.5) ->
rt | dec(decision) ~ x
bias = 0.5
mix <- mixture(gaussian, gaussian)
bf(y ~ 1, mu1 ~ x, mu2 ~ z, family = mix) ->
y ~ 1
mu1 ~ x
mu2 ~ z
bf(y ~ x, sigma2 = "sigma1", family = mix) ->
y ~ x
sigma2 = sigma1
bf(y ~ 1) + nlf(sigma ~ a * exp(b * x), a ~ x) + lf(b ~ z + (1|g), dpar = "sigma") + gaussian() ->
y ~ 1
sigma ~ a * exp(b * x)
a ~ x
b ~ z + (1 | g)
bf(y1 ~ x + (1|g)) + gaussian() + cor_ar(~1|g) + bf(y2 ~ z) + poisson() ->
y1 ~ x + (1 | g)
autocor ~ arma(time = NA, gr = g, p = 1, q = 0, cov = FALSE)
y2 ~ z
bf(y1 ~ 1 + x + (1|c|obs), sigma = 1) + gaussian()
bf(y2 ~ 1 + x + (1|c|obs)) + poisson()
bf(bmi ~ age * mi(chl)) + bf(chl | mi() ~ age) + set_rescor(FALSE) ->
bmi ~ age * mi(chl)
chl | mi() ~ age
bf(y ~ eta, nl = TRUE) + lf(eta ~ 1 + x) + nlf(sigma ~ tau * sqrt(eta)) + lf(tau ~ 1) ->
y ~ eta
eta ~ 1 + x
sigma ~ tau * sqrt(eta)
tau ~ 1
bf(y1 ~ x + (1|g)) + bf(y2 ~ s(z)) ->
y1 ~ x + (1 | g)
y2 ~ s(z)
y ~ x + (1 | g), fill = "mean"
For detailed documentation, see gr() Function Documentation.