Crates.io | temex |
lib.rs | temex |
version | 0.10.0 |
source | src |
created_at | 2023-06-27 01:25:52.873738 |
updated_at | 2023-06-27 01:25:52.873738 |
description | Regex-like temporal expressions for evaluating systems that change over time |
homepage | |
repository | https://gitlab.com/kevinwaynesmith/temex/ |
max_upload_size | |
id | 900882 |
size | 64,538 |
temex
- Regex-like temporal expressions for evaluating systems that change over timeThis library implements temexes, which are structures that combine formulas of propositional logic with regular expression operators to provide a temporal logic designed for simplicity and ease of use.
Temporal logic has useful applications in a wide range of contexts, in and outside of computer science. A detailed examination is beyond the scope of this document, but a good introduction is available here. Although temporal logic has many use cases, most tools that implement some form of temporal logic are designed for formal methods specialists and logicians, not for a general audience. As a result, they can seem arcane and unapproachable to a layman.
The purpose of temex
is to provide a simple tool
that allows generalist software engineers and data analysts to harness the power
of temporal logic. It is designed to be easy to understand and use, requiring only
a basic knowledge of propositional logic and regular expressions.
A truth assignment is a mapping of boolean variables to truth values (i.e., true
or
false
). When all the boolean variables that occur in a formula of propositional logic are
assigned a value, then that formula evaluates to either true
or false
. We can think of a
formula as defining a set of truth assignments: namely, the truth assignments that make that
formula evaluate to true
.
A regular expression is either a character, or it is a composite expression in which subexpressions are joined by regex operators. A regular expression defines a set of sequences of characters: those sequences that match the regular expression.
We can think of truth assignments as characters, and we can think of a propositional
formula as a regular expression in which the truth assignments it defines are joined by
alternation (i.e., the regex operator |
). So, for instance,
the formula not (p and q)
can be thought of as {p = true, q = false} | {p = false, q = true} | {p = false, q = false}
.
If we consider truth assignments as characters and propositional formulas as alternations over these characters, then if we join propositional formulas with regex operators and interpret the result according to the ordinary semantics of regular expressions we have something that defines a sequence of truth assignments. We call this a temex, for temporal expression.
We can use a temex to search sequences of truth assignments (henceforth called traces) in
much the same way we can use regular expressions to search strings. temex
implements this
functionality.
Temex atoms are propositional formulas enclosed in square brackets ([
and ]
). They may be
joined with regex operators in the same ways characters can be joined with regex operators in a
regular expression.
A propositional formula may be a boolean variable (which begins with an alphabetic character and
may contain alphabetic or numeric characters as well as underscores), or a composite formula,
in which subformulas are joined with propositional connectives. temex
uses the English words
not
, and
, and or
for negation, conjunction, and disjunction; not
has the highest
operator precedence, while and
and or
have equal precedence and are right associative.
Examples:
f123
is_sunny and not is_raining
cpu_GT_80 and not (mem_LT_40 or io_LT_20)
temex
's regex operators inherit the syntax of the regex
crate, which temex
uses to implement its regex operators.
In the following forms, phi
, phi_1
, and phi_2
are propositional formulas, and n
and m
are natural numbers:
[phi_1][phi_2] concatenation (phi_1 followed by phi_2) [phi_1]|[phi_2] alternation (phi_1 or phi_2, prefer phi_1) [phi]* zero or more of phi (greedy) [phi]+ one or more of phi (greedy) [phi]? zero or one of phi (greedy) [phi]*? zero or more of phi (ungreedy/lazy) [phi]+? one or more of phi (ungreedy/lazy) [phi]?? zero or one of phi (ungreedy/lazy) [phi]{n,m} at least n phi and at most m phi (greedy) [phi]{n,} at least n phi (greedy) [phi]{n} exactly n phi [phi]{n,m}? at least n phi and at most m phi (ungreedy/lazy) [phi]{n,}? at least n phi (ungreedy/lazy) [phi]{n}? exactly n phi
The following trace formats are supported:
polars dataframes in which all of the columns are boolean. Column names are interpreted as the names of boolean variables and the rows of the dataframe are interpreted as truth assignments.
CSVs encoded as UTF8 strings or files. The first line should contain the
names of boolean variables and the subsequent lines should contain values of either 1
or 0
,
for true
or false
, respectively. The lines after the header are interpreted as truth
assignments.
When matching against a trace, the variable names in the temex
pattern and the trace must be
the same; if there are extra variables in the trace then searching will result in an error.
Prior to performing any search operation, the temex pattern and trace must be compiled. Compiling the temex pattern is done with Temex::new, while compiling the trace is done with TemexTrace::try_from. Searches may be performed using methods on the temex; a complete listing is available here.
is_match
This example shows how to perform a temex
search in a
polars DataFrame. The DataFrame must be converted to a
TemexTrace before searching. The example also shows how to convert a TemexTrace back to a
DataFrame. Note the use of ^
and $
, which are anchors that match to the beginning and end
of the trace, respectively:
use polars::df;
use polars::prelude::*;
use temex::{Temex, TemexTrace};
let df = df! [
"p1" => [true, true, true],
"p2" => [false, false, false],
"p3" => [true, false, true]
]
.unwrap();
let trace = TemexTrace::try_from(df.clone()).unwrap();
let te = Temex::new("^[p1 and (p2 or p3)][p1][p1 or p2 and not p3]$").unwrap();
assert!(te.is_match(&trace).unwrap());
// We can convert the TemexTrace back to a DataFrame
let df_from_trace = polars::frame::DataFrame::try_from(trace).unwrap();
assert_eq!(df, df_from_trace);
find
Here we use the find
method to find the range of the left-most first match in the trace:
let trace_str = "CPU_core1_GT_80, CPU_core2_GT_80, mem_usage_GT_40\n\
0, 0, 0\n\
1, 0, 0\n\
0, 1, 0\n\
0, 0, 1\n\
1, 0, 1\n\
0, 1, 1\n\
1, 1, 0\n\
1, 1, 0\n\
1, 1, 0\n";
let trace = trace_str.parse::<TemexTrace>().unwrap();
let te = Temex::new("[CPU_core1_GT_80 and CPU_core2_GT_80 and not mem_usage_GT_40]+").unwrap();
assert_eq!(te.find(&trace).unwrap().unwrap().range(), 6..9);
find_iter
In this example we see how the find_iter
method can be used to find multiple nonoverlapping
matches within a trace:
let trace_str = "is_sunny, is_windy\n\
1, 0\n\
1, 0\n\
1, 0\n\
0, 0\n\
1, 0\n\
1, 0\n\
1, 0\n\
1, 1\n\
1, 0\n\
1, 0\n\
1, 0\n";
let trace = TemexTrace::try_from(trace_str).unwrap();
let te = Temex::new("[is_sunny and not is_windy]{3}").unwrap();
for te_match in te.find_iter(&trace) {
println!("{:?}", te_match.range());
}
// Output:
// 0..3
// 4..7
// 8..11
Copyright (c) 2023 Kevin Wayne Smith
This project is licensed under either of
Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
MIT License (https://opensource.org/licenses/MIT)
at your option.