Signal Temporal Logic Control Synthesis among Uncontrollable Dynamic Agents with Conformal Prediction arXiv:2312.04242v1 [eess.SY] 7 Dec 2023 Xinyi Yu, Yiqi Zhao, Xiang Yin, and Lars Lindemann ∗ Abstract The control of dynamical systems under temporal logic specifications among uncontrollable dynamic agents is challenging due to the agents’ a-priori unknown behavior. Existing works have considered the problem where either all agents are controllable, the agent models are deterministic and known, or no safety guarantees are provided. We propose a predictive control synthesis framework that guarantees, with high probability, the satisfaction of signal temporal logic (STL) tasks that are defined over the system and uncontrollable stochastic agents. We use trajectory predictors and conformal prediction to construct probabilistic prediction regions for each uncontrollable agent that are valid over multiple future time steps. Specifically, we reduce conservatism and increase data efficiency compared to existing works by constructing a normalized prediction region over all agents and time steps. We then formulate a worstcase mixed integer program (MIP) that accounts for all agent realizations within the prediction region to obtain control inputs that provably guarantee task satisfaction with high probability. To efficiently solve this MIP, we propose an equivalent MIP program based on KKT conditions of the original one. We illustrate our control synthesis framework on two case studies. 1 Introduction Consider the following scenarios in which autonomous dynamical systems need to operate safely among uncontrollable dynamical agents: mobile service robots operating in pedestrian-rich environments, self-driving cars navigating through dense urban traffic, and leader-follower systems such as truck platoons reducing air drag to improve fuel economy. All these scenarios have in common that the autonomous system needs to satisfy complex safety-critical specifications in the presence of uncontrollable agents. Achieving these specifications is challenged by the fact that communication is prohibited, limited and unstable, or only one-directional, while the agents’ intentions may not always be known. In this setting, the autonomous system needs to make predictions about the behavior of the dynamic agents and select safe actions accordingly. In this work, we propose a predictive control synthesis framework with probabilistic safety for complex temporal logic specifications. Control of multi-agent systems under temporal logic specifications has attracted much attention, see e.g., [1–8]. However, the agent dynamics here are usually known and agents are assumed to be ∗ This work was supported by the National Natural Science Foundation of China (62061136004, 62173226, 61833012). Xinyi Yu, Yiqi Zhao and Lars Lindemann are with Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA. e-mail: {xinyi.yu12, yiqizhao, llindema}@usc.edu. Xiang Yin is with Department of Automation and Key Laboratory of System Control and Information Processing, Shanghai Jiao Tong University, Shanghai 200240, China. e-mail: yinxiang@sjtu.edu.cn. collaborative. In this paper, we are interested in control synthesis among uncontrollable dynamic stochastic agents under temporal logic tasks, and specifically in settings where communication is limited and no information about agent intentions and models is available. This problem has been studied in the literature, e.g., using sampling-based approaches [9, 10], game theory [11], supervisory control theory [12], or optimization-based methods [13–15]. These works, however, only consider qualitative temporal logic specifications such as linear temporal logic, while we are here interested in quantitative temporal logics that allow to measure the quality of task satisfaction. More importantly, these works make simplifying assumptions on the models of the agents such as being linear or having Gaussian noise distributions. Recent works in [16–19] make no such assumptions and instead use learning-enabled trajectory predictors and conformal prediction to obtain probabilistic prediction regions that can be used for downstream control. However, these works do not consider complex specifications expressed in temporal logics. Contribution. We consider the signal temporal logic (STL) control synthesis problem in the presence of uncontrollable dynamic agents without making any assumptions on the agents’ dynamics and distribution. Specifically, we make the following contributions: • We use trajectory predictors and conformal prediction to construct probabilistic prediction regions that contain the trajectories of uncontrollable agents. Compared to existing works, we reduce conservatism and increase data efficiency to promote efficient long horizon control with multiple agents. • We propose a mixed integer program (MIP) based on a worst case formulation of the STL constraints over all uncontrollable agent trajectories within these prediction regions. We present an equivalent MIP program based on KKT conditions of the original MIP to efficiently compute control inputs that provably guarantee task satisfaction with high probability. • We validate and illustrate the efficacy of our algorithms on temperature control and robot motion planning problems. Organization. After discussing related work in Section 1.1, we present preliminaries and the problem definition in Section 2. We show how to compute prediction regions for multiple uncontrollable agents using conformal prediction in Section 3 and present a predictive STL control synthesis framework in Section 4. We present simulations in Section 5 and conclude in Section 6. 1.1 Related Work Quantitative temporal logics such as signal temporal logic (STL) [20] have increasingly been used in control. The reason for their success in control is their ability to reason over the robustness of a control system with respect to the specification via the STL quantitative semantics [21, 22]. The quantitative semantics of STL provide a natural control objective that can be maximized, e.g., via gradient-based optimization [23–26], mixed integer programming [27–30], or control barrier functions [31–33]. While STL control synthesis problems have been well studied for deterministic systems, the problem becomes more challenging for stochastic systems. There are various approaches that ensure probabilistic task satisfaction, e.g., via mixed integer programming [34–36], sampling-based approaches [37–39], and stochastic control barrier functions [40]. Another recent line of work considers the minimization of mathematical risk of STL specifications [41–43]. In addition, there are also some related approaches for reactive STL control synthesis, see e.g., [44–47]. Conceptually closest to our work are [16, 17, 19] where the authors apply conformal prediction to obtain probabilistic prediction regions of uncontrollable dynamic stochastic agents. Conformal 2 prediction is a lightweight statistical tool for uncertainty quantification, originally introduced in [48, 49], that has recently attracted attention within the machine learning community, see e.g., [50, 51] for up-to-date tutorials, and [52, 53] for more recent applications. Compared to [16, 17, 19], we consider complex signal temporal logic specifications as objectives. Importantly, this requires the data-efficient computation of non-conservative probabilistic prediction regions to promote long horizon control with multiple agents. 2 Preliminaries & Problem Definition We consider an autonomous system, e.g., a robotic system or an autonomous car, that is described by the discrete-time dynamics xk+1 = f (xk , uk ), x0 ∈ X (1) where the function f : X ×U → X is defined over the state and input domains X and U, respectively. Here, x0 is the known initial state and xk ∈ X ⊂ Rnx and uk ∈ U ⊂ Rm are the nx -dimensional state and the m-dimensional control input at time k, respectively. To synthesize control sequences later, let us also define uk:Tϕ −1 := uk . . . uTϕ −1 and xk+1:Tϕ := xk+1 . . . uTϕ . The system in equation (1) operates among N uncontrollable and stochastic dynamic agents, e.g., autonomous or remote-controlled robots or humans. We model the state of agent i at time k as a random variable Yk,i ∈ Yi ⊂ Rny,i with state domain Yi , and we define the random state of all agents as Yk := [Yk,1 , . . . , Yk,N ] ∈ Y ⊂ Rny with ny := ny,1 + . . . + ny,N and Y := Y1 × . . . × YN . We do not assume to have any a-priori knowledge of the agent dynamics and intentions, and since the agents are uncontrollable, their trajectories are a-priori unknown. However, we assume that their trajectories follow an unknown distribution D, i.e., Y := (Y0 , Y1 , . . . ) ∼ D. We further assume that we have access to K̄ realizations from D. (j) (j) Assumption 1 We have access to K̄ independent trajectories Y (j) := (Y1 , Y2 , . . . ) from the distribution D. We split the trajectories into calibration and training sets Dcal := {Y (1) , . . . , Y (K) } and Dtrain := {Y (K+1) , . . . , Y (K̄) }, respectively. Modeling dynamic agents by a distribution D includes general classes of systems such as stochastic hybrid systems and Markov decision processes. We make the restriction that D does not depend on the behavior of the autonomous system in (1), i.e., D does not depend on x. Similar modeling assumptions have been made in [16] and are justified in many applications, e.g., when conservative control actions are taken in autonomous driving that do only alter the behavior of pedestrians minimally. By using robust or adaptive versions of conformal prediction, see e.g., [54, 55], we can relax this assumption. Lastly, Assumption 1 is not restrictive in most applications, e.g., past trajectory data of dynamic agents is available in robotic or transportation systems. In this paper, we consider the control of the system in (1) under temporal logic specifications that are jointly defined over the system and the uncontrollable agents. Therefore, we define their joint state at time k as Sk := [xk , Yk ] ∈ X × Y ⊂ Rn with n := nx + ny and the random trajectory S := S0 , S1 , . . .. 3 2.1 Signal Temporal Logic Control Synthesis among Uncontrollable Agents In this paper, we use signal temporal logic to describe the tasks defined over the system in (1) and uncontrollable agents. Signal temporal logic was originally introduced in [20], and we here only give a brief introduction. The syntax of STL formulae is as follows ϕ ::= ⊤ | π µ | ¬ϕ | ϕ1 ∧ ϕ2 | ϕ1 U[a,b] ϕ2 , (2) where ⊤ is the true symbol (⊥ denotes the false symbol) and π µ : Rn → {⊤, ⊥} is a predicate whose truth value is determined by the sign of an underlying predicate function µ : Rn → R, i.e., for the state s ∈ Rn we have that π µ (s) := ⊤ if and only if µ(s) ≥ 0. The symbols ¬ and ∧ denote the standard Boolean operators “negation” and “conjunction”, respectively, and U[a,b] is the temporal operator “until ” with a, b ∈ Z≥0 with Z≥0 being the natural numbers. These operators can be used to define “disjunction” by ϕ1 ∨ ϕ2 := ¬(¬ϕ1 ∧ ¬ϕ2 ), “implication” by ϕ1 → ϕ2 := ¬ϕ1 ∨ ϕ2 , “eventually” by F[a,b] ϕ := ⊤U[a,b] ϕ, and “always” by G[a,b] ϕ := ¬F[a,b] ¬ϕ. STL Semantics. STL formulae are evaluated over trajectories s := s0 s1 · · · where s could be a realization of S. The notation (s, k) |= ϕ denotes that s satisfies the STL formula ϕ at time k. Particularly, we have (s, k) |= π µ iff µ(sk ) ≥ 0, (s, k) |= ¬ϕ iff (s, k) ̸|= ϕ, (s, k) |= ϕ1 ∧ ϕ2 iff (s, k) |= ϕ1 and (s, k) |= ϕ2 , and (s, k) |= ϕ1 U[a,b] ϕ2 iff ∃k ′ ∈ [k + a, k + b] such that (s, k ′ ) |= ϕ2 and ∀k ′′ ∈ [k, k ′ ], we have (s, k ′′ ) |= ϕ1 , i.e., ϕ2 will hold some time between [k + a, k + b] in the future and until then ϕ1 always holds. We write s |= ϕ instead of (s, 0) |= ϕ for simplicity. STL Quantitative Semantics. In addition to these qualitative Boolean semantics, we can also define quantitative semantics ρϕ (s, k) ∈ R that describe how well ϕ is satisfied [56]. Formally, we have ρ⊤ (s, k) := ∞ µ ρπ (s, k) := µ(sk ) ρ¬ϕ (s, k) := −ρϕ (s, k) ρϕ1 ∧ϕ2 (s, k) := min(ρϕ1 (s, k), ρϕ2 (s, k)) ρϕ1 U[a,b] ϕ2 (s, k) := max min(ρϕ2 (s, k ′ ), min ρϕ1 (s, k ′′ )). k′ ∈[k+a,k+b] k′′ ∈[k,k′ ] Problem Formulation. In this paper, we consider bounded STL formulae ϕ where time intervals [a, b] in ϕ are bounded. The satisfaction of a bounded STL formula ϕ can be computed over bounded trajectories. Specifically, we require trajectories with a minimum length of Tϕ where Tϕ is the formula horizon formally defined in [57, 58]. For example, for ϕ := F[3,8] G[1,2] π µ , we have Tϕ = 10. We additionally assume that ϕ is in positive normal form, i.e., the formula ϕ contains no negations, and is build from convex and differentiable predicate functions µ. The former assumption is without loss of generality as every STL formula can be re-written into an equivalent STL formula that is in positive normal form [28, 58]. The latter assumption is made for computational reasons and can be relaxed. We summarize these assumptions next. Assumption 2 We consider bounded STL formulae ϕ in positive normal form. We further assume that all predicate functions µ in ϕ are convex and differentiable with respect to Yτ . We impose specifications ϕ that satisfy Assumption 2 over the random trajectory S, i.e., over the system trajectory xk and the random trajectories of uncontrollable dynamic agents Yk . Formally, we consider the following problem in this paper. 4 Problem 1 Given the system in (1), the random trajectory of uncontrollable agents Y := (Y0 , Y1 , . . . ) ∼ D, a set of trajectories Dcal and Dtrain that satisfy Assumption 1, an STL formula ϕ from (2) that satisfies Assumption 2, and the failure probability δ ∈ (0, 1), compute the control inputs uk for all k ∈ {0, . . . , Tϕ − 1} such that the STL formula is satisfied with a probability of at least 1 − δ, i.e., Prob(S |= ϕ) ≥ 1 − δ. The challenge for solving Problem 1 is the dependency of ϕ on the agents in Y , i.e., the joint trajectory S is only partially controllable. Therefore, our approach is to predict uncontrollable agents, construct prediction regions that capture uncertainty of these prediction, and synthesize the controller using this information. 2.2 Trajectory Predictors & Conformal Prediction Trajectory Predictors. From the training set Dtrain , we first train a trajectory predictor that estimates future states (Yk+1 , . . . , YTϕ ) from past observations (Y0 , . . . Yk ). Specifically, we denote the predictions made at time k as (Ŷk+1|k , . . . , ŶTϕ |k ). For instance, we can use recurrent neural networks (RNNs) [59], long short term memory (LSTM) networks [60], or transformers [61]. Conformal Prediction. The predictions Ŷτ |k of Yτ for τ > k, which may be obtained from learning-enabled components, may not always be accurate. We will use conformal prediction to obtain prediction regions for Yτ that are valid with high probability. We leverage conformal prediction which is a statistical tool for uncertainty quantification with minimal assumptions [48,49]. Let R(0) , . . . , R(K) be K + 1 i.i.d. random variables which are referred to as the nonconformity (j) (j) score. For instance, we may define the prediction error R(j) := ||Yτ − Ŷτ |k || at time τ > k for all calibration trajectories Y (j) ∈ Dcal . Naturally, a small value of R(j) in this case implies accurate predictions. We later present nonconformity scores to obtain prediction regions for uncontrollable agents over multiple time steps. Given a failure probability δ ∈ (0, 1), our goal is to compute C from R(1) , . . . , R(K) such that the probability of R(0) being bounded by C is not smaller than 1 − δ, i.e., Prob(R(0) ≤ C) ≥ 1 − δ. By the results in Lemma 1 of [62], we can compute C := Quantile1−δ (R(1) , . . . , R(K) , ∞) as the (1 − δ)th quantile over the empirical distribution of R(1) , . . . , R(K) and ∞.1 By assuming that R(1) , . . . , R(K) are sorted in non-decreasing order and by adding R(K+1) := ∞, we have that C = R(p) where p := ⌈(K + 1)(1 − δ)⌉, i.e., C is the pth smallest nonconformity score. To obtain meaningful prediction regions, we remark that K ≥ ⌈(K + 1)(1 − δ)⌉ has to hold, otherwise we get C = ∞. Lastly, note that the guarantees Prob(R(0) ≤ C) ≥ 1 − δ are marginal over the randomness in R(0) , R(1) , . . . , R(K) (with C depending on R(1) , . . . , R(K) ) as opposed to being conditional on R(1) , . . . , R(K) . 3 Prediction Regions for Multiple Uncontrollable Agents To solve Problem 1, we use trajectory predictions and compute probabilistic prediction regions for each uncontrollable agent that are valid over multiple time steps. Specifically, we construct two 1 Formally, we define the quantile function as Quantile1−δ (R(1) , . . . , R(K) , ∞) := inf{z ∈ R|Prob(Z ≤ z) ≥ 1 − δ} P with the random variable Z := 1/(K + 1)( i δR(i) + δ∞ ) where δR(i) is a dirac distribution centered at R(i) . 5 different prediction regions for open-loop and closed-loop control synthesis. Recall that Yτ,i denotes the random state of agent i at time τ , and let Ŷτ |k,i denote the prediction of Yτ,i made at time k. For open-loop control, we are interested in constructing prediction regions for all predictions τ ∈ {1, . . . , Tϕ } made at time k := 0 and for all agents i ∈ {1, . . . , N }, i.e., such that Prob(||Yτ,i − Ŷτ |0,i || ≤ Cτ |0,i , ∀(τ, i) ∈ {1, . . . , Tϕ } × {1, . . . , N }) ≥ 1 − δ (3) where Cτ |0,i indicates the τ -step ahead prediction error of agent i for predictions Ŷτ |0,i made at time k = 0. For closed-loop control, on the other hand, we fix τ := k + 1 and are interested in constructing prediction regions for all times k ∈ {0, . . . , Tϕ − 1} and agents i ∈ {1, . . . , N }, i.e., such that Prob(||Yk+1,i − Ŷk+1|k,i || ≤ Ck+1|k,i , ∀(k, i) ∈ {0, . . . , Tϕ − 1} × {1, . . . , N }) ≥ 1 − δ, (4) where Ck+1|k,i indicates the one-step-ahead prediction error of agent i for predictions Ŷk+1|k,i made at time k. Data-efficient and accurate prediction regions. Computing τ -step ahead prediction regions via Cτ |0,i (and one-step-ahead prediction regions via Ck+1|k,i ) is in general difficult. Existing works [16, 63] are conservative, data-inefficient, and do not provide prediction regions for multiple agents. The reason is that Cτ |0,i (and Ck+1|k,i ) are independently computed for all (τ, i) ∈ {1, . . . , Tϕ } × {1, . . . , N } (for all (k, i) ∈ {0, . . . , Tϕ − 1} × {1, . . . , N }). This independent computation is followed by a conservative and inefficient union bounding argument (more details below). Instead, we obtain these prediction regions jointly. For the open-loop case, we define the normalized nonconformity score (j) (j) ROL := (j) ||Yτ,i − Ŷτ |0,i || max (5) στ |0,i (τ,i)∈{1,...,Tϕ }×{1,...,N } for calibration trajectories Y (j) ∈ Dcal where στ |0,i > 0 are constants that normalize the prediction error to [0, 1] for all times τ and agents i. This nonconformity score is inspired by [64], but different in two ways. First, we normalize over predictions τ and agents i (instead of only predictions). Second, we do not need to solve nonconvex optimization problems and instead simply normalize over the training dataset Dtrain which has practical advantages. Specifically, we use the normalization (j) (j) στ |0,i := maxj ∥Yτ,i − Ŷτ |0,i ∥ for training trajectories Y (j) ∈ Dtrain . For the closed-loop case, we follow a similar idea and define the normalized nonconformity score (j) RCL := ||Yk+1,i − Ŷk+1|k,i || σk+1|k,i (k,i)∈{0,...,Tϕ −1}×{1,...,N } (6) max for calibration trajectories Y (j) ∈ Dcal where σk+1|k,i > 0 are again constants that normalize the (j) prediction error to [0, 1] for all times k and agents i. In this case, we let σk+1|k,i := maxj ∥Yk+1,i − (j) Ŷk+1|k,i ∥ for training trajectories Y (j) ∈ Dtrain . The intuitions behind the nonconformity scores in (5) and (6) are that normalization via στ |0,i (j) (j) (and σk+1|k,i ) will prevent the prediction error ||Yτ,i − Ŷτ |0,i || (and ||Yk+1,i − Ŷk+1|k,i ||) for a specific agent and time to dominate the max operator. This would result in overly conservative prediction regions for other times and agents. 6 Valid Prediction Regions with Conformal Prediction. We can now apply conformal (j) prediction, as introduced in Section 2.2, to the open-loop and closed-loop nonconformity scores ROL (j) and RCL , respectively. By re-normalizing these nonconformity scores, we then obtain prediction regions as in equations (3) and (4). Theorem 1 Given the random trajectory Y := (Y0 , Y1 , . . . ) ∼ D, a set of trajectories Dcal and Dtrain that satisfy Assumption 1, and the failure probability δ ∈ (0, 1), then the prediction regions in equations (3) and (4) hold for the choices of Cτ |0,i := COL στ |0,i (7) Ck+1|k,i := CCL σk+1|k,i (8) where στ |0,i and σk+1|k,i are positive constants and (1) (K) (1) (K) COL := Quantile1−δ (ROL , . . . , ROL , ∞) CCL := Quantile1−δ (RCL , . . . , RCL , ∞). Proof 1 For brevity, we only show that the open-loop prediction region in (3) is valid with the choice of Cτ |0,i := COL στ |0,i . The proof for the closed-loop prediction region in (4) follows exactly the same steps and is omitted. For a test trajectory Y := (Y0 , Y1 , . . . ) ∼ D, we immediately know from conformal prediction that  Prob  ||Yτ,i − Ŷτ |0,i || ≤ COL ≥ 1 − δ στ |0,i (τ,i)∈{1,...,Tϕ }×{1,...,N } max (9) by construction of COL . We then see that equation (9) implies that  ||Yτ,i − Ŷ  τ |0,i || Prob ≤ COL , ∀(τ, i) ∈ {1, . . . , Tϕ } × {1, . . . , N } ≥ 1 − δ. στ |0,i (10) Since στ |0,i > 0, we can multiply the inequality in (10) with στ |0,i . We then see that (3) is valid with the choice of Cτ |0,i := COL στ |0,i which completes the proof. Comparison with existing work. With slight modification of [16, 63], an alternative approach for obtaining Cτ |0,i would be to compute Cτ |0,i := Quantile1−δ/(Tϕ N ) (R(1) , . . . , R(K) , ∞) with the nonconformity score R(j) := ||Yτ,i − Ŷτ |0,i ||. In a similar way, we could find Ck+1|k,i := Quantile1−δ/(Tϕ N ) (R(1) , . . . , R(K) , ∞) with the nonconformity score R(j) := ||Yk+1,i −Ŷk+1|k,i ||. However, it is evident that taking the 1 − δ quantile in Theorem 1 compared to the 1 − δ/(Tϕ N ) quantile is much more data inefficient. Specifically, for a fixed δ ∈ (0, 1), we only require K ≥ (1 − δ)/δ calibration trajectories as opposed to K ≥ (Tϕ N − δ)/δ calibration trajectories.2 Intuitively, it is also evident that our choice of normalization constants στ |0,i and σk+1|k,i provides less conservative prediction regions in practice as evaluating the 1 − δ quantile is more favorable compared to the 1 − δ/(Tϕ N ) for larger task horizons Tϕ and number of agents N . Finally, we remark that one could even consider obtaining conditional guarantees using conditional conformal prediction [65]. Similarly, we can use versions of robust conformal prediction to be robust against distribution shifts [54, 62], e.g., caused by interaction between the system in (1) and the distribution D. 2 Computing the 1−δ and 1−δ/(Tϕ N ) quantiles requires that ⌈(K +1)(1−δ)⌉ ≤ K and ⌈(K +1)(1−δ/(Tϕ N )⌉ ≤ K, respectively. 7 For practitioners. We summarize our method to obtain the prediction regions in equations (3) and (4) via Theorem 1 in Algorithm 1. We note that, in order to compute COL and CCL , (K+1) (K+1):=∞ we sort the corresponding nonconformity scores (after adding ROL := ∞ and RCL ) in nondecreasing order and set COL and CCL to be the pth smallest nonconformity score where p := ⌈(K +1)(1−δ)⌉, as explained in Section 2.2. Specifically, the variable p is set in line 1. We then (j) compute the predictions Ŷτ |k,i for all calibration data in lines 2-6 and calculate the normalization (j) constants στ |k,i over the training data in line 7. Finally, we compute the nonconformity scores ROL (j) and RCL for all calibration data in lines 8-10 and apply conformal prediction to obtain COL and CCL in lines 11-12. According to Theorem 1, we then obtain the open-loop prediction regions Cτ |0,i and the closed-loop prediction regions Ck+1|k,i . Algorithm 1: Multi-Agent Prediction Regions Input: failure probability δ, dataset D, horizon Tϕ Output: Constants COL , CCL , and στ |k,i 1 p ← ⌈(K + 1)(1 − δ)⌉ 2 forall i ∈ {1, . . . , N } do 3 forall k ∈ [0, Tϕ − 1] do 4 forall τ ∈ [k + 1, Tϕ ] do 5 forall j ∈ {1, . . . , K̄} do (j) (j) (j) 6 Compute Ŷτ |k,i from (Y0,i , . . . , Yk,i ) 7 (j) (j) στ |k,i ← maxj∈{K+1,...,K̄} ||Yτ,i − Ŷτ |k,i || 8 forall j ∈ {1, . . . , K} do (j) (j) 9 Compute ROL and RCL as in equations (5) and (6) (K+1) (K+1) 10 Set ROL ← ∞ and RCL ← ∞ (j) (j) 11 Sort ROL and RCL in nondecreasing order (p) (p) 12 Set COL ← ROL and CCL ← RCL 4 Predictive STL Control Synthesis In this section, we use the previously computed prediction regions and design predictive controllers that solve Problem 1, i.e., controllers u that ensure that x is such that Prob(S |= ϕ) ≥ 1 − δ. Specifically, we use the prediction regions and present an MIP encoding for the STL task ϕ. We first present a qualitative and a quantitative encoding using the Boolean and quantitative semantics of ϕ in Sections 4.1 and 4.2, respectively. In Section 4.3, we use these encodings and present openloop and closed-loop controllers. 4.1 Qualitative STL Encoding with Multi-Agent Prediction Regions A standard way of encoding STL tasks for deterministic trajectories s is by using an MIP encoding [27, 47]. The idea is to introduce a binary variable z0ϕ and a set of mixed integer constraints over s such that z0ϕ = 1 if and only if s |= ϕ. In this paper, however, we deal with stochastic trajectories S that consist of the system trajectory x and the stochastic trajectories of dynamic agents Y . We 8 will instead introduce a binary variable z̄0ϕ and a set of mixed integer constraints over x and the prediction regions in (3) and (4) such that z̄0ϕ = 1 implies that S |= ϕ with a probability of at least 1 − δ. In this way, z̄0ϕ can be seen as a probabilistic version of z0ϕ . We emphasize that our MIP encoding provides sufficient but not necessary conditions. Specifically, we can not guarantee necessity due to the use of probabilistic prediction regions. In the remainder, we generate the mixed integer constraints that define z̄0ϕ recursively on the structure of ϕ. Our main innovation is a probabilistic MIP encoding for predicates, while the encoding of Boolean and temporal operators follows [27,47]. We here recall from Assumption 2 that the formula ϕ is in positive normal form so that we do not need to consider negations. To provide a general encoding that can be used for open-loop and closed-loop control, we let k ≥ 0 denote the current time. As we will treat the system state xk separately from the state of uncontrollable dynamic agents Yk contained in Sk , we will write µ(xk , Yk ) instead of µ(sk ). Predicates µ. Let us denote the set of predicates π µ in ϕ by P. Now, for each predicate µ π ∈ P and for each time τ ≥ 0, we introduce a binary variable z̄τµ ∈ {0, 1}. If τ ≤ k, then we have observed the value of Yτ already, and we set z̄τµ = 1 if and only if µ(xτ , Yτ ) ≥ 0. If τ > k, then we would like to constrain z̄τµ such that z̄τµ = 1 implies Prob(µ(xτ , Yτ ) ≥ 0) ≥ 1 − δ. Motivated by Theorem 1, we use the Big-M method and define the constraint −miny∈Bτ µ(xτ , y) ≤ M (1 − z̄τµ ) − ϵ, (11) where M and ϵ are sufficiently large and sufficiently small positive constants, respectively, see [66] for details on how to select M and ϵ. The minimization of µ(xτ , y) over the y component within the set Bτ will account for all Yτ that are contained within our probabilistic prediction regions. Specifically, the set Bτ contains all states contained within a geometric ball that is centered around the predictions Ŷτ |k,i with a size of Cτ |k,i and is defined as Bτ := {[y1 , . . . , yN ]⊤ ∈ Rny | ∀i ∈ {1, . . . , N }, ||yi − Ŷτ |k,i || ≤ Cτ |k,i }. In the case when k = 0, we use the values of Cτ |0,i as in (7) that define the open-loop prediction region in (3). This implies that Prob(Yτ ∈ Bτ , τ ∈ {1, . . . , Tϕ }) ≥ 1−δ. For k > 0, we use Cτ |k,i as in (8) that define the closed-loop prediction region in (4) for τ = k + 1, and we use Cτ |k,i := CCL στ |k,i for τ > k + 1. Remark 1 For k > 0, note that we have derived probabilistic guarantees for one-step ahead predictions with Cτ |k,i for τ = k + 1. This implies that Prob(Yk+1 ∈ Bk+1 , k ∈ {0, . . . , Tϕ − 1}) ≥ 1 − δ. We, however, do not have (and do not need as we show later) probabilistic guarantees for multi-step ahead predictions with Cτ |k,i for τ > k + 1. If such guarantees were required, we could consider a nonconformity score similar to (6), but where we maximize the normalized prediction error over k ∈ {0, . . . , Tϕ − 1}, τ ∈ {k + 1, . . . , Tϕ }, and i ∈ {1, . . . , N }. By Theorem 1 and the construction in equation (11), we have the following straightforward result for the case where k = 0. Corollary 1 At time k = 0, if z̄τµ = 1 for all times τ ∈ T and for all predicates π µ ∈ P for some sets (T, P ) ⊆ {1, . . . , Tϕ } × P, then it holds that Prob(µ(xτ , Yτ ) ≥ 0, ∀(τ, π µ ) ∈ T × P ) ≥ 1 − δ. Proof 2 If z̄τµ = 1 for all τ ∈ T and π µ ∈ P , then it follows by equation (11) that xτ is such that 0 < ϵ ≤ miny∈Bτ µ(xτ , y) for all τ ∈ T and π µ ∈ P . Now, using Theorem 1, we can directly conclude that Prob(µ(xτ , Yτ ) ≥ 0, ∀(τ, π µ ) ∈ T × P ) ≥ 1 − δ. 9 We emphasize that these results are stated for the case where k = 0. We will discuss the extension to k > 0 later when we present a receding horizon control strategy where only the k + 1 step-ahead predictions become relevant for obtaining probabilistic guarantees. Remark 2 We do not need to enforce z̄τµ = 0 since the STL formula ϕ is in positive normal form as per Assumption 2. However, by using the constraint maxy∈Bτ µ(xτ , y) ≤ M z̄τµ −ϵ we could enforce that z̄τµ = 0 implies Prob(µ(xτ , Yτ ) < 0, ∀(τ, π µ ) ∈ T × P ) ≥ 1 − δ. Computational considerations. Note that we minimize µ(xτ , y) over the y component within the ball Bτ in equation (11). In an outer loop, we will additionally need to optimize over xτ to compute control inputs uτ . To obtain computational tractability for this, we will now remove the minimum over y and instead write equation (11) in closed-form. Since the function µ is continuously differentiable and convex in its parameters by Assumption 2, we can use the KKT conditions of the inner problem to do so. Specifically, the constraints in equation (11) can be converted into a bilevel optimization problem where the outer problem consists of the constraints −µ(xτ , y ∗ ) ≤ M (1 − z̄τµ ) − ϵ, (12) and where the inner optimization problem is y ∗ := argmin µ(xτ , y), y (13) s.t. ||yi − Ŷτ |k,i || ≤ Cτ |k,i , ∀i ∈ {1, . . . , N }. We can use the KKT conditions of the inner optimization problem in (13) after a minor modification in which we change the constraints ||yi − Ŷτ |k,i || ≤ Cτ |k,i into (yi − Ŷτ |k,i )⊤ (yi − Ŷτ |k,i ) ≤ Cτ2|k,i to obtain differentiable constraint functions. Formally, we obtain ∂(yi∗ − Ŷτ |k,i )⊤ (yi∗ − Ŷτ |k,i ) ∂µ(xτ , y ∗ ) + ΣN λ =0 i=1 i ∂y ∂yi (14) (yi∗ − Ŷτ |k,i )⊤ (yi∗ − Ŷτ |k,i ) ≤ Cτ2|k,i , ∀i ∈ {1, . . . , N } (15) λi ≥ 0, ∀i ∈ {1, . . . , N } (16) λi ((yi∗ − Ŷτ |k,i )⊤ (yi∗ − Ŷτ |k,i ) − Cτ2|k,i ) = 0, ∀i ∈ {1, . . . , N } (17) as the set of KKT conditions for the inner problem in (13). Since Slater’s condition holds for the inner optimization problem (13), with the reasoning that Cτ |k,i > 0 due to στ |0,i > 0 and σk+1|k,i > 0, we have that a solutions y ∗ to (13) is a feasible solution to the KKT conditions in (14), (15), (16), and (17) and vice versa [67]. As a result, we can use equations the KKT conditions (14), (15), (16), and (17) instead of the optimization problem (13), and thus use equations (12), (14), (15), (16), and (17) instead of (11). Boolean and temporal operators. So far, we have encoded predicates π µ . We can encode Boolean and temporal operators recursively using the standard MIP encoding [27, 47]. In brief, for ϕ ϕ the conjunction ϕ := ∧m i=1 ϕi we introduce the binary variable z̄τ ∈ {0, 1} that is such that z̄τ = 1 if and only if z̄τϕ1 = . . . = z̄τϕm = 1. Consequently, z̄τϕ = 1 implies that (S, τ ) |= ϕ holds with a probability of at least 1 − δ. We achieve this by enforcing the constraints z̄τϕ ≤ z̄τϕi , i = 1, . . . , m, ϕi z̄τϕ ≥ 1 − m + Σm i=1 z̄τ , 10 For the disjunction ϕ := ∨m i=1 ϕi , we follow the same idea but instead enforce the constraints z̄τϕ ≥ z̄τϕi , i = 1, . . . , m, ϕi z̄τϕ ≤ Σm i=1 z̄τ . For temporal operators, we follow a similar procedure, but first note that temporal operators can be written as Boolean combinations of conjunctions and disjunctions. For ϕ := G[a,b] ϕ1 , we V ϕ1 again introduce a binary variable z̄τϕ ∈ {0, 1} and encode z̄τϕ = ττ +b using the MIP constraints ′ =τ +a z̄τ W ϕ1 for conjunctions introduced before. Similarly, for ϕ := F[a,b] ϕ1 we use that z̄τϕ = ττ +b ′ =τ +a z̄τ , while V W ϕ2 ϕ1 τ′ for ϕ = ϕ1 U[a,b] ϕ2 we use that z̄τϕ = ττ +b ′ =τ +a (z̄τ ′ ∧ τ ′′ =τ z̄τ ′′ ). Soundness of the encoding. The procedure described above provides us with a binary variable z̄0ϕ and a set of mixed integer constraints over x such that z̄0ϕ = 1 implies that S |= ϕ holds with a probability of at least 1 − δ. We summarize this result next. Theorem 2 Let the conditions from Problem 1 hold. At time k = 0, if z̄0ϕ = 1 then we have that Prob(S |= ϕ) ≥ 1 − δ. Proof 3 By enforcing that z̄0ϕ = 1, there is an assignment of 0 and 1 values to the binary variables z̄τµ for each τ ∈ {1, . . . , Tϕ } and for each π µ ∈ P. This assignment is non-unique. Specifically, let z̄τµ = 1 for all times τ ∈ T and for all predicates π µ ∈ P for some sets (T, P ) ⊆ {1, . . . , Tϕ }×P, and let z̄τµ = 0 otherwise. By Corollary 1, it follows that Prob(µ(xτ , Yτ ) ≥ 0, ∀(τ, π µ ) ∈ T × P ) ≥ 1 − δ. By the recursive definition of z̄0ϕ using the encoding from [27,47] for Boolean and temporal operators and since ϕ is in positive normal form, it follows that Prob(S |= ϕ) ≥ 1 − δ. 4.2 Quantitative STL Encoding with Multi-Agent Prediction Regions The qualitative MIP encoding ensures that z̄0ϕ = 1 implies that Prob(S |= ϕ) ≥ 1 − δ. However, in some cases one may want to optimize over the quantitative semantics ρϕ (S, 0). We next present a quantitative MIP encoding for ϕ. Specifically, we will recursively define a continuous variable r̄0ϕ ∈ R that will be such that ρϕ (S, 0) ≥ r̄τϕ with a probability of at least 1 − δ. Our main innovation is a quantitative MIP encoding for predicates, while the Boolean and temporal operators again follow the standard MIP encoding [27, 47]. Predicates. For each predicate π µ ∈ P and for each time τ ≥ 0, we introduce a continuous variable r̄τµ ∈ R. Inspired by [68], we define r̄τµ as  µ(xτ , Yτ ) if τ ≤ k, µ r̄τ := (18) miny∈Bτ µ(xτ , y) otherwise where we again notice that Yτ is known if τ ≤ k, while we compute miny∈Bτ µ(xτ , y) if τ > k to consider the worst case of µ(xτ , y) for y within the prediction region Bτ . By Theorem 1, which ensures that Yτ ∈ Bτ holds with a probability of at least 1 − δ, and the construction in equation (18), we have the following straightforward result for the case where k = 0. Corollary 2 At time k = 0, it holds that Prob(ρµ (S, τ ) ≥ r̄τµ , ∀(τ, π µ ) ∈ {1, . . . , Tϕ } × P) ≥ 1 − δ. Proof 4 The result follows directly from Theorem 1 and the construction of r̄τµ in equation (18) using the prediction region Bτ . 11 Similarly to the qualitative encoding, the term miny∈Bτ µ(xτ , y) in equation (18) can be written as in equation (13), and we can use equations (14), (15), (16), and (17) instead of (13). Boolean and temporal operators. As for the qualitative encoding, we only provide a brief summary for the quantitative encoding of Boolean and temporal operators that follow standard ϕ encoding rules. For the conjunction ϕ := ∧m i=1 ϕi , we introduce the continuous variable r̄τ ∈ R that will be such that r̄τϕ = min{r̄τϕ1 , . . . , r̄τϕm }. Particularly, we achieve this by enforcing the constraints ϕi Σm i=1 pτ = 1 r̄τϕ ≤ r̄τϕi , i = 1, . . . , m, r̄τϕi − (1 − pϕτ i )M ≤ r̄τϕ ≤ r̄τϕi + M (1 − pϕτ i ), i = 1, . . . , m, where pϕτ i ∈ {0, 1} are m new binary variables so that r̄τϕ = rτϕi if and only if pϕτ i = 1. By this encoding, r̄τϕ ≥ 0 implies that (S, τ ) |= ϕ holds with a probability of at least 1 − δ. For the disjunction ϕ := ∨m i=1 ϕi , we follow the same idea but instead enforce the constraints ϕi Σm i=1 pτ = 1 r̄τϕ ≥ r̄τϕi , i = 1, . . . , m, r̄τϕi − (1 − pϕτ i )M ≤ r̄τϕ ≤ r̄τϕi + M (1 − pϕτ i ), i = 1, . . . , m. For temporal operators, we again note that we can write each operator as Boolean combinations of conjunctions and disjunctions and then follow the same procedure as for the qualitative encoding. Soundness of the encoding. The procedure described above gives us a continuous variable r̄0ϕ and a set of mixed integer constraints over x such that r̄0ϕ is a probabilistic lower bound of ρϕ (S, 0). We summarize this result next. Theorem 3 Let the conditions from Problem 1 hold. At time k = 0, it holds that Prob(ρϕ (S, 0) ≥ r̄0ϕ ) ≥ 1 − δ. Proof 5 Recall that Prob(ρµ (S, τ ) ≥ r̄τµ , ∀(τ, π µ ) ∈ {1, . . . , Tϕ } × P) ≥ 1 − δ by Corollary 2. Since the task ϕ is in positive normal form, which excludes negations, and due to the recursive structure of ϕ we can directly conclude that Prob(ρϕ (S, 0) ≥ r̄0ϕ ) ≥ 1 − δ. We emphasize that the previous result will allow us to directly optimize over the variable r̄0ϕ to achieve r̄0ϕ > 0, which then implies that Prob(S |= ϕ) ≥ 1 − δ. We note that the quantitative encoding allows for robustness considerations and provides a more fine-grained analysis. The qualitative encoding, on the other hand, is computationally more tractable. 4.3 Predictive MIP-Based Control Synthesis For control synthesis, we can now use the qualitative encoding via the binary variable z̄0ϕ or the quantitative encoding via the continuous variable r̄0ϕ to solve Problem 1. Formally, we then solve 12 the following mixed integer optimization problem u∗k:Tϕ −1 := argmin J(xk , uk:Tϕ −1 ) (19a) uk:Tϕ −1 ∈ U Tϕ −k , xk+1:Tϕ ∈ X Tϕ −k (19b) xτ +1 = f (xτ , uτ ), τ = k, . . . , Tϕ − 1, (19c) z̄0ϕ = 1 or r̄0ϕ > 0. (19d) uk:Tϕ −1 subject to Note particularly that we enforce either z̄0ϕ = 1 or r̄0ϕ > 0 in equation (19d). This enables us to select between the quantitative and the qualitative encoding. Open-loop synthesis. We will start with analyzing the case when k = 0 where we get an open-loop control sequence u∗0:Tϕ −1 . By Theorems 2 and 3 it is easy to see that we solve Problem 1 if we can find a feasible solution to the optimization problem in (19). We next summarize the result that solves Problem 1. Theorem 4 (Open-loop control) Let the conditions from Problem 1 hold. If the optimization problem in (19) is feasible at time k = 0, then applying the open-loop control sequence u∗0:Tϕ −1 to (1) results in Prob(S |= ϕ) ≥ 1 − δ. Proof 6 Note that we enforce that either z̄0ϕ = 1 or r̄0ϕ > 0 within the optimization problem in (19). By Theorems 2 and 3 this implies that Prob(S |= ϕ) ≥ 1 − δ or Prob(ρϕ (S, 0) ≥ r̄0ϕ ) ≥ 1 − δ, respectively. In the later case, we can similarly conclude that Prob(S |= ϕ) ≥ 1 − δ since r̄0ϕ > 0. The complexity of the optimization problem in (19) is in general that of a mixed integer program which is NP-hard. Receding horizon control strategy. Such an open-loop control strategy may be conservative as the prediction regions Cτ |0,i will usually be conservative for large τ as then the predictions ŷτ |0 lose accuracy. Furthermore, due to the lack of state feedback, the open-loop controller is not robust. Therefore, we propose to solve the optimization problem in (19) in a receding horizon fashion. In other words, at each time k we measure the states xk and Yk , update our predictions Ŷτ |k , compute the control sequence u∗k:Tϕ −1 by solving the optimization problem in (19), and apply the first element u∗k of u∗k:Tϕ −1 to the system in (1) before we repeat the process. Naturally, we do so in a shrinking horizon manner where the prediction horizon H decreases by one at each time. In this process, we note that the prediction regions Bτ need to be updated at each time k. In this regard, we recall Remark 1. Indeed, if k = 0 as in the open-loop case, we were guaranteed that the prediction region in equation (3) are valid, i.e., that Prob(Yτ ∈ Bτ , τ ∈ {1, . . . , Tϕ }) ≥ 1 − δ. For the receding horizon strategy, however, the prediction region in equation (8) is valid, i.e., that instead Prob(Yk+1 ∈ Bk+1 , k ∈ {0, . . . , Tϕ −1}) ≥ 1−δ. This allows us to provide formal guarantees for Problem 1. Theorem 5 (Closed-loop control) Let the conditions from Problem 1 hold. If the optimization problem in (19) is feasible at each time k ∈ {0, . . . , Tϕ − 1}, then applying the first element u∗k of u∗k:Tϕ −1 to (1) at each time k results in Prob(S |= ϕ) ≥ 1 − δ. 13 temp.) Yr2 (Room 2 x (Room 1 temp.) temp.) Yr3 (Room 3 Figure 1: The floor plan of the hall. 30 20 15 10 10 5 0 RCL Nonconformity Scores CCL Frequency Frequency 20 ROL Nonconformity Scores COL 0.6 0.8 Score 1.0 0 1.2 (j) 0.7 0.8 0.9 Score 1.0 1.1 (j) Figure 2: Nonconformity scores ROL (left) and RCL (right) on Dcal in the temperature control case. Proof 7 By Theorem 1, we know that Yk+1 ∈ Bk+1 for all times k ∈ {0, . . . , Tϕ − 1} with a probability of at least 1 − δ. By assumption, the optimization problem in (19) is feasible at each time k ∈ {0, . . . , Tϕ − 1}. Similarly to Corollaries 1 and 2, this allows us to conclude that Prob(µ(xk , Yk ) ≥ 0, ∀(k, π µ ) ∈ T × P ) ≥ 1 − δ for all times k ∈ T and for all predicates π µ ∈ P for sets (T, P ) ⊆ {0, . . . , Tϕ }×P where z̄kµ = 1 (qualitative encoding) and Prob(ρµ (S, k) ≥ r̄kµ , ∀(k, π µ ) ∈ {1, . . . , Tϕ } × P) ≥ 1 − δ (quantitative encoding). It is then straightforward to show, following the same reasoning in Theorems 2 and 3 that Prob(S |= ϕ) ≥ 1 − δ if z̄0ϕ = 1 (qualitative encoding) or r̄0ϕ > 0 (quantitative encoding) as enforced in equation (19d). 5 Case studies In this section, we illustrate our control method on two case studies. All simulations are conducted in Python 3 and we use SCIP [69] to solve the optimization problem. Our implementations are available at https://github.com/SAIDS-Lab/STL-Synthesis-among-Uncontrollable-Agents, where more details can be found. 5.1 Building temperature control We consider the temperature control of a public space in a hall with floor plan shown in Figure 1. Specifically, we can control the temperature of Room 1, which is the public space, while Rooms 2 and 3 are uncontrollable for us, e.g., these may be meeting rooms where the temperature can only be adjusted manually. Formally, the dynamics of the system are given as xk+1 = xk + τs (αe (Te − xk ) + αH (Th − xk )uk ), where the state xk ∈ X = [0, 45] denotes the temperature of the public space at time k, the control input uk ∈ U = [0, 1] is the ratio of the heater valve, τs is the sampling time, Th = 55◦ C is the 14 150 125 100 75 50 25 0 Empirical Coverage of (4) Frequency Frequency 150 Empirical Coverage of (3) 125 100 75 50 25 0 0.6 0.7 0.8 0.9 1.0 Percentage 0.7 0.8 0.9 Percentage 1.0 (j) (j) Figure 3: Empirical validation results about the coverage of ROL ≤ COL (left) and RCL ≤ CCL (right) in the temperature control case. open-loop results 28 26 24 22 20 18 16 14 0 10 20 28 26 24 22 20 18 16 14 30 closed-loop results (k=8) 0 10 20 30 28 26 24 22 20 18 16 14 closed-loop results (k=16) 28 closed-loop results (k=26) 26 24 22 20 18 16 14 0 10 20 30 0 10 20 Predicted region of Y1 Predicted region of Y2 Known states of x Known states of Y1 Known states of Y2 predicted states of x Predicted states of Y1 Predicted states of Y2 30 Figure 4: The result of the temperature control of the first case study. heater temperature, Te = 5◦ C is the outside temperature, and αe = 0.06 and αH = 0.08 are heat exchange coefficients. The initial state is x0 = 5. The model is adopted from [40]. System Specification: The task is to ensure that the difference between the temperatures in the public space and the two meeting rooms is bounded, e.g., such that during transition between rooms people feel comfortable. Let Yr2 and Yr3 denote the uncontrollable random variables describing the temperatures of Rooms 2 and 3. In particular, we require that, if the difference between x and the temperatures of Yr2 or Yr3 is larger than the threshold 5◦ C, then this difference should go back below the threshold of 5◦ C within the next 4 minutes. We set the sampling time to be τs = 2 minutes, and we describe the task as follows:   ϕ := G[1,60] x−Yr2 ≥ 5 ∨ x−Yr2 ≤ −5 ∨ x−Yr3 ≥ 5 ∨ x−Yr3 ≤ −5  =⇒ F[0,2] x−Yr2 ≤ 5 ∧ x−Yr2 ≥ −5 ∧ x−Yr3 ≤ 5 ∧ x−Yr3 ≥ −5 . In this case, we set the failure probability to be 15%, i.e., cost function is set to P62δ := 0.15. The 2 be the sum of squares of temperature differences as J := k=1 (x − Yr2 ) + (x − Yr3 )2 . Behavior of uncontrollable rooms: The group of people will adjust the temperature in the rooms based on their individual temperature preferences when they enter the meeting room. We assume the temperature variation in Yr1 and Yr2 generated by this adjustment is described by Newton’s Law of Cooling [70]. Data Collection: We collected 2000 trajectories of temperature trajectories for Rooms 2 and 3 – generated as described before. We split them into training (used for the trajectory predictor), calibration (used for conformal prediction), and test (used for validation) datasets with sizes |Dtrain | = 500, |Dcal | = 500 and |Dtest | = 1000, respectively. From the training data Dtrain , we train the long-short-term memory (LSTM) network to make predictions about the room temperature. We note that we warm-start the LSTM at time k = 0 with data, e.g., the LSTM is fed with 15 ROL Nonconformity Scores COL 20 RCL Nonconformity Scores CCL 15 Frequency Frequency 15 10 10 5 00.2 0.4 0.6 0.8 1.0 1.2 0 0.4 Score (j) 5 0.6 0.8 1.0 Score 1.2 1.4 (j) Figure 5: Nonconformity scores ROL (left) and RCL (right) on Dcal in the robot motion planning case. past information y−6 , . . . , y−1 . Multi-agent prediction regions. Fig. 2 shows histograms of the nonconformity scores (j) (j) ROL and RCL evaluated on Dcal . From these histograms of nonconformity scores, we compute COL := 0.941 and CCL := 0.968 according to Theorem 1. Next, we empirically validate the correctness of the prediction regions by checking whether or not (3) and (4) hold. We perform the following two experiments 1000 times each. In the first experiment, we randomly sample 50 calibration trajectories from Dcal and 50 test trajectories from Dtest . Then, we construct COL and CCL from the 50 calibration trajectories and compute the ratio of how many of the 50 test (j) (j) trajectories satisfy ROL ≤ COL and RCL ≤ CCL , respectively. Fig. 3 shows the histogram over these ratios, and we can observe that the result achieves the desired coverage 1 − δ = 0.85. In the second experiment, we randomly sample 50 calibration trajectories from Dcal and 1 test trajectories from Dtest . Then, we construct COL and CCL from the 50 calibration trajectories and check whether or (j) (j) not the sampled test data satisfies ROL ≤ COL and RCL ≤ CCL , respectively. We find a ratio of (j) (j) 0.867 for ROL ≤ COL and a ratio of 0.869 for RCL ≤ CCL , respectively, which further empirically confirms (3) and (4). For one of these test trajectories, Fig. 4 shows the prediction regions defined by Ŷτ |k and Cτ |k for τ > k where k = 0, k = 8, k = 16 and k = 26 from the left to the right respectively. Predictive control synthesis. Fig. 4 shows the result of the proposed control framework. Fig. 4(a) specifically presents the result of the open loop controller where the predict regions are large for large τ while Figs. 4(b)(c)(d) present the result of the closed loop controller at times k = 8, 16, 26, respectively. 5.2 Robot motion planning We consider the planar motion of a single robot (denoted by R1 ) with double integrator dynamics, where the system model and temporal logic task are similar to those in [71,72]. The system model, with a sampling period of 1 second, is described as     1 1 0 0 0.5 0 0 1 0 0  0  xk +  1  xk+1 =  0 0 1 1  0 0.5 uk , 0 0 0 1 0 1 where the state xk = [p1x vx1 p1y vy1 ]⊤ denotes the x-position, x-velocity, y-position and y-velocity, and where the control input uk = [ux uy ]T denotes the x-acceleration and y-acceleration, respectively. 16 Empirical Coverage of (3) 200 200 150 Frequency Frequency 150 Empirical Coverage of (4) 100 100 50 0 0.8 0.9 0 1.0 Percentage 50 0.7 0.8 0.9 1.0 Percentage (j) (j) Figure 6: Empirical validation results about the coverage of ROL ≤ COL (left) and RCL ≤ CCL (right) in the robot motion planning case. open-loop results 10 closed-loop results (k=6) 10 10 closed-loop results (k=12) 10 8 8 8 8 6 6 6 6 4 4 4 4 2 2 2 2 0 0 2 4 6 8 10 0 0 2 4 6 8 10 0 0 2 4 6 8 10 0 closed-loop results (k=18) Predicted regions of Y Known states of x Predicted states of x Known states of Y Predicted states of Y 0 2 4 6 8 10 Figure 7: The result of the robot motion planning of the second case study. The physical constraints are x ∈ X = [0, 10] × [−1.5, 1.5] × [0, 10] × [−1.5, 1.5] and u ∈ U = [−1, 1] × [−1, 1]. The initial state is x0 = [1, 0, 1, 0]⊤ . Furthermore, we have one more uncontrollable agent (denoted by R2 ). We denote its state as Yk = [Px2 Py2 ]⊤ and then define the joint state as sk = [p1x vx1 p1y vy1 Px2 Py2 ]⊤ . In this case, R1 is controllable and R2 is uncontrollable. System Specification: The objective of the two robots is to visit specific regions to perform specific tasks, such as collecting packages and transporting them to a final destination. For this collaborative task, R2 is the leader and R1 should follow R2 . Specifically, R2 should arrive and stay in the left blue region and top blue region, as shown in the Fig. 7, within the time intervals [4, 6] and [9, 13], respectively. After that, starting from a certain time instant between the time interval [16, 18], R1 and R2 should stay in bottom right red and blue regions respectively for at least 2 time instants. During the task execution, the distances between the two agents should be at most D meters such that the two robots can communicate or perform collaborative tasks. Furthermore, the two robots should always avoid the obstacles (grey shaded areas). In particular, such a task can be described by the following formula: ϕ := G[4,6] π µ1 ∧ G[9,13] π µ2 ∧ F[16,18] G[0,2] (π µ3 ∧ π µ4 ) ∧ G[0,20] (π µclose ∧ π µobs1 ∧ π µobs2 ), where µ1 := min(Px2 , 2 − Px2 , Py2 − 4, 6 − Py2 ), µ2 := min(Px2 − 3.5, 6.5 − Px2 , Py2 − 8, 10 − Py2 ), µ3 := min(Px2 − 8.5, 10 − Px2 , Py2 , 2 − Py2 ) and µ4 := min(p1x − 7, 8.5 − p1x , p1y , 2 − Py1 ) represent the tasks for R2 in the left blue region, R2 in the top blue region, R2 in the bottom right blue region, and R1 in the bottom red region, respectively. Here, µclose := D −min(Px2 −p1x , p1x −Px2 , Py2 −p1y , p1y −Py2 ) is the predicate regarding the distance requirement between the two robots where we set D = 2. The predicates µobs1 := min(max(1.6 − p1x , p1x − 2.6, 2 − p1y , p1y − 3), max(8.3 − p1x , p1x − 9.3, 6.5 − p1y , p1y − 7.5), max(5.7 − p1x , p1x − 6.7, 2.7 − p1y , p1y − 3.7)) and µobs2 := min(max(1.6 − Px2 , Px2 − 2.6, 2 − Py2 , Py2 − 3), max(8.3 − Px2 , Px2 − 9.3, 6.5 − Py2 , Py2 − 7.5), max(5.7 − Px2 , Px2 − 6.7, 2.7 − Py2 , Py2 − 3.7)) 17 describe the tasks for obstacle avoidance. From an individual robot perspective, we can decompose ϕ into two task for each of the two robots as follows ϕR1 := F[16,18] G[0,2] π µ4 ∧ G[0,20] (π µclose ∧ π µobs1 ), ϕR2 := G[4,6] π µ1 ∧ G[9,13] π µ2 ∧ F[16,18] G[0,2] π µ3 ∧ G[0,20] π µobs2 . As R2 will lead the task, its behavior will not be influenced by R1. On the other hand, R1 is responsible for the task µclose which means it should always track the behavior of R2 . Regarding the execution of the task, we only enforce R1 to achieve ϕR1 instead of ϕ, which makes sense in the leader-follower setting of this case study. Motion of uncontrollable agent. The motion of the uncontrollable agent R2 is described as follows. We set its dynamical system as yk+1 = f (yk , uyk ) + ωk where f (yk , uyk ) describes the same double integrator dynamics as for R1 , and ωk ∈ W is the uniformly distributed disturbance from the set W = ([−0.15, 0.15] × [0, 0])2 . We compute an closed-loop control sequence for R2 under the specification ϕR2 using a standard MIP encoding [27]. Data collection. We collected 2000 trajectories of the uncontrollable agent. We split the data into training, calibration, and test datasets with sizes |Dtrain | = 500, |Dcal | = 500 and |Dtest | = 1000, respectively. We again trained an LSTM on Dtrain for trajectory prediction. (j) (j) Prediction regions. Fig. 5 shows histograms of the nonconformity scores ROL and RCL evaluated on Dcal . Based on these nonconformity scores, we have that COL = 0.891 and CCL = 0.954 by using δ = 0.1. Next, we empirically validate the correctness of the prediction regions by checking whether or not (3) and (4) hold on Dtest by the same two experiments as the temperature control case. Fig. 6 shows the histogram over the result of the ratios in the first experiment, and we can observe that the result achieves the desired coverage 1 − δ = 0.9. In the second experiment, (j) (j) we obtain a ratio of 0.916 for ROL ≤ COL and a ratio of 0.904 for RCL ≤ CCL , respectively, which also empirically confirms (3) and (4). For one of these test trajectories, Fig. 7 shows the prediction regions defined by Ŷτ |k and Cτ |k for τ > k where k = 0, k = 6, k = 12 and k = 18, respectively. Predictive control synthesis. Fig. 7 shows the result of the proposed control framework, where we can see that robot R1 will use the prediction regions of R2 to compute a control input that results in task satisfaction. Fig. 7(a) specifically presents the result of the open loop controller while Figs. 4(b)(c)(d) present the result of the closed loop controller at times k = 6, 12, 18, respectively. 6 Conclusion In this paper, we presented a control framework for signal temporal logic (STL) control synthesis where the STL specification is defined over the dynamical system and uncontrollable dynamic agents. We use trajectory predictors and conformal prediction to provide prediction regions for the uncontrollable agents that are valid with high probability. Then, we formulated predictive control laws that consider the worst case realization of the uncontrollable agents within the prediction region. Specifically, we proposed a mixed integer program (MIP) that results in the probabilistic satisfaction of the STL specification. Finally, we presented an equivalent MIP program based on the KKT conditions of the original MIP program to obtain more efficient solutions. We provided two case studies for temperature control and robot motion planning, and we demonstrated the correctness and efficiency of our control framework. 18 References [1] M. Guo and D. Dimarogonas, “Multi-agent plan reconfiguration under local LTL specifications,” The International Journal of Robotics Research, vol. 34, no. 2, pp. 218–235, 2015. [2] M. Kloetzer and C. Belta, “Automatic deployment of distributed teams of robots from temporal logic motion specifications,” IEEE Transactions on Robotics, vol. 26, no. 1, pp. 48–61, 2009. [3] Y. Kantaros and M. Zavlanos, “Stylus*: A temporal logic optimal control synthesis algorithm for large-scale multi-robot systems,” The International Journal of Robotics Research, vol. 39, no. 7, pp. 812–836, 2020. [4] D. Sun, J. Chen, S. Mitra, and C. Fan, “Multi-agent motion planning from signal temporal logic specifications,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 3451–3458, 2022. [5] Y. Kantaros, M. Guo, and M. Zavlanos, “Temporal logic task planning and intermittent connectivity control of mobile robot networks,” IEEE Transactions on Automatic Control, vol. 64, no. 10, pp. 4105–4120, 2019. [6] A. Büyükkoçak, D. Aksaray, and Y. Yazicioglu, “Distributed planning of multi-agent systems with coupled temporal logic specifications,” in AIAA Scitech 2021 Forum, p. 1123, 2021. [7] L. Lindemann and D. Dimarogonas, “Control barrier functions for multi-agent systems under conflicting local signal temporal logic tasks,” IEEE control systems letters, vol. 3, no. 3, pp. 757–762, 2019. [8] M. Srinivasan, S. Coogan, and M. Egerstedt, “Control of multi-agent systems with finite time control barrier certificates and temporal logic,” in 2018 IEEE Conference on Decision and Control (CDC), pp. 1991–1996, IEEE, 2018. [9] S. Kalluraya, G. Pappas, and Y. Kantaros, “Multi-robot mission planning in dynamic semantic environments,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 1630–1637, IEEE, 2023. [10] B. Hoxha and G. Fainekos, “Planning in dynamic environments through temporal logic monitoring.,” in AAAI Workshop: Planning for Hybrid Systems, vol. 16, p. 12, 2016. [11] Y. Li, E. Shahrivar, and J. Liu, “Safe linear temporal logic motion planning in dynamic environments,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 9818–9825, IEEE, 2021. [12] M. Kloetzer and C. Mahulea, “LTL planning in dynamic environments,” IFAC Proceedings Volumes, vol. 45, no. 29, pp. 294–300, 2012. [13] A. Ulusoy and C. Belta, “Receding horizon temporal logic control in dynamic environments,” The International Journal of Robotics Research, vol. 33, no. 12, pp. 1593–1607, 2014. [14] A. Ayala, S. Andersson, and C. Belta, “Temporal logic control in dynamic environments with probabilistic satisfaction guarantees,” in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3108–3113, IEEE, 2011. 19 [15] P. Purohit and I. Saha, “Dt: Temporal logic path planning in a dynamic environment,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3627– 3634, IEEE, 2021. [16] L. Lindemann, M. Cleaveland, G. Shim, and G. Pappas, “Safe planning in dynamic environments using conformal prediction,” IEEE Robotics and Automation Letters, 2023. [17] A. Dixit, L. Lindemann, S. Wei, M. Cleaveland, G. Pappas, and J. Burdick, “Adaptive conformal prediction for motion planning among dynamic agents,” in Learning for Dynamics and Control Conference, pp. 300–314, PMLR, 2023. [18] S. Tonkens, S. Sun, R. Yu, and S. Herbert, “Scalable safe long-horizon planning in dynamic environments leveraging conformal prediction and temporal correlations,” [19] A. Muthali, H. Shen, S. Deglurkar, M. Lim, R. Roelofs, A. Faust, and C. Tomlin, “Multi-agent reachability calibration with conformal prediction,” arXiv preprint arXiv:2304.00432, 2023. [20] O. Maler and D. Nickovic, “Monitoring temporal properties of continuous signals,” in Conference on Formal Techniques, Modelling and Analysis of Timed and Fault Tolerant Systems, pp. 152–166, Springer, 2004. [21] G. Fainekos and G. Pappas, “Robustness of temporal logic specifications for continuous-time signals,” Theoretical Computer Science, vol. 410, no. 42, pp. 4262–4291, 2009. [22] A. Donzé and O. Maler, “Robust satisfaction of temporal logic over real-valued signals,” in International Conference on Formal Modeling and Analysis of Timed Systems, pp. 92–106, Springer, 2010. [23] Y. V. Pant, H. A., and R. Mangharam, “Smooth operator: Control using the smooth robustness of temporal logic,” in 2017 IEEE Conference on Control Technology and Applications (CCTA), pp. 1235–1240, IEEE, 2017. [24] Y. Pant, H. Abbas, R. Quaye, and R. Mangharam, “Fly-by-logic: Control of multi-drone fleets with temporal logic objectives,” in 2018 ACM/IEEE 9th International Conference on Cyber-Physical Systems (ICCPS), pp. 186–197, IEEE, 2018. [25] N. Mehdipour, C. Vasile, and C. Belta, “Arithmetic-geometric mean robustness for control from signal temporal logic specifications,” in American Control Conference, pp. 1690–1695, 2019. [26] Y. Gilpin, V. Kurtz, and H. Lin, “A smooth robustness measure of signal temporal logic for symbolic control,” IEEE Control Systems Letters, vol. 5, no. 1, pp. 241–246, 2020. [27] V. Raman, A. Donzé, M. Maasoumy, R. Murray, A. Sangiovanni-Vincentelli, and S. Seshia, “Model predictive control with signal temporal logic specifications,” in 53rd IEEE Conference on Decision and Control, pp. 81–87, IEEE, 2014. [28] S. Sadraddini and C. Belta, “Formal synthesis of control strategies for positive monotone systems,” IEEE Transactions on Automatic Control, vol. 64, no. 2, pp. 480–495, 2018. [29] V. Kurtz and H. Lin, “Mixed-integer programming for signal temporal logic with fewer binary variables,” IEEE Control Systems Letters, vol. 6, pp. 2635–2640, 2022. 20 [30] X. Yu, C. Wang, D. Yuan, S. Li, and X. Yin, “Model predictive control for signal temporal logic specifications with time interval decomposition,” arXiv preprint arXiv:2211.08031, 2022. [31] L. Lindemann and D. Dimarogonas, “Control barrier functions for signal temporal logic tasks,” IEEE control systems letters, vol. 3, no. 1, pp. 96–101, 2018. [32] M. Charitidou and D. Dimarogonas, “Receding horizon control with online barrier function design under signal temporal logic specifications,” IEEE Transactions on Automatic Control, 2022. [33] W. Xiao, C. Belta, and C. Cassandras, “High order control lyapunov-barrier functions for temporal logic specifications,” in 2021 American Control Conference (ACC), pp. 4886–4891, IEEE, 2021. [34] S. Farahani, R. Majumdar, V. Prabhu, and S. Soudjani, “Shrinking horizon model predictive control with chance-constrained signal temporal logic specifications,” in American Control Conference, pp. 1740–1746, 2017. [35] S. Jha, V. Raman, D. Sadigh, and S. Seshia, “Safe autonomy under perception uncertainty using chance-constrained temporal logic,” Journal of Automated Reasoning, vol. 60, pp. 43–62, 2018. [36] D. Sadigh and A. Kapoor, “Safe control under uncertainty with probabilistic signal temporal logic,” in Proceedings of Robotics: Science and Systems XII, 2016. [37] R. Ilyes, Q. Ho, and M. Lahijanian, “Stochastic robustness interval for motion planning with signal temporal logic,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 5716–5722, IEEE, 2023. [38] C. Vasile, V. Raman, and S. Karaman, “Sampling-based synthesis of maximally-satisfying controllers for temporal logic specifications,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3840–3847, IEEE, 2017. [39] G. Scher, S. Sadraddini, A. Yadin, and H. Kress-Gazit, “Ensuring reliable robot task performance through probabilistic rare-event verification and synthesis,” arXiv preprint arXiv:2304.14886, 2023. [40] P. Jagtap, S. Soudjani, and M. Zamani, “Formal synthesis of stochastic systems via control barrier certificates,” IEEE Transactions on Automatic Control, vol. 66, no. 7, pp. 3097–3110, 2020. [41] S. Safaoui, L. Lindemann, D. Dimarogonas, I. Shames, and T. Summers, “Control design for risk-based signal temporal logic specifications,” IEEE Control Systems Letters, vol. 4, no. 4, pp. 1000–1005, 2020. [42] P. Akella, A. Dixit, M. Ahmadi, J. Burdick, and A. Ames, “Sample-based bounds for coherent risk measures: Applications to policy synthesis and verification,” arXiv preprint arXiv:2204.09833, 2022. [43] N. Hashemi, X. Qin, J. Deshmukh, G. Fainekos, B. Hoxha, D. Prokhorov, and T. Yamaguchi, “Risk-awareness in learning neural controllers for temporal logic objectives,” in 2023 American Control Conference (ACC), pp. 4096–4103, IEEE, 2023. 21 [44] L. Lindemann, G. Pappas, and D. Dimarogonas, “Reactive and risk-aware control for signal temporal logic,” IEEE Transactions on Automatic Control, vol. 67, no. 10, pp. 5262–5277, 2021. [45] D. Gundana and H. Kress-Gazit, “Event-based signal temporal logic synthesis for single and multi-robot tasks,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 3687–3694, 2021. [46] D. Gundana and H. Kress-Gazit, “Event-based signal temporal logic tasks: Execution and feedback in complex environments,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10001–10008, 2022. [47] V. Raman, A. Donzé, D. Sadigh, R. Murray, and S. Seshia, “Reactive synthesis from signal temporal logic specifications,” in Proceedings of the 18th international conference on hybrid systems: Computation and control, pp. 239–248, 2015. [48] G. Shafer and V. Vovk, “A tutorial on conformal prediction.,” Journal of Machine Learning Research, vol. 9, no. 3, 2008. [49] V. Vovk, A. Gammerman, and G. Shafer, Algorithmic learning in a random world, vol. 29. Springer, 2005. [50] A. Angelopoulos and S. Bates, “A gentle introduction to conformal prediction and distributionfree uncertainty quantification,” arXiv preprint arXiv:2107.07511, 2021. [51] M. Fontana, G. Zeni, and S. Vantini, “Conformal prediction: a unified review of theory and new challenges,” Bernoulli, vol. 29, no. 1, pp. 1–23, 2023. [52] S. Yang, G. Pappas, R. Mangharam, and L. Lindemann, “Safe perception-based control under stochastic sensor uncertainty using conformal prediction,” arXiv preprint arXiv:2304.00194, 2023. [53] S. Su, S. Han, Y. Li, Z. Zhang, C. Feng, C. Ding, and F. Miao, “Collaborative multi-object tracking with conformal uncertainty propagation,” arXiv preprint arXiv:2303.14346, 2023. [54] M. Cauchois, S. Gupta, A. Ali, and J. Duchi, “Robust validation: Confident predictions even when distributions shift,” arXiv preprint arXiv:2008.04267, 2020. [55] I. Gibbs and E. Candes, “Adaptive conformal inference under distribution shift,” Advances in Neural Information Processing Systems, vol. 34, pp. 1660–1672, 2021. [56] A. Donzé and O. Maler, “Robust satisfaction of temporal logic over real-valued signals,” in Formal Modeling and Analysis of Timed Systems, pp. 92–106, 2010. [57] A. Dokhanchi, B. Hoxha, and G. Fainekos, “On-line monitoring for temporal logic robustness,” in Runtime Verification, pp. 231–246, Springer, 2014. [58] S. Sadraddini and C. Belta, “Robust temporal logic model predictive control,” in Annual Allerton Conference on Communication, Control, and Computing, pp. 772–779, IEEE, 2015. [59] H. Salehinejad, S. Sankar, J. Barfett, E. Colak, and S. Valaee, “Recent advances in recurrent neural networks,” arXiv preprint arXiv:1801.01078, 2017. [60] Y. Yu, X. Si, C. Hu, and J. Zhang, “A review of recurrent neural networks: Lstm cells and network architectures,” Neural computation, vol. 31, no. 7, pp. 1235–1270, 2019. 22 [61] K. Han, A. Xiao, E. Wu, J. Guo, C. Xu, and Y. Wang, “Transformer in transformer,” Advances in Neural Information Processing Systems, vol. 34, pp. 15908–15919, 2021. [62] R. Tibshirani, R. Foygel Barber, E. Candes, and A. Ramdas, “Conformal prediction under covariate shift,” Advances in neural information processing systems, vol. 32, 2019. [63] K. Stankeviciute, A. M Alaa, and M. van der Schaar, “Conformal time-series forecasting,” Advances in neural information processing systems, vol. 34, pp. 6216–6228, 2021. [64] M. Cleaveland, I. Lee, G. Pappas, and L. Lindemann, “Conformal prediction regions for time series using linear complementarity programming,” arXiv preprint arXiv:2304.01075, 2023. [65] I. Gibbs, J. Cherian, and E. Candès, “Conformal prediction with conditional guarantees,” arXiv preprint arXiv:2305.12616, 2023. [66] A. Bemporad and M. Morari, “Control of systems integrating logic, dynamics, and constraints,” Automatica, vol. 35, no. 3, pp. 407–427, 1999. [67] S. P. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004. [68] L. Lindemann, X. Qin, J. V. Deshmukh, and G. Pappas, “Conformal prediction for STL runtime verification,” in Proceedings of the ACM/IEEE 14th International Conference on CyberPhysical Systems (with CPS-IoT Week 2023), pp. 142–153, 2023. [69] K. Bestuzheva, M. Besançon, W. Chen, A. Chmiela, T. Donkiewicz, J. van Doornmalen, L. Eifler, O. Gaul, G. Gamrath, A. Gleixner, et al., “Enabling research through the scip optimization suite 8.0,” ACM Transactions on Mathematical Software, vol. 49, no. 2, pp. 1–21, 2023. [70] M. Vollmer, “Newton’s law of cooling revisited,” European Journal of Physics, vol. 30, no. 5, p. 1063, 2009. [71] L. Lindemann and D. Dimarogonas, “Robust motion planning employing signal temporal logic,” in American Control Conference, pp. 2950–2955, IEEE, 2017. [72] X. Yu, X. Yin, and L. Lindemann, “Efficient STL control synthesis under asynchronous temporal robustness constraints,” arXiv preprint arXiv:2307.12855, 2023. 23