Crates.io | pure-stage |
lib.rs | pure-stage |
version | 0.1.0 |
created_at | 2025-06-15 07:31:43.83726+00 |
updated_at | 2025-06-15 07:31:43.83726+00 |
description | A library for building and running simulations of distributed systems. |
homepage | https://github.com/pragma-org/amaru |
repository | https://github.com/pragma-org/amaru |
id | 1713041 |
size | 191,827 |
Design goals:
We will need to model state machines, processing network nodes, and wiring between those nodes.
Writing state machines in Rust can be done explicitly using an `enum` that implements a trait for computing `state × input → state × output`.
One advantage of this approach is that infrastructure like a (virtual) clock can be passed into the transition function as well.
The main disadvantage is that additional states are needed for modelling (biased) input operations, complicating the state machine code.
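As a minimal sketch of the explicit approach, the following uses a hypothetical `StateMachine` trait and a hypothetical `Limiter` stage (neither name comes from this crate). It only illustrates the shape `state × input → state × output` with a virtual clock passed into the transition function.

```rust
use std::time::Duration;

/// Hypothetical trait for `state × input → state × output`;
/// `now` is a virtual clock reading supplied by the infrastructure.
trait StateMachine: Sized {
    type Input;
    type Output;
    fn transition(self, now: Duration, input: Self::Input) -> (Self, Vec<Self::Output>);
}

/// Illustrative stage: accepts one request, then rejects for one virtual second.
#[derive(Debug, Clone, PartialEq)]
enum Limiter {
    Idle,
    Busy { since: Duration },
}

#[derive(Debug, PartialEq)]
enum Out {
    Accepted(u32),
    Rejected(u32),
}

impl StateMachine for Limiter {
    type Input = u32;
    type Output = Out;

    fn transition(self, now: Duration, input: u32) -> (Self, Vec<Out>) {
        match self {
            // First request: accept it and remember when we became busy.
            Limiter::Idle => (Limiter::Busy { since: now }, vec![Out::Accepted(input)]),
            // Still within the busy window: reject and keep the state unchanged.
            Limiter::Busy { since } if now.saturating_sub(since) < Duration::from_secs(1) => {
                (Limiter::Busy { since }, vec![Out::Rejected(input)])
            }
            // Busy window elapsed: accept and restart the window.
            Limiter::Busy { .. } => (Limiter::Busy { since: now }, vec![Out::Accepted(input)]),
        }
    }
}

fn main() {
    let mut state = Limiter::Idle;
    for (millis, input) in [(0u64, 1u32), (500, 2), (1500, 3)] {
        let (next, out) = state.transition(Duration::from_millis(millis), input);
        // The state is a plain value, so it can be logged after every step.
        println!("t={}ms state={:?} out={:?}", millis, next, out);
        state = next;
    }
}
```

Because the state is an ordinary value, it can be printed, compared, or stored between transitions, which is the property the later design discussion relies on.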
Another syntactic means to this end is an `async fn`, which gets translated by the compiler into a state machine like the one above, albeit with a limited transition function: the `Future::poll()` method.
When using this approach, effects like input and output need to be offered as `Future`s; the programmer may also await other `Future`s which perform untracked effects.
Such abuse can only be rejected using runtime checks because the return type of an `.await` point cannot be constrained.
The downside of the second approach is that the internal state of a stage would be inaccessible because it is wrapped in the opaque `Future` generated by the compiler.
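The `async fn` approach can be illustrated with a std-only sketch. The `Recv` effect, the `stage` function, and the `drive` helper are hypothetical names chosen for this example and are not part of this crate. The point is that the only way to advance the compiler-generated state machine is `Future::poll()`, and nothing about its intermediate state is visible from outside.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical input effect: pends once (as if waiting for a message), then yields a value.
struct Recv {
    value: u32,
    yielded: bool,
}

impl Future for Recv {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        if self.yielded {
            Poll::Ready(self.value)
        } else {
            self.yielded = true;
            Poll::Pending
        }
    }
}

/// The compiler turns this into an opaque state machine with one state per `.await`.
async fn stage() -> u32 {
    let a = Recv { value: 1, yielded: false }.await;
    let b = Recv { value: 2, yielded: false }.await;
    a + b
}

/// Waker that does nothing; sufficient for a busy-polling driver.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future to completion and counts the `poll()` calls it took.
fn drive<F: Future>(fut: F) -> (F::Output, usize) {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        // Between polls we only learn "pending" or "ready"; `a` and `b` stay hidden.
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}

fn main() {
    let (result, polls) = drive(stage());
    println!("result={} after {} polls", result, polls);
}
```

A driver like `drive` can count transitions, but it has no way to log what the stage currently holds, which is the limitation described above.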
One way to fix this is to compromise: offer all effects apart from receiving input as `Future`s and thus have the programmer write an explicit state machine that only needs to switch for inputs from upstream, not results from other effects.
Since logging state progression is a rather important debugging tool, we are going for this variant.
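The compromise might look roughly like the sketch below. The names `Counter`, `Effects`, `transition`, `block_on`, and `run` are assumptions made for illustration and do not reflect this crate's actual API. Input arrives as a plain function argument, other effects are awaited, and the explicit state is returned so the runner can log it after every input.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Explicit, inspectable stage state.
#[derive(Debug, Clone, PartialEq)]
struct Counter {
    total: u64,
}

/// Hypothetical effects handle; `send` stands in for any non-input effect.
struct Effects {
    sent: RefCell<Vec<u64>>,
}

impl Effects {
    async fn send(&self, msg: u64) {
        self.sent.borrow_mut().push(msg);
    }
}

/// One transition per input: the input is an argument, other effects are awaited.
async fn transition(state: Counter, input: u64, eff: &Effects) -> Counter {
    let total = state.total + input;
    eff.send(total).await;
    Counter { total }
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, adequate for this sketch only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Feeds inputs one at a time and records the state after each of them.
fn run(inputs: &[u64]) -> (Vec<Counter>, Vec<u64>) {
    let eff = Effects { sent: RefCell::new(Vec::new()) };
    let mut state = Counter { total: 0 };
    let mut trace = Vec::new();
    for &input in inputs {
        state = block_on(transition(state, input, &eff));
        trace.push(state.clone()); // state progression is visible between inputs
    }
    (trace, eff.sent.into_inner())
}

fn main() {
    let (trace, sent) = run(&[1, 2, 3]);
    println!("states: {:?}", trace);
    println!("sent:   {:?}", sent);
}
```

The state machine is explicit only at input boundaries; inside a transition the programmer still gets the convenience of `.await` for the remaining effects.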
The main purpose of an API for declaring processing nodes is to obtain type-level information on the connectivity expected by this node, i.e. the input message type, the typed outputs, and the internally handled state (for inspection from the outside during tests).
Whether the state machine inside is implemented explicitly or using `async fn` doesn't matter at this level.
While Rust can in principle model type-state to track what has already been wired, this is inconvenient in practice because it requires shadowing and thus restricts the code a programmer can write. Therefore, a compromise would be to establish that all processing nodes are declared first, followed by the declaration of the wiring. The wiring function ensures that message types do match, but it won't prevent connecting the same output to multiple inputs or multiple outputs to the same input; in fact, this freedom is quite desirable.
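The declare-then-wire idea can be sketched as follows, using hypothetical `Network`, `Input`, and `Output` types that are not taken from this crate and that give each stage a single input and a single output for brevity. The shared type parameter `M` on `wire` makes the compiler check message types, while fan-out and fan-in remain possible.

```rust
use std::marker::PhantomData;

/// Typed handle to a stage's input (hypothetical).
struct Input<M> {
    stage: usize,
    _msg: PhantomData<M>,
}

/// Typed handle to a stage's output (hypothetical).
struct Output<M> {
    stage: usize,
    _msg: PhantomData<M>,
}

#[derive(Default)]
struct Network {
    names: Vec<String>,
    /// Pairs of (sending stage index, receiving stage index).
    wires: Vec<(usize, usize)>,
}

impl Network {
    /// Step 1: declare a stage and obtain its typed connection points.
    fn stage<In, Out>(&mut self, name: &str) -> (Input<In>, Output<Out>) {
        let stage = self.names.len();
        self.names.push(name.to_string());
        (
            Input { stage, _msg: PhantomData },
            Output { stage, _msg: PhantomData },
        )
    }

    /// Step 2: wire an output to an input; both must carry the same message type `M`.
    fn wire<M>(&mut self, from: &Output<M>, to: &Input<M>) {
        self.wires.push((from.stage, to.stage));
    }
}

fn build() -> Network {
    let mut net = Network::default();
    // All stages are declared first ...
    let (_src_in, src_out) = net.stage::<(), u32>("source");
    let (a_in, a_out) = net.stage::<u32, String>("a");
    let (b_in, b_out) = net.stage::<u32, String>("b");
    let (sink_in, _sink_out) = net.stage::<String, ()>("sink");
    // ... and the wiring is declared afterwards.
    net.wire(&src_out, &a_in);
    net.wire(&src_out, &b_in); // same output to multiple inputs
    net.wire(&a_out, &sink_in);
    net.wire(&b_out, &sink_in); // multiple outputs to the same input
    // net.wire(&src_out, &sink_in); // rejected at compile time: u32 vs String
    net
}

fn main() {
    let net = build();
    for (from, to) in &net.wires {
        println!("{} -> {}", net.names[*from], net.names[*to]);
    }
}
```

No type-state is involved: the handles stay usable after wiring, so nothing needs to be shadowed and the same handle can be wired repeatedly.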
`Call` effect to allow internal asynchronous collaboration before accepting the next input (to keep back pressure intact in this case)
`Clock` and `Wait` effects
Storing all effects and responses requires serialization (because trait objects cannot implement `Clone`), which in turn requires some gymnastics due to the incompatibility of serde's `Serialize` and `Deserialize` traits with trait objects.
While `typetag` can solve the serialization issue (via `erased_serde::Serialize`), deserialization will always require the full and concrete target type to be known plus matching deserialization code.
The solution `typetag` provides therefore cannot support generic types, which closes the door on any kind of type parameter occurring within messages, states, and effects in the system.
This is clearly a restriction that weighs too heavily, thus a different solution is required.
The current design shifts all deserialization to places where the concrete target type can be named and where the compiler can therefore instantiate a `Deserialize` implementation.
This unfortunately requires some trace entries to be deserialized multiple times: the information for the simulation machinery is deserialized in the generic parts, and the specific application data type (for messages or effect responses) is then deserialized from the generic `cbor4ii::core::Value` on the application side of the airlock; let us give thanks to the universe for schemaless deserialization in this context.
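The two-pass idea can be sketched without any dependencies. In the sketch below, the `Value` enum is a simplified stand-in for a schemaless value such as `cbor4ii::core::Value`, and the `FromValue` trait is a stand-in for `serde::Deserialize`; both, along with `TraceEntry`, `Block`, `decode_entry`, and `decode_payload`, are assumptions for illustration and not this crate's actual types.

```rust
/// Simplified stand-in for a schemaless value (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Text(String),
    Array(Vec<Value>),
}

/// What the generic machinery needs: routing data is typed, the payload stays schemaless.
#[derive(Debug, Clone, PartialEq)]
struct TraceEntry {
    stage: String,
    payload: Value,
}

/// Stand-in for a deserialization trait; implemented per concrete type.
trait FromValue: Sized {
    fn from_value(v: &Value) -> Option<Self>;
}

impl FromValue for i64 {
    fn from_value(v: &Value) -> Option<Self> {
        match v {
            Value::Integer(i) => Some(*i),
            _ => None,
        }
    }
}

impl FromValue for String {
    fn from_value(v: &Value) -> Option<Self> {
        match v {
            Value::Text(s) => Some(s.clone()),
            _ => None,
        }
    }
}

/// A generic application message; the type parameter is what a registry of
/// concrete type tags cannot express.
#[derive(Debug, PartialEq)]
struct Block<T> {
    height: i64,
    body: T,
}

impl<T: FromValue> FromValue for Block<T> {
    fn from_value(v: &Value) -> Option<Self> {
        match v {
            Value::Array(items) if items.len() == 2 => Some(Block {
                height: i64::from_value(&items[0])?,
                body: T::from_value(&items[1])?,
            }),
            _ => None,
        }
    }
}

/// First pass (generic side): decode only the envelope, keep the payload as `Value`.
fn decode_entry(raw: &Value) -> Option<TraceEntry> {
    match raw {
        Value::Array(items) if items.len() == 2 => Some(TraceEntry {
            stage: String::from_value(&items[0])?,
            payload: items[1].clone(),
        }),
        _ => None,
    }
}

/// Second pass (application side): the caller names the concrete type `T`.
fn decode_payload<T: FromValue>(entry: &TraceEntry) -> Option<T> {
    T::from_value(&entry.payload)
}

fn main() {
    let raw = Value::Array(vec![
        Value::Text("chain".to_string()),
        Value::Array(vec![Value::Integer(42), Value::Text("tx".to_string())]),
    ]);
    let entry = decode_entry(&raw).expect("envelope decodes");
    let block: Option<Block<String>> = decode_payload(&entry);
    println!("stage={} block={:?}", entry.stage, block);
}
```

The payload is visited twice, once when it is copied into the envelope and once when the application decodes it, which mirrors the cost described above in exchange for supporting type parameters.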