| Crates.io | gpu-mumu |
| lib.rs | gpu-mumu |
| version | 0.2.0-rc.1 |
| created_at | 2025-08-14 12:18:22.829913+00 |
| updated_at | 2025-10-03 15:43:28.155674+00 |
| description | GPU/Vulkan matrix and tensor operations for the mumu/lava language |
| homepage | |
| repository | |
| max_upload_size | |
| id | 1794790 |
| size | 63,030 |
A MuMu/Lava plugin that adds matrix & tensor operations with an optional Vulkan backend — and a zero-drama CPU fallback when no GPU is available.
Crate: gpu-mumu
Library name (cdylib): mumugpu → built as libmumugpu.{so|dylib} (Windows: mumugpu.dll)
Version: 0.2.0-rc.1
Engine compatibility: core-mumu = 0.9.0-rc.3
License: MIT OR Apache-2.0
Repository: https://gitlab.com/tofo/gpu-mumu
Homepage: https://lava.nu11.uk
Under the hood the crate ships GLSL compute shaders (built to SPIR-V if glslc is available at build time) alongside robust CPU implementations to guarantee portability.
Load the plugin and multiply two matrices:
extend("gpu")
A = [
[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]
]
B = [
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]
]
AT = gpu:to_tensor(A) // validate & convert to Float2DArray
BT = gpu:to_tensor(B)
CT = gpu:multiply(AT, BT) // (4×4) · (4×4) -> (4×4)
slog(gpu:to_array(CT)) // -> [[1,2,3,4], [5,6,7,8], ...]
The loader resolves extend("gpu") to a shared library named libmumu**gpu**.{so|dylib} (Windows: mumu**gpu**.dll) using the search paths documented by the core engine.
All functions are registered as dynamic MuMu functions when the plugin is loaded.
Types below are MuMu runtime types from core-mumu:
| Function | Signature | Returns | Notes |
|---|---|---|---|
| gpu:to_tensor | `(Int2DArray \| Float2DArray)` | Float2DArray | Validates rectangular shape and converts ints to floats. |
| gpu:to_array | `(Float2DArray)` | Float2DArray | Identity helper (useful to signal intent when composing). |
| gpu:multiply | `(Float2DArray A, Float2DArray B)` | Float2DArray | Matrix product (m×k) · (k×n) -> (m×n). Errors on ragged rows or incompatible dimensions. |
| gpu:add | `(Float2DArray A, Float2DArray B)` | Float2DArray | Elementwise sum. Shapes must match exactly. |
| gpu:subtract | `(Float2DArray A, Float2DArray B)` | Float2DArray | Elementwise difference. Shapes must match. |
| gpu:hadamard | `(Float2DArray A, Float2DArray B)` | Float2DArray | Elementwise product (Hadamard). Shapes must match. |
| gpu:transpose | `(Float2DArray T)` | Float2DArray | Transpose m×n -> n×m. Validates rectangular rows. |
| gpu:inverse | `(Float2DArray T)` | Float2DArray (2×2) | Only 2×2 currently. Errors if singular or wrong size. |
| gpu:reduce_sum | `(Float2DArray T)` | Float | Sum of all elements. |
| gpu:scale | `(Int \| Float scalar, Float2DArray T)` | Float2DArray | Multiplies every element by the scalar. |
| Function | Signature | Returns | Notes |
|---|---|---|---|
| gpu:last_call | `()` | KeyedArray { op: string, used_gpu: bool } | Inspects the last GPU function call. `used_gpu` indicates whether a Vulkan context was active for that call (some ops currently run on CPU even if a context exists). |
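A minimal check (illustrative sketch; the exact KeyedArray formatting printed by slog may differ):
extend("gpu")
T = gpu:to_tensor([[1, 2], [3, 4]])
gpu:multiply(T, T)
slog(gpu:last_call()) // e.g. { op: "multiply", used_gpu: false } when no Vulkan device is present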
extend("gpu"), the plugin tries to create a Vulkan device using ash.gpu:last_call() makes this explicit.Float2DArray in the core runtime.gpu:to_tensor acts as an ingest gate: it validates rectangular shapes and
normalizes ints to floats, so the rest of the API can assume dense float
matrices. Most ops will error on ragged rows or mismatched shapes.AshVulkanContext is stored in a global Arc<Mutex<Option<_>>>.This crate builds a cdylib for dynamic loading. Typical flows:
# Build with Cargo (release)
cargo build --release
# Or use the provided Makefile (build + copy .so to /usr/local/lib)
make
sudo make install
Vulkan & shader notes
- A working Vulkan loader/runtime enables the GPU context.
- If glslc is in PATH, build.rs compiles shaders in shader/*.glsl to SPIR-V and embeds them; otherwise the build continues with a warning.
- The plugin remains fully functional on CPU without glslc or GPU drivers.
Dependencies:
- core-mumu = 0.9.0-rc.3 (dynamic function registry, MuMu Value types).
- ash = 0.38 (optional at runtime; CPU works without GPU).
- anyhow, log, env_logger, lazy_static, indexmap, libloading.

Web/WASM is not a target for this crate (host-only by design).
extend("gpu") prints “plugin could not be located”
→ Ensure libmumugpu.{so|dylib|dll} is on a loader search path
(core engine looks in common system locations and $MUMU_PLUGIN_PATH).
“No Vulkan physical devices found” on load
→ That’s OK. The plugin will use the CPU reference path.
Want to see what happened?
- RUST_LOG=info to see setup logs from the Vulkan context.
- LAVA_TIMING_VERBOSE=1 to make the core REPL/driver print timing ticks.
- gpu:last_call() to inspect op and used_gpu.

Elementwise operations and reductions:
extend("gpu")
T1 = gpu:to_tensor([[1,2,3],[4,5,6]])
T2 = gpu:to_tensor([[6,5,4],[3,2,1]])
slog(gpu:add(T1, T2)) // -> [[7,7,7],[7,7,7]]
slog(gpu:hadamard(T1, T2)) // -> [[6,10,12],[12,10,6]]
slog(gpu:reduce_sum(T1)) // -> 21
slog(gpu:scale(0.5, T1)) // -> [[0.5,1,1.5],[2,2.5,3]]
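gpu:subtract follows the same pattern; continuing the snippet above (expected values computed by hand):
slog(gpu:subtract(T1, T2)) // -> [[-5,-3,-1],[1,3,5]]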
Matrix multiply and transpose:
extend("gpu")
A = gpu:to_tensor([[1,2],[3,4]]) // 2×2
B = gpu:to_tensor([[4,3],[2,1]]) // 2×2
C = gpu:multiply(A, B) // -> 2×2
slog(gpu:to_array(gpu:transpose(C)))
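A 2×2 inverse, the only size gpu:inverse currently supports (sketch; expected values computed by hand):
extend("gpu")
M = gpu:to_tensor([[4, 7], [2, 6]])
MI = gpu:inverse(M) // errors if the matrix is singular or not 2×2
slog(gpu:to_array(MI)) // -> [[0.6, -0.7], [-0.2, 0.4]]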
Examples intentionally stay small; consult the function table for signatures.
- src/lib.rs — dynamic entrypoint and registration.
- src/registration.rs — registers all gpu:* functions into the engine.
- src/operators/* — operation bridges & helpers (ensure_float2d, elementwise, conversions).
- src/cpu_ops.rs — CPU reference implementations (multiply, transpose, reduce, scale, 2×2 inverse).
- src/vulkan.rs — ash-based Vulkan context initialisation.
- shader/*.glsl — compute kernels (compiled by build.rs if glslc is present).
- examples/4x4.mu — tiny end-to-end sample script.

This crate follows pre-release semver while the MuMu/Lava engine evolves.
The API is expected to stabilise with the 0.2.x series.
Licensed under either of:
- MIT License
- Apache License, Version 2.0

at your option.
Built for the MuMu/Lava ecosystem. Thanks to the ash project and the Vulkan community.
If you have ideas, issues, or want to wire more ops to the GPU kernels, please open an issue or MR at GitLab: https://gitlab.com/tofo/gpu-mumu.