| Field | Value |
|---|---|
| Crates.io | train-station |
| lib.rs | train-station |
| version | 0.3.0 |
| created_at | 2025-08-29 00:01:03.387544+00 |
| updated_at | 2025-09-29 22:47:56.661827+00 |
| description | A high-performance, PyTorch-inspired, zero-dependency Rust machine learning library |
| homepage | https://github.com/ewhinery8/train-station |
| repository | https://github.com/ewhinery8/train-station |
| max_upload_size | |
| id | 1814919 |
| size | 2,707,840 |
A zero-dependency, PyTorch-inspired, maximum-performance Rust machine learning library.
Pre-1.0 notice: The public API is still evolving. Until 1.0, breaking changes may occur in minor releases (e.g., 0.x → 0.(x+1)). Pin versions accordingly if you need stability.
Train Station’s purpose is to advance research. It provides low-level control and simple, composable building blocks so you can construct larger objects and full networks with confidence. We aim to be a solid foundation for the next generation of AI architectures, training procedures, and systems.
Note on data types: the core currently targets f32 tensors. We will expand to additional data types over time.
```rust
use train_station::{Tensor, Adam};

// Forward pass: a single linear layer followed by ReLU.
let x = Tensor::randn(vec![32, 784], None);
let mut w = Tensor::randn(vec![784, 128], None).with_requires_grad();
let mut b = Tensor::zeros(vec![128]).with_requires_grad();
let y = x.matmul(&w).add_tensor(&b).relu();

// Backward pass: accumulate gradients for w and b.
let loss = y.sum();
loss.backward(None);

// Optimizer step.
let mut opt = Adam::new();
opt.add_parameters(&[&w, &b]);
opt.step(&mut [&mut w, &mut b]);
```
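For intuition about what `loss.backward(None)` computes above: the gradient of the summed ReLU output with respect to each weight. A standalone, std-only sketch (no train-station dependency; names and the scalar reduction are illustrative) that checks the ReLU-chain gradient against a finite-difference approximation:

```rust
// Std-only sketch: the scalar analogue of loss = sum(relu(x*w + b)),
// comparing the analytic gradient d(loss)/dw with finite differences.
fn loss(x: f32, w: f32, b: f32) -> f32 {
    (x * w + b).max(0.0) // relu
}

fn main() {
    let (x, w, b) = (2.0f32, 0.5f32, 0.1f32);

    // Analytic gradient: d(loss)/dw = x when the ReLU is active, else 0.
    let analytic = if x * w + b > 0.0 { x } else { 0.0 };

    // Central finite-difference approximation.
    let eps = 1e-3f32;
    let numeric = (loss(x, w + eps, b) - loss(x, w - eps, b)) / (2.0 * eps);

    assert!((analytic - numeric).abs() < 1e-2);
    println!("analytic = {analytic}, numeric = {numeric}");
}
```

This is the property any reverse-mode autograd implementation must satisfy, and finite differencing is the standard way to validate it.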
Featured runnable examples from the `examples/` folder (quick start):

**Neural networks (building blocks)**
- `cargo run --release --example basic_linear_layer`
- `cargo run --release --example feedforward_network`
- See `examples/neural_networks/*` for more.

**Supervised learning**
- `cargo run --release --example supervised_bce`
- `cargo run --release --example supervised_regression`
- `cargo run --release --example supervised_classification`

**Reinforcement learning (small YardEnv control tasks)**
- `cargo run --release --example dqn`
- `cargo run --release --example td3`
- `cargo run --release --example ppo_continuous`
- `cargo run --release --example ppo_discrete`

Tip: run with `--release` for speed. Some RL examples support env vars (e.g., `DQN_STEPS`, `PPO_STEPS`) to adjust runtime.
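The env-var overrides follow a common pattern: read the variable, parse it, and fall back to a default. A std-only sketch (the variable names match the tip above; the default value and helper name are illustrative, not the examples' actual code):

```rust
use std::env;

/// Read a step-count override like DQN_STEPS from the environment,
/// falling back to a default when the variable is unset or unparsable.
fn steps_from_env(var: &str, default: usize) -> usize {
    env::var(var)
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(default)
}

fn main() {
    // Simulates running `DQN_STEPS=500 cargo run --release --example dqn`.
    env::set_var("DQN_STEPS", "500");
    assert_eq!(steps_from_env("DQN_STEPS", 10_000), 500);

    // Unset variables fall back to the default.
    assert_eq!(steps_from_env("PPO_STEPS", 10_000), 10_000);
    println!("ok");
}
```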
For the most up-to-date notes:
Latest: https://github.com/ewhinery8/train-station/releases/latest
All releases (browse recent three): https://github.com/ewhinery8/train-station/releases
Design highlights:

- **Memory management**: pooled allocation by default; `with_no_mem_pool` forces system allocation for those allocations.
- **Strided views**: `as_strided`/slices stay in-bounds; offsets are validated before construction.
- **Gradient lifecycle**: `retain_grad`, `grad_or_fetch`, and `clear_*` helpers manage lifecycle deterministically.
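The strided-view guarantee above (`as_strided`/slices stay in-bounds; offsets validated before construction) comes down to mapping a logical index through strides to a flat buffer offset and rejecting anything out of range. A std-only sketch of that mapping (illustrative; not the crate's actual implementation):

```rust
/// Map a logical index to a flat buffer offset through strides,
/// validating bounds the way a safe as_strided-style view must.
fn strided_offset(
    index: &[usize],
    shape: &[usize],
    strides: &[usize],
    buf_len: usize,
) -> Option<usize> {
    let mut off = 0usize;
    for ((&i, &dim), &stride) in index.iter().zip(shape).zip(strides) {
        if i >= dim {
            return None; // index out of range for this dimension
        }
        off += i * stride;
    }
    // Reject offsets past the end of the backing buffer.
    (off < buf_len).then_some(off)
}

fn main() {
    // A 2x3 row-major view over a 6-element buffer: strides = [3, 1].
    assert_eq!(strided_offset(&[1, 2], &[2, 3], &[3, 1], 6), Some(5));
    // A transposed (3x2) view over the same buffer: strides = [1, 3].
    assert_eq!(strided_offset(&[2, 1], &[3, 2], &[1, 3], 6), Some(5));
    // An out-of-range index is rejected before any memory access.
    assert_eq!(strided_offset(&[2, 0], &[2, 3], &[3, 1], 6), None);
}
```

Validating at view-construction time, as the crate describes, means this check runs once per view rather than on every element access.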
| Category | Ops | Broadcasting | SIMD | Autograd |
|---|---|---|---|---|
| Element-wise | add, sub, mul, div | Yes (NumPy rules) | AVX2 (runtime dispatch) | Yes |
| Activations | relu, leaky_relu, sigmoid, tanh, softmax | N/A (shape-preserving) | ReLU/sqrt paths SIMD where applicable | Yes |
| Math | exp, log, sqrt, pow | N/A | sqrt SIMD; others optimized scalar | Yes |
| Matrix | matmul | Yes (batched ND) | AVX512/AVX2/SSE2 kernels | Yes |
| Transforms | reshape, transpose, slice, as_strided, element_view | Zero-copy views | N/A | Yes (view mappings) |
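"Yes (NumPy rules)" in the table means right-aligned dimension matching: shapes are compared from the trailing dimension backward, and each pair must be equal or one of them must be 1. A std-only sketch of that rule (illustrative; not train-station's actual shape code):

```rust
/// Compute the broadcast output shape under NumPy rules:
/// align shapes from the right; each dim pair must match or one must be 1.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // Missing leading dimensions are treated as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        match (da, db) {
            (x, y) if x == y => out.push(x),
            (1, y) => out.push(y),
            (x, 1) => out.push(x),
            _ => return None, // incompatible dimensions
        }
    }
    out.reverse();
    Some(out)
}

fn main() {
    // The bias add from the quick-start example: [32, 784] + [784].
    assert_eq!(broadcast_shape(&[32, 784], &[784]), Some(vec![32, 784]));
    // Size-1 dims stretch to match the other operand.
    assert_eq!(broadcast_shape(&[8, 1, 4], &[3, 1]), Some(vec![8, 3, 4]));
    // Mismatched trailing dims are rejected.
    assert_eq!(broadcast_shape(&[2, 3], &[4]), None);
}
```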
Real-world, apples-to-apples comparisons vs LibTorch (CPU): benchmark charts are available in the repository.
```toml
[dependencies]
train-station = "0.3"
```
For detailed platform matrices, cross-compilation, and feature flags, see the original README.md.
The `cuda` feature is experimental and not ready for general use. It currently exposes scaffolding only; CPU is the supported path. Expect breaking changes while this area evolves. The core targets `f32` today, with additional data types planned while preserving ergonomics and speed.

Built for speed. Validated for correctness. Iterate faster.