| Crates.io | blazr |
| lib.rs | blazr |
| version | 0.1.0-beta.1 |
| created_at | 2025-12-05 10:30:59.236553+00 |
| updated_at | 2025-12-05 10:30:59.236553+00 |
| description | Blazing-fast inference server for oxidizr models (Mamba2 + MLA + MoE) |
| homepage | https://github.com/farhan-syah/blazr |
| repository | https://github.com/farhan-syah/blazr |
| max_upload_size | |
| id | 1967989 |
| size | 338,957 |
A blazing-fast inference server for hybrid neural architectures, supporting Mamba2 SSM, Multi-Head Latent Attention (MLA), Mixture of Experts (MoE), and standard transformers.
blazr exposes OpenAI-style `/v1/completions` and `/v1/chat/completions` endpoints.

To build from source:

```bash
# Clone the repository
git clone https://github.com/farhan-syah/blazr.git
cd blazr

# Build (CPU-only)
cargo build --release

# Build with CUDA support (requires CUDA 12.x)
cargo build --release --features cuda
```
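The usage examples below invoke `blazr` directly. If the binary is not on your `PATH`, a standard cargo install from the checkout is one way to get it there (a setup note, not a blazr-specific requirement):

```bash
# Install the release binary into ~/.cargo/bin
cargo install --path .

# Or run the freshly built binary in place
./target/release/blazr --help
```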
Generate text from a prompt:

```bash
blazr generate \
  --model ./checkpoints/nano \
  --prompt "Once upon a time" \
  --max-tokens 100
```
Start the inference server:

```bash
blazr serve --model ./checkpoints/nano --port 8080
```
Then make API requests:
```bash
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello, world!",
    "max_tokens": 50,
    "temperature": 0.7
  }'
```
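The `/v1/chat/completions` endpoint takes chat-style input. The request body below assumes the usual OpenAI-style `messages` array; that schema is an assumption based on the endpoint name rather than documented behavior:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a haiku about Rust."}
    ],
    "max_tokens": 50,
    "temperature": 0.7
  }'
```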
Display a model's configuration:

```bash
blazr info --model ./checkpoints/nano
```
blazr auto-detects and supports:

- Mamba2 state space model (SSM) layers
- Multi-Head Latent Attention (MLA)
- Mixture of Experts (MoE)
- Standard transformer attention layers

Models can mix and match these layer types freely.
The CLI provides three subcommands:

```bash
# Generate text from a prompt
blazr generate --model <path> --prompt "text" [OPTIONS]

# Start inference server
blazr serve --model <path> [--port 8080] [--host 0.0.0.0]

# Display model configuration
blazr info --model <path>
```
Generation options:

- `--max-tokens` - Maximum tokens to generate (default: 100)
- `--temperature` - Sampling temperature (default: 0.7)
- `--top-p` - Nucleus sampling threshold (default: 0.9)
- `--top-k` - Top-k sampling (default: 40)
- `--cpu` - Force CPU inference even if CUDA is available
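These flags compose with `generate`; for example (model path and values are illustrative):

```bash
# Conservative sampling, forced onto the CPU
blazr generate \
  --model ./checkpoints/nano \
  --prompt "Once upon a time" \
  --max-tokens 200 \
  --temperature 0.2 \
  --top-p 0.9 \
  --top-k 40 \
  --cpu
```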
blazr loads models from SafeTensors checkpoints:

```
checkpoint_dir/
├── model.safetensors   # Model weights
└── config.json         # Model configuration (optional)
```
If `config.json` is missing, blazr auto-detects the architecture from the tensor names in the checkpoint.
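To see which tensor names auto-detection has to work with, you can list a checkpoint's keys with the upstream Python `safetensors` package (a convenience outside blazr itself):

```bash
pip install safetensors numpy
python - <<'EOF'
from safetensors import safe_open

# Print every tensor name stored in the checkpoint
with safe_open("checkpoint_dir/model.safetensors", framework="np") as f:
    for name in f.keys():
        print(name)
EOF
```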
Apache-2.0 License - see LICENSE for details.
Contributions are welcome! Please open an issue or submit a pull request.