Crates.io | minillm |
lib.rs | minillm |
version | 0.1.1 |
created_at | 2025-09-20 18:28:03.187205+00 |
updated_at | 2025-09-25 17:51:18.083665+00 |
description | A mini inference engine for running transformer language models |
homepage | https://github.com/bmqube/minillm |
repository | https://github.com/bmqube/minillm |
max_upload_size | |
id | 1848106 |
size | 137,443 |
A lightweight, efficient transformer inference engine written in Rust. MiniLLM provides a clean, well-documented implementation of GPT-2-style transformer models with support for text generation.
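To pull the crate into your own project, add it under `[dependencies]`; the version below is taken from the crate metadata above:

```toml
[dependencies]
minillm = "0.1.1"
```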
src/
├── lib.rs          # Library entry point and public API
├── main.rs         # Simple CLI example (clean 27 lines)
├── inference.rs    # High-level inference engine
├── gpt.rs          # GPT model implementation
├── transformer.rs  # Transformer block components
├── attention.rs    # Multi-head attention mechanism
├── mlp.rs          # Feed-forward network layers
├── tensor.rs       # Tensor operations and math
├── weights.rs      # Model weight loading (SafeTensors)
└── config.rs       # Model configuration handling

examples/
├── basic_generation.rs  # Simple text generation
├── interactive_chat.rs  # Interactive chat interface
└── tokenization.rs      # Tokenization examples
use minillm::inference::InferenceEngine;

fn main() -> minillm::Result<()> {
    // Load a GPT-2 model
    let engine = InferenceEngine::new("openai-community/gpt2")?;

    // Generate text
    let prompt = "The future of AI is";
    let generated = engine.generate(prompt, 20)?;
    println!("Generated: {}", generated);

    Ok(())
}
# Run the main example
cargo run
# Run specific examples
cargo run --example basic_generation
cargo run --example interactive_chat
cargo run --example tokenization
Set your HuggingFace token:
echo "HF_TOKEN=your_token_here" > .env
MiniLLM builds on a small set of core crates:

- `ndarray` - Tensor operations
- `safetensors` - Model weight loading
- `tokenizers` - Text tokenization
- `hf-hub` - HuggingFace model downloading
- `serde` - Configuration parsing

The main high-level interface is `InferenceEngine`:
// Create an engine
let engine = InferenceEngine::new("openai-community/gpt2")?;

// Generate text
let max_tokens = 20;
let result = engine.generate("prompt", max_tokens)?;

// Tokenize and decode
let tokens = engine.tokenize("text")?;
let text = engine.decode(&tokens)?;

// Inspect the model configuration
let config = engine.config();
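Since every fallible call returns `minillm::Result`, errors can also be handled at the call site instead of bubbled out of `main`; a small sketch, assuming `minillm::Result` is a standard `Result` alias whose error type implements `Display`:

```rust
use minillm::inference::InferenceEngine;

fn main() {
    // Handle failures here rather than returning minillm::Result from main.
    let outcome = InferenceEngine::new("openai-community/gpt2")
        .and_then(|engine| engine.generate("The future of AI is", 20));
    match outcome {
        Ok(text) => println!("{text}"),
        Err(e) => eprintln!("inference failed: {e}"),
    }
}
```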
For custom implementations, you can use the individual components (a hypothetical sketch follows the list):

- `GPTModel` - Complete transformer model
- `TransformerBlock` - Individual transformer layers
- `MultiHeadAttention` - Attention mechanism
- `MLP` - Feed-forward networks
- `Tensor` - Mathematical operations
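A hypothetical sketch of what a custom decode loop over these pieces could look like; every path and signature below is an assumption inferred from the module layout and type names above, not the crate's verified API, so it is left as commented pseudocode:

```rust
// Hypothetical pseudocode -- module paths and signatures are assumptions:
//
// use minillm::gpt::GPTModel;
// use minillm::tensor::Tensor;
//
// let model = GPTModel::load("openai-community/gpt2")?; // assumed constructor
// let mut ids: Vec<u32> = vec![/* prompt token ids */];
// for _ in 0..max_new_tokens {
//     let logits: Tensor = model.forward(&ids); // assumed forward signature
//     let next = greedy_argmax(&logits);        // user-supplied sampling
//     ids.push(next);
// }
```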
cargo run --example basic_generation
Demonstrates simple text generation with model configuration display.
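The core of that example in a few lines, reusing the documented `generate` and `config` calls (printing the config with `{:?}` assumes it implements `Debug`):

```rust
use minillm::inference::InferenceEngine;

fn main() -> minillm::Result<()> {
    let engine = InferenceEngine::new("openai-community/gpt2")?;
    // Show the loaded configuration, then generate a short completion.
    println!("model config: {:?}", engine.config()); // assumes a Debug impl
    println!("{}", engine.generate("The future of AI is", 20)?);
    Ok(())
}
```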
cargo run --example interactive_chat
Interactive command-line chat interface with the model.
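A minimal REPL along those lines, built only on the documented `generate` call (the prompt handling, quit command, and 40-token budget are arbitrary choices for this sketch, not the example's actual code):

```rust
use std::io::{self, BufRead, Write};

use minillm::inference::InferenceEngine;

fn main() -> minillm::Result<()> {
    let engine = InferenceEngine::new("openai-community/gpt2")?;
    let stdin = io::stdin();
    print!("> ");
    io::stdout().flush().ok();
    for line in stdin.lock().lines() {
        let prompt = line.expect("failed to read stdin");
        if prompt.trim() == "quit" {
            break;
        }
        // Feed the raw user line as the prompt; 40 new tokens is arbitrary.
        println!("{}", engine.generate(&prompt, 40)?);
        print!("> ");
        io::stdout().flush().ok();
    }
    Ok(())
}
```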
cargo run --example tokenization
Shows tokenization, encoding/decoding, and round-trip verification.
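A sketch of that round trip using the `tokenize`/`decode` pair shown earlier (that the trip is byte-exact for arbitrary input is an assumption about the tokenizer; `{:?}` assumes the token container implements `Debug`):

```rust
use minillm::inference::InferenceEngine;

fn main() -> minillm::Result<()> {
    let engine = InferenceEngine::new("openai-community/gpt2")?;
    let input = "Hello, world!";
    let tokens = engine.tokenize(input)?;     // text -> token ids
    let round_trip = engine.decode(&tokens)?; // token ids -> text
    println!("tokens: {tokens:?}");
    assert_eq!(input, round_trip, "round trip should be lossless");
    Ok(())
}
```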
MiniLLM is designed for inference efficiency.
# Clone and build
git clone https://github.com/bmqube/minillm
cd minillm
cargo build --release
# Run tests
cargo test
# Check examples
cargo check --examples
# Generate documentation
cargo doc --open
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
Author: BM Monjur Morshed