🧠 DeepSeek R1 (Rust): Research-Grade Reasoning Model Prototype


A Rust implementation of a DeepSeek R1-inspired reasoning model focused on clarity, testability, and strong engineering practices. The project is built to serve as a polished portfolio piece: it includes a modular transformer architecture, reasoning-aware inference, an evaluation harness, runnable examples, comprehensive tests, and CI.

Highlights:

  • Rust 2024 edition crate with modules for model, inference, training, and utilities
  • Transformer stack with rotary embeddings, standard attention, pre-norm layers, and an LM head
  • MLA (Multi-head Latent Attention) and MoE (Mixture of Experts) components implemented and tested
  • Reasoning-aware generation pipeline with <think> ... </think> parsing and structured analysis
  • Evaluation harness for benchmarks across math, logic, programming, and general reasoning
  • Examples that compile and run via cargo
  • GitHub Actions CI with fmt, clippy, build, unit + integration tests, benchmarks (artifacts), and docs publishing

🚀 Quick Start

Prerequisites: Rust (stable), Cargo.

Build:

cargo build

Run CLI:

# Help (shows available commands)
cargo run

Core commands:

# Show default model configuration
cargo run -- config

# Show version and build info
cargo run -- version

# Run basic checks and smoke tests
cargo run -- test

# Generate text from a prompt (uses simple model forward)
cargo run -- generate "Explain Rust ownership in simple terms"

# Evaluate reasoning benchmarks (math, logic, programming, general)
cargo run -- eval

# Export evaluation results as JSON (for dashboards)
cargo run -- eval --json > results.json

# Save current model weights (full)
cargo run -- save-weights ckpt.json

# Save only lm_head parameters; exclude embeddings
cargo run -- save-weights ckpt.json --include lm_head --exclude embeddings

# Save a small demo-size checkpoint (size-conscious)
cargo run -- save-weights ckpt-small.json --demo-small

# Load weights and generate deterministically (temperature=0)
cargo run -- load-weights ckpt.json "Explain Rust ownership"

# Load only lm_head from checkpoint, allowing missing others
cargo run -- load-weights ckpt.json --allow-missing --include lm_head "Explain Rust ownership"

Run examples:

cargo run --example config_demo
cargo run --example generation_demo
cargo run --example math_solver_demo
cargo run --example training_demo

Tests + checks:

# Unit + doc tests
cargo test

# Integration tests (CLI)
cargo test --test cli_integration
# Optional heavier integration tests
cargo test --test cli_integration -- --ignored

# Lints/format
cargo clippy --all-targets -- -D warnings
cargo fmt --all -- --check

# Benchmarks (Criterion)
cargo bench --bench decoding -- --warm-up-time 0.5 --measurement-time 10

🧰 Devcontainer

A ready-to-use devcontainer is provided at .devcontainer/devcontainer.json for reproducible development with VS Code or compatible editors.

Requirements:

  • Docker (or a compatible container runtime)
  • VS Code with the "Dev Containers" extension (or an equivalent)

Usage:

  1. Open the project folder in VS Code.
  2. When prompted, choose "Reopen in Container" (or use the Command Palette: "Dev Containers: Reopen in Container").
  3. The container installs Rust stable, rustfmt, clippy, llvm-tools, and utilities like cargo-tarpaulin and cargo-criterion.

Common commands inside the devcontainer:

# Run unit + integration tests
cargo test
cargo test --test cli_integration
cargo test --test cli_integration -- --ignored

# Lints/format
cargo clippy --all-targets -- -D warnings
cargo fmt --all -- --check

# Benchmarks (Criterion)
cargo bench --bench decoding -- --warm-up-time 0.5 --measurement-time 10

Notes:

  • Cargo registries are cached via container volumes for faster builds.
  • The environment enables colored output and backtraces by default.

🐳 Docker Usage

A minimal Dockerfile is included for reproducible builds and tests.

Build the image:

docker build -t ds-r1-rs:latest .

Run the CLI:

# Print version
docker run --rm ds-r1-rs:latest version

# Show default config
docker run --rm ds-r1-rs:latest config

# Generate text
docker run --rm ds-r1-rs:latest generate "Explain Rust ownership"

Mount the project and run against your workspace (optional):

# Evaluate and export JSON results using the container runtime
docker run --rm -v "$PWD":/work -w /work ds-r1-rs:latest ds-r1-rs eval --json

Run tests inside the container:

# Use the built toolchain to run tests against your mounted workspace
docker run --rm -v "$PWD":/work -w /work ds-r1-rs:latest bash -lc "cargo test --all --release --locked"

Tip:

  • For faster local iteration, you can keep the container warm and re-run commands without rebuilding the image unless dependencies change.

πŸ—οΈ Project Structure


├── ds-r1_rs/                 # Rust crate
│   ├── Cargo.toml
│   ├── src/
│   │   ├── main.rs           # CLI: config/version/test/generate/eval
│   │   ├── lib.rs            # Public crate API & re-exports
│   │   ├── model/            # Core model components
│   │   │   ├── config.rs     # Model configuration & validation
│   │   │   ├── transformer.rs # Transformer stack + LM head (implemented)
│   │   │   ├── attention.rs  # Standard attention + MLA + Linear
│   │   │   ├── layers.rs     # Pre-norm TransformerLayer, FFN (SwiGLU), LayerNorm
│   │   │   ├── embeddings.rs # Token + Rotary embeddings
│   │   │   └── moe.rs        # Mixture of Experts (router, experts, load balancing)
│   │   ├── inference/        # Inference & reasoning
│   │   │   ├── engine.rs     # InferenceEngine + high-level solve/explain APIs
│   │   │   ├── generation.rs # Text generation & configs (KV cache placeholder)
│   │   │   ├── sampling.rs   # Greedy/temperature/top-k sampling
│   │   │   ├── reasoning.rs  # <think> parsing, states, analysis
│   │   │   ├── math_solver.rs # Structured math solver utilities
│   │   │   └── code_analyzer.rs
│   │   ├── training/         # Training infrastructure (supervised + RL scaffolding)
│   │   │   ├── data.rs       # Datasets + loaders + synthetic generator
│   │   │   ├── loss.rs       # CrossEntropy + metrics
│   │   │   ├── optimizer.rs  # Adam optimizer
│   │   │   └── trainer.rs    # BasicTrainer + RLTrainer (REINFORCE scaffolding)
│   │   └── utils/            # Errors, math, tokenizer, evaluation harness
│   └── examples/             # Ready-to-run demos
│       ├── generation_demo.rs
│       ├── math_solver_demo.rs
│       ├── training_demo.rs
│       └── config_demo.rs
└── .github/workflows/ci.yml  # CI: build, lint, test, examples, coverage

🧩 What's Implemented

  • Model
    • Token embeddings (+ scaling), Rotary embedding (RoPE)
    • Transformer layers with pre-norm and residuals
    • Standard multi-head attention with causal masking
    • Feed-forward with SwiGLU activation (see the sketch after this list)
    • Final layer norm + LM head (Linear)
    • Forward pass returning flattened logits [seq_len * vocab_size]
  • Advanced Modules (standalone, tested)
    • MLA (Multi-head Latent Attention) with compressed KV via LoRA-style compression
    • Mixture of Experts (experts, router, load balancer)
  • Inference & Reasoning
    • InferenceEngine with text generation APIs
    • Reasoning-aware generation with <think> ... </think> support
    • Reasoning chain parsing, analysis, and structured outputs
  • Evaluation
    • EvaluationHarness to run curated benchmarks (math, logic, programming, science, general)
    • Per-problem metrics, performance placeholders, category & difficulty breakdowns
  • Training (Prototype)
    • Basic supervised training scaffold (cross-entropy)
    • RL training scaffold (REINFORCE with a simple reward function)
  • Utilities
    • Tokenizer powered by tiktoken-rs (BPE), math helpers, error handling (thiserror)
  • Engineering
    • Unit tests across modules
    • CI (fmt, clippy, build, test, run examples, coverage with tarpaulin)
    • Examples showing end-to-end flows
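
To make the SwiGLU bullet above concrete, here is a minimal sketch of the pattern using naive matrix-vector products. All names are illustrative assumptions; the crate's layers.rs is the authoritative implementation:

fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

fn dot(row: &[f32], x: &[f32]) -> f32 {
    row.iter().zip(x).map(|(a, b)| a * b).sum()
}

// The SwiGLU pattern: FFN(x) = W_down(SiLU(W_gate x) * (W_up x)).
// Each weight matrix is a Vec of rows; this is a sketch, not the crate's code.
fn swiglu_ffn(x: &[f32], w_gate: &[Vec<f32>], w_up: &[Vec<f32>], w_down: &[Vec<f32>]) -> Vec<f32> {
    let hidden: Vec<f32> = w_gate
        .iter()
        .zip(w_up)
        .map(|(g, u)| silu(dot(g, x)) * dot(u, x))
        .collect();
    w_down.iter().map(|row| dot(row, &hidden)).collect()
}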

🧪 Usage Examples

Programmatic usage:

use ds_r1_rs::{
    model::{ModelConfig, DeepSeekR1Model},
    inference::engine::InferenceEngine,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build model with validated config
    let config = ModelConfig::default();
    let model = DeepSeekR1Model::new(config)?;

    // Inference engine with default generation configs
    let mut engine = InferenceEngine::new(model)?;

    // Basic generation
    let text = engine.generate_text("The Rust language is")?;
    println!("Generated: {}", text);

    // Reasoning-aware generation
    let reasoning = engine.generate_with_reasoning("Explain ownership in Rust")?;
    println!("Steps: {:?}", reasoning.thinking_chain);
    println!("Answer: {}", reasoning.final_answer);

    Ok(())
}

CLI usage (quick):

cargo run -- config
cargo run -- generate "List 3 benefits of static typing"
cargo run -- eval
cargo run -- eval --json > results.json
cargo run -- save-weights ckpt.json
cargo run -- load-weights ckpt.json "Explain Rust ownership"

💾 Checkpointing & Reproducibility

You can save/load checkpoints in JSON v1 format. Partial save/load is supported via name-prefix filters. For size-conscious artifacts, use the demo-small model configuration. A sketch of the prefix-filtering idea follows the examples below.

Examples:

# Full save
cargo run -- save-weights ckpt.json

# Partial save/load only lm_head.*
cargo run -- save-weights ckpt.json --include lm_head
cargo run -- load-weights ckpt.json --allow-missing --include lm_head "Your prompt"

# Size-conscious small artifact
cargo run -- save-weights ckpt-small.json --demo-small
cargo run -- load-weights ckpt-small.json --demo-small "Your prompt"

# Deterministic generation (temperature=0 applied automatically in load-weights flow)
cargo run -- load-weights ckpt.json "Explain Rust ownership"
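
The --include/--exclude behavior boils down to name-prefix filtering over a parameter map. The following is a hedged sketch of that idea with invented names; the crate's actual JSON v1 schema and CLI plumbing differ:

use std::collections::BTreeMap;

/// Keep parameters whose names match the include prefix (if given) and
/// do not match the exclude prefix (if given).
fn filter_params(
    params: &BTreeMap<String, Vec<f32>>,
    include: Option<&str>,
    exclude: Option<&str>,
) -> BTreeMap<String, Vec<f32>> {
    params
        .iter()
        .filter(|(name, _)| include.map_or(true, |p| name.starts_with(p)))
        .filter(|(name, _)| exclude.map_or(true, |p| !name.starts_with(p)))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

// e.g. filter_params(&params, Some("lm_head"), None) keeps only lm_head.* tensors.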

🧠 How Reasoning Works Here

This prototype uses special thinking tokens and a reasoning state machine to parse and structure "thoughts" during generation (a minimal sketch follows the list):

  • The generator can produce <think> ... </think> sections.
  • The ReasoningEngine tracks states (Normal/Thinking/Answering), captures steps, and produces a ReasoningOutput.
  • The EvaluationHarness aggregates metrics (accuracy, clarity, verification presence) across curated benchmarks and reports performance and breakdowns.
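
The sketch below shows the core of the state-machine idea in simplified form. It is hypothetical: the crate's reasoning.rs tracks more states and emits a structured ReasoningOutput rather than plain strings.

enum State {
    Normal,
    Thinking,
}

/// Split generated text into captured thinking sections and the visible answer.
fn parse_reasoning(text: &str) -> (Vec<String>, String) {
    let mut state = State::Normal;
    let mut thoughts = Vec::new();
    let mut answer = String::new();
    let mut rest = text;
    loop {
        match state {
            State::Normal => match rest.find("<think>") {
                Some(pos) => {
                    answer.push_str(&rest[..pos]);
                    rest = &rest[pos + "<think>".len()..];
                    state = State::Thinking;
                }
                None => {
                    answer.push_str(rest);
                    break;
                }
            },
            State::Thinking => match rest.find("</think>") {
                Some(pos) => {
                    thoughts.push(rest[..pos].trim().to_string());
                    rest = &rest[pos + "</think>".len()..];
                    state = State::Normal;
                }
                None => {
                    // Unterminated think block: capture what we have.
                    thoughts.push(rest.trim().to_string());
                    break;
                }
            },
        }
    }
    (thoughts, answer.trim().to_string())
}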

βš™οΈ Implementation Notes

  • The transformer forward is implemented and functional:
    • Embeddings → N × TransformerLayer → FinalNorm → LM Head
    • Standard attention uses RoPE and causal masking.
    • Output shape is flattened [seq_len * vocab_size] for simple integration with training and demos.
  • MLA and MoE are integrated into the Transformer stack via config toggles (Standard|MLA attention, Dense|MoE FFN), with support for mixed-depth patterns (e.g., periodic MLA/MoE) and telemetry (compression, routing).
  • Generation includes sampling strategies (greedy, temperature, top-k) and incremental decoding with a per-layer KV cache; tokens/sec is reported in the CLI and evaluation (a sampling sketch follows this list).
  • Training code is intentionally conservative: scaffolding and examples demonstrate APIs, not production SGD for large checkpoints.
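
As an illustration of how sampling interacts with the flattened logits layout, here is a hedged sketch of top-k + temperature sampling over the final position. The function name and the caller-supplied uniform draw u are assumptions, not the crate's sampling.rs API:

/// Pick a token id from the last position of a flattened
/// [seq_len * vocab_size] logits buffer. temp == 0.0 degenerates to greedy
/// argmax; otherwise softmax over the top k logits, inverted with u in [0, 1).
fn sample_top_k(logits: &[f32], seq_len: usize, vocab: usize, k: usize, temp: f32, u: f32) -> usize {
    let last = &logits[(seq_len - 1) * vocab..seq_len * vocab];
    if temp == 0.0 {
        // Greedy: highest logit wins.
        return last.iter().enumerate().max_by(|a, b| a.1.total_cmp(b.1)).map(|(i, _)| i).unwrap();
    }
    // Sort token ids by logit, descending, and keep the top k.
    let mut idx: Vec<usize> = (0..vocab).collect();
    idx.sort_by(|&a, &b| last[b].total_cmp(&last[a]));
    idx.truncate(k.max(1));
    // Temperature-scaled softmax over the survivors.
    let max = last[idx[0]];
    let exps: Vec<f32> = idx.iter().map(|&i| ((last[i] - max) / temp).exp()).collect();
    let total: f32 = exps.iter().sum();
    // Inverse-CDF sampling with the provided uniform draw.
    let mut acc = 0.0;
    for (j, e) in exps.iter().enumerate() {
        acc += e / total;
        if u < acc {
            return idx[j];
        }
    }
    idx[idx.len() - 1]
}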

🔬 Benchmarks & Evaluation

Use:

cargo run -- eval

This runs curated reasoning benchmarks via the EvaluationHarness:

  • Mathematics (arithmetic, algebra, word problems, equations)
  • Logical reasoning
  • Programming logic
  • Science reasoning
  • General reasoning

Metrics reported:

  • Accuracy proxy with numeric tolerance for math answers (see the sketch below)
  • Reasoning depth, clarity, verification presence
  • Tokens/sec and reasoning overhead
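
The accuracy proxy can be pictured as the comparison below. The tolerance rule here is an assumption for illustration; the harness's exact parsing and thresholds live in the evaluation module:

/// Compare answers numerically when both parse as f64, within a relative
/// tolerance; otherwise fall back to a trimmed exact string match.
fn numeric_match(expected: &str, actual: &str, rel_tol: f64) -> bool {
    match (expected.trim().parse::<f64>(), actual.trim().parse::<f64>()) {
        (Ok(e), Ok(a)) => (e - a).abs() <= rel_tol * e.abs().max(1.0),
        _ => expected.trim() == actual.trim(),
    }
}

// e.g. numeric_match("3.14", "3.14159", 0.01) accepts near-miss math answers.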

🧭 Roadmap

What's next (post v0.1)

  • Inference
    • True streaming token-by-token callbacks
    • Beam search, top-p sampling, and repetition penalty with full history
  • Architecture
    • Additional telemetry for MLA compression and MoE routing balance
    • Configurable dropouts, norms, activations; adapter/residual options for MLA paths
  • Training
    • Extend backward pass beyond LM head/embeddings; broader parameter updates
    • Mixed precision and larger-batch experiments
  • Evaluation
    • Exact-match datasets and code execution-based tasks
    • Richer telemetry and standardized result schemas
  • Tooling
    • More integration tests and benchmark coverage

🧰 CI/CD

GitHub Actions workflow runs on PRs and main:

  • rustfmt and clippy (CI runs them warn-only; locally, -D warnings is recommended)
  • build + unit and integration tests
  • run examples and Criterion benchmarks (artifacts uploaded)
  • coverage via tarpaulin (artifacts) and docs published (docs.rs per release, GitHub Pages via workflow)

🤝 Contributing

This is a research/education project. Issues and PRs are welcome. Please:

  • Keep code modular, documented, and tested
  • Maintain CI green (fmt, clippy, tests)
  • Include examples or docs for new features

📄 License

MIT; see the crate manifest for details.

Made with insistence by Khaled.
