trustformers-core

Core infrastructure crate providing fundamental abstractions and utilities for the TrustformeRS ecosystem.

Current State

This crate serves as the foundation for all other TrustformeRS components, providing high-performance tensor operations, layer implementations, and advanced optimization techniques.

Features

Tensor Operations

  • Comprehensive tensor abstraction supporting multiple backends
  • SciRS2 integration for SIMD-optimized operations
  • GPU support through multiple backends (CUDA, Metal, Vulkan, WebGPU)
  • Automatic differentiation framework (in progress)
  • Memory-efficient operations with zero-copy views
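
A minimal illustration of the zero-copy idea, using the ndarray backend directly (the shapes are arbitrary; the Tensor type's own view API may differ):

use ndarray::{s, Array3};

fn main() {
    // Slicing an ndarray-backed buffer borrows the existing allocation:
    // the view below copies no data, regardless of tensor size.
    let x = Array3::<f32>::zeros((32, 512, 768)); // (batch, seq, hidden)
    let first_tokens = x.slice(s![.., 0..4, ..]); // ArrayView3, zero-copy
    assert_eq!(first_tokens.shape(), &[32, 4, 768]);
}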

Layer Implementations

  • Core Layers: Linear, Embedding, LayerNorm, Dropout
  • Attention Mechanisms:
    • Multi-head attention with causal masking
    • FlashAttention & FlashAttention-2 for memory efficiency
    • Multi-Query Attention (MQA) and Grouped-Query Attention (GQA); head mapping sketched after this list
    • PagedAttention for KV cache management
    • Optimized SDPA kernels with adaptive strategies
  • Advanced Layers: FeedForward, PositionalEncoding, RMSNorm
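
MQA and GQA shrink the KV cache by letting several query heads share one key/value head: with H query heads and G KV heads (H divisible by G), MQA is the G = 1 extreme and standard multi-head attention is G = H. A standalone sketch of the head mapping, with illustrative head counts:

fn kv_head_for(q_head: usize, num_q_heads: usize, num_kv_heads: usize) -> usize {
    assert_eq!(num_q_heads % num_kv_heads, 0);
    // Consecutive query heads share a KV head in groups of H / G.
    q_head / (num_q_heads / num_kv_heads)
}

fn main() {
    // 12 query heads over 4 KV heads: query heads 0..=2 read KV head 0, etc.
    for q in 0..12 {
        println!("query head {q} -> kv head {}", kv_head_for(q, 12, 4));
    }
}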

Performance Optimizations

  • SIMD Operations: Optimized LayerNorm, Softmax, and RoPE implementations (RoPE sketched after this list)
  • Quantization Support: INT8, INT4, GPTQ, AWQ with calibration
  • Custom Kernels: Fused operations for reduced memory bandwidth
  • Memory Management: Efficient allocation strategies and pooling
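
For the RoPE item above, a scalar (non-SIMD) reference version of the interleaved-pair rotation; pairing conventions vary between implementations, so treat this as a sketch rather than this crate's exact kernel:

fn apply_rope(x: &mut [f32], pos: usize, base: f32) {
    // Rotate each (even, odd) feature pair by a position-dependent angle
    // theta_i = pos * base^(-2i/d), with base typically 10000.0.
    let d = x.len();
    for i in 0..d / 2 {
        let theta = pos as f32 * base.powf(-2.0 * i as f32 / d as f32);
        let (sin, cos) = theta.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        x[2 * i] = a * cos - b * sin;
        x[2 * i + 1] = a * sin + b * cos;
    }
}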

Export and Interoperability

  • ONNX Export: Complete graph construction and runtime support
  • GGML/GGUF: Advanced quantization formats for edge deployment (block quantization sketched below)
  • CoreML: iOS deployment support
  • TensorRT: NVIDIA GPU optimization (framework ready)
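
To give a flavor of the GGML-style formats: weights are quantized in fixed-size blocks, each carrying its own scale, so outliers only affect their own block. The sketch below uses a simplified symmetric 4-bit scheme for clarity; it is not the exact Q4_0 on-disk layout:

const BLOCK: usize = 32;

// One scale per 32-weight block; each weight maps to a 4-bit signed code.
fn quantize_block(xs: &[f32; BLOCK]) -> (f32, [i8; BLOCK]) {
    let max_abs = xs.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
    let mut qs = [0i8; BLOCK];
    for (q, &x) in qs.iter_mut().zip(xs) {
        *q = (x / scale).round().clamp(-8.0, 7.0) as i8;
    }
    (scale, qs)
}

fn dequantize_block(scale: f32, qs: &[i8; BLOCK]) -> [f32; BLOCK] {
    let mut xs = [0.0f32; BLOCK];
    for (x, &q) in xs.iter_mut().zip(qs) {
        *x = q as f32 * scale;
    }
    xs
}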

Advanced Features

  • Evaluation Framework: GLUE, SuperGLUE, MMLU, HellaSwag, HumanEval benchmarks
  • Monitoring: TensorBoard integration, gradient flow analysis, activation statistics
  • Caching System: Multiple eviction policies (LRU, LFU, ARC); see the LRU sketch after this list
  • A/B Testing: Infrastructure for model comparison
  • Model Compression: Pruning and distillation support
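
Of the eviction policies listed, LRU is the simplest to illustrate. A minimal sketch, using an O(n) eviction scan for brevity (a production cache keeps an ordered index, and LFU/ARC additionally track access frequency):

use std::collections::HashMap;
use std::hash::Hash;

// Each access stamps the entry with a logical clock; eviction removes
// the entry with the oldest stamp.
struct LruCache<K: Hash + Eq + Clone, V> {
    map: HashMap<K, (V, u64)>,
    capacity: usize,
    clock: u64,
}

impl<K: Hash + Eq + Clone, V> LruCache<K, V> {
    fn new(capacity: usize) -> Self {
        Self { map: HashMap::new(), capacity, clock: 0 }
    }

    fn get(&mut self, key: &K) -> Option<&V> {
        self.clock += 1;
        let now = self.clock;
        self.map.get_mut(key).map(|(value, stamp)| {
            *stamp = now; // refresh recency on read
            &*value
        })
    }

    fn put(&mut self, key: K, value: V) {
        if self.map.len() >= self.capacity && !self.map.contains_key(&key) {
            // Evict the least-recently-used entry (smallest stamp).
            if let Some(oldest) = self
                .map
                .iter()
                .min_by_key(|(_, (_, stamp))| *stamp)
                .map(|(k, _)| k.clone())
            {
                self.map.remove(&oldest);
            }
        }
        self.clock += 1;
        self.map.insert(key, (value, self.clock));
    }
}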

Distributed and Parallel Computing

  • Model Parallelism: Tensor and pipeline parallelism support
  • Data Parallelism: Multi-GPU training infrastructure
  • Communication Backends: NCCL, MPI, Gloo support
  • Process Groups: All-reduce, broadcast, and collective operations
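
All-reduce is the collective behind data-parallel training: every rank contributes a local gradient buffer and every rank receives the elementwise sum. An in-process simulation of those semantics (a real backend such as NCCL implements this with ring or tree communication across devices):

fn all_reduce_sum(ranks: &mut [Vec<f32>]) {
    // Sum the contributions from every rank...
    let n = ranks[0].len();
    let mut total = vec![0.0f32; n];
    for r in ranks.iter() {
        for (t, v) in total.iter_mut().zip(r) {
            *t += v;
        }
    }
    // ...then hand the identical reduced buffer back to every rank.
    for r in ranks.iter_mut() {
        r.copy_from_slice(&total);
    }
}

fn main() {
    let mut ranks = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![5.0, 6.0]];
    all_reduce_sum(&mut ranks);
    assert!(ranks.iter().all(|r| r == &[9.0, 12.0]));
}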

PEFT (Parameter-Efficient Fine-Tuning)

  • LoRA: Low-rank adaptation with weight merging (forward pass sketched after this list)
  • QLoRA: Quantized LoRA for memory efficiency
  • Adapters: Bottleneck adapter layers
  • Prefix Tuning: Trainable prefix embeddings
  • Prompt Tuning: Virtual token optimization
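
The common thread in these methods is training a small set of new parameters while the base weights stay frozen. For LoRA specifically, the adapted forward pass is y = x·Wᵀ + (α/r)·(x·Aᵀ)·Bᵀ, and weight merging folds ΔW = (α/r)·B·A back into W for zero-overhead inference. A sketch with ndarray (shapes as annotated; scaling follows the original LoRA convention):

use ndarray::Array2;

fn lora_forward(
    x: &Array2<f32>, // input           (batch, d_in)
    w: &Array2<f32>, // frozen weight   (d_out, d_in)
    a: &Array2<f32>, // low-rank down   (r, d_in)
    b: &Array2<f32>, // low-rank up     (d_out, r)
    alpha: f32,
) -> Array2<f32> {
    let r = a.nrows() as f32;
    let base = x.dot(&w.t());              // frozen path:   (batch, d_out)
    let delta = x.dot(&a.t()).dot(&b.t()); // low-rank path: (batch, d_out)
    base + delta * (alpha / r)
}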

Architecture

trustformers-core/
├── src/
│   ├── tensor/           # Tensor abstractions and operations
│   ├── layers/           # Neural network layers
│   ├── attention/        # Attention mechanisms
│   ├── optimization/     # Performance optimizations
│   ├── quantization/     # Quantization infrastructure
│   ├── export/           # Model export formats
│   ├── evaluation/       # Benchmark implementations
│   ├── monitoring/       # Profiling and analysis
│   ├── parallel/         # Distributed computing
│   └── peft/             # Parameter-efficient fine-tuning

Usage Example

use trustformers_core::{
    tensor::Tensor,
    layers::{Linear, Layer},
    attention::FlashAttention,
};

// Assumes the crate's error type converts into Box<dyn Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a random input tensor: (batch, seq_len, hidden)
    let input = Tensor::randn(&[32, 512, 768])?;

    // Create layers: a 768 -> 768 projection with bias, and
    // 12-head FlashAttention over the 768-dim hidden state
    let linear = Linear::new(768, 768, true)?;
    let attention = FlashAttention::new(768, 12)?;

    // Forward pass (no attention mask)
    let output = linear.forward(&input)?;
    let attended = attention.forward(&output, None)?;
    Ok(())
}

Performance

  • FlashAttention: O(N) memory vs O(N²) for standard attention
  • Quantization: 50-75% memory reduction with INT8/INT4
  • SIMD: 2-3x speedup on supported operations
  • PagedAttention: Block-based KV cache allocation that virtually eliminates fragmentation
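
To make the FlashAttention number concrete: at sequence length N = 4096 with FP16 activations, a standard implementation materializes an N × N score matrix of 4096² × 2 bytes ≈ 32 MiB per head, or roughly 384 MiB per layer for 12 heads, while FlashAttention streams the computation through fixed-size tiles so attention memory grows linearly with N.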

Testing

The crate includes comprehensive test coverage:

  • Unit tests for all operations
  • Integration tests for complex scenarios
  • Property-based testing with proptest
  • Memory leak detection
  • Performance benchmarks
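
As an illustration of the property-based style (a minimal standalone sketch, not a test taken from this crate): symmetric INT8 round-trips should never deviate by more than half a quantization step.

use proptest::prelude::*;

proptest! {
    #[test]
    fn int8_roundtrip_error_is_bounded(
        xs in proptest::collection::vec(-10.0f32..10.0f32, 1..64)
    ) {
        let max_abs = xs.iter().fold(0.0f32, |m, v| m.max(v.abs()));
        prop_assume!(max_abs > 0.0);
        let scale = max_abs / 127.0;
        for &x in &xs {
            // Quantize, dequantize, and bound the reconstruction error.
            let q = (x / scale).round().clamp(-127.0, 127.0);
            prop_assert!((x - q * scale).abs() <= scale * 0.5 + 1e-6);
        }
    }
}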

Dependencies

  • scirs2-core: SIMD operations and parallelism
  • ndarray: Tensor backend (being migrated to SciRS2)
  • half: FP16/BF16 support
  • rayon: Parallel iteration (via SciRS2)
  • Various serialization and utility crates

License

MIT OR Apache-2.0
