| Crates.io | tensorlogic-infer |
| lib.rs | tensorlogic-infer |
| version | 0.1.0-alpha.2 |
| created_at | 2025-11-07 22:24:39.115767+00 |
| updated_at | 2026-01-03 21:03:50.048355+00 |
| description | Execution and autodiff traits for TensorLogic inference engines |
| homepage | https://github.com/cool-japan/tensorlogic |
| repository | https://github.com/cool-japan/tensorlogic |
| max_upload_size | |
| id | 1922254 |
| size | 1,048,191 |
Engine-agnostic execution traits, optimization utilities, and planning API for TensorLogic.
tensorlogic-infer provides the abstract execution interface for TensorLogic backends. It defines the traits a backend must implement, along with utilities for graph optimization, scheduling, profiling, and memory management.
use tensorlogic_infer::{TlExecutor, TlAutodiff};
use tensorlogic_scirs_backend::Scirs2Exec;
use tensorlogic_ir::EinsumGraph;
// Create executor
let mut executor = Scirs2Exec::new();
// Forward pass
let outputs = executor.forward(&graph, &inputs)?;
// Backward pass
executor.backward(&outputs, &gradients)?;
let param_grads = executor.get_gradients()?;
Basic execution interface for forward passes:
pub trait TlExecutor {
type Tensor;
type Error;
fn execute(
&self,
graph: &EinsumGraph,
inputs: &HashMap<String, Self::Tensor>,
) -> Result<Vec<Self::Tensor>, Self::Error>;
}
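As a rough sketch (not part of the crate), a custom backend could implement this trait as follows; MyTensor, MyError, and MyBackend are placeholder names used only for illustration:
use std::collections::HashMap;
use tensorlogic_infer::TlExecutor;
use tensorlogic_ir::EinsumGraph;
// Placeholder types for illustration; a real backend supplies its own tensor and error types.
struct MyTensor;
#[derive(Debug)]
struct MyError;
struct MyBackend;
impl TlExecutor for MyBackend {
    type Tensor = MyTensor;
    type Error = MyError;
    fn execute(
        &self,
        graph: &EinsumGraph,
        inputs: &HashMap<String, Self::Tensor>,
    ) -> Result<Vec<Self::Tensor>, Self::Error> {
        // Evaluate the graph's nodes here and return the requested outputs.
        let _ = (graph, inputs);
        Ok(Vec::new())
    }
}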
Automatic differentiation support:
pub trait TlAutodiff: TlExecutor {
fn forward(
&mut self,
graph: &EinsumGraph,
inputs: &HashMap<String, Self::Tensor>,
) -> Result<Vec<Self::Tensor>, Self::Error>;
fn backward(
&mut self,
outputs: &[Self::Tensor],
output_grads: &[Self::Tensor],
) -> Result<(), Self::Error>;
fn get_gradients(&self) -> Result<HashMap<String, Self::Tensor>, Self::Error>;
}
Efficient batch execution with parallel support:
pub trait TlBatchExecutor: TlExecutor {
fn execute_batch(
&mut self,
graph: &EinsumGraph,
batch_inputs: Vec<HashMap<String, Self::Tensor>>,
) -> Result<BatchResult<Self::Tensor>, Self::Error>;
fn execute_batch_parallel(
&mut self,
graph: &EinsumGraph,
batch_inputs: Vec<HashMap<String, Self::Tensor>>,
num_threads: Option<usize>,
) -> Result<BatchResult<Self::Tensor>, Self::Error>;
fn optimal_batch_size(&self, graph: &EinsumGraph) -> usize;
}
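A hedged usage sketch, assuming executor implements TlBatchExecutor, all_inputs is a Vec of per-item input maps, and the tensor type is cloneable; it relies only on the trait methods shown above:
// Let the backend suggest a batch size, then process the workload in chunks.
let batch_size = executor.optimal_batch_size(&graph).max(1);
for chunk in all_inputs.chunks(batch_size) {
    // `None` lets the backend choose its own thread count.
    let result = executor.execute_batch_parallel(&graph, chunk.to_vec(), None)?;
    // Collect or inspect `result` (a BatchResult) here.
}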
Streaming execution for large datasets:
pub trait TlStreamingExecutor {
type Tensor;
type Error;
fn execute_stream(
&mut self,
graph: &EinsumGraph,
input_stream: Vec<Vec<Vec<Self::Tensor>>>,
config: &StreamingConfig,
) -> Result<Vec<StreamResult<Self::Tensor>>, Self::Error>;
fn execute_chunk(
&mut self,
graph: &EinsumGraph,
chunk_inputs: Vec<Self::Tensor>,
metadata: &ChunkMetadata,
) -> Result<StreamResult<Self::Tensor>, Self::Error>;
}
Streaming Modes:
use tensorlogic_infer::{StreamingMode, StreamingConfig};
// Fixed chunk size
let config = StreamingConfig::new(StreamingMode::FixedChunk(64))
.with_prefetch(2)
.with_checkpointing(100);
// Dynamic chunk sizing based on memory
let config = StreamingConfig::new(StreamingMode::DynamicChunk {
target_memory_mb: 512,
});
// Adaptive chunking based on performance
let config = StreamingConfig::new(StreamingMode::Adaptive {
initial_chunk: 32,
});
Query backend capabilities:
pub trait TlCapabilities {
fn capabilities(&self) -> BackendCapabilities;
}
// Example usage
let caps = executor.capabilities();
println!("Devices: {:?}", caps.devices);
println!("DTypes: {:?}", caps.dtypes);
println!("Features: {:?}", caps.features);
Execution profiling and performance analysis:
pub trait TlProfiledExecutor: TlExecutor {
fn enable_profiling(&mut self);
fn disable_profiling(&mut self);
fn get_profile_data(&self) -> ProfileData;
}
// Example usage
executor.enable_profiling();
executor.execute(&graph, &inputs)?;
let profile = executor.get_profile_data();
for (op_name, stats) in &profile.op_profiles {
println!("{}: avg={}ms, count={}",
op_name, stats.avg_time_ms, stats.count);
}
Just-In-Time compilation with hot path detection and adaptive optimization:
pub trait TlJitExecutor: TlExecutor {
fn execute_jit(
&mut self,
graph: &EinsumGraph,
inputs: &HashMap<String, Self::Tensor>,
config: &JitConfig,
) -> Result<Vec<Self::Tensor>, Self::Error>;
fn get_jit_stats(&self) -> JitStats;
fn clear_jit_cache(&mut self);
}
// Example usage
use tensorlogic_infer::{TlJitExecutor, JitConfig};
let config = JitConfig::default()
.with_hot_path_threshold(10)
.with_max_cache_size(100);
let outputs = executor.execute_jit(&graph, &inputs, &config)?;
let stats = executor.get_jit_stats();
println!("Hot paths detected: {}", stats.hot_paths_detected);
println!("Cache hit rate: {:.2}%", stats.cache_hit_rate * 100.0);
JIT Features:
Multi-device distributed execution with data/model/pipeline parallelism:
pub trait TlDistributedExecutor {
type Tensor;
type Error;
fn execute_distributed(
&mut self,
graph: &EinsumGraph,
inputs: &HashMap<String, Self::Tensor>,
config: &DistributedConfig,
) -> Result<Vec<Self::Tensor>, Self::Error>;
fn get_distributed_stats(&self) -> DistributedStats;
}
// Example usage - Data Parallelism
use tensorlogic_infer::{
DistributedConfig, DistributedParallelismStrategy, Device
};
let devices = vec![Device::GPU(0), Device::GPU(1), Device::GPU(2), Device::GPU(3)];
let config = DistributedConfig::new(devices)
.with_strategy(DistributedParallelismStrategy::DataParallel {
num_replicas: 4,
});
let outputs = executor.execute_distributed(&graph, &inputs, &config)?;
let stats = executor.get_distributed_stats();
println!("Communication time: {}ms", stats.communication_time_ms);
println!("Computation time: {}ms", stats.computation_time_ms);
println!("Efficiency: {:.2}%", stats.efficiency * 100.0);
Distributed Parallelism Strategies:
Data Parallelism: Replicate model across devices, split data
DistributedParallelismStrategy::DataParallel {
num_replicas: 4, // 4 GPUs
}
Model Parallelism: Split model across devices
DistributedParallelismStrategy::ModelParallel {
sharding_spec: ShardingSpec::new()
.shard_tensor("weights", 0, 4), // Shard along dimension 0
}
Pipeline Parallelism: Split model into stages
DistributedParallelismStrategy::PipelineParallel {
num_stages: 4,
micro_batch_size: 32,
}
Hybrid Parallelism: Combine multiple strategies
DistributedParallelismStrategy::Hybrid {
data_parallel_groups: 2,
model_parallel_size: 2,
pipeline_stages: 2,
}
Execution with error recovery, checkpointing, and fault tolerance:
pub trait TlRecoverableExecutor: TlExecutor {
fn execute_with_recovery(
&mut self,
graph: &EinsumGraph,
inputs: &HashMap<String, Self::Tensor>,
config: &RecoveryConfig,
) -> RecoveryResult<Vec<Self::Tensor>, Self::Error>;
fn save_checkpoint(&mut self, path: &str) -> Result<(), Self::Error>;
fn load_checkpoint(&mut self, path: &str) -> Result<(), Self::Error>;
}
// Example usage
use tensorlogic_infer::{RecoveryConfig, RecoveryStrategy, RetryPolicy};
let config = RecoveryConfig::default()
.with_strategy(RecoveryStrategy::RetryWithBackoff)
.with_retry_policy(RetryPolicy::exponential(3, 100))
.with_checkpointing(true);
match executor.execute_with_recovery(&graph, &inputs, &config) {
RecoveryResult::Success { result, stats } => {
println!("Success after {} retries", stats.retries);
}
RecoveryResult::PartialSuccess { result, failed_nodes, stats } => {
println!("Partial success: {} nodes failed", failed_nodes.len());
}
RecoveryResult::Failure { error, stats } => {
println!("Failed after {} retries", stats.retries);
}
}
Recovery Strategies:
Efficient memory-safe tensor views and slicing without data duplication:
use tensorlogic_infer::{TensorView, SliceSpec, ViewBuilder, TensorViewable};
// Create a tensor view
let view = TensorView::new(base_tensor_id, vec![
SliceSpec::Range(10..50),
SliceSpec::Full,
]);
// Check properties
println!("Is contiguous: {}", view.is_contiguous());
println!("Rank: {}", view.rank());
// Ergonomic view builder
let view = ViewBuilder::new(tensor_id, 3)
.range_dim(0, 10, 20) // Slice dimension 0
.index_dim(1, 5) // Index dimension 1
.with_offset(100)
.build();
// Compose views (create view of a view)
let composed = view1.compose(&view2)?;
// Slice specifications
let specs = vec![
SliceSpec::Full, // Full dimension
SliceSpec::Range(0..100), // Range slice
SliceSpec::Index(42), // Single index
SliceSpec::Strided { start: 0, end: 100, stride: 2 }, // Every 2nd element
SliceSpec::Reverse, // Reverse order
];
Key Features:
Use Cases:
Non-blocking execution with async/await support (feature-gated):
use tensorlogic_infer::{
TlAsyncExecutor, TlAsyncBatchExecutor,
AsyncExecutorPool, AsyncConfig
};
// Enable async feature in Cargo.toml
// [dependencies]
// tensorlogic-infer = { version = "*", features = ["async"] }
// Async execution
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut executor = MyAsyncExecutor::new();
let outputs = executor.execute_async(&graph, &inputs).await?;
println!("Got {} outputs", outputs.len());
Ok(())
}
// Async batch processing
let batch_outputs = executor.execute_batch_async(&graph, batch_inputs).await?;
// Async streaming with backpressure
let config = AsyncConfig::default()
.with_max_concurrent(4)
.with_backpressure_threshold(100);
let stream_results = executor
.execute_stream_async(&graph, input_stream, &config)
.await?;
// Load-balanced executor pool
let pool = AsyncExecutorPool::new(vec![
executor1,
executor2,
executor3,
executor4,
]);
// Pool automatically distributes work
let output = pool.execute(&graph, &inputs).await?;
// Cancellable execution
let handle = executor.execute_async(&graph, &inputs);
// ... later ...
handle.cancel();
let stats = pool.stats();
println!("Total executions: {}", stats.total_executions);
println!("Average queue time: {}ms", stats.avg_queue_time_ms);
Key Features:
Use Cases:
Rich error messages with helpful suggestions and context:
use tensorlogic_infer::{
Diagnostic, DiagnosticCollector, Severity,
ShapeMismatchDiagnostic, MemoryDiagnostic,
PerformanceDiagnostic, SourceLocation,
};
// Create diagnostic with context
let diag = Diagnostic::error("Tensor operation failed")
.with_code("E001")
.with_context("Expected shape [64, 128], got [64, 256]")
.with_suggestion("Use tensor.reshape([64, 128]) to match expected shape")
.with_suggestion("Check input tensor dimensions")
.with_location(
SourceLocation::new()
.with_file("model.rs".to_string())
.with_line(42)
);
println!("{}", diag.format());
// Shape mismatch diagnostics
let expected = TensorShape::static_shape(vec![64, 128]);
let actual = TensorShape::static_shape(vec![64, 256]);
let diag = ShapeMismatchDiagnostic::create(&expected, &actual, "matmul");
// Memory diagnostics
let diag = MemoryDiagnostic::out_of_memory(
1024 * 1024 * 1024, // 1 GB requested
512 * 1024 * 1024 // 512 MB available
);
println!("{}", diag); // Includes helpful suggestions
// Performance diagnostics
let diag = PerformanceDiagnostic::slow_operation(
"einsum",
150.0, // actual: 150ms
50.0 // expected: 50ms
);
// Diagnostic collector
let mut collector = DiagnosticCollector::new();
collector.add(diag1);
collector.add(diag2);
collector.add(diag3);
if collector.has_errors() {
println!("{}", collector.format_all());
println!("Errors: {}, Warnings: {}",
collector.error_count(),
collector.warning_count()
);
}
Example Output:
[ERROR] Shape mismatch in matmul operation
at model.rs:42
code: E001
Context:
Expected shape: [64, 128], but got: [64, 256]
Dimension 1 mismatch: expected Static(128), got Static(256)
Suggestions:
1. Check your input tensor shapes match the expected dimensions
2. Use tensor.reshape([64, 128]) to match the expected shape
Summary: 1 error(s), 0 warning(s)
Diagnostic Types:
Severity Levels:
Ahead-of-time graph compilation with multiple optimization levels:
pub trait TlCompilableExecutor: TlExecutor {
fn compile_graph(
&mut self,
graph: &EinsumGraph,
config: &CompilationConfig,
) -> Result<CompiledGraph, Self::Error>;
fn execute_compiled(
&mut self,
compiled: &CompiledGraph,
inputs: &HashMap<String, Self::Tensor>,
) -> Result<Vec<Self::Tensor>, Self::Error>;
}
// Example usage
use tensorlogic_infer::{
TlCompilableExecutor, CompilationConfig, OptimizationLevel, GraphCompiler
};
let config = CompilationConfig::default()
.with_optimization_level(OptimizationLevel::Aggressive)
.with_fusion_enabled(true)
.with_constant_folding(true);
// Compile once
let compiled = executor.compile_graph(&graph, &config)?;
// Execute multiple times with different inputs
let outputs1 = executor.execute_compiled(&compiled, &inputs1)?;
let outputs2 = executor.execute_compiled(&compiled, &inputs2)?;
let outputs3 = executor.execute_compiled(&compiled, &inputs3)?;
// Check compilation statistics
let stats = compiled.compilation_stats();
println!("Nodes before: {}", stats.nodes_before_optimization);
println!("Nodes after: {}", stats.nodes_after_optimization);
println!("Reduction: {:.2}%", stats.reduction_percentage);
Optimization Levels:
Compilation Cache:
use tensorlogic_infer::{CompilationCache, CompilationKey};
let mut cache = CompilationCache::new(100); // Cache up to 100 graphs
// Automatic caching
let key = CompilationKey::from_graph(&graph, &config);
if let Some(compiled) = cache.get(&key) {
println!("Cache hit!");
} else {
let compiled = executor.compile_graph(&graph, &config)?;
cache.insert(key, compiled);
}
let stats = cache.stats();
println!("Hit rate: {:.2}%", stats.hit_rate * 100.0);
Analyze and optimize computation graphs:
use tensorlogic_infer::{GraphOptimizer, OptimizationResult};
let optimizer = GraphOptimizer::new();
let result: OptimizationResult = optimizer.analyze(&graph);
println!("Fusion opportunities: {}", result.fusion_opportunities.len());
println!("Dead nodes: {}", result.dead_nodes.len());
println!("Estimated speedup: {:.2}x", result.estimated_speedup);
Plan operation fusion:
use tensorlogic_infer::{FusionPlanner, FusionType};
let planner = FusionPlanner::new();
let opportunities = planner.find_fusion_opportunities(&graph);
for opp in &opportunities {
match opp.fusion_type {
FusionType::ElementWise => println!("Can fuse element-wise ops"),
FusionType::Reduction => println!("Can fuse reduction ops"),
FusionType::Einsum => println!("Can merge einsum operations"),
}
}
Execution scheduling with multiple strategies:
use tensorlogic_infer::{Scheduler, SchedulingStrategy};
let scheduler = Scheduler::new(SchedulingStrategy::CostBased {
cost_threshold: 1000,
});
let schedule = scheduler.schedule(&graph)?;
println!("Execution order: {:?}", schedule.node_order);
println!("Parallel groups: {:?}", schedule.parallel_groups);
Scheduling Strategies:
Sequential: Simple topological order
Parallel: Maximize parallelism across independent nodes
CostBased: Balance parallelism with execution cost
Multi-device placement optimization:
use tensorlogic_infer::{PlacementOptimizer, PlacementStrategy, Device};
let devices = vec![Device::CPU(0), Device::GPU(0)];
let optimizer = PlacementOptimizer::new(devices, PlacementStrategy::LoadBalance);
let plan = optimizer.optimize(&graph)?;
for (node_id, device) in &plan.node_placements {
println!("Node {} -> {:?}", node_id, device);
}
TensorCache: Cache computation results
use tensorlogic_infer::{TensorCache, EvictionPolicy};
let mut cache = TensorCache::new(EvictionPolicy::LRU, 1000); // 1000 MB limit
// Cache usage is automatic when integrated with executor
cache.insert(key, tensor);
if let Some(tensor) = cache.get(&key) {
// Cache hit
}
MemoryPool: Reuse tensor allocations
use tensorlogic_infer::MemoryPool;
let mut pool = MemoryPool::new();
// Allocate or reuse
let tensor = pool.allocate(shape)?;
// Return to pool
pool.deallocate(tensor);
// Stats
let stats = pool.stats();
println!("Reuse rate: {:.2}%", stats.reuse_rate * 100.0);
Configure complete execution strategy:
use tensorlogic_infer::{
ExecutionStrategy, ExecutionMode, PrecisionMode,
MemoryStrategy, ParallelismStrategy, GradientStrategy, StrategyOptimizer,
};
let strategy = ExecutionStrategy {
mode: ExecutionMode::Graph, // Graph, Eager, or JIT
precision: PrecisionMode::FP32,
memory: MemoryStrategy::Optimize,
parallelism: ParallelismStrategy::Auto,
gradient: GradientStrategy::Eager,
};
let optimizer = StrategyOptimizer::new();
let optimized = optimizer.optimize_for_throughput(&graph, &strategy);
Manage execution state with lifecycle hooks:
use tensorlogic_infer::{ExecutionContext, LoggingHook, ExecutionPhase};
let mut context = ExecutionContext::new();
context.add_hook(Box::new(LoggingHook::new()));
context.notify(ExecutionPhase::GraphLoad);
context.notify(ExecutionPhase::Execution);
context.notify(ExecutionPhase::Complete);
Validate computation graphs:
use tensorlogic_infer::GraphValidator;
let validator = GraphValidator::new();
let result = validator.validate(&graph);
if !result.is_valid() {
for error in &result.errors {
println!("Error: {}", error);
}
}
Estimate memory usage:
use tensorlogic_infer::MemoryEstimator;
let estimator = MemoryEstimator::new();
let estimate = estimator.estimate(&graph);
println!("Peak memory: {} MB", estimate.peak_memory_mb);
println!("Tensor lifetimes: {:?}", estimate.lifetimes);
Infer tensor shapes:
use tensorlogic_infer::ShapeInferenceContext;
let mut ctx = ShapeInferenceContext::new();
ctx.set_input_shape("x", vec![64, 10]);
let inferred = ctx.infer_shapes(&graph)?;
for (tensor_id, shape) in &inferred {
println!("{}: {:?}", tensor_id, shape);
}
Record and analyze execution flow:
use tensorlogic_infer::debug::ExecutionTracer;
let mut tracer = ExecutionTracer::new();
tracer.enable();
tracer.start_trace(Some(graph_id));
// Execute operations...
let handle = tracer.record_operation_start(node_id, "einsum", input_ids);
// ... operation execution ...
tracer.record_operation_end(handle, node_id, "einsum", input_ids, output_ids, metadata);
// Get trace
let trace = tracer.get_trace();
let summary = trace.summary();
println!("Total operations: {}", summary.total_operations);
println!("Total time: {:.2}ms", summary.total_time_ms);
// Find slowest operations
let slowest = trace.slowest_operations(5);
for entry in slowest {
println!("Node {}: {:.2}ms", entry.node_id, entry.duration_ms());
}
Examine intermediate tensor values:
use tensorlogic_infer::debug::{TensorInspector, TensorStats};
let mut inspector = TensorInspector::new();
inspector.enable();
inspector.watch(tensor_id); // Watch specific tensor
// Record statistics
let stats = TensorStats::new(tensor_id, vec![64, 128], "f64")
.with_statistics(min, max, mean, std_dev, num_nans, num_infs);
inspector.record_stats(stats);
// Check for numerical issues
let problematic = inspector.find_problematic_tensors();
for tensor in problematic {
println!("Tensor {} has {} NaNs, {} Infs",
tensor.tensor_id,
tensor.num_nans.unwrap_or(0),
tensor.num_infs.unwrap_or(0)
);
}
Pause execution for debugging:
use tensorlogic_infer::debug::{BreakpointManager, Breakpoint};
let mut breakpoints = BreakpointManager::new();
breakpoints.enable();
// Add various breakpoint types
breakpoints.add_node_breakpoint(node_id);
breakpoints.add_operation_breakpoint("matmul");
breakpoints.add_numerical_issue_breakpoint();
breakpoints.add_time_threshold_breakpoint(5000); // 5,000 µs = 5 ms
// Check during execution
if let Some(hit) = breakpoints.should_break(node_id, op_name, elapsed_us, has_nan) {
println!("Breakpoint hit at node {}", hit.node_id);
// Inspect state, then continue
breakpoints.continue_execution();
}
Full execution recording for replay:
use tensorlogic_infer::debug::ExecutionRecorder;
let mut recorder = ExecutionRecorder::new();
recorder.enable();
// All debugging features enabled
recorder.tracer().start_trace(Some(graph_id));
recorder.inspector().watch(tensor_id);
recorder.breakpoints().add_node_breakpoint(5);
// Generate comprehensive report
let report = recorder.generate_report();
println!("{}", report);
Create detailed execution timelines:
use tensorlogic_infer::{TimelineProfiler, ProfilerHook};
let mut profiler = TimelineProfiler::new();
let hook = ProfilerHook::new(&mut profiler);
// Attach to context
context.add_hook(Box::new(hook));
// Execute
executor.execute(&graph, &inputs)?;
// Analyze timeline
let entries = profiler.entries();
for entry in entries {
println!("{}: {}ms", entry.name, entry.duration_ms);
}
Identify performance bottlenecks:
use tensorlogic_infer::BottleneckAnalyzer;
let analyzer = BottleneckAnalyzer::new();
let report = analyzer.analyze(&profile_data);
println!("Bottlenecks:");
for bottleneck in &report.bottlenecks {
println!(" {}: {:.2}% of total time",
bottleneck.operation,
bottleneck.percentage);
}
println!("\nRecommendations:");
for rec in &report.recommendations {
println!(" - {}", rec);
}
Compare execution strategies:
use tensorlogic_infer::{PerformanceBaseline, PerformanceComparison};
let baseline = PerformanceBaseline::from_profile(&profile1);
let comparison = PerformanceComparison::new(baseline, &profile2);
println!("Speedup: {:.2}x", comparison.speedup);
println!("Memory reduction: {:.2}%", comparison.memory_reduction_pct);
Minimal executor for testing:
use tensorlogic_infer::DummyExecutor;
let executor = DummyExecutor::new();
let outputs = executor.execute(&graph, &inputs)?;
// Returns empty outputs for testing
use tensorlogic_infer::TlExecutor;
use tensorlogic_scirs_backend::Scirs2Exec;
use std::collections::HashMap;
let executor = Scirs2Exec::new();
let mut inputs = HashMap::new();
inputs.insert("x".to_string(), tensor_x);
let outputs = executor.execute(&graph, &inputs)?;
use tensorlogic_infer::TlBatchExecutor;
let batch_inputs = vec![inputs1, inputs2, inputs3];
let result = executor.execute_batch_parallel(&graph, batch_inputs, Some(4))?;
println!("Processed {} items", result.len());
println!("Batch time: {}ms", result.total_time_ms);
use tensorlogic_infer::{TlStreamingExecutor, StreamingConfig, StreamingMode};
let config = StreamingConfig::new(StreamingMode::Adaptive {
initial_chunk: 64,
}).with_prefetch(2);
let results = executor.execute_stream(&graph, input_stream, &config)?;
for result in results {
println!("Chunk {}: {} items in {}ms",
result.metadata.chunk_id,
result.metadata.size,
result.processing_time_ms);
}
use tensorlogic_infer::TlAutodiff;
// Forward pass
let outputs = executor.forward(&graph, &inputs)?;
// Compute loss gradients
let loss_grads = compute_loss_gradients(&outputs, &targets);
// Backward pass
executor.backward(&outputs, &loss_grads)?;
// Get parameter gradients
let grads = executor.get_gradients()?;
// Update parameters
for (param_name, grad) in grads {
update_parameter(&param_name, &grad);
}
tensorlogic-infer
├── Core Traits
│ ├── TlExecutor (basic execution)
│ ├── TlAutodiff (training with gradients)
│ ├── TlEagerAutodiff (eager mode autodiff) 🆕
│ ├── TlAsyncExecutor (async/await execution) 🆕 Alpha.2
│ ├── TlAsyncBatchExecutor (async batching) 🆕 Alpha.2
│ ├── TlAsyncStreamExecutor (async streaming) 🆕 Alpha.2
│ ├── TlBatchExecutor (batch processing)
│ ├── TlStreamingExecutor (streaming for large datasets)
│ ├── TlCompilableExecutor (AOT graph compilation)
│ ├── TlJitExecutor (JIT compilation) 🆕
│ ├── TlDistributedExecutor (multi-device) 🆕
│ ├── TlRecoverableExecutor (error recovery) 🆕
│ ├── TlCapabilities (backend queries)
│ └── TlProfiledExecutor (profiling & analysis)
├── Compilation & Optimization
│ ├── GraphCompiler (AOT compilation)
│ ├── CompilationCache (compiled graph caching)
│ ├── JitCompiler (runtime compilation) 🆕
│ ├── JitCache (JIT-specific caching) 🆕
│ ├── HotPathDetector (hot path identification) 🆕
│ ├── AdaptiveOptimizer (adaptive optimization) 🆕
│ ├── GraphOptimizer (fusion, DCE, redundancy)
│ ├── FusionPlanner (operation fusion)
│ ├── Scheduler (execution ordering)
│ └── PlacementOptimizer (device placement)
├── Distributed Execution 🆕
│ ├── DistributedExecutor (multi-device coordinator)
│ ├── DataParallelCoordinator (data parallelism)
│ ├── ModelParallelCoordinator (model parallelism)
│ ├── PipelineParallelCoordinator (pipeline parallelism)
│ └── CommunicationBackend (device communication)
├── Runtime & Memory
│ ├── TensorCache (result caching)
│ ├── MemoryPool (allocation pooling)
│ ├── TensorView (zero-copy views) 🆕 Alpha.2
│ ├── ViewBuilder (ergonomic view API) 🆕 Alpha.2
│ ├── ExecutionStrategy (strategy config)
│ ├── ExecutionContext (state management)
│ ├── AsyncExecutorPool (async load balancing) 🆕 Alpha.2
│ ├── CheckpointManager (checkpointing) 🆕
│ └── StreamProcessor (streaming processing)
├── Analysis & Validation
│ ├── GraphValidator (graph validation)
│ ├── MemoryEstimator (memory estimation)
│ ├── ShapeInferenceContext (shape inference)
│ └── BottleneckAnalyzer (performance analysis)
├── Debugging & Profiling 🆕
│ ├── ExecutionTracer (execution recording)
│ ├── TensorInspector (tensor inspection)
│ ├── BreakpointManager (execution breakpoints)
│ ├── ExecutionRecorder (full history recording)
│ ├── TimelineProfiler (timeline visualization)
│ └── Visualization (DOT, JSON, GraphML export)
├── Enhanced Diagnostics 🆕 Alpha.2
│ ├── Diagnostic (rich error messages)
│ ├── DiagnosticCollector (error aggregation)
│ ├── ShapeMismatchDiagnostic (shape errors)
│ ├── MemoryDiagnostic (memory issues)
│ ├── PerformanceDiagnostic (performance warnings)
│ └── SourceLocation (error tracking)
└── Testing Support 🆕
├── DummyExecutor (test executor)
├── BackendTestAdapter (backend test templates)
├── GradientChecker (numerical gradient checking)
└── PerfRegression (performance regression testing)
tensorlogic-scirs-backend: Reference implementation using SciRS2
use tensorlogic_scirs_backend::Scirs2Exec;
let executor = Scirs2Exec::new();
tensorlogic-train: Training infrastructure
use tensorlogic_train::{Trainer, TrainerConfig};
let trainer = Trainer::new(executor, config);
tensorlogic-compiler: Compile TLExpr to EinsumGraph
use tensorlogic_compiler::compile;
let graph = compile(&expr, &context)?;
let outputs = executor.execute(&graph, &inputs)?;
# Run benchmarks
cargo bench -p tensorlogic-infer
# Run all tests
cargo test -p tensorlogic-infer
# Run with output
cargo test -p tensorlogic-infer -- --nocapture
# Run specific test
cargo test -p tensorlogic-infer test_streaming
Test Coverage: 368 tests covering all traits and utilities (100% passing)
The following production-grade modules have been added in Alpha.2:
Quantization (quantization.rs): Complete quantization pipeline for model compression (see the illustrative sketch after this list).
Dynamic Batching (dynamic_batching.rs): Adaptive request batching for inference serving.
Kernel Fusion (fusion.rs): Pattern-based fusion optimization.
Workspace Management (workspace.rs): Memory pool for efficient allocation reuse.
Multi-Model Coordination (multimodel.rs): Ensemble and multi-model management.
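As a rough illustration of the idea behind the quantization module (standalone code, not the crate's API), affine quantization maps f32 values to i8 using a scale and zero point:
// Quantize f32 values to i8 with an affine (scale + zero-point) mapping.
fn quantize_affine(values: &[f32]) -> (Vec<i8>, f32, i32) {
    let (min, max) = values
        .iter()
        .fold((f32::MAX, f32::MIN), |(lo, hi), &v| (lo.min(v), hi.max(v)));
    // One step of the i8 range [-128, 127] covers (max - min) / 255 units.
    let scale = (max - min).max(f32::EPSILON) / 255.0;
    // Choose the zero point so that `min` maps to -128.
    let zero_point = (-128.0 - min / scale).round() as i32;
    let quantized = values
        .iter()
        .map(|&v| ((v / scale).round() as i32 + zero_point).clamp(-128, 127) as i8)
        .collect();
    (quantized, scale, zero_point)
}
// Recover approximate f32 values from the quantized representation.
fn dequantize_affine(q: &[i8], scale: f32, zero_point: i32) -> Vec<f32> {
    q.iter().map(|&v| (v as i32 - zero_point) as f32 * scale).collect()
}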
See CONTRIBUTING.md for guidelines.
Apache-2.0
Status: 🎉 Production Ready (v0.1.0-alpha.2)
Last Updated: 2025-12-10
Tests: 368 passing (100%)
Code: 46 files, 19,921 lines
Completeness: 100%
Alpha.1 Features: JIT Compilation, Distributed Execution, Comprehensive Debugging Tools
Alpha.2 Features: Zero-Copy Tensor Views, Async Execution, Enhanced Diagnostics, Advanced Quantization, Dynamic Batching, Kernel Fusion, Workspace Management, Multi-Model Coordination 🆕
Part of: TensorLogic Ecosystem