| Crates.io | kryst |
| lib.rs | kryst |
| version | 3.2.0 |
| created_at | 2025-06-16 06:30:37.58013+00 |
| updated_at | 2026-01-11 00:55:27.656735+00 |
| description | Krylov subspace and preconditioned iterative solvers for dense and sparse linear systems, with shared and distributed memory parallelism. |
| homepage | https://github.com/tmathis720/kryst |
| repository | https://github.com/tmathis720/kryst |
| max_upload_size | |
| id | 1713936 |
| size | 61,074,838 |
High-performance Krylov subspace and preconditioned iterative solvers for dense and sparse linear systems, with advanced preconditioning strategies and automated parameter optimization.
- pc_chain option
- amg_nu_pre, amg_nu_post)
- DistCsrOp, Block Jacobi on DistCsrOp, and SuperLU_DIST (when the superlu_dist feature is enabled).
- IterationMonitor
- ParameterTuner and grid search
- enable_csv_logging()
- f64.
- --features complex): Internals promote Kryst's scalar alias S to num_complex::Complex64 while the Matrix Market tooling converts boundary data to and from complex storage.

S is the internal scalar alias and R is its real partner. In real builds S = R = f64. In complex builds S = Complex64 and R = f64.
| Feature | Enables | Notes |
|---|---|---|
| mpi | MPI communication backend | Requires MPI installed; examples run via mpirun |
| complex | Complex scalar S | Classical and pipelined GMRES/FGMRES variants are supported |
| backend-faer | Dense/CSR backends and most PCs | Default feature |
| backend flags | Direct solvers / matrix backends | e.g. superlu_dist (where available) |
- mpi: enable distributed-memory execution via the mpi crate. Optional and independent from Rayon.
- rayon: turn on shared-memory parallel kernels. Combine with -ksp_threads to size the worker pool.
- complex: lift internal kernels to Complex64 while keeping the public API monomorphic on f64 inputs.
- logging: route internal tracing to the log facade for integration with env_logger or similar backends.
- backend-faer + rayon + mpi: supported for distributed runs with parallel local kernels; see docs/matrix_features.md for the expected feature combinations and matrix capabilities.

The Krylov drivers expose command-line options to balance global reductions
against additional local work. The most common flags mirror PETSc's -ksp_*
options and can be combined with the deterministic reduction feature for
reproducible CI runs.
| Flag | Default | Effect |
|---|---|---|
| `-ksp_cg_variant classic\|pipelined` | classic | |
| `-ksp_reproducible` | false | Enable deterministic reductions (rank-ordered MPI sums and fixed-order local kernels). |
| `-ksp_threads <N>` | unset | Request N Rayon workers (requires --features rayon). Ignored in builds without Rayon. |
| `-ksp_gmres_variant classical\|pipelined\|sstep[:s]` | | |
| `-ksp_residual_replacement <iters>` | 50 | Force periodic residual recomputation in pipelined CG to control drift (0 disables). |
| `-ksp_trust_region <radius>` | unset | Enable CG trust-region safeguarding with the provided radius. |
| `-ksp_reorthog never\|ifneeded\|always` | | |
Recommended settings for local kernels:
- `-ksp_threads <N>` selects the Rayon worker count used by Kryst kernels (shared-memory only).
- `KRYST_PAR_CUTOFF=<rows>` controls the minimum CSR row count before parallel SpMV is used (default 4096); raise it if you see parallel overhead on small problems.

Legacy -ksp_cg_pipelined remains available as an alias for
-ksp_cg_variant pipelined. For bit-for-bit reproducibility, combine
-ksp_reproducible with -ksp_threads 1. When Rayon is enabled with more
than one worker, runs remain deterministic for a fixed thread count but may
differ across thread-count configurations.
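As a rough illustration of the KRYST_PAR_CUTOFF rule described above, the sketch below reads the variable and decides whether a matrix is large enough to justify a parallel kernel. Only the variable name and the 4096 default come from the text; the helper names `parse_cutoff` and `use_parallel_spmv`, the fallback on unparsable values, and the exact comparison are assumptions made for illustration, not Kryst's implementation.

```rust
use std::env;

/// Default row cutoff quoted in the text above.
const DEFAULT_PAR_CUTOFF: usize = 4096;

/// Parse an optional raw setting; fall back to the default when the value is
/// missing or not a valid unsigned integer (assumed behaviour).
fn parse_cutoff(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_PAR_CUTOFF)
}

/// Hypothetical dispatch rule: use the parallel kernel only when more than
/// one worker is available and the matrix has at least `cutoff` rows.
fn use_parallel_spmv(rows: usize, workers: usize, cutoff: usize) -> bool {
    workers > 1 && rows >= cutoff
}

fn main() {
    let raw = env::var("KRYST_PAR_CUTOFF").ok();
    let cutoff = parse_cutoff(raw.as_deref());
    for rows in [512, 4096, 100_000] {
        println!(
            "rows = {rows:>6}: parallel = {}",
            use_parallel_spmv(rows, 4, cutoff)
        );
    }
}
```

Raising the cutoff simply moves more small matrices onto the serial path, which is the tuning knob the text recommends when parallel overhead dominates.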
When -ksp_reproducible is enabled the solver switches to rank-ordered MPI
reductions and fixed-order local kernels. This guarantees bit-for-bit equality
between runs that use the same communicator size and Rayon thread count. For
strict reproducibility we recommend pinning Rayon to a single thread via
-ksp_threads 1 (or the RAYON_NUM_THREADS environment variable); otherwise,
results remain deterministic for the configured thread count but may differ
between thread-count configurations.
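The reason results can differ between thread counts is that floating-point addition is not associative, so the way a sum is split into chunks changes the rounding. The following standalone sketch (plain `std`, no Kryst types; `chunked_sum` is a hypothetical helper) sums the same data with one and with two chunks: a fixed chunking always reproduces the same bits, while a different chunking gives a different answer.

```rust
/// Sum `values` in `chunks` contiguous pieces: each piece is summed left to
/// right, then the partial sums are combined in chunk order.
fn chunked_sum(values: &[f64], chunks: usize) -> f64 {
    let chunks = chunks.max(1);
    let chunk_len = ((values.len() + chunks - 1) / chunks).max(1);
    values
        .chunks(chunk_len)
        .map(|piece| piece.iter().sum::<f64>())
        .sum()
}

fn main() {
    // Values chosen so that rounding depends on the grouping.
    let values = [1e16, 1.0, -1e16, 1.0];

    let one_chunk = chunked_sum(&values, 1);
    let two_chunks = chunked_sum(&values, 2);
    let two_chunks_again = chunked_sum(&values, 2);

    println!("1 chunk: {one_chunk}, 2 chunks: {two_chunks}");

    // Same chunking: bit-for-bit identical.
    assert_eq!(two_chunks.to_bits(), two_chunks_again.to_bits());
    // Different chunking: the rounded results differ.
    assert_ne!(one_chunk.to_bits(), two_chunks.to_bits());
}
```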
Use this configuration when validating deterministic reductions:
RAYON_NUM_THREADS=1 mpirun -n 4 cargo run --example mpi_parallel_demo --features "mpi rayon" -- \
-ksp_reproducible -ksp_threads 1
Each solver also records the number of global reductions performed in
SolveStats::counters.num_global_reductions, making it easy to assert expected
latency costs in automated tests.
Use these rules of thumb when combining MPI ranks with Rayon threads:
- Size the run so that (MPI ranks) × (threads per rank) matches physical cores. Start with -ksp_threads 2-4 per rank and adjust based on local cache behavior and kernel mix (SpMV vs. ILU/ASM work).
- For reproducible runs, keep -ksp_reproducible enabled and fix the thread count per rank (-ksp_threads 1 or RAYON_NUM_THREADS=1). Results remain deterministic for a fixed communicator size and thread count.

Example hybrid runs:
# Throughput-oriented: 4 ranks × 4 threads (16 cores total)
RAYON_NUM_THREADS=4 mpirun -n 4 cargo run --example mpi_parallel_demo --features "mpi rayon" -- \
-ksp_threads 4
# Reproducible: 4 ranks × 1 thread
RAYON_NUM_THREADS=1 mpirun -n 4 cargo run --example mpi_parallel_demo --features "mpi rayon" -- \
-ksp_reproducible -ksp_threads 1
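The sizing rule used in the runs above, (MPI ranks) × (threads per rank) = physical cores, is simple integer arithmetic. The helper below is a hypothetical illustration of that rule and is not part of Kryst; the core count is passed in explicitly because the standard library only reports logical parallelism.

```rust
/// Threads per rank such that ranks * threads does not exceed the number of
/// physical cores; always at least one thread per rank.
fn threads_per_rank(physical_cores: usize, mpi_ranks: usize) -> usize {
    (physical_cores / mpi_ranks.max(1)).max(1)
}

fn main() {
    for (cores, ranks) in [(16, 4), (16, 16), (8, 3)] {
        let threads = threads_per_rank(cores, ranks);
        println!("{cores} cores, {ranks} ranks -> -ksp_threads {threads}");
    }
}
```

When the division is not exact (8 cores over 3 ranks), rounding down leaves some cores idle rather than oversubscribing them.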
For performance studies across MPI-only, Rayon-only, and hybrid builds, run the
mpi_rayon_suite benchmark via cargo bench (see scripts/bench_mpi_rayon.sh)
to compare ILU and ASM preconditioner workloads on small/medium/large matrices.
- row_ptr/col_idx/values access and sparse kernels (spgemm, CSR Galerkin triple product)

Add to your Cargo.toml:
[dependencies]
kryst = "3.2"
[features]
default = [] # Opt in to exactly the features you need
rayon = ["dep:rayon", "dep:num_cpus"]
mpi = ["dep:mpi"]
logging = ["dep:log"]
complex = ["dep:num-complex"]
simd = [] # Auto-tuned std::simd sparse mat-vec kernels
x86_intrinsics = [] # Optional x86_64 gather/prefetch micro-tuning
Enabling the simd feature activates the runtime SpMV planner, which selects
between the scalar CSR baseline, a gather-based SIMD kernel, and a SELL-C-σ
kernel. Plans are built once per matrix (e.g., during AMG setup) and cached for
deterministic, allocation-free application time.
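For readers unfamiliar with the term, a scalar CSR baseline is the plain row-by-row sparse matrix-vector product that SIMD and SELL-C-σ kernels are usually measured against. The sketch below shows that generic algorithm on the `row_ptr`/`col_idx`/`values` layout; it is a standalone illustration under the usual CSR conventions, not Kryst's kernel, and `csr_spmv` is a hypothetical name.

```rust
/// Compute y = A x for a matrix stored in CSR form.
/// `row_ptr[i]..row_ptr[i + 1]` indexes the nonzeros of row `i`.
fn csr_spmv(row_ptr: &[usize], col_idx: &[usize], values: &[f64], x: &[f64], y: &mut [f64]) {
    for (row, y_row) in y.iter_mut().enumerate() {
        let mut acc = 0.0;
        for k in row_ptr[row]..row_ptr[row + 1] {
            acc += values[k] * x[col_idx[k]];
        }
        *y_row = acc;
    }
}

fn main() {
    // 3x3 tridiagonal matrix [4 -1 0; -1 4 -1; 0 -1 4].
    let row_ptr = [0, 2, 5, 7];
    let col_idx = [0, 1, 0, 1, 2, 1, 2];
    let values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0];

    let x = [1.0, 2.0, 3.0];
    let mut y = [0.0; 3];
    csr_spmv(&row_ptr, &col_idx, &values, &x, &mut y);
    println!("{y:?}");
}
```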
use kryst::prelude::*;
use kryst::matrix::op::DenseOp;
use faer::Mat;
use std::sync::Arc;
// Create a 100×100 test system
let n = 100;
let mat = Mat::<f64>::from_fn(n, n, |i, j| {
if i == j { 4.0 } else if (i as i32 - j as i32).abs() == 1 { -1.0 } else { 0.0 }
});
let a = Arc::new(DenseOp::<f64>::new(Arc::new(mat)));
let rhs = vec![1.0; n];
let mut solution = vec![0.0; n];
// Configure solver and preconditioner
let mut ksp = KspContext::new();
ksp.set_type(SolverType::Gmres)?
.set_pc_type(PcType::Jacobi, None)?
.set_operators(a.clone(), None);
ksp.rtol = 1e-8;
ksp.maxits = 1000;
// Setup once then solve
ksp.setup()?;
let stats = ksp.solve(&rhs, &mut solution)?;
println!(
"Converged in {} iterations with residual {:.2e}",
stats.iterations,
stats.final_residual
);
Reuse factorization and workspace across multiple solves by calling setup() once:
let mut ksp = KspContext::new();
ksp.set_type(SolverType::Cg)?
.set_pc_type(PcType::Jacobi, None)?
.set_operators(a.clone(), None);
ksp.setup()?; // perform factorization and allocate workspace
for rhs in rhs_set.iter() {
let mut x = vec![0.0; n];
ksp.solve(rhs, &mut x)?;
}
use kryst::context::ksp_context::KspContext;
use kryst::config::options::{KspOptions, PcOptions};
let mut ksp_opts = KspOptions::default();
ksp_opts.ksp_type = Some("cg".into());
let mut pc_opts = PcOptions::default();
pc_opts.pc_chain = Some("jacobi,chebyshev".into());
pc_opts.chebyshev_degree = Some(5);
let mut ksp = KspContext::new();
ksp.set_from_options(&ksp_opts, &pc_opts)?
.set_operators(a.clone(), None);
ksp.setup()?;
let stats = ksp.solve(&rhs, &mut solution)?;
use kryst::context::ksp_context::{KspContext, SolverType};
use kryst::context::pc_context::PcType;
use kryst::config::options::PcOptions;
let mut pc_opts = PcOptions::default();
pc_opts.amg_levels = Some(4);
pc_opts.amg_strength_threshold = Some(0.25);
pc_opts.amg_nu_pre = Some(2); // Pre-smoothing steps
pc_opts.amg_nu_post = Some(1); // Post-smoothing steps
let mut ksp = KspContext::new();
ksp.set_type(SolverType::Gmres)?
.set_pc_type(PcType::Amg, Some(&pc_opts))?
.set_operators(a.clone(), None);
ksp.setup()?;
let stats = ksp.solve(&rhs, &mut solution)?;
use kryst::{IterationMonitor, ParameterTuner};
use kryst::context::ksp_context::SolverType;
use kryst::context::pc_context::PcType;
use std::time::Duration;
// Monitor convergence behavior
let mut monitor = IterationMonitor::new();
// In practice, integrate monitor with solver iteration callbacks
// Automated parameter tuning
let mut tuner = ParameterTuner::new();
tuner.set_solver_types(vec![SolverType::Cg, SolverType::Gmres]);
tuner.set_pc_types(vec![PcType::Jacobi, PcType::Chebyshev, PcType::Amg]);
tuner.set_tolerances(vec![1e-6, 1e-8]);
tuner.set_max_config_time(Duration::from_secs(30));
let (best_config, all_results) = tuner.tune_parameters(&matrix, &rhs, 5).unwrap();
println!("Best configuration: {:?}", best_config);
use kryst::config::options::{parse_all_options, KspOptions, PcOptions};
use kryst::context::ksp_context::KspContext;
// Parse command-line options
let args: Vec<String> = std::env::args().collect();
let (ksp_opts, pc_opts) = parse_all_options(&args)?;
// Configure from options
let mut ksp = KspContext::new();
ksp.set_from_all_options(&ksp_opts, &pc_opts)?
.set_operators(a.clone(), None);
ksp.setup()?;
let stats = ksp.solve(&rhs, &mut solution)?;
Run your program with PETSc-style options:
# Basic solver configuration
./my_program -ksp_type gmres -ksp_rtol 1e-8 -pc_type jacobi
# Direct solvers
./my_program -ksp_type preonly -pc_type lu # Direct LU solver
./my_program -ksp_type preonly -pc_type qr # Direct QR solver
# Advanced preconditioning
./my_program -ksp_type cg -pc_type amg -amg_nu_pre 2 -amg_nu_post 1
./my_program -ksp_type gmres -pc_chain "jacobi,chebyshev" -chebyshev_degree 5
# Show all available options
./my_program -help
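To make the flag convention concrete, here is a minimal sketch of how PETSc-style `-key value` arguments can be collected into a map. It is not Kryst's `parse_all_options`; the helper names `flag_name` and `parse_flags` and the rule that a value-less flag maps to an empty string are assumptions chosen for illustration.

```rust
use std::collections::HashMap;

/// Return the option name if `arg` looks like `-name` (a dash followed by a
/// letter). Values such as `1e-8` or `-1` are not treated as flags.
fn flag_name(arg: &str) -> Option<&str> {
    let name = arg.strip_prefix('-')?;
    name.chars().next()?.is_ascii_alphabetic().then_some(name)
}

/// Collect `-key value` pairs; a flag with no following value is stored with
/// an empty string. Non-flag arguments (e.g. the program name) are skipped.
fn parse_flags(args: &[String]) -> HashMap<String, String> {
    let mut opts = HashMap::new();
    let mut iter = args.iter().peekable();
    while let Some(arg) = iter.next() {
        let Some(name) = flag_name(arg) else { continue };
        let takes_value = iter.peek().map_or(false, |next| flag_name(next).is_none());
        let value = if takes_value {
            iter.next().cloned().unwrap_or_default()
        } else {
            String::new()
        };
        opts.insert(name.to_string(), value);
    }
    opts
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    for (key, value) in parse_flags(&args) {
        println!("{key} = {value:?}");
    }
}
```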
- `-ksp_type <solver>` - Solver type: cg, pcg, gmres, fgmres, bicgstab, cgs, qmr, tfqmr, minres, cgnr, preonly
- `-ksp_rtol <float>` - Relative convergence tolerance (default: 1e-5)
- `-ksp_atol <float>` - Absolute convergence tolerance (default: 1e-50)
- `-ksp_dtol <float>` - Divergence tolerance (default: 1e5)
- `-ksp_max_it <int>` - Maximum number of iterations (default: 10000)
- `-ksp_gmres_restart <int>` - GMRES restart parameter (default: 50)
- `-ksp_pc_side <side>` - Preconditioning side: left, right, symmetric
- `-ksp_reproducible` - Enable deterministic reductions; forces rank-ordered MPI sums and stable intra-rank chunking.
- `-pc_type <pc>` - Preconditioner type: jacobi, blockjacobi, sor, none
- `-pc_type <pc>` - ILU variants: ilu0, ilu, ilut, ilutp, ilup
- `-pc_ilu_levels <int>` - ILU fill levels (default: 0)
- `-pc_ilut_drop_tol <float>` - ILUT drop tolerance (default: 1e-3)
- `-pc_ilut_max_fill <int>` - ILUT maximum fill per row (default: 10)
- `-pc_type chebyshev` - Enhanced Chebyshev with eigenvalue estimation
- `-chebyshev_degree <int>` - Polynomial degree (default: 3)
- `-pc_type amg` - Algebraic multigrid with smoothing control
- `-amg_levels <int>` - Number of AMG levels (default: 4)
- `-amg_strength_threshold <float>` - Strong connection threshold (default: 0.25)
- `-amg_nu_pre <int>` - Pre-smoothing steps (default: 1)
- `-amg_nu_post <int>` - Post-smoothing steps (default: 1)
- `-pc_amg` - shorthand alias for `-pc_type amg`.
- `-pc_amg_coarsen <rs|hmis|pmis|falgout>` - Coarsening strategy (maps to AMGConfig::coarsen_type).
- `-pc_amg_interp <classical|direct|multipass|extended|standard>` - Interpolation/extended-smoothing variant.
- `-pc_amg_smoother <jacobi|gs|gsr|sgs|hgs|l1jacobi|chebyshev>` - Smoother applied on each level.
- `-pc_amg_smoother_steps <int>` and `-pc_amg_smoother_omega <float>` control smoothing sweeps/relaxation weight.
- `-pc_amg_truncation_factor <float>` / `-pc_amg_interp_maxnnz <int>` trim interpolation fill.
- `-pc_amg_rap_truncation_factor <float>` / `-pc_amg_rap_truncation_abs <float>` / `-pc_amg_rap_maxnnz <int>` prune RAP entries.
- `-pc_amg_keep_transpose <bool>` / `-pc_amg_keep_pivot_in_rap <bool>` control symmetry-preserving entries.
- `-pc_amg_require_spd <bool>` / `-pc_amg_print_setup <bool>` control SPD enforcement and setup printing.

Example AMG invocation:
./solve \
-pc_amg \
-pc_amg_levels 6 \
-pc_amg_strength_threshold 0.25 \
-pc_amg_coarsen hmis \
-pc_amg_interp extended \
-pc_amg_smoother chebyshev \
-pc_amg_smoother_steps 2 \
-pc_amg_smoother_omega 0.8 \
-pc_amg_truncation_factor 0.2 \
-pc_amg_interp_maxnnz 8 \
-pc_amg_rap_truncation_factor 0.05 \
-pc_amg_rap_truncation_abs 0.0 \
-pc_amg_rap_maxnnz 16 \
-pc_amg_keep_transpose true \
-pc_amg_keep_pivot_in_rap true \
-pc_amg_require_spd true \
-pc_amg_print_setup true
- `-pc_chain <string>` - Sequential preconditioner chain (e.g., "jacobi,chebyshev")
- `-pc_type asm` - Additive Schwarz Method
- `-pc_type approxinv` - Approximate inverse preconditioner

-pc_type ilu selects Kryst's HYPRE-inspired incomplete LU family (Ilu). -pc_type ilut/-pc_type ilutp run the lighter-weight row-filter ILUT or pivoting ILUTP preconditioners, while
-pc_type blockjacobi with `-pc_local <ilu|ilut|ilutp>` wraps a local ILU variant inside MPI
block-Jacobi. Setting -pc_type ilu with -pc_ilu_type ilut runs the canonical ILU threshold
factorization; Ilu::create_specialized may route that variant to crate::preconditioner::ilut::Ilut
for simplicity/efficiency.
| CLI flag | Config field | Notes |
|---|---|---|
| `-pc_ilu_type <ilu0\|milu0\|iluk\|…>` | | |
| `-pc_ilu_level_of_fill <int>` | `IluConfig::level_of_fill` | Controls level-of-fill for ILUK (typical 0–5). |
| `-pc_ilu_max_fill_per_row <int>` | `IluConfig::max_fill_per_row` | Per-row fill cap for ILUK/ILUT; 10–50 keeps memory bounded. |
| `-pc_ilu_offdiag_drop_tolerance <float>` | `IluConfig::offdiag_drop_tolerance` | Drop entries outside LU blocks. |
| `-pc_ilu_schur_drop_tolerance <float>` | `IluConfig::schur_drop_tolerance` | For future Schur complements (currently dormant). |
| `-pc_ilu_triangular_solve <exact\|jacobi\|gauss_seidel>` | | |
| `-pc_ilu_lower_jacobi_iters <int>` / `-pc_ilu_upper_jacobi_iters <int>` | Jacobi iteration counts | Only used when the triangular solve is iterative. |
| `-pc_ilu_tolerance <float>` / `-pc_ilu_max_iterations <int>` | Iterative solve controls | Defaults 1e-6 & 1; iterative delivers residual-based refinement. |
| `-pc_ilu_parallel_factorization` / `-pc_ilu_parallel_trisolve` / `-pc_ilu_parallel_chunk_size <int>` | `IluConfig::enable_parallel_*`, `parallel_chunk_size` | Enable experimental rayon paths; chunk size typically 16–256. |
| `-pc_ilut_drop_tol <float>` | `IluConfig::drop_tolerance` (row-filter ILUT) | Simple heuristic ILUT drop threshold (1e-3–1e-6). |
| `-pc_ilut_max_fill <int>` | `IluConfig::max_fill_per_row` (row-filter ILUT) | Limits kept entries per row (10–100). |
| `-pc_ilut_perm_tol <float>` | Pivot tolerance for row-filter ILUT | Not used by canonical Ilu but available for the lightweight ILUT preconditioner. |
| `-pc_ilutp_max_fill <int>` / `-pc_ilutp_drop_tol <float>` / `-pc_ilutp_perm_tol <float>` | Ilutp parameters | Controls density, drop tolerance, and pivoting aggressiveness for ILUTP. |
Environment variables mirror the flags: KRYST_PC_ILU_TYPE, KRYST_PC_ILU_LEVEL_OF_FILL, KRYST_PC_ILU_MAX_FILL_PER_ROW, KRYST_PC_ILU_OFFDIAG_DROP_TOL, KRYST_PC_ILU_SCHUR_DROP_TOL, KRYST_PC_ILU_TRI_SOLVE, KRYST_PC_ILU_LOWER_JACOBI_ITERS, KRYST_PC_ILU_UPPER_JACOBI_ITERS, KRYST_PC_ILU_PARALLEL_FACTORIZATION, KRYST_PC_ILU_PARALLEL_TRISOLVE, KRYST_PC_ILU_PARALLEL_CHUNK_SIZE, plus KRYST_PC_ILUT_DROP_TOL, KRYST_PC_ILUT_MAX_FILL, KRYST_PC_ILUT_PERM_TOL, KRYST_PC_ILUTP_MAX_FILL, KRYST_PC_ILUTP_DROP_TOL, and KRYST_PC_ILUTP_PERM_TOL. Command-line flags override environment variables, which in turn override the built-in defaults.
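The precedence described above (command line, then environment, then built-in default) can be expressed as a short fallback chain. The sketch below is an illustration only: `resolve` is a hypothetical helper, and letting an unparsable value fall through to the next source is an assumption rather than documented Kryst behaviour. The 1e-3 default is the ILUT drop tolerance default quoted earlier.

```rust
use std::str::FromStr;

/// Pick the first source that parses: CLI value, then environment value,
/// then the built-in default.
fn resolve<T: FromStr>(cli: Option<&str>, env: Option<&str>, default: T) -> T {
    cli.and_then(|s| s.parse().ok())
        .or_else(|| env.and_then(|s| s.parse().ok()))
        .unwrap_or(default)
}

fn main() {
    // Pretend the user passed `-pc_ilut_drop_tol 1e-5` on the command line.
    let cli = Some("1e-5");
    let env = std::env::var("KRYST_PC_ILUT_DROP_TOL").ok();

    let drop_tol: f64 = resolve(cli, env.as_deref(), 1e-3);
    println!("drop tolerance = {drop_tol:e}");
}
```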
-pc_type ilu -pc_ilu_type ilu0 -pc_ilu_triangular_solve exact
-pc_type ilu -pc_ilu_type ilut -pc_ilut_drop_tol 1e-5 -pc_ilut_max_fill 50
-pc_type ilutp -pc_ilutp_max_fill 20 -pc_ilutp_drop_tol 1e-4 -pc_ilutp_perm_tol 0.1
-pc_type blockjacobi -pc_local ilu -pc_ilu_type ilu0 -pc_ilu_level_of_fill 1
The first line compares Jacobi vs ILU(0) on examples/poisson_spd_ilu0_vs_jacobi.rs; the second
shows ILUT tuning. The third line mirrors the convection–diffusion ILUTP demo
(examples/convection_diffusion_ilutp.rs), and the last line is the MPI block-Jacobi + ILU(0)
toy from examples/mpi_poisson_block_jacobi_ilu.rs.
- `-pc_type lu` - Direct LU factorization via SuperLU
- `-pc_type qr` - Direct QR factorization
- `-asm_overlap <int>` - ASM subdomain overlap (default: 1)
- `-asm_type <type>` - ASM variant: restrict, interpolate, basic

# Enhanced Chebyshev preconditioning
-ksp_type cg -pc_type chebyshev -chebyshev_degree 6
# AMG with custom smoothing
-ksp_type gmres -pc_type amg -amg_nu_pre 2 -amg_nu_post 1
# Composite preconditioning (PC-chaining)
-ksp_type cg -pc_chain "jacobi,chebyshev" -chebyshev_degree 4
# High-accuracy direct solve
-ksp_type preonly -pc_type lu
# BiCGStab with threshold ILU
-ksp_type bicgstab -pc_type ilut -pc_ilut_drop_tol 1e-4
# GMRES with additive Schwarz
-ksp_type gmres -pc_type asm -asm_overlap 2
Track solver convergence with real-time monitoring:
use kryst::utils::monitor::IterationMonitor;
use kryst::context::ksp_context::{KspContext, SolverType};
use kryst::context::pc_context::PcType;
use std::sync::{Arc, Mutex};
use std::time::Duration;
// Create and configure monitor
let mut monitor = IterationMonitor::new();
monitor.enable_csv_logging("convergence_history.csv").unwrap();
// Configure solver with monitoring callback
let monitor_ref = Arc::new(Mutex::new(monitor));
let monitor_clone = Arc::clone(&monitor_ref);
let mut ksp = KspContext::new();
ksp.set_type(SolverType::Gmres)?
.set_pc_type(PcType::Jacobi, None)?
.set_operators(a.clone(), None);
// Add monitoring callback
ksp.add_monitor(move |iter, residual| {
if let Ok(mut mon) = monitor_clone.lock() {
mon.record_iteration(iter, residual, None);
}
});
// Solve with monitoring
ksp.setup()?;
let stats = ksp.solve(&rhs, &mut solution)?;
// Analyze convergence
if let Ok(mon) = monitor_ref.lock() {
let convergence_stats = mon.get_statistics();
println!("Total iterations: {}", convergence_stats.total_iterations);
println!("Average convergence rate: {:.4}", convergence_stats.avg_convergence_rate);
println!("Final residual: {:.2e}", convergence_stats.final_residual);
// Check for convergence issues
if mon.recent_convergence_rate(5).unwrap_or(1.0) > 0.9 {
println!("Warning: Slow convergence detected");
}
}
Optimize solver/preconditioner combinations automatically:
use kryst::utils::tuning::{ParameterTuner, ParameterConfig};
use kryst::context::ksp_context::SolverType;
use kryst::context::pc_context::PcType;
use std::time::Duration;
let mut tuner = ParameterTuner::new();
// Configure search space
tuner.set_solver_types(vec![SolverType::Cg, SolverType::Gmres, SolverType::BiCgStab])
.set_pc_types(vec![PcType::Jacobi, PcType::Chebyshev, PcType::Amg])
.set_tolerances(vec![1e-6, 1e-8, 1e-10])
.set_max_config_time(Duration::from_secs(60));
// Add PC-chain configurations for composite preconditioning
tuner.add_pc_chains(vec![
"jacobi,chebyshev".to_string(),
"jacobi,ilu0".to_string(),
]);
// Run automated tuning
let (best_config, all_results) = tuner.tune_parameters(&matrix, &rhs, 10).unwrap();
println!("Best configuration found:");
println!(" Solver: {:?}", best_config.solver_type);
println!(" Preconditioner: {:?}", best_config.pc_type);
println!(" Tolerance: {:.2e}", best_config.rtol);
if let Some(chain) = &best_config.pc_chain {
println!(" PC Chain: {}", chain);
}
println!(" Converged: {}", all_results.iter().find(|r| r.config.solver_type == best_config.solver_type).unwrap().converged);
// Export results for further analysis
tuner.export_results("tuning_results.txt").unwrap();
let summary = tuner.get_summary();
println!("Success rate: {:.1}%", summary.get("convergence_rate").unwrap_or(&0.0) * 100.0);
use kryst::utils::monitor::IterationMonitor;
use kryst::context::ksp_context::KspContext;
use std::time::Duration;
let mut monitor = IterationMonitor::new();
monitor.start_solve();
// Record some iterations
monitor.record_iteration(0, 1.0, None);
monitor.record_iteration(1, 0.5, Some(Duration::from_millis(10)));
monitor.record_iteration(2, 0.25, Some(Duration::from_millis(12)));
// Mark convergence
monitor.mark_converged("Relative tolerance achieved");
// Get detailed statistics
let stats = monitor.get_statistics();
println!("Convergence statistics:");
println!(" Total iterations: {}", stats.total_iterations);
println!(" Average convergence rate: {:.4}", stats.avg_convergence_rate);
println!(" Best convergence rate: {:.4}", stats.best_convergence_rate);
println!(" Average iteration time: {:.3}ms", stats.avg_iteration_time.as_secs_f64() * 1000.0);
// Check recent convergence behavior
if let Some(recent_rate) = monitor.recent_convergence_rate(3) {
println!("Recent convergence rate (last 3 iterations): {:.4}", recent_rate);
}
// Set up real-time monitoring callbacks
let mut ksp = KspContext::new();
ksp.add_monitor(|iter, residual| {
println!("Iteration {}: residual = {:.3e}", iter, residual);
// Custom monitoring logic
if iter > 0 && iter % 10 == 0 {
println!(" Checkpoint: {} iterations completed", iter);
}
});
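The text does not say how the convergence-rate statistics are defined, so the following is only one plausible reading: the average rate as the geometric-mean reduction factor of the residual per iteration, and the recent rate as the same quantity over a trailing window. `average_rate` and `recent_rate` are hypothetical helpers, and Kryst's `IterationMonitor` may compute these values differently.

```rust
/// Geometric-mean residual reduction per iteration:
/// (r_last / r_first)^(1 / steps). Values below 1 mean the residual shrinks.
/// Returns None when fewer than two residuals are available.
fn average_rate(residuals: &[f64]) -> Option<f64> {
    let (first, last) = (*residuals.first()?, *residuals.last()?);
    let steps = residuals.len() - 1;
    if steps == 0 || first <= 0.0 {
        return None;
    }
    Some((last / first).powf(1.0 / steps as f64))
}

/// Same quantity restricted to the last `window` iterations.
fn recent_rate(residuals: &[f64], window: usize) -> Option<f64> {
    let start = residuals.len().saturating_sub(window + 1);
    average_rate(&residuals[start..])
}

fn main() {
    // Residual history matching the monitor example above.
    let history = [1.0, 0.5, 0.25];
    if let Some(rate) = average_rate(&history) {
        println!("average rate: {rate:.4}");
    }
    if let Some(rate) = recent_rate(&history, 1) {
        println!("rate over last iteration: {rate:.4}");
    }
}
```

Under this definition a rate close to 1 signals stagnation, which is consistent with the `> 0.9` slow-convergence check used earlier.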
Enable detailed timing and performance information:
[dependencies]
kryst = { version = "3.2", features = ["logging"] }
Run with environment variables for detailed profiling:
# Trace-level logging shows detailed stage timing
RUST_LOG=trace cargo run --features=logging
# Debug-level shows major operations
RUST_LOG=debug cargo run --features=logging
# Info-level shows high-level progress
RUST_LOG=info cargo run --features=logging
Profiling output includes:

Direct solvers: -pc_type lu and -pc_type qr
Jacobi: M⁻¹ = diag(A)⁻¹

Enhanced polynomial preconditioning implementation based on eigenvalue estimation:
use kryst::preconditioner::chebyshev::Chebyshev;
use kryst::config::options::PcOptions;
use kryst::context::pc_context::PcType;
// Enhanced Chebyshev with automatic eigenvalue estimation
let mut pc_opts = PcOptions::default();
pc_opts.chebyshev_degree = Some(6); // Higher degree for better approximation
ksp.set_pc_type(PcType::Chebyshev, Some(&pc_opts))?;
Features:
Advanced Algebraic Multigrid with configurable smoothing:
use kryst::preconditioner::amg::Amg;
use kryst::config::options::PcOptions;
use kryst::context::pc_context::PcType;
// Enhanced AMG with smoothing control
let mut pc_opts = PcOptions::default();
pc_opts.amg_levels = Some(5); // Multigrid levels
pc_opts.amg_strength_threshold = Some(0.5); // Strong connection threshold
pc_opts.amg_nu_pre = Some(2); // Pre-smoothing steps
pc_opts.amg_nu_post = Some(1); // Post-smoothing steps
ksp.set_pc_type(PcType::Amg, Some(&pc_opts))?;
Features:
PC-chaining allows sequential application of multiple preconditioners:
use kryst::config::options::{KspOptions, PcOptions};
// Example 1: Jacobi + Chebyshev combination
let mut pc_opts = PcOptions::default();
pc_opts.pc_chain = Some("jacobi,chebyshev".to_string());
pc_opts.chebyshev_degree = Some(4);
ksp.set_from_options(&KspOptions::default(), &pc_opts)?;
// Example 2: Multi-stage preconditioning
let mut pc_opts = PcOptions::default();
pc_opts.pc_chain = Some("jacobi,ilu0,chebyshev".to_string());
ksp.set_from_options(&KspOptions::default(), &pc_opts)?;
// Example 3: Domain decomposition + multigrid
let mut pc_opts = PcOptions::default();
pc_opts.pc_chain = Some("asm,amg".to_string());
pc_opts.amg_nu_pre = Some(1);
ksp.set_from_options(&KspOptions::default(), &pc_opts)?;
Features:
- ParameterTuner
- faer::Mat<T> integration
- MatVec interface for custom matrix implementations

The library includes comprehensive demonstration programs:
# Options and CLI interface demonstration
cargo run --example options_demo -- -ksp_type gmres -pc_type jacobi -ksp_rtol 1e-8
# Direct solver usage
cargo run --example dense_direct
# Matrix market file demonstration
cargo run --example matrix_market_demo
# Convergence behavior analysis
cargo run --example convergence_demo
# Iteration monitoring demonstration
cargo run --example monitor --features logging
# HYPRE-style ILU demonstration
cargo run --example hypre_ilu_demo
# MPI parallel examples (requires MPI)
mpirun -n 4 cargo run --example mpi_parallel_demo --features mpi
Note: Matrix Market example files (*.mtx) are excluded from the published crate to stay within size limits. The matrix_market_demo example will auto-generate test data if example files are not found.
# Enhanced Chebyshev preconditioning
cargo run --example options_demo -- -ksp_type cg -pc_type chebyshev -chebyshev_degree 6
# AMG with custom smoothing parameters
cargo run --example options_demo -- -ksp_type gmres -pc_type amg -amg_nu_pre 3 -amg_nu_post 2
# Composite preconditioning with PC-chaining
cargo run --example options_demo -- -ksp_type cg -pc_chain "jacobi,chebyshev" -chebyshev_degree 4
# High-precision direct solve
cargo run --example options_demo -- -ksp_type preonly -pc_type lu
# Complex preconditioner combinations
cargo run --example options_demo -- -ksp_type fgmres -pc_type ilut -pc_ilut_drop_tol 1e-5
Performance benchmarks are available via:
cargo bench
Benchmark categories include:
Sample benchmark results (varies by system and problem):
solver_comparison/gmres time: 45.2 ms (convergence: 23 iterations)
solver_comparison/bicgstab time: 38.7 ms (convergence: 31 iterations)
solver_comparison/cg time: 22.1 ms (convergence: 18 iterations)
pc_effectiveness/jacobi time: 156 ms (convergence: 89 iterations)
pc_effectiveness/amg time: 67.3 ms (convergence: 12 iterations)
pc_chaining/jacobi+cheby time: 43.8 ms (convergence: 15 iterations)
use kryst::{LinearSolver, MatVec, Preconditioner, SolveStats, KError};
struct MyCustomSolver {
tolerance: f64,
max_iterations: usize,
}
impl<M, V> LinearSolver<M, V> for MyCustomSolver
where
M: MatVec<V>,
V: Clone,
{
fn solve(
&mut self,
matrix: &M,
preconditioner: Option<&dyn Preconditioner<M, V>>,
rhs: &V,
solution: &mut V
) -> Result<SolveStats, KError> {
// Custom solver implementation
// Return solve statistics
Ok(SolveStats {
iterations: 0,
residual_norm: 0.0,
converged: true,
})
}
}
use kryst::{Preconditioner, PcSide, KError};
struct MyCustomPreconditioner {
// Preconditioner data structures
factorization: Option<Vec<f64>>,
}
impl<M, V> Preconditioner<M, V> for MyCustomPreconditioner {
fn setup(&mut self, matrix: &M) -> Result<(), KError> {
// Preconditioner setup/factorization phase
// Store factorization data
Ok(())
}
fn apply(&self, side: PcSide, x: &V, y: &mut V) -> Result<(), KError> {
// Apply the preconditioner: y = M⁻¹ x. `side` tells the implementation
// where the solver places M⁻¹ relative to A.
match side {
PcSide::Left => {
// Left preconditioning (M⁻¹ A x = M⁻¹ b): solve M y = x
},
PcSide::Right => {
// Right preconditioning (A M⁻¹ u = b, x = M⁻¹ u): solve M y = x
},
}
Ok(())
}
}
use kryst::core::traits::MatVec;
use kryst::error::KError;
struct LaplacianOperator {
n: usize, // Grid size
h: f64, // Grid spacing
}
impl MatVec<Vec<f64>> for LaplacianOperator {
fn matvec(&self, x: &Vec<f64>, y: &mut Vec<f64>) -> Result<(), KError> {
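// Matrix-vector product y = A x for the 1D Laplacian:
// -u''(x) ≈ (-u[i-1] + 2u[i] - u[i+1]) / h²
// Homogeneous Dirichlet values outside the grid are treated as zero, which
// keeps the operator symmetric positive definite, as CG requires.
for i in 0..self.n {
let left = if i > 0 { x[i - 1] } else { 0.0 };
let right = if i + 1 < self.n { x[i + 1] } else { 0.0 };
y[i] = (2.0 * x[i] - left - right) / (self.h * self.h);
}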
Ok(())
}
fn size(&self) -> (usize, usize) {
(self.n, self.n)
}
}
// Usage with KspContext
use std::sync::Arc;
let laplacian = Arc::new(LaplacianOperator { n: 1000, h: 0.001 });
let mut ksp = KspContext::new();
ksp.set_type(SolverType::Cg)?
.set_pc_type(PcType::Jacobi, None)?
.set_operators(laplacian.clone(), None);
// Can use matrix-free operator directly
let rhs = vec![1.0; laplacian.n];
let mut sol = vec![0.0; laplacian.n];
ksp.setup()?;
let stats = ksp.solve(&rhs, &mut sol)?;
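To show what a matrix-free solve does under the hood without depending on Kryst, here is a small standalone sketch: an unpreconditioned conjugate gradient loop that only ever touches the operator through a mat-vec function. It uses the symmetric 1D Laplacian stencil with homogeneous Dirichlet boundaries so that CG is applicable. The function names are hypothetical and the loop is the textbook algorithm, not Kryst's CG implementation.

```rust
/// y = A x for the 1D Laplacian stencil [-1, 2, -1] / h^2 with zero values
/// outside the grid (homogeneous Dirichlet), applied without storing A.
fn laplacian_matvec(h: f64, x: &[f64], y: &mut [f64]) {
    let n = x.len();
    for i in 0..n {
        let left = if i > 0 { x[i - 1] } else { 0.0 };
        let right = if i + 1 < n { x[i + 1] } else { 0.0 };
        y[i] = (2.0 * x[i] - left - right) / (h * h);
    }
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(p, q)| p * q).sum()
}

/// Textbook unpreconditioned CG. Returns (iterations, residual norm).
fn cg(h: f64, b: &[f64], x: &mut [f64], rtol: f64, max_it: usize) -> (usize, f64) {
    let n = b.len();
    let mut ap = vec![0.0; n];
    laplacian_matvec(h, x, &mut ap);
    // r = b - A x, p = r
    let mut r: Vec<f64> = b.iter().zip(&ap).map(|(bi, ai)| bi - ai).collect();
    let mut p = r.clone();
    let mut rr = dot(&r, &r);
    let stop = rtol * dot(b, b).sqrt();

    for it in 0..max_it {
        if rr.sqrt() <= stop {
            return (it, rr.sqrt());
        }
        laplacian_matvec(h, &p, &mut ap);
        let alpha = rr / dot(&p, &ap);
        for i in 0..n {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        let rr_new = dot(&r, &r);
        let beta = rr_new / rr;
        for i in 0..n {
            p[i] = r[i] + beta * p[i];
        }
        rr = rr_new;
    }
    (max_it, rr.sqrt())
}

fn main() {
    let n = 50;
    let h = 1.0 / (n as f64 + 1.0);
    let b = vec![1.0; n];
    let mut x = vec![0.0; n];
    let (its, res) = cg(h, &b, &mut x, 1e-10, 200);
    println!("CG stopped after {its} iterations, residual {res:.2e}");
}
```

For this right-hand side the discrete solution coincides with u(t) = t(1 - t)/2 at the grid points, which gives a convenient correctness check.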
Run the comprehensive test suite:
# All tests
cargo test
# Specific test categories
cargo test --lib solver
cargo test --lib preconditioner
cargo test --lib context
cargo test --lib utils
# Integration tests
cargo test test_phase_iii_iv_integration
cargo test test_options_integration
cargo test test_preconditioner_integration
# With specific features
cargo test --features "rayon"
cargo test --features "mpi"
cargo test --features "logging"
# Performance testing
cargo test --release
The matrix feature table and the MPI/Rayon test plan live in
docs/matrix_features.md. Use them to validate communicator reductions,
distributed SpMV/halo exchange, and Rayon-local kernels for
backend-faer + mpi + rayon builds.
Use the following steps as a minimal MPI validation recipe (local or CI):
mpirun -n 2 cargo test --features "mpi backend-faer"
mpirun -n 2 cargo test --features "mpi rayon backend-faer"
New Features:
Breaking Changes:
Recommended Upgrades:
// Old approach
ksp.set_pc_type(PcType::Chebyshev, None)?;
// Enhanced approach (optional)
let mut pc_opts = PcOptions::default();
pc_opts.chebyshev_degree = Some(6);
ksp.set_pc_type(PcType::Chebyshev, Some(&pc_opts))?;
New Monitoring Capabilities:
// Add iteration monitoring
use kryst::utils::monitor::IterationMonitor;
let mut monitor = IterationMonitor::new();
ksp.add_monitor(|iter, residual| {
println!("Iteration {}: {:.2e}", iter, residual);
});
// Add automated parameter tuning
use kryst::utils::tuning::ParameterTuner;
let mut tuner = ParameterTuner::new();
let (best_config, _) = tuner.tune_parameters(&matrix, &rhs, 5).unwrap();
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Clone the repository:
git clone https://github.com/tmathis720/kryst.git
cd kryst
Install Rust (stable toolchain recommended):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Optional: Install MPI for distributed features:
# Ubuntu/Debian
sudo apt-get install libopenmpi-dev
# macOS
brew install open-mpi
Run tests and benchmarks:
cargo test
cargo bench
cargo test --features "mpi" # If MPI is available
- scripts/ci_checks.sh – runs cargo fmt --all -- --check, cargo clippy --all-targets --all-features, and cargo test --all-features.
- scripts/ub_paranoia.sh – executes ASan-enabled tests on the nightly toolchain for the buffer pool and dot engines.
- scripts/miri_reduction.sh – runs the same focused suite under cargo miri (nightly) to catch UB in the unsafe utilities.
- cargo fmt
- cargo clippy
- git checkout -b feature/amazing-feature
- cargo test
- cargo fmt && cargo clippy
- git commit -m 'Add amazing feature'
- git push origin feature/amazing-feature

kryst provides a comprehensive, high-performance linear algebra toolkit for the Rust ecosystem, with particular focus on iterative methods for large-scale scientific computing applications. The library combines the mathematical rigor of established numerical libraries like PETSc with the safety and performance characteristics of Rust, making it ideal for research, scientific computing, and production applications requiring robust linear system solvers.