| Field | Value |
|---|---|
| Crates.io | blas-array2 |
| lib.rs | blas-array2 |
| version | |
| source | src |
| created_at | 2024-07-14 08:01:07.967789 |
| updated_at | 2024-07-14 08:01:07.967789 |
| description | Implementation of parameter-optional BLAS2 and BLAS3 wrapper by `ndarray::Array` (`Ix1` or `Ix2`). |
| homepage | |
| repository | |
| max_upload_size | |
| id | 1302822 |
| size | 0 |
Implementation of parameter-optional BLAS wrapper by `ndarray::Array` (`Ix1` or `Ix2`).
Features of this crate:

- Parameter-optional API, similar to the `scipy.linalg.blas` convention. The shape of each matrix and the leading-dimension information are checked properly; these values are parsed automatically by the program, so users do not need to supply them.
- FFI to BLAS is handled by `blas-sys`, for example.
- Generic structs: `GEMM<F> where F: BLASFloat` covers `f32`, `f64`, `Complex<f32>`, and `Complex<f64>` in one generic (template) class; the same applies to `SYRK`, `GEMV`, etc. A complex-valued sketch is given after the worked examples below.
- Arbitrary slicing and striding that `ndarray` allows (for correctness and API convenience only; an explicit clone of the raw data in DRAM may be required if the data is not stored contiguously in any dimension).

For a simple illustration of this package, we perform $C = A B$ (`dgemm`):
```rust
use blas_array2::prelude::*;
use ndarray::prelude::*;

let a = array![[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]];
let b = array![[-1.0, -2.0], [-3.0, -4.0], [-5.0, -6.0]];
let c_out = DGEMM::default().a(a.view()).b(b.view()).run().unwrap().into_owned();
println!("{:7.3?}", c_out);
```
Important points are

- `::default()` initializes the struct;
- `.a`, `.b` are setter functions;
- `.run().unwrap()` performs the computation;
- `.into_owned()` returns the result matrix as `Array2<f64>`.

For a more complicated situation, we perform $C = A B^\dagger$ by `SGEMM = GEMM<f32>`:
```rust
use blas_array2::prelude::*;
use ndarray::prelude::*;

let a = array![[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]];
let b = array![[-1.0, -2.0], [-3.0, -4.0], [-5.0, -6.0]];
let mut c = Array::ones((3, 3).f());
let c_out = GEMM::<f32>::default()
    .a(a.slice(s![.., ..2]))
    .b(b.view())
    .c(c.slice_mut(s![0..3;2, ..]))
    .transb('T')
    .beta(1.5)
    .run()
    .unwrap();
// one can get the result as an owned array,
// but the result may not refer to the same memory location as `c`
println!("{:4.3?}", c_out.into_owned());
// this modification on `c` is actually performed in-place,
// so if `c` is pre-defined, not calling `into_owned` could be more efficient
println!("{:4.3?}", c);
```
Important points are

- `.c` is an (optional) output setter, which consumes `ArrayViewMut2<f32>`; this matrix will be modified in-place;
- `.transb`, `.beta` are optional setters; the default of `transb` is `'N'`, while the default of `beta` is zero, which is the same convention as scipy's Python interface to BLAS. You may change these defaults by feeding values into the optional setters;
- `c_out.into_owned()` returns the output (the submatrix, if `c` was sliced when passed into the setter) as `Array2<f32>`. Note that this output does not share the same memory address as `mut c`;
- `c_out.view()` or `c_out.view_mut()` returns a view of `c`; these views share the same memory address as `mut c`;
- alternatively, you can use `c` directly: the GEMM operation is performed in-place if the output matrix `c` is given.

To make the code above clear, this snippet performs the matrix multiplication in-place:
```text
c = alpha * a * transpose(b) + beta * c

where

alpha = 1.0 (by default)
beta  = 1.5
a = [[1.0, 2.0, ___],
     [3.0, 4.0, ___]]
    (sliced by `s![.., ..2]`)
b = [[-1.0, -2.0],
     [-3.0, -4.0],
     [-5.0, -6.0]]
c = [[1.0, 1.0, 1.0],
     [___, ___, ___],
     [1.0, 1.0, 1.0]]
    (column-major, sliced by `s![0..3;2, ..]`)
```
Output of `c` is

```text
[[ -3.500,  -9.500, -15.500],
 [  1.000,   1.000,   1.000],
 [ -9.500, -23.500, -37.500]]
```
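To also illustrate the generic interface mentioned in the feature list above, here is a minimal complex-valued sketch. It assumes `num-complex` (the complex type used by `ndarray`) is available as a dependency; apart from the element type, the builder is used exactly as in the examples above.

```rust
use blas_array2::prelude::*;
use ndarray::prelude::*;
use num_complex::Complex;

// Same builder pattern as DGEMM / GEMM::<f32>; only the element type changes.
let a = array![
    [Complex::new(1.0, 0.5), Complex::new(2.0, -0.5)],
    [Complex::new(3.0, 1.0), Complex::new(4.0, -1.0)],
];
let b = array![
    [Complex::new(-1.0, 0.0), Complex::new(-2.0, 0.0)],
    [Complex::new(-3.0, 0.0), Complex::new(-4.0, 0.0)],
];
let c_out = GEMM::<Complex<f64>>::default()
    .a(a.view())
    .b(b.view())
    .run()
    .unwrap()
    .into_owned();
println!("{:?}", c_out);
```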
This package hasn't been deployed yet. After all functions in BLAS1/2/3 are implemented, version 0.1 will be released.
For development usage, if any difficulties are encountered during installation, please check whether `blas-sys` is installed and linked properly. This may be resolved by declaring `RUSTFLAGS="-lopenblas"` if OpenBLAS is used as the backend.
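As an alternative to setting `RUSTFLAGS`, the link directive can also be emitted from a Cargo build script. The sketch below uses Cargo's standard `cargo:rustc-link-lib` / `cargo:rustc-link-search` directives and assumes OpenBLAS as the backend; adjust the library name and path to your setup.

```rust
// build.rs (sketch) of the project that uses blas-array2 with OpenBLAS as backend
fn main() {
    // Ask rustc to link against the system OpenBLAS library.
    println!("cargo:rustc-link-lib=openblas");
    // If OpenBLAS is installed in a non-standard location, also emit a search path,
    // e.g. (hypothetical path):
    // println!("cargo:rustc-link-search=native=/opt/openblas/lib");
}
```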
Though I believe this crate provides many functionalities that interest audiences in scientific computation, there are also some features that this crate lacks, or is not going to support by design.

For the features that will not be supported:
- `ndarray`, `nalgebra`, `matrix`, `faer-rs`, `rulinalg`, and `rest_tensors` represent typical matrix implementations in Rust. Though some of these crates are extremely fast at linear algebra (comparable to MKL or OpenBLAS), `ndarray` seems to better support high-dimensional tensors, sub-matrices, and advanced slicing/views; this is also related to the concept of the "leading dimension" in BLAS. So `ndarray` is chosen to represent matrices in this crate. Matrices defined in other crates should be easy to cast to `ndarray` without an explicit memory clone (by Rust's move semantics or by a tensor view of a slice of the raw data), so BLAS wrappers for other matrix types are not provided.
- Optional parameters are passed through a builder pattern (`derive_build`). This approach has at least two drawbacks: 1) there is no IDE-time/compile-time check of non-optional parameters, so errors may occur at runtime, which requires programmers to use this crate with caution (see the sketch below); 2) `no_std` is not available.
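As an illustration of the first drawback, here is a minimal sketch. It assumes, consistent with the examples above, that `run()` returns a `Result`, so a forgotten required input such as `.a(...)` still compiles and only fails when the computation is attempted (whether that failure is an `Err` or a panic depends on the crate's internals).

```rust
use blas_array2::prelude::*;
use ndarray::prelude::*;

let b = array![[-1.0, -2.0], [-3.0, -4.0]];
// `.a(...)` is (deliberately) missing: the builder cannot enforce required
// parameters at compile time, so the mistake only surfaces when `run()` is called.
match DGEMM::default().b(b.view()).run() {
    Ok(_) => println!("unexpectedly succeeded"),
    Err(err) => println!("runtime error from the missing `.a(...)`: {:?}", err),
}
```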
For the features that can be added but currently haven't been (probably because I'm not interested in them at the moment of writing this crate), and which may be implemented later, or after someone files feature requests in issues/PRs:
- BLAS1 functions, which can mostly be handled directly by `ndarray`. The efficiency of BLAS1 can be achieved by compiler optimization (especially for serial/single-thread usage); using BLAS may not significantly improve efficiency if your code is mostly computation-bound instead of memory-bound.
- `BF16` features.
- BLAS extensions such as `omatcopy`, `imatcopy`, `gemmt`, `gemm3m`, which have already been implemented in OpenBLAS, MKL, and BLIS. These can be gated behind cargo features and may be among the first things to be implemented.
- ILP64. This crate uses `blas-sys` for FFI to BLAS, whose FFI integer is `libc::c_int`. In most cases this is LP64, so ILP64 is currently not supported. I'm not sure whether there exist tasks for which LP64 is not enough.
- Documentation. This crate is intended as a `scipy.linalg.blas` replacement in Rust, so documentation may not be that important (https://netlib.org/lapack/explore-html/ is fairly well documented), but it is probably still needed by newcomers.

This project is developed as a side project from REST.