matrixmultiply_mt

Crates.io	matrixmultiply_mt
lib.rs	matrixmultiply_mt
version	0.2.1
created_at	2017-01-24 14:00:58.828479+00
updated_at	2022-08-22 10:21:11.621046+00
description	Multithreaded Fork of bluss's matrixmultiply crate. General matrix multiplication of f32 and f64 matrices in Rust. Supports matrices with general strides. Uses a microkernel strategy, so that the implementation is easy to parallelize and optimize. `RUSTFLAGS="-C target-cpu=native"` is your friend here.
homepage
repository	https://github.com/millardjn/matrixmultiply_mt
max_upload_size
id	8205
size	108,439

zza (millardjn)

documentation

https://bluss.github.io/matrixmultiply/

README

About

A multithreaded fork of bluss's matrixmultiply crate. General matrix multiplication for f32 and f64 matrices. Supports matrices with arbitrary row and column strides. Relies heavily on LLVM to vectorise the floating-point operations.
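
To make the stride convention concrete, here is a naive reference for the operation a general matrix multiply computes, C = alpha * A * B + beta * C. It is a minimal sketch, not this crate's implementation: the function name `gemm_reference` is hypothetical, and it uses `usize` strides and safe slices for simplicity, whereas a real GEMM entry point typically takes raw pointers and signed strides.

```rust
/// Naive reference for C = alpha * A * B + beta * C on strided f32 matrices.
/// A is m x k, B is k x n, C is m x n. Element (i, j) of a matrix with
/// row stride `rs` and column stride `cs` lives at index `i * rs + j * cs`.
/// Hypothetical helper for illustration only; not part of matrixmultiply_mt.
pub fn gemm_reference(
    m: usize,
    k: usize,
    n: usize,
    alpha: f32,
    a: &[f32],
    rsa: usize,
    csa: usize,
    b: &[f32],
    rsb: usize,
    csb: usize,
    beta: f32,
    c: &mut [f32],
    rsc: usize,
    csc: usize,
) {
    for i in 0..m {
        for j in 0..n {
            // Dot product of row i of A with column j of B.
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * rsa + p * csa] * b[p * rsb + j * csb];
            }
            // Scale the product and blend it with the existing value of C.
            let idx = i * rsc + j * csc;
            c[idx] = alpha * acc + beta * c[idx];
        }
    }
}
```

With this convention a row-major m x k matrix has strides (k, 1) and a column-major one has strides (1, m), so a transposed operand can be passed by swapping its strides rather than copying the data.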

Tuning

To enable specialised vector instructions for your computer, compile using RUSTFLAGS="-C target-cpu=native" and MATMULFLAGS="flag1, flag2, ...", where one flag is an architecture flag:

arch_generic4x4           // fallback if architecture is unknown, should use x86 sse and ARM Neon
arch_generic4x4fma        // might be useful for newer ARM Neon
arch_penryn               // uses the extra x86_64 xmm registers
arch_sandybridge          // uses AVX
arch_haswell              // uses AVX2

and the rest are optional flags:

ftz_daz                   // (nightly) On x86 this will round denormals to zero to improve performance
prefetch                  // (nightly) Inserts prefetch instructions tuned for recent Intel processors
no_multithreading         // disables multithreading

e.g. MATMULFLAGS="arch_sandybridge, ftz_daz"

On nightly, the build script will use CARGO_CFG_TARGET_FEATURE to guess the best architecture flag if one isn't supplied.
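
The selection logic described above might look roughly like the sketch below. This is an assumption about how such a build script could work, not the crate's actual code: the function names (`parse_flags`, `guess_arch`, `select_arch`) are hypothetical, and the feature-to-flag mapping (avx2 to arch_haswell, avx to arch_sandybridge, otherwise arch_generic4x4) is a guess based on the flag descriptions. Only the flag names and the comma-separated format of MATMULFLAGS and CARGO_CFG_TARGET_FEATURE come from the text.

```rust
/// Architecture flags listed in the Tuning section.
const ARCH_FLAGS: [&str; 5] = [
    "arch_generic4x4",
    "arch_generic4x4fma",
    "arch_penryn",
    "arch_sandybridge",
    "arch_haswell",
];

/// Split a MATMULFLAGS-style string ("flag1, flag2") into trimmed flags.
pub fn parse_flags(raw: &str) -> Vec<&str> {
    raw.split(',')
        .map(str::trim)
        .filter(|flag| !flag.is_empty())
        .collect()
}

/// Assumed heuristic: map a comma-separated target-feature list (the format
/// of CARGO_CFG_TARGET_FEATURE) to an architecture flag.
pub fn guess_arch(target_features: &str) -> &'static str {
    let has = |name: &str| target_features.split(',').any(|f| f.trim() == name);
    if has("avx2") {
        "arch_haswell"
    } else if has("avx") {
        "arch_sandybridge"
    } else {
        "arch_generic4x4"
    }
}

/// Use an explicitly supplied architecture flag if present, else guess.
pub fn select_arch(matmulflags: &str, target_features: &str) -> &'static str {
    let flags = parse_flags(matmulflags);
    ARCH_FLAGS
        .iter()
        .copied()
        .find(|arch| flags.iter().any(|flag| *flag == *arch))
        .unwrap_or_else(|| guess_arch(target_features))
}
```

In a real build script the two inputs would come from `std::env::var("MATMULFLAGS")` and `std::env::var("CARGO_CFG_TARGET_FEATURE")`; they are taken as plain string arguments here so the logic can be exercised without a Cargo build.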

Commit count: 103
