tract-linalg
linalg stands for "linear algebra". This is a misnomer. This crate contains
low-level, architecture-dependent optimisations used by tract-core.
Functions
- MatMatMul: extended matrix*matrix product:
  - inspired by the GotoBLAS and BLIS micro-kernel approach
  - extended for convolution-friendly addressing (fused img2col)
  - fused output pipeline (min, max, and a few more simple, fast ops)
  - f32*f32 -> f32 (à la sgemm)
  - i8*i8 -> i32 accumulator -> i32 storage
  - i8*i8 -> i32 accumulator -> i8 (with channel zeropoint and scale, and re-quantization pipeline)
- f32 sigmoid and f32 tanh: at f32 precision, by a rational function (no exponentiation)
- byte-to-byte lookup table
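
To make the last two items concrete, here is a minimal scalar sketch of the ideas, not the crate's actual code. The function names are hypothetical, and the tanh coefficients are a well-known low-order rational approximation chosen for brevity: it is only accurate to about 0.03, whereas the kernels in this crate target f32 precision and would need a higher-order rational function.

```rust
/// Low-order rational approximation of tanh, with no call to exp().
/// Illustrative coefficients only; these are NOT the ones used by tract-linalg.
pub fn tanh_rational(x: f32) -> f32 {
    // Outside [-3, 3] this rational form overshoots 1, so clamp the input.
    // At |x| = 3 the expression evaluates to exactly +/-1.
    let x = x.clamp(-3.0, 3.0);
    let x2 = x * x;
    x * (27.0 + x2) / (27.0 + 9.0 * x2)
}

/// Sigmoid derived from tanh through the identity
/// sigmoid(x) = 0.5 + 0.5 * tanh(x / 2).
pub fn sigmoid_rational(x: f32) -> f32 {
    0.5 + 0.5 * tanh_rational(0.5 * x)
}

/// Byte-to-byte lookup: every byte of `buf` is replaced in place by its
/// image in a 256-entry table.
pub fn byte_lookup(table: &[u8; 256], buf: &mut [u8]) {
    for b in buf.iter_mut() {
        *b = table[*b as usize];
    }
}
```

Because the rational form needs only multiplications, additions and one division, it maps naturally onto SIMD lanes, which is presumably why it is preferred over exponentiation here.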
Implementations
|                   | generic fallback | armv6, vfp | armv7 neon | armv8 simd | x64 FMA |
|-------------------|------------------|------------|------------|------------|---------|
| MatMatMul f32     |                  | 4x4        | 8x4        | 8x8        | 16x6    |
| MatMatMul i8->i8  |                  |            | 8x4        |            | 8x8     |
| MatMatMul i8->i32 |                  |            |            |            | 8x8     |
| sigmoid f32       |                  |            | 4n         | 4n         |         |
| tanh f32          |                  |            | 4n         | 4n         |         |
| byte lookup       |                  |            |            |            |         |
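
As a rough illustration of what an entry such as 4x4 means, the sketch below reads it as the dimensions of the micro-kernel's accumulator tile (an assumption, in the spirit of the BLIS approach mentioned earlier). It is plain scalar Rust with hypothetical names (`MR`, `NR`, `tiled_matmul_clamped`), and it fuses a min/max clamp into the store step to mimic the fused output pipeline. The real kernels are architecture-specific and work on packed operands, which this sketch does not attempt.

```rust
/// Hypothetical tile size, mirroring a "4x4" entry in the table.
const MR: usize = 4;
const NR: usize = 4;

/// Computes C = clamp(A * B, lo, hi) for row-major A (m x k), B (k x n)
/// and C (m x n), one MR x NR output tile at a time.
pub fn tiled_matmul_clamped(
    m: usize,
    k: usize,
    n: usize,
    a: &[f32],
    b: &[f32],
    c: &mut [f32],
    lo: f32,
    hi: f32,
) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for i0 in (0..m).step_by(MR) {
        for j0 in (0..n).step_by(NR) {
            // Edge tiles may be smaller than MR x NR.
            let rows = MR.min(m - i0);
            let cols = NR.min(n - j0);
            // Accumulator tile; an optimised kernel would keep it in SIMD registers.
            let mut acc = [[0.0f32; NR]; MR];
            for p in 0..k {
                for i in 0..rows {
                    let a_ip = a[(i0 + i) * k + p];
                    for j in 0..cols {
                        acc[i][j] += a_ip * b[p * n + j0 + j];
                    }
                }
            }
            // Fused output stage: clamp while storing, with no second pass over C.
            for i in 0..rows {
                for j in 0..cols {
                    c[(i0 + i) * n + j0 + j] = acc[i][j].max(lo).min(hi);
                }
            }
        }
    }
}
```

Keeping the whole tile in the accumulator for the full `k` loop means each element of C is written once, which is what makes it cheap to apply the output operations at store time.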