FMMLA
Floating-point matrix multiply-accumulate
The floating-point matrix multiply-accumulate instruction supports single-precision and double-precision data types in a 2×2 matrix contained in segments of 128 or 256 bits, respectively. It multiplies the 2×2 matrix in each segment of the first source vector by the 2×2 matrix in the corresponding segment of the second source vector. The resulting 2×2 matrix product is then destructively added to the matrix accumulator held in the corresponding segment of the addend and destination vector. This is equivalent to performing a 2-way dot product per destination element. This instruction is unpredicated. The single-precision variant is vector length agnostic. The double-precision variant requires that the Effective SVE vector length is at least 256 bits.
ID_AA64ZFR0_EL1.F32MM indicates whether the single-precision variant is implemented.
ID_AA64ZFR0_EL1.F64MM indicates whether the double-precision variant is implemented.
This instruction is illegal when executed in Streaming SVE mode, unless FEAT_SME_FA64 is implemented and enabled.
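As a rough illustration of the per-segment arithmetic described above, the following Python sketch models one segment as three 2×2 matrices flattened to four elements each. The function name `fmmla_segment` and the element layout are assumptions, not taken from the description: matrix element [i][j] is placed at index 2*i + j, and destination element [i][j] is the 2-way dot product of pair i of the first source with pair j of the second. Python floats stand in for the element type, so FPCR rounding control and floating-point exception behaviour are not modelled.

```python
def fmmla_segment(acc, a, b):
    """Return acc + (2x2 product of a and b) for one segment.

    Each argument is a flat list of four floats. The layout is an
    assumption: element [i][j] of a matrix lives at index 2*i + j.
    """
    result = [0.0] * 4
    for i in range(2):
        for j in range(2):
            # 2-way dot product of pair i of `a` with pair j of `b`.
            prod0 = a[2 * i + 0] * b[2 * j + 0]
            prod1 = a[2 * i + 1] * b[2 * j + 1]
            # The products are summed first, then added to the accumulator.
            result[2 * i + j] = acc[2 * i + j] + (prod0 + prod1)
    return result


if __name__ == "__main__":
    acc = [1.0, 1.0, 1.0, 1.0]
    a = [1.0, 2.0, 3.0, 4.0]
    b = [5.0, 6.0, 7.0, 8.0]
    print(fmmla_segment(acc, a, b))
```

Under this assumed layout, each pair of adjacent elements in the second source supplies one column of the product, which is what makes every destination element a plain 2-way dot product.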
It has encodings from 2 classes: 32-bit element and 64-bit element
Encoding, fixed bits from bit 31 downward (register fields Zm, Zn, and Zda omitted): 0 1 1 0 0 1 0 0 1 0 1 1 1 1 0 0 1
FMMLA <Zda>.S, <Zn>.S, <Zm>.S
if !IsFeatureImplemented(FEAT_SVE) || !IsFeatureImplemented(FEAT_F32MM) then UNDEFINED;
constant integer esize = 32;
constant integer n = UInt(Zn);
constant integer m = UInt(Zm);
constant integer da = UInt(Zda);
Encoding, fixed bits from bit 31 downward (register fields Zm, Zn, and Zda omitted): 0 1 1 0 0 1 0 0 1 1 1 1 1 1 0 0 1
FMMLA <Zda>.D, <Zn>.D, <Zm>.D
if !IsFeatureImplemented(FEAT_SVE) || !IsFeatureImplemented(FEAT_F64MM) then UNDEFINED;
constant integer esize = 64;
constant integer n = UInt(Zn);
constant integer m = UInt(Zm);
constant integer da = UInt(Zda);
<Zda>
Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn>
Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm>
Is the name of the second source scalable vector register, encoded in the "Zm" field.
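To make the encoding concrete, here is a hedged Python sketch that assembles the 32-bit instruction word from register numbers. The helper name `encode_fmmla` is hypothetical, and the field positions (Zm in bits 20:16, Zn in bits 9:5, Zda in bits 4:0, with bit 22 selecting the 64-bit element class) are an assumption based on the usual SVE three-register layout; only the fixed bits come from the two encoding classes listed above.

```python
def encode_fmmla(zda, zn, zm, double=False):
    """Return the 32-bit encoding of FMMLA Zda, Zn, Zm (assumed field layout)."""
    for name, reg in (("zda", zda), ("zn", zn), ("zm", zm)):
        if not 0 <= reg <= 31:
            raise ValueError(f"{name} must be a register number in 0..31")
    # Fixed bits: 01100100 1x1 ..... 111001 ..... .....
    # where x = 0 for the 32-bit element class and 1 for the 64-bit one.
    word = 0x64A0E400
    if double:
        word |= 1 << 22
    # Assumed field positions: Zm at bit 16, Zn at bit 5, Zda at bit 0.
    return word | (zm << 16) | (zn << 5) | zda


if __name__ == "__main__":
    print(hex(encode_fmmla(0, 1, 2)))               # FMMLA Z0.S, Z1.S, Z2.S
    print(hex(encode_fmmla(0, 1, 2, double=True)))  # FMMLA Z0.D, Z1.D, Z2.D
```

An assembler or disassembler remains the authority on the actual bit pattern; the sketch is only meant to show how the fixed bits and the three register fields combine.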
CheckNonStreamingSVEEnabled();
constant integer VL = CurrentVL;
constant integer PL = VL DIV 8;
if VL < esize * 4 then UNDEFINED;
constant integer segments = VL DIV (4 * esize);
constant bits(VL) operand1 = Z[n, VL];
constant bits(VL) operand2 = Z[m, VL];
constant bits(VL) operand3 = Z[da, VL];
bits(VL) result = Zeros(VL);
bits(4*esize) op1, op2;
bits(4*esize) res, addend;
for s = 0 to segments-1
    op1 = Elem[operand1, s, 4*esize];
    op2 = Elem[operand2, s, 4*esize];
    addend = Elem[operand3, s, 4*esize];
    res = FPMatMulAdd(addend, op1, op2, esize, FPCR);
    Elem[result, s, 4*esize] = res;
Z[da, VL] = result;
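The operation pseudocode above can be mirrored in a small Python model to see how the vector is split into segments. This is a sketch under stated assumptions, not the architectural definition: vectors are modelled as lists of Python floats holding `vl // esize` elements each, `mat_mul_add` is a hypothetical stand-in for `FPMatMulAdd` that ignores FPCR, and a vector that is too short raises `ValueError` where the pseudocode is UNDEFINED.

```python
def mat_mul_add(addend, op1, op2):
    """Hypothetical stand-in for FPMatMulAdd on four-element segments.

    Assumed layout: matrix element [i][j] is at index 2*i + j, and each
    result element is a 2-way dot product added to the accumulator.
    """
    return [
        addend[2 * i + j]
        + (op1[2 * i] * op2[2 * j] + op1[2 * i + 1] * op2[2 * j + 1])
        for i in range(2)
        for j in range(2)
    ]


def fmmla(zda, zn, zm, esize, vl):
    """Model of the segment loop: returns the new value of Zda."""
    if vl < esize * 4:
        # Mirrors `if VL < esize * 4 then UNDEFINED`.
        raise ValueError("vector length too short for this element size")
    elems = vl // esize
    if not (len(zda) == len(zn) == len(zm) == elems):
        raise ValueError("each vector must hold vl // esize elements")
    segments = vl // (4 * esize)
    # Elements beyond the last whole segment stay zero, like Zeros(VL).
    result = [0.0] * elems
    for s in range(segments):
        lo, hi = 4 * s, 4 * s + 4
        result[lo:hi] = mat_mul_add(zda[lo:hi], zn[lo:hi], zm[lo:hi])
    return result


if __name__ == "__main__":
    # 256-bit vector of 32-bit elements: two independent 2x2 segments.
    zn = [1.0, 0.0, 0.0, 1.0, 1.0, 2.0, 3.0, 4.0]
    zm = [5.0, 6.0, 7.0, 8.0, 1.0, 0.0, 0.0, 1.0]
    zda = [0.0] * 8
    print(fmmla(zda, zn, zm, esize=32, vl=256))
```

Returning a new list rather than updating `zda` in place follows the pseudocode, which builds `result` separately and writes it to the destination register only after the loop.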