FMMLA

Floating-point matrix multiply-accumulate

The floating-point matrix multiply-accumulate instruction supports single-precision and double-precision data types in a 2×2 matrix contained in segments of 128 or 256 bits, respectively. It multiplies the 2×2 matrix in each segment of the first source vector by the 2×2 matrix in the corresponding segment of the second source vector. The resulting 2×2 matrix product is then destructively added to the matrix accumulator held in the corresponding segment of the addend and destination vector. This is equivalent to performing a 2-way dot product per destination element.

This instruction is unpredicated. The single-precision variant is vector length agnostic. The double-precision variant requires that the Effective SVE vector length is at least 256 bits.

ID_AA64ZFR0_EL1.F32MM indicates whether the single-precision variant is implemented.
ID_AA64ZFR0_EL1.F64MM indicates whether the double-precision variant is implemented.

This instruction is illegal when executed in Streaming SVE mode, unless FEAT_SME_FA64 is implemented and enabled.

It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

Fixed encoding bits: 0 1 1 0 0 1 0 0 1 0 1 1 1 1 0 0 1

FMMLA <Zda>.S, <Zn>.S, <Zm>.S

    if !IsFeatureImplemented(FEAT_SVE) || !IsFeatureImplemented(FEAT_F32MM) then UNDEFINED;
    constant integer esize = 32;
    constant integer n = UInt(Zn);
    constant integer m = UInt(Zm);
    constant integer da = UInt(Zda);

64-bit element

Fixed encoding bits: 0 1 1 0 0 1 0 0 1 1 1 1 1 1 0 0 1

FMMLA <Zda>.D, <Zn>.D, <Zm>.D

    if !IsFeatureImplemented(FEAT_SVE) || !IsFeatureImplemented(FEAT_F64MM) then UNDEFINED;
    constant integer esize = 64;
    constant integer n = UInt(Zn);
    constant integer m = UInt(Zm);
    constant integer da = UInt(Zda);

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

    CheckNonStreamingSVEEnabled();
    constant integer VL = CurrentVL;
    constant integer PL = VL DIV 8;
    if VL < esize * 4 then UNDEFINED;
    constant integer segments = VL DIV (4 * esize);
    constant bits(VL) operand1 = Z[n, VL];
    constant bits(VL) operand2 = Z[m, VL];
    constant bits(VL) operand3 = Z[da, VL];
    bits(VL) result = Zeros(VL);
    bits(4*esize) op1, op2;
    bits(4*esize) res, addend;

    for s = 0 to segments-1
        op1    = Elem[operand1, s, 4*esize];
        op2    = Elem[operand2, s, 4*esize];
        addend = Elem[operand3, s, 4*esize];
        res    = FPMatMulAdd(addend, op1, op2, esize, FPCR);
        Elem[result, s, 4*esize] = res;

    Z[da, VL] = result;
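The per-segment loop above can be illustrated with a small Python model. This is a rough sketch under stated assumptions, not the architectural definition. FPMatMulAdd is not defined in this section, so the helper fp_mat_mul_add below assumes that each segment holds its 2×2 matrix in row-major order and that destination element [i][j] is the addend plus the 2-way dot product of row i of the first source with row j of the second source. The names fmmla and fp_mat_mul_add are hypothetical. Vectors are modelled as Python lists of floats, so FPCR rounding modes, exceptions, and single-precision rounding are not modelled.

```python
def fp_mat_mul_add(addend, op1, op2):
    """Assumed behavior of FPMatMulAdd on one segment of four elements.

    Each argument is a 2x2 matrix flattened row-major (assumption).
    result[i][j] = addend[i][j] + op1[i][0]*op2[j][0] + op1[i][1]*op2[j][1]
    """
    result = [0.0] * 4
    for i in range(2):
        for j in range(2):
            prod0 = op1[2 * i + 0] * op2[2 * j + 0]
            prod1 = op1[2 * i + 1] * op2[2 * j + 1]
            result[2 * i + j] = addend[2 * i + j] + (prod0 + prod1)
    return result


def fmmla(zda, zn, zm, esize, vl):
    """Model of the operation pseudocode; returns the new value of Zda.

    zda, zn, zm: lists of vl // esize numbers (element 0 first).
    esize: 32 or 64; vl: vector length in bits.
    """
    if esize not in (32, 64):
        raise ValueError("esize must be 32 or 64")
    if vl < 4 * esize:
        # Mirrors "if VL < esize * 4 then UNDEFINED"
        raise ValueError("UNDEFINED: vector length too short for one segment")
    elems = vl // esize
    if not (len(zda) == len(zn) == len(zm) == elems):
        raise ValueError("each vector must hold vl // esize elements")

    segments = vl // (4 * esize)
    result = [0.0] * elems  # mirrors "result = Zeros(VL)"
    for s in range(segments):
        lo, hi = 4 * s, 4 * s + 4
        result[lo:hi] = fp_mat_mul_add(zda[lo:hi], zn[lo:hi], zm[lo:hi])
    return result


if __name__ == "__main__":
    # One 128-bit segment of single-precision elements.
    print(fmmla([1.0] * 4, [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 32, 128))
```

Under these assumptions, the call in the usage example accumulates [[1, 2], [3, 4]] multiplied by the transpose of [[5, 6], [7, 8]] onto an all-ones accumulator. Requesting esize = 64 with vl = 128 raises an error, matching the requirement that the double-precision variant needs a vector length of at least 256 bits.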