FMLA (vector)
Floating-point fused multiply-add to accumulator (vector)
This instruction multiplies corresponding floating-point values
in the vectors in the two source SIMD&FP registers,
adds the product to the corresponding vector element of the
destination SIMD&FP register, and writes the result to
the destination SIMD&FP register.
A floating-point exception can be generated by this instruction.
Depending on the settings in FPCR,
the exception results in either a flag being set in FPSR,
or a synchronous exception being generated.
For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1,
CPTR_EL2, and CPTR_EL3 registers,
and the current Security state and Exception level,
an attempt to execute the instruction might be trapped.
It has encodings from 2 classes:
Half-precision
and
Single-precision and double-precision
0
0
0
1
1
1
0
0
1
0
0
0
0
0
1
1
FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
if !IsFeatureImplemented(FEAT_FP16) then UNDEFINED;
constant integer d = UInt(Rd);
constant integer n = UInt(Rn);
constant integer m = UInt(Rm);
constant integer esize = 16;
constant integer datasize = 64 << UInt(Q);
constant integer elements = datasize DIV esize;
0
0
0
1
1
1
0
0
1
1
1
0
0
1
1
FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
if sz:Q == '10' then UNDEFINED;
constant integer d = UInt(Rd);
constant integer n = UInt(Rn);
constant integer m = UInt(Rm);
constant integer esize = 32 << UInt(sz);
constant integer datasize = 64 << UInt(Q);
constant integer elements = datasize DIV esize;
<Vd>
Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T>
For the half-precision variant: is an arrangement specifier,
<T>
For the single-precision and double-precision variant: is an arrangement specifier,
sz
Q
<T>
0
0
2S
0
1
4S
1
0
RESERVED
1
1
2D
<Vn>
Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm>
Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
CheckFPAdvSIMDEnabled64();
constant bits(datasize) operand1 = V[n, datasize];
constant bits(datasize) operand2 = V[m, datasize];
constant bits(datasize) operand3 = V[d, datasize];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
Elem[result, e, esize] = FPMulAdd(Elem[operand3, e, esize], element1, element2, FPCR);
V[d, datasize] = result;