FMLA (by element)
Floating-point fused multiply-add to accumulator (by element)
This instruction multiplies the vector elements
in the first source SIMD&FP register by the specified
value in the second source SIMD&FP register,
and accumulates the results
in the vector elements of the destination SIMD&FP register.
All the values in this instruction are floating-point values.
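As an illustration of the multiply-accumulate described above, the following is a minimal Python sketch, not Arm's definition. It assumes double-precision lanes held as Python floats and finite inputs only, and it ignores FPCR rounding modes, exception flags, NaNs, infinities and the sign of zero results. The names `fused_mul_add` and `fmla_by_element` are hypothetical helpers introduced for this example. Exact rational arithmetic stands in for the fused operation, so the sum is rounded only once.

```python
from fractions import Fraction


def fused_mul_add(addend, a, b):
    """Return addend + a*b rounded once to double (finite inputs only)."""
    # Fraction arithmetic is exact; float() then performs the single rounding.
    exact = Fraction(addend) + Fraction(a) * Fraction(b)
    return float(exact)


def fmla_by_element(dest, src, elem_src, index):
    """Model dest[e] = dest[e] + src[e] * elem_src[index] for every lane e."""
    scalar = elem_src[index]  # the one element selected from the second source
    return [fused_mul_add(d, s, scalar) for d, s in zip(dest, src)]


# Every lane of the first source is multiplied by the same selected element.
acc = fmla_by_element([1.0, 2.0], [3.0, 4.0], [10.0, 0.5], index=1)
print(acc)  # [2.5, 4.0]
```

The difference between a fused and an unfused multiply-add shows up when the product needs more precision than one double can hold: with `a = 1.0 + 2.0**-52` and `b = 1.0 - 2.0**-52`, the expression `-1.0 + a * b` evaluates to `0.0`, whereas `fused_mul_add(-1.0, a, b)` keeps the residual `-(2.0**-104)`.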
This instruction can generate a floating-point exception.
Depending on the settings in FPCR,
the exception results in either a flag being set in FPSR
or a synchronous exception being generated.
For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1,
CPTR_EL2, and CPTR_EL3 registers,
and the current Security state and Exception level,
an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar, half-precision; Scalar, single-precision and double-precision; Vector, half-precision; and Vector, single-precision and double-precision.
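Scalar, half-precision encoding, fixed bits only (most significant first; variable fields not shown): 0 1 0 1 1 1 1 1 0 0 0 0 0 1 0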
FMLA <Hd>, <Hn>, <Vm>.H[<index>]
if !IsFeatureImplemented(FEAT_FP16) then UNDEFINED;
constant integer idxdsize = 64 << UInt(H);
constant integer n = UInt(Rn);
constant integer m = UInt(Rm);
constant integer d = UInt(Rd);
constant integer index = UInt(H:L:M);
constant integer esize = 16;
constant integer datasize = esize;
constant integer elements = 1;
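Scalar, single-precision and double-precision encoding, fixed bits only (most significant first; variable fields not shown): 0 1 0 1 1 1 1 1 1 0 0 0 1 0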
FMLA <V><d>, <V><n>, <Vm>.<Ts>[<index>]
constant integer idxdsize = 64 << UInt(H);
integer index;
constant bit Rmhi = M;
case sz:L of
    when '0x' index = UInt(H:L);
    when '10' index = UInt(H);
    when '11' UNDEFINED;
constant integer d = UInt(Rd);
constant integer n = UInt(Rn);
constant integer m = UInt(Rmhi:Rm);
constant integer esize = 32 << UInt(sz);
constant integer datasize = esize;
constant integer elements = 1;
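Vector, half-precision encoding, fixed bits only (most significant first; variable fields not shown): 0 0 0 1 1 1 1 0 0 0 0 0 1 0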
FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]
if !IsFeatureImplemented(FEAT_FP16) then UNDEFINED;
constant integer idxdsize = 64 << UInt(H);
constant integer n = UInt(Rn);
constant integer m = UInt(Rm);
constant integer d = UInt(Rd);
constant integer index = UInt(H:L:M);
constant integer esize = 16;
constant integer datasize = 64 << UInt(Q);
constant integer elements = datasize DIV esize;
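The vector, half-precision decode above can be mirrored in a few lines of Python. This is a sketch under stated assumptions: the hypothetical function `decode_fmla_vector_fp16` receives the instruction fields as already-extracted integers, so no bit positions are assumed, and the FEAT_FP16 check is left out.

```python
def decode_fmla_vector_fp16(Q, H, L, M, Rm, Rn, Rd):
    """Mirror the decode pseudocode for the vector, half-precision class."""
    esize = 16
    datasize = 64 << Q  # 64-bit or 128-bit vector
    return {
        "d": Rd,
        "n": Rn,
        "m": Rm,  # only the 4-bit Rm field names the register: V0 to V15
        "index": (H << 2) | (L << 1) | M,  # UInt(H:L:M), range 0 to 7
        "idxdsize": 64 << H,  # width read from the indexed register
        "esize": esize,
        "datasize": datasize,
        "elements": datasize // esize,  # 4 lanes when Q == 0, 8 when Q == 1
    }


fields = decode_fmla_vector_fp16(Q=1, H=1, L=0, M=1, Rm=3, Rn=1, Rd=0)
print(fields["index"], fields["elements"])  # 5 8
```

Because M contributes to the index in this class, it is not available as a fifth register bit, which is consistent with the V0 to V15 restriction on `<Vm>` for the half-precision variant.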
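Vector, single-precision and double-precision encoding, fixed bits only (most significant first; variable fields not shown): 0 0 0 1 1 1 1 1 0 0 0 1 0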
FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]
if sz:Q == '10' then UNDEFINED;
constant integer idxdsize = 64 << UInt(H);
integer index;
constant bit Rmhi = M;
case sz:L of
    when '0x' index = UInt(H:L);
    when '10' index = UInt(H);
    when '11' UNDEFINED;
constant integer d = UInt(Rd);
constant integer n = UInt(Rn);
constant integer m = UInt(Rmhi:Rm);
constant integer esize = 32 << UInt(sz);
constant integer datasize = 64 << UInt(Q);
constant integer elements = datasize DIV esize;
<Hd>
Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn>
Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm>
For the half-precision variant: is the name of the second SIMD&FP source register, in the range V0 to V15, encoded in the "Rm" field.
<Vm>
For the single-precision and double-precision variant: is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
<index>
For the half-precision variant: is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.
<index>
For the single-precision and double-precision variant: is the element index, encoded in the "sz:L:H" fields:
sz    L    <index>
0     x    UInt(H:L)
1     0    UInt(H)
1     1    RESERVED
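The index selection for the single-precision and double-precision variant can be sketched in Python as follows. The function name `decode_index` is hypothetical, the fields are assumed to arrive as integers 0 or 1, and the reserved case is modelled as a Python exception rather than as an UNDEFINED instruction.

```python
def decode_index(sz, L, H):
    """Return the element index for the single/double-precision variant."""
    if sz == 0:
        return (H << 1) | L  # UInt(H:L): 32-bit elements, index 0 to 3
    if L == 0:
        return H  # UInt(H): 64-bit elements, index 0 or 1
    raise ValueError("reserved encoding: sz == 1 and L == 1")


print(decode_index(sz=0, L=1, H=1))  # 3
print(decode_index(sz=1, L=0, H=1))  # 1
```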
<V>
Is a width specifier, encoded in the "sz" field: S when sz is 0 (32-bit elements) or D when sz is 1 (64-bit elements).
<d>
Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n>
Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Ts>
Is an element size specifier, encoded in the "sz" field: S when sz is 0 or D when sz is 1.
<Vd>
Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T>
For the half-precision variant: is an arrangement specifier, encoded in the "Q" field: 4H when Q is 0 or 8H when Q is 1.
<T>
For the single-precision and double-precision variant: is an arrangement specifier, encoded in the "Q:sz" fields:
Q    sz    <T>
0    0     2S
0    1     RESERVED
1    0     4S
1    1     2D
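The arrangement table lines up with the decode pseudocode, where `datasize = 64 << UInt(Q)` and `esize = 32 << UInt(sz)`. The Python sketch below derives the specifier from those two values. The function name `arrangement` is hypothetical, and the reserved combination is modelled as an exception.

```python
def arrangement(Q, sz):
    """Derive the <T> specifier for the single/double-precision vector class."""
    if sz == 1 and Q == 0:
        # Matches "if sz:Q == '10' then UNDEFINED" in the decode pseudocode.
        raise ValueError("reserved encoding: Q == 0 and sz == 1")
    esize = 32 << sz  # 32-bit (S) or 64-bit (D) elements
    datasize = 64 << Q  # 64-bit or 128-bit vector
    elements = datasize // esize
    return f"{elements}{'S' if esize == 32 else 'D'}"


print(arrangement(0, 0), arrangement(1, 0), arrangement(1, 1))  # 2S 4S 2D
```

Read this way, the reserved row is the one where a 64-bit vector would hold a single 64-bit element.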
<Vn>
Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
constant bits(datasize) operand1 = V[n, datasize];
constant bits(idxdsize) operand2 = V[m, idxdsize];
constant bits(datasize) operand3 = V[d, datasize];
bits(esize) element1;
constant bits(esize) element2 = Elem[operand2, index, esize];
constant boolean merge = elements == 1 && IsMerging(FPCR);
bits(128) result = if merge then V[d, 128] else Zeros(128);
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    Elem[result, e, esize] = FPMulAdd(Elem[operand3, e, esize], element1, element2, FPCR);
V[d, 128] = result;
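The operation pseudocode above can be approximated with a small Python model. This is a sketch under several assumptions: registers are 128-bit Python integers in a dictionary, only 64-bit elements are modelled, `IsMerging(FPCR)` is replaced by a plain `merging` flag, and `FPMulAdd` is approximated with exact rational arithmetic for finite values only (no NaNs, infinities, signed-zero rules, rounding-mode control or exception flags). All helper names are hypothetical.

```python
import struct
from fractions import Fraction

MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def bits_to_f64(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def f64_to_bits(value):
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def pack_lanes(lo, hi):
    """Build a 128-bit register value from two doubles (lane 0, lane 1)."""
    return f64_to_bits(lo) | (f64_to_bits(hi) << 64)


def unpack_lanes(vector):
    return bits_to_f64(vector & MASK64), bits_to_f64(vector >> 64)


def get_elem(vector, e, esize):
    return (vector >> (e * esize)) & ((1 << esize) - 1)


def set_elem(vector, e, esize, value):
    mask = ((1 << esize) - 1) << (e * esize)
    return (vector & ~mask) | ((value << (e * esize)) & mask)


def fp_mul_add_f64(addend_bits, a_bits, b_bits):
    """Stand-in for FPMulAdd: exact sum, rounded once (finite inputs only)."""
    exact = Fraction(bits_to_f64(addend_bits)) + Fraction(
        bits_to_f64(a_bits)
    ) * Fraction(bits_to_f64(b_bits))
    return f64_to_bits(float(exact))


def fmla_operation(regs, d, n, m, index, elements, merging=False):
    esize = 64
    datasize = esize * elements
    operand1 = regs[n] & ((1 << datasize) - 1)
    operand3 = regs[d] & ((1 << datasize) - 1)
    # The whole register is read here; the indexed element always lies
    # inside the idxdsize bits that the pseudocode reads.
    element2 = get_elem(regs[m], index, esize)
    merge = elements == 1 and merging
    result = regs[d] if merge else 0  # keep or clear the bits above datasize
    for e in range(elements):
        element1 = get_elem(operand1, e, esize)
        product_sum = fp_mul_add_f64(get_elem(operand3, e, esize), element1, element2)
        result = set_elem(result, e, esize, product_sum)
    regs[d] = result & MASK128


regs = {0: pack_lanes(1.0, 2.0), 1: pack_lanes(3.0, 4.0), 2: pack_lanes(10.0, 0.5)}
fmla_operation(regs, d=0, n=1, m=2, index=1, elements=2)
print(unpack_lanes(regs[0]))  # (2.5, 4.0)
```

In the scalar case (`elements == 1`), the model shows the effect of the `merge` flag: without merging the upper 64 bits of the destination are cleared, and with merging they keep their previous value.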