FDOT (8-bit floating-point to single-precision, by element)
8-bit floating-point dot product to single-precision (vector, by element)
This instruction computes the fused sum-of-products of a group of four 8-bit
floating-point values held in each 32-bit element of the first source
vector and a group of four 8-bit floating-point values in an indexed 32-bit
element of the second source vector. The single-precision sum-of-products
are scaled by 2^(-UInt(FPMR.LSCALE)), before being destructively
added without intermediate rounding to the corresponding single-precision
elements of the destination vector.
The 8-bit floating-point groups within the second source vector are
specified using an immediate index.
The 8-bit floating-point encoding format for the elements of the first
source vector is selected by FPMR.F8S1.
The 8-bit floating-point encoding format for the elements of the second
source vector is selected by FPMR.F8S2.
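The arithmetic described above can be sketched in Python for a single 32-bit group. This is an illustrative model only, not the architected FP8DotAddFP function. It assumes the two selectable formats are the OFP8 E5M2 and E4M3 encodings, which this section does not name, and it takes the decoders as parameters because the mapping from FPMR.F8S1 and FPMR.F8S2 values to formats is not given here. The names decode_e5m2, decode_e4m3, to_f32 and fp8_dot_add are hypothetical. FPCR rounding modes, saturation and NaN propagation rules are not modelled, so results may differ from hardware in corner cases.

```python
import math
import struct

def decode_e5m2(byte):
    """Decode an assumed OFP8 E5M2 byte: 1 sign, 5 exponent, 2 fraction bits, bias 15."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 2) & 0x1F
    frac = byte & 0x03
    if exp == 0x1F:                       # IEEE-style infinities and NaNs
        return sign * math.inf if frac == 0 else math.nan
    if exp == 0:                          # zero and subnormals: frac/4 * 2^-14
        return sign * frac * 2.0 ** -16
    return sign * (1 + frac / 4) * 2.0 ** (exp - 15)

def decode_e4m3(byte):
    """Decode an assumed OFP8 E4M3 byte: 1 sign, 4 exponent, 3 fraction bits, bias 7."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    frac = byte & 0x07
    if exp == 0x0F and frac == 0x07:      # the only NaN pattern; no infinities
        return math.nan
    if exp == 0:                          # zero and subnormals: frac/8 * 2^-6
        return sign * frac * 2.0 ** -9
    return sign * (1 + frac / 8) * 2.0 ** (exp - 7)

def to_f32(x):
    """Round a Python float to single precision (round-to-nearest-even only)."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:                 # finite value too large for float32
        return math.copysign(math.inf, x)

def fp8_dot_add(acc, group1, group2, dec1, dec2, lscale):
    """Return acc + 2^-lscale * sum(group1[k] * group2[k]), rounded once at the end.

    group1 and group2 are sequences of four raw FP8 bytes; dec1 and dec2 are
    the decoders standing in for the FPMR.F8S1 and FPMR.F8S2 selections.
    """
    products = [dec1(a) * dec2(b) for a, b in zip(group1, group2)]
    if all(math.isfinite(p) for p in products):
        total = math.fsum(products)       # correctly rounded sum of the products
    else:
        total = sum(products)             # let inf/NaN propagate
    return to_f32(acc + math.ldexp(total, -lscale))

# E4M3 group [1.0, 2.0, 1.5, 0.0] dotted with E5M2 group [2.0, 1.0, -2.0, 57344.0]
result = fp8_dot_add(0.25, [0x38, 0x40, 0x3C, 0x00], [0x40, 0x3C, 0xC0, 0x7B],
                     decode_e4m3, decode_e5m2, lscale=1)
```

In this sketch the sum of products is formed in double precision and rounded to single precision once, after scaling and accumulation, which approximates the "without intermediate rounding" behaviour but is not guaranteed to be bit-exact.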
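Encoding: fixed bits as extracted, in diagram order: 0 0 0 1 1 1 1 0 0 0 0 0 0 0 (the positions of the variable fields Q, L, M, Rm, H, Rn and Rd were lost in extraction)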
FDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]
if !IsFeatureImplemented(FEAT_FP8DOT4) then UNDEFINED;
constant integer n = UInt(Rn);
constant integer d = UInt(Rd);
constant integer m = UInt(M:Rm);
constant integer i = UInt(H:L);
constant integer datasize = if Q == '1' then 128 else 64;
constant integer esize = 32;
constant integer elements = datasize DIV esize;
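The decode pseudocode above can be mirrored in a few lines of Python to show how the concatenated fields turn into a register number, an element index and an element count. This is a sketch under stated assumptions: the function name decode_fdot_fields is hypothetical, each argument is the already-extracted field value as an integer, and Rm is assumed to be 4 bits wide so that M:Rm forms a 5-bit register number.

```python
def decode_fdot_fields(Q, M, Rm, H, L, Rn, Rd):
    """Mirror the decode pseudocode; arguments are field values as integers."""
    n = Rn                                # UInt(Rn)
    d = Rd                                # UInt(Rd)
    m = (M << 4) | Rm                     # UInt(M:Rm), assuming a 4-bit Rm field
    i = (H << 1) | L                      # UInt(H:L), so the index is 0 to 3
    datasize = 128 if Q == 1 else 64
    esize = 32
    elements = datasize // esize          # 2 for a 64-bit vector, 4 for 128-bit
    return {"n": n, "d": d, "m": m, "i": i,
            "datasize": datasize, "esize": esize, "elements": elements}

fields = decode_fdot_fields(Q=1, M=1, Rm=0b0011, H=1, L=0, Rn=2, Rd=5)
```

With these inputs the second source register number is 19 and the index is 2, and the 128-bit form processes four 32-bit elements.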
<Vd>
Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Ta>
Is an arrangement specifier,
<Vn>
Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Tb>
Is an arrangement specifier,
<Vm>
Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
<index>
Is the immediate index of a 32-bit group of four 8-bit values, in the range 0 to 3, encoded in the "H:L" fields.
CheckFPMREnabled();
CheckFPAdvSIMDEnabled64();
constant bits(datasize) operand1 = V[n, datasize];
constant bits(128) operand2 = V[m, 128];
constant bits(datasize) operand3 = V[d, datasize];
bits(datasize) result;
for e = 0 to elements-1
    constant bits(esize) op1 = Elem[operand1, e, esize];
    constant bits(esize) op2 = Elem[operand2, i, esize];
    constant bits(esize) sum = Elem[operand3, e, esize];
    Elem[result, e, esize] = FP8DotAddFP(sum, op1, op2, FPCR, FPMR);
V[d, datasize] = result;
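The element selection in the operation pseudocode can be illustrated with a small Python model that treats each vector register as an integer. It is a sketch, not the architected behaviour: elem and fdot_by_element are hypothetical names, and the per-element arithmetic is passed in as a callback. The demonstration uses int_dot_add, an integer stand-in for FP8DotAddFP chosen only so that the indexing is easy to check by hand. It performs no FP8 decoding, scaling or rounding.

```python
def elem(value, e, esize):
    """Extract element e of width esize bits from an integer-valued vector."""
    return (value >> (e * esize)) & ((1 << esize) - 1)

def fdot_by_element(vd, vn, vm, index, datasize, dot_add):
    """Model the loop: every destination element uses the same indexed group of vm."""
    esize = 32
    elements = datasize // esize
    mask = (1 << datasize) - 1
    operand1 = vn & mask                  # first source, datasize bits
    operand3 = vd & mask                  # destination is also the accumulator
    op2 = elem(vm, index, esize)          # taken from the full 128-bit second source
    result = 0
    for e in range(elements):
        op1 = elem(operand1, e, esize)
        acc = elem(operand3, e, esize)
        result |= (dot_add(acc, op1, op2) & 0xFFFFFFFF) << (e * esize)
    return result

def int_dot_add(acc, op1, op2):
    """Stand-in for FP8DotAddFP: unsigned byte-wise dot product plus accumulator."""
    total = acc
    for k in range(4):
        total += ((op1 >> (8 * k)) & 0xFF) * ((op2 >> (8 * k)) & 0xFF)
    return total & 0xFFFFFFFF

vn = 0x01010101_02020202                  # element 1 = 0x01010101, element 0 = 0x02020202
vm = 0x01020304 << 96                     # indexed group lives in element 3
vd = (10 << 32) | 5                       # accumulators: element 1 = 10, element 0 = 5
out = fdot_by_element(vd, vn, vm, index=3, datasize=64, dot_add=int_dot_add)
```

Because op2 does not depend on e, the model reads it once before the loop; the pseudocode reads it inside the loop, with the same result.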