SADDW, SADDW2

Signed add wide

This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register.

The SADDW instruction extracts the second source vector from the lower half of the second source register. The SADDW2 instruction extracts the second source vector from the upper half of the second source register.

Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

If PSTATE.DIT is 1:
  - The execution time of this instruction is independent of:
      - The values of the data supplied in any of its registers.
      - The values of the NZCV flags.
  - The response of this instruction to asynchronous exceptions does not vary based on:
      - The values of the data supplied in any of its registers.
      - The values of the NZCV flags.

Fixed encoding bits: 0 0 0 1 1 1 0 1 0 0 0 1 0 0

SADDW{2} <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>

if size == '11' then UNDEFINED;
constant integer d = UInt(Rd);
constant integer n = UInt(Rn);
constant integer m = UInt(Rm);
constant integer esize = 8 << UInt(size);
constant integer datasize = 64;
constant integer part = UInt(Q);
constant integer elements = datasize DIV esize;

2     Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements, and is encoded in "Q":

      Q   2
      0   [absent]
      1   [present]
<Vd>  Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta>  Is an arrangement specifier, encoded in "size":

      size  <Ta>
      00    8H
      01    4S
      10    2D
      11    RESERVED
<Vn>  Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm>  Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

<Tb>  Is an arrangement specifier, encoded in "size:Q":

      size  Q   <Tb>
      00    0   8B
      00    1   16B
      01    0   4H
      01    1   8H
      10    0   2S
      10    1   4S
      11    x   RESERVED
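
The decode rules and arrangement tables above can be illustrated with a minimal C sketch. It is not Arm-provided code: the type saddw_decode_t and the function saddw_decode are hypothetical names introduced here, and the sketch assumes the caller has already extracted the "size" and "Q" fields from the instruction word.

```c
/* Hypothetical holder for the decoded fields; not part of any Arm API. */
typedef struct {
    int undefined;       /* 1 when size == '11' (UNDEFINED encoding) */
    unsigned esize;      /* width in bits of the narrow (second source) elements */
    unsigned elements;   /* number of elements processed (datasize DIV esize) */
    unsigned part;       /* 0 = lower half (SADDW), 1 = upper half (SADDW2) */
    const char *ta;      /* arrangement of <Vd> and <Vn> */
    const char *tb;      /* arrangement of <Vm> */
} saddw_decode_t;

/* Mirrors the decode pseudocode: esize = 8 << size, datasize = 64, part = Q. */
static saddw_decode_t saddw_decode(unsigned size, unsigned q)
{
    /* <Ta> is selected by size alone. */
    static const char *const ta_names[3] = { "8H", "4S", "2D" };
    /* <Tb> is selected by size:Q. */
    static const char *const tb_names[3][2] = {
        { "8B", "16B" },
        { "4H", "8H"  },
        { "2S", "4S"  },
    };
    saddw_decode_t d = { 0 };

    size &= 3u;
    q &= 1u;
    if (size == 3u) {        /* size == '11' is RESERVED */
        d.undefined = 1;
        return d;
    }
    d.esize = 8u << size;
    d.elements = 64u / d.esize;
    d.part = q;
    d.ta = ta_names[size];
    d.tb = tb_names[size][q];
    return d;
}
```

Under these assumptions, saddw_decode(1, 1) describes SADDW2 <Vd>.4S, <Vn>.4S, <Vm>.8H with esize 16 and four elements.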
CheckFPAdvSIMDEnabled64();
constant bits(2*datasize) operand1 = V[n, 2*datasize];
constant bits(datasize) operand2 = Vpart[m, part, datasize];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;

for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, 2*esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    sum = element1 + element2;
    Elem[result, e, 2*esize] = sum<2*esize-1:0>;

V[d, 2*datasize] = result;
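
The operation pseudocode above can be modelled in portable scalar C. The following is a sketch under stated assumptions rather than a reference implementation: vreg_t, read_elem_sext, write_elem, and saddw_model are hypothetical names, a 128-bit register is represented as 16 bytes with byte 0 holding bits <7:0>, and the trap check performed by CheckFPAdvSIMDEnabled64() is not modelled.

```c
#include <stdint.h>

/* Assumed register model: 16 bytes, byte 0 holds bits <7:0>. */
typedef struct { uint8_t b[16]; } vreg_t;

/* Read element `index` of `width_bits` bits and sign-extend it to 64 bits.
 * The value is returned as a two's-complement bit pattern in a uint64_t. */
static uint64_t read_elem_sext(const uint8_t *bytes, unsigned index,
                               unsigned width_bits)
{
    unsigned nbytes = width_bits / 8u;
    uint64_t raw = 0;
    for (unsigned i = 0; i < nbytes; i++)
        raw |= (uint64_t)bytes[index * nbytes + i] << (8u * i);
    uint64_t sign = (uint64_t)1 << (width_bits - 1u);
    return (raw ^ sign) - sign;   /* sign-extension; wraps modulo 2^64 */
}

/* Write the low `width_bits` bits of `value` to element `index`. */
static void write_elem(uint8_t *bytes, unsigned index, unsigned width_bits,
                       uint64_t value)
{
    unsigned nbytes = width_bits / 8u;
    for (unsigned i = 0; i < nbytes; i++)
        bytes[index * nbytes + i] = (uint8_t)(value >> (8u * i));
}

/* Model of SADDW (q == 0) and SADDW2 (q == 1).
 * Returns -1 for the UNDEFINED encoding size == '11', otherwise 0. */
static int saddw_model(vreg_t *vd, const vreg_t *vn, const vreg_t *vm,
                       unsigned size, unsigned q)
{
    if (size >= 3u)
        return -1;

    unsigned esize = 8u << size;       /* narrow element width */
    unsigned elements = 64u / esize;   /* datasize DIV esize */
    /* Vpart[m, part, 64]: lower or upper 64 bits of the second source. */
    const uint8_t *operand2 = vm->b + (q ? 8 : 0);

    vreg_t result;   /* temporary, so vd may alias vn or vm */
    for (unsigned e = 0; e < elements; e++) {
        uint64_t element1 = read_elem_sext(vn->b, e, 2u * esize);
        uint64_t element2 = read_elem_sext(operand2, e, esize);
        /* Unsigned addition wraps, matching sum<2*esize-1:0>. */
        write_elem(result.b, e, 2u * esize, element1 + element2);
    }
    *vd = result;    /* all 16 bytes were written by the loop */
    return 0;
}
```

The sum is computed in unsigned arithmetic so that the truncation to 2*esize bits is well defined in C even for the 2D arrangement, where a signed 64-bit addition could overflow. The result wraps and does not saturate, as in the pseudocode.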