FCVTNU
Floating-point convert to unsigned integer, rounding to nearest with ties to even (vector)
    FCVTNU Hd, Hn
    FCVTNU Vd, Vn
    FCVTNU Vd.T, Vn.T
    FCVTNU Vd.T, Vn.T
    FCVTNU Wd, Hn
    FCVTNU Xd, Hn
    FCVTNU Wd, Sn
    FCVTNU Xd, Sn
    FCVTNU Wd, Dn
    FCVTNU Xd, Dn

UMLALB
Multiply the corresponding even-numbered unsigned elements of the first and second source vectors and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.
    UMLALB Zda.T, Zn.Tb, Zm.Tb
    UMLALB Zda.S, Zn.H, Zm.H[imm]
    UMLALB Zda.D, Zn.S, Zm.S[imm]

DCPS3
Debug change PE state to EL3
    DCPS3 {#imm}

CSDB
Consumption of speculative data barrier
    CSDB

CADD
Add the real and imaginary components of the integral complex numbers from the first source vector to the complex numbers from the second source vector which have first been rotated by 90 or 270 degrees in the direction from the positive real axis towards the positive imaginary axis, when considered in polar representation, equivalent to multiplying the complex numbers in the second source vector by ±
    CADD Zdn.T, Zdn.T, Zm.T, const

SQXTN
Signed saturating extract narrow
    SQXTN Vbd, Van
    SQXTN{2} Vd.Tb, Vn.Ta

USUBL
Unsigned subtract long
    USUBL{2} Vd.Ta, Vn.Tb, Vm.Tb

USHLL
Unsigned shift left long (immediate)
    USHLL{2} Vd.Ta, Vn.Tb, #shift

WRFFR
Read the source predicate register and place in the first-fault register (
    WRFFR Pn.B

CPYPWTWN
Memory copy, writes unprivileged and non-temporal
    CPYPWTWN [Xd]!, [Xs]!, Xn!
    CPYMWTWN [Xd]!, [Xs]!, Xn!
    CPYEWTWN [Xd]!, [Xs]!, Xn!

SBFIZ
Signed bitfield insert in zeros
    SBFIZ Wd, Wn, #lsb, #width  (equivalent to SBFM Wd, Wn, #(-lsb MOD 32), #(width-1))
    SBFIZ Xd, Xn, #lsb, #width  (equivalent to SBFM Xd, Xn, #(-lsb MOD 64), #(width-1))

SRSHLR
Shift active signed elements of the second source vector by corresponding elements of the first source vector and destructively place the rounded results in the corresponding elements of the
first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Inactive elements in the destination vector register remain unmodified.
    SRSHLR Zdn.T, Pg/M, Zdn.T, Zm.T

CSETM
Conditional set mask
    CSETM Wd, invcond  (equivalent to CSINV Wd, WZR, WZR, cond)
    CSETM Xd, invcond  (equivalent to CSINV Xd, XZR, XZR, cond)

STLLR
Store LORelease register
    STLLR Wt, [Xn|SP{, #0}]
    STLLR Xt, [Xn|SP{, #0}]

SQDECW
Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits.
    SQDECW Xdn, Wdn{, pattern{, MUL #imm}}
    SQDECW Xdn{, pattern{, MUL #imm}}
    SQDECW Zdn.S{, pattern{, MUL #imm}}

LD1RSB
Load a single signed byte from a memory address generated by a 64-bit scalar base address plus an immediate offset which is in the range 0 to 63.
    LD1RSB { Zt.H }, Pg/Z, [Xn|SP{, #imm}]
    LD1RSB { Zt.S }, Pg/Z, [Xn|SP{, #imm}]
    LD1RSB { Zt.D }, Pg/Z, [Xn|SP{, #imm}]

UMOV
Unsigned move vector element to general-purpose register
    UMOV Wd, Vn.Ts[index]
    UMOV Xd, Vn.D[index]

FMLALB
8-bit floating-point multiply-add long to half-precision (vector, by element)
    FMLALB Vd.8H, Vn.16B, Vm.B[index]
    FMLALT Vd.8H, Vn.16B, Vm.B[index]
    FMLALB Vd.8H, Vn.16B, Vm.16B
    FMLALT Vd.8H, Vn.16B, Vm.16B
    FMLALB Zda.H, Zn.B, Zm.B
    FMLALB Zda.H, Zn.B, Zm.B[imm]
    FMLALB Zda.S, Zn.H, Zm.H
    FMLALB Zda.S, Zn.H, Zm.H[imm]

MOVN
Move wide with NOT
    MOVN Wd, #imm{, LSL #shift}
    MOVN Xd, #imm{, LSL #shift}

SQDECB
Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination.
The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits.
    SQDECB Xdn, Wdn{, pattern{, MUL #imm}}
    SQDECB Xdn{, pattern{, MUL #imm}}

FCVTXNT
Convert active double-precision floating-point elements from the source vector to single-precision, rounding to Odd, and place the results in the odd-numbered 32-bit elements of the destination vector, leaving the even-numbered elements unchanged. Inactive elements in the destination vector register remain unmodified.
    FCVTXNT Zd.S, Pg/M, Zn.D

UQINCH
Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.
    UQINCH Wdn{, pattern{, MUL #imm}}
    UQINCH Xdn{, pattern{, MUL #imm}}
    UQINCH Zdn.H{, pattern{, MUL #imm}}

UXTB
Zero-extend the least-significant sub-element of each active element of the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
    UXTB Zd.T, Pg/M, Zn.T
    UXTH Zd.T, Pg/M, Zn.T
    UXTW Zd.D, Pg/M, Zn.D
Unsigned extend byte
    UXTB Wd, Wn  (equivalent to UBFM Wd, Wn, #0, #7)

FADDQV
Floating-point addition of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register.
Inactive elements in the source vector are treated as +0.0.
    FADDQV Vd.T, Pg, Zn.Tb

LDNF1SW
Contiguous load with non-faulting behavior of signed words to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
    LDNF1SW { Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]

MATCH
This instruction compares each active 8-bit or 16-bit character in the first source vector with all of the characters in the corresponding 128-bit segment of the second source vector. Where the first source element detects any matching characters in the second segment it places true in the corresponding element of the destination predicate, otherwise false. Inactive elements in the destination predicate register are set to zero.
Sets the
    MATCH Pd.T, Pg/Z, Zn.T, Zm.T

SQRDMLSH
Signed saturating rounding doubling multiply subtract returning high half (by element)
    SQRDMLSH Vd, Vn, Vm.Ts[index]
    SQRDMLSH Vd.T, Vn.T, Vm.Ts[index]
    SQRDMLSH Vd, Vn, Vm
    SQRDMLSH Vd.T, Vn.T, Vm.T
    SQRDMLSH Zda.T, Zn.T, Zm.T
    SQRDMLSH Zda.H, Zn.H, Zm.H[imm]
    SQRDMLSH Zda.S, Zn.S, Zm.S[imm]
    SQRDMLSH Zda.D, Zn.D, Zm.D[imm]

ORR
Bitwise inclusive OR (vector, immediate)
    ORR Vd.T, #imm8{, LSL #amount}
    ORR Vd.T, #imm8{, LSL #amount}
    ORR Vd.T, Vn.T, Vm.T
    ORR Wd|WSP, Wn, #imm
    ORR Xd|SP, Xn, #imm
    ORR Wd, Wn, Wm{, shift #amount}
    ORR Xd, Xn, Xm{, shift #amount}
    ORR Pd.B, Pg/Z, Pn.B, Pm.B
    ORR Zdn.T, Pg/M, Zdn.T, Zm.T
    ORR Zdn.T, Zdn.T, #const
    ORR Zd.D, Zn.D, Zm.D

LDTRSB
Load register signed byte (unprivileged)
    LDTRSB Wt, [Xn|SP{, #simm}]
    LDTRSB Xt, [Xn|SP{, #simm}]

UQADD
Unsigned saturating add
    UQADD Vd, Vn, Vm
    UQADD Vd.T, Vn.T, Vm.T
    UQADD Zdn.T, Pg/M, Zdn.T, Zm.T
    UQADD Zdn.T, Zdn.T, #imm{, shift}
    UQADD Zd.T, Zn.T, Zm.T

SQDMULLB
Multiply the corresponding even-numbered signed elements of the first and second source vectors, double and place the results in the overlapping double-width elements of the destination vector.
Each result element is saturated to the double-width N-bit element's signed integer range -2
    SQDMULLB Zd.T, Zn.Tb, Zm.Tb
    SQDMULLB Zd.S, Zn.H, Zm.H[imm]
    SQDMULLB Zd.D, Zn.S, Zm.S[imm]

CSNEG
Conditional select negation
    CSNEG Wd, Wn, Wm, cond
    CSNEG Xd, Xn, Xm, cond

ADD
Add (extended register)
    ADD Wd|WSP, Wn|WSP, Wm{, extend {#amount}}
    ADD Xd|SP, Xn|SP, Rm{, extend {#amount}}
    ADD Wd|WSP, Wn|WSP, #imm{, shift}
    ADD Xd|SP, Xn|SP, #imm{, shift}
    ADD Wd, Wn, Wm{, shift #amount}
    ADD Xd, Xn, Xm{, shift #amount}
    ADD Dd, Dn, Dm
    ADD Vd.T, Vn.T, Vm.T
    ADD { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T
    ADD { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T
    ADD Zdn.T, Pg/M, Zdn.T, Zm.T
    ADD Zdn.T, Zdn.T, #imm{, shift}
    ADD Zd.T, Zn.T, Zm.T
    ADD ZA.T[Wv, offs{, VGx2}], { Zm1.T-Zm2.T }
    ADD ZA.T[Wv, offs{, VGx4}], { Zm1.T-Zm4.T }
    ADD ZA.T[Wv, offs{, VGx2}], { Zn1.T-Zn2.T }, Zm.T
    ADD ZA.T[Wv, offs{, VGx4}], { Zn1.T-Zn4.T }, Zm.T
    ADD ZA.T[Wv, offs{, VGx2}], { Zn1.T-Zn2.T }, { Zm1.T-Zm2.T }
    ADD ZA.T[Wv, offs{, VGx4}], { Zn1.T-Zn4.T }, { Zm1.T-Zm4.T }

STGP
Store Allocation Tag and pair of registers
    STGP Xt1, Xt2, [Xn|SP], #imm
    STGP Xt1, Xt2, [Xn|SP, #imm]!
    STGP Xt1, Xt2, [Xn|SP{, #imm}]

STLXRB
Store-release exclusive register byte
    STLXRB Ws, Wt, [Xn|SP{, #0}]

LDSETB
Atomic bit set on byte in memory, without return
    STSETB Ws, [Xn|SP]  (equivalent to LDSETB Ws, WZR, [Xn|SP])
    STSETLB Ws, [Xn|SP]  (equivalent to LDSETLB Ws, WZR, [Xn|SP])

LD4Q
Contiguous load four-quadword structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,
    LD4Q { Zt1.Q, Zt2.Q, Zt3.Q, Zt4.Q }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
    LD4Q { Zt1.Q, Zt2.Q, Zt3.Q, Zt4.Q }, Pg/Z, [Xn|SP, Xm, LSL #4]

SQSHRNT
Shift each signed integer value in the source vector elements right by an immediate value, and place the truncated results in the odd-numbered
half-width destination elements, leaving the even-numbered elements unchanged. Each result element is saturated to the half-width N-bit element's signed integer range -2
    SQSHRNT Zd.T, Zn.Tb, #const

LDUMINH
Atomic unsigned minimum on halfword in memory, without return
    STUMINH Ws, [Xn|SP]  (equivalent to LDUMINH Ws, WZR, [Xn|SP])
    STUMINLH Ws, [Xn|SP]  (equivalent to LDUMINLH Ws, WZR, [Xn|SP])

TRN2
Transpose vectors (secondary)
    TRN2 Vd.T, Vn.T, Vm.T

URECPE
Unsigned reciprocal estimate
    URECPE Vd.T, Vn.T
    URECPE Zd.S, Pg/M, Zn.S

LDUMAXAB
Atomic unsigned maximum on byte in memory
    LDUMAXAB Ws, Wt, [Xn|SP]
    LDUMAXALB Ws, Wt, [Xn|SP]
    LDUMAXB Ws, Wt, [Xn|SP]
    LDUMAXLB Ws, Wt, [Xn|SP]

ZIPQ2
Interleave alternating elements from high halves of the corresponding 128-bit vector segments of the first and second source vectors and place in elements of the corresponding destination vector segment. This instruction is unpredicated.
    ZIPQ2 Zd.T, Zn.T, Zm.T

FMLALLTB
This 8-bit floating-point multiply-add long-long instruction widens the third 8-bit element of each 32-bit container in the first and second source vectors to single-precision format and multiplies the corresponding elements.
The intermediate products are scaled by 2
    FMLALLTB Zda.S, Zn.B, Zm.B
    FMLALLTB Zda.S, Zn.B, Zm.B[imm]

MADDPT
Multiply-add checked pointer
    MADDPT Xd, Xn, Xm, Xa

BRB
Branch record buffer
    BRB brb_op  (equivalent to SYS #1, C7, C2, #op2{, Xt})

LDR
Load SIMD&FP register (immediate offset)
    LDR Bt, [Xn|SP], #simm
    LDR Ht, [Xn|SP], #simm
    LDR St, [Xn|SP], #simm
    LDR Dt, [Xn|SP], #simm
    LDR Qt, [Xn|SP], #simm
    LDR Bt, [Xn|SP, #simm]!
    LDR Ht, [Xn|SP, #simm]!
    LDR St, [Xn|SP, #simm]!
    LDR Dt, [Xn|SP, #simm]!
    LDR Qt, [Xn|SP, #simm]!
    LDR Bt, [Xn|SP{, #pimm}]
    LDR Ht, [Xn|SP{, #pimm}]
    LDR St, [Xn|SP{, #pimm}]
    LDR Dt, [Xn|SP{, #pimm}]
    LDR Qt, [Xn|SP{, #pimm}]
    LDR Wt, [Xn|SP], #simm
    LDR Xt, [Xn|SP], #simm
    LDR Wt, [Xn|SP, #simm]!
    LDR Xt, [Xn|SP, #simm]!
    LDR Wt, [Xn|SP{, #pimm}]
    LDR Xt, [Xn|SP{, #pimm}]
    LDR St, label
    LDR Dt, label
    LDR Qt, label
    LDR Wt, label
    LDR Xt, label
    LDR Pt, [Xn|SP{, #imm, MUL VL}]
    LDR Bt, [Xn|SP, (Wm|Xm), extend {amount}]
    LDR Bt, [Xn|SP, Xm{, LSL amount}]
    LDR Ht, [Xn|SP, (Wm|Xm){, extend {amount}}]
    LDR St, [Xn|SP, (Wm|Xm){, extend {amount}}]
    LDR Dt, [Xn|SP, (Wm|Xm){, extend {amount}}]
    LDR Qt, [Xn|SP, (Wm|Xm){, extend {amount}}]
    LDR Wt, [Xn|SP, (Wm|Xm){, extend {amount}}]
    LDR Xt, [Xn|SP, (Wm|Xm){, extend {amount}}]
    LDR Zt, [Xn|SP{, #imm, MUL VL}]
    LDR ZA[Wv, offs], [Xn|SP{, #offs, MUL VL}]
    LDR ZT0, [Xn|SP]

LDFF1SB
Gather load with first-faulting behavior of signed bytes to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is in the range 0 to 31.
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
    LDFF1SB { Zt.S }, Pg/Z, [Zn.S{, #imm}]
    LDFF1SB { Zt.D }, Pg/Z, [Zn.D{, #imm}]
    LDFF1SB { Zt.H }, Pg/Z, [Xn|SP{, Xm}]
    LDFF1SB { Zt.S }, Pg/Z, [Xn|SP{, Xm}]
    LDFF1SB { Zt.D }, Pg/Z, [Xn|SP{, Xm}]
    LDFF1SB { Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod]
    LDFF1SB { Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod]
    LDFF1SB { Zt.D }, Pg/Z, [Xn|SP, Zm.D]

BRKA
Sets destination predicate elements up to and including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register remain unmodified or are set to zero, depending on whether merging or zeroing predication is selected. Does not set the condition flags.
    BRKA Pd.B, Pg/ZM, Pn.B

RETAASPPCR
Return from subroutine, with enhanced pointer authentication return using a register
    RETAASPPCR Xm
    RETABSPPCR Xm

SADALP
Signed add and accumulate long pairwise
    SADALP Vd.Ta, Vn.Tb
    SADALP Zda.T, Pg/M, Zn.Tb

TRN1
Transpose vectors (primary)
    TRN1 Vd.T, Vn.T, Vm.T
    TRN1 Pd.T, Pn.T, Pm.T
    TRN2 Pd.T, Pn.T, Pm.T
    TRN1 Zd.T, Zn.T, Zm.T
    TRN1 Zd.Q, Zn.Q, Zm.Q
    TRN2 Zd.T, Zn.T, Zm.T
    TRN2 Zd.Q, Zn.Q, Zm.Q

URHADD
Unsigned rounding halving add
    URHADD Vd.T, Vn.T, Vm.T
    URHADD Zdn.T, Pg/M, Zdn.T, Zm.T

UBFIZ
Unsigned bitfield insert in zeros
    UBFIZ Wd, Wn, #lsb, #width  (equivalent to UBFM Wd, Wn, #(-lsb MOD 32), #(width-1))
    UBFIZ Xd, Xn, #lsb, #width  (equivalent to UBFM Xd, Xn, #(-lsb MOD 64), #(width-1))

IRG
Insert random tag
    IRG Xd|SP, Xn|SP{, Xm}

SQSUB
Signed saturating subtract
    SQSUB Vd, Vn, Vm
    SQSUB Vd.T, Vn.T, Vm.T
    SQSUB Zdn.T, Pg/M, Zdn.T, Zm.T
    SQSUB Zdn.T, Zdn.T, #imm{, shift}
    SQSUB Zd.T, Zn.T, Zm.T

UABDLT
Compute the absolute difference between the odd-numbered unsigned integer values in elements of the second source vector and corresponding elements of the first source vector, and place the results in the overlapping double-width elements of the
destination vector. This instruction is unpredicated.
    UABDLT Zd.T, Zn.Tb, Zm.Tb

NGC
Negate with carry
    NGC Wd, Wm  (equivalent to SBC Wd, WZR, Wm)
    NGC Xd, Xm  (equivalent to SBC Xd, XZR, Xm)

AUTDB
Authenticate data address, using key B
    AUTDB Xd, Xn|SP
    AUTDZB Xd

LDTRB
Load register byte (unprivileged)
    LDTRB Wt, [Xn|SP{, #simm}]

STTRB
Store register byte (unprivileged)
    STTRB Wt, [Xn|SP{, #simm}]

CPYFPT
Memory copy forward-only, reads and writes unprivileged
    CPYFPT [Xd]!, [Xs]!, Xn!
    CPYFMT [Xd]!, [Xs]!, Xn!
    CPYFET [Xd]!, [Xs]!, Xn!

LDAXP
Load-acquire exclusive pair of registers
    LDAXP Wt1, Wt2, [Xn|SP{, #0}]
    LDAXP Xt1, Xt2, [Xn|SP{, #0}]

SQNEG
Signed saturating negate
    SQNEG Vd, Vn
    SQNEG Vd.T, Vn.T
    SQNEG Zd.T, Pg/M, Zn.T

TBLQ
For each 128-bit destination vector segment, reads each element of the corresponding second source (index) vector segment and uses its value to select an indexed element from the corresponding first source (table) vector segment. The indexed table element is placed in the element of the destination vector that corresponds to the index vector element. If an index value is greater than or equal to the number of elements in a 128-bit vector segment then it places zero in the corresponding destination vector element. This instruction is unpredicated.
    TBLQ Zd.T, { Zn.T }, Zm.T

UZP
Concatenate every fourth element from each of the four source vectors and place them in the corresponding elements of the four destination vectors.
    UZP { Zd1.T-Zd4.T }, { Zn1.T-Zn4.T }
    UZP { Zd1.Q-Zd4.Q }, { Zn1.Q-Zn4.Q }
    UZP { Zd1.T-Zd2.T }, Zn.T, Zm.T
    UZP { Zd1.Q-Zd2.Q }, Zn.Q, Zm.Q

GCSPOPCX
Guarded Control Stack pop and compare exception return record
    GCSPOPCX  (equivalent to SYS #0, C7, C7, #5{, Xt})

BF1CVT
Convert each 8-bit floating-point element of the source vector to BFloat16 while downscaling the value, and place the results in the corresponding 16-bit elements of the destination vectors.
BF1CVT scales the values by 2
    BF1CVT { Zd1.H-Zd2.H }, Zn.B
    BF2CVT { Zd1.H-Zd2.H }, Zn.B
    BF1CVT Zd.H, Zn.B
    BF2CVT Zd.H, Zn.B

FCVTN
Floating-point convert to lower precision narrow (vector)
    FCVTN{2} Vd.Tb, Vn.Ta
    FCVTN Vd.Ta, Vn.Tb, Vm.Tb
    FCVTN{2} Vd.Ta, Vn.4S, Vm.4S
    FCVTN Zd.B, { Zn1.H-Zn2.H }
    FCVTN Zd.B, { Zn1.S-Zn4.S }
    FCVTN Zd.H, { Zn1.S-Zn2.S }

LDXP
Load exclusive pair of registers
    LDXP Wt1, Wt2, [Xn|SP{, #0}]
    LDXP Xt1, Xt2, [Xn|SP{, #0}]

SQXTNB
Saturate the signed integer value in each source element to half the original source element width, and place the results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.
    SQXTNB Zd.T, Zn.Tb

UVDOT
The unsigned integer vertical dot product instruction computes the vertical dot product of the corresponding two unsigned 16-bit integer values held in the two first source vectors and two unsigned 16-bit integer values in the corresponding indexed 32-bit element of the second source vector. The widened dot product results are destructively added to the corresponding 32-bit element of the ZA single-vector groups.
    UVDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]
    UVDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]
    UVDOT ZA.D[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]

EORS
Bitwise exclusive OR active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero.
Sets the
    EORS Pd.B, Pg/Z, Pn.B, Pm.B

LDEORAH
Atomic exclusive-OR on halfword in memory
    LDEORAH Ws, Wt, [Xn|SP]
    LDEORALH Ws, Wt, [Xn|SP]
    LDEORH Ws, Wt, [Xn|SP]
    LDEORLH Ws, Wt, [Xn|SP]

UQRSHL
Unsigned saturating rounding shift left (register)
    UQRSHL Vd, Vn, Vm
    UQRSHL Vd.T, Vn.T, Vm.T
    UQRSHL Zdn.T, Pg/M, Zdn.T, Zm.T

STTR
Store register (unprivileged)
    STTR Wt, [Xn|SP{, #simm}]
    STTR Xt, [Xn|SP{, #simm}]

UQSUB
Unsigned saturating subtract
    UQSUB Vd, Vn, Vm
    UQSUB Vd.T, Vn.T, Vm.T
    UQSUB Zdn.T, Pg/M, Zdn.T, Zm.T
    UQSUB Zdn.T, Zdn.T, #imm{, shift}
    UQSUB Zd.T, Zn.T, Zm.T

CMHI
Compare unsigned higher (vector)
    CMHI Dd, Dn, Dm
    CMHI Vd.T, Vn.T, Vm.T

SM3TT1B
    SM3TT1B Vd.4S, Vn.4S, Vm.S[imm2]

UABDL
Unsigned absolute difference long
    UABDL{2} Vd.Ta, Vn.Tb, Vm.Tb

SABDLB
Compute the absolute difference between even-numbered signed integer values in elements of the second source vector and corresponding elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.
    SABDLB Zd.T, Zn.Tb, Zm.Tb

CPYFPWTN
Memory copy forward-only, writes unprivileged, reads and writes non-temporal
    CPYFPWTN [Xd]!, [Xs]!, Xn!
    CPYFMWTN [Xd]!, [Xs]!, Xn!
    CPYFEWTN [Xd]!, [Xs]!, Xn!

FMLALLTT
This 8-bit floating-point multiply-add long-long instruction widens the fourth 8-bit element of each 32-bit container in the first and second source vectors to single-precision format and multiplies the corresponding elements.
The intermediate products are scaled by 2
    FMLALLTT Zda.S, Zn.B, Zm.B
    FMLALLTT Zda.S, Zn.B, Zm.B[imm]

FAMAX
Floating-point absolute maximum
    FAMAX Vd.T, Vn.T, Vm.T
    FAMAX Vd.T, Vn.T, Vm.T
    FAMAX { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T }
    FAMAX { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }
    FAMAX Zdn.T, Pg/M, Zdn.T, Zm.T

MLS
Multiply-subtract from accumulator (vector, by element)
    MLS Vd.T, Vn.T, Vm.Ts[index]
    MLS Vd.T, Vn.T, Vm.T
    MLS Zda.T, Pg/M, Zn.T, Zm.T
    MLS Zda.H, Zn.H, Zm.H[imm]
    MLS Zda.S, Zn.S, Zm.S[imm]
    MLS Zda.D, Zn.D, Zm.D[imm]

BFMAXNM
Determine the maximum number value of BFloat16 elements of the second source vector and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors.
    BFMAXNM { Zdn1.H-Zdn2.H }, { Zdn1.H-Zdn2.H }, Zm.H
    BFMAXNM { Zdn1.H-Zdn4.H }, { Zdn1.H-Zdn4.H }, Zm.H
    BFMAXNM { Zdn1.H-Zdn2.H }, { Zdn1.H-Zdn2.H }, { Zm1.H-Zm2.H }
    BFMAXNM { Zdn1.H-Zdn4.H }, { Zdn1.H-Zdn4.H }, { Zm1.H-Zm4.H }
    BFMAXNM Zdn.H, Pg/M, Zdn.H, Zm.H

CNT
Count bits
    CNT Wd, Wn
    CNT Xd, Xn
    CNT Vd.T, Vn.T
    CNT Zd.T, Pg/M, Zn.T

FCLAMP
Clamp each floating-point element in the two or four destination vectors to between the floating-point minimum value in the corresponding element of the first source vector and the floating-point maximum value in the corresponding element of the second source vector and destructively place the clamped results in the corresponding elements of the two or four destination vectors.
    FCLAMP { Zd1.T-Zd2.T }, Zn.T, Zm.T
    FCLAMP { Zd1.T-Zd4.T }, Zn.T, Zm.T
    FCLAMP Zd.T, Zn.T, Zm.T

STP
Store pair of SIMD&FP registers
    STP St1, St2, [Xn|SP], #imm
    STP Dt1, Dt2, [Xn|SP], #imm
    STP Qt1, Qt2, [Xn|SP], #imm
    STP St1, St2, [Xn|SP, #imm]!
    STP Dt1, Dt2, [Xn|SP, #imm]!
    STP Qt1, Qt2, [Xn|SP, #imm]!
    STP St1, St2, [Xn|SP{, #imm}]
    STP Dt1, Dt2, [Xn|SP{, #imm}]
    STP Qt1, Qt2, [Xn|SP{, #imm}]
    STP Wt1, Wt2, [Xn|SP], #imm
    STP Xt1, Xt2, [Xn|SP], #imm
    STP Wt1, Wt2,
    [Xn|SP, #imm]!
    STP Xt1, Xt2, [Xn|SP, #imm]!
    STP Wt1, Wt2, [Xn|SP{, #imm}]
    STP Xt1, Xt2, [Xn|SP{, #imm}]

STNP
Store pair of SIMD&FP registers, with non-temporal hint
    STNP St1, St2, [Xn|SP{, #imm}]
    STNP Dt1, Dt2, [Xn|SP{, #imm}]
    STNP Qt1, Qt2, [Xn|SP{, #imm}]
    STNP Wt1, Wt2, [Xn|SP{, #imm}]
    STNP Xt1, Xt2, [Xn|SP{, #imm}]

SSUBW
Signed subtract wide
    SSUBW{2} Vd.Ta, Vn.Ta, Vm.Tb

WFE
Wait for event
    WFE

SMINQV
Signed minimum of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as the maximum signed integer for the element size.
    SMINQV Vd.T, Pg, Zn.Tb

ST4W
Contiguous store four-word structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,
    ST4W { Zt1.S, Zt2.S, Zt3.S, Zt4.S }, Pg, [Xn|SP{, #imm, MUL VL}]
    ST4W { Zt1.S, Zt2.S, Zt3.S, Zt4.S }, Pg, [Xn|SP, Xm, LSL #2]

CMP
Compare (extended register)
    CMP Wn|WSP, Wm{, extend {#amount}}  (equivalent to SUBS WZR, Wn|WSP, Wm{, extend {#amount}})
    CMP Xn|SP, Rm{, extend {#amount}}  (equivalent to SUBS XZR, Xn|SP, Rm{, extend {#amount}})
Compare (immediate)
    CMP Wn|WSP, #imm{, shift}  (equivalent to SUBS WZR, Wn|WSP, #imm{, shift})
    CMP Xn|SP, #imm{, shift}  (equivalent to SUBS XZR, Xn|SP, #imm{, shift})
Compare (shifted register)
    CMP Wn, Wm{, shift #amount}  (equivalent to SUBS WZR, Wn, Wm{, shift #amount})
    CMP Xn, Xm{, shift #amount}  (equivalent to SUBS XZR, Xn, Xm{, shift #amount})

ADDHNB
Add each vector element of the first source vector to the corresponding vector element of the second source vector, and place the most significant half of the result in the even-numbered half-width destination elements, while setting the odd-numbered elements to
zero. This instruction is unpredicated.
    ADDHNB Zd.T, Zn.Tb, Zm.Tb

FMIN
Floating-point minimum (vector)
    FMIN Vd.T, Vn.T, Vm.T
    FMIN Vd.T, Vn.T, Vm.T
    FMIN Hd, Hn, Hm
    FMIN Sd, Sn, Sm
    FMIN Dd, Dn, Dm
    FMIN { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T
    FMIN { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T
    FMIN { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T }
    FMIN { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }
    FMIN Zdn.T, Pg/M, Zdn.T, const
    FMIN Zdn.T, Pg/M, Zdn.T, Zm.T

UQDECB
Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.
    UQDECB Wdn{, pattern{, MUL #imm}}
    UQDECB Xdn{, pattern{, MUL #imm}}

UADDLB
Add the corresponding even-numbered unsigned elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.
    UADDLB Zd.T, Zn.Tb, Zm.Tb

SHRNB
Shift each unsigned integer value in the source vector elements right by an immediate value, and place the truncated results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. The immediate shift amount is an unsigned value in the range 1 to number of bits per element.
This instruction is unpredicated.
    SHRNB Zd.T, Zn.Tb, #const

CMTST
Compare bitwise test bits nonzero (vector)
    CMTST Dd, Dn, Dm
    CMTST Vd.T, Vn.T, Vm.T

FTSSEL
The
    FTSSEL Zd.T, Zn.T, Zm.T

LD1RD
Load a single doubleword from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 8 in the range 0 to 504.
    LD1RD { Zt.D }, Pg/Z, [Xn|SP{, #imm}]

BFM
Bitfield move
    BFM Wd, Wn, #immr, #imms
    BFM Xd, Xn, #immr, #imms

CLS
Count leading sign bits (vector)
    CLS Vd.T, Vn.T
    CLS Wd, Wn
    CLS Xd, Xn
    CLS Zd.T, Pg/M, Zn.T

LDEORB
Atomic exclusive-OR on byte in memory, without return
    STEORB Ws, [Xn|SP]  (equivalent to LDEORB Ws, WZR, [Xn|SP])
    STEORLB Ws, [Xn|SP]  (equivalent to LDEORLB Ws, WZR, [Xn|SP])

SQSHRNB
Shift each signed integer value in the source vector elements right by an immediate value, and place the truncated results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. Each result element is saturated to the half-width N-bit element's signed integer range -2
    SQSHRNB Zd.T, Zn.Tb, #const

SM3TT2B
    SM3TT2B Vd.4S, Vn.4S, Vm.S[imm2]

MOVAZ
The instruction operates on two consecutive horizontal or vertical slices within a named ZA tile of the specified element size.
The tile slices are zeroed after moving their contents to the destination vectors.
    MOVAZ { Zd1.B-Zd2.B }, ZA0HV.B[Ws, offs1:offs2]
    MOVAZ { Zd1.H-Zd2.H }, ZAnHV.H[Ws, offs1:offs2]
    MOVAZ { Zd1.S-Zd2.S }, ZAnHV.S[Ws, offs1:offs2]
    MOVAZ { Zd1.D-Zd2.D }, ZAnHV.D[Ws, offs1:offs2]
    MOVAZ { Zd1.B-Zd4.B }, ZA0HV.B[Ws, offs1:offs4]
    MOVAZ { Zd1.H-Zd4.H }, ZAnHV.H[Ws, offs1:offs4]
    MOVAZ { Zd1.S-Zd4.S }, ZAnHV.S[Ws, offs1:offs4]
    MOVAZ { Zd1.D-Zd4.D }, ZAnHV.D[Ws, offs1:offs4]
    MOVAZ { Zd1.D-Zd2.D }, ZA.D[Wv, offs{, VGx2}]
    MOVAZ { Zd1.D-Zd4.D }, ZA.D[Wv, offs{, VGx4}]
    MOVAZ Zd.B, ZA0HV.B[Ws, offs]
    MOVAZ Zd.H, ZAnHV.H[Ws, offs]
    MOVAZ Zd.S, ZAnHV.S[Ws, offs]
    MOVAZ Zd.D, ZAnHV.D[Ws, offs]
    MOVAZ Zd.Q, ZAnHV.Q[Ws, offs]

UMULLT
Multiply the corresponding odd-numbered unsigned elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.
    UMULLT Zd.T, Zn.Tb, Zm.Tb
    UMULLT Zd.S, Zn.H, Zm.H[imm]
    UMULLT Zd.D, Zn.S, Zm.S[imm]

LDADDH
Atomic add on halfword in memory, without return
    STADDH Ws, [Xn|SP]  (equivalent to LDADDH Ws, WZR, [Xn|SP])
    STADDLH Ws, [Xn|SP]  (equivalent to LDADDLH Ws, WZR, [Xn|SP])

STNT1H
Contiguous store non-temporal of halfwords from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
    STNT1H { Zt1.H-Zt2.H }, PNg, [Xn|SP{, #imm, MUL VL}]
    STNT1H { Zt1.H-Zt4.H }, PNg, [Xn|SP{, #imm, MUL VL}]
    STNT1H { Zt1.H-Zt2.H }, PNg, [Xn|SP, Xm, LSL #1]
    STNT1H { Zt1.H-Zt4.H }, PNg, [Xn|SP, Xm, LSL #1]
    STNT1H { Zt1.H, Zt2.H }, PNg, [Xn|SP{, #imm, MUL VL}]
    STNT1H { Zt1.H, Zt2.H, Zt3.H, Zt4.H }, PNg, [Xn|SP{, #imm, MUL VL}]
    STNT1H { Zt1.H, Zt2.H }, PNg, [Xn|SP, Xm, LSL #1]
    STNT1H { Zt1.H, Zt2.H, Zt3.H, Zt4.H }, PNg, [Xn|SP, Xm, LSL #1]
    STNT1H { Zt.S }, Pg, [Zn.S{, Xm}]
    STNT1H { Zt.D }, Pg, [Zn.D{, Xm}]
    STNT1H { Zt.H }, Pg, [Xn|SP{, #imm, MUL VL}]
    STNT1H { Zt.H }, Pg, [Xn|SP, Xm, LSL #1]

SETGPN
Memory set with tag setting, non-temporal
    SETGPN [Xd]!, Xn!, Xs
    SETGMN [Xd]!, Xn!, Xs
    SETGEN [Xd]!, Xn!, Xs

FCVTMU
Floating-point convert to unsigned integer, rounding toward minus infinity (vector)
    FCVTMU Hd, Hn
    FCVTMU Vd, Vn
    FCVTMU Vd.T, Vn.T
    FCVTMU Vd.T, Vn.T
    FCVTMU Wd, Hn
    FCVTMU Xd, Hn
    FCVTMU Wd, Sn
    FCVTMU Xd, Sn
    FCVTMU Wd, Dn
    FCVTMU Xd, Dn

ABS
Absolute value
    ABS Wd, Wn
    ABS Xd, Xn
    ABS Dd, Dn
    ABS Vd.T, Vn.T
    ABS Zd.T, Pg/M, Zn.T

SMULH
Signed multiply high
    SMULH Xd, Xn, Xm
    SMULH Zdn.T, Pg/M, Zdn.T, Zm.T
    SMULH Zd.T, Zn.T, Zm.T

SSUBWT
Subtract the even-numbered signed elements of the second source vector from the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.
    SSUBWT Zd.T, Zn.T, Zm.Tb

UQRSHLR
Shift active unsigned elements of the second source vector by corresponding elements of the first source vector and destructively place the rounded results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed.
Each result element is saturated to the N-bit element's unsigned integer range 0 to (2
    UQRSHLR Zdn.T, Pg/M, Zdn.T, Zm.T

WFIT
Wait for interrupt with timeout
    WFIT Xt

ZERO
The instruction zeroes two or four ZA single-vector groups.
    ZERO ZA.D[Wv, offs, VGx2]
    ZERO ZA.D[Wv, offs, VGx4]
    ZERO ZA.D[Wv, offs1:offs2]
    ZERO ZA.D[Wv, offs1:offs2, VGx2]
    ZERO ZA.D[Wv, offs1:offs2, VGx4]
    ZERO ZA.D[Wv, offs1:offs4]
    ZERO ZA.D[Wv, offs1:offs4, VGx2]
    ZERO ZA.D[Wv, offs1:offs4, VGx4]
    ZERO { mask }
    ZERO { ZT0 }

SETPTN
Memory set, unprivileged and non-temporal
    SETPTN [Xd]!, Xn!, Xs
    SETMTN [Xd]!, Xn!, Xs
    SETETN [Xd]!, Xn!, Xs

UMINP
Unsigned minimum pairwise
    UMINP Vd.T, Vn.T, Vm.T
    UMINP Zdn.T, Pg/M, Zdn.T, Zm.T

UMLSLL
This unsigned integer multiply-subtract long-long instruction multiplies each unsigned 8-bit or 16-bit element in the one, two, or four first source vectors with each unsigned 8-bit or 16-bit indexed element of second source vector, widens each product to 32-bits or 64-bits and destructively subtracts these values from the corresponding 32-bit or 64-bit elements of the ZA quad-vector groups.
    UMLSLL ZA.S[Wv, offs1:offs4], Zn.B, Zm.B[index]
    UMLSLL ZA.D[Wv, offs1:offs4], Zn.H, Zm.H[index]
    UMLSLL ZA.S[Wv, offs1:offs4{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]
    UMLSLL ZA.D[Wv, offs1:offs4{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]
    UMLSLL ZA.S[Wv, offs1:offs4{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]
    UMLSLL ZA.D[Wv, offs1:offs4{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]
    UMLSLL ZA.T[Wv, offs1:offs4], Zn.Tb, Zm.Tb
    UMLSLL ZA.T[Wv, offs1:offs4{, VGx2}], { Zn1.Tb-Zn2.Tb }, Zm.Tb
    UMLSLL ZA.T[Wv, offs1:offs4{, VGx4}], { Zn1.Tb-Zn4.Tb }, Zm.Tb
    UMLSLL ZA.T[Wv, offs1:offs4{, VGx2}], { Zn1.Tb-Zn2.Tb }, { Zm1.Tb-Zm2.Tb }
    UMLSLL ZA.T[Wv, offs1:offs4{, VGx4}], { Zn1.Tb-Zn4.Tb }, { Zm1.Tb-Zm4.Tb }

LDAXR
Load-acquire exclusive register
    LDAXR Wt, [Xn|SP{, #0}]
    LDAXR Xt, [Xn|SP{, #0}]

SWPAH
Swap halfword in memory
    SWPAH Ws, Wt, [Xn|SP]
    SWPALH Ws, Wt, [Xn|SP]
    SWPH Ws, Wt, [Xn|SP]
    SWPLH Ws, Wt, [Xn|SP]

EORTB
Interleaving exclusive OR between the odd-numbered elements of the first source vector register and the even-numbered elements of the second source vector register, placing the result in the odd-numbered elements of the destination vector, leaving the even-numbered elements unchanged. This instruction is unpredicated.
    EORTB Zd.T, Zn.T, Zm.T

SABDLT
Compute the absolute difference between odd-numbered signed integer values in elements of the second source vector and corresponding elements of the first source vector, and place the results in overlapping double-width elements of the destination vector. This instruction is unpredicated.
    SABDLT Zd.T, Zn.Tb, Zm.Tb

EXTQ
For each 128-bit vector segment of the result, copy the indexed byte up to and including the last byte of the corresponding first source vector segment to the bottom of the result segment, then fill the remainder of the result segment starting from the first byte of the corresponding second source vector segment.
The result segments are destructively placed in the corresponding first source vector segment. This instruction is unpredicated.EXTQZdn.B, Zdn.B, Zm.B, #immSETGPsetgpSETGPMemory set with tag settingSETGP [Xd]!, Xn!, XsSETGM [Xd]!, Xn!, XsSETGE [Xd]!, Xn!, XsSHLLshllSHLL!Shift left long (by element size)SHLL{2} Vd.Ta, Vn.Tb, #shiftSMAXPsmaxpSMAXPSigned maximum pairwiseSMAXPVd.T, Vn.T, Vm.TSMAXPZdn.T, Pg/M, Zdn.T, Zm.TUCLAMPuclampUCLAMPjClamp each unsigned element in the two or four destination vectors to between the unsigned minimum value in the corresponding element of the first source vector and the unsigned maximum value in the corresponding element of the second source vector and destructively place the clamped results in the corresponding elements of the two or four destination vectors.!UCLAMP{ Zd1.T-Zd2.T }, Zn.T, Zm.T!UCLAMP{ Zd1.T-Zd4.T }, Zn.T, Zm.TUCLAMPZd.T, Zn.T, Zm.TCMPLTcmpltCMPLTCMPLT (vectors)KCompare active signed integer elements in the first source vector being less than corresponding signed elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Sets the CMPLT Pd.T, Pg/Z, Zm.T, Zn.TCMPGT Pd.T, Pg/Z, Zn.T, Zm.TCLRBHBclrbhbCLRBHBClear branch historyCLRBHBDECBdecbDECBDetermines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination.DECBXdn{, pattern{, MUL #imm}}DECDXdn{, pattern{, MUL #imm}}DECHXdn{, pattern{, MUL #imm}}DECWXdn{, pattern{, MUL #imm}}MOVZmovzMOVZMove wide with zeroMOVZWd, #imm{, LSL #shift}MOVZXd, #imm{, LSL #shift}LDURBldurbLDURBLoad register byte (unscaled)LDURBWt, [Xn|SP{, #simm}]CPYFPWNcpyfpwnCPYFPWN-Memory copy forward-only, writes non-temporalCPYFPWN [Xd]!, [Xs]!, Xn!CPYFMWN [Xd]!, [Xs]!, Xn!CPYFEWN [Xd]!, [Xs]!, Xn!BITbitBITBitwise insert if trueBITVd.T, Vn.T, Vm.TUQCVTNuqcvtnUQCVTNSaturate the unsigned integer value in each element of the group of two source vectors to half the original source element width, and place the two-way interleaved results in the half-width destination elements.UQCVTNZd.H, { Zn1.S-Zn2.S }UQCVTNZd.T, { Zn1.Tb-Zn4.Tb }MRRSmrrsMRRS>Move System register to two adjacent general-purpose registers,MRRSXt, Xt+1, (systemreg|Sop0_op1_Cn_Cm_op2)STXRBstxrbSTXRBStore exclusive register byteSTXRBWs, Wt, [Xn|SP{, #0}]LDTRHldtrhLDTRH%Load register halfword (unprivileged)LDTRHWt, [Xn|SP{, #simm}]CPYPWNcpypwnCPYPWN Memory copy, writes non-temporalCPYPWN [Xd]!, [Xs]!, Xn!CPYMWN [Xd]!, [Xs]!, Xn!CPYEWN [Xd]!, [Xs]!, Xn!SMLALTsmlaltSMLALTMultiply the corresponding odd-numbered signed elements of the first and second source vectors and destructively add to the overlapping double-width elements of the addend vector. 
This instruction is unpredicated.SMLALTZda.T, Zn.Tb, Zm.TbSMLALTZda.S, Zn.H, Zm.H[imm]SMLALTZda.D, Zn.S, Zm.S[imm]UADDLPuaddlpUADDLPUnsigned add long pairwiseUADDLPVd.Ta, Vn.TbUMLSLumlslUMLSL4Unsigned multiply-subtract long (vector, by element) $UMLSL{2} Vd.Ta, Vn.Tb, Vm.Ts[index]UMLSL{2} Vd.Ta, Vn.Tb, Vm.Tb0UMLSL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H[index]CUMLSL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]CUMLSL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index])UMLSL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H<UMLSL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H<UMLSL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.HGUMLSL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }GUMLSL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }SUVDOTsuvdotSUVDOTThe signed by unsigned integer vertical dot product instruction computes the vertical dot product of the corresponding signed 8-bit elements from the four first source vectors and four unsigned 8-bit integer values in the corresponding indexed 32-bit element of the second source vector. The widened dot product result is destructively added to the corresponding 32-bit element of the ZA single-vector groups.<SUVDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]UDFudfUDFPermanently undefined UDF #immSTGstgSTGStore Allocation TagSTGXt|SP, [Xn|SP], #simmSTGXt|SP, [Xn|SP, #simm]!STGXt|SP, [Xn|SP{, #simm}]BFMLALTbfmlaltBFMLALTThis BFloat16 floating-point multiply-add long instruction widens the odd-numbered BFloat16 elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and adds these values without intermediate rounding to the single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the source vectors. 
This instruction is unpredicated.BFMLALTZda.S, Zn.H, Zm.HBFMLALTZda.S, Zn.H, Zm.H[imm]UQINCDuqincdUQINCD*Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the general-purpose register's unsigned integer range. UQINCDWdn{, pattern{, MUL #imm}} UQINCDXdn{, pattern{, MUL #imm}}"UQINCDZdn.D{, pattern{, MUL #imm}}DSBdsbDSBData synchronization barrierDSB (option|#imm) DSBoptionnXSSSUBLBTssublbtSSUBLBTSubtract the odd-numbered signed elements of the second source vector from the even-numbered signed elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.SSUBLBTZd.T, Zn.Tb, Zm.TbBLRAAZblraazBLRAAZ9Branch with link to register, with pointer authenticationBLRAAZXnBLRAAXn, Xm|SPBLRABZXnBLRABXn, Xm|SPBMOPAbmopaBMOPAwThis instruction works with 32-bit element ZA tile. This instruction generates an outer product of the first source SVL#BMOPAZAda.S, Pn/M, Pm/M, Zn.S, Zm.SDRPSdrpsDRPSDebug restore PE stateDRPSLDNF1Hldnf1hLDNF1HContiguous load with non-faulting behavior of unsigned halfwords to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.-LDNF1H{ Zt.H }, Pg/Z, [Xn|SP{, #imm, MUL VL}]-LDNF1H{ Zt.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]-LDNF1H{ Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]SQDMLSLBsqdmlslbSQDMLSLBMultiply then double the corresponding even-numbered signed elements of the first and second source vectors. 
Each intermediate value is saturated to the double-width N-bit value's signed integer range -2SQDMLSLBZda.T, Zn.Tb, Zm.TbSQDMLSLBZda.S, Zn.H, Zm.H[imm]SQDMLSLBZda.D, Zn.S, Zm.S[imm]SQRDMLAHsqrdmlahSQRDMLAHXSigned saturating rounding doubling multiply accumulate returning high half (by element)SQRDMLAHVd, Vn, Vm.Ts[index] SQRDMLAHVd.T, Vn.T, Vm.Ts[index]SQRDMLAHVd, Vn, VmSQRDMLAHVd.T, Vn.T, Vm.TSQRDMLAHZda.T, Zn.T, Zm.TSQRDMLAHZda.H, Zn.H, Zm.H[imm]SQRDMLAHZda.S, Zn.S, Zm.S[imm]SQRDMLAHZda.D, Zn.D, Zm.D[imm]CBNZcbnzCBNZCompare and branch on nonzero CBNZWt, label CBNZXt, labelSRSHRsrshrSRSHR'Signed rounding shift right (immediate)SRSHR Dd, Dn, #shiftSRSHRVd.T, Vn.T, #shiftSRSHRZdn.T, Pg/M, Zdn.T, #constBFMLAbfmlaBFMLANMultiply the corresponding active BFloat16 elements of the first and second source vectors and add to elements of the third source (addend) vector without intermediate rounding. Destructively place the results in the destination and third source (addend) vector. Inactive elements in the destination vector register remain unmodified.BFMLAZda.H, Pg/M, Zn.H, Zm.HBFMLAZda.H, Zn.H, Zm.H[imm]<BFMLA ZA.H[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]<BFMLA ZA.H[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]5BFMLA ZA.H[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H5BFMLA ZA.H[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H@BFMLA ZA.H[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }@BFMLA ZA.H[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }SMULLBsmullbSMULLBMultiply the corresponding even-numbered signed elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.SMULLBZd.T, Zn.Tb, Zm.TbSMULLBZd.S, Zn.H, Zm.H[imm]SMULLBZd.D, Zn.S, Zm.S[imm]BCAXbcaxBCAXBit clear and exclusive-OR"BCAXVd.16B, Vn.16B, Vm.16B, Va.16BBCAXZdn.D, Zdn.D, Zm.D, Zk.DURSQRTEursqrteURSQRTE(Unsigned reciprocal square root estimateURSQRTEVd.T, Vn.TURSQRTEZd.S, Pg/M, Zn.SPMOVpmovPMOVCopy a packed bitmap, where bit value 0b1 represents TRUE and bit value 0b0 represents FALSE, from a portion of the source vector register to elements of the destination SVE predicate register. PMOVPd.B, ZnPMOVPd.D, Zn{[imm]}PMOVPd.H, Zn{[imm]}PMOVPd.S, Zn{[imm]} PMOVZd, Pn.BPMOVZd{[imm]}, Pn.DPMOVZd{[imm]}, Pn.HPMOVZd{[imm]}, Pn.SST4Qst4qST4Q4Contiguous store four-quadword structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,?ST4Q{ Zt1.Q, Zt2.Q, Zt3.Q, Zt4.Q }, Pg, [Xn|SP{, #imm, MUL VL}];ST4Q{ Zt1.Q, Zt2.Q, Zt3.Q, Zt4.Q }, Pg, [Xn|SP, Xm, LSL #4]CPYFPTNcpyfptnCPYFPTNHMemory copy forward-only, reads and writes unprivileged and non-temporalCPYFPTN [Xd]!, [Xs]!, Xn!CPYFMTN [Xd]!, [Xs]!, Xn!CPYFETN [Xd]!, [Xs]!, Xn! CPYFPWTRN cpyfpwtrn CPYFPWTRNAMemory copy forward-only, writes unprivileged, reads non-temporalCPYFPWTRN [Xd]!, [Xs]!, Xn!CPYFMWTRN [Xd]!, [Xs]!, Xn!CPYFEWTRN [Xd]!, [Xs]!, Xn!ATatAT AT -- A64Address translate AT at_op, XtSYS #op1, C7, Cm, #op2, XtRSUBHNBrsubhnbRSUBHNBTSubtract each vector element of the second source vector from the corresponding vector element in the first source vector, and place the most significant rounded half of the result in the even-numbered half-width destination elements, while setting the odd-numbered half-width destination elements to zero. 
This instruction is unpredicated.RSUBHNBZd.T, Zn.Tb, Zm.TbBRAAZbraazBRAAZ/Branch to register, with pointer authenticationBRAAZXn BRAAXn, Xm|SPBRABZXn BRABXn, Xm|SPCSINCcsincCSINCConditional select incrementCSINCWd, Wn, Wm, condCSINCXd, Xn, Xm, condLD2Hld2hLD2H1Contiguous load two-halfword structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,3LD2H{ Zt1.H, Zt2.H }, Pg/Z, [Xn|SP{, #imm, MUL VL}]/LD2H{ Zt1.H, Zt2.H }, Pg/Z, [Xn|SP, Xm, LSL #1]TBXtbxTBXTable vector lookup extensionTBXVd.Ta, { Vn.16B }, Vm.Ta%TBXVd.Ta, { Vn.16B, Vn+1.16B }, Vm.Ta/TBXVd.Ta, { Vn.16B, Vn+1.16B, Vn+2.16B }, Vm.Ta9TBXVd.Ta, { Vn.16B, Vn+1.16B, Vn+2.16B, Vn+3.16B }, Vm.TaTBXZd.T, Zn.T, Zm.TCPYPNcpypnCPYPN*Memory copy, reads and writes non-temporalCPYPN [Xd]!, [Xs]!, Xn!CPYMN [Xd]!, [Xs]!, Xn!CPYEN [Xd]!, [Xs]!, Xn!PMULLTpmulltPMULLTPolynomial multiply over [0, 1] the corresponding odd-numbered elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.PMULLTZd.T, Zn.Tb, Zm.TbPMULLTZd.Q, Zn.D, Zm.DCLASTBclastbCLASTB\From the source vector register extract the last active element, and then zero-extend that element to destructively place in the destination and first source general-purpose register. If there are no active elements then destructively zero-extend the least significant element-size bits of the destination and first source general-purpose register.CLASTBRdn, Pg, Rdn, Zm.TCLASTBVdn, Pg, Vdn, Zm.TCLASTBZdn.T, Pg, Zdn.T, Zm.TPSELpselPSEL{If the indexed element of the second source predicate is true, place the contents of the first source predicate register into the destination predicate register, otherwise set the destination predicate to all-false. 
The indexed element is determined by the sum of a general-purpose index register and an immediate, modulo the number of elements. Does not set the condition flags.PSELPd, Pn, Pm.T[Wv, imm]TBZtbzTBZTest bit and branch if zeroTBZRt, #imm, labelTSTtstTSTTST (immediate) -- A64Test bits (immediate) TST Wn, #immANDS WZR, Wn, #imm TST Xn, #immANDS XZR, Xn, #immTST (shifted register) -- A64Test (shifted register)TST Wn, Wm{, shift #amount}"ANDS WZR, Wn, Wm{, shift #amount}TST Xn, Xm{, shift #amount}"ANDS XZR, Xn, Xm{, shift #amount}SABALTsabaltSABALTCompute the absolute difference between odd-numbered signed elements of the second source vector and corresponding elements of the first source vector, and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.SABALTZda.T, Zn.Tb, Zm.TbRCWCASPrcwcaspRCWCASP4Read check write compare and swap quadword in memory&RCWCASPXs, X(s+1), Xt, X(t+1), [Xn|SP]'RCWCASPAXs, X(s+1), Xt, X(t+1), [Xn|SP](RCWCASPALXs, X(s+1), Xt, X(t+1), [Xn|SP]'RCWCASPLXs, X(s+1), Xt, X(t+1), [Xn|SP]LDCLRPldclrpLDCLRP&Atomic bit clear on quadword in memoryLDCLRPXt1, Xt2, [Xn|SP]LDCLRPAXt1, Xt2, [Xn|SP]LDCLRPALXt1, Xt2, [Xn|SP]LDCLRPLXt1, Xt2, [Xn|SP]SWPswpSWP!Swap word or doubleword in memorySWPWs, Wt, [Xn|SP]SWPAWs, Wt, [Xn|SP]SWPALWs, Wt, [Xn|SP]SWPLWs, Wt, [Xn|SP]SWPXs, Xt, [Xn|SP]SWPAXs, Xt, [Xn|SP]SWPALXs, Xt, [Xn|SP]SWPLXs, Xt, [Xn|SP]LD1Wld1wLD1WContiguous load of unsigned words to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.3LD1W{ Zt1.S-Zt2.S }, PNg/Z, [Xn|SP{, #imm, MUL VL}]3LD1W{ Zt1.S-Zt4.S }, PNg/Z, [Xn|SP{, #imm, MUL VL}]/LD1W{ Zt1.S-Zt2.S }, PNg/Z, [Xn|SP, Xm, LSL #2]/LD1W{ Zt1.S-Zt4.S }, PNg/Z, [Xn|SP, Xm, LSL #2]4LD1W{ Zt1.S, Zt2.S }, PNg/Z, [Xn|SP{, #imm, MUL VL}]BLD1W{ Zt1.S, Zt2.S, Zt3.S, 
Zt4.S }, PNg/Z, [Xn|SP{, #imm, MUL VL}]0LD1W{ Zt1.S, Zt2.S }, PNg/Z, [Xn|SP, Xm, LSL #2]>LD1W{ Zt1.S, Zt2.S, Zt3.S, Zt4.S }, PNg/Z, [Xn|SP, Xm, LSL #2]"LD1W{ Zt.S }, Pg/Z, [Zn.S{, #imm}]"LD1W{ Zt.D }, Pg/Z, [Zn.D{, #imm}]+LD1W{ Zt.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]+LD1W{ Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]+LD1W{ Zt.Q }, Pg/Z, [Xn|SP{, #imm, MUL VL}]'LD1W{ Zt.S }, Pg/Z, [Xn|SP, Xm, LSL #2]'LD1W{ Zt.D }, Pg/Z, [Xn|SP, Xm, LSL #2]'LD1W{ Zt.Q }, Pg/Z, [Xn|SP, Xm, LSL #2])LD1W{ Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod #2])LD1W{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod #2]&LD1W{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod]&LD1W{ Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod])LD1W{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, LSL #2]!LD1W{ Zt.D }, Pg/Z, [Xn|SP, Zm.D]6LD1W{ ZAtHV.S[Ws, offs] }, Pg/Z, [Xn|SP{, Xm, LSL #2}]LD1RQWld1rqwLD1RQWLoad four contiguous words to elements of a short, 128-bit (quadword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base address.%LD1RQW{ Zt.S }, Pg/Z, [Xn|SP{, #imm}])LD1RQW{ Zt.S }, Pg/Z, [Xn|SP, Xm, LSL #2]PSBpsbPSB!Profiling synchronization barrier PSB CSYNCSTLXPstlxpSTLXP)Store-release exclusive pair of registers STLXPWs, Wt1, Wt2, [Xn|SP{, #0}] STLXPWs, Xt1, Xt2, [Xn|SP{, #0}]LDSMINHldsminhLDSMINH;Atomic signed minimum on halfword in memory, without return+STSMINHWs, [Xn|SP]LDSMINH Ws, WZR, [Xn|SP]-STSMINLHWs, [Xn|SP]LDSMINLH Ws, WZR, [Xn|SP]MSRRmsrrMSRR>Move two adjacent general-purpose registers to System register.MSRR (systemreg|Sop0_op1_Cn_Cm_op2), Xt, Xt+1SRSHLsrshlSRSHL%Signed rounding shift left (register)SRSHL Dd, Dn, DmSRSHLVd.T, Vn.T, Vm.T/SRSHL{ Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T/SRSHL{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T:SRSHL{ Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T }:SRSHL{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }SRSHLZdn.T, Pg/M, Zdn.T, Zm.TLDARHldarhLDARHLoad-acquire register halfwordLDARHWt, [Xn|SP{, 
#0}]FCMGTfcmgtFCMGT,Floating-point compare greater than (vector)FCMGTHd, Hn, HmFCMGTVd, Vn, VmFCMGTVd.T, Vn.T, Vm.TFCMGTVd.T, Vn.T, Vm.TFCMGTHd, Hn, #0.0FCMGTVd, Vn, #0.0FCMGTVd.T, Vn.T, #0.0FCMGTVd.T, Vn.T, #0.0CPYPWTRNcpypwtrnCPYPWTRN4Memory copy, writes unprivileged, reads non-temporalCPYPWTRN [Xd]!, [Xs]!, Xn!CPYMWTRN [Xd]!, [Xs]!, Xn!CPYEWTRN [Xd]!, [Xs]!, Xn!CLASTAclastaCLASTAFrom the source vector register extract the element after the last active element, or if the last active element is the final element extract element zero, and then zero-extend that element to destructively place in the destination and first source general-purpose register. If there are no active elements then destructively zero-extend the least significant element-size bits of the destination and first source general-purpose register.CLASTARdn, Pg, Rdn, Zm.TCLASTAVdn, Pg, Vdn, Zm.TCLASTAZdn.T, Pg, Zdn.T, Zm.TLDGMldgmLDGMLoad tag multipleLDGMXt, [Xn|SP]FJCVTZSfjcvtzsFJCVTZSMFloating-point Javascript convert to signed fixed-point, rounding toward zero FJCVTZSWd, DnMOVImoviMOVIMove immediate (vector)MOVIVd.T, #imm8{, LSL #0}MOVIVd.T, #imm8{, LSL #amount}MOVIVd.T, #imm8{, LSL #amount}MOVIVd.T, #imm8, MSL #amount MOVIDd, #immMOVIVd.2D, #immUSMOPSusmopsUSMOPS>The 8-bit integer variant works with a 32-bit element ZA tile.$USMOPSZAda.S, Pn/M, Pm/M, Zn.B, Zm.B$USMOPSZAda.D, Pn/M, Pm/M, Zn.H, Zm.HUQSHRNuqshrnUQSHRN2Unsigned saturating shift right narrow (immediate)UQSHRNVbd, Van, #shiftUQSHRN{2} Vd.Tb, Vn.Ta, #shiftLD3Bld3bLD3B1Contiguous load three-byte structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,:LD3B{ Zt1.B, Zt2.B, Zt3.B }, Pg/Z, [Xn|SP{, #imm, MUL VL}].LD3B{ Zt1.B, Zt2.B, Zt3.B }, Pg/Z, [Xn|SP, Xm]FRINTNfrintnFRINTNGFloating-point round to integral, to nearest 
with ties to even (vector)FRINTNVd.T, Vn.TFRINTNVd.T, Vn.T FRINTNHd, Hn FRINTNSd, Sn FRINTNDd, Dn&FRINTN{ Zd1.S-Zd2.S }, { Zn1.S-Zn2.S }&FRINTN{ Zd1.S-Zd4.S }, { Zn1.S-Zn4.S }UADDVuaddvUADDVUnsigned add horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register. Narrow elements are first zero-extended to 64 bits. Inactive elements in the source vector are treated as zero.UADDVDd, Pg, Zn.TFNEGfnegFNEGFloating-point negate (vector)FNEGVd.T, Vn.TFNEGVd.T, Vn.T FNEGHd, Hn FNEGSd, Sn FNEGDd, DnFNEGZd.T, Pg/M, Zn.TST3Qst3qST3Q6Contiguous store three-quadword structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,8ST3Q{ Zt1.Q, Zt2.Q, Zt3.Q }, Pg, [Xn|SP{, #imm, MUL VL}]4ST3Q{ Zt1.Q, Zt2.Q, Zt3.Q }, Pg, [Xn|SP, Xm, LSL #4]SADDVsaddvSADDVSigned add horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register. Narrow elements are first sign-extended to 64 bits. 
Inactive elements in the source vector are treated as zero.SADDVDd, Pg, Zn.TCPPcppCPP CPP -- A640Cache prefetch prediction restriction by context CPP RCTX, XtSYS #3, C7, C3, #7, XtLDNPldnpLDNP:Load pair of SIMD&FP registers, with non-temporal hintLDNPSt1, St2, [Xn|SP{, #imm}]LDNPDt1, Dt2, [Xn|SP{, #imm}]LDNPQt1, Qt2, [Xn|SP{, #imm}]LDNPWt1, Wt2, [Xn|SP{, #imm}]LDNPXt1, Xt2, [Xn|SP{, #imm}]FCVTAUfcvtauFCVTAUZFloating-point convert to unsigned integer, rounding to nearest with ties to away (vector) FCVTAUHd, Hn FCVTAUVd, VnFCVTAUVd.T, Vn.TFCVTAUVd.T, Vn.T FCVTAUWd, Hn FCVTAUXd, Hn FCVTAUWd, Sn FCVTAUXd, Sn FCVTAUWd, Dn FCVTAUXd, DnCNTBcntbCNTBDetermines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then places the result in the scalar destination.CNTBXd{, pattern{, MUL #imm}}CNTDXd{, pattern{, MUL #imm}}CNTHXd{, pattern{, MUL #imm}}CNTWXd{, pattern{, MUL #imm}}FCVTXfcvtxFCVTXCConvert active double-precision floating-point elements from the source vector to single-precision, rounding to Odd, and place the results in the even-numbered 32-bit elements of the destination vector, while setting the odd-numbered elements to zero. 
Inactive elements in the destination vector register remain unmodified.FCVTXZd.S, Pg/M, Zn.DSQCVTUNsqcvtunSQCVTUNSaturate the signed integer value in each element of the group of two source vectors to unsigned integer value that is half the original source element width, and place the two-way interleaved results in the half-width destination elements.SQCVTUNZd.H, { Zn1.S-Zn2.S }SQCVTUNZd.T, { Zn1.Tb-Zn4.Tb }CHKFEATchkfeatCHKFEATCheck feature status CHKFEAT X16LD1RQBld1rqbLD1RQBLoad sixteen contiguous bytes to elements of a short, 128-bit (quadword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base address.%LD1RQB{ Zt.B }, Pg/Z, [Xn|SP{, #imm}]!LD1RQB{ Zt.B }, Pg/Z, [Xn|SP, Xm]EOReorEORBitwise exclusive-OR (vector) EORVd.T, Vn.T, Vm.TEORWd|WSP, Wn, #immEORXd|SP, Xn, #immEORWd, Wn, Wm{, shift #amount}EORXd, Xn, Xm{, shift #amount}EORPd.B, Pg/Z, Pn.B, Pm.BEORZdn.T, Pg/M, Zdn.T, Zm.TEORZdn.T, Zdn.T, #constEORZd.D, Zn.D, Zm.DSQINCWsqincwSQINCWkDetermines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. 
A 32-bit saturated result is then sign-extended to 64 bits.%SQINCWXdn, Wdn{, pattern{, MUL #imm}} SQINCWXdn{, pattern{, MUL #imm}}"SQINCWZdn.S{, pattern{, MUL #imm}}FRINTPfrintpFRINTP?Floating-point round to integral, toward plus infinity (vector)FRINTPVd.T, Vn.TFRINTPVd.T, Vn.T FRINTPHd, Hn FRINTPSd, Sn FRINTPDd, Dn&FRINTP{ Zd1.S-Zd2.S }, { Zn1.S-Zn2.S }&FRINTP{ Zd1.S-Zd4.S }, { Zn1.S-Zn4.S }RMIFrmifRMIFRotate, mask insert flagsRMIFXn, #shift, #maskSETGPTsetgptSETGPT)Memory set with tag setting, unprivilegedSETGPT [Xd]!, Xn!, XsSETGMT [Xd]!, Xn!, XsSETGET [Xd]!, Xn!, XsFRINTMfrintmFRINTM@Floating-point round to integral, toward minus infinity (vector)FRINTMVd.T, Vn.TFRINTMVd.T, Vn.T FRINTMHd, Hn FRINTMSd, Sn FRINTMDd, Dn&FRINTM{ Zd1.S-Zd2.S }, { Zn1.S-Zn2.S }&FRINTM{ Zd1.S-Zd4.S }, { Zn1.S-Zn4.S }LDSMAXBldsmaxbLDSMAXB7Atomic signed maximum on byte in memory, without return+STSMAXBWs, [Xn|SP]LDSMAXB Ws, WZR, [Xn|SP]-STSMAXLBWs, [Xn|SP]LDSMAXLB Ws, WZR, [Xn|SP]MRSmrsMRS0Move System register to general-purpose register%MRSXt, (systemreg|Sop0_op1_Cn_Cm_op2)INSinsINS1Insert vector element from another vector elementINSVd.Ts[index1], Vn.Ts[index2]INSVd.Ts[index], RnUBFMubfmUBFMUnsigned bitfield moveUBFMWd, Wn, #immr, #immsUBFMXd, Xn, #immr, #immsCSETcsetCSET CSET -- A64Conditional setCSET Wd, invcondCSINC Wd, WZR, WZR, condCSET Xd, invcondCSINC Xd, XZR, XZR, condSLIsliSLI!Shift left and insert (immediate)SLI Dd, Dn, #shiftSLIVd.T, Vn.T, #shiftSLIZd.T, Zn.T, #constADDQVaddqvADDQVUnsigned addition of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. 
Inactive elements in the source vector are treated as zero.ADDQVVd.T, Pg, Zn.TbSMNEGLsmneglSMNEGL SMNEGL -- A64Signed multiply-negate longSMNEGL Xd, Wn, WmSMSUBL Xd, Wn, Wm, XZRLD4Wld4wLD4W/Contiguous load four-word structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,ALD4W{ Zt1.S, Zt2.S, Zt3.S, Zt4.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]=LD4W{ Zt1.S, Zt2.S, Zt3.S, Zt4.S }, Pg/Z, [Xn|SP, Xm, LSL #2]FRECPSfrecpsFRECPSFloating-point reciprocal stepFRECPSHd, Hn, HmFRECPSVd, Vn, VmFRECPSVd.T, Vn.T, Vm.TFRECPSVd.T, Vn.T, Vm.TFRECPSZd.T, Zn.T, Zm.TUMINQVuminqvUMINQV)Unsigned minimum of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as the maximum unsigned integer for the element size.UMINQVVd.T, Pg, Zn.TbBFCLAMPbfclampBFCLAMPjClamp each BFloat16 element in the two or four destination vectors to between the BFloat16 minimum value in the corresponding element of the first source vector and the BFloat16 maximum value in the corresponding element of the second source vector and destructively place the clamped results in the corresponding elements of the two or four destination vectors."BFCLAMP{ Zd1.H-Zd2.H }, Zn.H, Zm.H"BFCLAMP{ Zd1.H-Zd4.H }, Zn.H, Zm.HBFCLAMPZd.H, Zn.H, Zm.HSADDLPsaddlpSADDLPSigned add long pairwiseSADDLPVd.Ta, Vn.Tb AUTIBSPPCR autibsppcr AUTIBSPPCR9Authenticate return address using key B, using a register AUTIBSPPCRXnDCPS2dcps2DCPS2Debug change PE state to EL2 DCPS2 {#imm}PACMpacmPACMPointer authentication modifierPACMCTERMEQctermeqCTERMEQDetect termination conditions in serialized vector loops. 
Tests whether the comparison between the scalar source operands holds true and if not tests the state of the  CTERMEQRn, Rm CTERMNERn, RmSVDOTsvdotSVDOTThe signed integer vertical dot product instruction computes the vertical dot product of the corresponding two signed 16-bit integer values held in the two first source vectors and two signed 16-bit integer values in the corresponding indexed 32-bit element of the second source vector. The widened dot product results are destructively added to the corresponding 32-bit element of the ZA single-vector groups.<SVDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]<SVDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]<SVDOT ZA.D[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]BFVDOTbfvdotBFVDOTThe instruction computes the sum-of-products of each vertical pair of BFloat16 values in the corresponding elements of the two first source vectors with the pair of BFloat16 values in the indexed 32-bit group of the corresponding 128-bit segment of the second source vector. The single-precision sum-of-products are destructively added to the corresponding single-precision elements of the two ZA single-vector groups.<BFVDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]FCMLEfcmleFCMLE:Floating-point compare less than or equal to zero (vector)FCMLEHd, Hn, #0.0FCMLEVd, Vn, #0.0FCMLEVd.T, Vn.T, #0.0FCMLEVd.T, Vn.T, #0.0FCMLE (vectors)hCompare active floating-point elements in the first source vector being less than or equal to corresponding elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Does not set the condition flags.FCMLE Pd.T, Pg/Z, Zm.T, Zn.TFCMGE Pd.T, Pg/Z, Zn.T, Zm.TDUPdupDUP,Duplicate vector element to vector or scalarDUPVd, Vn.T[index]DUPVd.T, Vn.Ts[index] DUPVd.T, RnDUPZd.T, #imm{, shift}DUPZd.T, Rn|SPDUPZd.T, Zn.T[imm] SQRSHRUNB sqrshrunb SQRSHRUNBAShift each signed integer value in the source vector elements right by an immediate value, and place the rounded results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2SQRSHRUNBZd.T, Zn.Tb, #constPRFWprfwPRFWGather prefetch of words from the active memory addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive addresses are not prefetched from memory.PRFWprfop, Pg, [Zn.S{, #imm}]PRFWprfop, Pg, [Zn.D{, #imm}]&PRFWprfop, Pg, [Xn|SP{, #imm, MUL VL}]"PRFWprfop, Pg, [Xn|SP, Xm, LSL #2]$PRFWprfop, Pg, [Xn|SP, Zm.S, mod #2]$PRFWprfop, Pg, [Xn|SP, Zm.D, mod #2]$PRFWprfop, Pg, [Xn|SP, Zm.D, LSL #2]SQXTUNBsqxtunbSQXTUNBSaturate the signed integer value in each source element to an unsigned integer value that is half the original source element width, and place the results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.SQXTUNBZd.T, Zn.TbLD2Dld2dLD2D3Contiguous load two-doubleword structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,3LD2D{ Zt1.D, Zt2.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]/LD2D{ Zt1.D, Zt2.D }, Pg/Z, [Xn|SP, Xm, LSL #3]LD4Hld4hLD4H3Contiguous load four-halfword structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate 
index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,ALD4H{ Zt1.H, Zt2.H, Zt3.H, Zt4.H }, Pg/Z, [Xn|SP{, #imm, MUL VL}]=LD4H{ Zt1.H, Zt2.H, Zt3.H, Zt4.H }, Pg/Z, [Xn|SP, Xm, LSL #1] AUTIA171615 autia171615 AUTIA171615-Authenticate instruction address, using key A AUTIA171615LDAPURBldapurbLDAPURB*Load-acquire RCpc register byte (unscaled)LDAPURBWt, [Xn|SP{, #simm}]LDRSHldrshLDRSH)Load register signed halfword (immediate)LDRSHWt, [Xn|SP], #simmLDRSHXt, [Xn|SP], #simmLDRSHWt, [Xn|SP, #simm]!LDRSHXt, [Xn|SP, #simm]!LDRSHWt, [Xn|SP{, #pimm}]LDRSHXt, [Xn|SP{, #pimm}],LDRSHWt, [Xn|SP, (Wm|Xm){, extend {amount}}],LDRSHXt, [Xn|SP, (Wm|Xm){, extend {amount}}]UQDECWuqdecwUQDECW*Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range. UQDECWWdn{, pattern{, MUL #imm}} UQDECWXdn{, pattern{, MUL #imm}}"UQDECWZdn.S{, pattern{, MUL #imm}}MADPTmadptMADPTMultiply with overflow check the elements of the first and second source vectors and add with pointer check to elements of the third (addend) vector. Destructively place the results in the destination and first source (multiplicand) vector.MADPTZdn.D, Zm.D, Za.DSQDECHsqdechSQDECHkDetermines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range. 
A 32-bit saturated result is then sign-extended to 64 bits.%SQDECHXdn, Wdn{, pattern{, MUL #imm}} SQDECHXdn{, pattern{, MUL #imm}}"SQDECHZdn.H{, pattern{, MUL #imm}}DGHdghDGHData gathering hintDGHWHILELTwhileltWHILELTGenerate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, signed scalar operand is less than the second scalar operand and false thereafter up to the highest numbered element.WHILELTPd.T, Rn, RmWHILELTPNd.T, Xn, Xm, vlWHILELT{ Pd1.T, Pd2.T }, Xn, XmSMSTOPsmstopSMSTOP SMSTOP -- A64ADisables access to Streaming SVE mode and SME architectural stateSMSTOP {option}MSR pstatefield, #0RCWCASrcwcasRCWCAS6Read check write compare and swap doubleword in memoryRCWCASXs, Xt, [Xn|SP]RCWCASAXs, Xt, [Xn|SP]RCWCASALXs, Xt, [Xn|SP]RCWCASLXs, Xt, [Xn|SP]LDEORldeorLDEOR3Atomic exclusive-OR on word or doubleword in memory LDEORWs, Wt, [Xn|SP]LDEORAWs, Wt, [Xn|SP]LDEORALWs, Wt, [Xn|SP]LDEORLWs, Wt, [Xn|SP]LDEORXs, Xt, [Xn|SP]LDEORAXs, Xt, [Xn|SP]LDEORALXs, Xt, [Xn|SP]LDEORLXs, Xt, [Xn|SP]'STEORWs, [Xn|SP]LDEOR Ws, WZR, [Xn|SP])STEORLWs, [Xn|SP]LDEORL Ws, WZR, [Xn|SP]'STEORXs, [Xn|SP]LDEOR Xs, XZR, [Xn|SP])STEORLXs, [Xn|SP]LDEORL Xs, XZR, [Xn|SP]BFMLSLTbfmlsltBFMLSLTThis BFloat16 floating-point multiply-subtract long instruction widens the odd-numbered BFloat16 elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and subtracts these values without intermediate rounding from the single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the source vectors. 
This instruction is unpredicated.BFMLSLTZda.S, Zn.H, Zm.HBFMLSLTZda.S, Zn.H, Zm.H[imm]SUMLALLsumlallSUMLALLaThis signed by unsigned integer multiply-add long-long instruction multiplies each signed 8-bit element in the one, two, or four first source vectors with each unsigned 8-bit indexed element of the second source vector, widens each product to 32-bits and destructively adds these values to the corresponding 32-bit elements of the ZA quad-vector groups.0SUMLALL ZA.S[Wv, offs1:offs4], Zn.B, Zm.B[index]CSUMLALL ZA.S[Wv, offs1:offs4{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]CSUMLALL ZA.S[Wv, offs1:offs4{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]<SUMLALL ZA.S[Wv, offs1:offs4{, VGx2}], { Zn1.B-Zn2.B }, Zm.B<SUMLALL ZA.S[Wv, offs1:offs4{, VGx4}], { Zn1.B-Zn4.B }, Zm.BNORnorNOR1Bitwise NOR active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Does not set the condition flags.NORPd.B, Pg/Z, Pn.B, Pm.BPMULpmulPMULPolynomial multiplyPMULVd.T, Vn.T, Vm.TPMULZd.B, Zn.B, Zm.BTCANCELtcancelTCANCELCancel current transaction TCANCEL #immMOVAmovaMOVAThe instruction operates on two consecutive horizontal or vertical slices within a named ZA tile of the specified element size.-MOVA{ Zd1.B-Zd2.B }, ZA0HV.B[Ws, offs1:offs2]-MOVA{ Zd1.H-Zd2.H }, ZAnHV.H[Ws, offs1:offs2]-MOVA{ Zd1.S-Zd2.S }, ZAnHV.S[Ws, offs1:offs2]-MOVA{ Zd1.D-Zd2.D }, ZAnHV.D[Ws, offs1:offs2]-MOVA{ Zd1.B-Zd4.B }, ZA0HV.B[Ws, offs1:offs4]-MOVA{ Zd1.H-Zd4.H }, ZAnHV.H[Ws, offs1:offs4]-MOVA{ Zd1.S-Zd4.S }, ZAnHV.S[Ws, offs1:offs4]-MOVA{ Zd1.D-Zd4.D }, ZAnHV.D[Ws, offs1:offs4]+MOVA{ Zd1.D-Zd2.D }, ZA.D[Wv, offs{, VGx2}]+MOVA{ Zd1.D-Zd4.D }, ZA.D[Wv, offs{, VGx4}]!MOVAZd.B, Pg/M, ZA0HV.B[Ws, offs]!MOVAZd.H, Pg/M, ZAnHV.H[Ws, offs]!MOVAZd.S, Pg/M, ZAnHV.S[Ws, offs]!MOVAZd.D, Pg/M, ZAnHV.D[Ws, offs]!MOVAZd.Q, Pg/M, ZAnHV.Q[Ws, offs]1MOVA ZA0HV.B[Ws, offs1:offs2], { Zn1.B-Zn2.B }-MOVAZAdHV.H[Ws, offs1:offs2], { Zn1.H-Zn2.H }-MOVAZAdHV.S[Ws, offs1:offs2], { Zn1.S-Zn2.S }-MOVAZAdHV.D[Ws, offs1:offs2], { Zn1.D-Zn2.D }1MOVA ZA0HV.B[Ws, offs1:offs4], { Zn1.B-Zn4.B }-MOVAZAdHV.H[Ws, offs1:offs4], { Zn1.H-Zn4.H }-MOVAZAdHV.S[Ws, offs1:offs4], { Zn1.S-Zn4.S }-MOVAZAdHV.D[Ws, offs1:offs4], { Zn1.D-Zn4.D }/MOVA ZA.D[Wv, offs{, VGx2}], { Zn1.D-Zn2.D }/MOVA ZA.D[Wv, offs{, VGx4}], { Zn1.D-Zn4.D }%MOVA ZA0HV.B[Ws, offs], Pg/M, Zn.B!MOVAZAdHV.H[Ws, offs], Pg/M, Zn.H!MOVAZAdHV.S[Ws, offs], Pg/M, Zn.S!MOVAZAdHV.D[Ws, offs], Pg/M, Zn.D!MOVAZAdHV.Q[Ws, offs], Pg/M, Zn.QBICbicBIC%Bitwise bit clear (vector, immediate)BICVd.T, #imm8{, LSL #amount}BICVd.T, #imm8{, LSL #amount}BICVd.T, Vn.T, Vm.TBICWd, Wn, Wm{, shift #amount}BICXd, Xn, Xm{, shift #amount}BICPd.B, Pg/Z, Pn.B, Pm.BBICZdn.T, Pg/M, Zdn.T, Zm.TBICZd.D, Zn.D, Zm.DBIC (immediate)CBitwise clear bits using immediate with each 64-bit element of the source vector, and destructively place the results in the corresponding 
elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.BIC Zdn.T, Zdn.T, #constAND Zdn.T, Zdn.T, #(-const - 1)GCSPOPXgcspopxGCSPOPXGCSPOPX -- A641Guarded Control Stack pop exception return recordGCSPOPXSYS #0, C7, C7, #6{, Xt}SXTWsxtwSXTW SXTW -- A64Sign extend word SXTW Xd, WnSBFM Xd, Xn, #0, #31LDPSWldpswLDPSW"Load pair of registers signed wordLDPSWXt1, Xt2, [Xn|SP], #immLDPSWXt1, Xt2, [Xn|SP, #imm]!LDPSWXt1, Xt2, [Xn|SP{, #imm}]COSPcospCOSP COSP -- A649Clear other speculative prediction restriction by contextCOSP RCTX, XtSYS #3, C7, C3, #6, XtSTRHstrhSTRH#Store register halfword (immediate)STRHWt, [Xn|SP], #simmSTRHWt, [Xn|SP, #simm]!STRHWt, [Xn|SP{, #pimm}]+STRHWt, [Xn|SP, (Wm|Xm){, extend {amount}}]FMSBfmsbFMSBZMultiply the corresponding active floating-point elements of the first and second source vectors and subtract from elements of the third (addend) vector without intermediate rounding. Destructively place the results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.FMSBZdn.T, Pg/M, Zm.T, Za.TLD3Rld3rLD3RMLoad single 3-element structure and replicate to all lanes of three registers$LD3R {Vt.T, Vt2.T, Vt3.T }, [Xn|SP])LD3R {Vt.T, Vt2.T, Vt3.T }, [Xn|SP], imm(LD3R {Vt.T, Vt2.T, Vt3.T }, [Xn|SP], XmSQRSHRNTsqrshrntSQRSHRNT6Shift each signed integer value in the source vector elements right by an immediate value, and place the rounded results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. 
Each result element is saturated to the half-width N-bit element's signed integer range -2SQRSHRNTZd.T, Zn.Tb, #constORRSorrsORRS"Bitwise inclusive OR active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the ORRSPd.B, Pg/Z, Pn.B, Pm.BSQDMLALBsqdmlalbSQDMLALBMultiply then double the corresponding even-numbered signed elements of the first and second source vectors. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2SQDMLALBZda.T, Zn.Tb, Zm.TbSQDMLALBZda.S, Zn.H, Zm.H[imm]SQDMLALBZda.D, Zn.S, Zm.S[imm]ADDSaddsADDS&Add (extended register), setting flags&ADDSWd, Wn|WSP, Wm{, extend {#amount}}%ADDSXd, Xn|SP, Rm{, extend {#amount}}ADDSWd, Wn|WSP, #imm{, shift}ADDSXd, Xn|SP, #imm{, shift}ADDSWd, Wn, Wm{, shift #amount}ADDSXd, Xn, Xm{, shift #amount}LDUMAXHldumaxhLDUMAXH=Atomic unsigned maximum on halfword in memory, without return+STUMAXHWs, [Xn|SP]LDUMAXH Ws, WZR, [Xn|SP]-STUMAXLHWs, [Xn|SP]LDUMAXLH Ws, WZR, [Xn|SP]LD4ld4LD44Load multiple 4-element structures to four registers*LD4 {Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP]/LD4 {Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP], imm.LD4 {Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP], Xm1LD4 {Vt.B, Vt2.B, Vt3.B, Vt4.B }[index], [Xn|SP]1LD4 {Vt.H, Vt2.H, Vt3.H, Vt4.H }[index], [Xn|SP]1LD4 {Vt.S, Vt2.S, Vt3.S, Vt4.S }[index], [Xn|SP]1LD4 {Vt.D, Vt2.D, Vt3.D, Vt4.D }[index], [Xn|SP]5LD4 {Vt.B, Vt2.B, Vt3.B, Vt4.B }[index], [Xn|SP], #45LD4 {Vt.B, Vt2.B, Vt3.B, Vt4.B }[index], [Xn|SP], Xm5LD4 {Vt.H, Vt2.H, Vt3.H, Vt4.H }[index], [Xn|SP], #85LD4 {Vt.H, Vt2.H, Vt3.H, Vt4.H }[index], [Xn|SP], Xm6LD4 {Vt.S, Vt2.S, Vt3.S, Vt4.S }[index], [Xn|SP], #165LD4 {Vt.S, Vt2.S, Vt3.S, Vt4.S }[index], [Xn|SP], Xm6LD4 {Vt.D, Vt2.D, Vt3.D, Vt4.D }[index], [Xn|SP], #325LD4 {Vt.D, Vt2.D, Vt3.D, Vt4.D }[index], [Xn|SP], 
XmUMLALLumlallUMLALL~This unsigned integer multiply-add long-long instruction multiplies each unsigned 8-bit or 16-bit element in the one, two, or four first source vectors with each unsigned 8-bit or 16-bit indexed element of second source vector, widens each product to 32-bits or 64-bits and destructively adds these values to the corresponding 32-bit or 64-bit elements of the ZA quad-vector groups. 0UMLALL ZA.S[Wv, offs1:offs4], Zn.B, Zm.B[index]0UMLALL ZA.D[Wv, offs1:offs4], Zn.H, Zm.H[index]CUMLALL ZA.S[Wv, offs1:offs4{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]CUMLALL ZA.D[Wv, offs1:offs4{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]CUMLALL ZA.S[Wv, offs1:offs4{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]CUMLALL ZA.D[Wv, offs1:offs4{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]+UMLALL ZA.T[Wv, offs1:offs4], Zn.Tb, Zm.Tb?UMLALL ZA.T[Wv, offs1:offs4{, VGx2}], { Zn1.Tb-Zn2.Tb }, Zm.Tb?UMLALL ZA.T[Wv, offs1:offs4{, VGx4}], { Zn1.Tb-Zn4.Tb }, Zm.TbKUMLALL ZA.T[Wv, offs1:offs4{, VGx2}], { Zn1.Tb-Zn2.Tb }, { Zm1.Tb-Zm2.Tb }KUMLALL ZA.T[Wv, offs1:offs4{, VGx4}], { Zn1.Tb-Zn4.Tb }, { Zm1.Tb-Zm4.Tb }CPYPWTcpypwtCPYPWT Memory copy, writes unprivilegedCPYPWT [Xd]!, [Xs]!, Xn!CPYMWT [Xd]!, [Xs]!, Xn!CPYEWT [Xd]!, [Xs]!, Xn!URSHLRurshlrURSHLRShift active unsigned elements of the second source vector by corresponding elements of the first source vector and destructively place the rounded results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. 
Inactive elements in the destination vector register remain unmodified.URSHLRZdn.T, Pg/M, Zdn.T, Zm.TUSMLALLusmlallUSMLALLaThis unsigned by signed integer multiply-add long-long instruction multiplies each unsigned 8-bit element in the one, two, or four first source vectors with each signed 8-bit indexed element of the second source vector, widens each product to 32-bits and destructively adds these values to the corresponding 32-bit elements of the ZA quad-vector groups.0USMLALL ZA.S[Wv, offs1:offs4], Zn.B, Zm.B[index]CUSMLALL ZA.S[Wv, offs1:offs4{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]CUSMLALL ZA.S[Wv, offs1:offs4{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index])USMLALL ZA.S[Wv, offs1:offs4], Zn.B, Zm.B<USMLALL ZA.S[Wv, offs1:offs4{, VGx2}], { Zn1.B-Zn2.B }, Zm.B<USMLALL ZA.S[Wv, offs1:offs4{, VGx4}], { Zn1.B-Zn4.B }, Zm.BGUSMLALL ZA.S[Wv, offs1:offs4{, VGx2}], { Zn1.B-Zn2.B }, { Zm1.B-Zm2.B }GUSMLALL ZA.S[Wv, offs1:offs4{, VGx4}], { Zn1.B-Zn4.B }, { Zm1.B-Zm4.B }ST2st2ST26Store multiple 2-element structures from two registersST2 {Vt.T, Vt2.T }, [Xn|SP]!ST2 {Vt.T, Vt2.T }, [Xn|SP], imm ST2 {Vt.T, Vt2.T }, [Xn|SP], Xm#ST2 {Vt.B, Vt2.B }[index], [Xn|SP]#ST2 {Vt.H, Vt2.H }[index], [Xn|SP]#ST2 {Vt.S, Vt2.S }[index], [Xn|SP]#ST2 {Vt.D, Vt2.D }[index], [Xn|SP]'ST2 {Vt.B, Vt2.B }[index], [Xn|SP], #2'ST2 {Vt.B, Vt2.B }[index], [Xn|SP], Xm'ST2 {Vt.H, Vt2.H }[index], [Xn|SP], #4'ST2 {Vt.H, Vt2.H }[index], [Xn|SP], Xm'ST2 {Vt.S, Vt2.S }[index], [Xn|SP], #8'ST2 {Vt.S, Vt2.S }[index], [Xn|SP], Xm(ST2 {Vt.D, Vt2.D }[index], [Xn|SP], #16'ST2 {Vt.D, Vt2.D }[index], [Xn|SP], XmINCDincdINCDDetermines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. 
INCDZdn.D{, pattern{, MUL #imm}} INCHZdn.H{, pattern{, MUL #imm}} INCWZdn.S{, pattern{, MUL #imm}}SYSsysSYSSystem instructionSYS #op1, Cn, Cm, #op2{, Xt}CTZctzCTZCount trailing zeros CTZWd, Wn CTZXd, XnSTZGstzgSTZGStore Allocation Tag, zeroingSTZGXt|SP, [Xn|SP], #simmSTZGXt|SP, [Xn|SP, #simm]!STZGXt|SP, [Xn|SP{, #simm}]ERETAAeretaaERETAA-Exception return, with pointer authenticationERETAAERETABLDRSBldrsbLDRSB%Load register signed byte (immediate) LDRSBWt, [Xn|SP], #simmLDRSBXt, [Xn|SP], #simmLDRSBWt, [Xn|SP, #simm]!LDRSBXt, [Xn|SP, #simm]!LDRSBWt, [Xn|SP{, #pimm}]LDRSBXt, [Xn|SP{, #pimm}]*LDRSBWt, [Xn|SP, (Wm|Xm), extend {amount}]"LDRSBWt, [Xn|SP, Xm{, LSL amount}]*LDRSBXt, [Xn|SP, (Wm|Xm), extend {amount}]"LDRSBXt, [Xn|SP, Xm{, LSL amount}]FCVTfcvtFCVT)Floating-point convert precision (scalar) FCVTSd, Hn FCVTDd, Hn FCVTHd, Sn FCVTDd, Sn FCVTHd, Dn FCVTSd, DnFCVT{ Zd1.S-Zd2.S }, Zn.HFCVTZd.B, { Zn1.H-Zn2.H }FCVTZd.B, { Zn1.S-Zn4.S }FCVTZd.H, { Zn1.S-Zn2.S }FCVTZd.S, Pg/M, Zn.HFCVTZd.D, Pg/M, Zn.HFCVTZd.H, Pg/M, Zn.SFCVTZd.D, Pg/M, Zn.SFCVTZd.H, Pg/M, Zn.DFCVTZd.S, Pg/M, Zn.DST64BVst64bvST64BV3Single-copy atomic 64-byte store with status resultST64BVXs, Xt, [Xn|SP]RETAAretaaRETAA3Return from subroutine, with pointer authenticationRETAARETABFCCMPEfccmpeFCCMPE5Floating-point conditional signaling compare (scalar)FCCMPEHn, Hm, #nzcv, condFCCMPESn, Sm, #nzcv, condFCCMPEDn, Dm, #nzcv, condFCVTPUfcvtpuFCVTPURFloating-point convert to unsigned integer, rounding toward plus infinity (vector) FCVTPUHd, Hn FCVTPUVd, VnFCVTPUVd.T, Vn.TFCVTPUVd.T, Vn.T FCVTPUWd, Hn FCVTPUXd, Hn FCVTPUWd, Sn FCVTPUXd, Sn FCVTPUWd, Dn FCVTPUXd, DnSMINVsminvSMINVSigned minimum across vector SMINVVd, Vn.TSMINVVd, Pg, Zn.TSBCsbcSBCSubtract with carry SBCWd, Wn, Wm SBCXd, Xn, XmHISTSEGhistsegHISTSEG*This instruction compares each 8-bit byte element of the first source vector with all of the elements in the corresponding 128-bit segment of the second source vector and places the count of matching 
elements in the corresponding element of the destination vector. This instruction is unpredicated.HISTSEGZd.B, Zn.B, Zm.BBCbcBCBranch consistent conditionallyBC.cond labelLD2Rld2rLD2RKLoad single 2-element structure and replicate to all lanes of two registersLD2R {Vt.T, Vt2.T }, [Xn|SP]"LD2R {Vt.T, Vt2.T }, [Xn|SP], imm!LD2R {Vt.T, Vt2.T }, [Xn|SP], XmLD1RQHld1rqhLD1RQHLoad eight contiguous halfwords to elements of a short, 128-bit (quadword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base address.%LD1RQH{ Zt.H }, Pg/Z, [Xn|SP{, #imm}])LD1RQH{ Zt.H }, Pg/Z, [Xn|SP, Xm, LSL #1]FEXPAfexpaFEXPAThe FEXPAZd.T, Zn.TADDVaddvADDVAdd across vector ADDVVd, Vn.TNOPnopNOP No operationNOPLD3Dld3dLD3D7Contiguous load three-doubleword structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,:LD3D{ Zt1.D, Zt2.D, Zt3.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]6LD3D{ Zt1.D, Zt2.D, Zt3.D }, Pg/Z, [Xn|SP, Xm, LSL #3] PACIA171615 pacia171615 PACIA171615@Pointer Authentication Code for instruction address, using key A PACIA171615ST2Qst2qST2Q2Contiguous store two-quadword structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,1ST2Q{ Zt1.Q, Zt2.Q }, Pg, [Xn|SP{, #imm, MUL VL}]-ST2Q{ Zt1.Q, Zt2.Q }, Pg, [Xn|SP, Xm, LSL #4]UABDuabdUABD%Unsigned absolute difference (vector)UABDVd.T, Vn.T, Vm.TUABDZdn.T, Pg/M, Zdn.T, Zm.TST2Hst2hST2H2Contiguous store two-halfword structures, each from the same element number in two vector registers to the memory address 
generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,1ST2H{ Zt1.H, Zt2.H }, Pg, [Xn|SP{, #imm, MUL VL}]-ST2H{ Zt1.H, Zt2.H }, Pg, [Xn|SP, Xm, LSL #1]CMPLScmplsCMPLSCMPLS (vectors)[Compare active unsigned integer elements in the first source vector being lower than or same as corresponding unsigned elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the CMPLS Pd.T, Pg/Z, Zm.T, Zn.TCMPHS Pd.T, Pg/Z, Zn.T, Zm.TBRKPBSbrkpbsBRKPBSeIf the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise sets destination predicate elements up to but not including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the BRKPBSPd.B, Pg/Z, Pn.B, Pm.BANDandANDBitwise AND (vector) ANDVd.T, Vn.T, Vm.TANDWd|WSP, Wn, #immANDXd|SP, Xn, #immANDWd, Wn, Wm{, shift #amount}ANDXd, Xn, Xm{, shift #amount}ANDPd.B, Pg/Z, Pn.B, Pm.BANDZdn.T, Pg/M, Zdn.T, Zm.TANDZdn.T, Zdn.T, #constANDZd.D, Zn.D, Zm.DFMAXQVfmaxqvFMAXQV,Floating-point maximum of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as -Infinity.FMAXQVVd.T, Pg, Zn.TbFNMLSfnmlsFNMLScMultiply the corresponding active floating-point elements of the first and second source vectors and subtract from elements of the third source (addend) vector without intermediate rounding. Destructively place the negated results in the destination and third source (addend) vector. 
Inactive elements in the destination vector register remain unmodified.FNMLSZda.T, Pg/M, Zn.T, Zm.TFACGEfacgeFACGE>Floating-point absolute compare greater than or equal (vector)FACGEHd, Hn, HmFACGEVd, Vn, VmFACGEVd.T, Vn.T, Vm.TFACGEVd.T, Vn.T, Vm.TLDURSBldursbLDURSB$Load register signed byte (unscaled)LDURSBWt, [Xn|SP{, #simm}]LDURSBXt, [Xn|SP{, #simm}]FNMSUBfnmsubFNMSUB7Floating-point negated fused multiply-subtract (scalar)FNMSUBHd, Hn, Hm, HaFNMSUBSd, Sn, Sm, SaFNMSUBDd, Dn, Dm, DaPNEXTpnextPNEXT]An instruction used to construct a loop which iterates over all true elements in the vector select predicate register. If all elements in the first source predicate register are false it determines the first true element in the vector select predicate register, otherwise it determines the next true element in the vector select predicate register that follows the last true element in the first source predicate register. All elements of the destination predicate register are set to false, except the element corresponding to the determined vector select element, if any, which is set to true. Sets the PNEXTPdn.T, Pv, Pdn.TSM4Esm4eSM4E SM4 encodeSM4EVd.4S, Vn.4SSM4EZdn.S, Zdn.S, Zm.SFCVTNSfcvtnsFCVTNSXFloating-point convert to signed integer, rounding to nearest with ties to even (vector) FCVTNSHd, Hn FCVTNSVd, VnFCVTNSVd.T, Vn.TFCVTNSVd.T, Vn.T FCVTNSWd, Hn FCVTNSXd, Hn FCVTNSWd, Sn FCVTNSXd, Sn FCVTNSWd, Dn FCVTNSXd, DnPACIApaciaPACIA@Pointer Authentication Code for instruction address, using key APACIAXd, Xn|SPPACIZAXd PACIA1716PACIASPPACIAZSDIVsdivSDIV Signed divideSDIVWd, Wn, WmSDIVXd, Xn, XmSDIVZdn.T, Pg/M, Zdn.T, Zm.THINThintHINTHint instruction HINT #immCLREXclrexCLREXClear exclusive CLREX {#imm}SQSUBRsqsubrSQSUBR'Subtract active signed elements of the first source vector from corresponding signed elements of the second source vector and destructively place the results in the corresponding elements of the first source vector. 
Each result element is saturated to the N-bit element's signed integer range -2SQSUBRZdn.T, Pg/M, Zdn.T, Zm.TSUBPTsubptSUBPTSubtract checked pointer$SUBPTXd|SP, Xn|SP, Xm{, LSL #amount}SUBPTZdn.D, Pg/M, Zdn.D, Zm.DSUBPTZd.D, Zn.D, Zm.DUQSHRNTuqshrntUQSHRNTAShift each unsigned integer value in the source vector elements right by an immediate value, and place the truncated results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2UQSHRNTZd.T, Zn.Tb, #constUUNPKHIuunpkhiUUNPKHIUnpack elements from the lowest or highest half of the source vector and then zero-extend them to place in elements of twice their size within the destination vector. This instruction is unpredicated.UUNPKHIZd.T, Zn.TbUUNPKLOZd.T, Zn.TbPSSBBpssbbPSSBB PSSBB -- A64)Physical speculative store bypass barrierPSSBBDSB #4DVPdvpDVP DVP -- A64,Data value prediction restriction by context DVP RCTX, XtSYS #3, C7, C3, #5, XtLDTRSHldtrshLDTRSH,Load register signed halfword (unprivileged)LDTRSHWt, [Xn|SP{, #simm}]LDTRSHXt, [Xn|SP{, #simm}]UQSHLRuqshlrUQSHLRShift active unsigned elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. 
Each result element is saturated to the N-bit element's unsigned integer range 0 to (2UQSHLRZdn.T, Pg/M, Zdn.T, Zm.TLDEORABldeorabLDEORAB%Atomic exclusive-OR on byte in memoryLDEORABWs, Wt, [Xn|SP]LDEORALBWs, Wt, [Xn|SP]LDEORBWs, Wt, [Xn|SP]LDEORLBWs, Wt, [Xn|SP]SQSHRUNBsqshrunbSQSHRUNBCShift each signed integer value in the source vector elements right by an immediate value, and place the truncated results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2SQSHRUNBZd.T, Zn.Tb, #constBFMLSLBbfmlslbBFMLSLBThis BFloat16 floating-point multiply-subtract long instruction widens the even-numbered BFloat16 elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and subtracts these values without intermediate rounding from the single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the source vectors. 
This instruction is unpredicated.BFMLSLBZda.S, Zn.H, Zm.HBFMLSLBZda.S, Zn.H, Zm.H[imm]TSTARTtstartTSTARTStart transactionTSTARTXtSETPsetpSETP Memory setSETP [Xd]!, Xn!, XsSETM [Xd]!, Xn!, XsSETE [Xd]!, Xn!, XsCCMPccmpCCMPConditional compare (immediate)CCMPWn, #imm, #nzcv, condCCMPXn, #imm, #nzcv, condCCMPWn, Wm, #nzcv, condCCMPXn, Xm, #nzcv, condUHADDuhaddUHADDUnsigned halving addUHADDVd.T, Vn.T, Vm.TUHADDZdn.T, Pg/M, Zdn.T, Zm.TF1CVTLf1cvtlF1CVTL78-bit floating-point convert to half-precision (vector)F1CVTL{2} Vd.8H, Vn.TaF2CVTL{2} Vd.8H, Vn.TaF1CVTL{ Zd1.H-Zd2.H }, Zn.BF2CVTL{ Zd1.H-Zd2.H }, Zn.BRADDHNTraddhntRADDHNT2Add each vector element of the first source vector to the corresponding vector element of the second source vector, and place the most significant rounded half of the result in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. This instruction is unpredicated.RADDHNTZd.T, Zn.Tb, Zm.TbSM4EKEYsm4ekeySM4EKEYSM4 keySM4EKEYVd.4S, Vn.4S, Vm.4SSM4EKEYZd.S, Zn.S, Zm.SSYSPsyspSYSP128-bit system instruction$SYSP #op1, Cn, Cm, #op2{, Xt1, Xt2}NOTSnotsNOTSNOTSBitwise invert each active element of the source predicate, and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Sets the NOTS Pd.B, Pg/Z, Pn.BEORS Pd.B, Pg/Z, Pn.B, Pg.BLD1ld1LD1MLoad multiple single-element structures to one, two, three, or four registersLD1 {Vt.T }, [Xn|SP]LD1 {Vt.T, Vt2.T }, [Xn|SP]#LD1 {Vt.T, Vt2.T, Vt3.T }, [Xn|SP]*LD1 {Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP]LD1 {Vt.T }, [Xn|SP], immLD1 {Vt.T }, [Xn|SP], Xm!LD1 {Vt.T, Vt2.T }, [Xn|SP], imm LD1 {Vt.T, Vt2.T }, [Xn|SP], Xm(LD1 {Vt.T, Vt2.T, Vt3.T }, [Xn|SP], imm'LD1 {Vt.T, Vt2.T, Vt3.T }, [Xn|SP], Xm/LD1 {Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP], imm.LD1 {Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP], XmLD1 {Vt.B }[index], [Xn|SP]LD1 {Vt.H }[index], [Xn|SP]LD1 {Vt.S }[index], [Xn|SP]LD1 {Vt.D }[index], [Xn|SP] LD1 {Vt.B }[index], [Xn|SP], #1 LD1 {Vt.B }[index], [Xn|SP], Xm LD1 {Vt.D }[index], [Xn|SP], #8 LD1 {Vt.D }[index], [Xn|SP], Xm LD1 {Vt.H }[index], [Xn|SP], #2 LD1 {Vt.H }[index], [Xn|SP], Xm LD1 {Vt.S }[index], [Xn|SP], #4 LD1 {Vt.S }[index], [Xn|SP], XmSMLSLTsmlsltSMLSLTMultiply the corresponding odd-numbered signed elements of the first and second source vectors and destructively subtract from the overlapping double-width elements of the addend vector. 
This instruction is unpredicated.SMLSLTZda.T, Zn.Tb, Zm.TbSMLSLTZda.S, Zn.H, Zm.H[imm]SMLSLTZda.D, Zn.S, Zm.S[imm]ST1Hst1hST1HContiguous store of halfwords from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.1ST1H{ Zt1.H-Zt2.H }, PNg, [Xn|SP{, #imm, MUL VL}]1ST1H{ Zt1.H-Zt4.H }, PNg, [Xn|SP{, #imm, MUL VL}]-ST1H{ Zt1.H-Zt2.H }, PNg, [Xn|SP, Xm, LSL #1]-ST1H{ Zt1.H-Zt4.H }, PNg, [Xn|SP, Xm, LSL #1]2ST1H{ Zt1.H, Zt2.H }, PNg, [Xn|SP{, #imm, MUL VL}]@ST1H{ Zt1.H, Zt2.H, Zt3.H, Zt4.H }, PNg, [Xn|SP{, #imm, MUL VL}].ST1H{ Zt1.H, Zt2.H }, PNg, [Xn|SP, Xm, LSL #1]<ST1H{ Zt1.H, Zt2.H, Zt3.H, Zt4.H }, PNg, [Xn|SP, Xm, LSL #1] ST1H{ Zt.S }, Pg, [Zn.S{, #imm}] ST1H{ Zt.D }, Pg, [Zn.D{, #imm}])ST1H{ Zt.T }, Pg, [Xn|SP{, #imm, MUL VL}]%ST1H{ Zt.T }, Pg, [Xn|SP, Xm, LSL #1]'ST1H{ Zt.S }, Pg, [Xn|SP, Zm.S, mod #1]'ST1H{ Zt.D }, Pg, [Xn|SP, Zm.D, mod #1]$ST1H{ Zt.D }, Pg, [Xn|SP, Zm.D, mod]$ST1H{ Zt.S }, Pg, [Xn|SP, Zm.S, mod]'ST1H{ Zt.D }, Pg, [Xn|SP, Zm.D, LSL #1]ST1H{ Zt.D }, Pg, [Xn|SP, Zm.D]4ST1H{ ZAtHV.H[Ws, offs] }, Pg, [Xn|SP{, Xm, LSL #1}]UMULHumulhUMULHUnsigned multiply highUMULHXd, Xn, XmUMULHZdn.T, Pg/M, Zdn.T, Zm.TUMULHZd.T, Zn.T, Zm.TTLBItlbiTLBI TLBI -- A64TLB invalidate operationTLBI tlbi_op{, Xt}SYS #op1, Cn, Cm, #op2{, Xt}LDRSWldrswLDRSW%Load register signed word (immediate)LDRSWXt, [Xn|SP], #simmLDRSWXt, [Xn|SP, #simm]!LDRSWXt, [Xn|SP{, #pimm}]LDRSWXt, label,LDRSWXt, [Xn|SP, (Wm|Xm){, extend {amount}}]BFCbfcBFC BFC -- A64Bitfield clearBFC Wd, #lsb, #width*BFM Wd, WZR, #(-lsb MOD 32), #(width-1)BFC Xd, #lsb, #width*BFM Xd, XZR, #(-lsb MOD 64), #(width-1)LD3Hld3hLD3H5Contiguous load three-halfword structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in 
the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,:LD3H{ Zt1.H, Zt2.H, Zt3.H }, Pg/Z, [Xn|SP{, #imm, MUL VL}]6LD3H{ Zt1.H, Zt2.H, Zt3.H }, Pg/Z, [Xn|SP, Xm, LSL #1]SCVTFscvtfSCVTF5Signed fixed-point convert to floating-point (vector)SCVTFVd, Vn, #fbitsSCVTFVd.T, Vn.T, #fbits SCVTFHd, Hn SCVTFVd, VnSCVTFVd.T, Vn.TSCVTFVd.T, Vn.TSCVTFHd, Wn, #fbitsSCVTFHd, Xn, #fbitsSCVTFSd, Wn, #fbitsSCVTFSd, Xn, #fbitsSCVTFDd, Wn, #fbitsSCVTFDd, Xn, #fbits SCVTFHd, Wn SCVTFSd, Wn SCVTFDd, Wn SCVTFHd, Xn SCVTFSd, Xn SCVTFDd, Xn%SCVTF{ Zd1.S-Zd2.S }, { Zn1.S-Zn2.S }%SCVTF{ Zd1.S-Zd4.S }, { Zn1.S-Zn4.S }SCVTFZd.H, Pg/M, Zn.HSCVTFZd.H, Pg/M, Zn.SSCVTFZd.S, Pg/M, Zn.SSCVTFZd.D, Pg/M, Zn.SSCVTFZd.H, Pg/M, Zn.DSCVTFZd.S, Pg/M, Zn.DSCVTFZd.D, Pg/M, Zn.DLDAXRBldaxrbLDAXRB$Load-acquire exclusive register byteLDAXRBWt, [Xn|SP{, #0}]SBCLTsbcltSBCLTcSubtract the odd-numbered elements of the first source vector and the inverted 1-bit carry from the least-significant bit of the odd-numbered elements of the second source vector from the even-numbered elements of the destination and accumulator vector. The 1-bit carry output is placed in the corresponding odd-numbered element of the destination vector.SBCLTZda.T, Zn.T, Zm.TSM3SS1sm3ss1SM3SS1SM3SS1 SM3SS1Vd.4S, Vn.4S, Vm.4S, Va.4SSSUBLBssublbSSUBLBSubtract the even-numbered signed elements of the second source vector from the corresponding signed elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.SSUBLBZd.T, Zn.Tb, Zm.TbFNMULfnmulFNMUL'Floating-point multiply-negate (scalar)FNMULHd, Hn, HmFNMULSd, Sn, SmFNMULDd, Dn, DmFMOPSfmopsFMOPSuThe half-precision floating-point sum of outer products and subtract instruction works with a 32-bit element ZA tile.#FMOPSZAda.S, Pn/M, Pm/M, Zn.H, Zm.H#FMOPSZAda.H, Pn/M, Pm/M, Zn.H, Zm.H#FMOPSZAda.S, Pn/M, Pm/M, Zn.S, Zm.S#FMOPSZAda.D, Pn/M, Pm/M, Zn.D, Zm.DWHILELEwhileleWHILELEGenerate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, signed scalar operand is less than or equal to the second scalar operand and false thereafter up to the highest numbered element.WHILELEPd.T, Rn, RmWHILELEPNd.T, Xn, Xm, vlWHILELE{ Pd1.T, Pd2.T }, Xn, XmRCWSSETrcwssetRCWSSET@Read check write software atomic bit set on doubleword in memoryRCWSSETXs, Xt, [Xn|SP]RCWSSETAXs, Xt, [Xn|SP]RCWSSETALXs, Xt, [Xn|SP]RCWSSETLXs, Xt, [Xn|SP]REV32rev32REV32)Reverse elements in 32-bit words (vector)REV32Vd.T, Vn.T REV32Xd, XnSQDMLSLTsqdmlsltSQDMLSLTMultiply then double the corresponding odd-numbered signed elements of the first and second source vectors. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2SQDMLSLTZda.T, Zn.Tb, Zm.TbSQDMLSLTZda.S, Zn.H, Zm.H[imm]SQDMLSLTZda.D, Zn.S, Zm.S[imm]SHRNshrnSHRNShift right narrow (immediate)SHRN{2} Vd.Tb, Vn.Ta, #shiftSQRSHRUsqrshruSQRSHRU Shift right by an immediate value, the signed integer value in each element of the two source vectors and place the rounded results in the half-width destination elements. 
Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2$SQRSHRUZd.H, { Zn1.S-Zn2.S }, #const&SQRSHRUZd.T, { Zn1.Tb-Zn4.Tb }, #constSSHLsshlSSHLSigned shift left (register)SSHL Dd, Dn, DmSSHLVd.T, Vn.T, Vm.TLDURSWldurswLDURSW$Load register signed word (unscaled)LDURSWXt, [Xn|SP{, #simm}]BRKbrkBRKBreakpoint instruction BRK #immAUTIAautiaAUTIA-Authenticate instruction address, using key AAUTIAXd, Xn|SPAUTIZAXd AUTIA1716AUTIASPAUTIAZDUPMdupmDUPMUnconditionally broadcast the logical bitmask immediate into each element of the destination vector. This instruction is unpredicated. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits.DUPMZd.T, #constBRKPBbrkpbBRKPB}If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise sets destination predicate elements up to but not including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.BRKPBPd.B, Pg/Z, Pn.B, Pm.BAXFLAGaxflagAXFLAGBConvert floating-point condition flags from Arm to external formatAXFLAGORVorvORVBitwise inclusive OR horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register. 
Inactive elements in the source vector are treated as zero.ORVVd, Pg, Zn.TUQSHLuqshlUQSHL*Unsigned saturating shift left (immediate)UQSHLVd, Vn, #shiftUQSHLVd.T, Vn.T, #shiftUQSHLVd, Vn, VmUQSHLVd.T, Vn.T, Vm.TUQSHLZdn.T, Pg/M, Zdn.T, #constUQSHLZdn.T, Pg/M, Zdn.T, Zm.TBLRblrBLRBranch with link to registerBLRXnGCSBgcsbGCSBGuarded Control Stack barrier GCSB DSYNCSTLURHstlurhSTLURH*Store-release register halfword (unscaled)STLURHWt, [Xn|SP{, #simm}]ST2Dst2dST2D4Contiguous store two-doubleword structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,1ST2D{ Zt1.D, Zt2.D }, Pg, [Xn|SP{, #imm, MUL VL}]-ST2D{ Zt1.D, Zt2.D }, Pg, [Xn|SP, Xm, LSL #3]LDUMINBlduminbLDUMINB9Atomic unsigned minimum on byte in memory, without return+STUMINBWs, [Xn|SP]LDUMINB Ws, WZR, [Xn|SP]-STUMINLBWs, [Xn|SP]LDUMINLB Ws, WZR, [Xn|SP]SXTHsxthSXTH SXTH -- A64Sign extend halfword SXTH Wd, WnSBFM Wd, Wn, #0, #15 SXTH Xd, WnSBFM Xd, Xn, #0, #15ADDPTaddptADDPTAdd checked pointer$ADDPTXd|SP, Xn|SP, Xm{, LSL #amount}ADDPTZdn.D, Pg/M, Zdn.D, Zm.DADDPTZd.D, Zn.D, Zm.D AUTIB171615 autib171615 AUTIB171615-Authenticate instruction address, using key B AUTIB171615MOVTmovtMOVTMove 8 bytes to a general-purpose register from the ZT0 register at the byte offset specified by the immediate index. 
This instruction is UNDEFINED in Non-debug state.MOVTXt, ZT0[offs]MOVT ZT0[offs], XtMOVT ZT0{[offs, MUL VL]}, ZtRCWSSETPrcwssetpRCWSSETP>Read check write software atomic bit set on quadword in memoryRCWSSETPXt1, Xt2, [Xn|SP]RCWSSETPAXt1, Xt2, [Xn|SP]RCWSSETPALXt1, Xt2, [Xn|SP]RCWSSETPLXt1, Xt2, [Xn|SP]SEVsevSEV Send eventSEVUMOPSumopsUMOPS5This instruction works with a 32-bit element ZA tile.#UMOPSZAda.S, Pn/M, Pm/M, Zn.H, Zm.H#UMOPSZAda.S, Pn/M, Pm/M, Zn.B, Zm.B#UMOPSZAda.D, Pn/M, Pm/M, Zn.H, Zm.HUXTHuxthUXTH UXTH -- A64Unsigned extend halfword UXTH Wd, WnUBFM Wd, Wn, #0, #15UMULLumullUMULL+Unsigned multiply long (vector, by element)$UMULL{2} Vd.Ta, Vn.Tb, Vm.Ts[index]UMULL{2} Vd.Ta, Vn.Tb, Vm.Tb UMULL -- A64Unsigned multiply longUMULL Xd, Wn, WmUMADDL Xd, Wn, Wm, XZRBRKNbrknBRKNIf the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise leaves the destination and second source predicate unchanged. Does not set the condition flags.BRKNPdm.B, Pg/Z, Pn.B, Pdm.BCPYPTcpyptCPYPT*Memory copy, reads and writes unprivilegedCPYPT [Xd]!, [Xs]!, Xn!CPYMT [Xd]!, [Xs]!, Xn!CPYET [Xd]!, [Xs]!, Xn!FMOPAfmopaFMOPAnThe 8-bit floating-point sum of outer products and accumulate instruction works with a 16-bit element ZA tile.#FMOPAZAda.H, Pn/M, Pm/M, Zn.B, Zm.B#FMOPAZAda.S, Pn/M, Pm/M, Zn.B, Zm.B#FMOPAZAda.S, Pn/M, Pm/M, Zn.H, Zm.H#FMOPAZAda.H, Pn/M, Pm/M, Zn.H, Zm.H#FMOPAZAda.S, Pn/M, Pm/M, Zn.S, Zm.S#FMOPAZAda.D, Pn/M, Pm/M, Zn.D, Zm.DLASTBlastbLASTBIf there is an active element then extract the last active element from the final source vector register. If there are no active elements, extract the highest-numbered element. 
Then zero-extend and place the extracted element in the destination general-purpose register.
    LASTB Rd, Pg, Zn.T
    LASTB Vd, Pg, Zn.T

RCWSETP
Read check write atomic bit set on quadword in memory
    RCWSETP Xt1, Xt2, [Xn|SP]
    RCWSETPA Xt1, Xt2, [Xn|SP]
    RCWSETPAL Xt1, Xt2, [Xn|SP]
    RCWSETPL Xt1, Xt2, [Xn|SP]

FMAXV
Floating-point maximum across vector
    FMAXV Vd, Vn.T
    FMAXV Sd, Vn.4S
    FMAXV Vd, Pg, Zn.T

LDUMIN
Atomic unsigned minimum on word or doubleword in memory
    LDUMIN Ws, Wt, [Xn|SP]
    LDUMINA Ws, Wt, [Xn|SP]
    LDUMINAL Ws, Wt, [Xn|SP]
    LDUMINL Ws, Wt, [Xn|SP]
    LDUMIN Xs, Xt, [Xn|SP]
    LDUMINA Xs, Xt, [Xn|SP]
    LDUMINAL Xs, Xt, [Xn|SP]
    LDUMINL Xs, Xt, [Xn|SP]
    STUMIN Ws, [Xn|SP] is equivalent to LDUMIN Ws, WZR, [Xn|SP]
    STUMINL Ws, [Xn|SP] is equivalent to LDUMINL Ws, WZR, [Xn|SP]
    STUMIN Xs, [Xn|SP] is equivalent to LDUMIN Xs, XZR, [Xn|SP]
    STUMINL Xs, [Xn|SP] is equivalent to LDUMINL Xs, XZR, [Xn|SP]

LD1ROD
Load four contiguous doublewords to elements of a 256-bit (octaword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base address.
    LD1ROD { Zt.D }, Pg/Z, [Xn|SP{, #imm}]
    LD1ROD { Zt.D }, Pg/Z, [Xn|SP, Xm, LSL #3]

MVNI
Move inverted immediate (vector)
    MVNI Vd.T, #imm8{, LSL #amount}
    MVNI Vd.T, #imm8{, LSL #amount}
    MVNI Vd.T, #imm8, MSL #amount

SPLICE
Select a region from the first source vector and copy it to the lowest-numbered elements of the result. Then set any remaining elements of the result to a copy of the lowest-numbered elements from the second source vector. The region is selected using the first and last true elements in the vector select predicate register. The result is placed destructively in the destination and first source vector, or constructively in the destination vector.
    SPLICE Zd.T, Pv, { Zn1.T, Zn2.T }
    SPLICE Zdn.T, Pv, Zdn.T, Zm.T

FCMP
Floating-point quiet compare (scalar)
    FCMP Hn, Hm
    FCMP Hn, #0.0
    FCMP Sn, Sm
    FCMP Sn, #0.0
    FCMP Dn, Dm
    FCMP Dn, #0.0

LDG
Load Allocation Tag
    LDG Xt, [Xn|SP{, #simm}]

ADCS
Add with carry, setting flags
    ADCS Wd, Wn, Wm
    ADCS Xd, Xn, Xm

INDEX
Populates the destination vector by setting the first element to the first signed immediate integer operand and monotonically incrementing the value by the second signed immediate integer operand for each subsequent element. This instruction is unpredicated.
    INDEX Zd.T, #imm1, #imm2
    INDEX Zd.T, #imm, Rm
    INDEX Zd.T, Rn, #imm
    INDEX Zd.T, Rn, Rm

CMGE
Compare signed greater than or equal (vector)
    CMGE Dd, Dn, Dm
    CMGE Vd.T, Vn.T, Vm.T
    CMGE Dd, Dn, #0
    CMGE Vd.T, Vn.T, #0

FADDA
Floating-point add a SIMD&FP scalar source and all active lanes of the vector source and place the result destructively in the SIMD&FP scalar source register. Vector elements are processed strictly in order from low to high, with the scalar source providing the initial value. Inactive elements in the source vector are ignored.
    FADDA Vdn, Pg, Vdn, Zm.T

MLAPT
Multiply with overflow check the elements of the first and second source vectors and add pointer check to elements of the third source (addend) vector. Destructively place the results in the destination and third source (addend) vector.
    MLAPT Zda.D, Zn.D, Zm.D

LSLR
Reversed shift left active elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. The shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element size.
Inactive elements in the destination vector register remain unmodified.
    LSLR Zdn.T, Pg/M, Zdn.T, Zm.T

SABDL
Signed absolute difference long
    SABDL{2} Vd.Ta, Vn.Tb, Vm.Tb

ZIP1
Zip vectors (primary)
    ZIP1 Vd.T, Vn.T, Vm.T

LDUR
Load SIMD&FP register (unscaled offset)
    LDUR Bt, [Xn|SP{, #simm}]
    LDUR Ht, [Xn|SP{, #simm}]
    LDUR St, [Xn|SP{, #simm}]
    LDUR Dt, [Xn|SP{, #simm}]
    LDUR Qt, [Xn|SP{, #simm}]
    LDUR Wt, [Xn|SP{, #simm}]
    LDUR Xt, [Xn|SP{, #simm}]

INSR
Shift the destination vector left by one element, and then place a copy of the least-significant bits of the general-purpose register in element 0 of the destination vector. This instruction is unpredicated.
    INSR Zdn.T, Rm
    INSR Zdn.T, Vm

ST4
Store multiple 4-element structures from four registers
    ST4 {Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP]
    ST4 {Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP], imm
    ST4 {Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP], Xm
    ST4 {Vt.B, Vt2.B, Vt3.B, Vt4.B }[index], [Xn|SP]
    ST4 {Vt.H, Vt2.H, Vt3.H, Vt4.H }[index], [Xn|SP]
    ST4 {Vt.S, Vt2.S, Vt3.S, Vt4.S }[index], [Xn|SP]
    ST4 {Vt.D, Vt2.D, Vt3.D, Vt4.D }[index], [Xn|SP]
    ST4 {Vt.B, Vt2.B, Vt3.B, Vt4.B }[index], [Xn|SP], #4
    ST4 {Vt.B, Vt2.B, Vt3.B, Vt4.B }[index], [Xn|SP], Xm
    ST4 {Vt.H, Vt2.H, Vt3.H, Vt4.H }[index], [Xn|SP], #8
    ST4 {Vt.H, Vt2.H, Vt3.H, Vt4.H }[index], [Xn|SP], Xm
    ST4 {Vt.S, Vt2.S, Vt3.S, Vt4.S }[index], [Xn|SP], #16
    ST4 {Vt.S, Vt2.S, Vt3.S, Vt4.S }[index], [Xn|SP], Xm
    ST4 {Vt.D, Vt2.D, Vt3.D, Vt4.D }[index], [Xn|SP], #32
    ST4 {Vt.D, Vt2.D, Vt3.D, Vt4.D }[index], [Xn|SP], Xm

CMLE
Compare signed less than or equal to zero (vector)
    CMLE Dd, Dn, #0
    CMLE Vd.T, Vn.T, #0

FRSQRTS
Floating-point reciprocal square root step
    FRSQRTS Hd, Hn, Hm
    FRSQRTS Vd, Vn, Vm
    FRSQRTS Vd.T, Vn.T, Vm.T
    FRSQRTS Vd.T, Vn.T, Vm.T
    FRSQRTS Zd.T, Zn.T, Zm.T

FCPY
Copy a floating-point immediate into each active element in the destination vector.
Inactive elements in the destination vector register remain unmodified.
    FCPY Zd.T, Pg/M, #const

FCVTNT
Convert each single-precision element of the group of two source vectors to 8-bit floating-point while scaling the value by 2
    FCVTNT Zd.B, { Zn1.S-Zn2.S }
    FCVTNT Zd.H, Pg/M, Zn.S
    FCVTNT Zd.S, Pg/M, Zn.D

FMAXNMP
Floating-point maximum number of pair of elements (scalar)
    FMAXNMP Hd, Vn.2H
    FMAXNMP Vd, Vn.T
    FMAXNMP Vd.T, Vn.T, Vm.T
    FMAXNMP Vd.T, Vn.T, Vm.T
    FMAXNMP Zdn.T, Pg/M, Zdn.T, Zm.T

STL1
Store-release a single-element structure from one lane of one register
    STL1 {Vt.D }[index], [Xn|SP]

GMI
Tag mask insert
    GMI Xd, Xn|SP, Xm

CPYPRTWN
Memory copy, reads unprivileged, writes non-temporal
    CPYPRTWN [Xd]!, [Xs]!, Xn!
    CPYMRTWN [Xd]!, [Xs]!, Xn!
    CPYERTWN [Xd]!, [Xs]!, Xn!

STUR
Store SIMD&FP register (unscaled offset)
    STUR Bt, [Xn|SP{, #simm}]
    STUR Ht, [Xn|SP{, #simm}]
    STUR St, [Xn|SP{, #simm}]
    STUR Dt, [Xn|SP{, #simm}]
    STUR Qt, [Xn|SP{, #simm}]
    STUR Wt, [Xn|SP{, #simm}]
    STUR Xt, [Xn|SP{, #simm}]

SQDMULL
Signed saturating doubling multiply long (by element)
    SQDMULL{2} Vd.Ta, Vn.Tb, Vm.Ts[index]
    SQDMULL Vad, Vbn, Vm.Ts[index]
    SQDMULL Vad, Vbn, Vbm
    SQDMULL{2} Vd.Ta, Vn.Tb, Vm.Tb

SETPN
Memory set, non-temporal
    SETPN [Xd]!, Xn!, Xs
    SETMN [Xd]!, Xn!, Xs
    SETEN [Xd]!, Xn!, Xs

FCVTPS
Floating-point convert to signed integer, rounding toward plus infinity (vector)
    FCVTPS Hd, Hn
    FCVTPS Vd, Vn
    FCVTPS Vd.T, Vn.T
    FCVTPS Vd.T, Vn.T
    FCVTPS Wd, Hn
    FCVTPS Xd, Hn
    FCVTPS Wd, Sn
    FCVTPS Xd, Sn
    FCVTPS Wd, Dn
    FCVTPS Xd, Dn

LD1ROH
Load sixteen contiguous halfwords to elements of a 256-bit (octaword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base address.
    LD1ROH { Zt.H }, Pg/Z, [Xn|SP{, #imm}]
    LD1ROH { Zt.H }, Pg/Z, [Xn|SP, Xm, LSL #1]

REV16
Reverse elements in 16-bit halfwords (vector)
    REV16 Vd.T, Vn.T
    REV16 Wd, Wn
    REV16 Xd, Xn

STLLRB
Store LORelease register byte
    STLLRB Wt, [Xn|SP{, #0}]

PACIASPPC
Pointer Authentication Code for return address, using key A
    PACIASPPC

RCWSCLR
Read check write software atomic bit clear on doubleword in memory
    RCWSCLR Xs, Xt, [Xn|SP]
    RCWSCLRA Xs, Xt, [Xn|SP]
    RCWSCLRAL Xs, Xt, [Xn|SP]
    RCWSCLRL Xs, Xt, [Xn|SP]

FSQRT
Floating-point square root (vector)
    FSQRT Vd.T, Vn.T
    FSQRT Vd.T, Vn.T
    FSQRT Hd, Hn
    FSQRT Sd, Sn
    FSQRT Dd, Dn
    FSQRT Zd.T, Pg/M, Zn.T

LDAPURH
Load-acquire RCpc register halfword (unscaled)
    LDAPURH Wt, [Xn|SP{, #simm}]

SM3PARTW1
SM3PARTW1
    SM3PARTW1 Vd.4S, Vn.4S, Vm.4S

ERET
Exception return
    ERET

SMLAL
Signed multiply-add long (vector, by element)
    SMLAL{2} Vd.Ta, Vn.Tb, Vm.Ts[index]
    SMLAL{2} Vd.Ta, Vn.Tb, Vm.Tb
    SMLAL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H[index]
    SMLAL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]
    SMLAL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]
    SMLAL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H
    SMLAL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H
    SMLAL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.H
    SMLAL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }
    SMLAL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }

UBFX
UBFX -- A64
Unsigned bitfield extract
    UBFX Wd, Wn, #lsb, #width is equivalent to UBFM Wd, Wn, #lsb, #(lsb+width-1)
    UBFX Xd, Xn, #lsb, #width is equivalent to UBFM Xd, Xn, #lsb, #(lsb+width-1)

LSLV
Logical shift left variable
    LSLV Wd, Wn, Wm
    LSLV Xd, Xn, Xm

CLZ
Count leading zero bits (vector)
    CLZ Vd.T, Vn.T
    CLZ Wd, Wn
    CLZ Xd, Xn
    CLZ Zd.T, Pg/M, Zn.T

EON
Bitwise exclusive-OR NOT (shifted register)
    EON Wd, Wn, Wm{, shift #amount}
    EON Xd, Xn, Xm{, shift #amount}
EON
Bitwise exclusive OR an inverted immediate with each 64-bit element of the source vector, and destructively place the results in the corresponding elements of the source vector.
The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
    EON Zdn.T, Zdn.T, #const is equivalent to EOR Zdn.T, Zdn.T, #(-const - 1)

URSHR
Unsigned rounding shift right (immediate)
    URSHR Dd, Dn, #shift
    URSHR Vd.T, Vn.T, #shift
    URSHR Zdn.T, Pg/M, Zdn.T, #const

SABA
Signed absolute difference and accumulate
    SABA Vd.T, Vn.T, Vm.T
    SABA Zda.T, Zn.T, Zm.T

COMPACT
Read the active elements from the source vector and pack them into the lowest-numbered elements of the destination vector. Then set any remaining elements of the destination vector to zero.
    COMPACT Zd.T, Pg, Zn.T

FRINTI
Round to an integral floating-point value with the specified rounding option from each active floating-point element of the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
    FRINTI Zd.T, Pg/M, Zn.T
    FRINTX Zd.T, Pg/M, Zn.T
    FRINTA Zd.T, Pg/M, Zn.T
    FRINTN Zd.T, Pg/M, Zn.T
    FRINTZ Zd.T, Pg/M, Zn.T
    FRINTM Zd.T, Pg/M, Zn.T
    FRINTP Zd.T, Pg/M, Zn.T
    FRINTI Vd.T, Vn.T
    FRINTI Vd.T, Vn.T
    FRINTI Hd, Hn
    FRINTI Sd, Sn
    FRINTI Dd, Dn

RSHRNT
Shift each unsigned integer value in the source vector elements right by an immediate value, and place the rounded results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. The immediate shift amount is an unsigned value in the range 1 to number of bits per element.
This instruction is unpredicated.
    RSHRNT Zd.T, Zn.Tb, #const

LDSMINAB
Atomic signed minimum on byte in memory
    LDSMINAB Ws, Wt, [Xn|SP]
    LDSMINALB Ws, Wt, [Xn|SP]
    LDSMINB Ws, Wt, [Xn|SP]
    LDSMINLB Ws, Wt, [Xn|SP]

WHILELS
Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, unsigned scalar operand is lower or same as the second scalar operand and false thereafter up to the highest numbered element.
    WHILELS Pd.T, Rn, Rm
    WHILELS PNd.T, Xn, Xm, vl
    WHILELS { Pd1.T, Pd2.T }, Xn, Xm

LD1SB
Gather load of signed bytes to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
    LD1SB { Zt.S }, Pg/Z, [Zn.S{, #imm}]
    LD1SB { Zt.D }, Pg/Z, [Zn.D{, #imm}]
    LD1SB { Zt.H }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
    LD1SB { Zt.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
    LD1SB { Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
    LD1SB { Zt.H }, Pg/Z, [Xn|SP, Xm]
    LD1SB { Zt.S }, Pg/Z, [Xn|SP, Xm]
    LD1SB { Zt.D }, Pg/Z, [Xn|SP, Xm]
    LD1SB { Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod]
    LD1SB { Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod]
    LD1SB { Zt.D }, Pg/Z, [Xn|SP, Zm.D]

BTI
Branch target identification
    BTI {targets}

LDIAPP
Load-Acquire RCpc ordered pair of registers
    LDIAPP Wt1, Wt2, [Xn|SP], #8
    LDIAPP Wt1, Wt2, [Xn|SP]
    LDIAPP Xt1, Xt2, [Xn|SP], #16
    LDIAPP Xt1, Xt2, [Xn|SP]

TRCIT
TRCIT -- A64
Trace instrumentation
    TRCIT Xt is equivalent to SYS #3, C7, C2, #7, Xt

FDIVR
Reversed divide active floating-point elements of the second source vector by corresponding floating-point elements of the first source vector and destructively place the quotient in the corresponding elements of the first source vector.
Inactive elements in the destination vector register remain unmodified.
    FDIVR Zdn.T, Pg/M, Zdn.T, Zm.T

UQRSHR
Shift right by an immediate value, the unsigned integer value in each element of the two source vectors and place the rounded results in the half-width destination elements. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2
    UQRSHR Zd.H, { Zn1.S-Zn2.S }, #const
    UQRSHR Zd.T, { Zn1.Tb-Zn4.Tb }, #const

GCSSTTR
Guarded Control Stack unprivileged store
    GCSSTTR Xt, [Xn|SP]

EOR3
Three-way exclusive-OR
    EOR3 Vd.16B, Vn.16B, Vm.16B, Va.16B
    EOR3 Zdn.D, Zdn.D, Zm.D, Zk.D

MLA
Multiply-add to accumulator (vector, by element)
    MLA Vd.T, Vn.T, Vm.Ts[index]
    MLA Vd.T, Vn.T, Vm.T
    MLA Zda.T, Pg/M, Zn.T, Zm.T
    MLA Zda.H, Zn.H, Zm.H[imm]
    MLA Zda.S, Zn.S, Zm.S[imm]
    MLA Zda.D, Zn.D, Zm.D[imm]

LDURSH
Load register signed halfword (unscaled)
    LDURSH Wt, [Xn|SP{, #simm}]
    LDURSH Xt, [Xn|SP{, #simm}]

LDNT1SH
Gather load non-temporal of signed halfwords to active elements of a vector register from memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset.
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
    LDNT1SH { Zt.S }, Pg/Z, [Zn.S{, Xm}]
    LDNT1SH { Zt.D }, Pg/Z, [Zn.D{, Xm}]

UZP1
Unzip vectors (primary)
    UZP1 Vd.T, Vn.T, Vm.T
    UZP1 Pd.T, Pn.T, Pm.T
    UZP2 Pd.T, Pn.T, Pm.T
    UZP1 Zd.T, Zn.T, Zm.T
    UZP1 Zd.Q, Zn.Q, Zm.Q
    UZP2 Zd.T, Zn.T, Zm.T
    UZP2 Zd.Q, Zn.Q, Zm.Q

CPYFPRTN
Memory copy forward-only, reads unprivileged, reads and writes non-temporal
    CPYFPRTN [Xd]!, [Xs]!, Xn!
    CPYFMRTN [Xd]!, [Xs]!, Xn!
    CPYFERTN [Xd]!, [Xs]!, Xn!

FRINT32Z
Floating-point round to 32-bit integer toward zero (vector)
    FRINT32Z Vd.T, Vn.T
    FRINT32Z Sd, Sn
    FRINT32Z Dd, Dn

UMLALT
Multiply the corresponding odd-numbered unsigned elements of the first and second source vectors and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.
    UMLALT Zda.T, Zn.Tb, Zm.Tb
    UMLALT Zda.S, Zn.H, Zm.H[imm]
    UMLALT Zda.D, Zn.S, Zm.S[imm]

SHRNT
Shift each unsigned integer value in the source vector elements right by an immediate value, and place the truncated results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. The immediate shift amount is an unsigned value in the range 1 to number of bits per element.
This instruction is unpredicated.
    SHRNT Zd.T, Zn.Tb, #const

CRC32B
CRC32 checksum
    CRC32B Wd, Wn, Wm
    CRC32H Wd, Wn, Wm
    CRC32W Wd, Wn, Wm
    CRC32X Wd, Wn, Xm

SQRDCMLAH
Multiply without saturation the duplicated real components for rotations 0 and 180, or imaginary components for rotations 90 and 270, of the integral numbers in the first source vector by the corresponding complex number in the second source vector rotated by 0, 90, 180 or 270 degrees in the direction from the positive real axis towards the positive imaginary axis, when considered in polar representation.
    SQRDCMLAH Zda.T, Zn.T, Zm.T, const
    SQRDCMLAH Zda.H, Zn.H, Zm.H[imm], const
    SQRDCMLAH Zda.S, Zn.S, Zm.S[imm], const

LDSMINAH
Atomic signed minimum on halfword in memory
    LDSMINAH Ws, Wt, [Xn|SP]
    LDSMINALH Ws, Wt, [Xn|SP]
    LDSMINH Ws, Wt, [Xn|SP]
    LDSMINLH Ws, Wt, [Xn|SP]

LDAXRH
Load-acquire exclusive register halfword
    LDAXRH Wt, [Xn|SP{, #0}]

PACNBIBSPPC
Pointer Authentication Code for return address, using key B, not a branch target
    PACNBIBSPPC

ADDVA
Add each element of the source vector to the corresponding active element of each vertical slice of a ZA tile. The tile elements are predicated by a pair of governing predicates. An element of a vertical slice is considered active if its corresponding element in the first governing predicate is TRUE and the element corresponding to its vertical slice number in the second governing predicate is TRUE. Inactive elements in the destination tile remain unmodified.
    ADDVA ZAda.S, Pn/M, Pm/M, Zn.S
    ADDVA ZAda.D, Pn/M, Pm/M, Zn.D

SQINCB
Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range.
A 32-bit saturated result is then sign-extended to 64 bits.
    SQINCB Xdn, Wdn{, pattern{, MUL #imm}}
    SQINCB Xdn{, pattern{, MUL #imm}}

SQSHLR
Shift active signed elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Each result element is saturated to the N-bit element's signed integer range -2
    SQSHLR Zdn.T, Pg/M, Zdn.T, Zm.T

UMLAL
Unsigned multiply-add long (vector, by element)
    UMLAL{2} Vd.Ta, Vn.Tb, Vm.Ts[index]
    UMLAL{2} Vd.Ta, Vn.Tb, Vm.Tb
    UMLAL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H[index]
    UMLAL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]
    UMLAL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]
    UMLAL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H
    UMLAL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H
    UMLAL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.H
    UMLAL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }
    UMLAL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }

DECD
Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements.
    DECD Zdn.D{, pattern{, MUL #imm}}
    DECH Zdn.H{, pattern{, MUL #imm}}
    DECW Zdn.S{, pattern{, MUL #imm}}

SMIN
Signed minimum (vector)
    SMIN Vd.T, Vn.T, Vm.T
    SMIN Wd, Wn, #simm
    SMIN Xd, Xn, #simm
    SMIN { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T
    SMIN { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T
    SMIN { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T }
    SMIN { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }
    SMIN Wd, Wn, Wm
    SMIN Xd, Xn, Xm
    SMIN Zdn.T, Pg/M, Zdn.T, Zm.T
    SMIN Zdn.T, Zdn.T, #imm

SEL
Read active elements from the two or four first source vectors and inactive elements from the two or four second source vectors and place in the corresponding elements of the two or four destination vectors.
    SEL { Zd1.T-Zd2.T }, PNg, { Zn1.T-Zn2.T }, { Zm1.T-Zm2.T }
    SEL { Zd1.T-Zd4.T }, PNg, { Zn1.T-Zn4.T }, { Zm1.T-Zm4.T }
    SEL Pd.B, Pg, Pn.B, Pm.B
    SEL Zd.T, Pv, Zn.T, Zm.T

SUMOPA
The 8-bit integer variant works with a 32-bit element ZA tile.
    SUMOPA ZAda.S, Pn/M, Pm/M, Zn.B, Zm.B
    SUMOPA ZAda.D, Pn/M, Pm/M, Zn.H, Zm.H

FRECPE
Floating-point reciprocal estimate
    FRECPE Hd, Hn
    FRECPE Vd, Vn
    FRECPE Vd.T, Vn.T
    FRECPE Vd.T, Vn.T
    FRECPE Zd.T, Zn.T

LD2B
Contiguous load two-byte structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,
    LD2B { Zt1.B, Zt2.B }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
    LD2B { Zt1.B, Zt2.B }, Pg/Z, [Xn|SP, Xm]

FMINNM
Floating-point minimum number (vector)
    FMINNM Vd.T, Vn.T, Vm.T
    FMINNM Vd.T, Vn.T, Vm.T
    FMINNM Hd, Hn, Hm
    FMINNM Sd, Sn, Sm
    FMINNM Dd, Dn, Dm
    FMINNM { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T
    FMINNM { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T
    FMINNM { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T }
    FMINNM { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }
    FMINNM Zdn.T, Pg/M, Zdn.T, const
    FMINNM Zdn.T, Pg/M, Zdn.T, Zm.T

UQRSHRNT
Shift each unsigned integer value in the source vector elements by an immediate value, and place the rounded results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2
    UQRSHRNT Zd.T, Zn.Tb, #const

SQRSHLR
Shift active signed elements of the second source vector by corresponding elements of the first source vector and destructively place the rounded results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Each result element is saturated to the N-bit element's signed integer range -2
    SQRSHLR Zdn.T, Pg/M, Zdn.T, Zm.T

USVDOT
The unsigned by signed integer vertical dot product instruction computes the vertical dot product of corresponding unsigned 8-bit elements from the four first source vectors and four signed 8-bit integer values in the corresponding indexed 32-bit element of the second source vector. The widened dot product result is destructively added to the corresponding 32-bit element of the ZA single-vector groups.
    USVDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]

SUBHNB
Subtract each vector element of the second source vector from the corresponding vector element in the first source vector, and place the most significant half of the result in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. This instruction is unpredicated.
    SUBHNB Zd.T, Zn.Tb, Zm.Tb

SSBB
SSBB -- A64
Speculative store bypass barrier
    SSBB is equivalent to DSB #0

SMLSLB
Multiply the corresponding even-numbered signed elements of the first and second source vectors and destructively subtract from the overlapping double-width elements of the addend vector.
This instruction is unpredicated.
    SMLSLB Zda.T, Zn.Tb, Zm.Tb
    SMLSLB Zda.S, Zn.H, Zm.H[imm]
    SMLSLB Zda.D, Zn.S, Zm.S[imm]

MOV
MOV (to/from SP) -- A64
Move (to/from SP)
    MOV Wd|WSP, Wn|WSP is equivalent to ADD Wd|WSP, Wn|WSP, #0
    MOV Xd|SP, Xn|SP is equivalent to ADD Xd|SP, Xn|SP, #0
MOV (predicate, predicated, zeroing)
Read active elements from the source predicate and place in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
    MOV Pd.B, Pg/Z, Pn.B is equivalent to AND Pd.B, Pg/Z, Pn.B, Pn.B
MOV (immediate, predicated, zeroing)
Move a signed integer immediate to each active element in the destination vector. Inactive elements in the destination vector register are set to zero.
    MOV Zd.T, Pg/Z, #imm{, shift} is equivalent to CPY Zd.T, Pg/Z, #imm{, shift}
MOV (immediate, predicated, merging)
Move a signed integer immediate to each active element in the destination vector. Inactive elements in the destination vector register remain unmodified.
    MOV Zd.T, Pg/M, #imm{, shift} is equivalent to CPY Zd.T, Pg/M, #imm{, shift}
MOV (scalar, predicated)
Move the general-purpose scalar source register to each active element in the destination vector. Inactive elements in the destination vector register remain unmodified.
    MOV Zd.T, Pg/M, Rn|SP is equivalent to CPY Zd.T, Pg/M, Rn|SP
MOV (SIMD&FP scalar, predicated)
Move the SIMD & floating-point scalar source register to each active element in the destination vector. Inactive elements in the destination vector register remain unmodified.
    MOV Zd.T, Pg/M, Vn is equivalent to CPY Zd.T, Pg/M, Vn
MOV (scalar) -- A64
Move vector element to scalar
    MOV Vd, Vn.T[index] is equivalent to DUP Vd, Vn.T[index]
MOV (immediate, unpredicated)
Unconditionally broadcast the signed integer immediate into each element of the destination vector. This instruction is unpredicated.
    MOV Zd.T, #imm{, shift} is equivalent to DUP Zd.T, #imm{, shift}
MOV (scalar, unpredicated)
Unconditionally broadcast the general-purpose scalar source register into each element of the destination vector.
This instruction is unpredicated.
    MOV Zd.T, Rn|SP is equivalent to DUP Zd.T, Rn|SP
MOV (SIMD&FP scalar, unpredicated)
Unconditionally broadcast the SIMD&FP scalar into each element of the destination vector. This instruction is unpredicated.
    MOV Zd.T, Zn.T[imm] is equivalent to DUP Zd.T, Zn.T[imm]
    MOV Zd.T, Vn is equivalent to DUP Zd.T, Zn.T[0]
MOV
Unconditionally broadcast the logical bitmask immediate into each element of the destination vector. This instruction is unpredicated. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits.
    MOV Zd.T, #const is equivalent to DUPM Zd.T, #const
MOV (element) -- A64
Move vector element to another vector element
    MOV Vd.Ts[index1], Vn.Ts[index2] is equivalent to INS Vd.Ts[index1], Vn.Ts[index2]
MOV (from general) -- A64
Move general-purpose register to a vector element
    MOV Vd.Ts[index], Rn is equivalent to INS Vd.Ts[index], Rn
MOV (tile to vector, two registers)
The instruction operates on two consecutive horizontal or vertical slices within a named ZA tile of the specified element size.
    MOV { Zd1.B-Zd2.B }, ZA0HV.B[Ws, offs1:offs2] is equivalent to MOVA { Zd1.B-Zd2.B }, ZA0HV.B[Ws, offs1:offs2]
    MOV { Zd1.H-Zd2.H }, ZAnHV.H[Ws, offs1:offs2] is equivalent to MOVA { Zd1.H-Zd2.H }, ZAnHV.H[Ws, offs1:offs2]
    MOV { Zd1.S-Zd2.S }, ZAnHV.S[Ws, offs1:offs2] is equivalent to MOVA { Zd1.S-Zd2.S }, ZAnHV.S[Ws, offs1:offs2]
    MOV { Zd1.D-Zd2.D }, ZAnHV.D[Ws, offs1:offs2] is equivalent to MOVA { Zd1.D-Zd2.D }, ZAnHV.D[Ws, offs1:offs2]
MOV (tile to vector, four registers)
The instruction operates on four consecutive horizontal or vertical slices within a named ZA tile of the specified element size.
    MOV { Zd1.B-Zd4.B }, ZA0HV.B[Ws, offs1:offs4] is equivalent to MOVA { Zd1.B-Zd4.B }, ZA0HV.B[Ws, offs1:offs4]
    MOV { Zd1.H-Zd4.H }, ZAnHV.H[Ws, offs1:offs4] is equivalent to MOVA { Zd1.H-Zd4.H }, ZAnHV.H[Ws, offs1:offs4]
    MOV { Zd1.S-Zd4.S }, ZAnHV.S[Ws, offs1:offs4] is equivalent to MOVA { Zd1.S-Zd4.S }, ZAnHV.S[Ws, offs1:offs4]
    MOV { Zd1.D-Zd4.D }, ZAnHV.D[Ws, offs1:offs4] is equivalent to MOVA { Zd1.D-Zd4.D }, ZAnHV.D[Ws, offs1:offs4]
MOV (array to vector, two registers)
The instruction operates on two ZA single-vector groups.
    MOV { Zd1.D-Zd2.D }, ZA.D[Wv, offs{, VGx2}] is equivalent to MOVA { Zd1.D-Zd2.D }, ZA.D[Wv, offs{, VGx2}]
MOV (array to vector, four registers)
The instruction operates on four ZA single-vector groups.
    MOV { Zd1.D-Zd4.D }, ZA.D[Wv, offs{, VGx4}] is equivalent to MOVA { Zd1.D-Zd4.D }, ZA.D[Wv, offs{, VGx4}]
MOV (tile to vector, single)
The instruction operates on individual horizontal or vertical slices within a named ZA tile of the specified element size. The slice number within the tile is selected by the sum of the slice index register and immediate offset, modulo the number of such elements in a vector. The immediate offset is in the range 0 to the number of elements in a 128-bit vector segment minus 1.
    MOV Zd.B, Pg/M, ZA0HV.B[Ws, offs] is equivalent to MOVA Zd.B, Pg/M, ZA0HV.B[Ws, offs]
    MOV Zd.H, Pg/M, ZAnHV.H[Ws, offs] is equivalent to MOVA Zd.H, Pg/M, ZAnHV.H[Ws, offs]
    MOV Zd.S, Pg/M, ZAnHV.S[Ws, offs] is equivalent to MOVA Zd.S, Pg/M, ZAnHV.S[Ws, offs]
    MOV Zd.D, Pg/M, ZAnHV.D[Ws, offs] is equivalent to MOVA Zd.D, Pg/M, ZAnHV.D[Ws, offs]
    MOV Zd.Q, Pg/M, ZAnHV.Q[Ws, offs] is equivalent to MOVA Zd.Q, Pg/M, ZAnHV.Q[Ws, offs]
MOV (vector to tile, two registers)
The instruction operates on two consecutive horizontal or vertical slices within a named ZA tile of the specified element size.
    MOV ZA0HV.B[Ws, offs1:offs2], { Zn1.B-Zn2.B } is equivalent to MOVA ZA0HV.B[Ws, offs1:offs2], { Zn1.B-Zn2.B }
    MOV ZAdHV.H[Ws, offs1:offs2], { Zn1.H-Zn2.H } is equivalent to MOVA ZAdHV.H[Ws, offs1:offs2], { Zn1.H-Zn2.H }
    MOV ZAdHV.S[Ws, offs1:offs2], { Zn1.S-Zn2.S } is equivalent to MOVA ZAdHV.S[Ws, offs1:offs2], { Zn1.S-Zn2.S }
    MOV ZAdHV.D[Ws, offs1:offs2], { Zn1.D-Zn2.D } is equivalent to MOVA ZAdHV.D[Ws, offs1:offs2], { Zn1.D-Zn2.D }
MOV (vector to tile, four registers)
The instruction operates on four consecutive horizontal or vertical slices within a named ZA tile of the specified element size.
    MOV ZA0HV.B[Ws, offs1:offs4], { Zn1.B-Zn4.B } is equivalent to MOVA ZA0HV.B[Ws, offs1:offs4], { Zn1.B-Zn4.B }
    MOV ZAdHV.H[Ws, offs1:offs4], { Zn1.H-Zn4.H } is equivalent to MOVA ZAdHV.H[Ws, offs1:offs4], { Zn1.H-Zn4.H }
    MOV ZAdHV.S[Ws, offs1:offs4], { Zn1.S-Zn4.S } is equivalent to MOVA ZAdHV.S[Ws, offs1:offs4], { Zn1.S-Zn4.S }
    MOV ZAdHV.D[Ws, offs1:offs4], { Zn1.D-Zn4.D } is equivalent to MOVA ZAdHV.D[Ws, offs1:offs4], { Zn1.D-Zn4.D }
MOV (vector to array, two registers)
The instruction operates on two ZA single-vector groups.
    MOV ZA.D[Wv, offs{, VGx2}], { Zn1.D-Zn2.D } is equivalent to MOVA ZA.D[Wv, offs{, VGx2}], { Zn1.D-Zn2.D }
MOV (vector to array, four registers)
The instruction operates on four ZA single-vector groups.
    MOV ZA.D[Wv, offs{, VGx4}], { Zn1.D-Zn4.D } is equivalent to MOVA ZA.D[Wv, offs{, VGx4}], { Zn1.D-Zn4.D }
MOV (vector to tile, single)
The instruction operates on individual horizontal or vertical slices within a named ZA tile of the specified element size. The slice number within the tile is selected by the sum of the slice index register and immediate offset, modulo the number of such elements in a vector. The immediate offset is in the range 0 to the number of elements in a 128-bit vector segment minus 1.
    MOV ZA0HV.B[Ws, offs], Pg/M, Zn.B is equivalent to MOVA ZA0HV.B[Ws, offs], Pg/M, Zn.B
    MOV ZAdHV.H[Ws, offs], Pg/M, Zn.H is equivalent to MOVA ZAdHV.H[Ws, offs], Pg/M, Zn.H
    MOV ZAdHV.S[Ws, offs], Pg/M, Zn.S is equivalent to MOVA ZAdHV.S[Ws, offs], Pg/M, Zn.S
    MOV ZAdHV.D[Ws, offs], Pg/M, Zn.D is equivalent to MOVA ZAdHV.D[Ws, offs], Pg/M, Zn.D
    MOV ZAdHV.Q[Ws, offs], Pg/M, Zn.Q is equivalent to MOVA ZAdHV.Q[Ws, offs], Pg/M, Zn.Q
MOV (inverted wide immediate) -- A64
Move (inverted wide immediate)
    MOV Wd, #imm is equivalent to MOVN Wd, #imm16, LSL #shift
    MOV Xd, #imm is equivalent to MOVN Xd, #imm16, LSL #shift
MOV (wide immediate) -- A64
Move (wide immediate)
    MOV Wd, #imm is equivalent to MOVZ Wd, #imm16, LSL #shift
    MOV Xd, #imm is equivalent to MOVZ Xd, #imm16, LSL #shift
MOV (vector) -- A64
Move vector
    MOV Vd.T, Vn.T is equivalent to ORR Vd.T, Vn.T, Vn.T
MOV (bitmask immediate) -- A64
Move (bitmask immediate)
    MOV Wd|WSP, #imm is equivalent to ORR Wd|WSP, WZR, #imm
    MOV Xd|SP, #imm is equivalent to ORR Xd|SP, XZR, #imm
MOV (register) -- A64
Move (register)
    MOV Wd, Wm is equivalent to ORR Wd, WZR, Wm
    MOV Xd, Xm is equivalent to ORR Xd, XZR, Xm
MOV
Read all elements from the source predicate and place in the destination predicate. This instruction is unpredicated.
Does not set the condition flags.
    MOV Pd.B, Pn.B is equivalent to ORR Pd.B, Pn/Z, Pn.B, Pn.B
MOV (vector, unpredicated)
Move vector register. This instruction is unpredicated.
    MOV Zd.D, Zn.D is equivalent to ORR Zd.D, Zn.D, Zn.D
MOV (predicate, predicated, merging)
Read active elements from the source predicate and place in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register remain unmodified. Does not set the condition flags.
    MOV Pd.B, Pg/M, Pn.B is equivalent to SEL Pd.B, Pg, Pn.B, Pd.B
MOV (vector, predicated)
Move elements from the source vector to the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
    MOV Zd.T, Pv/M, Zn.T is equivalent to SEL Zd.T, Pv, Zn.T, Zd.T
MOV (to general) -- A64
Move vector element to general-purpose register
    MOV Wd, Vn.S[index] is equivalent to UMOV Wd, Vn.S[index]
    MOV Xd, Vn.D[index] is equivalent to UMOV Xd, Vn.D[index]

SXTL
SXTL, SXTL2 -- A64
Signed extend long
    SXTL{2} Vd.Ta, Vn.Tb is equivalent to SSHLL{2} Vd.Ta, Vn.Tb, #0

WFI
Wait for interrupt
    WFI

RCWCLR
Read check write atomic bit clear on doubleword in memory
    RCWCLR Xs, Xt, [Xn|SP]
    RCWCLRA Xs, Xt, [Xn|SP]
    RCWCLRAL Xs, Xt, [Xn|SP]
    RCWCLRL Xs, Xt, [Xn|SP]

ST1W
Contiguous store of words from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
    ST1W { Zt1.S-Zt2.S }, PNg, [Xn|SP{, #imm, MUL VL}]
    ST1W { Zt1.S-Zt4.S }, PNg, [Xn|SP{, #imm, MUL VL}]
    ST1W { Zt1.S-Zt2.S }, PNg, [Xn|SP, Xm, LSL #2]
    ST1W { Zt1.S-Zt4.S }, PNg, [Xn|SP, Xm, LSL #2]
    ST1W { Zt1.S, Zt2.S }, PNg, [Xn|SP{, #imm, MUL VL}]
    ST1W { Zt1.S, Zt2.S, Zt3.S, Zt4.S }, PNg, [Xn|SP{, #imm, MUL VL}]
    ST1W { Zt1.S, Zt2.S }, PNg, [Xn|SP, Xm, LSL #2]
    ST1W { Zt1.S, Zt2.S, Zt3.S, Zt4.S }, PNg, [Xn|SP, Xm, LSL #2]
    ST1W { Zt.S }, Pg, [Zn.S{, #imm}]
    ST1W { Zt.D }, Pg, [Zn.D{, #imm}]
    ST1W { Zt.T }, Pg, [Xn|SP{, #imm, MUL VL}]
    ST1W { Zt.Q }, Pg, [Xn|SP{, #imm, MUL VL}]
    ST1W { Zt.T }, Pg, [Xn|SP, Xm, LSL #2]
    ST1W { Zt.Q }, Pg, [Xn|SP, Xm, LSL #2]
    ST1W { Zt.S }, Pg, [Xn|SP, Zm.S, mod #2]
    ST1W { Zt.D }, Pg, [Xn|SP, Zm.D, mod #2]
    ST1W { Zt.D }, Pg, [Xn|SP, Zm.D, mod]
    ST1W { Zt.S }, Pg, [Xn|SP, Zm.S, mod]
    ST1W { Zt.D }, Pg, [Xn|SP, Zm.D, LSL #2]
    ST1W { Zt.D }, Pg, [Xn|SP, Zm.D]
    ST1W { ZAtHV.S[Ws, offs] }, Pg, [Xn|SP{, Xm, LSL #2}]

SHA512SU1
SHA512 schedule update 1
    SHA512SU1 Vd.2D, Vn.2D, Vm.2D

FCVTZS
Floating-point convert to signed fixed-point, rounding toward zero (vector)
    FCVTZS Vd, Vn, #fbits
    FCVTZS Vd.T, Vn.T, #fbits
    FCVTZS Hd, Hn
    FCVTZS Vd, Vn
    FCVTZS Vd.T, Vn.T
    FCVTZS Vd.T, Vn.T
    FCVTZS Wd, Hn, #fbits
    FCVTZS Xd, Hn, #fbits
    FCVTZS Wd, Sn, #fbits
    FCVTZS Xd, Sn, #fbits
    FCVTZS Wd, Dn, #fbits
    FCVTZS Xd, Dn, #fbits
    FCVTZS Wd, Hn
    FCVTZS Xd, Hn
    FCVTZS Wd, Sn
    FCVTZS Xd, Sn
    FCVTZS Wd, Dn
    FCVTZS Xd, Dn
    FCVTZS { Zd1.S-Zd2.S }, { Zn1.S-Zn2.S }
    FCVTZS { Zd1.S-Zd4.S }, { Zn1.S-Zn4.S }
    FCVTZS Zd.H, Pg/M, Zn.H
    FCVTZS Zd.S, Pg/M, Zn.H
    FCVTZS Zd.D, Pg/M, Zn.H
    FCVTZS Zd.S, Pg/M, Zn.S
    FCVTZS Zd.D, Pg/M, Zn.S
    FCVTZS Zd.S, Pg/M, Zn.D
    FCVTZS Zd.D, Pg/M, Zn.D

FMLSLB
This half-precision floating-point multiply-subtract long instruction widens the even-numbered half-precision elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and subtracts these values without intermediate rounding from the single-precision elements of the destination vector that overlap with the corresponding half-precision elements in the source vectors. This instruction is unpredicated.
    FMLSLB Zda.S, Zn.H, Zm.H
    FMLSLB Zda.S, Zn.H, Zm.H[imm]

FACLT
FACLT
Compare active absolute values of floating-point elements in the first source vector being less than corresponding absolute values of elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate.
Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
  FACLT Pd.T, Pg/Z, Zm.T, Zn.T  (alias of FACGT Pd.T, Pg/Z, Zn.T, Zm.T)

FMINNMQV (fminnmqv)
Floating-point minimum number of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as the default NaN.
  FMINNMQV Vd.T, Pg, Zn.Tb

FCSEL (fcsel)
Floating-point conditional select (scalar)
  FCSEL Hd, Hn, Hm, cond
  FCSEL Sd, Sn, Sm, cond
  FCSEL Dd, Dn, Dm, cond

ST64B (st64b)
Single-copy atomic 64-byte store without status result
  ST64B Xt, [Xn|SP {, #0}]

UXTL (uxtl)
UXTL, UXTL2 -- A64
Unsigned extend long
  UXTL{2} Vd.Ta, Vn.Tb  (alias of USHLL{2} Vd.Ta, Vn.Tb, #0)

LD1ROW (ld1row)
Load eight contiguous words to elements of a 256-bit (octaword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base address.
  LD1ROW { Zt.S }, Pg/Z, [Xn|SP{, #imm}]
  LD1ROW { Zt.S }, Pg/Z, [Xn|SP, Xm, LSL #2]

ADDHN (addhn)
Add returning high narrow
  ADDHN{2} Vd.Tb, Vn.Ta, Vm.Ta

LDAPURSH (ldapursh)
Load-acquire RCpc register signed halfword (unscaled)
  LDAPURSH Wt, [Xn|SP{, #simm}]
  LDAPURSH Xt, [Xn|SP{, #simm}]

LDFF1SH (ldff1sh)
Gather load with first-faulting behavior of signed halfwords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
  LDFF1SH { Zt.S }, Pg/Z, [Zn.S{, #imm}]
  LDFF1SH { Zt.D }, Pg/Z, [Zn.D{, #imm}]
  LDFF1SH { Zt.S }, Pg/Z, [Xn|SP{, Xm, LSL #1}]
  LDFF1SH { Zt.D }, Pg/Z, [Xn|SP{, Xm, LSL #1}]
  LDFF1SH { Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod #1]
  LDFF1SH { Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod #1]
  LDFF1SH { Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod]
  LDFF1SH { Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod]
  LDFF1SH { Zt.D }, Pg/Z, [Xn|SP, Zm.D, LSL #1]
  LDFF1SH { Zt.D }, Pg/Z, [Xn|SP, Zm.D]

LDUMINAB (lduminab)
Atomic unsigned minimum on byte in memory
  LDUMINAB Ws, Wt, [Xn|SP]
  LDUMINALB Ws, Wt, [Xn|SP]
  LDUMINB Ws, Wt, [Xn|SP]
  LDUMINLB Ws, Wt, [Xn|SP]

BFXIL (bfxil)
BFXIL -- A64
Bitfield extract and insert at low end
  BFXIL Wd, Wn, #lsb, #width  (alias of BFM Wd, Wn, #lsb, #(lsb+width-1))
  BFXIL Xd, Xn, #lsb, #width  (alias of BFM Xd, Xn, #lsb, #(lsb+width-1))

LD1RW (ld1rw)
Load a single unsigned word from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 4 in the range 0 to 252.
  LD1RW { Zt.S }, Pg/Z, [Xn|SP{, #imm}]
  LD1RW { Zt.D }, Pg/Z, [Xn|SP{, #imm}]

BFCVTN (bfcvtn)
Floating-point convert from single-precision to BFloat16 format (vector)
  BFCVTN{2} Vd.Ta, Vn.4S
  BFCVTN Zd.B, { Zn1.H-Zn2.H }
  BFCVTN Zd.H, { Zn1.S-Zn2.S }

FCVTAS (fcvtas)
Floating-point convert to signed integer, rounding to nearest with ties to away (vector)
  FCVTAS Hd, Hn
  FCVTAS Vd, Vn
  FCVTAS Vd.T, Vn.T
  FCVTAS Vd.T, Vn.T
  FCVTAS Wd, Hn
  FCVTAS Xd, Hn
  FCVTAS Wd, Sn
  FCVTAS Xd, Sn
  FCVTAS Wd, Dn
  FCVTAS Xd, Dn

PMULLB (pmullb)
Polynomial multiply over [0, 1] the corresponding even-numbered elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector.
This instruction is unpredicated.
  PMULLB Zd.T, Zn.Tb, Zm.Tb
  PMULLB Zd.Q, Zn.D, Zm.D

UMADDL (umaddl)
Unsigned multiply-add long
  UMADDL Xd, Wn, Wm, Xa

MOVK (movk)
Move wide with keep
  MOVK Wd, #imm{, LSL #shift}
  MOVK Xd, #imm{, LSL #shift}

BDEP (bdep)
This instruction scatters the lowest-numbered contiguous bits within each element of the first source vector to the bit positions indicated by non-zero bits in the corresponding mask element of the second source vector, preserving their order, and set the bits corresponding to a zero mask bit to zero. This instruction is unpredicated.
  BDEP Zd.T, Zn.T, Zm.T

ASRR (asrr)
Reversed shift right, preserving the sign bit, active elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. The shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element size. Inactive elements in the destination vector register remain unmodified.
  ASRR Zdn.T, Pg/M, Zdn.T, Zm.T

LDCLRAH (ldclrah)
Atomic bit clear on halfword in memory
  LDCLRAH Ws, Wt, [Xn|SP]
  LDCLRALH Ws, Wt, [Xn|SP]
  LDCLRH Ws, Wt, [Xn|SP]
  LDCLRLH Ws, Wt, [Xn|SP]

CPYPRT (cpyprt)
Memory copy, reads unprivileged
  CPYPRT [Xd]!, [Xs]!, Xn!
  CPYMRT [Xd]!, [Xs]!, Xn!
  CPYERT [Xd]!, [Xs]!, Xn!

SQXTUN (sqxtun)
Signed saturating extract unsigned narrow
  SQXTUN Vbd, Van
  SQXTUN{2} Vd.Tb, Vn.Ta

LD1D (ld1d)
Contiguous load of unsigned doublewords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
  LD1D { Zt1.D-Zt2.D }, PNg/Z, [Xn|SP{, #imm, MUL VL}]
  LD1D { Zt1.D-Zt4.D }, PNg/Z, [Xn|SP{, #imm, MUL VL}]
  LD1D { Zt1.D-Zt2.D }, PNg/Z, [Xn|SP, Xm, LSL #3]
  LD1D { Zt1.D-Zt4.D }, PNg/Z, [Xn|SP, Xm, LSL #3]
  LD1D { Zt1.D, Zt2.D }, PNg/Z, [Xn|SP{, #imm, MUL VL}]
  LD1D { Zt1.D, Zt2.D, Zt3.D, Zt4.D }, PNg/Z, [Xn|SP{, #imm, MUL VL}]
  LD1D { Zt1.D, Zt2.D }, PNg/Z, [Xn|SP, Xm, LSL #3]
  LD1D { Zt1.D, Zt2.D, Zt3.D, Zt4.D }, PNg/Z, [Xn|SP, Xm, LSL #3]
  LD1D { Zt.D }, Pg/Z, [Zn.D{, #imm}]
  LD1D { Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
  LD1D { Zt.Q }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
  LD1D { Zt.D }, Pg/Z, [Xn|SP, Xm, LSL #3]
  LD1D { Zt.Q }, Pg/Z, [Xn|SP, Xm, LSL #3]
  LD1D { Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod #3]
  LD1D { Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod]
  LD1D { Zt.D }, Pg/Z, [Xn|SP, Zm.D, LSL #3]
  LD1D { Zt.D }, Pg/Z, [Xn|SP, Zm.D]
  LD1D { ZAtHV.D[Ws, offs] }, Pg/Z, [Xn|SP{, Xm, LSL #3}]

SETGPTN (setgptn)
Memory set with tag setting, unprivileged and non-temporal
  SETGPTN [Xd]!, Xn!, Xs
  SETGMTN [Xd]!, Xn!, Xs
  SETGETN [Xd]!, Xn!, Xs

USHR (ushr)
Unsigned shift right (immediate)
  USHR Dd, Dn, #shift
  USHR Vd.T, Vn.T, #shift

LDADD (ldadd)
Atomic add on word or doubleword in memory
  LDADD Ws, Wt, [Xn|SP]
  LDADDA Ws, Wt, [Xn|SP]
  LDADDAL Ws, Wt, [Xn|SP]
  LDADDL Ws, Wt, [Xn|SP]
  LDADD Xs, Xt, [Xn|SP]
  LDADDA Xs, Xt, [Xn|SP]
  LDADDAL Xs, Xt, [Xn|SP]
  LDADDL Xs, Xt, [Xn|SP]
  STADD Ws, [Xn|SP]  (alias of LDADD Ws, WZR, [Xn|SP])
  STADDL Ws, [Xn|SP]  (alias of LDADDL Ws, WZR, [Xn|SP])
  STADD Xs, [Xn|SP]  (alias of LDADD Xs, XZR, [Xn|SP])
  STADDL Xs, [Xn|SP]  (alias of LDADDL Xs, XZR, [Xn|SP])

DCPS1 (dcps1)
Debug change PE state to EL1
  DCPS1 {#imm}

STURB (sturb)
Store register byte (unscaled)
  STURB Wt, [Xn|SP{, #simm}]

RORV (rorv)
Rotate right variable
  RORV Wd, Wn, Wm
  RORV Xd, Xn, Xm

SRHADD (srhadd)
Signed rounding halving add
  SRHADD Vd.T, Vn.T, Vm.T
  SRHADD Zdn.T, Pg/M, Zdn.T, Zm.T

FVDOTB (fvdotb)
The instruction computes the fused sum-of-products of each vertical group of two 8-bit floating-point values held in the corresponding elements of the two first source vectors with the lower-numbered horizontal group of two 8-bit floating-point values in the indexed 32-bit group of the corresponding 128-bit segment of the second source vector.
The single-precision sum-of-products are scaled by 2
  FVDOTB ZA.S[Wv, offs, VGx4], { Zn1.B-Zn2.B }, Zm.B[index]

CMN (cmn)
CMN (extended register) -- A64
Compare negative (extended register)
  CMN Wn|WSP, Wm{, extend {#amount}}  (alias of ADDS WZR, Wn|WSP, Wm{, extend {#amount}})
  CMN Xn|SP, Rm{, extend {#amount}}  (alias of ADDS XZR, Xn|SP, Rm{, extend {#amount}})

CMN (immediate) -- A64
Compare negative (immediate)
  CMN Wn|WSP, #imm{, shift}  (alias of ADDS WZR, Wn|WSP, #imm{, shift})
  CMN Xn|SP, #imm{, shift}  (alias of ADDS XZR, Xn|SP, #imm{, shift})

CMN (shifted register) -- A64
Compare negative (shifted register)
  CMN Wn, Wm{, shift #amount}  (alias of ADDS WZR, Wn, Wm{, shift #amount})
  CMN Xn, Xm{, shift #amount}  (alias of ADDS XZR, Xn, Xm{, shift #amount})

SSHLL (sshll)
Signed shift left long (immediate)
  SSHLL{2} Vd.Ta, Vn.Tb, #shift

SB (sb)
Speculation barrier
  SB

FMINQV (fminqv)
Floating-point minimum of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as +Infinity.
  FMINQV Vd.T, Pg, Zn.Tb

TBL (tbl)
Table vector lookup
  TBL Vd.Ta, { Vn.16B }, Vm.Ta
  TBL Vd.Ta, { Vn.16B, Vn+1.16B }, Vm.Ta
  TBL Vd.Ta, { Vn.16B, Vn+1.16B, Vn+2.16B }, Vm.Ta
  TBL Vd.Ta, { Vn.16B, Vn+1.16B, Vn+2.16B, Vn+3.16B }, Vm.Ta
  TBL Zd.T, { Zn.T }, Zm.T
  TBL Zd.T, { Zn1.T, Zn2.T }, Zm.T

BFMOPS (bfmops)
The BFloat16 floating-point sum of outer products and subtract instruction works with a 32-bit element ZA tile.
  BFMOPS ZAda.S, Pn/M, Pm/M, Zn.H, Zm.H
  BFMOPS ZAda.H, Pn/M, Pm/M, Zn.H, Zm.H

LDTR (ldtr)
Load register (unprivileged)
  LDTR Wt, [Xn|SP{, #simm}]
  LDTR Xt, [Xn|SP{, #simm}]

SADDLT (saddlt)
Add the corresponding odd-numbered signed elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector.
This instruction is unpredicated.
  SADDLT Zd.T, Zn.Tb, Zm.Tb

SXTB (sxtb)
Sign-extend the least-significant sub-element of each active element of the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
  SXTB Zd.T, Pg/M, Zn.T
  SXTH Zd.T, Pg/M, Zn.T
  SXTW Zd.D, Pg/M, Zn.D

SXTB -- A64
Signed extend byte
  SXTB Wd, Wn  (alias of SBFM Wd, Wn, #0, #7)
  SXTB Xd, Wn  (alias of SBFM Xd, Xn, #0, #7)

UADDLV (uaddlv)
Unsigned sum long across vector
  UADDLV Vd, Vn.T

LDUMAXAH (ldumaxah)
Atomic unsigned maximum on halfword in memory
  LDUMAXAH Ws, Wt, [Xn|SP]
  LDUMAXALH Ws, Wt, [Xn|SP]
  LDUMAXH Ws, Wt, [Xn|SP]
  LDUMAXLH Ws, Wt, [Xn|SP]

MVN (mvn)
MVN -- A64
Bitwise NOT (vector)
  MVN Vd.T, Vn.T  (alias of NOT Vd.T, Vn.T)

MVN -- A64
Bitwise NOT
  MVN Wd, Wm{, shift #amount}  (alias of ORN Wd, WZR, Wm{, shift #amount})
  MVN Xd, Xm{, shift #amount}  (alias of ORN Xd, XZR, Xm{, shift #amount})

DUPQ (dupq)
Unconditionally broadcast the indexed element within each 128-bit source vector segment to all elements of the corresponding destination vector segment. This instruction is unpredicated.
  DUPQ Zd.T, Zn.T[imm]

FRINTX (frintx)
Floating-point round to integral exact, using current rounding mode (vector)
  FRINTX Vd.T, Vn.T
  FRINTX Vd.T, Vn.T
  FRINTX Hd, Hn
  FRINTX Sd, Sn
  FRINTX Dd, Dn

LDNT1B (ldnt1b)
Contiguous load non-temporal of bytes to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
  LDNT1B { Zt1.B-Zt2.B }, PNg/Z, [Xn|SP{, #imm, MUL VL}]
  LDNT1B { Zt1.B-Zt4.B }, PNg/Z, [Xn|SP{, #imm, MUL VL}]
  LDNT1B { Zt1.B-Zt2.B }, PNg/Z, [Xn|SP, Xm]
  LDNT1B { Zt1.B-Zt4.B }, PNg/Z, [Xn|SP, Xm]
  LDNT1B { Zt1.B, Zt2.B }, PNg/Z, [Xn|SP{, #imm, MUL VL}]
  LDNT1B { Zt1.B, Zt2.B, Zt3.B, Zt4.B }, PNg/Z, [Xn|SP{, #imm, MUL VL}]
  LDNT1B { Zt1.B, Zt2.B }, PNg/Z, [Xn|SP, Xm]
  LDNT1B { Zt1.B, Zt2.B, Zt3.B, Zt4.B }, PNg/Z, [Xn|SP, Xm]
  LDNT1B { Zt.S }, Pg/Z, [Zn.S{, Xm}]
  LDNT1B { Zt.D }, Pg/Z, [Zn.D{, Xm}]
  LDNT1B { Zt.B }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
  LDNT1B { Zt.B }, Pg/Z, [Xn|SP, Xm]

SHA1H (sha1h)
SHA1 fixed rotate
  SHA1H Sd, Sn

CPYFPTWN (cpyfptwn)
Memory copy forward-only, reads and writes unprivileged, writes non-temporal
  CPYFPTWN [Xd]!, [Xs]!, Xn!
  CPYFMTWN [Xd]!, [Xs]!, Xn!
  CPYFETWN [Xd]!, [Xs]!, Xn!

FSUBR (fsubr)
Reversed subtract from an immediate each active floating-point element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate may take the value +0.5 or +1.0 only. Inactive elements in the destination vector register remain unmodified.
  FSUBR Zdn.T, Pg/M, Zdn.T, const
  FSUBR Zdn.T, Pg/M, Zdn.T, Zm.T

LDNF1B (ldnf1b)
Contiguous load with non-faulting behavior of unsigned bytes to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
  LDNF1B { Zt.B }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
  LDNF1B { Zt.H }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
  LDNF1B { Zt.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
  LDNF1B { Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]

FMAXNMQV (fmaxnmqv)
Floating-point maximum number of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as the default NaN.
  FMAXNMQV Vd.T, Pg, Zn.Tb

SQINCP (sqincp)
Counts the number of true elements in the source predicate and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits.
  SQINCP Xdn, Pm.T, Wdn
  SQINCP Xdn, Pm.T
  SQINCP Zdn.T, Pm.T

CPYPRTN (cpyprtn)
Memory copy, reads unprivileged, reads and writes non-temporal
  CPYPRTN [Xd]!, [Xs]!, Xn!
  CPYMRTN [Xd]!, [Xs]!, Xn!
  CPYERTN [Xd]!, [Xs]!, Xn!

SMLSLL (smlsll)
This signed integer multiply-subtract long-long instruction multiplies each signed 8-bit or 16-bit element in the one, two, or four first source vectors with each signed 8-bit or 16-bit indexed element of second source vector, widens each product to 32-bits or 64-bits and destructively subtracts these values from the corresponding 32-bit or 64-bit elements of the ZA quad-vector groups.
  SMLSLL ZA.S[Wv, offs1:offs4], Zn.B, Zm.B[index]
  SMLSLL ZA.D[Wv, offs1:offs4], Zn.H, Zm.H[index]
  SMLSLL ZA.S[Wv, offs1:offs4{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]
  SMLSLL ZA.D[Wv, offs1:offs4{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]
  SMLSLL ZA.S[Wv, offs1:offs4{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]
  SMLSLL ZA.D[Wv, offs1:offs4{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]
  SMLSLL ZA.T[Wv, offs1:offs4], Zn.Tb, Zm.Tb
  SMLSLL ZA.T[Wv, offs1:offs4{, VGx2}], { Zn1.Tb-Zn2.Tb }, Zm.Tb
  SMLSLL ZA.T[Wv, offs1:offs4{, VGx4}], { Zn1.Tb-Zn4.Tb }, Zm.Tb
  SMLSLL ZA.T[Wv, offs1:offs4{, VGx2}], { Zn1.Tb-Zn2.Tb }, { Zm1.Tb-Zm2.Tb }
  SMLSLL ZA.T[Wv, offs1:offs4{, VGx4}], { Zn1.Tb-Zn4.Tb }, { Zm1.Tb-Zm4.Tb }

FRINTA (frinta)
Floating-point round to integral, to nearest with ties to away (vector)
  FRINTA Vd.T, Vn.T
  FRINTA Vd.T, Vn.T
  FRINTA Hd, Hn
  FRINTA Sd, Sn
  FRINTA Dd, Dn
  FRINTA { Zd1.S-Zd2.S }, { Zn1.S-Zn2.S }
  FRINTA { Zd1.S-Zd4.S }, { Zn1.S-Zn4.S }

HVC (hvc)
Hypervisor call
  HVC #imm

STLUR (stlur)
Store-release SIMD&FP register (unscaled offset)
  STLUR Bt, [Xn|SP{, #simm}]
  STLUR Ht, [Xn|SP{, #simm}]
  STLUR St, [Xn|SP{, #simm}]
  STLUR Dt, [Xn|SP{, #simm}]
  STLUR Qt, [Xn|SP{, #simm}]
  STLUR Wt, [Xn|SP{, #simm}]
  STLUR Xt, [Xn|SP{, #simm}]

AESD (aesd)
AES single round decryption
  AESD Vd.16B, Vn.16B
  AESD Zdn.B, Zdn.B, Zm.B

FRINT64Z (frint64z)
Floating-point round to 64-bit integer toward zero (vector)
  FRINT64Z Vd.T, Vn.T
  FRINT64Z Sd, Sn
  FRINT64Z Dd, Dn

LD1ROB (ld1rob)
Load thirty-two contiguous bytes to elements of a 256-bit (octaword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base address.
  LD1ROB { Zt.B }, Pg/Z, [Xn|SP{, #imm}]
  LD1ROB { Zt.B }, Pg/Z, [Xn|SP, Xm]

SBCLB (sbclb)
Subtract the even-numbered elements of the first source vector and the inverted 1-bit carry from the least-significant bit of the odd-numbered elements of the second source vector from the even-numbered elements of the destination and accumulator
vector. The 1-bit carry output is placed in the corresponding odd-numbered element of the destination vector.
  SBCLB Zda.T, Zn.T, Zm.T

WHILEHI (whilehi)
Generate a predicate that starting from the highest numbered element is true while the decrementing value of the first, unsigned scalar operand is higher than the second scalar operand and false thereafter down to the lowest numbered element.
  WHILEHI Pd.T, Rn, Rm
  WHILEHI PNd.T, Xn, Xm, vl
  WHILEHI { Pd1.T, Pd2.T }, Xn, Xm

PTEST (ptest)
Sets the
  PTEST Pg, Pn.B

SMOPA (smopa)
This instruction works with a 32-bit element ZA tile.
  SMOPA ZAda.S, Pn/M, Pm/M, Zn.H, Zm.H
  SMOPA ZAda.S, Pn/M, Pm/M, Zn.B, Zm.B
  SMOPA ZAda.D, Pn/M, Pm/M, Zn.H, Zm.H

SQINCH (sqinch)
Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits.
  SQINCH Xdn, Wdn{, pattern{, MUL #imm}}
  SQINCH Xdn{, pattern{, MUL #imm}}
  SQINCH Zdn.H{, pattern{, MUL #imm}}

HISTCNT (histcnt)
This instruction compares each active 32 or 64-bit element of the first source vector with all active elements with an element number less than or equal to its own in the second source vector, and places the count of matching elements in the corresponding element of the destination vector. Inactive elements in the destination vector are set to zero.
  HISTCNT Zd.T, Pg/Z, Zn.T, Zm.T

UADDW (uaddw)
Unsigned add wide
  UADDW{2} Vd.Ta, Vn.Ta, Vm.Tb

UABALT (uabalt)
Compute the absolute difference between odd-numbered unsigned elements of the second source vector and corresponding elements of the first source vector, and destructively add to the overlapping double-width elements of the addend vector.
This instruction is unpredicated.
  UABALT Zda.T, Zn.Tb, Zm.Tb

FVDOTT (fvdott)
The instruction computes the fused sum-of-products of each vertical group of two 8-bit floating-point values held in the corresponding elements of the two first source vectors with the higher-numbered horizontal group of two 8-bit floating-point values in the indexed 32-bit group of the corresponding 128-bit segment of the second source vector. The single-precision sum-of-products are scaled by 2
  FVDOTT ZA.S[Wv, offs, VGx4], { Zn1.B-Zn2.B }, Zm.B[index]

BFMMLA (bfmmla)
BFloat16 floating-point matrix multiply-accumulate into 2x2 matrix
  BFMMLA Vd.4S, Vn.8H, Vm.8H
  BFMMLA Zda.S, Zn.H, Zm.H

FRINT32X (frint32x)
Floating-point round to 32-bit integer, using current rounding mode (vector)
  FRINT32X Vd.T, Vn.T
  FRINT32X Sd, Sn
  FRINT32X Dd, Dn

FMLALT (fmlalt)
This 8-bit floating-point multiply-add long instruction widens the odd 8-bit elements in the first and second source vectors to half-precision format and multiplies the corresponding elements. The intermediate products are scaled by 2
  FMLALT Zda.H, Zn.B, Zm.B
  FMLALT Zda.H, Zn.B, Zm.B[imm]
  FMLALT Zda.S, Zn.H, Zm.H
  FMLALT Zda.S, Zn.H, Zm.H[imm]

LDP (ldp)
Load pair of SIMD&FP registers
  LDP St1, St2, [Xn|SP], #imm
  LDP Dt1, Dt2, [Xn|SP], #imm
  LDP Qt1, Qt2, [Xn|SP], #imm
  LDP St1, St2, [Xn|SP, #imm]!
  LDP Dt1, Dt2, [Xn|SP, #imm]!
  LDP Qt1, Qt2, [Xn|SP, #imm]!
  LDP St1, St2, [Xn|SP{, #imm}]
  LDP Dt1, Dt2, [Xn|SP{, #imm}]
  LDP Qt1, Qt2, [Xn|SP{, #imm}]
  LDP Wt1, Wt2, [Xn|SP], #imm
  LDP Xt1, Xt2, [Xn|SP], #imm
  LDP Wt1, Wt2, [Xn|SP, #imm]!
  LDP Xt1, Xt2, [Xn|SP, #imm]!
  LDP Wt1, Wt2, [Xn|SP{, #imm}]
  LDP Xt1, Xt2, [Xn|SP{, #imm}]

UADDWB (uaddwb)
Add the even-numbered unsigned elements of the second source vector to the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector.
This instruction is unpredicated.
  UADDWB Zd.T, Zn.T, Zm.Tb

STZGM (stzgm)
Store Allocation Tag and zero multiple
  STZGM Xt, [Xn|SP]

SMSTART (smstart)
SMSTART -- A64
Enables access to Streaming SVE mode and SME architectural state
  SMSTART {option}  (alias of MSR pstatefield, #1)

RCWSCASP (rcwscasp)
Read check write software compare and swap quadword in memory
  RCWSCASP Xs, X(s+1), Xt, X(t+1), [Xn|SP]
  RCWSCASPA Xs, X(s+1), Xt, X(t+1), [Xn|SP]
  RCWSCASPAL Xs, X(s+1), Xt, X(t+1), [Xn|SP]
  RCWSCASPL Xs, X(s+1), Xt, X(t+1), [Xn|SP]

FDIV (fdiv)
Floating-point divide (vector)
  FDIV Vd.T, Vn.T, Vm.T
  FDIV Vd.T, Vn.T, Vm.T
  FDIV Hd, Hn, Hm
  FDIV Sd, Sn, Sm
  FDIV Dd, Dn, Dm
  FDIV Zdn.T, Pg/M, Zdn.T, Zm.T

UMULLB (umullb)
Multiply the corresponding even-numbered unsigned elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.
  UMULLB Zd.T, Zn.Tb, Zm.Tb
  UMULLB Zd.S, Zn.H, Zm.H[imm]
  UMULLB Zd.D, Zn.S, Zm.S[imm]

PTRUE (ptrue)
Set elements of the destination predicate to true if the element number satisfies the named predicate constraint, or to false otherwise. If the constraint specifies more elements than are available at the current vector length then all elements of the destination predicate are set to false.
  PTRUE Pd.T{, pattern}
  PTRUE PNd.T

SHA512H (sha512h)
SHA512 hash update part 1
  SHA512H Qd, Qn, Vm.2D

SMOV (smov)
Signed move vector element to general-purpose register
  SMOV Wd, Vn.Ts[index]
  SMOV Xd, Vn.Ts[index]

CPY (cpy)
Copy a signed integer immediate to each active element in the destination vector.
Inactive elements in the destination vector register are set to zero.
  CPY Zd.T, Pg/Z, #imm{, shift}
  CPY Zd.T, Pg/M, #imm{, shift}
  CPY Zd.T, Pg/M, Rn|SP
  CPY Zd.T, Pg/M, Vn

RADDHN (raddhn)
Rounding add returning high narrow
  RADDHN{2} Vd.Tb, Vn.Ta, Vm.Ta

NBSL (nbsl)
Selects bits from the first source vector where the corresponding bit in the third source vector is '1', and from the second source vector where the corresponding bit in the third source vector is '0'. The inverted result is placed destructively in the destination and first source vector. This instruction is unpredicated.
  NBSL Zdn.D, Zdn.D, Zm.D, Zk.D

F1CVT (f1cvt)
Convert each 8-bit floating-point element of the source vector to half-precision while downscaling the value, and place the results in the corresponding 16-bit elements of the destination vectors. F1CVT scales the values by 2
  F1CVT { Zd1.H-Zd2.H }, Zn.B
  F2CVT { Zd1.H-Zd2.H }, Zn.B
  F1CVT Zd.H, Zn.B
  F2CVT Zd.H, Zn.B

FACGT (facgt)
Compare active absolute values of floating-point elements in the first source vector with corresponding absolute values of elements in the second source vector, and place the boolean results of the specified comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
  FACGT Pd.T, Pg/Z, Zn.T, Zm.T
  FACGE Pd.T, Pg/Z, Zn.T, Zm.T
  FACGT Hd, Hn, Hm
  FACGT Vd, Vn, Vm
  FACGT Vd.T, Vn.T, Vm.T
  FACGT Vd.T, Vn.T, Vm.T

MAD (mad)
Multiply the corresponding active elements of the first and second source vectors and add to elements of the third (addend) vector. Destructively place the results in the destination and first source (multiplicand) vector.
Inactive elements in the destination vector register remain unmodified.
  MAD Zdn.T, Pg/M, Zm.T, Za.T

SQDMULH (sqdmulh)
Signed saturating doubling multiply returning high half (by element)
  SQDMULH Vd, Vn, Vm.Ts[index]
  SQDMULH Vd.T, Vn.T, Vm.Ts[index]
  SQDMULH Vd, Vn, Vm
  SQDMULH Vd.T, Vn.T, Vm.T
  SQDMULH { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T
  SQDMULH { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T
  SQDMULH { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T }
  SQDMULH { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }
  SQDMULH Zd.T, Zn.T, Zm.T
  SQDMULH Zd.H, Zn.H, Zm.H[imm]
  SQDMULH Zd.S, Zn.S, Zm.S[imm]
  SQDMULH Zd.D, Zn.D, Zm.D[imm]

URSHL (urshl)
Unsigned rounding shift left (register)
  URSHL Dd, Dn, Dm
  URSHL Vd.T, Vn.T, Vm.T
  URSHL { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T
  URSHL { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T
  URSHL { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T }
  URSHL { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }
  URSHL Zdn.T, Pg/M, Zdn.T, Zm.T

LD3 (ld3)
Load multiple 3-element structures to three registers
  LD3 {Vt.T, Vt2.T, Vt3.T }, [Xn|SP]
  LD3 {Vt.T, Vt2.T, Vt3.T }, [Xn|SP], imm
  LD3 {Vt.T, Vt2.T, Vt3.T }, [Xn|SP], Xm
  LD3 {Vt.B, Vt2.B, Vt3.B }[index], [Xn|SP]
  LD3 {Vt.H, Vt2.H, Vt3.H }[index], [Xn|SP]
  LD3 {Vt.S, Vt2.S, Vt3.S }[index], [Xn|SP]
  LD3 {Vt.D, Vt2.D, Vt3.D }[index], [Xn|SP]
  LD3 {Vt.B, Vt2.B, Vt3.B }[index], [Xn|SP], #3
  LD3 {Vt.B, Vt2.B, Vt3.B }[index], [Xn|SP], Xm
  LD3 {Vt.H, Vt2.H, Vt3.H }[index], [Xn|SP], #6
  LD3 {Vt.H, Vt2.H, Vt3.H }[index], [Xn|SP], Xm
  LD3 {Vt.S, Vt2.S, Vt3.S }[index], [Xn|SP], #12
  LD3 {Vt.S, Vt2.S, Vt3.S }[index], [Xn|SP], Xm
  LD3 {Vt.D, Vt2.D, Vt3.D }[index], [Xn|SP], #24
  LD3 {Vt.D, Vt2.D, Vt3.D }[index], [Xn|SP], Xm

RDFFR (rdffr)
Read the first-fault register (
  RDFFR Pd.B
  RDFFR Pd.B, Pg/Z

FTMAD (ftmad)
The
  FTMAD Zdn.T, Zdn.T, Zm.T, #imm

FABS (fabs)
Floating-point absolute value (vector)
  FABS Vd.T, Vn.T
  FABS Vd.T, Vn.T
  FABS Hd, Hn
  FABS Sd, Sn
  FABS Dd, Dn
  FABS Zd.T, Pg/M, Zn.T

LDXR (ldxr)
Load exclusive register
  LDXR Wt, [Xn|SP{, #0}]
  LDXR Xt, [Xn|SP{, #0}]

LDFF1D (ldff1d)
Gather load with first-faulting behavior of doublewords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
  LDFF1D { Zt.D }, Pg/Z, [Zn.D{, #imm}]
  LDFF1D { Zt.D }, Pg/Z, [Xn|SP{, Xm, LSL #3}]
  LDFF1D { Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod #3]
  LDFF1D { Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod]
  LDFF1D { Zt.D }, Pg/Z, [Xn|SP, Zm.D, LSL #3]
  LDFF1D { Zt.D }, Pg/Z, [Xn|SP, Zm.D]

FCMLT (fcmlt)
Floating-point compare less than zero (vector)
  FCMLT Hd, Hn, #0.0
  FCMLT Vd, Vn, #0.0
  FCMLT Vd.T, Vn.T, #0.0
  FCMLT Vd.T, Vn.T, #0.0

FCMLT (vectors)
Compare active floating-point elements in the first source vector being less than corresponding elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
  FCMLT Pd.T, Pg/Z, Zm.T, Zn.T  (alias of FCMGT Pd.T, Pg/Z, Zn.T, Zm.T)

FLOGB (flogb)
This instruction returns the signed integer base 2 logarithm of each floating-point input element |
  FLOGB Zd.T, Pg/M, Zn.T

LDNF1SB (ldnf1sb)
Contiguous load with non-faulting behavior of signed bytes to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
  LDNF1SB { Zt.H }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
  LDNF1SB { Zt.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
  LDNF1SB { Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]

FMUL (fmul)
Floating-point multiply (by element)
  FMUL Hd, Hn, Vm.H[index]
  FMUL Vd, Vn, Vm.Ts[index]
  FMUL Vd.T, Vn.T, Vm.H[index]
  FMUL Vd.T, Vn.T, Vm.Ts[index]
  FMUL Vd.T, Vn.T, Vm.T
  FMUL Vd.T, Vn.T, Vm.T
  FMUL Hd, Hn, Hm
  FMUL Sd, Sn, Sm
  FMUL Dd, Dn, Dm
  FMUL Zdn.T, Pg/M, Zdn.T, const
  FMUL Zdn.T, Pg/M, Zdn.T, Zm.T
  FMUL Zd.T, Zn.T, Zm.T
  FMUL Zd.H, Zn.H, Zm.H[imm]
  FMUL Zd.S, Zn.S, Zm.S[imm]
  FMUL Zd.D, Zn.D, Zm.D[imm]

CINV (cinv)
CINV -- A64
Conditional invert
  CINV Wd, Wn, invcond  (alias of CSINV Wd, Wn, Wm, cond)
  CINV Xd, Xn, invcond  (alias of CSINV Xd, Xn, Xm, cond)

AESIMC (aesimc)
AES inverse mix columns
  AESIMC Vd.16B, Vn.16B
  AESIMC Zdn.B, Zdn.B

BRKBS (brkbs)
Sets destination predicate elements up to but not including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the
  BRKBS Pd.B, Pg/Z, Pn.B

RCWSWPP (rcwswpp)
Read check write swap quadword in memory
  RCWSWPP Xt1, Xt2, [Xn|SP]
  RCWSWPPA Xt1, Xt2, [Xn|SP]
  RCWSWPPAL Xt1, Xt2, [Xn|SP]
  RCWSWPPL Xt1, Xt2, [Xn|SP]

LDNT1SW (ldnt1sw)
Gather load non-temporal of signed words to active elements of a vector register from memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset.
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
  LDNT1SW { Zt.D }, Pg/Z, [Zn.D{, Xm}]

LDURH (ldurh)
Load register halfword (unscaled)
  LDURH Wt, [Xn|SP{, #simm}]

UMNEGL (umnegl)
UMNEGL -- A64
Unsigned multiply-negate long
  UMNEGL Xd, Wn, Wm  (alias of UMSUBL Xd, Wn, Wm, XZR)

STNT1D (stnt1d)
Contiguous store non-temporal of doublewords from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
  STNT1D { Zt1.D-Zt2.D }, PNg, [Xn|SP{, #imm, MUL VL}]
  STNT1D { Zt1.D-Zt4.D }, PNg, [Xn|SP{, #imm, MUL VL}]
  STNT1D { Zt1.D-Zt2.D }, PNg, [Xn|SP, Xm, LSL #3]
  STNT1D { Zt1.D-Zt4.D }, PNg, [Xn|SP, Xm, LSL #3]
  STNT1D { Zt1.D, Zt2.D }, PNg, [Xn|SP{, #imm, MUL VL}]
  STNT1D { Zt1.D, Zt2.D, Zt3.D, Zt4.D }, PNg, [Xn|SP{, #imm, MUL VL}]
  STNT1D { Zt1.D, Zt2.D }, PNg, [Xn|SP, Xm, LSL #3]
  STNT1D { Zt1.D, Zt2.D, Zt3.D, Zt4.D }, PNg, [Xn|SP, Xm, LSL #3]
  STNT1D { Zt.D }, Pg, [Zn.D{, Xm}]
  STNT1D { Zt.D }, Pg, [Xn|SP{, #imm, MUL VL}]
  STNT1D { Zt.D }, Pg, [Xn|SP, Xm, LSL #3]

SMLALB (smlalb)
Multiply the corresponding even-numbered signed elements of the first and second source vectors and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.
  SMLALB Zda.T, Zn.Tb, Zm.Tb
  SMLALB Zda.S, Zn.H, Zm.H[imm]
  SMLALB Zda.D, Zn.S, Zm.S[imm]

NGCS (ngcs)
NGCS -- A64
Negate with carry, setting flags
  NGCS Wd, Wm  (alias of SBCS Wd, WZR, Wm)
  NGCS Xd, Xm  (alias of SBCS Xd, XZR, Xm)

REVB (revb)
Reverse the order of 8-bit bytes, 16-bit halfwords or 32-bit words within each active element of the source vector, and place the results in the corresponding elements of the destination vector.
Inactive elements in the destination vector register remain unmodified.
  REVB Zd.T, Pg/M, Zn.T
  REVH Zd.T, Pg/M, Zn.T
  REVW Zd.D, Pg/M, Zn.D

UMAXQV (umaxqv)
Unsigned maximum of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as zero.
  UMAXQV Vd.T, Pg, Zn.Tb

FMAX (fmax)
Floating-point maximum (vector)
  FMAX Vd.T, Vn.T, Vm.T
  FMAX Vd.T, Vn.T, Vm.T
  FMAX Hd, Hn, Hm
  FMAX Sd, Sn, Sm
  FMAX Dd, Dn, Dm
  FMAX { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T
  FMAX { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T
  FMAX { Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T }
  FMAX { Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }
  FMAX Zdn.T, Pg/M, Zdn.T, const
  FMAX Zdn.T, Pg/M, Zdn.T, Zm.T

ADDVL (addvl)
Add the current vector register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source general-purpose register or current stack pointer, and place the result in the 64-bit destination general-purpose register or current stack pointer.
  ADDVL Xd|SP, Xn|SP, #imm

CPYPTWN (cpyptwn)
Memory copy, reads and writes unprivileged, writes non-temporal
  CPYPTWN [Xd]!, [Xs]!, Xn!
  CPYMTWN [Xd]!, [Xs]!, Xn!
  CPYETWN [Xd]!, [Xs]!, Xn!

LDAR (ldar)
Load-acquire register
  LDAR Wt, [Xn|SP{, #0}]
  LDAR Xt, [Xn|SP{, #0}]

SRSRA (srsra)
Signed rounding shift right and accumulate (immediate)
  SRSRA Dd, Dn, #shift
  SRSRA Vd.T, Vn.T, #shift
  SRSRA Zda.T, Zn.T, #const

SQCVTN (sqcvtn)
Saturate the signed integer value in each element of the group of two source vectors to half the original source element width, and place the two-way interleaved results in the half-width destination elements.
  SQCVTN Zd.H, { Zn1.S-Zn2.S }
  SQCVTN Zd.T, { Zn1.Tb-Zn4.Tb }

MSB (msb)
Multiply the corresponding active elements of the first and second source vectors and subtract from elements of the third (addend) vector.
Destructively place the results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.MSBZdn.T, Pg/M, Zm.T, Za.TNMATCHnmatchNMATCHThis instruction compares each active 8-bit or 16-bit character in the first source vector with all of the characters in the corresponding 128-bit segment of the second source vector. Where the first source element detects no matching characters in the second segment it places true in the corresponding element of the destination predicate, otherwise false. Inactive elements in the destination predicate register are set to zero. Sets the NMATCHPd.T, Pg/Z, Zn.T, Zm.T CPYFPWTWN cpyfpwtwn CPYFPWTWN>Memory copy forward-only, writes unprivileged and non-temporalCPYFPWTWN [Xd]!, [Xs]!, Xn!CPYFMWTWN [Xd]!, [Xs]!, Xn!CPYFEWTWN [Xd]!, [Xs]!, Xn!SMAXsmaxSMAXSigned maximum (vector) SMAXVd.T, Vn.T, Vm.TSMAXWd, Wn, #simmSMAXXd, Xn, #simm.SMAX{ Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T.SMAX{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T9SMAX{ Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T }9SMAX{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }SMAXWd, Wn, WmSMAXXd, Xn, XmSMAXZdn.T, Pg/M, Zdn.T, Zm.TSMAXZdn.T, Zdn.T, #immSUNPKHIsunpkhiSUNPKHIUnpack elements from the lowest or highest half of the source vector and then sign-extend them to place in elements of twice their size within the destination vector. This instruction is unpredicated.SUNPKHIZd.T, Zn.TbSUNPKLOZd.T, Zn.TbRPRFMrprfmRPRFMRange prefetch memory"RPRFM (rprfop|#imm6), Xm, [Xn|SP] SQDMLSLBT sqdmlslbt SQDMLSLBTMultiply then double the corresponding even-numbered signed elements of the first and odd-numbered signed elements of the second source vector. 
Each intermediate value is saturated to the double-width N-bit value's signed integer range -2SQDMLSLBTZda.T, Zn.Tb, Zm.TbUSRAusraUSRA/Unsigned shift right and accumulate (immediate)USRA Dd, Dn, #shiftUSRAVd.T, Vn.T, #shiftUSRAZda.T, Zn.T, #constLDAPURSBldapursbLDAPURSB1Load-acquire RCpc register signed byte (unscaled)LDAPURSBWt, [Xn|SP{, #simm}]LDAPURSBXt, [Xn|SP{, #simm}]BFCVTNTbfcvtntBFCVTNT0Convert to BFloat16 from single-precision in each active floating-point element of the source vector, and place the results in the odd-numbered 16-bit elements of the destination vector, leaving the even-numbered elements unchanged. Inactive elements in the destination vector register remain unmodified.BFCVTNTZd.H, Pg/M, Zn.SCFINVcfinvCFINVInvert carry flagCFINVSSUBLTBssubltbSSUBLTBSubtract the even-numbered signed elements of the second source vector from the odd-numbered signed elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.SSUBLTBZd.T, Zn.Tb, Zm.TbRCWCLRPrcwclrpRCWCLRP7Read check write atomic bit clear on quadword in memoryRCWCLRPXt1, Xt2, [Xn|SP]RCWCLRPAXt1, Xt2, [Xn|SP]RCWCLRPALXt1, Xt2, [Xn|SP]RCWCLRPLXt1, Xt2, [Xn|SP]FTSMULftsmulFTSMULThe FTSMULZd.T, Zn.T, Zm.TLDAPRHldaprhLDAPRH#Load-acquire RCpc register halfwordLDAPRHWt, [Xn|SP {, #0}]ADDPaddpADDPAdd pair of elements (scalar)ADDP Dd, Vn.2DADDPVd.T, Vn.T, Vm.TADDPZdn.T, Pg/M, Zdn.T, Zm.TFDUPfdupFDUPUnconditionally broadcast the floating-point immediate into each element of the destination vector. This instruction is unpredicated.FDUPZd.T, #constCMPEQcmpeqCMPEQCompare active integer elements in the source vector with an immediate, and place the boolean results of the specified comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Sets the CMPEQPd.T, Pg/Z, Zn.T, #immCMPGTPd.T, Pg/Z, Zn.T, #immCMPGEPd.T, Pg/Z, Zn.T, #immCMPHIPd.T, Pg/Z, Zn.T, #immCMPHSPd.T, Pg/Z, Zn.T, #immCMPLTPd.T, Pg/Z, Zn.T, #immCMPLEPd.T, Pg/Z, Zn.T, #immCMPLOPd.T, Pg/Z, Zn.T, #immCMPLSPd.T, Pg/Z, Zn.T, #immCMPNEPd.T, Pg/Z, Zn.T, #immCMPEQPd.T, Pg/Z, Zn.T, Zm.DCMPGTPd.T, Pg/Z, Zn.T, Zm.DCMPGEPd.T, Pg/Z, Zn.T, Zm.DCMPHIPd.T, Pg/Z, Zn.T, Zm.DCMPHSPd.T, Pg/Z, Zn.T, Zm.DCMPLTPd.T, Pg/Z, Zn.T, Zm.DCMPLEPd.T, Pg/Z, Zn.T, Zm.DCMPLOPd.T, Pg/Z, Zn.T, Zm.DCMPLSPd.T, Pg/Z, Zn.T, Zm.DCMPNEPd.T, Pg/Z, Zn.T, Zm.DCMPEQPd.T, Pg/Z, Zn.T, Zm.TCMPGTPd.T, Pg/Z, Zn.T, Zm.TCMPGEPd.T, Pg/Z, Zn.T, Zm.TCMPHIPd.T, Pg/Z, Zn.T, Zm.TCMPHSPd.T, Pg/Z, Zn.T, Zm.TCMPNEPd.T, Pg/Z, Zn.T, Zm.TRSUBHNTrsubhntRSUBHNT9Subtract each vector element of the second source vector from the corresponding vector element in the first source vector, and place the most significant rounded half of the result in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. This instruction is unpredicated.RSUBHNTZd.T, Zn.Tb, Zm.TbBRKNSbrknsBRKNSIf the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise leaves the destination and second source predicate unchanged. 
Sets the BRKNSPdm.B, Pg/Z, Pn.B, Pdm.BSHA1Csha1cSHA1CSHA1 hash update (choose)SHA1CQd, Sn, Vm.4SST3Dst3dST3D8Contiguous store three-doubleword structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,8ST3D{ Zt1.D, Zt2.D, Zt3.D }, Pg, [Xn|SP{, #imm, MUL VL}]4ST3D{ Zt1.D, Zt2.D, Zt3.D }, Pg, [Xn|SP, Xm, LSL #3]FMSUBfmsubFMSUB/Floating-point fused multiply-subtract (scalar)FMSUBHd, Hn, Hm, HaFMSUBSd, Sn, Sm, SaFMSUBDd, Dn, Dm, DaST1Bst1bST1BContiguous store of bytes from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.1ST1B{ Zt1.B-Zt2.B }, PNg, [Xn|SP{, #imm, MUL VL}]1ST1B{ Zt1.B-Zt4.B }, PNg, [Xn|SP{, #imm, MUL VL}]%ST1B{ Zt1.B-Zt2.B }, PNg, [Xn|SP, Xm]%ST1B{ Zt1.B-Zt4.B }, PNg, [Xn|SP, Xm]2ST1B{ Zt1.B, Zt2.B }, PNg, [Xn|SP{, #imm, MUL VL}]@ST1B{ Zt1.B, Zt2.B, Zt3.B, Zt4.B }, PNg, [Xn|SP{, #imm, MUL VL}]&ST1B{ Zt1.B, Zt2.B }, PNg, [Xn|SP, Xm]4ST1B{ Zt1.B, Zt2.B, Zt3.B, Zt4.B }, PNg, [Xn|SP, Xm] ST1B{ Zt.S }, Pg, [Zn.S{, #imm}] ST1B{ Zt.D }, Pg, [Zn.D{, #imm}])ST1B{ Zt.T }, Pg, [Xn|SP{, #imm, MUL VL}]ST1B{ Zt.T }, Pg, [Xn|SP, Xm]$ST1B{ Zt.D }, Pg, [Xn|SP, Zm.D, mod]$ST1B{ Zt.S }, Pg, [Xn|SP, Zm.S, mod]ST1B{ Zt.D }, Pg, [Xn|SP, Zm.D],ST1B{ ZA0HV.B[Ws, offs] }, Pg, [Xn|SP{, Xm}]SDIVRsdivrSDIVRSigned reversed divide active elements of the second source vector by corresponding elements of the first source vector and destructively place the quotient in the corresponding elements of the first source vector. 
Inactive elements in the destination vector register remain unmodified.SDIVRZdn.T, Pg/M, Zdn.T, Zm.TSHSUBshsubSHSUBSigned halving subtractSHSUBVd.T, Vn.T, Vm.TSHSUBZdn.T, Pg/M, Zdn.T, Zm.T SHA256SU0 sha256su0 SHA256SU0SHA256 schedule update 0SHA256SU0Vd.4S, Vn.4SSQRSHRNsqrshrnSQRSHRN8Signed saturating rounded shift right narrow (immediate)SQRSHRNVbd, Van, #shift SQRSHRN{2} Vd.Tb, Vn.Ta, #shift$SQRSHRNZd.H, { Zn1.S-Zn2.S }, #const&SQRSHRNZd.T, { Zn1.Tb-Zn4.Tb }, #constFCCMPfccmpFCCMP1Floating-point conditional quiet compare (scalar)FCCMPHn, Hm, #nzcv, condFCCMPSn, Sm, #nzcv, condFCCMPDn, Dm, #nzcv, condFNMADDfnmaddFNMADD2Floating-point negated fused multiply-add (scalar)FNMADDHd, Hn, Hm, HaFNMADDSd, Sn, Sm, SaFNMADDDd, Dn, Dm, DaFMLALLBTfmlallbtFMLALLBT This 8-bit floating-point multiply-add long-long instruction widens the second 8-bit element of each 32-bit container in the first and second source vectors to single-precision format and multiplies the corresponding elements. The intermediate products are scaled by 2FMLALLBTZda.S, Zn.B, Zm.BFMLALLBTZda.S, Zn.B, Zm.B[imm]SQADDsqaddSQADDSigned saturating addSQADDVd, Vn, VmSQADDVd.T, Vn.T, Vm.TSQADDZdn.T, Pg/M, Zdn.T, Zm.T SQADDZdn.T, Zdn.T, #imm{, shift}SQADDZd.T, Zn.T, Zm.TSQRSHLsqrshlSQRSHL0Signed saturating rounding shift left (register)SQRSHLVd, Vn, VmSQRSHLVd.T, Vn.T, Vm.TSQRSHLZdn.T, Pg/M, Zdn.T, Zm.TPTRUESptruesPTRUES#Set elements of the destination predicate to true if the element number satisfies the named predicate constraint, or to false otherwise. 
If the constraint specifies more elements than are available at the current vector length then all elements of the destination predicate are set to false.PTRUESPd.T{, pattern}CMLTcmltCMLT&Compare signed less than zero (vector)CMLT Dd, Dn, #0CMLTVd.T, Vn.T, #0RCWSSWPrcwsswpRCWSSWP3Read check write software swap doubleword in memoryRCWSSWPXs, Xt, [Xn|SP]RCWSSWPAXs, Xt, [Xn|SP]RCWSSWPALXs, Xt, [Xn|SP]RCWSSWPLXs, Xt, [Xn|SP]SQCVTsqcvtSQCVTSaturate the signed integer value in each element of the two source vectors to half the original source element width, and place the results in the half-width destination elements.SQCVTZd.H, { Zn1.S-Zn2.S }SQCVTZd.T, { Zn1.Tb-Zn4.Tb }ST3st3ST38Store multiple 3-element structures from three registers#ST3 {Vt.T, Vt2.T, Vt3.T }, [Xn|SP](ST3 {Vt.T, Vt2.T, Vt3.T }, [Xn|SP], imm'ST3 {Vt.T, Vt2.T, Vt3.T }, [Xn|SP], Xm*ST3 {Vt.B, Vt2.B, Vt3.B }[index], [Xn|SP]*ST3 {Vt.H, Vt2.H, Vt3.H }[index], [Xn|SP]*ST3 {Vt.S, Vt2.S, Vt3.S }[index], [Xn|SP]*ST3 {Vt.D, Vt2.D, Vt3.D }[index], [Xn|SP].ST3 {Vt.B, Vt2.B, Vt3.B }[index], [Xn|SP], #3.ST3 {Vt.B, Vt2.B, Vt3.B }[index], [Xn|SP], Xm.ST3 {Vt.H, Vt2.H, Vt3.H }[index], [Xn|SP], #6.ST3 {Vt.H, Vt2.H, Vt3.H }[index], [Xn|SP], Xm/ST3 {Vt.S, Vt2.S, Vt3.S }[index], [Xn|SP], #12.ST3 {Vt.S, Vt2.S, Vt3.S }[index], [Xn|SP], Xm/ST3 {Vt.D, Vt2.D, Vt3.D }[index], [Xn|SP], #24.ST3 {Vt.D, Vt2.D, Vt3.D }[index], [Xn|SP], XmCMGTcmgtCMGT$Compare signed greater than (vector)CMGT Dd, Dn, DmCMGTVd.T, Vn.T, Vm.TCMGT Dd, Dn, #0CMGTVd.T, Vn.T, #0ST3Bst3bST3B2Contiguous store three-byte structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,8ST3B{ Zt1.B, Zt2.B, Zt3.B }, Pg, [Xn|SP{, #imm, MUL VL}],ST3B{ Zt1.B, Zt2.B, Zt3.B }, Pg, [Xn|SP, Xm]FMLALLBBfmlallbbFMLALLBBT8-bit floating-point multiply-add long-long 
to single-precision (vector, by element) "FMLALLBBVd.4S, Vn.16B, Vm.B[index]"FMLALLBTVd.4S, Vn.16B, Vm.B[index]"FMLALLTBVd.4S, Vn.16B, Vm.B[index]"FMLALLTTVd.4S, Vn.16B, Vm.B[index]FMLALLBBVd.4S, Vn.16B, Vm.16BFMLALLBTVd.4S, Vn.16B, Vm.16BFMLALLTBVd.4S, Vn.16B, Vm.16BFMLALLTTVd.4S, Vn.16B, Vm.16BFMLALLBBZda.S, Zn.B, Zm.BFMLALLBBZda.S, Zn.B, Zm.B[imm]ST4Hst4hST4H4Contiguous store four-halfword structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,?ST4H{ Zt1.H, Zt2.H, Zt3.H, Zt4.H }, Pg, [Xn|SP{, #imm, MUL VL}];ST4H{ Zt1.H, Zt2.H, Zt3.H, Zt4.H }, Pg, [Xn|SP, Xm, LSL #1]SUBSsubsSUBS+Subtract (extended register), setting flags&SUBSWd, Wn|WSP, Wm{, extend {#amount}}%SUBSXd, Xn|SP, Rm{, extend {#amount}}SUBSWd, Wn|WSP, #imm{, shift}SUBSXd, Xn|SP, #imm{, shift}SUBSWd, Wn, Wm{, shift #amount}SUBSXd, Xn, Xm{, shift #amount}UZPQ2uzpq2UZPQ2Concatenate adjacent odd-numbered elements from the corresponding 128-bit vector segments of the first and second source vectors and place in elements of the corresponding destination vector segment. This instruction is unpredicated.UZPQ2Zd.T, Zn.T, Zm.TZIPQ1zipq1ZIPQ1Interleave alternating elements from low halves of the corresponding 128-bit vector segments of the first and second source vectors and place in elements of the corresponding destination vector segment. This instruction is unpredicated.ZIPQ1Zd.T, Zn.T, Zm.TANDVandvANDVBitwise AND horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as all ones.ANDVVd, Pg, Zn.TPUNPKHIpunpkhiPUNPKHIUnpack elements from the lowest or highest half of the source predicate and place in elements of twice their size within the destination predicate. 
This instruction is unpredicated.PUNPKHIPd.H, Pn.BPUNPKLOPd.H, Pn.BUABAuabaUABA+Unsigned absolute difference and accumulateUABAVd.T, Vn.T, Vm.TUABAZda.T, Zn.T, Zm.TSQDMLALsqdmlalSQDMLAL9Signed saturating doubling multiply-add long (by element)SQDMLALVad, Vbn, Vm.Ts[index]&SQDMLAL{2} Vd.Ta, Vn.Tb, Vm.Ts[index]SQDMLALVad, Vbn, VbmSQDMLAL{2} Vd.Ta, Vn.Tb, Vm.Tb PACNBIASPPC pacnbiasppc PACNBIASPPCPPointer Authentication Code for return address, using key A, not a branch target PACNBIASPPCFMLAfmlaFMLA=Floating-point fused multiply-add to accumulator (by element)FMLAHd, Hn, Vm.H[index]FMLAVd, Vn, Vm.Ts[index]FMLAVd.T, Vn.T, Vm.H[index]FMLAVd.T, Vn.T, Vm.Ts[index]FMLAVd.T, Vn.T, Vm.TFMLAVd.T, Vn.T, Vm.TFMLAZda.T, Pg/M, Zn.T, Zm.TFMLAZda.H, Zn.H, Zm.H[imm]FMLAZda.S, Zn.S, Zm.S[imm]FMLAZda.D, Zn.D, Zm.D[imm]<FMLA ZA.H[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]<FMLA ZA.S[Wv, offs{, VGx2}], { Zn1.S-Zn2.S }, Zm.S[index]<FMLA ZA.D[Wv, offs{, VGx2}], { Zn1.D-Zn2.D }, Zm.D[index]<FMLA ZA.H[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]<FMLA ZA.S[Wv, offs{, VGx4}], { Zn1.S-Zn4.S }, Zm.S[index]<FMLA ZA.D[Wv, offs{, VGx4}], { Zn1.D-Zn4.D }, Zm.D[index]5FMLA ZA.T[Wv, offs{, VGx2}], { Zn1.T-Zn2.T }, Zm.T5FMLA ZA.H[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H5FMLA ZA.T[Wv, offs{, VGx4}], { Zn1.T-Zn4.T }, Zm.T5FMLA ZA.H[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H@FMLA ZA.T[Wv, offs{, VGx2}], { Zn1.T-Zn2.T }, { Zm1.T-Zm2.T }@FMLA ZA.H[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }@FMLA ZA.T[Wv, offs{, VGx4}], { Zn1.T-Zn4.T }, { Zm1.T-Zm4.T }@FMLA ZA.H[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }STLURBstlurbSTLURB&Store-release register byte (unscaled)STLURBWt, [Xn|SP{, #simm}]SQSHLsqshlSQSHL(Signed saturating shift left (immediate)SQSHLVd, Vn, #shiftSQSHLVd.T, Vn.T, #shiftSQSHLVd, Vn, VmSQSHLVd.T, Vn.T, Vm.TSQSHLZdn.T, Pg/M, Zdn.T, #constSQSHLZdn.T, Pg/M, Zdn.T, Zm.TLD1Bld1bLD1BContiguous load of unsigned bytes to elements of two or four consecutive vector registers from the 
memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.3LD1B{ Zt1.B-Zt2.B }, PNg/Z, [Xn|SP{, #imm, MUL VL}]3LD1B{ Zt1.B-Zt4.B }, PNg/Z, [Xn|SP{, #imm, MUL VL}]'LD1B{ Zt1.B-Zt2.B }, PNg/Z, [Xn|SP, Xm]'LD1B{ Zt1.B-Zt4.B }, PNg/Z, [Xn|SP, Xm]4LD1B{ Zt1.B, Zt2.B }, PNg/Z, [Xn|SP{, #imm, MUL VL}]BLD1B{ Zt1.B, Zt2.B, Zt3.B, Zt4.B }, PNg/Z, [Xn|SP{, #imm, MUL VL}](LD1B{ Zt1.B, Zt2.B }, PNg/Z, [Xn|SP, Xm]6LD1B{ Zt1.B, Zt2.B, Zt3.B, Zt4.B }, PNg/Z, [Xn|SP, Xm]"LD1B{ Zt.S }, Pg/Z, [Zn.S{, #imm}]"LD1B{ Zt.D }, Pg/Z, [Zn.D{, #imm}]+LD1B{ Zt.B }, Pg/Z, [Xn|SP{, #imm, MUL VL}]+LD1B{ Zt.H }, Pg/Z, [Xn|SP{, #imm, MUL VL}]+LD1B{ Zt.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]+LD1B{ Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]LD1B{ Zt.B }, Pg/Z, [Xn|SP, Xm]LD1B{ Zt.H }, Pg/Z, [Xn|SP, Xm]LD1B{ Zt.S }, Pg/Z, [Xn|SP, Xm]LD1B{ Zt.D }, Pg/Z, [Xn|SP, Xm]&LD1B{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod]&LD1B{ Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod]!LD1B{ Zt.D }, Pg/Z, [Xn|SP, Zm.D].LD1B{ ZA0HV.B[Ws, offs] }, Pg/Z, [Xn|SP{, Xm}]LDUMAXBldumaxbLDUMAXB9Atomic unsigned maximum on byte in memory, without return+STUMAXBWs, [Xn|SP]LDUMAXB Ws, WZR, [Xn|SP]-STUMAXLBWs, [Xn|SP]LDUMAXLB Ws, WZR, [Xn|SP]BEXTbextBEXTxThis instruction gathers bits in each element of the first source vector from the bit positions indicated by non-zero bits in the corresponding mask element of the second source vector to the lowest-numbered contiguous bits of the corresponding destination element, preserving their order, and sets the remaining higher-numbered bits to zero. This instruction is unpredicated.BEXTZd.T, Zn.T, Zm.TLDSMAXAHldsmaxahLDSMAXAH+Atomic signed maximum on halfword in memoryLDSMAXAHWs, Wt, [Xn|SP]LDSMAXALHWs, Wt, [Xn|SP]LDSMAXHWs, Wt, [Xn|SP]LDSMAXLHWs, Wt, [Xn|SP]BMOPSbmopsBMOPSwThis instruction works with 32-bit element ZA tile. 
This instruction generates an outer product of the first source SVL#BMOPSZAda.S, Pn/M, Pm/M, Zn.S, Zm.SLD1RHld1rhLD1RHLoad a single unsigned halfword from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 2 in the range 0 to 126.$LD1RH{ Zt.H }, Pg/Z, [Xn|SP{, #imm}]$LD1RH{ Zt.S }, Pg/Z, [Xn|SP{, #imm}]$LD1RH{ Zt.D }, Pg/Z, [Xn|SP{, #imm}]UQDECPuqdecpUQDECPCounts the number of true elements in the source predicate and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.UQDECPWdn, Pm.TUQDECPXdn, Pm.TUQDECPZdn.T, Pm.TCNOTcnotCNOTLogically invert the boolean value in each active element of the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.CNOTZd.T, Pg/M, Zn.TLDARBldarbLDARBLoad-acquire register byteLDARBWt, [Xn|SP{, #0}]USHLLBushllbUSHLLB3Shift left by immediate each even-numbered unsigned element of the source vector, and place the results in the overlapping double-width elements of the destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. This instruction is unpredicated.USHLLBZd.T, Zn.Tb, #constADRPadrpADRP$Form PC-relative address to 4KB page ADRPXd, labelLSRRlsrrLSRRReversed shift right, inserting zeroes, active elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. The shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element size. 
Inactive elements in the destination vector register remain unmodified.LSRRZdn.T, Pg/M, Zdn.T, Zm.TSADDLBsaddlbSADDLBAdd the corresponding even-numbered signed elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.SADDLBZd.T, Zn.Tb, Zm.TbST2Gst2gST2GStore Allocation TagsST2GXt|SP, [Xn|SP], #simmST2GXt|SP, [Xn|SP, #simm]!ST2GXt|SP, [Xn|SP{, #simm}]BFMULbfmulBFMULMultiply active BFloat16 elements of the second source vector to corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.BFMULZdn.H, Pg/M, Zdn.H, Zm.HBFMULZd.H, Zn.H, Zm.HBFMULZd.H, Zn.H, Zm.H[imm]LD1RBld1rbLD1RBLoad a single unsigned byte from a memory address generated by a 64-bit scalar base address plus an immediate offset which is in the range 0 to 63.$LD1RB{ Zt.B }, Pg/Z, [Xn|SP{, #imm}]$LD1RB{ Zt.H }, Pg/Z, [Xn|SP{, #imm}]$LD1RB{ Zt.S }, Pg/Z, [Xn|SP{, #imm}]$LD1RB{ Zt.D }, Pg/Z, [Xn|SP{, #imm}]PACDApacdaPACDA9Pointer Authentication Code for data address, using key APACDAXd, Xn|SPPACDZAXdAUTDAautdaAUTDA&Authenticate data address, using key AAUTDAXd, Xn|SPAUTDZAXdBGRPbgrpBGRPThis instruction separates bits in each element of the first source vector by gathering from the bit positions indicated by non-zero bits in the corresponding mask element of the second source vector to the lowest-numbered contiguous bits of the corresponding destination element, and from positions indicated by zero bits to the highest-numbered bits of the destination element, preserving the bit order within each group. 
This instruction is unpredicated.BGRPZd.T, Zn.T, Zm.TSQRDMULHsqrdmulhSQRDMULHMSigned saturating rounding doubling multiply returning high half (by element)SQRDMULHVd, Vn, Vm.Ts[index] SQRDMULHVd.T, Vn.T, Vm.Ts[index]SQRDMULHVd, Vn, VmSQRDMULHVd.T, Vn.T, Vm.TSQRDMULHZd.T, Zn.T, Zm.TSQRDMULHZd.H, Zn.H, Zm.H[imm]SQRDMULHZd.S, Zn.S, Zm.S[imm]SQRDMULHZd.D, Zn.D, Zm.D[imm]BFMAXbfmaxBFMAXDetermine the maximum of BFloat16 elements of the second source vector and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors./BFMAX{ Zdn1.H-Zdn2.H }, { Zdn1.H-Zdn2.H }, Zm.H/BFMAX{ Zdn1.H-Zdn4.H }, { Zdn1.H-Zdn4.H }, Zm.H:BFMAX{ Zdn1.H-Zdn2.H }, { Zdn1.H-Zdn2.H }, { Zm1.H-Zm2.H }:BFMAX{ Zdn1.H-Zdn4.H }, { Zdn1.H-Zdn4.H }, { Zm1.H-Zm4.H }BFMAXZdn.H, Pg/M, Zdn.H, Zm.HYIELDyieldYIELDYieldYIELDRSHRNrshrnRSHRN'Rounding shift right narrow (immediate)RSHRN{2} Vd.Tb, Vn.Ta, #shiftPRFHprfhPRFHGather prefetch of halfwords from the active memory addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. 
Inactive addresses are not prefetched from memory.PRFHprfop, Pg, [Zn.S{, #imm}]PRFHprfop, Pg, [Zn.D{, #imm}]&PRFHprfop, Pg, [Xn|SP{, #imm, MUL VL}]"PRFHprfop, Pg, [Xn|SP, Xm, LSL #1]$PRFHprfop, Pg, [Xn|SP, Zm.S, mod #1]$PRFHprfop, Pg, [Xn|SP, Zm.D, mod #1]$PRFHprfop, Pg, [Xn|SP, Zm.D, LSL #1]USMMLAusmmlaUSMMLAEUnsigned and signed 8-bit integer matrix multiply-accumulate (vector)USMMLAVd.4S, Vn.16B, Vm.16BUSMMLAZda.S, Zn.B, Zm.BBIFbifBIFBitwise insert if falseBIFVd.T, Vn.T, Vm.TSYSLsyslSYSLSystem instruction with resultSYSLXt, #op1, Cn, Cm, #op2SM3TT1Asm3tt1aSM3TT1ASM3TT1ASM3TT1AVd.4S, Vn.4S, Vm.S[imm2]FMAXPfmaxpFMAXP3Floating-point maximum of pair of elements (scalar)FMAXP Hd, Vn.2H FMAXPVd, Vn.TFMAXPVd.T, Vn.T, Vm.TFMAXPVd.T, Vn.T, Vm.TFMAXPZdn.T, Pg/M, Zdn.T, Zm.TUMAXVumaxvUMAXVUnsigned maximum across vector UMAXVVd, Vn.TUMAXVVd, Pg, Zn.TST3Wst3wST3W2Contiguous store three-word structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,8ST3W{ Zt1.S, Zt2.S, Zt3.S }, Pg, [Xn|SP{, #imm, MUL VL}]4ST3W{ Zt1.S, Zt2.S, Zt3.S }, Pg, [Xn|SP, Xm, LSL #2]FMINNMVfminnmvFMINNMV+Floating-point minimum number across vectorFMINNMVVd, Vn.TFMINNMV Sd, Vn.4SFMINNMVVd, Pg, Zn.TLDNT1Wldnt1wLDNT1WContiguous load non-temporal of words to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. 
5LDNT1W{ Zt1.S-Zt2.S }, PNg/Z, [Xn|SP{, #imm, MUL VL}]5LDNT1W{ Zt1.S-Zt4.S }, PNg/Z, [Xn|SP{, #imm, MUL VL}]1LDNT1W{ Zt1.S-Zt2.S }, PNg/Z, [Xn|SP, Xm, LSL #2]1LDNT1W{ Zt1.S-Zt4.S }, PNg/Z, [Xn|SP, Xm, LSL #2]6LDNT1W{ Zt1.S, Zt2.S }, PNg/Z, [Xn|SP{, #imm, MUL VL}]DLDNT1W{ Zt1.S, Zt2.S, Zt3.S, Zt4.S }, PNg/Z, [Xn|SP{, #imm, MUL VL}]2LDNT1W{ Zt1.S, Zt2.S }, PNg/Z, [Xn|SP, Xm, LSL #2]@LDNT1W{ Zt1.S, Zt2.S, Zt3.S, Zt4.S }, PNg/Z, [Xn|SP, Xm, LSL #2]"LDNT1W{ Zt.S }, Pg/Z, [Zn.S{, Xm}]"LDNT1W{ Zt.D }, Pg/Z, [Zn.D{, Xm}]-LDNT1W{ Zt.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}])LDNT1W{ Zt.S }, Pg/Z, [Xn|SP, Xm, LSL #2]STLXRstlxrSTLXR Store-release exclusive registerSTLXRWs, Wt, [Xn|SP{, #0}]STLXRWs, Xt, [Xn|SP{, #0}]SQSHLUsqshluSQSHLU1Signed saturating shift left unsigned (immediate)SQSHLUVd, Vn, #shiftSQSHLUVd.T, Vn.T, #shift SQSHLUZdn.T, Pg/M, Zdn.T, #constSRIsriSRI"Shift right and insert (immediate)SRI Dd, Dn, #shiftSRIVd.T, Vn.T, #shiftSRIZd.T, Zn.T, #constLDNT1SBldnt1sbLDNT1SB,Gather load non-temporal of signed bytes to active elements of a vector register from memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.#LDNT1SB{ Zt.S }, Pg/Z, [Zn.S{, Xm}]#LDNT1SB{ Zt.D }, Pg/Z, [Zn.D{, Xm}]SHLshlSHLShift left (immediate)SHL Dd, Dn, #shiftSHLVd.T, Vn.T, #shiftCPYFPWTcpyfpwtCPYFPWT-Memory copy forward-only, writes unprivilegedCPYFPWT [Xd]!, [Xs]!, Xn!CPYFMWT [Xd]!, [Xs]!, Xn!CPYFEWT [Xd]!, [Xs]!, Xn!MADDmaddMADD Multiply-addMADDWd, Wn, Wm, WaMADDXd, Xn, Xm, XaSSUBWBssubwbSSUBWB Subtract the even-numbered signed elements of the second source vector from the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. 
This instruction is unpredicated.SSUBWBZd.T, Zn.T, Zm.TbCPYPTRNcpyptrnCPYPTRN>Memory copy, reads and writes unprivileged, reads non-temporalCPYPTRN [Xd]!, [Xs]!, Xn!CPYMTRN [Xd]!, [Xs]!, Xn!CPYETRN [Xd]!, [Xs]!, Xn!FCVTLfcvtlFCVTL8Floating-point convert to higher precision long (vector)FCVTL{2} Vd.Ta, Vn.TbFCVTL{ Zd1.S-Zd2.S }, Zn.HSQXTNTsqxtntSQXTNTSaturate the signed integer value in each source element to half the original source element width, and place the results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged.SQXTNTZd.T, Zn.TbFMLSLfmlslFMLSLIFloating-point fused multiply-subtract long from accumulator (by element) FMLSLVd.Ta, Vn.Tb, Vm.H[index]FMLSL2Vd.Ta, Vn.Tb, Vm.H[index]FMLSLVd.Ta, Vn.Tb, Vm.TbFMLSL2Vd.Ta, Vn.Tb, Vm.Tb0FMLSL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H[index]CFMLSL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]CFMLSL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index])FMLSL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H<FMLSL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H<FMLSL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.HGFMLSL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }GFMLSL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }BFMINbfminBFMINDetermine the minimum of BFloat16 elements of the second source vector and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors./BFMIN{ Zdn1.H-Zdn2.H }, { Zdn1.H-Zdn2.H }, Zm.H/BFMIN{ Zdn1.H-Zdn4.H }, { Zdn1.H-Zdn4.H }, Zm.H:BFMIN{ Zdn1.H-Zdn2.H }, { Zdn1.H-Zdn2.H }, { Zm1.H-Zm2.H }:BFMIN{ Zdn1.H-Zdn4.H }, { Zdn1.H-Zdn4.H }, { Zm1.H-Zm4.H }BFMINZdn.H, Pg/M, Zdn.H, Zm.HLDRHldrhLDRH"Load register halfword (immediate)LDRHWt, [Xn|SP], #simmLDRHWt, [Xn|SP, #simm]!LDRHWt, [Xn|SP{, #pimm}]+LDRHWt, [Xn|SP, (Wm|Xm){, extend {amount}}]UQCVTuqcvtUQCVTSaturate the unsigned integer value in each 
element of the two source vectors to half the original source element width, and place the results in the half-width destination elements.UQCVTZd.H, { Zn1.S-Zn2.S }UQCVTZd.T, { Zn1.Tb-Zn4.Tb }SHA1Psha1pSHA1PSHA1 hash update (parity)SHA1PQd, Sn, Vm.4SASRDasrdASRDShift right by immediate, preserving the sign bit, each active element of the source vector, and destructively place the results in the corresponding elements of the source vector. The result rounds toward zero as in a signed division. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. Inactive elements in the destination vector register remain unmodified.ASRDZdn.T, Pg/M, Zdn.T, #const AUTIASPPCR autiasppcr AUTIASPPCR9Authenticate return address using key A, using a register AUTIASPPCRXnCPYFPRNcpyfprnCPYFPRN,Memory copy forward-only, reads non-temporalCPYFPRN [Xd]!, [Xs]!, Xn!CPYFMRN [Xd]!, [Xs]!, Xn!CPYFERN [Xd]!, [Xs]!, Xn!UQSUBRuqsubrUQSUBR2Subtract active unsigned elements of the first source vector from corresponding unsigned elements of the second source vector and destructively place the results in the corresponding elements of the first source vector. Each result element is saturated to the N-bit element's unsigned integer range 0 to (2UQSUBRZdn.T, Pg/M, Zdn.T, Zm.TUADDWTuaddwtUADDWTAdd the odd-numbered unsigned elements of the second source vector to the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.UADDWTZd.T, Zn.T, Zm.TbLDXRHldxrhLDXRH Load exclusive register halfwordLDXRHWt, [Xn|SP{, #0}]ST64BV0st64bv0ST64BV07Single-copy atomic 64-byte EL0 store with status resultST64BV0Xs, Xt, [Xn|SP]SSHLLBsshllbSSHLLB1Shift left by immediate each even-numbered signed element of the source vector, and place the results in the overlapping double-width elements of the destination vector. 
The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. This instruction is unpredicated.SSHLLBZd.T, Zn.Tb, #constLD2ld2LD23Load multiple 2-element structures to two registersLD2 {Vt.T, Vt2.T }, [Xn|SP]!LD2 {Vt.T, Vt2.T }, [Xn|SP], imm LD2 {Vt.T, Vt2.T }, [Xn|SP], Xm#LD2 {Vt.B, Vt2.B }[index], [Xn|SP]#LD2 {Vt.H, Vt2.H }[index], [Xn|SP]#LD2 {Vt.S, Vt2.S }[index], [Xn|SP]#LD2 {Vt.D, Vt2.D }[index], [Xn|SP]'LD2 {Vt.B, Vt2.B }[index], [Xn|SP], #2'LD2 {Vt.B, Vt2.B }[index], [Xn|SP], Xm'LD2 {Vt.H, Vt2.H }[index], [Xn|SP], #4'LD2 {Vt.H, Vt2.H }[index], [Xn|SP], Xm'LD2 {Vt.S, Vt2.S }[index], [Xn|SP], #8'LD2 {Vt.S, Vt2.S }[index], [Xn|SP], Xm(LD2 {Vt.D, Vt2.D }[index], [Xn|SP], #16'LD2 {Vt.D, Vt2.D }[index], [Xn|SP], XmFNMSBfnmsbFNMSBbMultiply the corresponding active floating-point elements of the first and second source vectors and subtract from elements of the third (addend) vector without intermediate rounding. Destructively place the negated results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.FNMSBZdn.T, Pg/M, Zm.T, Za.TLDFF1SWldff1swLDFF1SWWGather load with first-faulting behavior of signed words to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. 
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.%LDFF1SW{ Zt.D }, Pg/Z, [Zn.D{, #imm}],LDFF1SW{ Zt.D }, Pg/Z, [Xn|SP{, Xm, LSL #2}],LDFF1SW{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod #2])LDFF1SW{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod],LDFF1SW{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, LSL #2]$LDFF1SW{ Zt.D }, Pg/Z, [Xn|SP, Zm.D]SQINCDsqincdSQINCDkDetermines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits.%SQINCDXdn, Wdn{, pattern{, MUL #imm}} SQINCDXdn{, pattern{, MUL #imm}}"SQINCDZdn.D{, pattern{, MUL #imm}}USHLLTushlltUSHLLT2Shift left by immediate each odd-numbered unsigned element of the source vector, and place the results in the overlapping double-width elements of the destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. This instruction is unpredicated.USHLLTZd.T, Zn.Tb, #constAESEaeseAESEAES single round encryptionAESEVd.16B, Vn.16BAESEZdn.B, Zdn.B, Zm.BLDUMINAHlduminahLDUMINAH-Atomic unsigned minimum on halfword in memoryLDUMINAHWs, Wt, [Xn|SP]LDUMINALHWs, Wt, [Xn|SP]LDUMINHWs, Wt, [Xn|SP]LDUMINLHWs, Wt, [Xn|SP]HLThltHLTHalt instruction HLT #immLDFF1Bldff1bLDFF1BHGather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector. 
$LDFF1B{ Zt.S }, Pg/Z, [Zn.S{, #imm}]$LDFF1B{ Zt.D }, Pg/Z, [Zn.D{, #imm}]#LDFF1B{ Zt.B }, Pg/Z, [Xn|SP{, Xm}]#LDFF1B{ Zt.H }, Pg/Z, [Xn|SP{, Xm}]#LDFF1B{ Zt.S }, Pg/Z, [Xn|SP{, Xm}]#LDFF1B{ Zt.D }, Pg/Z, [Xn|SP{, Xm}](LDFF1B{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod](LDFF1B{ Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod]#LDFF1B{ Zt.D }, Pg/Z, [Xn|SP, Zm.D]MOVPRFXmovprfxMOVPRFXThe predicated MOVPRFXZd.T, Pg/ZM, Zn.T MOVPRFXZd, ZnBFMINNMbfminnmBFMINNMDetermine the minimum number value of BFloat16 elements of the second source vector and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors.1BFMINNM{ Zdn1.H-Zdn2.H }, { Zdn1.H-Zdn2.H }, Zm.H1BFMINNM{ Zdn1.H-Zdn4.H }, { Zdn1.H-Zdn4.H }, Zm.H<BFMINNM{ Zdn1.H-Zdn2.H }, { Zdn1.H-Zdn2.H }, { Zm1.H-Zm2.H }<BFMINNM{ Zdn1.H-Zdn4.H }, { Zdn1.H-Zdn4.H }, { Zm1.H-Zm4.H }BFMINNMZdn.H, Pg/M, Zdn.H, Zm.HFRINT64Xfrint64xFRINT64XLFloating-point round to 64-bit integer, using current rounding mode (vector)FRINT64XVd.T, Vn.TFRINT64XSd, SnFRINT64XDd, DnLDADDABldaddabLDADDABAtomic add on byte in memoryLDADDABWs, Wt, [Xn|SP]LDADDALBWs, Wt, [Xn|SP]LDADDBWs, Wt, [Xn|SP]LDADDLBWs, Wt, [Xn|SP]ADDHAaddhaADDHAAdd each element of the source vector to the corresponding active element of each horizontal slice of a ZA tile. The tile elements are predicated by a pair of governing predicates. An element of a horizontal slice is considered active if its corresponding element in the second governing predicate is TRUE and the element corresponding to its horizontal slice number in the first governing predicate is TRUE. 
Inactive elements in the destination tile remain unmodified.ADDHAZAda.S, Pn/M, Pm/M, Zn.SADDHAZAda.D, Pn/M, Pm/M, Zn.DSETFFRsetffrSETFFR%Initialise the first-fault register (SETFFRSSUBLssublSSUBLSigned subtract longSSUBL{2} Vd.Ta, Vn.Tb, Vm.TbCSELcselCSELConditional selectCSELWd, Wn, Wm, condCSELXd, Xn, Xm, condCASHcashCASH#Compare and swap halfword in memoryCASHWs, Wt, [Xn|SP{, #0}]CASAHWs, Wt, [Xn|SP{, #0}]CASALHWs, Wt, [Xn|SP{, #0}]CASLHWs, Wt, [Xn|SP{, #0}]ANDQVandqvANDQVBitwise AND of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as all ones.ANDQVVd.T, Pg, Zn.TbBF1CVTLbf1cvtlBF1CVTL18-bit floating-point convert to BFloat16 (vector)BF1CVTL{2} Vd.8H, Vn.TaBF2CVTL{2} Vd.8H, Vn.TaBF1CVTL{ Zd1.H-Zd2.H }, Zn.BBF2CVTL{ Zd1.H-Zd2.H }, Zn.BBFMLALbfmlalBFMLAL?BFloat16 floating-point widening multiply-add long (by element) #BFMLALbt Vd.4S, Vn.8H, Vm.H[index]BFMLALbt Vd.4S, Vn.8H, Vm.8H0BFMLAL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H[index]CBFMLAL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]CBFMLAL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index])BFMLAL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H<BFMLAL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H<BFMLAL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.HGBFMLAL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }GBFMLAL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }SQDMULLTsqdmulltSQDMULLTMultiply the corresponding odd-numbered signed elements of the first and second source vectors, double and place the results in the overlapping double-width elements of the destination vector. 
Each result element is saturated to the double-width N-bit element's signed integer range -2SQDMULLTZd.T, Zn.Tb, Zm.TbSQDMULLTZd.S, Zn.H, Zm.H[imm]SQDMULLTZd.D, Zn.S, Zm.S[imm]XPACDxpacdXPACD!Strip Pointer Authentication CodeXPACDXdXPACIXdXPACLRIUDIVudivUDIVUnsigned divideUDIVWd, Wn, WmUDIVXd, Xn, XmUDIVZdn.T, Pg/M, Zdn.T, Zm.TSADDLVsaddlvSADDLVSigned add long across vectorSADDLVVd, Vn.TLSRVlsrvLSRVLogical shift right variableLSRVWd, Wn, WmLSRVXd, Xn, XmFMAXNMfmaxnmFMAXNM&Floating-point maximum number (vector) FMAXNMVd.T, Vn.T, Vm.TFMAXNMVd.T, Vn.T, Vm.TFMAXNMHd, Hn, HmFMAXNMSd, Sn, SmFMAXNMDd, Dn, Dm0FMAXNM{ Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T0FMAXNM{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T;FMAXNM{ Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T };FMAXNM{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }FMAXNMZdn.T, Pg/M, Zdn.T, constFMAXNMZdn.T, Pg/M, Zdn.T, Zm.TCMPLOcmploCMPLOCMPLO (vectors)PCompare active unsigned integer elements in the first source vector being lower than corresponding unsigned elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Sets the CMPLO Pd.T, Pg/Z, Zm.T, Zn.TCMPHI Pd.T, Pg/Z, Zn.T, Zm.TSM3TT2Asm3tt2aSM3TT2ASM3TT2ASM3TT2AVd.4S, Vn.4S, Vm.S[imm2]LDCLRHldclrhLDCLRH6Atomic bit clear on halfword in memory, without return)STCLRHWs, [Xn|SP]LDCLRH Ws, WZR, [Xn|SP]+STCLRLHWs, [Xn|SP]LDCLRLH Ws, WZR, [Xn|SP]FMLSLTfmlsltFMLSLTThis half-precision floating-point multiply-subtract long instruction widens the odd-numbered half-precision elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and subtracts these values without intermediate rounding from the single-precision elements of the destination vector that overlap with the corresponding half-precision elements in the source vectors. This instruction is unpredicated.FMLSLTZda.S, Zn.H, Zm.HFMLSLTZda.S, Zn.H, Zm.H[imm]FMLALLfmlallFMLALL7This 8-bit floating-point multiply-add long long instruction widens all 8-bit floating-point elements in the one, two, or four first source vectors and the indexed element of the second source vector to single-precision format and multiplies the corresponding elements. The intermediate products are scaled by 20FMLALL ZA.S[Wv, offs1:offs4], Zn.B, Zm.B[index]CFMLALL ZA.S[Wv, offs1:offs4{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]CFMLALL ZA.S[Wv, offs1:offs4{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index])FMLALL ZA.S[Wv, offs1:offs4], Zn.B, Zm.B<FMLALL ZA.S[Wv, offs1:offs4{, VGx2}], { Zn1.B-Zn2.B }, Zm.B<FMLALL ZA.S[Wv, offs1:offs4{, VGx4}], { Zn1.B-Zn4.B }, Zm.BGFMLALL ZA.S[Wv, offs1:offs4{, VGx2}], { Zn1.B-Zn2.B }, { Zm1.B-Zm2.B }GFMLALL ZA.S[Wv, offs1:offs4{, VGx4}], { Zn1.B-Zn4.B }, { Zm1.B-Zm4.B }ADCLBadclbADCLBTAdd the even-numbered elements of the first source vector and the 1-bit carry from the least-significant bit of the odd-numbered elements of the second source vector to the even-numbered elements of the destination and accumulator vector. 
The 1-bit carry output is placed in the corresponding odd-numbered element of the destination vector.ADCLBZda.T, Zn.T, Zm.TINCPincpINCPxCounts the number of true elements in the source predicate and then uses the result to increment the scalar destination. INCPXdn, Pm.TINCPZdn.T, Pm.TLDSMAXABldsmaxabLDSMAXAB'Atomic signed maximum on byte in memoryLDSMAXABWs, Wt, [Xn|SP]LDSMAXALBWs, Wt, [Xn|SP]LDSMAXBWs, Wt, [Xn|SP]LDSMAXLBWs, Wt, [Xn|SP]LDNT1Dldnt1dLDNT1D!Contiguous load non-temporal of doublewords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. 5LDNT1D{ Zt1.D-Zt2.D }, PNg/Z, [Xn|SP{, #imm, MUL VL}]5LDNT1D{ Zt1.D-Zt4.D }, PNg/Z, [Xn|SP{, #imm, MUL VL}]1LDNT1D{ Zt1.D-Zt2.D }, PNg/Z, [Xn|SP, Xm, LSL #3]1LDNT1D{ Zt1.D-Zt4.D }, PNg/Z, [Xn|SP, Xm, LSL #3]6LDNT1D{ Zt1.D, Zt2.D }, PNg/Z, [Xn|SP{, #imm, MUL VL}]DLDNT1D{ Zt1.D, Zt2.D, Zt3.D, Zt4.D }, PNg/Z, [Xn|SP{, #imm, MUL VL}]2LDNT1D{ Zt1.D, Zt2.D }, PNg/Z, [Xn|SP, Xm, LSL #3]@LDNT1D{ Zt1.D, Zt2.D, Zt3.D, Zt4.D }, PNg/Z, [Xn|SP, Xm, LSL #3]"LDNT1D{ Zt.D }, Pg/Z, [Zn.D{, Xm}]-LDNT1D{ Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}])LDNT1D{ Zt.D }, Pg/Z, [Xn|SP, Xm, LSL #3]SABALsabalSABAL.Signed absolute difference and accumulate longSABAL{2} Vd.Ta, Vn.Tb, Vm.TbNOTnotNOTBitwise NOT (vector) NOTVd.T, Vn.TNOTZd.T, Pg/M, Zn.TNOT (predicate)Bitwise invert each active element of the source predicate, and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Does not set the condition flags.NOT Pd.B, Pg/Z, Pn.BEOR Pd.B, Pg/Z, Pn.B, Pg.BSTNT1Bstnt1bSTNT1BContiguous store non-temporal of bytes from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. 3STNT1B{ Zt1.B-Zt2.B }, PNg, [Xn|SP{, #imm, MUL VL}]3STNT1B{ Zt1.B-Zt4.B }, PNg, [Xn|SP{, #imm, MUL VL}]'STNT1B{ Zt1.B-Zt2.B }, PNg, [Xn|SP, Xm]'STNT1B{ Zt1.B-Zt4.B }, PNg, [Xn|SP, Xm]4STNT1B{ Zt1.B, Zt2.B }, PNg, [Xn|SP{, #imm, MUL VL}]BSTNT1B{ Zt1.B, Zt2.B, Zt3.B, Zt4.B }, PNg, [Xn|SP{, #imm, MUL VL}](STNT1B{ Zt1.B, Zt2.B }, PNg, [Xn|SP, Xm]6STNT1B{ Zt1.B, Zt2.B, Zt3.B, Zt4.B }, PNg, [Xn|SP, Xm] STNT1B{ Zt.S }, Pg, [Zn.S{, Xm}] STNT1B{ Zt.D }, Pg, [Zn.D{, Xm}]+STNT1B{ Zt.B }, Pg, [Xn|SP{, #imm, MUL VL}]STNT1B{ Zt.B }, Pg, [Xn|SP, Xm]UZPQ1uzpq1UZPQ1Concatenate adjacent even-numbered elements from the corresponding 128-bit vector segments of the first and second source vectors and place in elements of the corresponding destination vector segment. This instruction is unpredicated.UZPQ1Zd.T, Zn.T, Zm.TNANDnandNAND2Bitwise NAND active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Does not set the condition flags.NANDPd.B, Pg/Z, Pn.B, Pm.BRCWSSWPPrcwsswppRCWSSWPP1Read check write software swap quadword in memoryRCWSSWPPXt1, Xt2, [Xn|SP]RCWSSWPPAXt1, Xt2, [Xn|SP]RCWSSWPPALXt1, Xt2, [Xn|SP]RCWSSWPPLXt1, Xt2, [Xn|SP]UDOTudotUDOT4Dot product unsigned arithmetic (vector, by element)UDOTVd.Ta, Vn.Tb, Vm.4B[index]UDOTVd.Ta, Vn.Tb, Vm.TbUDOTZda.S, Zn.H, Zm.HUDOTZda.S, Zn.H, Zm.H[imm]UDOTZda.T, Zn.Tb, Zm.TbUDOTZda.S, Zn.B, Zm.B[imm]UDOTZda.D, Zn.H, Zm.H[imm]<UDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]<UDOT ZA.S[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]5UDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H5UDOT ZA.S[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H@UDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }@UDOT ZA.S[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }<UDOT ZA.S[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]<UDOT ZA.D[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]<UDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]<UDOT ZA.D[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]8UDOT ZA.T[Wv, offs{, VGx2}], { Zn1.Tb-Zn2.Tb }, Zm.Tb8UDOT ZA.T[Wv, offs{, VGx4}], { Zn1.Tb-Zn4.Tb }, Zm.TbDUDOT ZA.T[Wv, offs{, VGx2}], { Zn1.Tb-Zn2.Tb }, { Zm1.Tb-Zm2.Tb }DUDOT ZA.T[Wv, offs{, VGx4}], { Zn1.Tb-Zn4.Tb }, { Zm1.Tb-Zm4.Tb }SQSHRUNsqshrunSQSHRUN9Signed saturating shift right unsigned narrow (immediate)SQSHRUNVbd, Van, #shift SQSHRUN{2} Vd.Tb, Vn.Ta, #shiftXTNxtnXTNExtract narrowXTN{2} Vd.Tb, Vn.TaBFMOPAbfmopaBFMOPAqThe BFloat16 floating-point sum of outer products and accumulate instruction works with a 32-bit element ZA tile.$BFMOPAZAda.S, Pn/M, Pm/M, Zn.H, Zm.H$BFMOPAZAda.H, Pn/M, Pm/M, Zn.H, Zm.HSETPTsetptSETPTMemory set, unprivilegedSETPT [Xd]!, Xn!, XsSETMT [Xd]!, Xn!, XsSETET [Xd]!, Xn!, XsUSUBLTusubltUSUBLTSubtract the odd-numbered unsigned elements of the second source vector from the corresponding unsigned elements of the first source vector, and place the results in the overlapping double-width elements of 
the destination vector. This instruction is unpredicated.USUBLTZd.T, Zn.Tb, Zm.TbLDLARBldlarbLDLARBLoad LOAcquire register byteLDLARBWt, [Xn|SP{, #0}]PEXTpextPEXT$Converts the source predicate-as-counter into a four register wide predicate-as-mask, and copies the portion of the mask value selected by the portion index to the destination predicate register. A portion corresponds to a one predicate register fraction of the wider predicate-as-mask value.PEXTPd.T, PNn[imm]PEXT{ Pd1.T, Pd2.T }, PNn[imm]SUBRsubrSUBRReversed subtract active elements of the first source vector from corresponding elements of the second source vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.SUBRZdn.T, Pg/M, Zdn.T, Zm.TSUBRZdn.T, Zdn.T, #imm{, shift}FNMADfnmadFNMAD[Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of the third (addend) vector without intermediate rounding. Destructively place the negated results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.FNMADZdn.T, Pg/M, Zm.T, Za.TORNornORN!Bitwise inclusive OR NOT (vector)ORNVd.T, Vn.T, Vm.TORNWd, Wn, Wm{, shift #amount}ORNXd, Xn, Xm{, shift #amount}ORNPd.B, Pg/Z, Pn.B, Pm.BORN (immediate)KBitwise inclusive OR an inverted immediate with each 64-bit element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. 
This instruction is unpredicated.ORN Zdn.T, Zdn.T, #constORR Zdn.T, Zdn.T, #(-const - 1)PRFUMprfumPRFUM!Prefetch memory (unscaled offset)&PRFUM (prfop|#imm5), [Xn|SP{, #simm}]STTRHsttrhSTTRH&Store register halfword (unprivileged)STTRHWt, [Xn|SP{, #simm}]SUBHNTsubhntSUBHNT1Subtract each vector element of the second source vector from the corresponding vector element in the first source vector, and place the most significant half of the result in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. This instruction is unpredicated.SUBHNTZd.T, Zn.Tb, Zm.TbFCVTMSfcvtmsFCVTMSQFloating-point convert to signed integer, rounding toward minus infinity (vector) FCVTMSHd, Hn FCVTMSVd, VnFCVTMSVd.T, Vn.TFCVTMSVd.T, Vn.T FCVTMSWd, Hn FCVTMSXd, Hn FCVTMSWd, Sn FCVTMSXd, Sn FCVTMSWd, Dn FCVTMSXd, DnST1Dst1dST1DContiguous store of doublewords from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.1ST1D{ Zt1.D-Zt2.D }, PNg, [Xn|SP{, #imm, MUL VL}]1ST1D{ Zt1.D-Zt4.D }, PNg, [Xn|SP{, #imm, MUL VL}]-ST1D{ Zt1.D-Zt2.D }, PNg, [Xn|SP, Xm, LSL #3]-ST1D{ Zt1.D-Zt4.D }, PNg, [Xn|SP, Xm, LSL #3]2ST1D{ Zt1.D, Zt2.D }, PNg, [Xn|SP{, #imm, MUL VL}]@ST1D{ Zt1.D, Zt2.D, Zt3.D, Zt4.D }, PNg, [Xn|SP{, #imm, MUL VL}].ST1D{ Zt1.D, Zt2.D }, PNg, [Xn|SP, Xm, LSL #3]<ST1D{ Zt1.D, Zt2.D, Zt3.D, Zt4.D }, PNg, [Xn|SP, Xm, LSL #3] ST1D{ Zt.D }, Pg, [Zn.D{, #imm}])ST1D{ Zt.D }, Pg, [Xn|SP{, #imm, MUL VL}])ST1D{ Zt.Q }, Pg, [Xn|SP{, #imm, MUL VL}]%ST1D{ Zt.D }, Pg, [Xn|SP, Xm, LSL #3]%ST1D{ Zt.Q }, Pg, [Xn|SP, Xm, LSL #3]'ST1D{ Zt.D }, Pg, [Xn|SP, Zm.D, mod #3]$ST1D{ Zt.D }, Pg, [Xn|SP, Zm.D, mod]'ST1D{ Zt.D }, Pg, [Xn|SP, Zm.D, LSL #3]ST1D{ Zt.D }, Pg, [Xn|SP, Zm.D]4ST1D{ ZAtHV.D[Ws, offs] }, Pg, [Xn|SP{, Xm, LSL #3}]CDOTcdotCDOT The complex integer dot product instructions 
delimit the source vectors into pairs of 8-bit or 16-bit signed integer complex numbers. Within each pair, the complex numbers in the first source vector are multiplied by the corresponding complex numbers in the second source vector and the resulting wide real or wide imaginary part of the product is accumulated into a 32-bit or 64-bit destination vector element which overlaps all four of the elements that comprise a pair of complex number values in the first source vector.CDOTZda.T, Zn.Tb, Zm.Tb, const!CDOTZda.S, Zn.B, Zm.B[imm], const!CDOTZda.D, Zn.H, Zm.H[imm], constMOVSmovsMOVSMOVS (predicated)Read active elements from the source predicate and place in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the MOVS Pd.B, Pg/Z, Pn.BANDS Pd.B, Pg/Z, Pn.B, Pn.BMOVS (unpredicated)Read all elements from the source predicate and place in the destination predicate. This instruction is unpredicated. Sets the MOVS Pd.B, Pn.BORRS Pd.B, Pn/Z, Pn.B, Pn.BFMLALfmlalFMLALBFloating-point fused multiply-add long to accumulator (by element)FMLALVd.Ta, Vn.Tb, Vm.H[index]FMLAL2Vd.Ta, Vn.Tb, Vm.H[index]FMLALVd.Ta, Vn.Tb, Vm.TbFMLAL2Vd.Ta, Vn.Tb, Vm.Tb0FMLAL ZA.H[Wv, offs1:offs2], Zn.B, Zm.B[index]CFMLAL ZA.H[Wv, offs1:offs2{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]CFMLAL ZA.H[Wv, offs1:offs2{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index])FMLAL ZA.H[Wv, offs1:offs2], Zn.B, Zm.B<FMLAL ZA.H[Wv, offs1:offs2{, VGx2}], { Zn1.B-Zn2.B }, Zm.B<FMLAL ZA.H[Wv, offs1:offs2{, VGx4}], { Zn1.B-Zn4.B }, Zm.BGFMLAL ZA.H[Wv, offs1:offs2{, VGx2}], { Zn1.B-Zn2.B }, { Zm1.B-Zm2.B }GFMLAL ZA.H[Wv, offs1:offs2{, VGx4}], { Zn1.B-Zn4.B }, { Zm1.B-Zm4.B }0FMLAL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H[index]CFMLAL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]CFMLAL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index])FMLAL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H<FMLAL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H<FMLAL 
ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.HGFMLAL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }GFMLAL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }STZ2Gstz2gSTZ2GStore Allocation Tags, zeroingSTZ2GXt|SP, [Xn|SP], #simmSTZ2GXt|SP, [Xn|SP, #simm]!STZ2GXt|SP, [Xn|SP{, #simm}]LDSETABldsetabLDSETAB Atomic bit set on byte in memoryLDSETABWs, Wt, [Xn|SP]LDSETALBWs, Wt, [Xn|SP]LDSETBWs, Wt, [Xn|SP]LDSETLBWs, Wt, [Xn|SP]USUBWTusubwtUSUBWT-Subtract the odd-numbered unsigned elements of the second source vector from the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.USUBWTZd.T, Zn.T, Zm.TbUMINVuminvUMINVUnsigned minimum across vector UMINVVd, Vn.TUMINVVd, Pg, Zn.T CPYFPRTWN cpyfprtwn CPYFPRTWNAMemory copy forward-only, reads unprivileged, writes non-temporalCPYFPRTWN [Xd]!, [Xs]!, Xn!CPYFMRTWN [Xd]!, [Xs]!, Xn!CPYFERTWN [Xd]!, [Xs]!, Xn!FMADDfmaddFMADD*Floating-point fused multiply-add (scalar)FMADDHd, Hn, Hm, HaFMADDSd, Sn, Sm, SaFMADDDd, Dn, Dm, DaSHSUBRshsubrSHSUBR5Subtract active signed elements of the first source vector from corresponding signed elements of the second source vector, shift right one bit, and destructively place the results in the corresponding elements of the first source vector. 
Inactive elements in the destination vector register remain unmodified.SHSUBRZdn.T, Pg/M, Zdn.T, Zm.TTTESTttestTTESTTest transaction stateTTESTXtUHSUBuhsubUHSUBUnsigned halving subtractUHSUBVd.T, Vn.T, Vm.TUHSUBZdn.T, Pg/M, Zdn.T, Zm.TRCWSETrcwsetRCWSET7Read check write atomic bit set on doubleword in memoryRCWSETXs, Xt, [Xn|SP]RCWSETAXs, Xt, [Xn|SP]RCWSETALXs, Xt, [Xn|SP]RCWSETLXs, Xt, [Xn|SP]SQCADDsqcaddSQCADDAdd the real and imaginary components of the integral complex numbers from the first source vector to the complex numbers from the second source vector which have first been rotated by 90 or 270 degrees in the direction from the positive real axis towards the positive imaginary axis, when considered in polar representation, equivalent to multiplying the complex numbers in the second source vector by ±SQCADDZdn.T, Zdn.T, Zm.T, constSUNPKsunpkSUNPKUnpack elements from one or two source vectors and then sign-extend them to place in elements of twice their size within the two or four destination vectors.SUNPK{ Zd1.T-Zd2.T }, Zn.Tb'SUNPK{ Zd1.T-Zd4.T }, { Zn1.Tb-Zn2.Tb } CPYFPRTRN cpyfprtrn CPYFPRTRN=Memory copy forward-only, reads unprivileged and non-temporalCPYFPRTRN [Xd]!, [Xs]!, Xn!CPYFMRTRN [Xd]!, [Xs]!, Xn!CPYFERTRN [Xd]!, [Xs]!, Xn!FCADDfcaddFCADDFloating-point complex addFCADDVd.T, Vn.T, Vm.T, #rotate$FCADDZdn.T, Pg/M, Zdn.T, Zm.T, constCPYFPRTcpyfprtCPYFPRT,Memory copy forward-only, reads unprivilegedCPYFPRT [Xd]!, [Xs]!, Xn!CPYFMRT [Xd]!, [Xs]!, Xn!CPYFERT [Xd]!, [Xs]!, Xn!UMAXPumaxpUMAXPUnsigned maximum pairwiseUMAXPVd.T, Vn.T, Vm.TUMAXPZdn.T, Pg/M, Zdn.T, Zm.TPFALSEpfalsePFALSE7Set all elements in the destination predicate to false. PFALSEPd.BSWPABswpabSWPABSwap byte in memorySWPABWs, Wt, [Xn|SP]SWPALBWs, Wt, [Xn|SP]SWPBWs, Wt, [Xn|SP]SWPLBWs, Wt, [Xn|SP]UQINCPuqincpUQINCPCounts the number of true elements in the source predicate and then uses the result to increment the scalar destination. 
The result is saturated to the general-purpose register's unsigned integer range.UQINCPWdn, Pm.TUQINCPXdn, Pm.TUQINCPZdn.T, Pm.TORQVorqvORQVBitwise inclusive OR of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as all zeros.ORQVVd.T, Pg, Zn.TbSBCSsbcsSBCS"Subtract with carry, setting flagsSBCSWd, Wn, WmSBCSXd, Xn, XmFAMINfaminFAMINFloating-point absolute minimumFAMINVd.T, Vn.T, Vm.TFAMINVd.T, Vn.T, Vm.T:FAMIN{ Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T }:FAMIN{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }FAMINZdn.T, Pg/M, Zdn.T, Zm.TCSINVcsinvCSINVConditional select invertCSINVWd, Wn, Wm, condCSINVXd, Xn, Xm, condLD4Bld4bLD4B/Contiguous load four-byte structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,ALD4B{ Zt1.B, Zt2.B, Zt3.B, Zt4.B }, Pg/Z, [Xn|SP{, #imm, MUL VL}]5LD4B{ Zt1.B, Zt2.B, Zt3.B, Zt4.B }, Pg/Z, [Xn|SP, Xm]WHILEWRwhilewrWHILEWRnThis instruction checks two addresses for a conflict or overlap between address ranges of the form [addr,addr+WHILEWRPd.T, Xn, XmNEGSnegsNEGS NEGS -- A64Negate, setting flagsNEGS Wd, Wm{, shift #amount}"SUBS Wd, WZR, Wm{, shift #amount}NEGS Xd, Xm{, shift #amount}"SUBS Xd, XZR, Xm{, shift #amount}LD2Wld2wLD2W-Contiguous load two-word structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,3LD2W{ Zt1.S, Zt2.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]/LD2W{ Zt1.S, Zt2.S }, Pg/Z, [Xn|SP, Xm, LSL 
#2]URSRAursraURSRA8Unsigned rounding shift right and accumulate (immediate)URSRA Dd, Dn, #shiftURSRAVd.T, Vn.T, #shiftURSRAZda.T, Zn.T, #constSADDLBTsaddlbtSADDLBTAdd the even-numbered signed elements of the first source vector to the odd-numbered signed elements of the second source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.SADDLBTZd.T, Zn.Tb, Zm.TbUDIVRudivrUDIVR Unsigned reversed divide active elements of the second source vector by corresponding elements of the first source vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.UDIVRZdn.T, Pg/M, Zdn.T, Zm.TFACLEfacleFACLEFACLECompare active absolute values of floating-point elements in the first source vector being less than or equal to corresponding absolute values of elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.FACLE Pd.T, Pg/Z, Zm.T, Zn.TFACGE Pd.T, Pg/Z, Zn.T, Zm.TSQSHRUNTsqshruntSQSHRUNT?Shift each signed integer value in the source vector elements right by an immediate value, and place the truncated results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2SQSHRUNTZd.T, Zn.Tb, #constSTNT1Wstnt1wSTNT1WContiguous store non-temporal of words from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. 
3STNT1W{ Zt1.S-Zt2.S }, PNg, [Xn|SP{, #imm, MUL VL}]3STNT1W{ Zt1.S-Zt4.S }, PNg, [Xn|SP{, #imm, MUL VL}]/STNT1W{ Zt1.S-Zt2.S }, PNg, [Xn|SP, Xm, LSL #2]/STNT1W{ Zt1.S-Zt4.S }, PNg, [Xn|SP, Xm, LSL #2]4STNT1W{ Zt1.S, Zt2.S }, PNg, [Xn|SP{, #imm, MUL VL}]BSTNT1W{ Zt1.S, Zt2.S, Zt3.S, Zt4.S }, PNg, [Xn|SP{, #imm, MUL VL}]0STNT1W{ Zt1.S, Zt2.S }, PNg, [Xn|SP, Xm, LSL #2]>STNT1W{ Zt1.S, Zt2.S, Zt3.S, Zt4.S }, PNg, [Xn|SP, Xm, LSL #2] STNT1W{ Zt.S }, Pg, [Zn.S{, Xm}] STNT1W{ Zt.D }, Pg, [Zn.D{, Xm}]+STNT1W{ Zt.S }, Pg, [Xn|SP{, #imm, MUL VL}]'STNT1W{ Zt.S }, Pg, [Xn|SP, Xm, LSL #2]ADRadrADRForm PC-relative address ADRXd, label#ADRZd.T, [Zn.T, Zm.T{, mod amount}]$ADRZd.D, [Zn.D, Zm.D, SXTW{ amount}]$ADRZd.D, [Zn.D, Zm.D, UXTW{ amount}]ADDPLaddplADDPL Add the current predicate register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source general-purpose register or current stack pointer and place the result in the 64-bit destination general-purpose register or current stack pointer.ADDPLXd|SP, Xn|SP, #immESBesbESBError synchronization barrierESBBFMLSbfmlsBFMLSUMultiply the corresponding active BFloat16 elements of the first and second source vectors and subtract from elements of the third source (addend) vector without intermediate rounding. Destructively place the results in the destination and third source (addend) vector. 
Inactive elements in the destination vector register remain unmodified.BFMLSZda.H, Pg/M, Zn.H, Zm.HBFMLSZda.H, Zn.H, Zm.H[imm]<BFMLS ZA.H[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]<BFMLS ZA.H[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]5BFMLS ZA.H[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H5BFMLS ZA.H[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H@BFMLS ZA.H[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }@BFMLS ZA.H[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }CPYFPcpyfpCPYFPMemory copy forward-onlyCPYFP [Xd]!, [Xs]!, Xn!CPYFM [Xd]!, [Xs]!, Xn!CPYFE [Xd]!, [Xs]!, Xn!FCVTZUfcvtzuFCVTZUMFloating-point convert to unsigned fixed-point, rounding toward zero (vector)FCVTZUVd, Vn, #fbitsFCVTZUVd.T, Vn.T, #fbits FCVTZUHd, Hn FCVTZUVd, VnFCVTZUVd.T, Vn.TFCVTZUVd.T, Vn.TFCVTZUWd, Hn, #fbitsFCVTZUXd, Hn, #fbitsFCVTZUWd, Sn, #fbitsFCVTZUXd, Sn, #fbitsFCVTZUWd, Dn, #fbitsFCVTZUXd, Dn, #fbits FCVTZUWd, Hn FCVTZUXd, Hn FCVTZUWd, Sn FCVTZUXd, Sn FCVTZUWd, Dn FCVTZUXd, Dn&FCVTZU{ Zd1.S-Zd2.S }, { Zn1.S-Zn2.S }&FCVTZU{ Zd1.S-Zd4.S }, { Zn1.S-Zn4.S }FCVTZUZd.H, Pg/M, Zn.HFCVTZUZd.S, Pg/M, Zn.HFCVTZUZd.D, Pg/M, Zn.HFCVTZUZd.S, Pg/M, Zn.SFCVTZUZd.D, Pg/M, Zn.SFCVTZUZd.S, Pg/M, Zn.DFCVTZUZd.D, Pg/M, Zn.DLD4Dld4dLD4D5Contiguous load four-doubleword structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,ALD4D{ Zt1.D, Zt2.D, Zt3.D, Zt4.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]=LD4D{ Zt1.D, Zt2.D, Zt3.D, Zt4.D }, Pg/Z, [Xn|SP, Xm, LSL #3]PRFMprfmPRFMPrefetch memory (immediate)%PRFM (prfop|#imm5), [Xn|SP{, #pimm}]PRFM (prfop|#imm5), label8PRFM (prfop|#imm5), [Xn|SP, (Wm|Xm){, extend {amount}}]BLblBLBranch with linkBLlabelSVCsvcSVCSupervisor call SVC #immTBXQtbxqTBXQ(For each 128-bit destination vector segment, reads each element of the corresponding second source (index) 
vector segment and uses its value to select an indexed element from the corresponding first source (table) vector segment. The indexed table element is placed in the element of the destination vector that corresponds to the index vector element. If an index value is greater than or equal to the number of elements in a 128-bit vector segment then the corresponding destination vector element is left unchanged. This instruction is unpredicated.TBXQZd.T, Zn.T, Zm.TBFIbfiBFI BFI -- A64Bitfield insertBFI Wd, Wn, #lsb, #width)BFM Wd, Wn, #(-lsb MOD 32), #(width-1)BFI Xd, Xn, #lsb, #width)BFM Xd, Xn, #(-lsb MOD 64), #(width-1)DECPdecpDECPxCounts the number of true elements in the source predicate and then uses the result to decrement the scalar destination. DECPXdn, Pm.TDECPZdn.T, Pm.TSTILPstilpSTILP'Store-release ordered pair of registersSTILPWt1, Wt2, [Xn|SP, #-8]!STILPWt1, Wt2, [Xn|SP]STILPXt1, Xt2, [Xn|SP, #-16]!STILPXt1, Xt2, [Xn|SP]USHLushlUSHLUnsigned shift left (register)USHL Dd, Dn, DmUSHLVd.T, Vn.T, Vm.TBFSUBbfsubBFSUB#Subtract active BFloat16 elements of the second source vector from corresponding BFloat16 elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. 
Inactive elements in the destination vector register remain unmodified.BFSUBZdn.H, Pg/M, Zdn.H, Zm.HBFSUBZd.H, Zn.H, Zm.H/BFSUB ZA.H[Wv, offs{, VGx2}], { Zm1.H-Zm2.H }/BFSUB ZA.H[Wv, offs{, VGx4}], { Zm1.H-Zm4.H }WHILEGEwhilegeWHILEGEGenerate a predicate that starting from the highest numbered element is true while the decrementing value of the first, signed scalar operand is greater than or equal to the second scalar operand and false thereafter down to the lowest numbered element.WHILEGEPd.T, Rn, RmWHILEGEPNd.T, Xn, Xm, vlWHILEGE{ Pd1.T, Pd2.T }, Xn, XmLDSETPldsetpLDSETP$Atomic bit set on quadword in memoryLDSETPXt1, Xt2, [Xn|SP]LDSETPAXt1, Xt2, [Xn|SP]LDSETPALXt1, Xt2, [Xn|SP]LDSETPLXt1, Xt2, [Xn|SP]STXRstxrSTXRStore exclusive registerSTXRWs, Wt, [Xn|SP{, #0}]STXRWs, Xt, [Xn|SP{, #0}]SEVLsevlSEVLSend event localSEVLCMLAcmlaCMLAMultiply the duplicated real components for rotations 0 and 180, or imaginary components for rotations 90 and 270, of the integral numbers in the first source vector by the corresponding complex number in the second source vector rotated by 0, 90, 180 or 270 degrees in the direction from the positive real axis towards the positive imaginary axis, when considered in polar representation.CMLAZda.T, Zn.T, Zm.T, const!CMLAZda.H, Zn.H, Zm.H[imm], const!CMLAZda.S, Zn.S, Zm.S[imm], constLDCLRldclrLDCLR0Atomic bit clear on word or doubleword in memory LDCLRWs, Wt, [Xn|SP]LDCLRAWs, Wt, [Xn|SP]LDCLRALWs, Wt, [Xn|SP]LDCLRLWs, Wt, [Xn|SP]LDCLRXs, Xt, [Xn|SP]LDCLRAXs, Xt, [Xn|SP]LDCLRALXs, Xt, [Xn|SP]LDCLRLXs, Xt, [Xn|SP]'STCLRWs, [Xn|SP]LDCLR Ws, WZR, [Xn|SP])STCLRLWs, [Xn|SP]LDCLRL Ws, WZR, [Xn|SP]'STCLRXs, [Xn|SP]LDCLR Xs, XZR, [Xn|SP])STCLRLXs, [Xn|SP]LDCLRL Xs, XZR, [Xn|SP]LDCLRABldclrabLDCLRAB"Atomic bit clear on byte in memoryLDCLRABWs, Wt, [Xn|SP]LDCLRALBWs, Wt, [Xn|SP]LDCLRBWs, Wt, [Xn|SP]LDCLRLBWs, Wt, [Xn|SP] AUTIBSPPC autibsppc AUTIBSPPCBAuthenticate return address using key B, using an immediate offsetAUTIBSPPClabelBICSbicsBICS3Bitwise bit 
clear (shifted register), setting flagsBICSWd, Wn, Wm{, shift #amount}BICSXd, Xn, Xm{, shift #amount}BICSPd.B, Pg/Z, Pn.B, Pm.BFMINVfminvFMINV$Floating-point minimum across vector FMINVVd, Vn.TFMINV Sd, Vn.4SFMINVVd, Pg, Zn.TLDAPURSWldapurswLDAPURSW1Load-acquire RCpc register signed word (unscaled)LDAPURSWXt, [Xn|SP{, #simm}]LD1Hld1hLD1HContiguous load of unsigned halfwords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.3LD1H{ Zt1.H-Zt2.H }, PNg/Z, [Xn|SP{, #imm, MUL VL}]3LD1H{ Zt1.H-Zt4.H }, PNg/Z, [Xn|SP{, #imm, MUL VL}]/LD1H{ Zt1.H-Zt2.H }, PNg/Z, [Xn|SP, Xm, LSL #1]/LD1H{ Zt1.H-Zt4.H }, PNg/Z, [Xn|SP, Xm, LSL #1]4LD1H{ Zt1.H, Zt2.H }, PNg/Z, [Xn|SP{, #imm, MUL VL}]BLD1H{ Zt1.H, Zt2.H, Zt3.H, Zt4.H }, PNg/Z, [Xn|SP{, #imm, MUL VL}]0LD1H{ Zt1.H, Zt2.H }, PNg/Z, [Xn|SP, Xm, LSL #1]>LD1H{ Zt1.H, Zt2.H, Zt3.H, Zt4.H }, PNg/Z, [Xn|SP, Xm, LSL #1]"LD1H{ Zt.S }, Pg/Z, [Zn.S{, #imm}]"LD1H{ Zt.D }, Pg/Z, [Zn.D{, #imm}]+LD1H{ Zt.H }, Pg/Z, [Xn|SP{, #imm, MUL VL}]+LD1H{ Zt.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]+LD1H{ Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]'LD1H{ Zt.H }, Pg/Z, [Xn|SP, Xm, LSL #1]'LD1H{ Zt.S }, Pg/Z, [Xn|SP, Xm, LSL #1]'LD1H{ Zt.D }, Pg/Z, [Xn|SP, Xm, LSL #1])LD1H{ Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod #1])LD1H{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod #1]&LD1H{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod]&LD1H{ Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod])LD1H{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, LSL #1]!LD1H{ Zt.D }, Pg/Z, [Xn|SP, Zm.D]6LD1H{ ZAtHV.H[Ws, offs] }, Pg/Z, [Xn|SP{, Xm, LSL #1}]LDSMINBldsminbLDSMINB7Atomic signed minimum on byte in memory, without return+STSMINBWs, [Xn|SP]LDSMINB Ws, WZR, [Xn|SP]-STSMINLBWs, [Xn|SP]LDSMINLB Ws, WZR, [Xn|SP]ZIP2zip2ZIP2Interleave alternating elements from the lowest or highest halves of the first and second source predicates and place in elements of the destination 
predicate. This instruction is unpredicated.ZIP2Pd.T, Pn.T, Pm.TZIP1Pd.T, Pn.T, Pm.TZIP2Zd.T, Zn.T, Zm.TZIP2Zd.Q, Zn.Q, Zm.QZIP1Zd.T, Zn.T, Zm.TZIP1Zd.Q, Zn.Q, Zm.QZIP2Vd.T, Vn.T, Vm.TUABDLBuabdlbUABDLB0Compute the absolute difference between the even-numbered unsigned integer values in elements of the second source vector and the corresponding elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.UABDLBZd.T, Zn.Tb, Zm.TbUCVTFucvtfUCVTF7Unsigned fixed-point convert to floating-point (vector)UCVTFVd, Vn, #fbitsUCVTFVd.T, Vn.T, #fbits UCVTFHd, Hn UCVTFVd, VnUCVTFVd.T, Vn.TUCVTFVd.T, Vn.TUCVTFHd, Wn, #fbitsUCVTFHd, Xn, #fbitsUCVTFSd, Wn, #fbitsUCVTFSd, Xn, #fbitsUCVTFDd, Wn, #fbitsUCVTFDd, Xn, #fbits UCVTFHd, Wn UCVTFSd, Wn UCVTFDd, Wn UCVTFHd, Xn UCVTFSd, Xn UCVTFDd, Xn%UCVTF{ Zd1.S-Zd2.S }, { Zn1.S-Zn2.S }%UCVTF{ Zd1.S-Zd4.S }, { Zn1.S-Zn4.S }UCVTFZd.H, Pg/M, Zn.HUCVTFZd.H, Pg/M, Zn.SUCVTFZd.S, Pg/M, Zn.SUCVTFZd.D, Pg/M, Zn.SUCVTFZd.H, Pg/M, Zn.DUCVTFZd.S, Pg/M, Zn.DUCVTFZd.D, Pg/M, Zn.DLDXRBldxrbLDXRBLoad exclusive register byteLDXRBWt, [Xn|SP{, #0}]STLRBstlrbSTLRBStore-release register byteSTLRBWt, [Xn|SP{, #0}]UMINuminUMINUnsigned minimum (vector) UMINVd.T, Vn.T, Vm.TUMINWd, Wn, #uimmUMINXd, Xn, #uimm.UMIN{ Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T.UMIN{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T9UMIN{ Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T }9UMIN{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }UMINWd, Wn, WmUMINXd, Xn, XmUMINZdn.T, Pg/M, Zdn.T, Zm.TUMINZdn.T, Zdn.T, #immUQXTNTuqxtntUQXTNTSaturate the unsigned integer value in each source element to half the original source element width, and place the results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged.UQXTNTZd.T, Zn.TbNEGnegNEGNegate (vector) NEG Dd, Dn NEGVd.T, Vn.TNEGZd.T, Pg/M, Zn.TNEG (shifted register) -- A64Negate (shifted register)NEG 
Wd, Wm{, shift #amount}!SUB Wd, WZR, Wm{, shift #amount}NEG Xd, Xm{, shift #amount}!SUB Xd, XZR, Xm{, shift #amount}ANDSandsANDS&Bitwise AND (immediate), setting flagsANDSWd, Wn, #immANDSXd, Xn, #immANDSWd, Wn, Wm{, shift #amount}ANDSXd, Xn, Xm{, shift #amount}ANDSPd.B, Pg/Z, Pn.B, Pm.BSTGMstgmSTGMStore Allocation Tag multipleSTGMXt, [Xn|SP]UMLSLBumlslbUMLSLBMultiply the corresponding even-numbered unsigned elements of the first and second source vectors and destructively subtract from the overlapping double-width elements of the addend vector. This instruction is unpredicated.UMLSLBZda.T, Zn.Tb, Zm.TbUMLSLBZda.S, Zn.H, Zm.H[imm]UMLSLBZda.D, Zn.S, Zm.S[imm]BFMLSLbfmlslBFMLSLThis BFloat16 floating-point multiply-subtract long instruction widens all 16-bit BFloat16 elements in the one, two, or four first source vectors and the indexed element of the second source vector to single-precision format, then multiplies the corresponding elements and destructively subtracts these values without intermediate rounding from the overlapping 32-bit single-precision elements of the ZA double-vector groups.0BFMLSL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H[index]CBFMLSL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]CBFMLSL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index])BFMLSL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H<BFMLSL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H<BFMLSL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.HGBFMLSL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }GBFMLSL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }RDFFRSrdffrsRDFFRSRead the first-fault register (RDFFRSPd.B, Pg/ZFRECPXfrecpxFRECPX+Floating-point reciprocal exponent (scalar) FRECPXHd, Hn FRECPXVd, VnFRECPXZd.T, Pg/M, Zn.TFSCALEfscaleFSCALE(Floating-point adjust exponent by vectorFSCALEVd.T, Vn.T, Vm.TFSCALEVd.T, Vn.T, Vm.T0FSCALE{ Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T0FSCALE{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T;FSCALE{ Zdn1.T-Zdn2.T }, { 
Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T };FSCALE{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }FSCALEZdn.T, Pg/M, Zdn.T, Zm.TSSHLLTsshlltSSHLLT0Shift left by immediate each odd-numbered signed element of the source vector, and place the results in the overlapping double-width elements of the destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. This instruction is unpredicated.SSHLLTZd.T, Zn.Tb, #constSADDWTsaddwtSADDWTAdd the odd-numbered signed elements of the second source vector to the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.SADDWTZd.T, Zn.T, Zm.TbGCSPUSHXgcspushxGCSPUSHXGCSPUSHX -- A642Guarded Control Stack push exception return recordGCSPUSHXSYS #0, C7, C7, #4{, Xt} PACIBSPPC pacibsppc PACIBSPPC;Pointer Authentication Code for return address, using key B PACIBSPPCSABALBsabalbSABALB!Compute the absolute difference between even-numbered signed integer values in elements of the second source vector and corresponding elements of the first source vector, and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.SABALBZda.T, Zn.Tb, Zm.TbSQDECPsqdecpSQDECP Counts the number of true elements in the source predicate and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits.SQDECPXdn, Pm.T, WdnSQDECPXdn, Pm.TSQDECPZdn.T, Pm.T SQRSHRUNT sqrshrunt SQRSHRUNT=Shift each signed integer value in the source vector elements right by an immediate value, and place the rounded results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. 
Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2SQRSHRUNTZd.T, Zn.Tb, #constSTXRHstxrhSTXRH!Store exclusive register halfwordSTXRHWs, Wt, [Xn|SP{, #0}]PACIBpacibPACIB@Pointer Authentication Code for instruction address, using key BPACIBXd, Xn|SPPACIZBXd PACIB1716PACIBSPPACIBZUADDLTuaddltUADDLTAdd the corresponding odd-numbered unsigned elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.UADDLTZd.T, Zn.Tb, Zm.TbUQDECDuqdecdUQDECD*Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range. UQDECDWdn{, pattern{, MUL #imm}} UQDECDXdn{, pattern{, MUL #imm}}"UQDECDZdn.D{, pattern{, MUL #imm}}PRFBprfbPRFBGather prefetch of bytes from the active memory addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive addresses are not prefetched from memory.PRFBprfop, Pg, [Zn.S{, #imm}]PRFBprfop, Pg, [Zn.D{, #imm}]&PRFBprfop, Pg, [Xn|SP{, #imm, MUL VL}]PRFBprfop, Pg, [Xn|SP, Xm]!PRFBprfop, Pg, [Xn|SP, Zm.S, mod]!PRFBprfop, Pg, [Xn|SP, Zm.D, mod]PRFBprfop, Pg, [Xn|SP, Zm.D]LSRlsrLSRSShift right by immediate, inserting zeroes, each active element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. 
Inactive elements in the destination vector register remain unmodified.LSRZdn.T, Pg/M, Zdn.T, #constLSRZdn.T, Pg/M, Zdn.T, Zm.DLSRZdn.T, Pg/M, Zdn.T, Zm.TLSRZd.T, Zn.T, #constLSRZd.T, Zn.T, Zm.DLSR (register) -- A64Logical shift right (register)LSR Wd, Wn, WmLSRV Wd, Wn, WmLSR Xd, Xn, XmLSRV Xd, Xn, XmLSR (immediate) -- A64Logical shift right (immediate)LSR Wd, Wn, #shiftUBFM Wd, Wn, #shift, #31LSR Xd, Xn, #shiftUBFM Xd, Xn, #shift, #63LDSMAXHldsmaxhLDSMAXH;Atomic signed maximum on halfword in memory, without return+STSMAXHWs, [Xn|SP]LDSMAXH Ws, WZR, [Xn|SP]-STSMAXLHWs, [Xn|SP]LDSMAXLH Ws, WZR, [Xn|SP]UMSUBLumsublUMSUBLUnsigned multiply-subtract longUMSUBLXd, Wn, Wm, XaLDAPURldapurLDAPUR8Load-acquire RCpc SIMD&FP register (unscaled offset)LDAPURBt, [Xn|SP{, #simm}]LDAPURHt, [Xn|SP{, #simm}]LDAPURSt, [Xn|SP{, #simm}]LDAPURDt, [Xn|SP{, #simm}]LDAPURQt, [Xn|SP{, #simm}]LDAPURWt, [Xn|SP{, #simm}]LDAPURXt, [Xn|SP{, #simm}]FMLSfmlsFMLSDFloating-point fused multiply-subtract from accumulator (by element)FMLSHd, Hn, Vm.H[index]FMLSVd, Vn, Vm.Ts[index]FMLSVd.T, Vn.T, Vm.H[index]FMLSVd.T, Vn.T, Vm.Ts[index]FMLSVd.T, Vn.T, Vm.TFMLSVd.T, Vn.T, Vm.TFMLSZda.T, Pg/M, Zn.T, Zm.TFMLSZda.H, Zn.H, Zm.H[imm]FMLSZda.S, Zn.S, Zm.S[imm]FMLSZda.D, Zn.D, Zm.D[imm]<FMLS ZA.H[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]<FMLS ZA.S[Wv, offs{, VGx2}], { Zn1.S-Zn2.S }, Zm.S[index]<FMLS ZA.D[Wv, offs{, VGx2}], { Zn1.D-Zn2.D }, Zm.D[index]<FMLS ZA.H[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]<FMLS ZA.S[Wv, offs{, VGx4}], { Zn1.S-Zn4.S }, Zm.S[index]<FMLS ZA.D[Wv, offs{, VGx4}], { Zn1.D-Zn4.D }, Zm.D[index]5FMLS ZA.T[Wv, offs{, VGx2}], { Zn1.T-Zn2.T }, Zm.T5FMLS ZA.H[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H5FMLS ZA.T[Wv, offs{, VGx4}], { Zn1.T-Zn4.T }, Zm.T5FMLS ZA.H[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H@FMLS ZA.T[Wv, offs{, VGx2}], { Zn1.T-Zn2.T }, { Zm1.T-Zm2.T }@FMLS ZA.H[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }@FMLS ZA.T[Wv, offs{, VGx4}], { Zn1.T-Zn4.T }, { Zm1.T-Zm4.T 
}@FMLS ZA.H[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }UQINCWuqincwUQINCW*Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the general-purpose register's unsigned integer range. UQINCWWdn{, pattern{, MUL #imm}} UQINCWXdn{, pattern{, MUL #imm}}"UQINCWZdn.S{, pattern{, MUL #imm}}LD4Rld4rLD4RLLoad single 4-element structure and replicate to all lanes of four registers+LD4R {Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP]0LD4R {Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP], imm/LD4R {Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP], Xm SQDMLALBT sqdmlalbt SQDMLALBTMultiply then double the corresponding even-numbered signed elements of the first and odd-numbered signed elements of the second source vector. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2SQDMLALBTZda.T, Zn.Tb, Zm.TbUQINCBuqincbUQINCB)Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the general-purpose register's unsigned integer range. 
UQINCBWdn{, pattern{, MUL #imm}} UQINCBXdn{, pattern{, MUL #imm}}SUBsubSUBSubtract (extended register))SUBWd|WSP, Wn|WSP, Wm{, extend {#amount}}'SUBXd|SP, Xn|SP, Rm{, extend {#amount}} SUBWd|WSP, Wn|WSP, #imm{, shift}SUBXd|SP, Xn|SP, #imm{, shift}SUBWd, Wn, Wm{, shift #amount}SUBXd, Xn, Xm{, shift #amount}SUB Dd, Dn, DmSUBVd.T, Vn.T, Vm.TSUBZdn.T, Pg/M, Zdn.T, Zm.TSUBZdn.T, Zdn.T, #imm{, shift}SUBZd.T, Zn.T, Zm.T/SUB ZA.T[Wv, offs{, VGx2}], { Zm1.T-Zm2.T }/SUB ZA.T[Wv, offs{, VGx4}], { Zm1.T-Zm4.T }5SUB ZA.T[Wv, offs{, VGx2}], { Zn1.T-Zn2.T }, Zm.T5SUB ZA.T[Wv, offs{, VGx4}], { Zn1.T-Zn4.T }, Zm.T@SUB ZA.T[Wv, offs{, VGx2}], { Zn1.T-Zn2.T }, { Zm1.T-Zm2.T }@SUB ZA.T[Wv, offs{, VGx4}], { Zn1.T-Zn4.T }, { Zm1.T-Zm4.T }FMOVfmovFMOV&Floating-point move immediate (vector)FMOVVd.T, #immFMOVVd.T, #immFMOVVd.2D, #imm FMOVHd, Hn FMOVSd, Sn FMOVDd, Dn FMOVWd, Hn FMOVXd, Hn FMOVHd, Wn FMOVSd, Wn FMOVWd, Sn FMOVHd, Xn FMOVDd, XnFMOVVd.D[1], Xn FMOVXd, DnFMOVXd, Vn.D[1] FMOVHd, #imm FMOVSd, #imm FMOVDd, #immFMOV (zero, predicated)Move floating-point constant +0.0 to each active element in the destination vector. Inactive elements in the destination vector register remain unmodified.FMOV Zd.T, Pg/M, #0.0CPY Zd.T, Pg/M, #0FMOV (zero, unpredicated)Unconditionally broadcast the floating-point constant +0.0 into each element of the destination vector. This instruction is unpredicated.FMOV Zd.T, #0.0 DUP Zd.T, #0FMOV (immediate, predicated)Move a floating-point immediate into each active element in the destination vector. Inactive elements in the destination vector register remain unmodified.FMOV Zd.T, Pg/M, #constFCPY Zd.T, Pg/M, #constFMOV (immediate, unpredicated)Unconditionally broadcast the floating-point immediate into each element of the destination vector. 
This instruction is unpredicated.FMOV Zd.T, #constFDUP Zd.T, #constLD1RSWld1rswLD1RSWLoad a single signed word from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 4 in the range 0 to 252.%LD1RSW{ Zt.D }, Pg/Z, [Xn|SP{, #imm}]LDNF1Dldnf1dLDNF1DContiguous load with non-faulting behavior of doublewords to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.-LDNF1D{ Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]USUBWBusubwbUSUBWB Subtract the even-numbered unsigned elements of the second source vector from the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. 
This instruction is unpredicated.USUBWBZd.T, Zn.T, Zm.TbXAFLAGxaflagXAFLAGIConvert floating-point condition flags from external format to Arm formatXAFLAGLDADDAHldaddahLDADDAH Atomic add on halfword in memoryLDADDAHWs, Wt, [Xn|SP]LDADDALHWs, Wt, [Xn|SP]LDADDHWs, Wt, [Xn|SP]LDADDLHWs, Wt, [Xn|SP]SMULLsmullSMULL)Signed multiply long (vector, by element)$SMULL{2} Vd.Ta, Vn.Tb, Vm.Ts[index]SMULL{2} Vd.Ta, Vn.Tb, Vm.Tb SMULL -- A64Signed multiply longSMULL Xd, Wn, WmSMADDL Xd, Wn, Wm, XZR PACIB171615 pacib171615 PACIB171615@Pointer Authentication Code for instruction address, using key B PACIB171615 RETAASPPC retaasppc RETAASPPC]Return from subroutine, with enhanced pointer authentication return using an immediate offsetRETAASPPClabelRETABSPPClabelSUBPsubpSUBPSubtract pointerSUBPXd, Xn|SP, Xm|SPUSMOPAusmopaUSMOPA>The 8-bit integer variant works with a 32-bit element ZA tile.$USMOPAZAda.S, Pn/M, Pm/M, Zn.B, Zm.B$USMOPAZAda.D, Pn/M, Pm/M, Zn.H, Zm.HUMOPAumopaUMOPA5This instruction works with a 32-bit element ZA tile.#UMOPAZAda.S, Pn/M, Pm/M, Zn.H, Zm.H#UMOPAZAda.S, Pn/M, Pm/M, Zn.B, Zm.B#UMOPAZAda.D, Pn/M, Pm/M, Zn.H, Zm.HSQRSHRsqrshrSQRSHRShift right by an immediate value, the signed integer value in each element of the two source vectors and place the rounded results in the half-width destination elements. Each result element is saturated to the half-width N-bit element's signed integer range -2#SQRSHRZd.H, { Zn1.S-Zn2.S }, #const%SQRSHRZd.T, { Zn1.Tb-Zn4.Tb }, #constLDNF1Wldnf1wLDNF1WContiguous load with non-faulting behavior of unsigned words to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. 
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.-LDNF1W{ Zt.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]-LDNF1W{ Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]XARxarXARExclusive-OR and rotateXARVd.2D, Vn.2D, Vm.2D, #imm6XARZdn.T, Zdn.T, Zm.T, #constUMAXumaxUMAXUnsigned maximum (vector) UMAXVd.T, Vn.T, Vm.TUMAXWd, Wn, #uimmUMAXXd, Xn, #uimm.UMAX{ Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, Zm.T.UMAX{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, Zm.T9UMAX{ Zdn1.T-Zdn2.T }, { Zdn1.T-Zdn2.T }, { Zm1.T-Zm2.T }9UMAX{ Zdn1.T-Zdn4.T }, { Zdn1.T-Zdn4.T }, { Zm1.T-Zm4.T }UMAXWd, Wn, WmUMAXXd, Xn, XmUMAXZdn.T, Pg/M, Zdn.T, Zm.TUMAXZdn.T, Zdn.T, #immUQRSHRNBuqrshrnbUQRSHRNBCShift each unsigned integer value in the source vector elements right by an immediate value, and place the rounded results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2UQRSHRNBZd.T, Zn.Tb, #constLD64Bld64bLD64BSingle-copy atomic 64-byte LoadLD64BXt, [Xn|SP {, #0}]SBFMsbfmSBFMSigned bitfield moveSBFMWd, Wn, #immr, #immsSBFMXd, Xn, #immr, #immsWHILEHSwhilehsWHILEHSGenerate a predicate that starting from the highest numbered element is true while the decrementing value of the first, unsigned scalar operand is higher or same as the second scalar operand and false thereafter down to the lowest numbered element.WHILEHSPd.T, Rn, RmWHILEHSPNd.T, Xn, Xm, vlWHILEHS{ Pd1.T, Pd2.T }, Xn, XmLDFF1Wldff1wLDFF1WYGather load with first-faulting behavior of unsigned words to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector. 
$LDFF1W{ Zt.S }, Pg/Z, [Zn.S{, #imm}]$LDFF1W{ Zt.D }, Pg/Z, [Zn.D{, #imm}]+LDFF1W{ Zt.S }, Pg/Z, [Xn|SP{, Xm, LSL #2}]+LDFF1W{ Zt.D }, Pg/Z, [Xn|SP{, Xm, LSL #2}]+LDFF1W{ Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod #2]+LDFF1W{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod #2](LDFF1W{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod](LDFF1W{ Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod]+LDFF1W{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, LSL #2]#LDFF1W{ Zt.D }, Pg/Z, [Xn|SP, Zm.D]UZP2uzp2UZP2Unzip vectors (secondary)UZP2Vd.T, Vn.T, Vm.TFMADfmadFMADSMultiply the corresponding active floating-point elements of the first and second source vectors and add to elements of the third (addend) vector without intermediate rounding. Destructively place the results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.FMADZdn.T, Pg/M, Zm.T, Za.TUQXTNuqxtnUQXTN"Unsigned saturating extract narrow UQXTNVbd, VanUQXTN{2} Vd.Tb, Vn.TaREVrevREV Reverse bytes REVWd, Wn REVXd, Xn REVPd.T, Pn.T REVZd.T, Zn.TICicIC IC -- A64Instruction cache operationIC ic_op{, Xt}SYS #op1, C7, Cm, #op2{, Xt}TSBtsbTSBTrace synchronization barrier TSB CSYNCSDOTsdotSDOT2Dot product signed arithmetic (vector, by element)SDOTVd.Ta, Vn.Tb, Vm.4B[index]SDOTVd.Ta, Vn.Tb, Vm.TbSDOTZda.S, Zn.H, Zm.HSDOTZda.S, Zn.H, Zm.H[imm]SDOTZda.T, Zn.Tb, Zm.TbSDOTZda.S, Zn.B, Zm.B[imm]SDOTZda.D, Zn.H, Zm.H[imm]<SDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]<SDOT ZA.S[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]5SDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H5SDOT ZA.S[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H@SDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }@SDOT ZA.S[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }<SDOT ZA.S[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]<SDOT ZA.D[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]<SDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]<SDOT ZA.D[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]8SDOT ZA.T[Wv, offs{, VGx2}], { Zn1.Tb-Zn2.Tb }, 
Zm.Tb8SDOT ZA.T[Wv, offs{, VGx4}], { Zn1.Tb-Zn4.Tb }, Zm.TbDSDOT ZA.T[Wv, offs{, VGx2}], { Zn1.Tb-Zn2.Tb }, { Zm1.Tb-Zm2.Tb }DSDOT ZA.T[Wv, offs{, VGx4}], { Zn1.Tb-Zn4.Tb }, { Zm1.Tb-Zm4.Tb }BRKASbrkasBRKASSets destination predicate elements up to and including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the BRKASPd.B, Pg/Z, Pn.BFCVTXNfcvtxnFCVTXNJFloating-point convert to lower precision narrow, rounding to odd (vector)FCVTXN Sd, DnFCVTXN{2} Vd.Tb, Vn.2DBFDOTbfdotBFDOT8BFloat16 floating-point dot product (vector, by element) BFDOTVd.Ta, Vn.Tb, Vm.2H[index]BFDOTVd.Ta, Vn.Tb, Vm.TbBFDOTZda.S, Zn.H, Zm.HBFDOTZda.S, Zn.H, Zm.H[imm]<BFDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]<BFDOT ZA.S[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]5BFDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H5BFDOT ZA.S[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H@BFDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }@BFDOT ZA.S[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }BSLbslBSLBitwise selectBSLVd.T, Vn.T, Vm.TBSLZdn.D, Zdn.D, Zm.D, Zk.DFCMGEfcmgeFCMGE5Floating-point compare greater than or equal (vector)FCMGEHd, Hn, HmFCMGEVd, Vn, VmFCMGEVd.T, Vn.T, Vm.TFCMGEVd.T, Vn.T, Vm.TFCMGEHd, Hn, #0.0FCMGEVd, Vn, #0.0FCMGEVd.T, Vn.T, #0.0FCMGEVd.T, Vn.T, #0.0LDNT1Hldnt1hLDNT1HContiguous load non-temporal of halfwords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. 
5LDNT1H{ Zt1.H-Zt2.H }, PNg/Z, [Xn|SP{, #imm, MUL VL}]5LDNT1H{ Zt1.H-Zt4.H }, PNg/Z, [Xn|SP{, #imm, MUL VL}]1LDNT1H{ Zt1.H-Zt2.H }, PNg/Z, [Xn|SP, Xm, LSL #1]1LDNT1H{ Zt1.H-Zt4.H }, PNg/Z, [Xn|SP, Xm, LSL #1]6LDNT1H{ Zt1.H, Zt2.H }, PNg/Z, [Xn|SP{, #imm, MUL VL}]DLDNT1H{ Zt1.H, Zt2.H, Zt3.H, Zt4.H }, PNg/Z, [Xn|SP{, #imm, MUL VL}]2LDNT1H{ Zt1.H, Zt2.H }, PNg/Z, [Xn|SP, Xm, LSL #1]@LDNT1H{ Zt1.H, Zt2.H, Zt3.H, Zt4.H }, PNg/Z, [Xn|SP, Xm, LSL #1]"LDNT1H{ Zt.S }, Pg/Z, [Zn.S{, Xm}]"LDNT1H{ Zt.D }, Pg/Z, [Zn.D{, Xm}]-LDNT1H{ Zt.H }, Pg/Z, [Xn|SP{, #imm, MUL VL}])LDNT1H{ Zt.H }, Pg/Z, [Xn|SP, Xm, LSL #1]TBNZtbnzTBNZTest bit and branch if nonzeroTBNZRt, #imm, labelFADDPfaddpFADDP,Floating-point add pair of elements (scalar)FADDP Hd, Vn.2H FADDPVd, Vn.TFADDPVd.T, Vn.T, Vm.TFADDPVd.T, Vn.T, Vm.TFADDPZdn.T, Pg/M, Zdn.T, Zm.TFCMEQfcmeqFCMEQ%Floating-point compare equal (vector)FCMEQHd, Hn, HmFCMEQVd, Vn, VmFCMEQVd.T, Vn.T, Vm.TFCMEQVd.T, Vn.T, Vm.TFCMEQHd, Hn, #0.0FCMEQVd, Vn, #0.0FCMEQVd.T, Vn.T, #0.0FCMEQVd.T, Vn.T, #0.0FCMEQPd.T, Pg/Z, Zn.T, #0.0FCMGTPd.T, Pg/Z, Zn.T, #0.0FCMGEPd.T, Pg/Z, Zn.T, #0.0FCMLTPd.T, Pg/Z, Zn.T, #0.0FCMLEPd.T, Pg/Z, Zn.T, #0.0FCMNEPd.T, Pg/Z, Zn.T, #0.0FCMEQPd.T, Pg/Z, Zn.T, Zm.TFCMGTPd.T, Pg/Z, Zn.T, Zm.TFCMGEPd.T, Pg/Z, Zn.T, Zm.TFCMNEPd.T, Pg/Z, Zn.T, Zm.TFCMUOPd.T, Pg/Z, Zn.T, Zm.TSSHRsshrSSHRSigned shift right (immediate)SSHR Dd, Dn, #shiftSSHRVd.T, Vn.T, #shiftSTLRstlrSTLRStore-release registerSTLRWt, [Xn|SP{, #0}]STLRXt, [Xn|SP{, #0}]STLRWt, [Xn|SP, #-4]!STLRXt, [Xn|SP, #-8]!RORrorRORROR (immediate) -- A64Rotate right (immediate)ROR Wd, Ws, #shiftEXTR Wd, Ws, Ws, #shiftROR Xd, Xs, #shiftEXTR Xd, Xs, Xs, #shiftROR (register) -- A64Rotate right (register)ROR Wd, Wn, WmRORV Wd, Wn, WmROR Xd, Xn, XmRORV Xd, Xn, XmCFPcfpCFP CFP -- A64.Control flow prediction restriction by context CFP RCTX, XtSYS #3, C7, C3, #4, XtEXTextEXT#Extract vector from pair of vectorsEXTVd.T, Vn.T, Vm.T, #indexEXTZd.B, { Zn1.B, Zn2.B }, #immEXTZdn.B, Zdn.B, Zm.B, 
#immSTURHsturhSTURH"Store register halfword (unscaled)STURHWt, [Xn|SP{, #simm}]AESMCaesmcAESMCAES mix columnsAESMCVd.16B, Vn.16BAESMCZdn.B, Zdn.BBFMLALBbfmlalbBFMLALBThis BFloat16 floating-point multiply-add long instruction widens the even-numbered BFloat16 elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and adds these values without intermediate rounding to the single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the source vectors. This instruction is unpredicated.BFMLALBZda.S, Zn.H, Zm.HBFMLALBZda.S, Zn.H, Zm.H[imm]LDSMINldsminLDSMIN5Atomic signed minimum on word or doubleword in memory LDSMINWs, Wt, [Xn|SP]LDSMINAWs, Wt, [Xn|SP]LDSMINALWs, Wt, [Xn|SP]LDSMINLWs, Wt, [Xn|SP]LDSMINXs, Xt, [Xn|SP]LDSMINAXs, Xt, [Xn|SP]LDSMINALXs, Xt, [Xn|SP]LDSMINLXs, Xt, [Xn|SP])STSMINWs, [Xn|SP]LDSMIN Ws, WZR, [Xn|SP]+STSMINLWs, [Xn|SP]LDSMINL Ws, WZR, [Xn|SP])STSMINXs, [Xn|SP]LDSMIN Xs, XZR, [Xn|SP]+STSMINLXs, [Xn|SP]LDSMINL Xs, XZR, [Xn|SP]STRBstrbSTRBStore register byte (immediate)STRBWt, [Xn|SP], #simmSTRBWt, [Xn|SP, #simm]!STRBWt, [Xn|SP{, #pimm}])STRBWt, [Xn|SP, (Wm|Xm), extend {amount}]!STRBWt, [Xn|SP, Xm{, LSL amount}]SMADDLsmaddlSMADDLSigned multiply-add longSMADDLXd, Wn, Wm, XaLD3Qld3qLD3Q5Contiguous load three-quadword structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,:LD3Q{ Zt1.Q, Zt2.Q, Zt3.Q }, Pg/Z, [Xn|SP{, #imm, MUL VL}]6LD3Q{ Zt1.Q, Zt2.Q, Zt3.Q }, Pg/Z, [Xn|SP, Xm, LSL #4]USUBWusubwUSUBWUnsigned subtract wideUSUBW{2} Vd.Ta, Vn.Ta, Vm.TbCRC32CBcrc32cbCRC32CBCRC32C checksumCRC32CBWd, Wn, WmCRC32CHWd, Wn, WmCRC32CWWd, Wn, WmCRC32CXWd, Wn, XmSBFXsbfxSBFX SBFX -- A64Signed 
bitfield extractSBFX Wd, Wn, #lsb, #width"SBFM Wd, Wn, #lsb, #(lsb+width-1)SBFX Xd, Xn, #lsb, #width"SBFM Xd, Xn, #lsb, #(lsb+width-1)FCMPEfcmpeFCMPE)Floating-point signaling compare (scalar) FCMPEHn, Hm FCMPEHn, #0.0 FCMPESn, Sm FCMPESn, #0.0 FCMPEDn, Dm FCMPEDn, #0.0CASPcaspCASP7Compare and swap pair of words or doublewords in memory)CASPWs, W(s+1), Wt, W(t+1), [Xn|SP{, #0}]*CASPAWs, W(s+1), Wt, W(t+1), [Xn|SP{, #0}]+CASPALWs, W(s+1), Wt, W(t+1), [Xn|SP{, #0}]*CASPLWs, W(s+1), Wt, W(t+1), [Xn|SP{, #0}])CASPXs, X(s+1), Xt, X(t+1), [Xn|SP{, #0}]*CASPAXs, X(s+1), Xt, X(t+1), [Xn|SP{, #0}]+CASPALXs, X(s+1), Xt, X(t+1), [Xn|SP{, #0}]*CASPLXs, X(s+1), Xt, X(t+1), [Xn|SP{, #0}]LUTI2luti2LUTI2$Lookup table read with 2-bit indices "LUTI2Vd.16B, { Vn.16B }, Vm[index] LUTI2Vd.8H, { Vn.8H }, Vm[index]$LUTI2{ Zd1.T-Zd2.T }, ZT0, Zn[index]%LUTI2{ Zd1.T, Zd2.T }, ZT0, Zn[index]$LUTI2{ Zd1.T-Zd4.T }, ZT0, Zn[index]3LUTI2{ Zd1.T, Zd2.T, Zd3.T, Zd4.T }, ZT0, Zn[index]LUTI2Zd.T, ZT0, Zn[index]LUTI2Zd.B, { Zn.B }, Zm[index]LUTI2Zd.H, { Zn.H }, Zm[index]FADDfaddFADDFloating-point add (vector) FADDVd.T, Vn.T, Vm.TFADDVd.T, Vn.T, Vm.TFADDHd, Hn, HmFADDSd, Sn, SmFADDDd, Dn, DmFADDZdn.T, Pg/M, Zdn.T, constFADDZdn.T, Pg/M, Zdn.T, Zm.TFADDZd.T, Zn.T, Zm.T/FADD ZA.T[Wv, offs{, VGx2}], { Zm1.T-Zm2.T }/FADD ZA.H[Wv, offs{, VGx2}], { Zm1.H-Zm2.H }/FADD ZA.T[Wv, offs{, VGx4}], { Zm1.T-Zm4.T }/FADD ZA.H[Wv, offs{, VGx4}], { Zm1.H-Zm4.H }SADDWsaddwSADDWSigned add wideSADDW{2} Vd.Ta, Vn.Ta, Vm.TbLDFF1Hldff1hLDFF1H\Gather load with first-faulting behavior of unsigned halfwords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector. 
$LDFF1H{ Zt.S }, Pg/Z, [Zn.S{, #imm}]$LDFF1H{ Zt.D }, Pg/Z, [Zn.D{, #imm}]+LDFF1H{ Zt.H }, Pg/Z, [Xn|SP{, Xm, LSL #1}]+LDFF1H{ Zt.S }, Pg/Z, [Xn|SP{, Xm, LSL #1}]+LDFF1H{ Zt.D }, Pg/Z, [Xn|SP{, Xm, LSL #1}]+LDFF1H{ Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod #1]+LDFF1H{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod #1](LDFF1H{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod](LDFF1H{ Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod]+LDFF1H{ Zt.D }, Pg/Z, [Xn|SP, Zm.D, LSL #1]#LDFF1H{ Zt.D }, Pg/Z, [Xn|SP, Zm.D]BRKPASbrkpasBRKPASaIf the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise sets destination predicate elements up to and including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the BRKPASPd.B, Pg/Z, Pn.B, Pm.BGCSSS1gcsss1GCSSS1 GCSSS1 -- A64$Guarded Control Stack switch stack 1 GCSSS1 XtSYS #3, C7, C7, #2, XtLD1RQDld1rqdLD1RQDLoad two contiguous doublewords to elements of a short, 128-bit (quadword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base address.%LD1RQD{ Zt.D }, Pg/Z, [Xn|SP{, #imm}])LD1RQD{ Zt.D }, Pg/Z, [Xn|SP, Xm, LSL #3]LDEORHldeorhLDEORH9Atomic exclusive-OR on halfword in memory, without return)STEORHWs, [Xn|SP]LDEORH Ws, WZR, [Xn|SP]+STEORLHWs, [Xn|SP]LDEORLH Ws, WZR, [Xn|SP]STLLRHstllrhSTLLRH!Store LORelease register halfwordSTLLRHWt, [Xn|SP{, #0}]CCMNccmnCCMN(Conditional compare negative (immediate)CCMNWn, #imm, #nzcv, condCCMNXn, #imm, #nzcv, condCCMNWn, Wm, #nzcv, condCCMNXn, Xm, #nzcv, condZIPzipZIPPlace the four-way interleaved elements from the four source vectors in the corresponding elements of the four destination vectors.#ZIP{ Zd1.T-Zd4.T }, { Zn1.T-Zn4.T }#ZIP{ Zd1.Q-Zd4.Q }, { Zn1.Q-Zn4.Q }ZIP{ Zd1.T-Zd2.T }, Zn.T, Zm.TZIP{ Zd1.Q-Zd2.Q }, Zn.Q, Zm.QSMLSLsmlslSMLSL2Signed multiply-subtract 
long (vector, by element) $SMLSL{2} Vd.Ta, Vn.Tb, Vm.Ts[index]SMLSL{2} Vd.Ta, Vn.Tb, Vm.Tb0SMLSL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H[index]CSMLSL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]CSMLSL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index])SMLSL ZA.S[Wv, offs1:offs2], Zn.H, Zm.H<SMLSL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, Zm.H<SMLSL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, Zm.HGSMLSL ZA.S[Wv, offs1:offs2{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }GSMLSL ZA.S[Wv, offs1:offs2{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }LDRBldrbLDRBLoad register byte (immediate)LDRBWt, [Xn|SP], #simmLDRBWt, [Xn|SP, #simm]!LDRBWt, [Xn|SP{, #pimm}])LDRBWt, [Xn|SP, (Wm|Xm), extend {amount}]!LDRBWt, [Xn|SP, Xm{, LSL amount}]UQDECHuqdechUQDECH*Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range. UQDECHWdn{, pattern{, MUL #imm}} UQDECHXdn{, pattern{, MUL #imm}}"UQDECHZdn.H{, pattern{, MUL #imm}} SHA256SU1 sha256su1 SHA256SU1SHA256 schedule update 1SHA256SU1Vd.4S, Vn.4S, Vm.4SSHADDshaddSHADDSigned halving addSHADDVd.T, Vn.T, Vm.TSHADDZdn.T, Pg/M, Zdn.T, Zm.T AUTIASPPC autiasppc AUTIASPPCBAuthenticate return address using key A, using an immediate offsetAUTIASPPClabelORNSornsORNS+Bitwise inclusive OR inverted active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Sets the ORNSPd.B, Pg/Z, Pn.B, Pm.BFCMLAfcmlaFCMLA7Floating-point complex multiply accumulate (by element)&FCMLAVd.T, Vn.T, Vm.Ts[index], #rotateFCMLAVd.T, Vn.T, Vm.T, #rotate#FCMLAZda.T, Pg/M, Zn.T, Zm.T, const"FCMLAZda.H, Zn.H, Zm.H[imm], const"FCMLAZda.S, Zn.S, Zm.S[imm], constSHA1SU0sha1su0SHA1SU0SHA1 schedule update 0SHA1SU0Vd.4S, Vn.4S, Vm.4SRDVLrdvlRDVLMultiply the current vector register size in bytes by an immediate in the range -32 to 31 and place the result in the 64-bit destination general-purpose register. RDVLXd, #immUSDOTusdotUSDOTBDot product with unsigned and signed integers (vector, by element) USDOTVd.Ta, Vn.Tb, Vm.4B[index]USDOTVd.Ta, Vn.Tb, Vm.TbUSDOTZda.S, Zn.B, Zm.BUSDOTZda.S, Zn.B, Zm.B[imm]<USDOT ZA.S[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]<USDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]5USDOT ZA.S[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, Zm.B5USDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B@USDOT ZA.S[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, { Zm1.B-Zm2.B }@USDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, { Zm1.B-Zm4.B }WHILEGTwhilegtWHILEGTGenerate a predicate that starting from the highest numbered element is true while the decrementing value of the first, signed scalar operand is greater than the second scalar operand and false thereafter down to the lowest numbered element.WHILEGTPd.T, Rn, RmWHILEGTPNd.T, Xn, Xm, vlWHILEGT{ Pd1.T, Pd2.T }, Xn, XmFMULXfmulxFMULX-Floating-point multiply extended (by element) FMULXHd, Hn, Vm.H[index]FMULXVd, Vn, Vm.Ts[index]FMULXVd.T, Vn.T, Vm.H[index]FMULXVd.T, Vn.T, Vm.Ts[index]FMULXHd, Hn, HmFMULXVd, Vn, VmFMULXVd.T, Vn.T, Vm.TFMULXVd.T, Vn.T, Vm.TFMULXZdn.T, Pg/M, Zdn.T, Zm.TDCdcDC DC -- A64Data cache operation DC dc_op, XtSYS #op1, C7, Cm, #op2, XtBF1CVTLTbf1cvtltBF1CVTLTConvert each odd-numbered 8-bit floating-point element of the source vector to BFloat16 while downscaling the value, and place the results in the overlapping 16-bit elements of the destination vector. 
BF1CVTLT scales the values by 2BF1CVTLTZd.H, Zn.BBF2CVTLTZd.H, Zn.BWHILELOwhileloWHILELOGenerate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, unsigned scalar operand is lower than the second scalar operand and false thereafter up to the highest numbered element.WHILELOPd.T, Rn, RmWHILELOPNd.T, Xn, Xm, vlWHILELO{ Pd1.T, Pd2.T }, Xn, XmSQXTUNTsqxtuntSQXTUNTSaturate the signed integer value in each source element to an unsigned integer value that is half the original source element width, and place the results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged.SQXTUNTZd.T, Zn.TbFCVTNBfcvtnbFCVTNB}Convert each single-precision element of the group of two source vectors to 8-bit floating-point while scaling the value by 2FCVTNBZd.B, { Zn1.S-Zn2.S }SQRSHRUNsqrshrunSQRSHRUNASigned saturating rounded shift right unsigned narrow (immediate)SQRSHRUNVbd, Van, #shift!SQRSHRUN{2} Vd.Tb, Vn.Ta, #shift%SQRSHRUNZd.H, { Zn1.S-Zn2.S }, #const'SQRSHRUNZd.T, { Zn1.Tb-Zn4.Tb }, #constEORBTeorbtEORBT?Interleaving exclusive OR between the even-numbered elements of the first source vector register and the odd-numbered elements of the second source vector register, placing the result in the even-numbered elements of the destination vector, leaving the odd-numbered elements unchanged. 
This instruction is unpredicated.
EORBT Zd.T, Zn.T, Zm.T

SHA256H2: SHA256 hash update (part 2)
SHA256H2 Qd, Qn, Vm.4S

SHA512SU0: SHA512 schedule update 0
SHA512SU0 Vd.2D, Vn.2D

USQADD: Unsigned saturating accumulate of signed value
USQADD Vd, Vn
USQADD Vd.T, Vn.T
USQADD Zdn.T, Pg/M, Zdn.T, Zm.T

MSUBPT: Multiply-subtract checked pointer
MSUBPT Xd, Xn, Xm, Xa

CPYPRN: Memory copy, reads non-temporal
CPYPRN [Xd]!, [Xs]!, Xn!
CPYMRN [Xd]!, [Xs]!, Xn!
CPYERN [Xd]!, [Xs]!, Xn!

CPYPWTN: Memory copy, writes unprivileged, reads and writes non-temporal
CPYPWTN [Xd]!, [Xs]!, Xn!
CPYMWTN [Xd]!, [Xs]!, Xn!
CPYEWTN [Xd]!, [Xs]!, Xn!

LASTA: If there is an active element then extract the element after the last active element modulo the number of elements from the final source vector register. If there are no active elements, extract element zero. Then zero-extend and place the extracted element in the destination general-purpose register.
LASTA Rd, Pg, Zn.T
LASTA Vd, Pg, Zn.T

RAX1: Rotate and exclusive-OR
RAX1 Vd.2D, Vn.2D, Vm.2D
RAX1 Zd.D, Zn.D, Zm.D

RET: Return from subroutine
RET {Xn}

SHA512H2: SHA512 hash update part 2
SHA512H2 Qd, Qn, Vm.2D

FMMLA: The floating-point matrix multiply-accumulate instruction supports single-precision and double-precision data types in a 2×2 matrix contained in segments of 128 or 256 bits, respectively. It multiplies the 2×2 matrix in each segment of the first source vector by the 2×2 matrix in the corresponding segment of the second source vector. The resulting 2×2 matrix product is then destructively added to the matrix accumulator held in the corresponding segment of the addend and destination vector. This is equivalent to performing a 2-way dot product per destination element. This instruction is unpredicated. The single-precision variant is vector length agnostic.
The double-precision variant requires that the Effective SVE vector length is at least 256 bits.
FMMLA Zda.S, Zn.S, Zm.S
FMMLA Zda.D, Zn.D, Zm.D

SUDOT: Dot product with signed and unsigned integers (vector, by element)
SUDOT Vd.Ta, Vn.Tb, Vm.4B[index]
SUDOT Zda.S, Zn.B, Zm.B[imm]
SUDOT ZA.S[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]
SUDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]
SUDOT ZA.S[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, Zm.B
SUDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B

LD1SH: Gather load of signed halfwords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
LD1SH { Zt.S }, Pg/Z, [Zn.S{, #imm}]
LD1SH { Zt.D }, Pg/Z, [Zn.D{, #imm}]
LD1SH { Zt.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
LD1SH { Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
LD1SH { Zt.S }, Pg/Z, [Xn|SP, Xm, LSL #1]
LD1SH { Zt.D }, Pg/Z, [Xn|SP, Xm, LSL #1]
LD1SH { Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod #1]
LD1SH { Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod #1]
LD1SH { Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod]
LD1SH { Zt.S }, Pg/Z, [Xn|SP, Zm.S, mod]
LD1SH { Zt.D }, Pg/Z, [Xn|SP, Zm.D, LSL #1]
LD1SH { Zt.D }, Pg/Z, [Xn|SP, Zm.D]

F1CVTLT: Convert each odd-numbered 8-bit floating-point element of the source vector to half-precision while downscaling the value, and place the results in the overlapping 16-bit elements of the destination vector.
F1CVTLT scales the values by 2
F1CVTLT Zd.H, Zn.B
F2CVTLT Zd.H, Zn.B

LD1RSH: Load a single signed halfword from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 2 in the range 0 to 126.
LD1RSH { Zt.S }, Pg/Z, [Xn|SP{, #imm}]
LD1RSH { Zt.D }, Pg/Z, [Xn|SP{, #imm}]

LD3W: Contiguous load three-word structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,
LD3W { Zt1.S, Zt2.S, Zt3.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
LD3W { Zt1.S, Zt2.S, Zt3.S }, Pg/Z, [Xn|SP, Xm, LSL #2]

BRKPA: If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise sets destination predicate elements up to and including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
BRKPA Pd.B, Pg/Z, Pn.B, Pm.B

FRSQRTE: Floating-point reciprocal square root estimate
FRSQRTE Hd, Hn
FRSQRTE Vd, Vn
FRSQRTE Vd.T, Vn.T
FRSQRTE Vd.T, Vn.T
FRSQRTE Zd.T, Zn.T

REVD: Reverse the order of 64-bit doublewords within each active element of the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
REVD Zd.Q, Pg/M, Zn.Q

LDLARH: Load LOAcquire register halfword
LDLARH Wt, [Xn|SP{, #0}]

BSL2N: Selects bits from the first source vector where the corresponding bit in the third source vector is '1', and from the inverted second source vector where the corresponding bit in the third source vector is '0'. The result is placed destructively in the destination and first source vector.
This instruction is unpredicated.
BSL2N Zdn.D, Zdn.D, Zm.D, Zk.D

ST4D: Contiguous store four-doubleword structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,
ST4D { Zt1.D, Zt2.D, Zt3.D, Zt4.D }, Pg, [Xn|SP{, #imm, MUL VL}]
ST4D { Zt1.D, Zt2.D, Zt3.D, Zt4.D }, Pg, [Xn|SP, Xm, LSL #3]

UABALB: Compute the absolute difference between even-numbered unsigned elements of the second source vector and corresponding elements of the first source vector, and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.
UABALB Zda.T, Zn.Tb, Zm.Tb

LDSMAX: Atomic signed maximum on word or doubleword in memory
LDSMAX Ws, Wt, [Xn|SP]
LDSMAXA Ws, Wt, [Xn|SP]
LDSMAXAL Ws, Wt, [Xn|SP]
LDSMAXL Ws, Wt, [Xn|SP]
LDSMAX Xs, Xt, [Xn|SP]
LDSMAXA Xs, Xt, [Xn|SP]
LDSMAXAL Xs, Xt, [Xn|SP]
LDSMAXL Xs, Xt, [Xn|SP]
STSMAX Ws, [Xn|SP] (alias of LDSMAX Ws, WZR, [Xn|SP])
STSMAXL Ws, [Xn|SP] (alias of LDSMAXL Ws, WZR, [Xn|SP])
STSMAX Xs, [Xn|SP] (alias of LDSMAX Xs, XZR, [Xn|SP])
STSMAXL Xs, [Xn|SP] (alias of LDSMAXL Xs, XZR, [Xn|SP])

ADDHNT: Add each vector element of the first source vector to the corresponding vector element of the second source vector, and place the most significant half of the result in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged.
This instruction is unpredicated.
ADDHNT Zd.T, Zn.Tb, Zm.Tb

ST2W: Contiguous store two-word structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,
ST2W { Zt1.S, Zt2.S }, Pg, [Xn|SP{, #imm, MUL VL}]
ST2W { Zt1.S, Zt2.S }, Pg, [Xn|SP, Xm, LSL #2]

LDAPRB: Load-acquire RCpc register byte
LDAPRB Wt, [Xn|SP {, #0}]

LD1SW: Gather load of signed words to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
LD1SW { Zt.D }, Pg/Z, [Zn.D{, #imm}]
LD1SW { Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
LD1SW { Zt.D }, Pg/Z, [Xn|SP, Xm, LSL #2]
LD1SW { Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod #2]
LD1SW { Zt.D }, Pg/Z, [Xn|SP, Zm.D, mod]
LD1SW { Zt.D }, Pg/Z, [Xn|SP, Zm.D, LSL #2]
LD1SW { Zt.D }, Pg/Z, [Xn|SP, Zm.D]

ISB: Instruction synchronization barrier
ISB {option|#imm}

LD2Q: Contiguous load two-quadword structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,
LD2Q { Zt1.Q, Zt2.Q }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
LD2Q { Zt1.Q, Zt2.Q }, Pg/Z, [Xn|SP, Xm, LSL #4]

LDSETH: Atomic bit set on halfword in memory, without return
STSETH Ws, [Xn|SP] (alias of LDSETH Ws, WZR, [Xn|SP])
STSETLH Ws, [Xn|SP] (alias of LDSETLH Ws, WZR, [Xn|SP])

SQDMLSL: Signed saturating doubling multiply-subtract long (by element)
SQDMLSL Vad, Vbn, Vm.Ts[index]
SQDMLSL{2} Vd.Ta, Vn.Tb, Vm.Ts[index]
SQDMLSL Vad, Vbn, Vbm
SQDMLSL{2} Vd.Ta, Vn.Tb, Vm.Tb

UMLSLT: Multiply the corresponding odd-numbered unsigned elements of the first and second source vectors and destructively subtract from the overlapping double-width elements of the addend vector. This instruction is unpredicated.
UMLSLT Zda.T, Zn.Tb, Zm.Tb
UMLSLT Zda.S, Zn.H, Zm.H[imm]
UMLSLT Zda.D, Zn.S, Zm.S[imm]

CNEG -- A64: Conditional negate
CNEG Wd, Wn, invcond (alias of CSNEG Wd, Wn, Wm, cond)
CNEG Xd, Xn, invcond (alias of CSNEG Xd, Xn, Xm, cond)

LD1R: Load one single-element structure and replicate to all lanes (of one register)
LD1R { Vt.T }, [Xn|SP]
LD1R { Vt.T }, [Xn|SP], imm
LD1R { Vt.T }, [Xn|SP], Xm

SMLALL: This signed integer multiply-add long-long instruction multiplies each signed 8-bit or 16-bit element in the one, two, or four first source vectors with each signed 8-bit or 16-bit indexed element of second source vector, widens each product to 32-bits or 64-bits and destructively adds these values to the corresponding 32-bit or 64-bit elements of the ZA quad-vector groups.
SMLALL ZA.S[Wv, offs1:offs4], Zn.B, Zm.B[index]
SMLALL ZA.D[Wv, offs1:offs4], Zn.H, Zm.H[index]
SMLALL ZA.S[Wv, offs1:offs4{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]
SMLALL ZA.D[Wv, offs1:offs4{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]
SMLALL ZA.S[Wv, offs1:offs4{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]
SMLALL ZA.D[Wv, offs1:offs4{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]
SMLALL ZA.T[Wv, offs1:offs4], Zn.Tb, Zm.Tb
SMLALL ZA.T[Wv, offs1:offs4{, VGx2}], { Zn1.Tb-Zn2.Tb }, Zm.Tb
SMLALL ZA.T[Wv, offs1:offs4{, VGx4}], { Zn1.Tb-Zn4.Tb }, Zm.Tb
SMLALL ZA.T[Wv, offs1:offs4{, VGx2}], { Zn1.Tb-Zn2.Tb }, { Zm1.Tb-Zm2.Tb }
SMLALL ZA.T[Wv, offs1:offs4{, VGx4}], { Zn1.Tb-Zn4.Tb }, { Zm1.Tb-Zm4.Tb }

SUQADD: Signed saturating accumulate of unsigned value
SUQADD Vd, Vn
SUQADD Vd.T, Vn.T
SUQADD Zdn.T, Pg/M, Zdn.T, Zm.T

MUL: Multiply (vector, by element)
MUL Vd.T, Vn.T, Vm.Ts[index]
MUL Vd.T, Vn.T, Vm.T
MUL Zdn.T, Pg/M, Zdn.T, Zm.T
MUL Zdn.T, Zdn.T, #imm
MUL Zd.T, Zn.T, Zm.T
MUL Zd.H, Zn.H, Zm.H[imm]
MUL Zd.S, Zn.S, Zm.S[imm]
MUL Zd.D, Zn.D, Zm.D[imm]

MUL -- A64: Multiply
MUL Wd, Wn, Wm (alias of MADD Wd, Wn, Wm, WZR)
MUL Xd, Xn, Xm (alias of MADD Xd, Xn, Xm, XZR)

AUTIB: Authenticate instruction address, using key B
AUTIB Xd, Xn|SP
AUTIZB Xd
AUTIB1716
AUTIBSP
AUTIBZ

ASRV: Arithmetic shift right variable
ASRV Wd, Wn, Wm
ASRV Xd, Xn, Xm

PACGA: Pointer Authentication Code, using generic key
PACGA Xd, Xn, Xm|SP

UHSUBR: Subtract active unsigned elements of the first source vector from corresponding unsigned elements of the second source vector, shift right one bit, and destructively place the results in the corresponding elements of the first source vector.
Inactive elements in the destination vector register remain unmodified.
UHSUBR Zdn.T, Pg/M, Zdn.T, Zm.T

LDSET: Atomic bit set on word or doubleword in memory
LDSET Ws, Wt, [Xn|SP]
LDSETA Ws, Wt, [Xn|SP]
LDSETAL Ws, Wt, [Xn|SP]
LDSETL Ws, Wt, [Xn|SP]
LDSET Xs, Xt, [Xn|SP]
LDSETA Xs, Xt, [Xn|SP]
LDSETAL Xs, Xt, [Xn|SP]
LDSETL Xs, Xt, [Xn|SP]
STSET Ws, [Xn|SP] (alias of LDSET Ws, WZR, [Xn|SP])
STSETL Ws, [Xn|SP] (alias of LDSETL Ws, WZR, [Xn|SP])
STSET Xs, [Xn|SP] (alias of LDSET Xs, XZR, [Xn|SP])
STSETL Xs, [Xn|SP] (alias of LDSETL Xs, XZR, [Xn|SP])

BR: Branch to register
BR Xn

ST3H: Contiguous store three-halfword structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,
ST3H { Zt1.H, Zt2.H, Zt3.H }, Pg, [Xn|SP{, #imm, MUL VL}]
ST3H { Zt1.H, Zt2.H, Zt3.H }, Pg, [Xn|SP, Xm, LSL #1]

MSUB: Multiply-subtract
MSUB Wd, Wn, Wm, Wa
MSUB Xd, Xn, Xm, Xa

ST4B: Contiguous store four-byte structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,
ST4B { Zt1.B, Zt2.B, Zt3.B, Zt4.B }, Pg, [Xn|SP{, #imm, MUL VL}]
ST4B { Zt1.B, Zt2.B, Zt3.B, Zt4.B }, Pg, [Xn|SP, Xm]

ADDG: Add with tag
ADDG Xd|SP, Xn|SP, #uimm6, #uimm4

CMPLE (vectors): Compare active signed integer elements in the first source vector being less than or equal to corresponding signed elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero.
Sets the
CMPLE Pd.T, Pg/Z, Zm.T, Zn.T (alias of CMPGE Pd.T, Pg/Z, Zn.T, Zm.T)

BFADD: Add active BFloat16 elements of the second source vector to corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.
BFADD Zdn.H, Pg/M, Zdn.H, Zm.H
BFADD Zd.H, Zn.H, Zm.H
BFADD ZA.H[Wv, offs{, VGx2}], { Zm1.H-Zm2.H }
BFADD ZA.H[Wv, offs{, VGx4}], { Zm1.H-Zm4.H }

GCSPOPM -- A64: Guarded Control Stack pop
GCSPOPM {Xt} (alias of SYSL Xt, #3, C7, C7, #1)

EORV: Bitwise exclusive OR horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as zero.
EORV Vd, Pg, Zn.T

SQRSHRNB: Shift each signed integer value in the source vector elements right by an immediate value, and place the rounded results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.
Each result element is saturated to the half-width N-bit element's signed integer range -2
SQRSHRNB Zd.T, Zn.Tb, #const

SUMOPS: The 8-bit integer variant works with a 32-bit element ZA tile.
SUMOPS ZAda.S, Pn/M, Pm/M, Zn.B, Zm.B
SUMOPS ZAda.D, Pn/M, Pm/M, Zn.H, Zm.H

FMAXNMV: Floating-point maximum number across vector
FMAXNMV Vd, Vn.T
FMAXNMV Sd, Vn.4S
FMAXNMV Vd, Pg, Zn.T

SETF8: Evaluation of 8-bit or 16-bit flag values
SETF8 Wn
SETF16 Wn

LDAP1: Load-acquire RCpc one single-element structure to one lane of one register
LDAP1 { Vt.D }[index], [Xn|SP]

RBIT: Reverse bit order (vector)
RBIT Vd.T, Vn.T
RBIT Wd, Wn
RBIT Xd, Xn
RBIT Zd.T, Pg/M, Zn.T

STLXRH: Store-release exclusive register halfword
STLXRH Ws, Wt, [Xn|SP{, #0}]

ADCLT: Add the odd-numbered elements of the first source vector and the 1-bit carry from the least-significant bit of the odd-numbered elements of the second source vector to the even-numbered elements of the destination and accumulator vector.
The 1-bit carry output is placed in the corresponding odd-numbered element of the destination vector.
ADCLT Zda.T, Zn.T, Zm.T

UQXTNB: Saturate the unsigned integer value in each source element to half the original source element width, and place the results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.
UQXTNB Zd.T, Zn.Tb

PMULL: Polynomial multiply long
PMULL{2} Vd.Ta, Vn.Tb, Vm.Tb

SUBG: Subtract with tag
SUBG Xd|SP, Xn|SP, #uimm6, #uimm4

LDRAA: Load register, with pointer authentication
LDRAA Xt, [Xn|SP{, #simm}]
LDRAA Xt, [Xn|SP{, #simm}]!
LDRAB Xt, [Xn|SP{, #simm}]
LDRAB Xt, [Xn|SP{, #simm}]!

CPYFPTRN: Memory copy forward-only, reads and writes unprivileged, reads non-temporal
CPYFPTRN [Xd]!, [Xs]!, Xn!
CPYFMTRN [Xd]!, [Xs]!, Xn!
CPYFETRN [Xd]!, [Xs]!, Xn!

LD1Q: Gather load of quadwords to active elements of a vector register from memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
LD1Q { Zt.Q }, Pg/Z, [Zn.D{, Xm}]
LD1Q { ZAtHV.Q[Ws, offs] }, Pg/Z, [Xn|SP{, Xm, LSL #4}]

EXTR: Extract register
EXTR Wd, Wn, Wm, #lsb
EXTR Xd, Xn, Xm, #lsb

SHA1M: SHA1 hash update (majority)
SHA1M Qd, Sn, Vm.4S

CMEQ: Compare bitwise equal (vector)
CMEQ Dd, Dn, Dm
CMEQ Vd.T, Vn.T, Vm.T
CMEQ Dd, Dn, #0
CMEQ Vd.T, Vn.T, #0

BRKB: Sets destination predicate elements up to but not including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register remain unmodified or are set to zero, depending on whether merging or zeroing predication is selected.
Does not set the condition flags.
BRKB Pd.B, Pg/ZM, Pn.B

FNMLA: Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of the third source (addend) vector without intermediate rounding. Destructively place the negated results in the destination and third source (addend) vector. Inactive elements in the destination vector register remain unmodified.
FNMLA Zda.T, Pg/M, Zn.T, Zm.T

SHA1SU1: SHA1 schedule update 1
SHA1SU1 Vd.4S, Vn.4S

FCVTLT: Convert odd-numbered floating-point elements from the source vector to the next higher precision, and place the results in the active overlapping double-width elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
FCVTLT Zd.S, Pg/M, Zn.H
FCVTLT Zd.D, Pg/M, Zn.S

SMAXV: Signed maximum across vector
SMAXV Vd, Vn.T
SMAXV Vd, Pg, Zn.T

FRINTZ: Floating-point round to integral, toward zero (vector)
FRINTZ Vd.T, Vn.T
FRINTZ Vd.T, Vn.T
FRINTZ Hd, Hn
FRINTZ Sd, Sn
FRINTZ Dd, Dn

RADDHNB: Add each vector element of the first source vector to the corresponding vector element of the second source vector, and place the most significant rounded half of the result in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.
This instruction is unpredicated.
RADDHNB Zd.T, Zn.Tb, Zm.Tb

LDTRSW: Load register signed word (unprivileged)
LDTRSW Xt, [Xn|SP{, #simm}]

ST2B: Contiguous store two-byte structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,
ST2B { Zt1.B, Zt2.B }, Pg, [Xn|SP{, #imm, MUL VL}]
ST2B { Zt1.B, Zt2.B }, Pg, [Xn|SP, Xm]

RCWSCLRP: Read check write software atomic bit clear on quadword in memory
RCWSCLRP Xt1, Xt2, [Xn|SP]
RCWSCLRPA Xt1, Xt2, [Xn|SP]
RCWSCLRPAL Xt1, Xt2, [Xn|SP]
RCWSCLRPL Xt1, Xt2, [Xn|SP]

REV64: Reverse elements in 64-bit doublewords (vector)
REV64 Vd.T, Vn.T

REV64 -- A64: Reverse bytes
REV64 Xd, Xn (alias of REV Xd, Xn)

EORQV: Bitwise exclusive OR of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as all zeros.
EORQV Vd.T, Pg, Zn.Tb

USUBLB: Subtract the even-numbered unsigned elements of the second source vector from the corresponding unsigned elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.
USUBLB Zd.T, Zn.Tb, Zm.Tb

RSUBHN: Rounding subtract returning high narrow
RSUBHN{2} Vd.Tb, Vn.Ta, Vm.Ta

WHILERW: This instruction checks two addresses for a conflict or overlap between address ranges of the form [addr,addr+
WHILERW Pd.T, Xn, Xm

BSL1N: Selects bits from the inverted first source vector where the corresponding bit in the third source vector is '1', and from the second source vector where the corresponding bit in the third source vector is '0'.
The result is placed destructively in the destination and first source vector. This instruction is unpredicated.
BSL1N Zdn.D, Zdn.D, Zm.D, Zk.D

CNTP: Counts the number of active and true elements in the source predicate and places the scalar result in the destination general-purpose register. Inactive predicate elements are not counted.
CNTP Xd, Pg, Pn.T
CNTP Xd, PNn.T, vl

ADDSPL: Add the Streaming SVE predicate register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source general-purpose register or current stack pointer and place the result in the 64-bit destination general-purpose register or current stack pointer.
ADDSPL Xd|SP, Xn|SP, #imm

LDLAR: Load LOAcquire register
LDLAR Wt, [Xn|SP{, #0}]
LDLAR Xt, [Xn|SP{, #0}]

STR: Store SIMD&FP register (immediate offset)
STR Bt, [Xn|SP], #simm
STR Ht, [Xn|SP], #simm
STR St, [Xn|SP], #simm
STR Dt, [Xn|SP], #simm
STR Qt, [Xn|SP], #simm
STR Bt, [Xn|SP, #simm]!
STR Ht, [Xn|SP, #simm]!
STR St, [Xn|SP, #simm]!
STR Dt, [Xn|SP, #simm]!
STR Qt, [Xn|SP, #simm]!
STR Bt, [Xn|SP{, #pimm}]
STR Ht, [Xn|SP{, #pimm}]
STR St, [Xn|SP{, #pimm}]
STR Dt, [Xn|SP{, #pimm}]
STR Qt, [Xn|SP{, #pimm}]
STR Wt, [Xn|SP], #simm
STR Xt, [Xn|SP], #simm
STR Wt, [Xn|SP, #simm]!
STR Xt, [Xn|SP, #simm]!
STR Wt, [Xn|SP{, #pimm}]
STR Xt, [Xn|SP{, #pimm}]
STR Pt, [Xn|SP{, #imm, MUL VL}]
STR Bt, [Xn|SP, (Wm|Xm), extend {amount}]
STR Bt, [Xn|SP, Xm{, LSL amount}]
STR Ht, [Xn|SP, (Wm|Xm){, extend {amount}}]
STR St, [Xn|SP, (Wm|Xm){, extend {amount}}]
STR Dt, [Xn|SP, (Wm|Xm){, extend {amount}}]
STR Qt, [Xn|SP, (Wm|Xm){, extend {amount}}]
STR Wt, [Xn|SP, (Wm|Xm){, extend {amount}}]
STR Xt, [Xn|SP, (Wm|Xm){, extend {amount}}]
STR Zt, [Xn|SP{, #imm, MUL VL}]
STR ZA[Wv, offs], [Xn|SP{, #offs, MUL VL}]
STR ZT0, [Xn|SP]

SUBPS: Subtract pointer, setting flags
SUBPS Xd, Xn|SP, Xm|SP

CPYP: Memory copy
CPYP [Xd]!, [Xs]!, Xn!
CPYM [Xd]!, [Xs]!, Xn!
CPYE [Xd]!, [Xs]!, Xn!

FMINP: Floating-point minimum of pair of elements (scalar)
FMINP Hd, Vn.2H
FMINP Vd, Vn.T
FMINP Vd.T, Vn.T, Vm.T
FMINP Vd.T, Vn.T, Vm.T
FMINP Zdn.T, Pg/M, Zdn.T, Zm.T

UADALP: Unsigned add and accumulate long pairwise
UADALP Vd.Ta, Vn.Tb
UADALP Zda.T, Pg/M, Zn.Tb

UUNPK: Unpack elements from one or two source vectors and then zero-extend them to place in elements of twice their size within the two or four destination vectors.
UUNPK { Zd1.T-Zd2.T }, Zn.Tb
UUNPK { Zd1.T-Zd4.T }, { Zn1.Tb-Zn2.Tb }

TLBIP -- A64: TLB invalidate pair operation
TLBIP tlbip_op{, Xt1, Xt2} (alias of SYSP #op1, Cn, Cm, #op2{, Xt1, Xt2})

SUBHN: Subtract returning high narrow
SUBHN{2} Vd.Tb, Vn.Ta, Vm.Ta

LDCLRB: Atomic bit clear on byte in memory, without return
STCLRB Ws, [Xn|SP] (alias of LDCLRB Ws, WZR, [Xn|SP])
STCLRLB Ws, [Xn|SP] (alias of LDCLRLB Ws, WZR, [Xn|SP])

SQSHRN: Signed saturating shift right narrow (immediate)
SQSHRN Vbd, Van, #shift
SQSHRN{2} Vd.Tb, Vn.Ta, #shift

SQDMLALT: Multiply then double the corresponding odd-numbered signed elements of the first and second source vectors. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2
SQDMLALT Zda.T, Zn.Tb, Zm.Tb
SQDMLALT Zda.S, Zn.H, Zm.H[imm]
SQDMLALT Zda.D, Zn.S, Zm.S[imm]

UADDL: Unsigned add long (vector)
UADDL{2} Vd.Ta, Vn.Tb, Vm.Tb

SMAXQV: Signed maximum of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register.
Inactive elements in the source vector are treated as the minimum signed integer for the element size.
SMAXQV Vd.T, Pg, Zn.Tb

SMINP: Signed minimum pairwise
SMINP Vd.T, Vn.T, Vm.T
SMINP Zdn.T, Pg/M, Zdn.T, Zm.T

TCOMMIT: Commit current transaction
TCOMMIT

DMB: Data memory barrier
DMB (option|#imm)

B: Branch conditionally
B.cond label
B label

FABD: Floating-point absolute difference (vector)
FABD Hd, Hn, Hm
FABD Vd, Vn, Vm
FABD Vd.T, Vn.T, Vm.T
FABD Vd.T, Vn.T, Vm.T
FABD Zdn.T, Pg/M, Zdn.T, Zm.T

LDSETAH: Atomic bit set on halfword in memory
LDSETAH Ws, Wt, [Xn|SP]
LDSETALH Ws, Wt, [Xn|SP]
LDSETH Ws, Wt, [Xn|SP]
LDSETLH Ws, Wt, [Xn|SP]

MSR: Move immediate value to special register
MSR pstatefield, #imm
MSR (systemreg|Sop0_op1_Cn_Cm_op2), Xt

RCWSCAS: Read check write software compare and swap doubleword in memory
RCWSCAS Xs, Xt, [Xn|SP]
RCWSCASA Xs, Xt, [Xn|SP]
RCWSCASAL Xs, Xt, [Xn|SP]
RCWSCASL Xs, Xt, [Xn|SP]

RCWSWP: Read check write swap doubleword in memory
RCWSWP Xs, Xt, [Xn|SP]
RCWSWPA Xs, Xt, [Xn|SP]
RCWSWPAL Xs, Xt, [Xn|SP]
RCWSWPL Xs, Xt, [Xn|SP]

LSL: Shift left by immediate each active element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1.
Inactive elements in the destination vector register remain unmodified.
LSL Zdn.T, Pg/M, Zdn.T, #const
LSL Zdn.T, Pg/M, Zdn.T, Zm.D
LSL Zdn.T, Pg/M, Zdn.T, Zm.T
LSL Zd.T, Zn.T, #const
LSL Zd.T, Zn.T, Zm.D

LSL (register) -- A64: Logical shift left (register)
LSL Wd, Wn, Wm (alias of LSLV Wd, Wn, Wm)
LSL Xd, Xn, Xm (alias of LSLV Xd, Xn, Xm)

LSL (immediate) -- A64: Logical shift left (immediate)
LSL Wd, Wn, #shift (alias of UBFM Wd, Wn, #(-shift MOD 32), #(31-shift))
LSL Xd, Xn, #shift (alias of UBFM Xd, Xn, #(-shift MOD 64), #(63-shift))

LDNF1SH: Contiguous load with non-faulting behavior of signed halfwords to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
LDNF1SH { Zt.S }, Pg/Z, [Xn|SP{, #imm, MUL VL}]
LDNF1SH { Zt.D }, Pg/Z, [Xn|SP{, #imm, MUL VL}]

PACDB: Pointer Authentication Code for data address, using key B
PACDB Xd, Xn|SP
PACDZB Xd

STXP: Store exclusive pair of registers
STXP Ws, Wt1, Wt2, [Xn|SP{, #0}]
STXP Ws, Xt1, Xt2, [Xn|SP{, #0}]

CMPP -- A64: Compare with tag
CMPP Xn|SP, Xm|SP (alias of SUBPS XZR, Xn|SP, Xm|SP)

FDOT: 8-bit floating-point dot product to half-precision (vector, by element)
FDOT Vd.Ta, Vn.Tb, Vm.2B[index]
FDOT Vd.Ta, Vn.Tb, Vm.Tb
FDOT Vd.Ta, Vn.Tb, Vm.4B[index]
FDOT Vd.Ta, Vn.Tb, Vm.Tb
FDOT Zda.S, Zn.B, Zm.B
FDOT Zda.S, Zn.B, Zm.B[imm]
FDOT Zda.H, Zn.B, Zm.B
FDOT Zda.H, Zn.B, Zm.B[imm]
FDOT Zda.S, Zn.H, Zm.H
FDOT Zda.S, Zn.H, Zm.H[imm]
FDOT ZA.S[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]
FDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]
FDOT ZA.S[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, Zm.B
FDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B
FDOT ZA.S[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, { Zm1.B-Zm2.B }
FDOT ZA.S[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, { Zm1.B-Zm4.B }
FDOT ZA.H[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]
FDOT ZA.H[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B[index]
FDOT ZA.H[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, Zm.B
FDOT ZA.H[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, Zm.B
FDOT ZA.H[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, { Zm1.B-Zm2.B }
FDOT ZA.H[Wv, offs{, VGx4}], { Zn1.B-Zn4.B }, { Zm1.B-Zm4.B }
FDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]
FDOT ZA.S[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H[index]
FDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H
FDOT ZA.S[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, Zm.H
FDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, { Zm1.H-Zm2.H }
FDOT ZA.S[Wv, offs{, VGx4}], { Zn1.H-Zn4.H }, { Zm1.H-Zm4.H }

SMC: Secure monitor call
SMC #imm

PFIRST: Sets the first active element in the destination predicate to true, otherwise elements from the source predicate are passed through unchanged. Sets the
PFIRST Pdn.B, Pg, Pdn.B

SWPP: Swap quadword in memory
SWPP Xt1, Xt2, [Xn|SP]
SWPPA Xt1, Xt2, [Xn|SP]
SWPPAL Xt1, Xt2, [Xn|SP]
SWPPL Xt1, Xt2, [Xn|SP]

NANDS: Bitwise NAND active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero.
Sets the
NANDS Pd.B, Pg/Z, Pn.B, Pm.B

SSRA: Signed shift right and accumulate (immediate)
SSRA Dd, Dn, #shift
SSRA Vd.T, Vn.T, #shift
SSRA Zda.T, Zn.T, #const

GCSPUSHM -- A64: Guarded Control Stack push
GCSPUSHM Xt (alias of SYS #3, C7, C7, #0, Xt)

SQABS: Signed saturating absolute value
SQABS Vd, Vn
SQABS Vd.T, Vn.T
SQABS Zd.T, Pg/M, Zn.T

BFCVT: Floating-point convert from single-precision to BFloat16 format (scalar)
BFCVT Hd, Sn
BFCVT Zd.B, { Zn1.H-Zn2.H }
BFCVT Zd.H, { Zn1.S-Zn2.S }
BFCVT Zd.H, Pg/M, Zn.S

SMOPS: This instruction works with a 32-bit element ZA tile.
SMOPS ZAda.S, Pn/M, Pm/M, Zn.H, Zm.H
SMOPS ZAda.S, Pn/M, Pm/M, Zn.B, Zm.B
SMOPS ZAda.D, Pn/M, Pm/M, Zn.H, Zm.H

LDUMAX: Atomic unsigned maximum on word or doubleword in memory
LDUMAX Ws, Wt, [Xn|SP]
LDUMAXA Ws, Wt, [Xn|SP]
LDUMAXAL Ws, Wt, [Xn|SP]
LDUMAXL Ws, Wt, [Xn|SP]
LDUMAX Xs, Xt, [Xn|SP]
LDUMAXA Xs, Xt, [Xn|SP]
LDUMAXAL Xs, Xt, [Xn|SP]
LDUMAXL Xs, Xt, [Xn|SP]
STUMAX Ws, [Xn|SP] (alias of LDUMAX Ws, WZR, [Xn|SP])
STUMAXL Ws, [Xn|SP] (alias of LDUMAXL Ws, WZR, [Xn|SP])
STUMAX Xs, [Xn|SP] (alias of LDUMAX Xs, XZR, [Xn|SP])
STUMAXL Xs, [Xn|SP] (alias of LDUMAXL Xs, XZR, [Xn|SP])

CINC -- A64: Conditional increment
CINC Wd, Wn, invcond (alias of CSINC Wd, Wn, Wm, cond)
CINC Xd, Xn, invcond (alias of CSINC Xd, Xn, Xm, cond)

CPYPTN: Memory copy, reads and writes unprivileged and non-temporal
CPYPTN [Xd]!, [Xs]!, Xn!
CPYMTN [Xd]!, [Xs]!, Xn!
CPYETN [Xd]!, [Xs]!, Xn!

UMMLA: Unsigned 8-bit integer matrix multiply-accumulate (vector)
UMMLA Vd.4S, Vn.16B, Vm.16B
UMMLA Zda.S, Zn.B, Zm.B

ASR: Shift right by immediate, preserving the sign bit, each active element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 1 to number of bits per element.
Inactive elements in the destination vector register remain unmodified.
ASR Zdn.T, Pg/M, Zdn.T, #const
ASR Zdn.T, Pg/M, Zdn.T, Zm.D
ASR Zdn.T, Pg/M, Zdn.T, Zm.T
ASR Zd.T, Zn.T, #const
ASR Zd.T, Zn.T, Zm.D

ASR (register) -- A64: Arithmetic shift right (register)
ASR Wd, Wn, Wm (alias of ASRV Wd, Wn, Wm)
ASR Xd, Xn, Xm (alias of ASRV Xd, Xn, Xm)

ASR (immediate) -- A64: Arithmetic shift right (immediate)
ASR Wd, Wn, #shift (alias of SBFM Wd, Wn, #shift, #31)
ASR Xd, Xn, #shift (alias of SBFM Xd, Xn, #shift, #63)

CPYPRTRN: Memory copy, reads unprivileged and non-temporal
CPYPRTRN [Xd]!, [Xs]!, Xn!
CPYMRTRN [Xd]!, [Xs]!, Xn!
CPYERTRN [Xd]!, [Xs]!, Xn!

CMHS: Compare unsigned higher or same (vector)
CMHS Dd, Dn, Dm
CMHS Vd.T, Vn.T, Vm.T

PRFD: Gather prefetch of doublewords from the active memory addresses generated by a vector base plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive addresses are not prefetched from memory.
PRFD prfop, Pg, [Zn.S{, #imm}]
PRFD prfop, Pg, [Zn.D{, #imm}]
PRFD prfop, Pg, [Xn|SP{, #imm, MUL VL}]
PRFD prfop, Pg, [Xn|SP, Xm, LSL #3]
PRFD prfop, Pg, [Xn|SP, Zm.S, mod #3]
PRFD prfop, Pg, [Xn|SP, Zm.D, mod #3]
PRFD prfop, Pg, [Xn|SP, Zm.D, LSL #3]

RSHRNB: Shift each unsigned integer value in the source vector elements right by an immediate value, and place the rounded results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. The immediate shift amount is an unsigned value in the range 1 to number of bits per element.
This instruction is unpredicated.
    RSHRNB Zd.T, Zn.Tb, #const

FSUB
Floating-point subtract (vector)
    FSUB Vd.T, Vn.T, Vm.T
    FSUB Vd.T, Vn.T, Vm.T
    FSUB Hd, Hn, Hm
    FSUB Sd, Sn, Sm
    FSUB Dd, Dn, Dm
    FSUB Zdn.T, Pg/M, Zdn.T, const
    FSUB Zdn.T, Pg/M, Zdn.T, Zm.T
    FSUB Zd.T, Zn.T, Zm.T
    FSUB ZA.T[Wv, offs{, VGx2}], { Zm1.T-Zm2.T }
    FSUB ZA.H[Wv, offs{, VGx2}], { Zm1.H-Zm2.H }
    FSUB ZA.T[Wv, offs{, VGx4}], { Zm1.T-Zm4.T }
    FSUB ZA.H[Wv, offs{, VGx4}], { Zm1.H-Zm4.H }

INCB
Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination.
    INCB Xdn{, pattern{, MUL #imm}}
    INCD Xdn{, pattern{, MUL #imm}}
    INCH Xdn{, pattern{, MUL #imm}}
    INCW Xdn{, pattern{, MUL #imm}}

GCSSS2
GCSSS2 -- A64
Guarded Control Stack switch stack 2
    GCSSS2 Xt  (alias of SYSL Xt, #3, C7, C7, #3)

FMINNMP
Floating-point minimum number of pair of elements (scalar)
    FMINNMP Hd, Vn.2H
    FMINNMP Vd, Vn.T
    FMINNMP Vd.T, Vn.T, Vm.T
    FMINNMP Vd.T, Vn.T, Vm.T
    FMINNMP Zdn.T, Pg/M, Zdn.T, Zm.T

UQRSHRN
Unsigned saturating rounded shift right narrow (immediate)
    UQRSHRN Vbd, Van, #shift
    UQRSHRN{2} Vd.Tb, Vn.Ta, #shift
    UQRSHRN Zd.H, { Zn1.S-Zn2.S }, #const
    UQRSHRN Zd.T, { Zn1.Tb-Zn4.Tb }, #const

SMULLT
Multiply the corresponding odd-numbered signed elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector.
This instruction is unpredicated.
    SMULLT Zd.T, Zn.Tb, Zm.Tb
    SMULLT Zd.S, Zn.H, Zm.H[imm]
    SMULLT Zd.D, Zn.S, Zm.S[imm]

ADDSVL
Add the Streaming SVE vector register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source general-purpose register or current stack pointer, and place the result in the 64-bit destination general-purpose register or current stack pointer.
    ADDSVL Xd|SP, Xn|SP, #imm

CBZ
Compare and branch on zero
    CBZ Wt, label
    CBZ Xt, label

SMMLA
Signed 8-bit integer matrix multiply-accumulate (vector)
    SMMLA Vd.4S, Vn.16B, Vm.16B
    SMMLA Zda.S, Zn.B, Zm.B

SM3PARTW2
    SM3PARTW2 Vd.4S, Vn.4S, Vm.4S

SHA256H
SHA256 hash update (part 1)
    SHA256H Qd, Qn, Vm.4S

CASB
Compare and swap byte in memory
    CASB Ws, Wt, [Xn|SP{, #0}]
    CASAB Ws, Wt, [Xn|SP{, #0}]
    CASALB Ws, Wt, [Xn|SP{, #0}]
    CASLB Ws, Wt, [Xn|SP{, #0}]

LDADDB
Atomic add on byte in memory, without return
    STADDB Ws, [Xn|SP]  (alias of LDADDB Ws, WZR, [Xn|SP])
    STADDLB Ws, [Xn|SP]  (alias of LDADDLB Ws, WZR, [Xn|SP])

STLRH
Store-release register halfword
    STLRH Wt, [Xn|SP{, #0}]

GCSSTR
Guarded Control Stack store
    GCSSTR Xt, [Xn|SP]

SCLAMP
Clamp each signed element in the two or four destination vectors to between the signed minimum value in the corresponding element of the first source vector and the signed maximum value in the corresponding element of the second source vector and destructively place the clamped results in the corresponding elements of the two or four destination vectors.
    SCLAMP { Zd1.T-Zd2.T }, Zn.T, Zm.T
    SCLAMP { Zd1.T-Zd4.T }, Zn.T, Zm.T
    SCLAMP Zd.T, Zn.T, Zm.T

UABAL
Unsigned absolute difference and accumulate long
    UABAL{2} Vd.Ta, Vn.Tb, Vm.Tb

SSUBLT
Subtract the odd-numbered signed elements of the second source vector from the corresponding signed elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector.
This instruction is unpredicated.
    SSUBLT Zd.T, Zn.Tb, Zm.Tb

SADDL
Signed add long (vector)
    SADDL{2} Vd.Ta, Vn.Tb, Vm.Tb

ST1Q
Scatter store of quadwords from the active elements of a vector register to the memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. Inactive elements are not written to memory.
    ST1Q { Zt.Q }, Pg, [Zn.D{, Xm}]
    ST1Q { ZAtHV.Q[Ws, offs] }, Pg, [Xn|SP{, Xm, LSL #4}]

MNEG
MNEG -- A64
Multiply-negate
    MNEG Wd, Wn, Wm  (alias of MSUB Wd, Wn, Wm, WZR)
    MNEG Xd, Xn, Xm  (alias of MSUB Xd, Xn, Xm, XZR)

WFET
Wait for event with timeout
    WFET Xt

SQCVTU
Saturate the signed integer value in each element of the two source vectors to unsigned integer value that is half the original source element width, and place the results in the half-width destination elements.
    SQCVTU Zd.H, { Zn1.S-Zn2.S }
    SQCVTU Zd.T, { Zn1.Tb-Zn4.Tb }

ADC
Add with carry
    ADC Wd, Wn, Wm
    ADC Xd, Xn, Xm

RDSVL
Multiply the Streaming SVE vector register size in bytes by an immediate in the range -32 to 31 and place the result in the 64-bit destination general-purpose register.
    RDSVL Xd, #imm

UQSHRNB
Shift each unsigned integer value in the source vector elements right by an immediate value, and place the truncated results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.
Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2^N)-1.
    UQSHRNB Zd.T, Zn.Tb, #const

LUTI4
Lookup table read with 4-bit indices
    LUTI4 Vd.16B, { Vn.16B }, Vm[index]
    LUTI4 Vd.8H, { Vn1.8H, Vn2.8H }, Vm[index]
    LUTI4 { Zd1.T-Zd2.T }, ZT0, Zn[index]
    LUTI4 { Zd1.T, Zd2.T }, ZT0, Zn[index]
    LUTI4 { Zd1.B-Zd4.B }, ZT0, { Zn1-Zn2 }
    LUTI4 { Zd1.B, Zd2.B, Zd3.B, Zd4.B }, ZT0, { Zn1-Zn2 }
    LUTI4 { Zd1.T-Zd4.T }, ZT0, Zn[index]
    LUTI4 { Zd1.H, Zd2.H, Zd3.H, Zd4.H }, ZT0, Zn[index]
    LUTI4 Zd.T, ZT0, Zn[index]
    LUTI4 Zd.B, { Zn.B }, Zm[index]
    LUTI4 Zd.H, { Zn1.H, Zn2.H }, Zm[index]
    LUTI4 Zd.H, { Zn.H }, Zm[index]

FADDV
Floating-point add horizontally over all lanes of a vector using a recursive pairwise reduction, and place the result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as +0.0.
    FADDV Vd, Pg, Zn.T

FVDOT
The instruction computes the fused sum-of-products of each vertical group of two 8-bit floating-point values held in the corresponding elements of the two first source vectors with horizontal group of two 8-bit floating-point values in the indexed 16-bit group of the corresponding 128-bit segment of the second source vector. The half-precision sum-of-products are scaled by 2
    FVDOT ZA.H[Wv, offs{, VGx2}], { Zn1.B-Zn2.B }, Zm.B[index]
    FVDOT ZA.S[Wv, offs{, VGx2}], { Zn1.H-Zn2.H }, Zm.H[index]

CPYFPN
Memory copy forward-only, reads and writes non-temporal
    CPYFPN [Xd]!, [Xs]!, Xn!
    CPYFMN [Xd]!, [Xs]!, Xn!
    CPYFEN [Xd]!, [Xs]!, Xn!

SQDECD
Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range.
A 32-bit saturated result is then sign-extended to 64 bits.
    SQDECD Xdn, Wdn{, pattern{, MUL #imm}}
    SQDECD Xdn{, pattern{, MUL #imm}}
    SQDECD Zdn.D{, pattern{, MUL #imm}}

ST1
Store multiple single-element structures from one, two, three, or four registers
    ST1 { Vt.T }, [Xn|SP]
    ST1 { Vt.T, Vt2.T }, [Xn|SP]
    ST1 { Vt.T, Vt2.T, Vt3.T }, [Xn|SP]
    ST1 { Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP]
    ST1 { Vt.T }, [Xn|SP], imm
    ST1 { Vt.T }, [Xn|SP], Xm
    ST1 { Vt.T, Vt2.T }, [Xn|SP], imm
    ST1 { Vt.T, Vt2.T }, [Xn|SP], Xm
    ST1 { Vt.T, Vt2.T, Vt3.T }, [Xn|SP], imm
    ST1 { Vt.T, Vt2.T, Vt3.T }, [Xn|SP], Xm
    ST1 { Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP], imm
    ST1 { Vt.T, Vt2.T, Vt3.T, Vt4.T }, [Xn|SP], Xm
    ST1 { Vt.B }[index], [Xn|SP]
    ST1 { Vt.H }[index], [Xn|SP]
    ST1 { Vt.S }[index], [Xn|SP]
    ST1 { Vt.D }[index], [Xn|SP]
    ST1 { Vt.B }[index], [Xn|SP], #1
    ST1 { Vt.B }[index], [Xn|SP], Xm
    ST1 { Vt.H }[index], [Xn|SP], #2
    ST1 { Vt.H }[index], [Xn|SP], Xm
    ST1 { Vt.S }[index], [Xn|SP], #4
    ST1 { Vt.S }[index], [Xn|SP], Xm
    ST1 { Vt.D }[index], [Xn|SP], #8
    ST1 { Vt.D }[index], [Xn|SP], Xm

SADDWB
Add the even-numbered signed elements of the second source vector to the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.
    SADDWB Zd.T, Zn.T, Zm.Tb

LDAPR
Load-acquire RCpc register
    LDAPR Wt, [Xn|SP], #4
    LDAPR Xt, [Xn|SP], #8
    LDAPR Wt, [Xn|SP{, #0}]
    LDAPR Xt, [Xn|SP{, #0}]

NORS
Bitwise NOR active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero.
Sets the
    NORS Pd.B, Pg/Z, Pn.B, Pm.B

SABD
Signed absolute difference
    SABD Vd.T, Vn.T, Vm.T
    SABD Zdn.T, Pg/M, Zdn.T, Zm.T

CAS
Compare and swap word or doubleword in memory
    CAS Ws, Wt, [Xn|SP{, #0}]
    CASA Ws, Wt, [Xn|SP{, #0}]
    CASAL Ws, Wt, [Xn|SP{, #0}]
    CASL Ws, Wt, [Xn|SP{, #0}]
    CAS Xs, Xt, [Xn|SP{, #0}]
    CASA Xs, Xt, [Xn|SP{, #0}]
    CASAL Xs, Xt, [Xn|SP{, #0}]
    CASL Xs, Xt, [Xn|SP{, #0}]

SMSUBL
Signed multiply-subtract long
    SMSUBL Xd, Wn, Wm, Xa
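
The SMSUBL entry gives only a title and an operand list, so a small model of the arithmetic may help. The sketch below is illustrative only: it assumes the usual reading of "signed multiply-subtract long", in which the two 32-bit sources are sign-extended, multiplied, and the product is subtracted from the 64-bit addend with wrap-around at 64 bits. The names `smsubl` and `sign_extend32` are hypothetical helpers for this example, not part of any Arm tool or library.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sign_extend32(value):
    """Interpret the low 32 bits of value as a signed two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & (1 << 31) else value


def smsubl(xa, wn, wm):
    """Model of SMSUBL Xd, Wn, Wm, Xa: Xd = Xa - SInt(Wn) * SInt(Wm), modulo 2**64.

    All arguments are raw register bit patterns; the result is the raw
    64-bit pattern that would be written to Xd (hypothetical helper).
    """
    product = sign_extend32(wn) * sign_extend32(wm)
    return (xa - product) & MASK64


if __name__ == "__main__":
    # 0xFFFFFFFF in Wn is -1 when read as a signed 32-bit value.
    print(hex(smsubl(0, 0xFFFFFFFF, 1)))
```

Passing raw bit patterns rather than Python signed integers keeps the model close to how the registers are actually read, and makes the sign extension of Wn and Wm explicit.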