LD1W (scalar plus scalar, tile slice) Contiguous load of words to 32-bit element ZA tile slice The slice number within the tile is selected by the sum of the slice index register and immediate offset, modulo the number of 32-bit elements in a vector. The immediate offset is in the range 0 to 3. The memory address is generated by a 64-bit scalar base and an optional 64-bit scalar offset which is multiplied by 4 and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector. Green True True True SM_1_only 1 1 1 0 0 0 0 0 1 0 0 0 LD1W { <ZAt><HV>.S[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}] if !IsFeatureImplemented(FEAT_SME) then UNDEFINED; constant integer n = UInt(Rn); constant integer m = UInt(Rm); constant integer g = UInt('0':Pg); constant integer s = UInt('011':Rs); constant integer t = UInt(ZAt); constant integer offset = UInt(off2); constant integer esize = 32; constant boolean vertical = V == '1'; <ZAt> Is the name of the ZA tile ZA0-ZA3 to be accessed, encoded in the "ZAt" field. <HV> Is the horizontal or vertical slice indicator, V <HV> 0 H 1 V
<Ws> Is the 32-bit name of the slice index register W12-W15, encoded in the "Rs" field. <offs> Is the slice index offset, in the range 0 to 3, encoded in the "off2" field. <Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field. <Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field. <Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the "Rm" field.
CheckStreamingSVEAndZAEnabled(); constant integer VL = CurrentVL; constant integer PL = VL DIV 8; constant integer dim = VL DIV esize; bits(64) base; bits(64) addr; constant bits(PL) mask = P[g, PL]; bits(64) moffs = X[m, 64]; constant bits(32) index = X[s, 32]; constant integer slice = (UInt(index) + offset) MOD dim; bits(VL) result; constant integer mbytes = esize DIV 8; constant boolean contiguous = TRUE; constant boolean nontemporal = FALSE; constant boolean tagchecked = TRUE; constant AccessDescriptor accdesc = CreateAccDescSME(MemOp_LOAD, nontemporal, contiguous, tagchecked); if n == 31 then if (AnyActiveElement(mask, esize) || ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE)) then CheckSPAlignment(); base = SP[]; else base = X[n, 64]; for e = 0 to dim - 1 addr = AddressAdd(base, UInt(moffs) * mbytes, accdesc); if ActivePredicateElement(mask, e, esize) then Elem[result, e, esize] = Mem[addr, mbytes, accdesc]; else Elem[result, e, esize] = Zeros(esize); moffs = moffs + 1; ZAslice[t, esize, vertical, slice, VL] = result;