LDNT1W (scalar plus immediate, consecutive registers)
Contiguous load non-temporal of words to multiple consecutive vectors (immediate index)
Contiguous non-temporal load of words to the elements of two or four consecutive vector registers. The memory address is generated from a 64-bit scalar base and an immediate index: the index is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
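The address generation described above can be illustrated with a small sketch. It assumes the assembler-level `#imm` counts whole vectors (the `MUL VL` form), that one vector occupies `VL / 8` bytes in memory, and that the sum wraps to 64 bits. The function name and the example vector lengths are hypothetical, and the architectural `AddressAdd` behaviour is simplified to plain modular addition.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def ldnt1w_start_address(base: int, imm: int, vl_bits: int) -> int:
    """Return the first byte address accessed (illustrative model only).

    base    -- value of <Xn|SP>
    imm     -- assembler-level #imm, in units of whole vectors (MUL VL)
    vl_bits -- current vector length in bits (assumed example values: 128, 256, 512)
    """
    vector_bytes = vl_bits // 8  # in-memory size of one vector
    # The scaling ignores predication: inactive elements do not change the offset.
    return (base + imm * vector_bytes) & MASK64


if __name__ == "__main__":
    print(hex(ldnt1w_start_address(0x1000, 2, 256)))
    print(hex(ldnt1w_start_address(0x1000, -4, 128)))
```

Because the scaling depends on the vector length, the same `#imm` produces different byte offsets on implementations with different vector lengths.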
It has encodings from 2 classes: Two registers and Four registers
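Two registers
Fixed bits of the encoding, in diagram order (the variable fields imm4, PNg, Rn and Zt are not shown): 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1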
LDNT1W { <Zt1>.S-<Zt2>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
if !IsFeatureImplemented(FEAT_SME2) && !IsFeatureImplemented(FEAT_SVE2p1) then UNDEFINED;
constant integer n = UInt(Rn);
constant integer g = UInt('1':PNg);
constant integer nreg = 2;
constant integer t = UInt(Zt:'0');
constant integer esize = 32;
constant integer offset = SInt(imm4);
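Four registers
Fixed bits of the encoding, in diagram order (the variable fields imm4, PNg, Rn and Zt are not shown): 1 0 1 0 0 0 0 0 0 1 0 0 1 1 0 0 1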
LDNT1W { <Zt1>.S-<Zt4>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
if !IsFeatureImplemented(FEAT_SME2) && !IsFeatureImplemented(FEAT_SVE2p1) then UNDEFINED;
constant integer n = UInt(Rn);
constant integer g = UInt('1':PNg);
constant integer nreg = 4;
constant integer t = UInt(Zt:'00');
constant integer esize = 32;
constant integer offset = SInt(imm4);
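The decode steps for both variants can be mirrored in a short sketch. It takes the raw field values as inputs rather than extracting them from an instruction word, since the bit positions of the fields are not given here. The field widths (4-bit Zt for two registers, 3-bit Zt for four registers, 3-bit PNg) are inferred from the operand ranges and should be treated as assumptions. The helper names `sint` and `decode_fields` are hypothetical.

```python
def sint(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement integer."""
    value &= (1 << width) - 1
    if value & (1 << (width - 1)):
        return value - (1 << width)
    return value


def decode_fields(zt: int, png: int, imm4: int, nreg: int) -> dict:
    """Model the decode pseudocode for the two- and four-register variants."""
    if nreg == 2:
        t = (zt & 0xF) << 1  # UInt(Zt:'0'), assuming a 4-bit Zt field
    elif nreg == 4:
        t = (zt & 0x7) << 2  # UInt(Zt:'00'), assuming a 3-bit Zt field
    else:
        raise ValueError("nreg must be 2 or 4")
    g = 0b1000 | (png & 0x7)  # UInt('1':PNg) selects PN8-PN15
    offset = sint(imm4, 4)    # SInt(imm4), in units of nreg vectors
    return {"t": t, "g": g, "nreg": nreg, "esize": 32, "offset": offset}


if __name__ == "__main__":
    print(decode_fields(zt=3, png=2, imm4=0b1111, nreg=2))
    print(decode_fields(zt=3, png=0, imm4=0b0111, nreg=4))
```

Appending zero bits to Zt is what keeps the first destination register aligned to a multiple of the register count.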
<Zt1>
For the two registers variant: is the name of the first scalable vector register to be transferred, encoded as "Zt" times 2.
<Zt1>
For the four registers variant: is the name of the first scalable vector register to be transferred, encoded as "Zt" times 4.
<Zt4>
Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" times 4 plus 3.
<Zt2>
Is the name of the second scalable vector register to be transferred, encoded as "Zt" times 2 plus 1.
<PNg>
Is the name of the governing scalable predicate register PN8-PN15, with predicate-as-counter encoding, encoded in the "PNg" field.
<Xn|SP>
Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm>
For the two registers variant: is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0, encoded in the "imm4" field.
<imm>
For the four registers variant: is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0, encoded in the "imm4" field.
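The operand rules above imply a simple mapping from assembler operands to field values, sketched below. It assumes that `<Zt1>` is given as a register number and that the "imm4" field holds `#imm` divided by the register count as a 4-bit two's-complement value, which is consistent with the stated ranges and with `SInt(imm4)` in the decode. The function name `encode_operands` is hypothetical and is not part of any assembler API.

```python
def encode_operands(zt1: int, imm: int, nreg: int) -> tuple:
    """Return (Zt field, imm4 field) for a first register number and #imm."""
    if nreg not in (2, 4):
        raise ValueError("nreg must be 2 or 4")
    # <Zt1> is encoded as "Zt" times nreg, so it must be a multiple of nreg.
    if zt1 % nreg != 0 or not 0 <= zt1 <= 32 - nreg:
        raise ValueError("first register must be a multiple of nreg in Z0-Z31")
    # #imm is a multiple of nreg: -16..14 for two registers, -32..28 for four.
    if imm % nreg != 0 or not -8 * nreg <= imm <= 7 * nreg:
        raise ValueError("immediate out of range or not a multiple of nreg")
    return zt1 // nreg, (imm // nreg) & 0xF


if __name__ == "__main__":
    print(encode_operands(6, -2, 2))    # Z6-Z7, #-2
    print(encode_operands(12, 28, 4))   # Z12-Z15, #28
```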
if IsFeatureImplemented(FEAT_SVE2p1) then CheckSVEEnabled(); else CheckStreamingSVEEnabled();
constant integer VL = CurrentVL;
constant integer PL = VL DIV 8;
constant integer elements = VL DIV esize;
constant integer mbytes = esize DIV 8;
bits(64) base;
bits(64) addr;
constant bits(PL) pred = P[g, PL];
constant bits(PL * nreg) mask = CounterToPredicate(pred<15:0>, PL * nreg);
array [0..3] of bits(VL) values;
constant boolean contiguous = TRUE;
constant boolean nontemporal = TRUE;
constant integer transfer = t;
constant boolean tagchecked = n != 31;
constant AccessDescriptor accdesc = CreateAccDescSVE(MemOp_LOAD, nontemporal, contiguous,
                                                     tagchecked);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n, 64];
    addr = AddressAdd(base, offset * nreg * elements * mbytes, accdesc);

for r = 0 to nreg-1
    for e = 0 to elements-1
        if ActivePredicateElement(mask, r * elements + e, esize) then
            Elem[values[r], e, esize] = Mem[addr, mbytes, accdesc];
        else
            Elem[values[r], e, esize] = Zeros(esize);
        addr = AddressIncrement(addr, mbytes, accdesc);

for r = 0 to nreg-1
    Z[transfer+r, VL] = values[r];
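The element loop of the operation can be approximated by the following sketch, which is a simplified model and not the architectural definition. It assumes little-endian data, represents memory as a `bytes` object indexed from address 0, and takes the mask as one boolean per 32-bit element instead of the per-byte predicate bits produced by `CounterToPredicate`. The function name `ldnt1w_model` is hypothetical. Faults, tag checking, and the non-temporal hint are not modelled.

```python
def ldnt1w_model(memory: bytes, addr: int, mask: list, vl_bits: int, nreg: int) -> list:
    """Return nreg lists of 32-bit words, one list per destination register."""
    esize = 32
    elements = vl_bits // esize  # words per vector
    mbytes = esize // 8          # bytes per word
    registers = []
    for r in range(nreg):
        values = []
        for e in range(elements):
            if mask[r * elements + e]:
                # Active element: read one word from memory.
                word = int.from_bytes(memory[addr:addr + mbytes], "little")
            else:
                # Inactive element: no memory read, destination element is zero.
                word = 0
            values.append(word)
            addr += mbytes  # the address advances for active and inactive elements alike
        registers.append(values)
    return registers


if __name__ == "__main__":
    mem = bytes(range(32))
    # Two registers at VL = 128: 8 elements in total, first 3 active.
    active = [True] * 3 + [False] * 5
    for reg in ldnt1w_model(mem, 0, active, 128, 2):
        print([hex(w) for w in reg])
```

The sketch shows that register r receives the elements at mask positions `r * elements` to `(r + 1) * elements - 1`, so the consecutive registers together behave like one long vector loaded from contiguous memory.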