CPYPWTWN, CPYMWTWN, CPYEWTWN
Memory copy, writes unprivileged and non-temporal
These instructions perform a memory copy. The prologue, main, and epilogue
instructions are expected to be run in succession and to appear consecutively
in memory: CPYPWTWN, then CPYMWTWN, and then CPYEWTWN.
CPYPWTWN performs some preconditioning of the arguments suitable for using the CPYMWTWN instruction,
and performs an IMPLEMENTATION DEFINED amount of the memory copy.
CPYMWTWN performs an IMPLEMENTATION DEFINED amount of the memory copy.
CPYEWTWN performs the last part of the memory copy.
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy
allows some optimization of the size that can be performed.
For CPYPWTWN, the following saturation logic is applied:
    If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
    If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
    Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
    Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
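The saturation and direction rules can be sketched in Python as follows. This is an illustrative model only: the names `saturate_size` and `copy_direction` are hypothetical, the string "either" stands in for the IMPLEMENTATION DEFINED choice, and address tagging and wrap-around are deliberately ignored.

```python
MAX_CPY_SIZE = 0x007F_FFFF_FFFF_FFFF  # saturation value quoted in the text


def saturate_size(xn: int) -> int:
    """Saturate the copy size when any of bits 63:55 of Xn is set."""
    xn &= (1 << 64) - 1
    return MAX_CPY_SIZE if (xn >> 55) != 0 else xn


def copy_direction(xs: int, xd: int, xn: int) -> str:
    """Return 'forward', 'backward', or 'either' (IMPLEMENTATION DEFINED)."""
    n = saturate_size(xn)
    if xs > xd and xd + n > xs:
        return "forward"   # destination overlaps the start of the source
    if xs < xd and xs + n > xd:
        return "backward"  # destination overlaps the end of the source
    return "either"        # no overlap: the implementation chooses
```

The first two cases are the overlapping ones, where only one direction preserves the source bytes before they are overwritten.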
The architecture supports two algorithms for the memory copy: option A and option B.
Which algorithm is used is IMPLEMENTATION DEFINED.
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPWTWN, option A (which results in encoding PSTATE.C = 0):
    PSTATE.{N,Z,V} are set to {0,0,0}.
    If the copy is in the forward direction, then:
        Xs holds the original Xs + saturated Xn.
        Xd holds the original Xd + saturated Xn.
        Xn holds -1 * saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
    If the copy is in the backward direction, then:
        Xs and Xd are unchanged.
        Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
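As a rough illustration of the option A register state after the prologue, the following Python sketch computes Xs, Xd, and Xn from the rules above. The function name and the `copied` parameter are hypothetical; `copied` stands in for the IMPLEMENTATION DEFINED number of bytes the prologue copies, and `n` is the already saturated size.

```python
MASK64 = (1 << 64) - 1


def prologue_option_a(xs: int, xd: int, n: int, copied: int, forward: bool):
    """Return (Xs, Xd, Xn) as 64-bit register values after the prologue."""
    if forward:
        # Addresses are offset to the end of the buffers; Xn becomes negative.
        return (xs + n) & MASK64, (xd + n) & MASK64, (-n + copied) & MASK64
    # Backward: addresses are unchanged; Xn counts the bytes still to copy.
    return xs, xd, (n - copied) & MASK64
```

For a 100-byte forward copy in which the prologue copies 16 bytes, Xn holds -84 as a two's complement 64-bit value.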
After execution of CPYPWTWN, option B (which results in encoding PSTATE.C = 1):
    If the copy is in the forward direction, then:
        Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
        Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
        Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
        PSTATE.{N,Z,V} are set to {0,0,0}.
    If the copy is in the backward direction, then:
        Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
        Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
        Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
        PSTATE.{N,Z,V} are set to {1,0,0}.
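The option B state after the prologue can be modelled in the same way. In this sketch the function name and the `copied` parameter are hypothetical; `copied` again stands in for the IMPLEMENTATION DEFINED amount, `n` is the saturated size, and the fourth return value is the resulting PSTATE.N.

```python
MASK64 = (1 << 64) - 1


def prologue_option_b(xs: int, xd: int, n: int, copied: int, forward: bool):
    """Return (Xs, Xd, Xn, N) after the prologue under option B."""
    if forward:
        # Addresses advance past the bytes already copied; N = 0.
        return (xs + copied) & MASK64, (xd + copied) & MASK64, n - copied, 0
    # Backward: addresses point one past the highest byte still to copy; N = 1.
    return (xs + n - copied) & MASK64, (xd + n - copied) & MASK64, n - copied, 1
```

Unlike option A, Xn is never negative here, so the direction has to be carried in PSTATE.N.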
For CPYMWTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
    Xn is treated as a signed 64-bit number.
    If the copy is in the forward direction (Xn is a negative number), then:
        Xn holds -1 * the number of bytes remaining to be copied in the memory copy in total.
        Xs holds (the lowest address that the copy is copied from) - Xn.
        Xd holds (the lowest address that the copy is copied to) - Xn.
        At the end of the instruction, the value of Xn is written back with -1 * the number of bytes remaining to be copied in the memory copy in total.
    If the copy is in the backward direction (Xn is a positive number), then:
        Xn holds the number of bytes remaining to be copied in the memory copy in total.
        Xs holds (the highest address that the copy is copied from) - Xn + 1.
        Xd holds (the highest address that the copy is copied to) - Xn + 1.
        At the end of the instruction, the value of Xn is written back with the number of bytes remaining to be copied in the memory copy in total.
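To make the option A argument encoding concrete, the sketch below recovers the direction, the remaining byte count, and the next source and destination addresses from raw register values. The function names are hypothetical. Treating Xn == 0 as backward mirrors the pseudocode test `memcpy.cpysize < 0` rather than anything stated in the prose.

```python
def to_signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value


def describe_option_a(xs: int, xd: int, xn: int):
    """Return (direction, bytes_remaining, src_address, dst_address).

    For a forward copy the addresses returned are the lowest addresses still
    to be copied; for a backward copy they are the highest.
    """
    n = to_signed64(xn)
    if n < 0:
        # Xs and Xd hold the end addresses, so adding the negative Xn
        # yields the lowest address still to be copied.
        return "forward", -n, xs + n, xd + n
    # Xs and Xd hold the base addresses; the highest remaining byte is at +Xn-1.
    return "backward", n, xs + n - 1, xd + n - 1
```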
For CPYMWTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
    Xn holds the number of bytes to be copied in the memory copy in total.
    If the copy is in the forward direction (PSTATE.N == 0), then:
        Xs holds the lowest address that the copy is copied from.
        Xd holds the lowest address that the copy is copied to.
        At the end of the instruction:
            the value of Xn is written back with the number of bytes remaining to be copied in the memory copy in total.
            the value of Xs is written back with the lowest address that has not been copied from.
            the value of Xd is written back with the lowest address that has not been copied to.
    If the copy is in the backward direction (PSTATE.N == 1), then:
        Xs holds (the highest address that the copy is copied from) + 1.
        Xd holds (the highest address that the copy is copied to) + 1.
        At the end of the instruction:
            the value of Xn is written back with the number of bytes remaining to be copied in the memory copy in total.
            the value of Xs is written back with (the highest address that has not been copied from) + 1.
            the value of Xd is written back with (the highest address that has not been copied to) + 1.
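The option B write-back for the main instruction can be sketched as a single step function. The name `main_option_b` and the `copied` parameter are hypothetical; `copied` stands in for the IMPLEMENTATION DEFINED number of bytes this instruction copies, and `n_flag` is the PSTATE.N value left by the prologue.

```python
MASK64 = (1 << 64) - 1


def main_option_b(xs: int, xd: int, xn: int, copied: int, n_flag: int):
    """Return (Xs, Xd, Xn) written back by the main instruction."""
    assert 0 <= copied <= xn
    if n_flag == 0:
        # Forward: both addresses move up past the bytes just copied.
        return (xs + copied) & MASK64, (xd + copied) & MASK64, xn - copied
    # Backward: both addresses move down, staying one past the highest
    # byte that has not yet been copied.
    return (xs - copied) & MASK64, (xd - copied) & MASK64, xn - copied
```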
For CPYEWTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
    Xn is treated as a signed 64-bit number.
    If the copy is in the forward direction (Xn is a negative number), then:
        Xn holds -1 * the number of bytes remaining to be copied in the memory copy in total.
        Xs holds (the lowest address that the copy is copied from) - Xn.
        Xd holds (the lowest address that the copy is copied to) - Xn.
        At the end of the instruction, the value of Xn is written back with 0.
    If the copy is in the backward direction (Xn is a positive number), then:
        Xn holds the number of bytes remaining to be copied in the memory copy in total.
        Xs holds (the highest address that the copy is copied from) - Xn + 1.
        Xd holds (the highest address that the copy is copied to) - Xn + 1.
        At the end of the instruction, the value of Xn is written back with 0.
For CPYEWTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
    Xn holds the number of bytes to be copied in the memory copy in total.
    If the copy is in the forward direction (PSTATE.N == 0), then:
        Xs holds the lowest address that the copy is copied from.
        Xd holds the lowest address that the copy is copied to.
        At the end of the instruction:
            the value of Xn is written back with 0.
            the value of Xs is written back with the lowest address that has not been copied from.
            the value of Xd is written back with the lowest address that has not been copied to.
    If the copy is in the backward direction (PSTATE.N == 1), then:
        Xs holds (the highest address that the copy is copied from) + 1.
        Xd holds (the highest address that the copy is copied to) + 1.
        At the end of the instruction:
            the value of Xn is written back with 0.
            the value of Xs is written back with (the highest address that has not been copied from) + 1.
            the value of Xd is written back with (the highest address that has not been copied to) + 1.
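One consequence of the register formats above is that, for a given direction, both options appear to leave the same final values in Xs, Xd, and Xn once the epilogue completes. The sketch below checks that reading. The function name and the `prologue_bytes` and `main_bytes` parameters are hypothetical stand-ins for the IMPLEMENTATION DEFINED amounts, and `n` is the saturated size.

```python
MASK64 = (1 << 64) - 1


def run_three_stages(xs, xd, n, forward, option_a, prologue_bytes, main_bytes):
    """Return the final (Xs, Xd, Xn) after prologue, main, and epilogue."""
    assert 0 <= prologue_bytes + main_bytes <= n
    if option_a:
        if forward:
            # The prologue offsets both addresses by the full size once.
            xs, xd = (xs + n) & MASK64, (xd + n) & MASK64
        # Backward: Xs and Xd are never modified under option A.
        return xs, xd, 0
    # Option B: every stage moves the addresses by the bytes it copied.
    steps = (prologue_bytes, main_bytes, n - prologue_bytes - main_bytes)
    if forward:
        for step in steps:
            xs, xd = (xs + step) & MASK64, (xd + step) & MASK64
    else:
        xs, xd = (xs + n) & MASK64, (xd + n) & MASK64  # prologue offset
        for step in steps:
            xs, xd = (xs - step) & MASK64, (xd - step) & MASK64
    return xs, xd, 0  # the epilogue always writes Xn back as 0
```

A forward copy ends with both addresses advanced by the full size, and a backward copy ends with them at their original values, whichever option the implementation uses.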
Explicit Memory Write effects produced by the instruction behave as if the instruction was executed at EL0 if the Effective value of PSTATE.UAO is 0 and either:
    The instruction is executed at EL1.
    The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the Explicit Memory Write effects operate with the restrictions determined by the Exception level at which the instruction is executed.
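The condition for treating the writes as EL0 accesses can be written as a small predicate. This is a sketch of the rule as stated here, with a hypothetical function name; the arguments are assumed to be the Effective values of the corresponding fields.

```python
def writes_behave_as_el0(el: int, uao: int, e2h: int = 0, tge: int = 0) -> bool:
    """True when the write effects behave as if executed at EL0."""
    if uao != 0:
        return False
    return el == 1 or (el == 2 and (e2h, tge) == (1, 1))
```

A result of False means the writes use the restrictions of the Exception level at which the instruction executes. This includes execution at EL0 itself, where the restrictions are already those of EL0.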
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on UNPREDICTABLE behaviors, and particularly Memory Copy and Memory Set CPY*.
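Encoding fields, most significant bit first: sz, 0 1 1 1 0 1, op1, 0, Rs, op2 (0 1 0 1), 0 1, Rn, Rd.
Prologue variant (op1 == 00):
CPYPWTWN [<Xd>]!, [<Xs>]!, <Xn>!
Main variant (op1 == 01):
CPYMWTWN [<Xd>]!, [<Xs>]!, <Xn>!
Epilogue variant (op1 == 10):
CPYEWTWN [<Xd>]!, [<Xs>]!, <Xn>!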
if !IsFeatureImplemented(FEAT_MOPS) || sz != '00' then UNDEFINED;

CPYParams memcpy;
memcpy.d = UInt(Rd);
memcpy.s = UInt(Rs);
memcpy.n = UInt(Rn);
constant bits(4) options = op2;
constant boolean rnontemporal = options<3> == '1';
constant boolean wnontemporal = options<2> == '1';

case op1 of
    when '00' memcpy.stage = MOPSStage_Prologue;
    when '01' memcpy.stage = MOPSStage_Main;
    when '10' memcpy.stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
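The decode step can be mirrored in Python to show how the op1 and op2 fields select the stage and the access attributes. The function and key names are hypothetical. Bits 3 and 2 of op2 follow the decode pseudocode, bits 1 and 0 follow the options<1> and options<0> tests in the operation pseudocode, and the value 0b0101 for this instruction is an assumption based on its name (write unprivileged, write non-temporal).

```python
STAGES = {0b00: "prologue", 0b01: "main", 0b10: "epilogue"}


def decode_variant(op1: int, op2: int) -> dict:
    """Map the op1 and op2 fields to the stage and access attributes."""
    if op1 not in STAGES:
        # The pseudocode defers this case to "Memory Copy and Memory Set".
        raise ValueError("op1 == 0b11 is not decoded as a CPY* stage here")
    return {
        "stage": STAGES[op1],
        "read_nontemporal": bool((op2 >> 3) & 1),
        "write_nontemporal": bool((op2 >> 2) & 1),
        "read_unprivileged": bool((op2 >> 1) & 1),
        "write_unprivileged": bool(op2 & 1),
    }
```

With op2 equal to 0b0101, only the two write attributes are set, which is what distinguishes this form from the other CPY* variants.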
<Xd>
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination address and is updated by the instruction, encoded in the "Rd" field.
<Xd>
For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an encoding of the destination address, encoded in the "Rd" field.
<Xs>
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source address and is updated by the instruction, encoded in the "Rs" field.
<Xs>
For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an encoding of the source address, encoded in the "Rs" field.
<Xn>
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of bytes to be transferred and is updated by the instruction to encode the remaining size and destination, encoded in the "Rn" field.
<Xn>
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the number of bytes to be transferred, encoded in the "Rn" field.
<Xn>
For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the "Rn" field.
CheckMOPSEnabled();
CheckCPYConstrainedUnpredictable(memcpy.n, memcpy.d, memcpy.s);

memcpy.nzcv = PSTATE.<N,Z,C,V>;
memcpy.toaddress = X[memcpy.d, 64];
memcpy.fromaddress = X[memcpy.s, 64];
if memcpy.stage == MOPSStage_Prologue then
    memcpy.cpysize = UInt(X[memcpy.n, 64]);
else
    memcpy.cpysize = SInt(X[memcpy.n, 64]);
memcpy.implements_option_a = CPYOptionA();

constant boolean rprivileged = (if options<1> == '1' then AArch64.IsUnprivAccessPriv()
                                else PSTATE.EL != EL0);
constant boolean wprivileged = (if options<0> == '1' then AArch64.IsUnprivAccessPriv()
                                else PSTATE.EL != EL0);
constant AccessDescriptor raccdesc = CreateAccDescMOPS(MemOp_LOAD, rprivileged, rnontemporal);
constant AccessDescriptor waccdesc = CreateAccDescMOPS(MemOp_STORE, wprivileged, wnontemporal);

if memcpy.stage == MOPSStage_Prologue then
    if memcpy.cpysize > ArchMaxMOPSCPYSize then
        memcpy.cpysize = ArchMaxMOPSCPYSize;

    memcpy.forward = IsMemCpyForward(memcpy);

    if memcpy.implements_option_a then
        memcpy.nzcv = '0000';
        if memcpy.forward then
            // Copy in the forward direction offsets the arguments.
            memcpy.toaddress = memcpy.toaddress + memcpy.cpysize;
            memcpy.fromaddress = memcpy.fromaddress + memcpy.cpysize;
            memcpy.cpysize = 0 - memcpy.cpysize;
    else
        if !memcpy.forward then
            // Copy in the reverse direction offsets the arguments.
            memcpy.toaddress = memcpy.toaddress + memcpy.cpysize;
            memcpy.fromaddress = memcpy.fromaddress + memcpy.cpysize;
            memcpy.nzcv = '1010';
        else
            memcpy.nzcv = '0010';

memcpy.stagecpysize = MemCpyStageSize(memcpy);

if memcpy.stage != MOPSStage_Prologue then
    memcpy.forward = memcpy.cpysize < 0 || (!memcpy.implements_option_a && memcpy.nzcv<3> == '0');
    CheckMemCpyParams(memcpy, options);

integer copied;
boolean iswrite;
AddressDescriptor memaddrdesc;
PhysMemRetStatus memstatus;
boolean fault = FALSE;
MOPSBlockSize B;

if memcpy.implements_option_a then
    while memcpy.stagecpysize != 0 && !fault do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(memcpy);
        if memcpy.forward then
            assert B <= -1 * memcpy.stagecpysize;
            (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(
                                                            memcpy.toaddress + memcpy.cpysize,
                                                            memcpy.fromaddress + memcpy.cpysize,
                                                            memcpy.forward, B,
                                                            raccdesc, waccdesc);
            if copied != B then
                fault = TRUE;
            else
                memcpy.cpysize = memcpy.cpysize + B;
                memcpy.stagecpysize = memcpy.stagecpysize + B;
        else
            assert B <= memcpy.stagecpysize;
            memcpy.cpysize = memcpy.cpysize - B;
            memcpy.stagecpysize = memcpy.stagecpysize - B;
            (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(
                                                            memcpy.toaddress + memcpy.cpysize,
                                                            memcpy.fromaddress + memcpy.cpysize,
                                                            memcpy.forward, B, raccdesc,
                                                            waccdesc);
            if copied != B then
                fault = TRUE;
                memcpy.cpysize = memcpy.cpysize + B;
                memcpy.stagecpysize = memcpy.stagecpysize + B;
else
    while memcpy.stagecpysize > 0 && !fault do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(memcpy);
        assert B <= memcpy.stagecpysize;
        if memcpy.forward then
            (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(memcpy.toaddress,
                                                                    memcpy.fromaddress,
                                                                    memcpy.forward, B,
                                                                    raccdesc, waccdesc);
            if copied != B then
                fault = TRUE;
            else
                memcpy.fromaddress = memcpy.fromaddress + B;
                memcpy.toaddress = memcpy.toaddress + B;
        else
            (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(memcpy.toaddress - B,
                                                                    memcpy.fromaddress - B,
                                                                    memcpy.forward, B,
                                                                    raccdesc, waccdesc);
            if copied != B then
                fault = TRUE;
            else
                memcpy.fromaddress = memcpy.fromaddress - B;
                memcpy.toaddress = memcpy.toaddress - B;
        if !fault then
            memcpy.cpysize = memcpy.cpysize - B;
            memcpy.stagecpysize = memcpy.stagecpysize - B;

UpdateCpyRegisters(memcpy, fault, copied);

if fault then
    if IsFault(memaddrdesc) then
        AArch64.Abort(memaddrdesc.vaddress, memaddrdesc.fault);
    else
        constant AccessDescriptor accdesc = if iswrite then waccdesc else raccdesc;
        HandleExternalAbort(memstatus, iswrite, memaddrdesc, B, accdesc);

if memcpy.stage == MOPSStage_Prologue then
    PSTATE.<N,Z,C,V> = memcpy.nzcv;
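The option A loop can be modelled on a Python `bytearray` to show why the signed size doubles as an offset from the fixed addresses. This is a simplified sketch, not the architectural behavior. The function name is hypothetical, a fixed `block` size stands in for the IMPLEMENTATION DEFINED `CPYSizeChoice`, faults are not modelled, and the addresses are plain indices into `mem`.

```python
def copy_stage_option_a(mem, to_addr, from_addr, cpysize, stagecpysize, block=4):
    """Run one stage of the option A loop and return the updated cpysize.

    The copy is forward when cpysize is negative, as in the pseudocode.
    """
    forward = cpysize < 0
    while stagecpysize != 0:
        if forward:
            b = min(block, -stagecpysize)
            # Copy ascending, starting at the lowest uncopied byte.
            for i in range(b):
                mem[to_addr + cpysize + i] = mem[from_addr + cpysize + i]
            cpysize += b
            stagecpysize += b
        else:
            b = min(block, stagecpysize)
            cpysize -= b
            stagecpysize -= b
            # Copy descending, starting at the highest uncopied byte.
            for i in reversed(range(b)):
                mem[to_addr + cpysize + i] = mem[from_addr + cpysize + i]
    return cpysize


# Forward overlapping copy of 8 bytes from index 4 to index 0. The caller
# applies the prologue offsets: both addresses plus 8, and a size of -8.
buf = bytearray(range(16))
copy_stage_option_a(buf, 0 + 8, 4 + 8, -8, -8)
```

In the forward case the addresses stay fixed at the end of the buffers while the negative size counts up toward zero. In the backward case the addresses stay at the base and the positive size counts down.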