[ { "Name": "AAA", "Alias": [], "Brief": "ASCII Adjust After Addition", "Description": "\nAdjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.\nIf the addition produces a decimal carry, the AH register increments by 1, and the CF and AF flags are set. If there was no decimal carry, the CF and AF flags are cleared and the AH register is unchanged. In either case, bits 4 through 7 of the AL register are set to 0.\nThis instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n" }, { "Name": "AAD", "Alias": [], "Brief": "ASCII Adjust AX Before Division", "Description": "\nAdjusts two unpacked BCD digits (the least-significant digit in the AL register and the most-significant digit in the AH register) so that a division operation performed on the result will yield a correct unpacked BCD value. The AAD instruction is only useful when it precedes a DIV instruction that divides (binary division) the adjusted value in the AX register by an unpacked BCD value.\nThe AAD instruction sets the value in the AL register to (AL + (10 * AH)), and then clears the AH register to 00H. The value in the AX register is then equal to the binary equivalent of the original unpacked two-digit (base 10) number in registers AH and AL.\nThe generalized version of this instruction allows adjustment of two unpacked digits of any number base (see the “Operation” section below), by setting the imm8 byte to the selected number base (for example, 08H for octal, 0AH for decimal, or 0CH for base 12 numbers). The AAD mnemonic is interpreted by all assemblers to mean adjust ASCII (base 10) values. To adjust values in another number base, the instruction must be hand coded in machine code (D5 imm8).\nThis instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n" }, { "Name": "AAM", "Alias": [], "Brief": "ASCII Adjust AX After Multiply", "Description": "\nAdjusts the result of the multiplication of two unpacked BCD values to create a pair of unpacked (base 10) BCD values. The AX register is the implied source and destination operand for this instruction. The AAM instruction is only useful when it follows an MUL instruction that multiplies (binary multiplication) two unpacked BCD values and stores a word result in the AX register. The AAM instruction then adjusts the contents of the AX register to contain the correct 2-digit unpacked (base 10) BCD result.\nThe generalized version of this instruction allows adjustment of the contents of the AX to create two unpacked digits of any number base (see the “Operation” section below). Here, the imm8 byte is set to the selected number base (for example, 08H for octal, 0AH for decimal, or 0CH for base 12 numbers). The AAM mnemonic is interpreted by all assemblers to mean adjust to ASCII (base 10) values. To adjust to values in another number base, the instruction must be hand coded in machine code (D4 imm8).\nThis instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n" }, { "Name": "AAS", "Alias": [], "Brief": "ASCII Adjust AL After Subtraction", "Description": "\nAdjusts the result of the subtraction of two unpacked BCD values to create a unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one unpacked BCD value from another and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.\nIf the subtraction produced a decimal carry, the AH register decrements by 1, and the CF and AF flags are set. If no decimal carry occurred, the CF and AF flags are cleared, and the AH register is unchanged. In either case, the AL register is left with its top four bits set to 0.\nThis instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n" }, { "Name": "ADC", "Alias": [], "Brief": "Add with Carry", "Description": "\nAdds the destination operand (first operand), the source operand (second operand), and the carry (CF) flag and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) The state of the CF flag represents a carry from a previous addition. When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.\nThe ADC instruction does not distinguish between signed or unsigned operands. Instead, the processor evaluates the result for both data types and sets the OF and CF flags to indicate a carry in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.\nThe ADC instruction is usually executed as part of a multibyte or multiword addition in which an ADD instruction is followed by an ADC instruction.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "ADCX", "Alias": [], "Brief": "Unsigned Integer Addition of Two Operands with Carry Flag", "Description": "\nPerforms an unsigned addition of the destination operand (first operand), the source operand (second operand) and the carry-flag (CF) and stores the result in the destination operand. The destination operand is a general-purpose register, whereas the source operand can be a general-purpose register or memory location. The state of CF can represent a carry from a previous addition. The instruction sets the CF flag with the carry generated by the unsigned addition of the operands.\nThe ADCX instruction is executed in the context of multi-precision addition, where we add a series of operands with a carry-chain. At the beginning of a chain of additions, we need to make sure the CF is in a desired initial state. Often, this initial state needs to be 0, which can be achieved with an instruction to zero the CF (e.g. XOR).\nThis instruction is supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode.\nIn 64-bit mode, the default operation size is 32 bits. Using a REX Prefix in the form of REX.R permits access to addi-tional registers (R8-15). Using REX Prefix in the form of REX.W promotes operation to 64 bits.\nADCX executes normally either inside or outside a transaction region.\nNote: ADCX defines the OF flag differently than the ADD/ADC instructions as defined in Intel® 64 and IA-32 Archi-tectures Software Developer’s Manual, Volume 2A.\n" }, { "Name": "ADD", "Alias": [], "Brief": "Add", "Description": "\nAdds the destination operand (first operand) and the source operand (second operand) and then stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.\nThe ADD instruction performs integer addition. It evaluates the result for both signed and unsigned integer oper-ands and sets the OF and CF flags to indicate a carry (overflow) in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "ADDPD", "Alias": [], "Brief": "Add Packed Double", "Description": "\nPerforms a SIMD add of the two packed double-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the packed double-precision floating-point results in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. See Chapter 11 in the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 1, for an overview of SIMD double-precision floating-point operation.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "ADDPS", "Alias": [], "Brief": "Add Packed Single", "Description": "\nPerforms a SIMD add of the four packed single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the packed single-precision floating-point results in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. See Chapter 10 in the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 1, for an overview of SIMD single-precision floating-point operation.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "ADDSD", "Alias": [], "Brief": "Add Scalar Double", "Description": "\nAdds the low double-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the double-precision floating-point result in the destination operand.\nThe source operand can be an XMM register or a 64-bit memory location. The destination operand is an XMM register. See Chapter 11 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an overview of a scalar double-precision floating-point operation.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:64) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "ADDSS", "Alias": [], "Brief": "Add Scalar Single", "Description": "\nAdds the low single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the single-precision floating-point result in the destination operand.\nThe source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. See Chapter 10 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an overview of a scalar single-precision floating-point operation.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:32) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "ADDSUBPD", "Alias": [], "Brief": "Packed Double", "Description": "\nAdds odd-numbered double-precision floating-point values of the first source operand (second operand) with the corresponding double-precision floating-point values from the second source operand (third operand); stores the result in the odd-numbered values of the destination operand (first operand). Subtracts the even-numbered double-precision floating-point values from the second source operand from the corresponding double-precision floating values in the first source operand; stores the result into the even-numbered values of the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. See Figure 3-3.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n\nADDSUBPD xmm1, xmm2/m128\nxmm2/m128\n[127:64]\n[63:0]\nRESULT:\nxmm1[127:64] + xmm2/m128[127:64]\nxmm1[63:0] - xmm2/m128[63:0]\nxmm1\n[127:64]\n[63:0]\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 3-3. ADDSUBPD—Packed Double-FP Add/Subtract\n" }, { "Name": "ADDSUBPS", "Alias": [], "Brief": "Packed Single", "Description": "\nAdds odd-numbered single-precision floating-point values of the first source operand (second operand) with the corresponding single-precision floating-point values from the second source operand (third operand); stores the result in the odd-numbered values of the destination operand (first operand). Subtracts the even-numbered single-precision floating-point values from the second source operand from the corresponding single-precision floating values in the first source operand; stores the result into the even-numbered values of the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. See Figure 3-4.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n\nADDSUBPS xmm1, xmm2/m128\nxmm2/\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nm128\nRESULT:\nxmm1[127:96] +\nxmm1[95:64] - xmm2/\nxmm1[63:32] +\nxmm1[31:0] -\nxmm1\nxmm2/m128[127:96]\nm128[95:64]\nxmm2/m128[63:32]\nxmm2/m128[31:0]\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nOM15992\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 3-4. ADDSUBPS—Packed Single-FP Add/Subtract\n" }, { "Name": "ADOX", "Alias": [], "Brief": "Unsigned Integer Addition of Two Operands with Overflow Flag", "Description": "\nPerforms an unsigned addition of the destination operand (first operand), the source operand (second operand) and the overflow-flag (OF) and stores the result in the destination operand. The destination operand is a general-purpose register, whereas the source operand can be a general-purpose register or memory location. The state of OF represents a carry from a previous addition. The instruction sets the OF flag with the carry generated by the unsigned addition of the operands.\nThe ADOX instruction is executed in the context of multi-precision addition, where we add a series of operands with a carry-chain. At the beginning of a chain of additions, we execute an instruction to zero the OF (e.g. XOR).\nThis instruction is supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode.\nIn 64-bit mode, the default operation size is 32 bits. Using a REX Prefix in the form of REX.R permits access to addi-tional registers (R8-15). Using REX Prefix in the form of REX.W promotes operation to 64-bits.\nADOX executes normally either inside or outside a transaction region.\nNote: ADOX defines the CF and OF flags differently than the ADD/ADC instructions as defined in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A.\n" }, { "Name": "AESDEC", "Alias": [], "Brief": "Perform One Round of an AES Decryption Flow", "Description": "\nThis instruction performs a single round of the AES decryption flow using the Equivalent Inverse Cipher, with the round key from the second source operand, operating on a 128-bit data (state) from the first source operand, and store the result in the destination operand.\nUse the AESDEC instruction for all but the last decryption round. For the last decryption round, use the AESDEC-CLAST instruction.\n128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "AESDECLAST", "Alias": [], "Brief": "Perform Last Round of an AES Decryption Flow", "Description": "\nThis instruction performs the last round of the AES decryption flow using the Equivalent Inverse Cipher, with the round key from the second source operand, operating on a 128-bit data (state) from the first source operand, and store the result in the destination operand.\n128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "AESENC", "Alias": [], "Brief": "Perform One Round of an AES Encryption Flow", "Description": "\nThis instruction performs a single round of an AES encryption flow using a round key from the second source operand, operating on 128-bit data (state) from the first source operand, and store the result in the destination operand.\nUse the AESENC instruction for all but the last encryption rounds. For the last encryption round, use the AESENC-CLAST instruction.\n128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "AESENCLAST", "Alias": [], "Brief": "Perform Last Round of an AES Encryption Flow", "Description": "\nThis instruction performs the last round of an AES encryption flow using a round key from the second source operand, operating on 128-bit data (state) from the first source operand, and store the result in the destination operand.\n128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "AESIMC", "Alias": [], "Brief": "Perform the AES InvMixColumn Transformation", "Description": "\nPerform the InvMixColumns transformation on the source operand and store the result in the destination operand. The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory loca-tion.\nNote: the AESIMC instruction should be applied to the expanded AES round keys (except for the first and last round key) in order to prepare them for decryption using the “Equivalent Inverse Cipher” (defined in FIPS 197).\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "AESKEYGENASSIST", "Alias": [], "Brief": "AES Round Key Generation Assist", "Description": "\nAssist in expanding the AES cipher key, by computing steps towards generating a round key for encryption, using 128-bit data specified in the source operand and an 8-bit round constant specified as an immediate, store the result in the destination operand.\nThe destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory loca-tion.\n128-bit Legacy SSE version:Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "AND", "Alias": [], "Brief": "Logical AND", "Description": "\nPerforms a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result is set to 1 if both corresponding bits of the first and second operands are 1; otherwise, it is set to 0.\nThis instruction can be used with a LOCK prefix to allow the it to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "ANDN", "Alias": [], "Brief": "Logical AND NOT", "Description": "\nPerforms a bitwise logical AND of inverted second operand (the first source operand) with the third operand (the second source operand). The result is stored in the first operand (destination operand).\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n" }, { "Name": "ANDNPD", "Alias": [], "Brief": "Bitwise Logical AND NOT of Packed Double", "Description": "\nPerforms a bitwise logical AND NOT of the two or four packed double-precision floating-point values from the first source operand and the second source operand, and stores the result in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "ANDNPS", "Alias": [], "Brief": "Bitwise Logical AND NOT of Packed Single", "Description": "\nInverts the bits of the four packed single-precision floating-point values in the destination operand (first operand), performs a bitwise logical AND of the four packed single-precision floating-point values in the source operand (second operand) and the temporary inverted result, and stores the result in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "ANDPD", "Alias": [], "Brief": "Bitwise Logical AND of Packed Double", "Description": "\nPerforms a bitwise logical AND of the two packed double-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the result in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "ANDPS", "Alias": [], "Brief": "Bitwise Logical AND of Packed Single", "Description": "\nPerforms a bitwise logical AND of the four or eight packed single-precision floating-point values from the first source operand and the second source operand, and stores the result in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "ARPL", "Alias": [], "Brief": "Adjust RPL Field of Segment Selector", "Description": "\nCompares the RPL fields of two segment selectors. The first operand (the destination operand) contains one segment selector and the second operand (source operand) contains the other. (The RPL field is located in bits 0 and 1 of each operand.) If the RPL field of the destination operand is less than the RPL field of the source operand, the ZF flag is set and the RPL field of the destination operand is increased to match that of the source operand. Otherwise, the ZF flag is cleared and no change is made to the destination operand. (The destination operand can be a word register or a memory location; the source operand must be a word register.)\nThe ARPL instruction is provided for use by operating-system procedures (however, it can also be used by applica-tions). It is generally used to adjust the RPL of a segment selector that has been passed to the operating system by an application program to match the privilege level of the application program. Here the segment selector passed to the operating system is placed in the destination operand and segment selector for the application program’s code segment is placed in the source operand. (The RPL field in the source operand represents the priv-ilege level of the application program.) Execution of the ARPL instruction then ensures that the RPL of the segment selector received by the operating system is no lower (does not have a higher privilege) than the privilege level of the application program (the segment selector for the application program’s code segment can be read from the stack following a procedure call).\nThis instruction executes as described in compatibility mode and legacy mode. It is not encodable in 64-bit mode.\nSee “Checking Caller Access Privileges” in Chapter 3, “Protected-Mode Memory Management,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information about the use of this instruc-tion.\n" }, { "Name": "BEXTR", "Alias": [], "Brief": "Bit Field Extract", "Description": "\nExtracts contiguous bits from the first source operand (the second operand) using an index value and length value specified in the second source operand (the third operand). Bit 7:0 of the second source operand specifies the starting bit position of bit extraction. A START value exceeding the operand size will not extract any bits from the second source operand. Bit 15:8 of the second source operand specifies the maximum number of bits (LENGTH) beginning at the START position to extract. Only bit positions up to (OperandSize -1) of the first source operand are extracted. The extracted bits are written to the destination register, starting from the least significant bit. All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed. The destination register is cleared if no bits are extracted.\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n" }, { "Name": "BLENDPD", "Alias": [], "Brief": "Blend Packed Double Precision Floating", "Description": "\nDouble-precision floating-point values from the second source operand (third operand) are conditionally merged with values from the first source operand (second operand) and written to the destination operand (first operand). The immediate bits [3:0] determine whether the corresponding double-precision floating-point value in the desti-nation is copied from the second source or first source. If a bit in the mask, corresponding to a word, is “1\", then the double-precision floating-point value in the second source operand is copied, else the value in the first source operand is copied.\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "BLENDPS", "Alias": [], "Brief": "Blend Packed Single Precision Floating", "Description": "\nPacked single-precision floating-point values from the second source operand (third operand) are conditionally merged with values from the first source operand (second operand) and written to the destination operand (first operand). The immediate bits [7:0] determine whether the corresponding single precision floating-point value in the destination is copied from the second source or first source. If a bit in the mask, corresponding to a word, is “1\", then the single-precision floating-point value in the second source operand is copied, else the value in the first source operand is copied.\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "BLENDVPD", "Alias": [], "Brief": "Variable Blend Packed Double Precision Floating", "Description": "\nConditionally copy each quadword data element of double-precision floating-point value from the second source operand and the first source operand depending on mask bits defined in the mask register operand. The mask bits are the most significant bit in each quadword element of the mask register.\nEach quadword element of the destination operand is copied from:\nThe register assignment of the implicit mask operand for BLENDVPD is defined to be the architectural register XMM0.\n128-bit Legacy SSE version: The first source operand and the destination operand is the same. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. The mask register operand is implicitly defined to be the architectural register XMM0. An attempt to execute BLENDVPD with a VEX prefix will cause #UD.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand is an XMM register or 128-bit memory location. The mask operand is the third source register, and encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is ignored. The upper bits (VLMAX-1:128) of the corresponding YMM register (destination register) are zeroed. VEX.W must be 0, otherwise, the instruction will #UD.\nVEX.256 encoded version: The first source operand and destination operand are YMM registers. The second source operand can be a YMM register or a 256-bit memory location. The mask operand is the third source register, and encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is ignored. VEX.W must be 0, otherwise, the instruction will #UD.\nVBLENDVPD permits the mask to be any XMM or YMM register. In contrast, BLENDVPD treats XMM0 implicitly as the mask and do not support non-destructive destination operation.\n" }, { "Name": "BLENDVPS", "Alias": [], "Brief": "Variable Blend Packed Single Precision Floating", "Description": "\nConditionally copy each dword data element of single-precision floating-point value from the second source operand and the first source operand depending on mask bits defined in the mask register operand. The mask bits are the most significant bit in each dword element of the mask register.\nEach quadword element of the destination operand is copied from:\nThe register assignment of the implicit mask operand for BLENDVPS is defined to be the architectural register XMM0.\n128-bit Legacy SSE version: The first source operand and the destination operand is the same. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. The mask register operand is implicitly defined to be the architectural register XMM0. An attempt to execute BLENDVPS with a VEX prefix will cause #UD.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand is an XMM register or 128-bit memory location. The mask operand is the third source register, and encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is ignored. The upper bits (VLMAX-1:128) of the corresponding YMM register (destination register) are zeroed. VEX.W must be 0, otherwise, the instruction will #UD.\nVEX.256 encoded version: The first source operand and destination operand are YMM registers. The second source operand can be a YMM register or a 256-bit memory location. The mask operand is the third source register, and encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is ignored. VEX.W must be 0, otherwise, the instruction will #UD.\nVBLENDVPS permits the mask to be any XMM or YMM register. In contrast, BLENDVPS treats XMM0 implicitly as the mask and do not support non-destructive destination operation.\n" }, { "Name": "BLSI", "Alias": [], "Brief": "Extract Lowest Set Isolated Bit", "Description": "\nExtracts the lowest set bit from the source operand and set the corresponding bit in the destination register. All other bits in the destination operand are zeroed. If no bits are set in the source operand, BLSI sets all the bits in the destination to 0 and sets ZF and CF.\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n" }, { "Name": "BLSMSK", "Alias": [], "Brief": "Get Mask Up to Lowest Set Bit", "Description": "\nSets all the lower bits of the destination operand to “1” up to and including lowest set bit (=1) in the source operand. If source operand is zero, BLSMSK sets all bits of the destination operand to 1 and also sets CF to 1.\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n" }, { "Name": "BLSR", "Alias": [], "Brief": "Reset Lowest Set Bit", "Description": "\nCopies all bits from the source operand to the destination operand and resets (=0) the bit position in the destina-tion operand that corresponds to the lowest set bit of the source operand. If the source operand is zero BLSR sets CF.\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n" }, { "Name": "BOUND", "Alias": [], "Brief": "Check Array Index Against Bounds", "Description": "\nBOUND determines if the first operand (array index) is within the bounds of an array specified the second operand (bounds operand). The array index is a signed integer located in a register. The bounds operand is a memory loca-tion that contains a pair of signed doubleword-integers (when the operand-size attribute is 32) or a pair of signed word-integers (when the operand-size attribute is 16). The first doubleword (or word) is the lower bound of the array and the second doubleword (or word) is the upper bound of the array. The array index must be greater than or equal to the lower bound and less than or equal to the upper bound plus the operand size in bytes. If the index is not within bounds, a BOUND range exceeded exception (#BR) is signaled. When this exception is generated, the saved return instruction pointer points to the BOUND instruction.\nThe bounds limit data structure (two words or doublewords containing the lower and upper limits of the array) is usually placed just before the array itself, making the limits addressable via a constant offset from the beginning of the array. Because the address of the array already will be present in a register, this practice avoids extra bus cycles to obtain the effective address of the array bounds.\nThis instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n" }, { "Name": "BSF", "Alias": [], "Brief": "Bit Scan Forward", "Description": "\nSearches the source operand (second operand) for the least significant set bit (1 bit). If a least significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content of the source operand is 0, the content of the destination operand is undefined.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "BSR", "Alias": [], "Brief": "Bit Scan Reverse", "Description": "\nSearches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content source operand is 0, the content of the destination operand is undefined.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "BSWAP", "Alias": [], "Brief": "Byte Swap", "Description": "\nReverses the byte order of a 32-bit or 64-bit (destination) register. This instruction is provided for converting little-endian values to big-endian format and vice versa. To swap bytes in a word value (16-bit register), use the XCHG instruction. When the BSWAP instruction references a 16-bit register, the result is undefined.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "BT", "Alias": [], "Brief": "Bit Test", "Description": "\nSelects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset (specified by the second operand) and stores the value of the bit in the CF flag. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value:\nSee also: Bit(BitBase, BitOffset) on page 3-10.\nSome assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combina-tion with the displacement field of the memory operand. In this case, the low-order 3 or 5 bits (3 for 16-bit oper-ands, 5 for 32-bit operands) of the immediate bit offset are stored in the immediate bit offset field, and the high-order bits are shifted and combined with the byte displacement in the addressing mode by the assembler. The processor will ignore the high order bits if they are not zero.\nWhen accessing a bit in memory, the processor may access 4 bytes starting from the memory address for a 32-bit operand size, using by the following relationship:\nEffective Address + (4 ∗ (BitOffset DIV 32))\nOr, it may access 2 bytes starting from the memory address for a 16-bit operand, using this relationship:\nEffective Address + (2 ∗ (BitOffset DIV 16))\nIt may do so even when only a single byte needs to be accessed to reach the given bit. When using this bit addressing mechanism, software should avoid referencing areas of memory close to address space holes. In partic-ular, it should avoid references to memory-mapped I/O registers. Instead, software should use the MOV instruc-tions to load from or store to these addresses, and use the register form of these instructions to manipulate the data.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bit oper-ands. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "BTC", "Alias": [], "Brief": "Bit Test and Complement", "Description": "\nSelects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and complements the selected bit in the bit string. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value:\nSee also: Bit(BitBase, BitOffset) on page 3-10.\nSome assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combina-tion with the displacement field of the memory operand. See “BT—Bit Test” in this chapter for more information on this addressing mechanism.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "BTR", "Alias": [], "Brief": "Bit Test and Reset", "Description": "\nSelects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and clears the selected bit in the bit string to 0. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value:\nSee also: Bit(BitBase, BitOffset) on page 3-10.\nSome assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combina-tion with the displacement field of the memory operand. See “BT—Bit Test” in this chapter for more information on this addressing mechanism.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "BTS", "Alias": [], "Brief": "Bit Test and Set", "Description": "\nSelects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and sets the selected bit in the bit string to 1. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value:\nSee also: Bit(BitBase, BitOffset) on page 3-10.\nSome assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combina-tion with the displacement field of the memory operand. See “BT—Bit Test” in this chapter for more information on this addressing mechanism.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "BZHI", "Alias": [], "Brief": "Zero High Bits Starting with Specified Bit Position", "Description": "\nBZHI copies the bits of the first source operand (the second operand) into the destination operand (the first operand) and clears the higher bits in the destination according to the INDEX value specified by the second source operand (the third operand). The INDEX is specified by bits 7:0 of the second source operand. The INDEX value is saturated at the value of OperandSize -1. CF is set, if the number contained in the 8 low bits of the third operand is greater than OperandSize -1.\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n" }, { "Name": "CALL", "Alias": [], "Brief": "Call Procedure", "Description": "\nSaves procedure linking information on the stack and branches to the called procedure specified using the target operand. The target operand specifies the address of the first instruction in the called procedure. The operand can be an immediate value, a general-purpose register, or a memory location.\nThis instruction can be used to execute four types of calls:\nThe latter two call types (inter-privilege-level call and task switch) can only be executed in protected mode. See “Calling Procedures Using Call and RET” in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 1, for additional information on near, far, and inter-privilege-level calls. See Chapter 7, “Task Management,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for infor-mation on performing task switches with the CALL instruction.\nNear Call. When executing a near call, the processor pushes the value of the EIP register (which contains the offset of the instruction following the CALL instruction) on the stack (for use later as a return-instruction pointer). The processor then branches to the address in the current code segment specified by the target operand. The target operand specifies either an absolute offset in the code segment (an offset from the base of the code segment) or a relative offset (a signed displacement relative to the current value of the instruction pointer in the EIP register; this value points to the instruction following the CALL instruction). The CS register is not changed on near calls.\nFor a near call absolute, an absolute offset is specified indirectly in a general-purpose register or a memory location (r/m16, r/m32, or r/m64). The operand-size attribute determines the size of the target operand (16, 32 or 64 bits). When in 64-bit mode, the operand size for near call (and all near branches) is forced to 64-bits. Absolute offsets are loaded directly into the EIP(RIP) register. If the operand size attribute is 16, the upper two bytes of the EIP register are cleared, resulting in a maximum instruction pointer size of 16 bits. When accessing an absolute offset indirectly using the stack pointer [ESP] as the base register, the base value used is the value of the ESP before the instruction executes.\nA relative offset (rel16 or rel32) is generally specified as a label in assembly code. But at the machine code level, it is encoded as a signed, 16- or 32-bit immediate value. This value is added to the value in the EIP(RIP) register. In 64-bit mode the relative offset is always a 32-bit immediate value which is sign extended to 64-bits before it is added to the value in the RIP register for the target calculation. As with absolute offsets, the operand-size attribute determines the size of the target operand (16, 32, or 64 bits). In 64-bit mode the target operand will always be 64-bits because the operand size is forced to 64-bits for near branches.\nFar Calls in Real-Address or Virtual-8086 Mode. When executing a far call in real- address or virtual-8086 mode, the processor pushes the current value of both the CS and EIP registers on the stack for use as a return-instruction pointer. The processor then performs a “far branch” to the code segment and offset specified with the target operand for the called procedure. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). With the pointer method, the segment and offset of the called procedure is encoded in the instruction using a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address immediate. With the indirect method, the target operand specifies a memory location that contains a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address. The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The far address is loaded directly into the CS and EIP registers. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared.\nFar Calls in Protected Mode. When the processor is operating in protected mode, the CALL instruction can be used to perform the following types of far calls:\nIn protected mode, the processor always uses the segment selector part of the far address to access the corre-sponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate, task gate, or TSS) and access rights determine the type of call operation to be performed.\nIf the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far call to the same privilege level in protected mode is very similar to one carried out in real-address or virtual-8086 mode. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The operand- size attribute determines the size of the offset (16 or 32 bits) in the far address. The new code segment selector and its descriptor are loaded into CS register; the offset from the instruction is loaded into the EIP register.\nA call gate (described in the next paragraph) can also be used to perform a far call to a code segment at the same privilege level. Using this mechanism provides an extra level of indirection and is the preferred method of making calls between 16-bit and 32-bit code segments.\nWhen executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed through a call gate. The segment selector specified by the target operand identifies the call gate. The target\noperand can specify the call gate segment selector either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The processor obtains the segment selector for the new code segment and the new instruction pointer (offset) from the call gate descriptor. (The offset from the target operand is ignored when a call gate is used.)\nOn inter-privilege-level calls, the processor switches to the stack for the privilege level of the called procedure. The segment selector for the new stack segment is specified in the TSS for the currently running task. The branch to the new code segment occurs after the stack switch. (Note that when using a call gate to perform a far call to a segment at the same privilege level, no stack switch occurs.) On the new stack, the processor pushes the segment selector and stack pointer for the calling procedure’s stack, an optional set of parameters from the calling proce-dures stack, and the segment selector and instruction pointer for the calling procedure’s code segment. (A value in the call gate descriptor determines how many parameters to copy to the new stack.) Finally, the processor branches to the address of the procedure being called within the new code segment.\nExecuting a task switch with the CALL instruction is similar to executing a call through a call gate. The target operand specifies the segment selector of the task gate for the new task activated by the switch (the offset in the target operand is ignored). The task gate in turn points to the TSS for the new task, which contains the segment selectors for the task’s code and stack segments. Note that the TSS also contains the EIP value for the next instruc-tion that was to be executed before the calling task was suspended. This instruction pointer value is loaded into the EIP register to re-start the calling task.\nThe CALL instruction can also specify the segment selector of the TSS directly, which eliminates the indirection of the task gate. See Chapter 7, “Task Management,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for information on the mechanics of a task switch.\nWhen you execute at task switch with a CALL instruction, the nested task flag (NT) is set in the EFLAGS register and the new TSS’s previous task link field is loaded with the old task’s TSS selector. Code is expected to suspend this nested task by executing an IRET instruction which, because the NT flag is set, automatically uses the previous task link to return to the calling task. (See “Task Linking” in Chapter 7 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for information on nested tasks.) Switching tasks with the CALL instruc-tion differs in this regard from JMP instruction. JMP does not set the NT flag and therefore does not expect an IRET instruction to suspend the task.\nMixing 16-Bit and 32-Bit Calls. When making far calls between 16-bit and 32-bit code segments, use a call gate. If the far call is from a 32-bit code segment to a 16-bit code segment, the call should be made from the first 64 KBytes of the 32-bit code segment. This is because the operand-size attribute of the instruction is set to 16, so only a 16-bit return address offset can be saved. Also, the call should be made using a 16-bit call gate so that 16-bit values can be pushed on the stack. See Chapter 21, “Mixing 16-Bit and 32-Bit Code,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B, for more information.\nFar Calls in Compatibility Mode. When the processor is operating in compatibility mode, the CALL instruction can be used to perform the following types of far calls:\nNote that a CALL instruction can not be used to cause a task switch in compatibility mode since task switches are not supported in IA-32e mode.\nIn compatibility mode, the processor always uses the segment selector part of the far address to access the corre-sponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate) and access rights determine the type of call operation to be performed.\nIf the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far call to the same privilege level in compatibility mode is very similar to one carried out in protected mode. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The new code segment selector and its descriptor are loaded into CS register and the offset from the instruction is loaded into the EIP register. The differ-ence is that 64-bit mode may be entered. This specified by the L bit in the new code segment descriptor.\nNote that a 64-bit call gate (described in the next paragraph) can also be used to perform a far call to a code segment at the same privilege level. However, using this mechanism requires that the target code segment descriptor have the L bit set, causing an entry to 64-bit mode.\nWhen executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed through a 64-bit call gate. The segment selector specified by the target operand identifies the call gate. The target operand can specify the call gate segment selector either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The processor obtains the segment selector for the new code segment and the new instruction pointer (offset) from the 16-byte call gate descriptor. (The offset from the target operand is ignored when a call gate is used.)\nOn inter-privilege-level calls, the processor switches to the stack for the privilege level of the called procedure. The segment selector for the new stack segment is set to NULL. The new stack pointer is specified in the TSS for the currently running task. The branch to the new code segment occurs after the stack switch. (Note that when using a call gate to perform a far call to a segment at the same privilege level, an implicit stack switch occurs as a result of entering 64-bit mode. The SS selector is unchanged, but stack segment accesses use a segment base of 0x0, the limit is ignored, and the default stack size is 64-bits. The full value of RSP is used for the offset, of which the upper 32-bits are undefined.) On the new stack, the processor pushes the segment selector and stack pointer for the calling procedure’s stack and the segment selector and instruction pointer for the calling procedure’s code segment. (Parameter copy is not supported in IA-32e mode.) Finally, the processor branches to the address of the procedure being called within the new code segment.\nNear/(Far) Calls in 64-bit Mode. When the processor is operating in 64-bit mode, the CALL instruction can be used to perform the following types of far calls:\nNote that in this mode the CALL instruction can not be used to cause a task switch in 64-bit mode since task switches are not supported in IA-32e mode.\nIn 64-bit mode, the processor always uses the segment selector part of the far address to access the corresponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate) and access rights determine the type of call operation to be performed.\nIf the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far call to the same privilege level in 64-bit mode is very similar to one carried out in compatibility mode. The target operand specifies an absolute far address indirectly with a memory location (m16:16, m16:32 or m16:64). The form of CALL with a direct specification of absolute far address is not defined in 64-bit mode. The operand-size attribute determines the size of the offset (16, 32, or 64 bits) in the far address. The new code segment selector and its descriptor are loaded into the CS register; the offset from the instruction is loaded into the EIP register. The new code segment may specify entry either into compati-bility or 64-bit mode, based on the L bit value.\nA 64-bit call gate (described in the next paragraph) can also be used to perform a far call to a code segment at the same privilege level. However, using this mechanism requires that the target code segment descriptor have the L bit set.\nWhen executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed through a 64-bit call gate. The segment selector specified by the target operand identifies the call gate. The target operand can only specify the call gate segment selector indirectly with a memory location (m16:16, m16:32 or m16:64). The processor obtains the segment selector for the new code segment and the new instruction pointer (offset) from the 16-byte call gate descriptor. (The offset from the target operand is ignored when a call gate is used.)\nOn inter-privilege-level calls, the processor switches to the stack for the privilege level of the called procedure. The segment selector for the new stack segment is set to NULL. The new stack pointer is specified in the TSS for the currently running task. The branch to the new code segment occurs after the stack switch.\nNote that when using a call gate to perform a far call to a segment at the same privilege level, an implicit stack switch occurs as a result of entering 64-bit mode. The SS selector is unchanged, but stack segment accesses use a segment base of 0x0, the limit is ignored, and the default stack size is 64-bits. (The full value of RSP is used for the\noffset.) On the new stack, the processor pushes the segment selector and stack pointer for the calling procedure’s stack and the segment selector and instruction pointer for the calling procedure’s code segment. (Parameter copy is not supported in IA-32e mode.) Finally, the processor branches to the address of the procedure being called within the new code segment.\n" }, { "Name": "CBW", "Alias": [ "CWDE", "CDQE" ], "Brief": "Convert Byte to Word/Convert Word to Doubleword/Convert Doubleword to Quadword", "Description": "\nDouble the size of the source operand by means of sign extension. The CBW (convert byte to word) instruction copies the sign (bit 7) in the source operand into every bit in the AH register. The CWDE (convert word to double-word) instruction copies the sign (bit 15) of the word in the AX register into the high 16 bits of the EAX register.\nCBW and CWDE reference the same opcode. The CBW instruction is intended for use when the operand-size attri-bute is 16; CWDE is intended for use when the operand-size attribute is 32. Some assemblers may force the operand size. Others may treat these two mnemonics as synonyms (CBW/CWDE) and use the setting of the operand-size attribute to determine the size of values to be converted.\nIn 64-bit mode, the default operation size is the size of the destination register. Use of the REX.W prefix promotes this instruction (CDQE when promoted) to operate on 64-bit operands. In which case, CDQE copies the sign (bit 31) of the doubleword in the EAX register into the high 32 bits of RAX.\n" }, { "Name": "CWD", "Alias": [ "CDQ", "CQO" ], "Brief": "Convert Word to Doubleword/Convert Doubleword to Quadword", "Description": "\nDoubles the size of the operand in register AX, EAX, or RAX (depending on the operand size) by means of sign extension and stores the result in registers DX:AX, EDX:EAX, or RDX:RAX, respectively. The CWD instruction copies the sign (bit 15) of the value in the AX register into every bit position in the DX register. The CDQ instruction copies the sign (bit 31) of the value in the EAX register into every bit position in the EDX register. The CQO instruc-tion (available in 64-bit mode only) copies the sign (bit 63) of the value in the RAX register into every bit position in the RDX register.\nThe CWD instruction can be used to produce a doubleword dividend from a word before word division. The CDQ instruction can be used to produce a quadword dividend from a doubleword before doubleword division. The CQO instruction can be used to produce a double quadword dividend from a quadword before a quadword division.\nThe CWD and CDQ mnemonics reference the same opcode. The CWD instruction is intended for use when the operand-size attribute is 16 and the CDQ instruction for when the operand-size attribute is 32. Some assemblers may force the operand size to 16 when CWD is used and to 32 when CDQ is used. Others may treat these mnemonics as synonyms (CWD/CDQ) and use the current setting of the operand-size attribute to determine the size of values to be converted, regardless of the mnemonic used.\nIn 64-bit mode, use of the REX.W prefix promotes operation to 64 bits. The CQO mnemonics reference the same opcode as CWD/CDQ. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "CLAC", "Alias": [], "Brief": "Clear AC Flag in EFLAGS Register", "Description": "\nClears the AC flag bit in EFLAGS register. This disables any alignment checking of user-mode data accesses. If the SMAP bit is set in the CR4 register, this disallows explicit supervisor-mode data accesses to user-mode pages.\nThis instruction's operation is the same in non-64-bit modes and 64-bit mode. Attempts to execute CLAC when CPL > 0 cause #UD.\n" }, { "Name": "CLC", "Alias": [], "Brief": "Clear Carry Flag", "Description": "\nClears the CF flag in the EFLAGS register. Operation is the same in all modes.\n" }, { "Name": "CLD", "Alias": [], "Brief": "Clear Direction Flag", "Description": "\nClears the DF flag in the EFLAGS register. When the DF flag is set to 0, string operations increment the index regis-ters (ESI and/or EDI). Operation is the same in all modes.\n" }, { "Name": "CLFLUSH", "Alias": [], "Brief": "Flush Cache Line", "Description": "\nInvalidates the cache line that contains the linear address specified with the source operand from all levels of the processor cache hierarchy (data and instruction). The invalidation is broadcast throughout the cache coherence domain. If, at any level of the cache hierarchy, the line is inconsistent with memory (dirty) it is written to memory before invalidation. The source operand is a byte memory location.\nThe availability of CLFLUSH is indicated by the presence of the CPUID feature flag CLFSH (bit 19 of the EDX register, see “CPUID—CPU Identification” in this chapter). The aligned cache line size affected is also indicated with the CPUID instruction (bits 8 through 15 of the EBX register when the initial value in the EAX register is 1).\nThe memory attribute of the page containing the affected line has no effect on the behavior of this instruction. It should be noted that processors are free to speculatively fetch and cache data from system memory regions assigned a memory-type allowing for speculative reads (such as, the WB, WC, and WT memory types). PREFETCHh instructions can be used to provide the processor with hints for this speculative behavior. Because this speculative fetching can occur at any time and is not tied to instruction execution, the CLFLUSH instruction is not ordered with respect to PREFETCHh instructions or any of the speculative fetching mechanisms (that is, data can be specula-tively loaded into a cache line just before, during, or after the execution of a CLFLUSH instruction that references the cache line).\nCLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to be ordered by any other fencing or seri-alizing instructions or by another CLFLUSH instruction. For example, software can use an MFENCE instruction to ensure that previous stores are included in the write-back.\nThe CLFLUSH instruction can be used at all privilege levels and is subject to all permission checking and faults asso-ciated with a byte load (and in addition, a CLFLUSH instruction is allowed to flush a linear address in an execute-only segment). Like a load, the CLFLUSH instruction sets the A bit but not the D bit in the page tables.\nThe CLFLUSH instruction was introduced with the SSE2 extensions; however, because it has its own CPUID feature flag, it can be implemented in IA-32 processors that do not include the SSE2 extensions. Also, detecting the pres-ence of the SSE2 extensions with the CPUID instruction does not guarantee that the CLFLUSH instruction is imple-mented in the processor.\nCLFLUSH operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "CLI", "Alias": [], "Brief": "Clear Interrupt Flag", "Description": "\nIf protected-mode virtual interrupts are not enabled, CLI clears the IF flag in the EFLAGS register. No other flags are affected. Clearing the IF flag causes the processor to ignore maskable external interrupts. The IF flag and the CLI and STI instruction have no affect on the generation of exceptions and NMI interrupts.\nWhen protected-mode virtual interrupts are enabled, CPL is 3, and IOPL is less than 3; CLI clears the VIF flag in the EFLAGS register, leaving IF unaffected. Table 3-6 indicates the action of the CLI instruction depending on the processor operating mode and the CPL/IOPL of the running program or procedure.\nOperation is the same in all modes.\nTable 3-6. Decision Table for CLI Results\n\n\nPE\nVM\nIOPL\nCPL\nPVI\nVIP\nVME\nCLI Result\n\n0\nX\nX\nX\nX\nX\nX\nIF = 0\n\n1\n0\n≥ CPL\nX\nX\nX\nX\nIF = 0\n\n1\n0\n< CPL\n3\n1\nX\nX\nVIF = 0\n\n1\n0\n< CPL\n< 3\nX\nX\nX\nGP Fault\n\n1\n0\n< CPL\nX\n0\nX\nX\nGP Fault\n\n1\n1\n3\nX\nX\nX\nX\nIF = 0\n\n1\n1\n< 3\nX\nX\nX\n1\nVIF = 0\n\n1\n1\n< 3\nX\nX\nX\n0\nGP Fault\nNOTES:\n* X = This setting has no impact.\n" }, { "Name": "CLTS", "Alias": [], "Brief": "Clear Task", "Description": "\nClears the task-switched (TS) flag in the CR0 register. This instruction is intended for use in operating-system procedures. It is a privileged instruction that can only be executed at a CPL of 0. It is allowed to be executed in real-address mode to allow initialization for protected mode.\nThe processor sets the TS flag every time a task switch occurs. The flag is used to synchronize the saving of FPU context in multitasking applications. See the description of the TS flag in the section titled “Control Registers” in Chapter 2 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information about this flag.\nCLTS operation is the same in non-64-bit modes and 64-bit mode.\nSee Chapter 25, “VMX Non-Root Operation,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C, for more information about the behavior of this instruction in VMX non-root operation.\n" }, { "Name": "CMC", "Alias": [], "Brief": "Complement Carry Flag", "Description": "\nComplements the CF flag in the EFLAGS register. CMC operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "CMOVcc", "Alias": [ "CMOVA", "CMOVAE", "CMOVB", "CMOVBE", "CMOVC", "CMOVE", "CMOVG", "CMOVGE", "CMOVL", "CMOVLE", "CMOVNA", "CMOVNAE", "CMOVNB", "CMOVNBE", "CMOVNC", "CMOVNE", "CMOVNG", "CMOVNGE", "CMOVNL", "CMOVNLE", "CMOVNO", "CMOVNP", "CMOVNS", "CMOVNZ", "CMOVO", "CMOVP", "CMOVPE", "CMOVPO", "CMOVS", "CMOVZ" ], "Brief": "Conditional Move", "Description": "\nThe CMOVcc instructions check the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and perform a move operation if the flags are in a specified state (or condition). A condition code (cc) is associated with each instruction to indicate the condition being tested for. If the condition is not satisfied, a move is not performed and execution continues with the instruction following the CMOVcc instruction.\nThese instructions can move 16-bit, 32-bit or 64-bit values from memory to a general-purpose register or from one general-purpose register to another. Conditional moves of 8-bit register operands are not supported.\nThe condition for each CMOVcc mnemonic is given in the description column of the above table. The terms “less” and “greater” are used for comparisons of signed integers and the terms “above” and “below” are used for unsigned integers.\nBecause a particular state of the status flags can sometimes be interpreted in two ways, two mnemonics are defined for some opcodes. For example, the CMOVA (conditional move if above) instruction and the CMOVNBE (conditional move if not below or equal) instruction are alternate mnemonics for the opcode 0F 47H.\nThe CMOVcc instructions were introduced in P6 family processors; however, these instructions may not be supported by all IA-32 processors. Software can determine if the CMOVcc instructions are supported by checking the processor’s feature information with the CPUID instruction (see “CPUID—CPU Identification” in this chapter).\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "CMP", "Alias": [], "Brief": "Compare Two Operands", "Description": "\nCompares the first source operand with the second source operand and sets the status flags in the EFLAGS register according to the results. The comparison is performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction. When an immediate value is used as an operand, it is sign-extended to the length of the first operand.\nThe condition codes used by the Jcc, CMOVcc, and SETcc instructions are based on the results of a CMP instruction. Appendix B, “EFLAGS Condition Codes,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, shows the relationship of the status flags and the condition codes.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "CMPPD", "Alias": [], "Brief": "Compare Packed Double", "Description": "\nPerforms a SIMD compare of the packed double-precision floating-point values in the source operand (second operand) and the destination operand (first operand) and returns the results of the comparison to the destination operand. The comparison predicate operand (third operand) specifies the type of comparison performed on each of the pairs of packed values. The result of each comparison is a quadword mask of all 1s (comparison true) or all 0s (comparison false). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.\n128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 128-bit memory location. The comparison predicate operand is an 8-bit immediate, bits 2:0 of the immediate define the type of comparison to be performed (see Table 3-7). Bits 7:3 of the immediate is reserved. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. Two comparisons are performed with results written to bits 127:0 of the destination operand.\nTable 3-7. Comparison Predicate for CMPPD and CMPPS Instructions\n\n\nPredi-cate\nimm8 Encoding\nDescription\nRelation where: A Is 1st Operand B Is 2nd Operand\nEmulation\nResult if NaN Operand\nQNaN Oper-and Signals Invalid\n\nEQ\n000B\nEqual\nA = B\n\nFalse\nNo\n\nLT\n001B\nLess-than\nA < B\n\nFalse\nYes\n\nLE\n010B\nLess-than-or-equal\nA ≤ B\n\nFalse\nYes\n\n\n\nGreater than\nA > B\nSwap Operands, Use LT\nFalse\nYes\n\n\n\nGreater-than-or-equal\nA ≥ B\nSwap Operands, Use LE\nFalse\nYes\n\nUNORD\n011B\nUnordered\nA, B = Unordered\n\nTrue\nNo\n\nNEQ\n100B\nNot-equal\nA ≠ B\n\nTrue\nNo\n\nNLT\n101B\nNot-less-than\nNOT(A < B)\n\nTrue\nYes\nTable 3-7. Comparison Predicate for CMPPD and CMPPS Instructions (Contd.)\n\n\nPredi-cate\nimm8 Encoding\nDescription\nRelation where: A Is 1st Operand B Is 2nd Operand\nEmulation\nResult if NaN Operand\nQNaN Oper-and Signals Invalid\n\nNLE\n110B\nNot-less-than-or-equal\nNOT(A ≤ B)\n\nTrue\nYes\n\n\n\nNot-greater-than\nNOT(A > B)\nSwap Operands, Use NLT\nTrue\nYes\n\n\n\nNot-greater-than-or-equal\nNOT(A ≥ B)\nSwap Operands, Use NLE\nTrue\nYes\n\nORD\n111B\nOrdered\nA , B = Ordered\n\nFalse\nNo\nThe unordered relationship is true when at least one of the two source operands being compared is a NaN; the ordered relationship is true when neither source operand is a NaN.\nA subsequent computational instruction that uses the mask result in the destination operand as an input operand will not generate an exception, because a mask of all 0s corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a QNaN.\nNote that the processors with “CPUID.1H:ECX.AVX =0” do not implement the greater-than, greater-than-or-equal, not-greater-than, and not-greater-than-or-equal relations. These comparisons can be made either by using the inverse relationship (that is, use the “not-less-than-or-equal” to make a “greater-than” comparison) or by using software emulation. When using software emulation, the program must swap the operands (copying registers when necessary to protect the data that will now be in the destination), and then perform the compare using a different predicate. The predicate to be used for these emulations is listed in Table 3-7 under the heading Emula-tion.\nCompilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand CMPPD instruction, for processors with “CPUID.1H:ECX.AVX =0”. See Table 3-8. Compiler should treat reserved Imm8 values as illegal syntax.\nTable 3-8. Pseudo-Op and CMPPD Implementation\n:\n\n\nPseudo-Op\nCMPPD Implementation\n\nCMPEQPD xmm1, xmm2\nCMPPD xmm1, xmm2, 0\n\nCMPLTPD xmm1, xmm2\nCMPPD xmm1, xmm2, 1\n\nCMPLEPD xmm1, xmm2\nCMPPD xmm1, xmm2, 2\n\nCMPUNORDPD xmm1, xmm2\nCMPPD xmm1, xmm2, 3\n\nCMPNEQPD xmm1, xmm2\nCMPPD xmm1, xmm2, 4\n\nCMPNLTPD xmm1, xmm2\nCMPPD xmm1, xmm2, 5\n\nCMPNLEPD xmm1, xmm2\nCMPPD xmm1, xmm2, 6\n\nCMPORDPD xmm1, xmm2\nCMPPD xmm1, xmm2, 7\nThe greater-than relations that the processor does not implement, require more than one instruction to emulate in software and therefore should not be implemented as pseudo-ops. (For these, the programmer should reverse the operands of the corresponding less than relations and use move instructions to ensure that the mask is moved to the correct destination register and that the source operand is left intact.)\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nEnhanced Comparison Predicate for VEX-Encoded VCMPPD\nVEX.128 encoded version: The first source operand (second operand) is an XMM register. The second source operand (third operand) can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destina-tion YMM register are zeroed. Two comparisons are performed with results written to bits 127:0 of the destination operand.\nVEX.256 encoded version: The first source operand (second operand) is a YMM register. The second source operand (third operand) can be a YMM register or a 256-bit memory location. The destination operand (first operand) is a YMM register. Four comparisons are performed with results written to the destination operand.\nThe comparison predicate operand is an 8-bit immediate:\nTable 3-9. Comparison Predicate for VCMPPD and VCMPPS Instructions\n\n\nPredicate\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nA >B\nA < B\nA = B\nUnordered1\non QNAN\n\nEQ_OQ (EQ)\n0H\nEqual (ordered, non-signaling)\nFalse\nFalse\nTrue\nFalse\nNo\n\nLT_OS (LT)\n1H\nLess-than (ordered, signaling)\nFalse\nTrue\nFalse\nFalse\nYes\n\nLE_OS (LE)\n2H\nLess-than-or-equal (ordered, signaling)\nFalse\nTrue\nTrue\nFalse\nYes\n\nUNORD_Q (UNORD)\n3H\nUnordered (non-signaling)\nFalse\nFalse\nFalse\nTrue\nNo\n\nNEQ_UQ (NEQ)\n4H\nNot-equal (unordered, non-signaling)\nTrue\nTrue\nFalse\nTrue\nNo\n\nNLT_US (NLT)\n5H\nNot-less-than (unordered, signaling)\nTrue\nFalse\nTrue\nTrue\nYes\n\nNLE_US (NLE)\n6H\nNot-less-than-or-equal (unordered, signaling)\nTrue\nFalse\nFalse\nTrue\nYes\n\nORD_Q (ORD)\n7H\nOrdered (non-signaling)\nTrue\nTrue\nTrue\nFalse\nNo\n\nEQ_UQ\n8H\nEqual (unordered, non-signaling)\nFalse\nFalse\nTrue\nTrue\nNo\n\nNGE_US (NGE)\n9H\nNot-greater-than-or-equal (unordered, signaling)\nFalse\nTrue\nFalse\nTrue\nYes\n\nNGT_US (NGT)\nAH\nNot-greater-than (unordered, sig-naling)\nFalse\nTrue\nTrue\nTrue\nYes\n\nFALSE_OQ(FALSE)\nBH\nFalse (ordered, non-signaling)\nFalse\nFalse\nFalse\nFalse\nNo\n\nNEQ_OQ\nCH\nNot-equal (ordered, non-signaling)\nTrue\nTrue\nFalse\nFalse\nNo\n\nGE_OS (GE)\nDH\nGreater-than-or-equal (ordered, sig-naling)\nTrue\nFalse\nTrue\nFalse\nYes\n\nGT_OS (GT)\nEH\nGreater-than (ordered, signaling)\nTrue\nFalse\nFalse\nFalse\nYes\n\nTRUE_UQ(TRUE)\nFH\nTrue (unordered, non-signaling)\nTrue\nTrue\nTrue\nTrue\nNo\n\nEQ_OS\n10H\nEqual (ordered, signaling)\nFalse\nFalse\nTrue\nFalse\nYes\n\nLT_OQ\n11H\nLess-than (ordered, nonsignaling)\nFalse\nTrue\nFalse\nFalse\nNo\n\nLE_OQ\n12H\nLess-than-or-equal (ordered, non-signaling)\nFalse\nTrue\nTrue\nFalse\nNo\n\nUNORD_S\n13H\nUnordered (signaling)\nFalse\nFalse\nFalse\nTrue\nYes\n\nNEQ_US\n14H\nNot-equal (unordered, signaling)\nTrue\nTrue\nFalse\nTrue\nYes\n\nNLT_UQ\n15H\nNot-less-than (unordered, nonsig-naling)\nTrue\nFalse\nTrue\nTrue\nNo\n\nNLE_UQ\n16H\nNot-less-than-or-equal (unordered, nonsignaling)\nTrue\nFalse\nFalse\nTrue\nNo\n\nORD_S\n17H\nOrdered (signaling)\nTrue\nTrue\nTrue\nFalse\nYes\n\nEQ_US\n18H\nEqual (unordered, signaling)\nFalse\nFalse\nTrue\nTrue\nYes\nTable 3-9. Comparison Predicate for VCMPPD and VCMPPS Instructions (Contd.)\n\n\nPredicate\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nimm8\nDescription\nResult: A Is 1st Operand, B Is 2nd Operand\nSignals #IA\nValue\n\nA >B\nA < B\nA = B\nUnordered1\non QNAN\n\nNGE_UQ\n19H\nNot-greater-than-or-equal (unor-dered, nonsignaling)\nFalse\nTrue\nFalse\nTrue\nNo\n\nNGT_UQ\n1AH\nNot-greater-than (unordered, non-signaling)\nFalse\nTrue\nTrue\nTrue\nNo\n\nFALSE_OS\n1BH\nFalse (ordered, signaling)\nFalse\nFalse\nFalse\nFalse\nYes\n\nNEQ_OS\n1CH\nNot-equal (ordered, signaling)\nTrue\nTrue\nFalse\nFalse\nYes\n\nGE_OQ\n1DH\nGreater-than-or-equal (ordered, nonsignaling)\nTrue\nFalse\nTrue\nFalse\nNo\n\nGT_OQ\n1EH\nGreater-than (ordered, nonsignal-ing)\nTrue\nFalse\nFalse\nFalse\nNo\n\nTRUE_US\n1FH\nTrue (unordered, signaling)\nTrue\nTrue\nTrue\nTrue\nYes\nNOTES:\n1. If either operand A or B is a NAN.\nProcessors with “CPUID.1H:ECX.AVX =1” implement the full complement of 32 predicates shown in Table 3-9, soft-ware emulation is no longer needed. Compilers and assemblers may implement the following three-operand pseudo-ops in addition to the four-operand VCMPPD instruction. See Table 3-10, where the notations of reg1 reg2, and reg3 represent either XMM registers or YMM registers. Compiler should treat reserved Imm8 values as illegal syntax. Alternately, intrinsics can map the pseudo-ops to pre-defined constants to support a simpler intrinsic inter-face.\nTable 3-10. Pseudo-Op and VCMPPD Implementation\n:\n\n\nPseudo-Op\nCMPPD Implementation\n\nVCMPEQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0\n\nVCMPLTPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1\n\nVCMPLEPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 2\n\nVCMPUNORDPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 3\n\nVCMPNEQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 4\n\nVCMPNLTPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 5\n\nVCMPNLEPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 6\n\nVCMPORDPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 7\n\nVCMPEQ_UQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 8\n\nVCMPNGEPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 9\n\nVCMPNGTPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0AH\n\nVCMPFALSEPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0BH\n\nVCMPNEQ_OQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0CH\n\nVCMPGEPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0DH\n\nVCMPGTPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0EH\n\nVCMPTRUEPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0FH\n\nVCMPEQ_OSPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 10H\n\nVCMPLT_OQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 11H\n\nVCMPLE_OQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 12H\nTable 3-10. Pseudo-Op and VCMPPD Implementation\n\n\nPseudo-Op\nCMPPD Implementation\n\nVCMPUNORD_SPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 13H\n\nVCMPNEQ_USPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 14H\n\nVCMPNLT_UQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 15H\n\nVCMPNLE_UQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 16H\n\nVCMPORD_SPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 17H\n\nVCMPEQ_USPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 18H\n\nVCMPNGE_UQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 19H\n\nVCMPNGT_UQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1AH\n\nVCMPFALSE_OSPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1BH\n\nVCMPNEQ_OSPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1CH\n\nVCMPGE_OQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1DH\n\nVCMPGT_OQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1EH\n\nVCMPTRUE_USPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1FH\n" }, { "Name": "CMPPS", "Alias": [], "Brief": "Compare Packed Single", "Description": "\nPerforms a SIMD compare of the packed single-precision floating-point values in the source operand (second operand) and the destination operand (first operand) and returns the results of the comparison to the destination operand. The comparison predicate operand (third operand) specifies the type of comparison performed on each of the pairs of packed values. The result of each comparison is a doubleword mask of all 1s (comparison true) or all 0s (comparison false). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.\n128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 128-bit memory location. The comparison predicate operand is an 8-bit immediate, bits 2:0 of the immediate define the type of comparison to be performed (see Table 3-7). Bits 7:3 of the immediate is reserved. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. Four comparisons are performed with results written to bits 127:0 of the destination operand.\nThe unordered relationship is true when at least one of the two source operands being compared is a NaN; the ordered relationship is true when neither source operand is a NaN.\nA subsequent computational instruction that uses the mask result in the destination operand as an input operand will not generate a fault, because a mask of all 0s corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a QNaN.\nNote that processors with “CPUID.1H:ECX.AVX =0” do not implement the “greater-than”, “greater-than-or-equal”, “not-greater than”, and “not-greater-than-or-equal relations” predicates. These comparisons can be made either by using the inverse relationship (that is, use the “not-less-than-or-equal” to make a “greater-than” comparison) or by using software emulation. When using software emulation, the program must swap the operands (copying registers when necessary to protect the data that will now be in the destination), and then perform the compare using a different predicate. The predicate to be used for these emulations is listed in Table 3-7 under the heading Emulation.\nCompilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand CMPPS instruction, for processors with “CPUID.1H:ECX.AVX =0”. See Table 3-11. Compiler should treat reserved Imm8 values as illegal syntax.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nTable 3-11. Pseudo-Ops and CMPPS\n\n\nPseudo-Op\nImplementation\n\nCMPEQPS xmm1, xmm2\nCMPPS xmm1, xmm2, 0\n\nCMPLTPS xmm1, xmm2\nCMPPS xmm1, xmm2, 1\n\nCMPLEPS xmm1, xmm2\nCMPPS xmm1, xmm2, 2\n\nCMPUNORDPS xmm1, xmm2\nCMPPS xmm1, xmm2, 3\n\nCMPNEQPS xmm1, xmm2\nCMPPS xmm1, xmm2, 4\n\nCMPNLTPS xmm1, xmm2\nCMPPS xmm1, xmm2, 5\n\nCMPNLEPS xmm1, xmm2\nCMPPS xmm1, xmm2, 6\n\nCMPORDPS xmm1, xmm2\nCMPPS xmm1, xmm2, 7\nThe greater-than relations not implemented by processor require more than one instruction to emulate in software and therefore should not be implemented as pseudo-ops. (For these, the programmer should reverse the operands of the corresponding less than relations and use move instructions to ensure that the mask is moved to the correct destination register and that the source operand is left intact.)\nEnhanced Comparison Predicate for VEX-Encoded VCMPPS\nVEX.128 encoded version: The first source operand (second operand) is an XMM register. The second source operand (third operand) can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destina-tion YMM register are zeroed. Four comparisons are performed with results written to bits 127:0 of the destination operand.\nVEX.256 encoded version: The first source operand (second operand) is a YMM register. The second source operand (third operand) can be a YMM register or a 256-bit memory location. The destination operand (first operand) is a YMM register. Eight comparisons are performed with results written to the destination operand.\nThe comparison predicate operand is an 8-bit immediate:\nProcessors with “CPUID.1H:ECX.AVX =1” implement the full complement of 32 predicates shown in Table 3-9, soft-ware emulation is no longer needed. Compilers and assemblers may implement the following three-operand pseudo-ops in addition to the four-operand VCMPPS instruction. See Table 3-12, where the notation of reg1 and reg2 represent either XMM registers or YMM registers. Compiler should treat reserved Imm8 values as illegal syntax. Alternately, intrinsics can map the pseudo-ops to pre-defined constants to support a simpler intrinsic inter-face.\nTable 3-12. Pseudo-Op and VCMPPS Implementation\n:\n\n\nPseudo-Op\nCMPPS Implementation\n\nVCMPEQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0\n\nVCMPLTPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1\n\nVCMPLEPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 2\n\nVCMPUNORDPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 3\n\nVCMPNEQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 4\n\nVCMPNLTPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 5\n\nVCMPNLEPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 6\n\nVCMPORDPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 7\n\nVCMPEQ_UQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 8\n\nVCMPNGEPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 9\n\nVCMPNGTPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0AH\n\nVCMPFALSEPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0BH\nTable 3-12. Pseudo-Op and VCMPPS Implementation\n\n\nPseudo-Op\nCMPPS Implementation\n\nVCMPNEQ_OQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0CH\n\nVCMPGEPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0DH\n\nVCMPGTPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0EH\n\nVCMPTRUEPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0FH\n\nVCMPEQ_OSPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 10H\n\nVCMPLT_OQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 11H\n\nVCMPLE_OQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 12H\n\nVCMPUNORD_SPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 13H\n\nVCMPNEQ_USPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 14H\n\nVCMPNLT_UQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 15H\n\nVCMPNLE_UQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 16H\n\nVCMPORD_SPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 17H\n\nVCMPEQ_USPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 18H\n\nVCMPNGE_UQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 19H\n\nVCMPNGT_UQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1AH\n\nVCMPFALSE_OSPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1BH\n\nVCMPNEQ_OSPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1CH\n\nVCMPGE_OQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1DH\n\nVCMPGT_OQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1EH\n\nVCMPTRUE_USPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1FH\n" }, { "Name": "CMPS", "Alias": [ "CMPSB", "CMPSW", "CMPSD", "CMPSQ" ], "Brief": "Compare String Operands", "Description": "\nCompares the byte, word, doubleword, or quadword specified with the first source operand with the byte, word, doubleword, or quadword specified with the second source operand and sets the status flags in the EFLAGS register according to the results.\nBoth source operands are located in memory. The address of the first source operand is read from DS:SI, DS:ESI or RSI (depending on the address-size attribute of the instruction is 16, 32, or 64, respectively). The address of the second source operand is read from ES:DI, ES:EDI or RDI (again depending on the address-size attribute of the\ninstruction is 16, 32, or 64). The DS segment may be overridden with a segment override prefix, but the ES segment cannot be overridden.\nAt the assembly-code level, two forms of this instruction are allowed: the “explicit-operands” form and the “no-operands” form. The explicit-operands form (specified with the CMPS mnemonic) allows the two source operands to be specified explicitly. Here, the source operands should be symbols that indicate the size and location of the source values. This explicit-operand form is provided to allow documentation. However, note that the documenta-tion provided by this form can be misleading. That is, the source operand symbols must specify the correct type (size) of the operands (bytes, words, or doublewords, quadwords), but they do not have to specify the correct loca-tion. Locations of the source operands are always specified by the DS:(E)SI (or RSI) and ES:(E)DI (or RDI) regis-ters, which must be loaded correctly before the compare string instruction is executed.\nThe no-operands form provides “short forms” of the byte, word, and doubleword versions of the CMPS instructions. Here also the DS:(E)SI (or RSI) and ES:(E)DI (or RDI) registers are assumed by the processor to specify the loca-tion of the source operands. The size of the source operands is selected with the mnemonic: CMPSB (byte compar-ison), CMPSW (word comparison), CMPSD (doubleword comparison), or CMPSQ (quadword comparison using REX.W).\nAfter the comparison, the (E/R)SI and (E/R)DI registers increment or decrement automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E/R)SI and (E/R)DI register increment; if the DF flag is 1, the registers decrement.) The registers increment or decrement by 1 for byte operations, by 2 for word operations, 4 for doubleword operations. If operand size is 64, RSI and RDI registers increment by 8 for quadword operations.\nThe CMPS, CMPSB, CMPSW, CMPSD, and CMPSQ instructions can be preceded by the REP prefix for block compar-isons. More often, however, these instructions will be used in a LOOP construct that takes some action based on the setting of the status flags before the next comparison is made. See “REP/REPE/REPZ /REPNE/REPNZ—Repeat String Operation Prefix” in Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B, for a description of the REP prefix.\nIn 64-bit mode, the instruction’s default address size is 64 bits, 32 bit address size is supported using the prefix 67H. Use of the REX.W prefix promotes doubleword operation to 64 bits (see CMPSQ). See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "CMPSS", "Alias": [], "Brief": "Compare Scalar Single", "Description": "\nCompares the low single-precision floating-point values in the source operand (second operand) and the destina-tion operand (first operand) and returns the results of the comparison to the destination operand. The comparison predicate operand (third operand) specifies the type of comparison performed. The comparison result is a double-word mask of all 1s (comparison true) or all 0s (comparison false). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.\n128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 64-bit memory location. The comparison pred-icate operand is an 8-bit immediate, bits 2:0 of the immediate define the type of comparison to be performed (see Table 3-7). Bits 7:3 of the immediate is reserved. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nThe unordered relationship is true when at least one of the two source operands being compared is a NaN; the ordered relationship is true when neither source operand is a NaN\nA subsequent computational instruction that uses the mask result in the destination operand as an input operand will not generate a fault, since a mask of all 0s corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a QNaN.\nNote that processors with “CPUID.1H:ECX.AVX =0” do not implement the “greater-than”, “greater-than-or-equal”, “not-greater than”, and “not-greater-than-or-equal relations” predicates. These comparisons can be made either by using the inverse relationship (that is, use the “not-less-than-or-equal” to make a “greater-than” comparison) or by using software emulation. When using software emulation, the program must swap the operands (copying registers when necessary to protect the data that will now be in the destination operand), and then perform the compare using a different predicate. The predicate to be used for these emulations is listed in Table 3-7 under the heading Emulation.\nCompilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand CMPSS instruction, for processors with “CPUID.1H:ECX.AVX =0”. See Table 3-15. Compiler should treat reserved Imm8 values as illegal syntax.\nTable 3-15. Pseudo-Ops and CMPSS\n\n\nPseudo-Op\nCMPSS Implementation\n\nCMPEQSS xmm1, xmm2\nCMPSS xmm1, xmm2, 0\n\nCMPLTSS xmm1, xmm2\nCMPSS xmm1, xmm2, 1\n\nCMPLESS xmm1, xmm2\nCMPSS xmm1, xmm2, 2\n\nCMPUNORDSS xmm1, xmm2\nCMPSS xmm1, xmm2, 3\n\nCMPNEQSS xmm1, xmm2\nCMPSS xmm1, xmm2, 4\n\nCMPNLTSS xmm1, xmm2\nCMPSS xmm1, xmm2, 5\n\nCMPNLESS xmm1, xmm2\nCMPSS xmm1, xmm2, 6\n\nCMPORDSS xmm1, xmm2\nCMPSS xmm1, xmm2, 7\nThe greater-than relations not implemented in the processor require more than one instruction to emulate in soft-ware and therefore should not be implemented as pseudo-ops. (For these, the programmer should reverse the operands of the corresponding less than relations and use move instructions to ensure that the mask is moved to the correct destination register and that the source operand is left intact.)\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nEnhanced Comparison Predicate for VEX-Encoded VCMPSD\nVEX.128 encoded version: The first source operand (second operand) is an XMM register. The second source operand (third operand) can be an XMM register or a 32-bit memory location. Bits (VLMAX-1:128) of the destina-tion YMM register are zeroed. The comparison predicate operand is an 8-bit immediate:\nProcessors with “CPUID.1H:ECX.AVX =1” implement the full complement of 32 predicates shown in Table 3-9, soft-ware emulation is no longer needed. Compilers and assemblers may implement the following three-operand pseudo-ops in addition to the four-operand VCMPSS instruction. See Table 3-16, where the notations of reg1 reg2, and reg3 represent either XMM registers or YMM registers. Compiler should treat reserved Imm8 values as illegal syntax. Alternately, intrinsics can map the pseudo-ops to pre-defined constants to support a simpler intrinsic inter-face.\nTable 3-16. Pseudo-Op and VCMPSS Implementation\n:\n\n\nPseudo-Op\nCMPSS Implementation\n\nVCMPEQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0\n\nVCMPLTSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1\n\nVCMPLESS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 2\n\nVCMPUNORDSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 3\n\nVCMPNEQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 4\n\nVCMPNLTSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 5\n\nVCMPNLESS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 6\n\nVCMPORDSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 7\n\nVCMPEQ_UQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 8\n\nVCMPNGESS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 9\n\nVCMPNGTSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0AH\n\nVCMPFALSESS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0BH\n\nVCMPNEQ_OQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0CH\n\nVCMPGESS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0DH\n\nVCMPGTSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0EH\nTable 3-16. Pseudo-Op and VCMPSS Implementation (Contd.)\n\n\nPseudo-Op\nCMPSS Implementation\n\nVCMPTRUESS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0FH\n\nVCMPEQ_OSSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 10H\n\nVCMPLT_OQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 11H\n\nVCMPLE_OQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 12H\n\nVCMPUNORD_SSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 13H\n\nVCMPNEQ_USSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 14H\n\nVCMPNLT_UQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 15H\n\nVCMPNLE_UQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 16H\n\nVCMPORD_SSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 17H\n\nVCMPEQ_USSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 18H\n\nVCMPNGE_UQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 19H\n\nVCMPNGT_UQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1AH\n\nVCMPFALSE_OSSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1BH\n\nVCMPNEQ_OSSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1CH\n\nVCMPGE_OQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1DH\n\nVCMPGT_OQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1EH\n\nVCMPTRUE_USSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1FH\n" }, { "Name": "CMPXCHG", "Alias": [], "Brief": "Compare and Exchange", "Description": "\nCompares the value in the AL, AX, EAX, or RAX register with the first operand (destination operand). If the two values are equal, the second operand (source operand) is loaded into the destination operand. Otherwise, the destination operand is loaded into the AL, AX, EAX or RAX register. RAX register is available only in 64-bit mode.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "CMPXCHG8B", "Alias": [ "CMPXCHG16B" ], "Brief": "Compare and Exchange Bytes", "Description": "\nCompares the 64-bit value in EDX:EAX (or 128-bit value in RDX:RAX if operand size is 128 bits) with the operand (destination operand). If the values are equal, the 64-bit value in ECX:EBX (or 128-bit value in RCX:RBX) is stored in the destination operand. Otherwise, the value in the destination operand is loaded into EDX:EAX (or RDX:RAX). The destination operand is an 8-byte memory location (or 16-byte memory location if operand size is 128 bits). For the EDX:EAX and ECX:EBX register pairs, EDX and ECX contain the high-order 32 bits and EAX and EBX contain the low-order 32 bits of a 64-bit value. For the RDX:RAX and RCX:RBX register pairs, RDX and RCX contain the high-order 64 bits and RAX and RBX contain the low-order 64bits of a 128-bit value.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)\nIn 64-bit mode, default operation size is 64 bits. Use of the REX.W prefix promotes operation to 128 bits. Note that CMPXCHG16B requires that the destination (memory) operand be 16-byte aligned. See the summary chart at the beginning of this section for encoding data and limits. For information on the CPUID flag that indicates CMPXCHG16B, see page 3-175.\n" }, { "Name": "COMISD", "Alias": [], "Brief": "Compare Scalar Ordered Double", "Description": "\nCompares the double-precision floating-point values in the low quadwords of operand 1 (first operand) and operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unor-dered, greater than, less than, or equal). The OF, SF and AF flags in the EFLAGS register are set to 0. The unor-dered result is returned if either source operand is a NaN (QNaN or SNaN).The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.\nOperand 1 is an XMM register; operand 2 can be an XMM register or a 64 bit memory location.\nThe COMISD instruction differs from the UCOMISD instruction in that it signals a SIMD floating-point invalid oper-ation exception (#I) when a source operand is either a QNaN or SNaN. The UCOMISD instruction signals an invalid numeric exception only if a source operand is an SNaN.\nThe EFLAGS register is not updated if an unmasked SIMD floating-point exception is generated.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "COMISS", "Alias": [], "Brief": "Compare Scalar Ordered Single", "Description": "\nCompares the single-precision floating-point values in the low doublewords of operand 1 (first operand) and operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unor-dered, greater than, less than, or equal). The OF, SF, and AF flags in the EFLAGS register are set to 0. The unor-dered result is returned if either source operand is a NaN (QNaN or SNaN). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.\nOperand 1 is an XMM register; Operand 2 can be an XMM register or a 32 bit memory location.\nThe COMISS instruction differs from the UCOMISS instruction in that it signals a SIMD floating-point invalid opera-tion exception (#I) when a source operand is either a QNaN or SNaN. The UCOMISS instruction signals an invalid numeric exception only if a source operand is an SNaN.\nThe EFLAGS register is not updated if an unmasked SIMD floating-point exception is generated.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "CPUID", "Alias": [], "Brief": "CPU Identification", "Description": "\nThe ID flag (bit 21) in the EFLAGS register indicates support for the CPUID instruction. If a software procedure can set and clear this flag, the processor executing the procedure supports the CPUID instruction. This instruction oper-ates the same in non-64-bit modes and 64-bit mode.\nCPUID returns processor identification and feature information in the EAX, EBX, ECX, and EDX registers.1 The instruction’s output is dependent on the contents of the EAX register upon execution (in some cases, ECX as well). For example, the following pseudocode loads EAX with 00H and causes CPUID to return a Maximum Return Value and the Vendor Identification String in the appropriate registers:\nMOV EAX, 00H\nCPUID\nTable 3-17 shows information returned, depending on the initial value loaded into the EAX register. Table 3-18 shows the maximum CPUID input value recognized for each family of IA-32 processors on which CPUID is imple-mented.\nTwo types of information are returned: basic and extended function information. If a value entered for CPUID.EAX is higher than the maximum input value for basic or extended function for that processor then the data for the highest basic information leaf is returned. For example, using the Intel Core i7 processor, the following is true:\nCPUID.EAX = 05H (* Returns MONITOR/MWAIT leaf. *)\nCPUID.EAX = 0AH (* Returns Architectural Performance Monitoring leaf. *)\nCPUID.EAX = 0BH (* Returns Extended Topology Enumeration leaf. *)\nCPUID.EAX = 0CH (* INVALID: Returns the same information as CPUID.EAX = 0BH. *)\nCPUID.EAX = 80000008H (* Returns linear/physical address size data. *)\nCPUID.EAX = 8000000AH (* INVALID: Returns same information as CPUID.EAX = 0BH. *)\nIf a value entered for CPUID.EAX is less than or equal to the maximum input value and the leaf is not supported on that processor then 0 is returned in all the registers. For example, using the Intel Core i7 processor, the following is true:\nCPUID.EAX = 07H (*Returns EAX=EBX=ECX=EDX=0. *)\nWhen CPUID returns the highest basic leaf information as a result of an invalid input EAX value, any dependence on input ECX value in the basic leaf is honored.\nCPUID can be executed at any privilege level to serialize instruction execution. Serializing instruction execution guarantees that any modifications to flags, registers, and memory for previous instructions are completed before the next instruction is fetched and executed.\nSee also:\n“Serializing Instructions” in Chapter 8, “Multiple-Processor Management,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.\n1.\nOn Intel 64 processors, CPUID clears the high 32 bits of the RAX/RBX/RCX/RDX registers in all modes.\n“Caching Translation Information” in Chapter 4, “Paging,” in the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 3A.\nTable 3-17. Information Returned by CPUID Instruction\nInitial EAX\nValue\nInformation Provided about the Processor\nBasic CPUID Information\n0H\nEAX\nMaximum Input Value for Basic CPUID Information (see Table 3-18)\nEBX\n“Genu”\nECX\n“ntel”\nEDX\n“ineI”\n01H\nEAX\nVersion Information: Type, Family, Model, and Stepping ID (see Figure 3-5)\nEBX\nBits 07-00: Brand Index Bits 15-08: CLFLUSH line size (Value ∗ 8 = cache line size in bytes) Bits 23-16: Maximum number of addressable IDs for logical processors in this physical package*. Bits 31-24: Initial APIC ID\nECX\nFeature Information (see Figure 3-6 and Table 3-20)\nEDX\nFeature Information (see Figure 3-7 and Table 3-21)\nNOTES:\n*\nThe nearest power-of-2 integer that is not smaller than EBX[23:16] is the number of unique initial APIC IDs reserved for addressing different logical processors in a physical package. This field is only valid if CPUID.1.EDX.HTT[bit 28]= 1.\n02H\nEAX\nCache and TLB Information (see Table 3-22)\nEBX\nCache and TLB Information\nECX\nCache and TLB Information\nEDX\nCache and TLB Information\n03H\nEAX\nReserved.\nEBX\nReserved.\nECX\nBits 00-31 of 96 bit processor serial number. (Available in Pentium III processor only; otherwise, the value in this register is reserved.)\nEDX\nBits 32-63 of 96 bit processor serial number. (Available in Pentium III processor only; otherwise, the value in this register is reserved.)\nNOTES:\nProcessor serial number (PSN) is not supported in the Pentium 4 processor or later. On all models, use the PSN flag (returned using CPUID) to check for PSN support before accessing the feature.\nSee AP-485, Intel Processor Identification and the CPUID Instruction (Order Number 241618) for more information on PSN.\nCPUID leaves > 3 < 80000000 are visible only when IA32_MISC_ENABLE.BOOT_NT4[bit 22] = 0 (default).\nDeterministic Cache Parameters Leaf\nNOTES:\n04H\nLeaf 04H output depends on the initial value in ECX.*\nSee also: “INPUT EAX = 4: Returns Deterministic Cache Parameters for each level on page 3-182.\nEAX\nBits 04-00: Cache Type Field\n0 = Null - No more caches 1 = Data Cache 2 = Instruction Cache 3 = Unified Cache 4-31 = Reserved\nTable 3-17. Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n\n\nBits 07-05: Cache Level (starts at 1) Bit 08: Self Initializing cache level (does not need SW initialization) Bit 09: Fully Associative cache\nBits 13-10: Reserved Bits 25-14: Maximum number of addressable IDs for logical processors sharing this cache**,\n***\nBits 31-26: Maximum number of addressable IDs for processor cores in the physical package**,\n****,\n*****\nEBX\nBits 11-00: L = System Coherency Line Size** Bits 21-12: P = Physical Line partitions** Bits 31-22: W = Ways of associativity**\nECX\nBits 31-00: S = Number of Sets**\nEDX\nBit 0: Write-Back Invalidate/Invalidate\n0 = WBINVD/INVD from threads sharing this cache acts upon lower level caches for threads sharing this cache. 1 = WBINVD/INVD is not guaranteed to act upon lower level caches of non-originating threads sharing this cache.\nBit 1: Cache Inclusiveness\n0 = Cache is not inclusive of lower cache levels. 1 = Cache is inclusive of lower cache levels.\nBit 2: Complex Cache Indexing\n0 = Direct mapped cache. 1 = A complex function is used to index the cache, potentially using all address bits.\nBits 31-03: Reserved = 0\n\n\n\n\n\n\nNOTES:\n* If ECX contains an invalid sub leaf index, EAX/EBX/ECX/EDX return 0. Sub-leaf index n+1 is invalid if sub-\nleaf n returns EAX[4:0] as 0.\n** Add one to the return value to get the result.\n***The nearest power-of-2 integer that is not smaller than (1 + EAX[25:14]) is the number of unique ini-\ntial APIC IDs reserved for addressing different logical processors sharing this cache\n**** The nearest power-of-2 integer that is not smaller than (1 + EAX[31:26]) is the number of unique\nCore_IDs reserved for addressing different processor cores in a physical package. Core ID is a subset of bits of the initial APIC ID.\n***** The returned value is constant for valid initial values in ECX. Valid ECX values start from 0.\nMONITOR/MWAIT Leaf\n05H\nEAX\nBits 15-00: Smallest monitor-line size in bytes (default is processor's monitor granularity) Bits 31-16: Reserved = 0\nEBX\nBits 15-00: Largest monitor-line size in bytes (default is processor's monitor granularity) Bits 31-16: Reserved = 0\nECX\nBit 00: Enumeration of Monitor-Mwait extensions (beyond EAX and EBX registers) supported\nBit 01: Supports treating interrupts as break-event for MWAIT, even when interrupts disabled\nBits 31 - 02: Reserved\nTable 3-17. Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n\n\nEDX\nBits 03 - 00: Number of C0* sub C-states supported using MWAIT Bits 07 - 04: Number of C1* sub C-states supported using MWAIT Bits 11 - 08: Number of C2* sub C-states supported using MWAIT Bits 15 - 12: Number of C3* sub C-states supported using MWAIT Bits 19 - 16: Number of C4* sub C-states supported using MWAIT Bits 23 - 20: Number of C5* sub C-states supported using MWAIT Bits 27 - 24: Number of C6* sub C-states supported using MWAIT Bits 31 - 28: Number of C7* sub C-states supported using MWAIT\nNOTE:\n* The definition of C0 through C7 states for MWAIT extension are processor-specific C-states, not ACPI C-\nstates.\n\n\nThermal and Power Management Leaf\n\n06H\n\nEAX\nBit 00: Digital temperature sensor is supported if set Bit 01: Intel Turbo Boost Technology Available (see description of IA32_MISC_ENABLE[38]). Bit 02: ARAT. APIC-Timer-always-running feature is supported if set. Bit 03: Reserved Bit 04: PLN. Power limit notification controls are supported if set. Bit 05: ECMD. Clock modulation duty cycle extension is supported if set. Bit 06: PTM. Package thermal management is supported if set. Bit 07: HWP. HWP base registers (IA32_PM_ENALBE[bit 0], IA32_HWP_CAPABILITIES, IA32_HWP_REQUEST, IA32_HWP_STATUS) are supported if set. Bit 08: HWP_Notification. IA32_HWP_INTERRUPT MSR is supported if set. Bit 09: HWP_Activity_Window. IA32_HWP_REQUEST[bits 41:32] is supported if set. Bit 10: HWP_Energy_Performance_Preference. IA32_HWP_REQUEST[bits 31:24] is supported if set. Bit 11: HWP_Package_Level_Request. IA32_HWP_REQUEST_PKG MSR is supported if set. Bit 12: Reserved. Bit 13: HDC. HDC base registers IA32_PKG_HDC_CTL, IA32_PM_CTL1, IA32_THREAD_STALL MSRs are supported if set. Bits 31 - 15: Reserved\nEBX\nBits 03 - 00: Number of Interrupt Thresholds in Digital Thermal Sensor Bits 31 - 04: Reserved\nECX\nBit 00: Hardware Coordination Feedback Capability (Presence of IA32_MPERF and IA32_APERF). The capability to provide a measure of delivered processor performance (since last reset of the counters), as a percentage of the expected processor performance when running at the TSC frequency. Bits 02 - 01: Reserved = 0 Bit 03: The processor supports performance-energy bias preference if CPUID.06H:ECX.SETBH[bit 3] is set and it also implies the presence of a new architectural MSR called IA32_ENERGY_PERF_BIAS (1B0H). Bits 31 - 04: Reserved = 0\nEDX\nReserved = 0\n\n\nStructured Extended Feature Flags Enumeration Leaf (Output depends on ECX input value)\n\n07H\n\nSub-leaf 0 (Input ECX = 0). *\nEAX\nBits 31-00: Reports the maximum input value for supported leaf 7 sub-leaves.\nTable 3-17. Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n\n\nEBX\nBit 00: FSGSBASE. Supports RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE if 1. Bit 01: IA32_TSC_ADJUST MSR is supported if 1. Bit 02: Reserved Bit 03: BMI1 Bit 04: HLE Bit 05: AVX2 Bit 06: Reserved Bit 07: SMEP. Supports Supervisor-Mode Execution Prevention if 1. Bit 08: BMI2 Bit 09: Supports Enhanced REP MOVSB/STOSB if 1. Bit 10: INVPCID. If 1, supports INVPCID instruction for system software that manages process-context identifiers. Bit 11: RTM Bit 12: Supports Platform Quality of Service Monitoring (PQM) capability if 1. Bit 13: Deprecates FPU CS and FPU DS values if 1. Bit 14: Reserved. Bit 15: Supports Platform Quality of Service Enforcement (PQE) capability if 1. Bits 17:16: Reserved Bit 18: RDSEED Bit 19: ADX Bit 20: SMAP Bits 31:21: Reserved\nECX\nBit 00: PREFETCHWT1 Bit 31-01: Reserved\nEDX\nReserved\nNOTE:\n* If ECX contains an invalid sub-leaf index, EAX/EBX/ECX/EDX return 0. Sub-leaf index n is invalid if n\nexceeds the value that sub-leaf 0 returns in EAX.\n\n\nDirect Cache Access Information Leaf\n\n09H\n\nValue of bits [31:0] of IA32_PLATFORM_DCA_CAP MSR (address 1F8H)\nEAX\nReserved\nEBX\nReserved\nECX\nReserved\nEDX\n\n\nArchitectural Performance Monitoring Leaf\n\n0AH\n\nEAX\nBits 07 - 00: Version ID of architectural performance monitoring Bits 15- 08: Number of general-purpose performance monitoring counter per logical processor Bits 23 - 16: Bit width of general-purpose, performance monitoring counter Bits 31 - 24: Length of EBX bit vector to enumerate architectural performance monitoring events\nEBX\nBit 00: Core cycle event not available if 1 Bit 01: Instruction retired event not available if 1 Bit 02: Reference cycles event not available if 1 Bit 03: Last-level cache reference event not available if 1 Bit 04: Last-level cache misses event not available if 1 Bit 05: Branch instruction retired event not available if 1 Bit 06: Branch mispredict retired event not available if 1 Bits 31- 07: Reserved = 0\nECX\nReserved = 0\nTable 3-17. Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n\n\nEDX\nBits 04 - 00: Number of fixed-function performance counters (if Version ID > 1) Bits 12- 05: Bit width of fixed-function performance counters (if Version ID > 1) Reserved = 0\n\n\nExtended Topology Enumeration Leaf\n\n\n\n\n\n\n\n\n\nNOTES:\n0BH\nMost of Leaf 0BH output depends on the initial value in ECX.\nThe EDX output of leaf 0BH is always valid and does not vary with input value in ECX.\nOutput value in ECX[7:0] always equals input value in ECX[7:0].\nFor sub-leaves that return an invalid level-type of 0 in ECX[15:8]; EAX and EBX will return 0.\n If an input value n in ECX returns the invalid level-type of 0 in ECX[15:8], other input values with ECX >\nn also return 0 in ECX[15:8].\nEAX\nBits 04-00: Number of bits to shift right on x2APIC ID to get a unique topology ID of the next level type*. All logical processors with the same next level ID share current level. Bits 31-05: Reserved.\nEBX\nBits 15 - 00: Number of logical processors at this level type. The number reflects configuration as shipped by Intel**. Bits 31- 16: Reserved.\nECX\nBits 07 - 00: Level number. Same value in ECX input Bits 15 - 08: Level type***. Bits 31 - 16:: Reserved.\nEDX\nBits 31- 00: x2APIC ID the current logical processor.\nNOTES: * Software should use this field (EAX[4:0]) to enumerate processor topology of the system.\n** Software must not use EBX[15:0] to enumerate processor topology of the system. This value in this field (EBX[15:0]) is only intended for display/diagnostic purposes. The actual number of logical processors available to BIOS/OS/Applications may be different from the value of EBX[15:0], depending on software and platform hardware configurations.\n*** The value of the “level type” field is not related to level numbers in any way, higher “level type” val-ues do not mean higher levels. Level type field has the following encoding: 0 : invalid 1 : SMT 2 : Core 3-255 : Reserved\nProcessor Extended State Enumeration Main Leaf (EAX = 0DH, ECX = 0)\nNOTES:\n0DH\nLeaf 0DH main leaf (ECX = 0).\nEAX\nBits 31-00: Reports the valid bit fields of the lower 32 bits of XCR0. If a bit is 0, the corresponding bit field in XCR0 is reserved. Bit 00: legacy x87 Bit 01: 128-bit SSE Bit 02: 256-bit AVX Bits 31- 03: Reserved\nEBX\nBits 31-00: Maximum size (bytes, from the beginning of the XSAVE/XRSTOR save area) required by enabled features in XCR0. May be different than ECX if some features at the end of the XSAVE save area are not enabled.\nTable 3-17. Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n\n\nECX\nBit 31-00: Maximum size (bytes, from the beginning of the XSAVE/XRSTOR save area) of the XSAVE/XRSTOR save area required by all supported features in the processor, i.e all the valid bit fields in XCR0.\nEDX\nBit 31-00: Reports the valid bit fields of the upper 32 bits of XCR0. If a bit is 0, the corresponding bit field in XCR0 is reserved.\n\n\nProcessor Extended State Enumeration Sub-leaf (EAX = 0DH, ECX = 1)\n\n0DH\n\nEAX\nBits 31-04: Reserved\nBit 00: XSAVEOPT is available\nBit 01: Supports XSAVEC and the compacted form of XRSTOR if set\nBit 02: Supports XGETBV with ECX = 1 if set\nBit 03: Supports XSAVES/XRSTORS and IA32_XSS if set\nEBX\nBits 31-00: The size in bytes of the XSAVE area containing all states enabled by XCRO | IA32_XSS.\nECX\nBits 31-00: Reports the valid bit fields of the lower 32 bits of IA32_XSS. If a bit is 0, the corresponding bit field in IA32_XSS is reserved.\nBits 07-00: Reserved\nBit 08: IA32_XSS[bit 8] is supported if 1\nBits 31-09: Reserved\nEDX\nBits 31-00: Reports the valid bit fields of the upper 32 bits of IA32_XSS. If a bit is 0, the corresponding bit field in IA32_XSS is reserved.\nBits 31-00: Reserved\n\n\nProcessor Extended State Enumeration Sub-leaves (EAX = 0DH, ECX = n, n > 1)\n\n\n\n\n\n\n\n\n\nNOTES:\n0DH\nLeaf 0DH output depends on the initial value in ECX.\nEach valid sub-leaf index maps to a valid bit in either the XCR0 register or the IA32_XSS MSR starting at bit position 2.\n* If ECX contains an invalid sub-leaf index, EAX/EBX/ECX/EDX return 0. Sub-leaf n (0 ≤ n ≤ 31) is invalid\nif sub-leaf 0 returns 0 in EAX[n] and sub-leaf 1 returns 0 in ECX[n]. Sub-leaf n (32 ≤ n ≤ 63) is invalid if sub-leaf 0 returns 0 in EDX[n-32] and sub-leaf 1 returns 0 in EDX[n-32].\nEAX\nBits 31-0: The size in bytes (from the offset specified in EBX) of the save area for an extended state fea-ture associated with a valid sub-leaf index, n.\nEBX\nBits 31-0: The offset in bytes of this extended state component’s save area from the beginning of the XSAVE/XRSTOR area. This field reports 0 if the sub-leaf index, n, does not map to a valid bit in the XCR0 register*.\nECX\nBit 0 is set if the sub-leaf index, n, maps to a valid bit in the IA32_XSS MSR and bit 0 is clear if n maps to a valid bit in XCR0. Bits 31-1 are reserved. This field reports 0 if the sub-leaf index, n, is invalid*.\nEDX\nThis field reports 0 if the sub-leaf index, n, is invalid*; otherwise it is reserved.\nPlatform QoS Monitoring Enumeration Sub-leaf (EAX = 0FH, ECX = 0)\nNOTES:\n0FH\nLeaf 0FH output depends on the initial value in ECX.\nSub-leaf index 0 reports valid resource type starting at bit position 1 of EDX\nEAX\nReserved.\nEBX\nBits 31-0: Maximum range (zero-based) of RMID within this physical processor of all types.\nECX\nReserved.\nTable 3-17. Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n\n\nEDX\nBit 00: Reserved. Bit 01: Supports L3 Cache QoS Monitoring if 1. Bits 31:02: Reserved\n\n\nL3 Cache QoS Monitoring Capability Enumeration Sub-leaf (EAX = 0FH, ECX = 1)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNOTES:\n0FH\nLeaf 0FH output depends on the initial value in ECX.\nEAX\nReserved.\nEBX\nBits 31-0: Conversion factor from reported IA32_QM_CTR value to occupancy metric (bytes).\nECX\nMaximum range (zero-based) of RMID of this resource type.\nEDX\nBit 00: Supports L3 occupancy monitoring if 1. Bits 31:01: Reserved\nPlatform QoS Enforcement Enumeration Sub-leaf (EAX = 10H, ECX = 0)\nNOTES:\n10H\nLeaf 10H output depends on the initial value in ECX.\nSub-leaf index 0 reports valid resource identification (ResID) starting at bit position 1 of EDX\nEAX\nReserved.\nEBX\nBit 00: Reserved. Bit 01: Supports L3 Cache QoS Enforcement if 1. Bits 31:02: Reserved\nECX\nReserved.\nEDX\nReserved.\nL3 Cache QoS Enforcement Enumeration Sub-leaf (EAX = 10H, ECX = ResID =1)\nNOTES:\n10H\nLeaf 10H output depends on the initial value in ECX.\nEAX\nBits 4:0: Length of the capacity bit mask for the corresponding ResID. Bits 31:05: Reserved\nEBX\nBits 31-0: Bit-granular map of isolation/contention of allocation units.\nECX\nBit 00: Reserved. Bit 01: Updates of COS should be infrequent if 1. Bits 31:02: Reserved\nEDX\nBits 15:0: Highest COS number supported for this ResID. Bits 31:16: Reserved\nIntel Processor Trace Enumeration Main Leaf (EAX = 14H, ECX = 0)\nNOTES:\n14H\nLeaf 14H main leaf (ECX = 0).\nEAX\nBits 31-0: Reports the maximum number sub-leaves that are supported in leaf 14H.\nEBX\nBit 00: If 1, Indicates that IA32_RTIT_CTL.CR3Filter can be set to 1, and that IA32_RTIT_CR3_MATCH MSR can be accessed. Bits 31- 01: Reserved\nTable 3-17. Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n\n\nECX\nBit 00: If 1, Tracing can be enabled with IA32_RTIT_CTL.ToPA = 1, hence utilizing the ToPA output scheme; IA32_RTIT_OUTPUT_BASE and IA32_RTIT_OUTPUT_MASK_PTRS MSRs can be accessed. Bit 01: If 1, ToPA tables can hold any number of output entries, up to the maximum allowed by the Mas-kOrTableOffset field of IA32_RTIT_OUTPUT_MASK_PTRS. Bit 30:02: Reserved Bit 31: If 1, Generated packets which contain IP payloads have LIP values, which include the CS base com-ponent.\nEDX\nBits 31- 00: Reserved\n\n\nUnimplemented CPUID Leaf Functions\n\n\n40000000H\n-\n4FFFFFFFH\nInvalid. No existing or future CPU will return processor identification or feature information if the initial EAX value is in the range 40000000H to 4FFFFFFFH.\n\n\nExtended Function CPUID Information\n\n80000000H\n\nEAX\nMaximum Input Value for Extended Function CPUID Information (see Table 3-18).\nEBX\nReserved\nECX\nReserved\nEDX\nReserved\n\n80000001H\n\nEAX\nExtended Processor Signature and Feature Bits.\nEBX\nReserved\nECX\nBit 00: LAHF/SAHF available in 64-bit mode Bits 04-01 Reserved Bit 05: LZCNT Bits 07-06 Reserved Bit 08: PREFETCHW Bits 31-09 Reserved\nEDX\nBits 10-00: Reserved Bit 11: SYSCALL/SYSRET available in 64-bit mode Bits 19-12: Reserved = 0 Bit 20: Execute Disable Bit available Bits 25-21: Reserved = 0 Bit 26: 1-GByte pages are available if 1 Bit 27: RDTSCP and IA32_TSC_AUX are available if 1 Bits 28: Reserved = 0\nBit 29: Intel® 64 Architecture available if 1 Bits 31-30: Reserved = 0\n\n80000002H\n\nEAX\nProcessor Brand String\nEBX\nProcessor Brand String Continued\nECX\nProcessor Brand String Continued\nEDX\nProcessor Brand String Continued\n\n80000003H\n\nEAX\nProcessor Brand String Continued\nEBX\nProcessor Brand String Continued\nECX\nProcessor Brand String Continued\nEDX\nProcessor Brand String Continued\nTable 3-17. Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n80000004H\n\nEAX\nProcessor Brand String Continued\nEBX\nProcessor Brand String Continued\nECX\nProcessor Brand String Continued\nEDX\nProcessor Brand String Continued\n\n80000005H\n\nEAX\nReserved = 0\nEBX\nReserved = 0\nECX\nReserved = 0\nEDX\nReserved = 0\n\n80000006H\n\nEAX\nReserved = 0\nEBX\nReserved = 0\nECX\nBits 07-00: Cache Line size in bytes Bits 11-08: Reserved Bits 15-12: L2 Associativity field * Bits 31-16: Cache size in 1K units\nEDX\nReserved = 0\n\n\n\n\n\n\nNOTES:\n* L2 associativity field encodings:\n00H - Disabled 01H - Direct mapped 02H - 2-way 04H - 4-way 06H - 8-way 08H - 16-way 0FH - Fully associative\n80000007H\nEAX\nReserved = 0\nEBX\nReserved = 0\nECX\nReserved = 0\nEDX\nBits 07-00: Reserved = 0 Bit 08: Invariant TSC available if 1 Bits 31-09: Reserved = 0\n80000008H\nEAX\nLinear/Physical Address size Bits 07-00: #Physical Address Bits* Bits 15-8: #Linear Address Bits Bits 31-16: Reserved = 0\nEBX\nReserved = 0\nECX\nReserved = 0\nEDX\nReserved = 0\nNOTES:\n*\nIf CPUID.80000008H:EAX[7:0] is supported, the maximum physical address number supported should come from this field.\n" }, { "Name": "CRC32", "Alias": [], "Brief": "Accumulate CRC32 Value", "Description": "\nStarting with an initial value in the first operand (destination operand), accumulates a CRC32 (polynomial 11EDC6F41H) value for the second operand (source operand) and stores the result in the destination operand. The source operand can be a register or a memory location. The destination operand must be an r32 or r64 register. If the destination is an r64 register, then the 32-bit result is stored in the least significant double word and 00000000H is stored in the most significant double word of the r64 register.\nThe initial value supplied in the destination operand is a double word integer stored in the r32 register or the least significant double word of the r64 register. To incrementally accumulate a CRC32 value, software retains the result of the previous CRC32 operation in the destination operand, then executes the CRC32 instruction again with new input data in the source operand. Data contained in the source operand is processed in reflected bit order. This means that the most significant bit of the source operand is treated as the least significant bit of the quotient, and so on, for all the bits of the source operand. Likewise, the result of the CRC operation is stored in the destination operand in reflected bit order. This means that the most significant bit of the resulting CRC (bit 31) is stored in the least significant bit of the destination operand (bit 0), and so on, for all the bits of the CRC.\n" }, { "Name": "CVTDQ2PD", "Alias": [], "Brief": "Convert Packed Dword Integers to Packed Double", "Description": "\nConverts two packed signed doubleword integers in the source operand (second operand) to two packed double-precision floating-point values in the destination operand (first operand).\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 64- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding XMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 64- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 128- bit memory location. The destination operation is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\nX3\nX2\nX1\nX0\nSRC\nX3\nX2\nX1\nX0\nDEST\nFigure 3-10. CVTDQ2PD (VEX.256 encoded version)\n" }, { "Name": "CVTDQ2PS", "Alias": [], "Brief": "Convert Packed Dword Integers to Packed Single", "Description": "\nConverts four packed signed doubleword integers in the source operand (second operand) to four packed single-precision floating-point values in the destination operand (first operand).\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding XMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 256- bit memory location. The destination operation is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "CVTPD2DQ", "Alias": [], "Brief": "Convert Packed Double", "Description": "\nConverts two packed double-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand).\nThe source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The result is stored in the low quadword of the destination operand and the high quadword is cleared to all 0s.\nWhen a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. Bits[127:64] of the destination XMM register are zeroed. However, the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is a YMM register. The upper bits (VLMAX-1:64) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 256- bit memory location. The destination operation is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n\n\n\n\n\nX3\nX2\nX1\nX0\nSRC\nDEST\n0\nX3\nX2\nX1\nX0\nFigure 3-11. VCVTPD2DQ (VEX.256 encoded version)\n" }, { "Name": "CVTPD2PI", "Alias": [], "Brief": "Convert Packed Double", "Description": "\nConverts two packed double-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand).\nThe source operand can be an XMM register or a 128-bit memory location. The destination operand is an MMX tech-nology register.\nWhen a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nThis instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTPD2PI instruction is executed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n" }, { "Name": "CVTPD2PS", "Alias": [], "Brief": "Convert Packed Double", "Description": "\nConverts two packed double-precision floating-point values in the source operand (second operand) to two packed single-precision floating-point values in the destination operand (first operand).\nWhen a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. Bits[127:64] of the destination XMM register are zeroed. However, the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is a YMM register. The upper bits (VLMAX-1:64) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 256- bit memory location. The destination operation is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\nSRC\nX3\nX2\nX1\nX0\nDEST\n0\nX3\nX2\nX1\nX0\nFigure 3-12. VCVTPD2PS (VEX.256 encoded version)\n" }, { "Name": "CVTPI2PD", "Alias": [], "Brief": "Convert Packed Dword Integers to Packed Double", "Description": "\nConverts two packed signed doubleword integers in the source operand (second operand) to two packed double-precision floating-point values in the destination operand (first operand).\nThe source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an XMM register. In addition, depending on the operand configuration:\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n" }, { "Name": "CVTPI2PS", "Alias": [], "Brief": "Convert Packed Dword Integers to Packed Single", "Description": "\nConverts two packed signed doubleword integers in the source operand (second operand) to two packed single-precision floating-point values in the destination operand (first operand).\nThe source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an XMM register. The results are stored in the low quadword of the destination operand, and the high quadword remains unchanged. When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.\nThis instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTPI2PS instruction is executed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n" }, { "Name": "CVTPS2DQ", "Alias": [], "Brief": "Convert Packed Single", "Description": "\nConverts four or eight packed single-precision floating-point values in the source operand to four or eight signed doubleword integers in the destination operand.\nWhen a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is a YMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 256- bit memory location. The destination operation is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "CVTPS2PD", "Alias": [], "Brief": "Convert Packed Single", "Description": "\nConverts two or four packed single-precision floating-point values in the source operand (second operand) to two or four packed double-precision floating-point values in the destination operand (first operand).\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 64- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 64- bit memory location. The destination operation is a YMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\nX3\nX2\nX1\nX0\nSRC\nX3\nX2\nX1\nX0\nDEST\nFigure 3-13. CVTPS2PD (VEX.256 encoded version)\n" }, { "Name": "CVTPS2PI", "Alias": [], "Brief": "Convert Packed Single", "Description": "\nConverts two packed single-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand).\nThe source operand can be an XMM register or a 128-bit memory location. The destination operand is an MMX tech-nology register. When the source operand is an XMM register, the two single-precision floating-point values are contained in the low quadword of the register. When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indef-inite integer value (80000000H) is returned.\nCVTPS2PI causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTPS2PI instruction is executed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n" }, { "Name": "CVTSD2SI", "Alias": [], "Brief": "Convert Scalar Double", "Description": "\nConverts a double-precision floating-point value in the source operand (second operand) to a signed doubleword integer in the destination operand (first operand). The source operand can be an XMM register or a 64-bit memory location. The destination operand is a general-purpose register. When the source operand is an XMM register, the double-precision floating-point value is contained in the low quadword of the register.\nWhen a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nIn 64-bit mode, the instruction can access additional registers (XMM8-XMM15, R8-R15) when used with a REX.R prefix. Use of the REX.W prefix promotes the instruction to 64-bit operation. See the summary chart at the begin-ning of this section for encoding data and limits.\nLegacy SSE instructions: Use of the REX.W prefix promotes the instruction to 64-bit operation. See the summary chart at the beginning of this section for encoding data and limits.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "CVTSD2SS", "Alias": [], "Brief": "Convert Scalar Double", "Description": "\nConverts a double-precision floating-point value in the source operand (second operand) to a single-precision floating-point value in the destination operand (first operand).\nThe source operand can be an XMM register or a 64-bit memory location. The destination operand is an XMM register. When the source operand is an XMM register, the double-precision floating-point value is contained in the low quadword of the register. The result is stored in the low doubleword of the destination operand, and the upper 3 doublewords are left unchanged. When the conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:64) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "CVTSI2SD", "Alias": [], "Brief": "Convert Dword Integer to Scalar Double", "Description": "\nConverts a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the second source operand to a double-precision floating-point value in the destination operand. The result is stored in the low quad-word of the destination operand, and the high quadword left unchanged. When conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.\nLegacy SSE instructions: Use of the REX.W prefix promotes the instruction to 64-bit operands. See the summary chart at the beginning of this section for encoding data and limits.\nThe second source operand can be a general-purpose register or a 32/64-bit memory location. The first source and destination operands are XMM registers.\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:64) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "CVTSI2SS", "Alias": [], "Brief": "Convert Dword Integer to Scalar Single", "Description": "\nConverts a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the source operand (second operand) to a single-precision floating-point value in the destination operand (first operand). The source operand can be a general-purpose register or a memory location. The destination operand is an XMM register. The result is stored in the low doubleword of the destination operand, and the upper three doublewords are left unchanged. When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.\nLegacy SSE instructions: In 64-bit mode, the instruction can access additional registers (XMM8-XMM15, R8-R15) when used with a REX.R prefix. Use of the REX.W prefix promotes the instruction to 64-bit operands. See the summary chart at the beginning of this section for encoding data and limits.\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:32) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "CVTSS2SD", "Alias": [], "Brief": "Convert Scalar Single", "Description": "\nConverts a single-precision floating-point value in the source operand (second operand) to a double-precision floating-point value in the destination operand (first operand). The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. When the source operand is an XMM register, the single-precision floating-point value is contained in the low doubleword of the register. The result is stored in the low quadword of the destination operand, and the high quadword is left unchanged.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:64) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "CVTSS2SI", "Alias": [], "Brief": "Convert Scalar Single", "Description": "\nConverts a single-precision floating-point value in the source operand (second operand) to a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the destination operand (first operand). The source operand can be an XMM register or a memory location. The destination operand is a general-purpose register. When the source operand is an XMM register, the single-precision floating-point value is contained in the low doubleword of the register.\nWhen a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nIn 64-bit mode, the instruction can access additional registers (XMM8-XMM15, R8-R15) when used with a REX.R prefix. Use of the REX.W prefix promotes the instruction to 64-bit operands. See the summary chart at the begin-ning of this section for encoding data and limits.\nLegacy SSE instructions: In 64-bit mode, Use of the REX.W prefix promotes the instruction to 64-bit operands. See the summary chart at the beginning of this section for encoding data and limits.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "CVTTPD2DQ", "Alias": [], "Brief": "Convert with Truncation Packed Double", "Description": "\nConverts two or four packed double-precision floating-point values in the source operand (second operand) to two or four packed signed doubleword integers in the destination operand (first operand).\nWhen a conversion is inexact, a truncated (round toward zero) value is returned.If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is a YMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 256- bit memory location. The destination operation is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n\n\n\nX2\nX1\nSRC\nX3\nX0\nDEST\n0\nX3\nX2\nX1\nX0\nFigure 3-14. VCVTTPD2DQ (VEX.256 encoded version)\n" }, { "Name": "CVTTPD2PI", "Alias": [], "Brief": "Convert with Truncation Packed Double", "Description": "\nConverts two packed double-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand). The source operand can be an XMM register or a 128-bit memory location. The destination operand is an MMX technology register.\nWhen a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nThis instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTTPD2PI instruction is executed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n" }, { "Name": "CVTTPS2DQ", "Alias": [], "Brief": "Convert with Truncation Packed Single", "Description": "\nConverts four or eight packed single-precision floating-point values in the source operand to four or eight signed doubleword integers in the destination operand.\nWhen a conversion is inexact, a truncated (round toward zero) value is returned.If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is a YMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 256- bit memory location. The destination operation is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "CVTTPS2PI", "Alias": [], "Brief": "Convert with Truncation Packed Single", "Description": "\nConverts two packed single-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand). The source operand can be an XMM register or a 64-bit memory location. The destination operand is an MMX technology register. When the source operand is an XMM register, the two single-precision floating-point values are contained in the low quadword of the register.\nWhen a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nThis instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTTPS2PI instruction is executed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n" }, { "Name": "CVTTSD2SI", "Alias": [], "Brief": "Convert with Truncation Scalar Double", "Description": "\nConverts a double-precision floating-point value in the source operand (second operand) to a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the destination operand (first operand). The source operand can be an XMM register or a 64-bit memory location. The destination operand is a general purpose register. When the source operand is an XMM register, the double-precision floating-point value is contained in the low quadword of the register.\nWhen a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating point invalid exception is raised. If this exception is masked, the indefinite integer value (80000000H) is returned.\nLegacy SSE instructions: In 64-bit mode, Use of the REX.W prefix promotes the instruction to 64-bit operation. See the summary chart at the beginning of this section for encoding data and limits.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "CVTTSS2SI", "Alias": [], "Brief": "Convert with Truncation Scalar Single", "Description": "\nConverts a single-precision floating-point value in the source operand (second operand) to a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the destination operand (first operand). The source operand can be an XMM register or a 32-bit memory location. The destination operand is a general-purpose register. When the source operand is an XMM register, the single-precision floating-point value is contained in the low doubleword of the register.\nWhen a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised. If this exception is masked, the indefinite integer value (80000000H) is returned.\nLegacy SSE instructions: In 64-bit mode, the instruction can access additional registers (XMM8-XMM15, R8-R15) when used with a REX.R prefix. Use of the REX.W prefix promotes the instruction to 64-bit operation. See the summary chart at the beginning of this section for encoding data and limits.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "DAA", "Alias": [], "Brief": "Decimal Adjust AL after Addition", "Description": "\nAdjusts the sum of two packed BCD values to create a packed BCD result. The AL register is the implied source and destination operand. The DAA instruction is only useful when it follows an ADD instruction that adds (binary addi-tion) two 2-digit, packed BCD values and stores a byte result in the AL register. The DAA instruction then adjusts the contents of the AL register to contain the correct 2-digit, packed BCD result. If a decimal carry is detected, the CF and AF flags are set accordingly.\nThis instruction executes as described above in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n" }, { "Name": "DAS", "Alias": [], "Brief": "Decimal Adjust AL after Subtraction", "Description": "\nAdjusts the result of the subtraction of two packed BCD values to create a packed BCD result. The AL register is the implied source and destination operand. The DAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one 2-digit, packed BCD value from another and stores a byte result in the AL register. The DAS instruction then adjusts the contents of the AL register to contain the correct 2-digit, packed BCD result. If a decimal borrow is detected, the CF and AF flags are set accordingly.\nThis instruction executes as described above in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n" }, { "Name": "DEC", "Alias": [], "Brief": "Decrement by 1", "Description": "\nSubtracts 1 from the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag. (To perform a decrement operation that updates the CF flag, use a SUB instruction with an immediate operand of 1.)\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, DEC r16 and DEC r32 are not encodable (because opcodes 48H through 4FH are REX prefixes). Otherwise, the instruction’s 64-bit mode default operation size is 32 bits. Use of the REX.R prefix permits access to additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits.\nSee the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "DIV", "Alias": [], "Brief": "Unsigned Divide", "Description": "\nDivides unsigned the value in the AX, DX:AX, EDX:EAX, or RDX:RAX registers (dividend) by the source operand (divisor) and stores the result in the AX (AH:AL), DX:AX, EDX:EAX, or RDX:RAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size (dividend/divisor). Division using 64-bit operand is available only in 64-bit mode.\nNon-integral results are truncated (chopped) towards 0. The remainder is always less than the divisor in magni-tude. Overflow is indicated with the #DE (divide error) exception rather than with the CF flag.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. In 64-bit mode when REX.W is applied, the instruction divides the unsigned value in RDX:RAX by the source operand and stores the quotient in RAX, the remainder in RDX.\nSee the summary chart at the beginning of this section for encoding data and limits. See Table 3-25.\nTable 3-25. DIV Action\n\n\n\nMaximum\nOperand Size\nDividend\nDivisor\nQuotient\nRemainder\nQuotient\nWord/byte\nAX\nr/m8\nAL\nAH\n255\nDoubleword/word\nDX:AX\nr/m16\nAX\nDX\n65,535\nQuadword/doubleword\nEDX:EAX\nr/m32\nEAX\nEDX\n232 − 1\nDoublequadword/\nRDX:RAX\nr/m64\nRAX\nRDX\n264 − 1\nquadword\n" }, { "Name": "DIVPD", "Alias": [], "Brief": "Divide Packed Double", "Description": "\nPerforms an SIMD divide of the two or four packed double-precision floating-point values in the first source operand by the two or four packed double-precision floating-point values in the second source operand. See Chapter 11 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an overview of a SIMD double-precision floating-point operation.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "DIVPS", "Alias": [], "Brief": "Divide Packed Single", "Description": "\nPerforms an SIMD divide of the four or eight packed single-precision floating-point values in the first source operand by the four or eight packed single-precision floating-point values in the second source operand. See Chapter 10 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an overview of a SIMD single-precision floating-point operation.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "DIVSD", "Alias": [], "Brief": "Divide Scalar Double", "Description": "\nDivides the low double-precision floating-point value in the first source operand by the low double-precision floating-point value in the second source operand, and stores the double-precision floating-point result in the destination operand. The second source operand can be an XMM register or a 64-bit memory location. The first source and destination hyperons are XMM registers. The high quadword of the destination operand is copied from the high quadword of the first source operand. See Chapter 11 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an overview of a scalar double-precision floating-point operation.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "DIVSS", "Alias": [], "Brief": "Divide Scalar Single", "Description": "\nDivides the low single-precision floating-point value in the first source operand by the low single-precision floating-point value in the second source operand, and stores the single-precision floating-point result in the destination operand. The second source operand can be an XMM register or a 32-bit memory location. The first source and destination operands are XMM registers. The three high-order doublewords of the destination are copied from the same dwords of the first source operand. See Chapter 10 in the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 1, for an overview of a scalar single-precision floating-point operation.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "DPPD", "Alias": [], "Brief": "Dot Product of Packed Double Precision Floating", "Description": "\nConditionally multiplies the packed double-precision floating-point values in the destination operand (first operand) with the packed double-precision floating-point values in the source (second operand) depending on a mask extracted from bits [5:4] of the immediate operand (third operand). If a condition mask bit is zero, the corre-sponding multiplication is replaced by a value of 0.0 in the manner described by Section 12.8.4 of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\nThe two resulting double-precision values are summed into an intermediate result. The intermediate result is conditionally broadcasted to the destination using a broadcast mask specified by bits [1:0] of the immediate byte.\nIf a broadcast mask bit is \"1\", the intermediate result is copied to the corresponding qword element in the destina-tion operand. If a broadcast mask bit is zero, the corresponding element in the destination is set to zero.\nDPPD follows the NaN forwarding rules stated in the Software Developer’s Manual, vol. 1, table 4.7. These rules do not cover horizontal prioritization of NaNs. Horizontal propagation of NaNs to the destination and the positioning of those NaNs in the destination is implementation dependent. NaNs on the input sources or computationally gener-ated NaNs will have at least one NaN propagated to the destination.\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nIf VDPPD is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n" }, { "Name": "DPPS", "Alias": [], "Brief": "Dot Product of Packed Single Precision Floating", "Description": "\nConditionally multiplies the packed single precision floating-point values in the destination operand (first operand) with the packed single-precision floats in the source (second operand) depending on a mask extracted from the high 4 bits of the immediate byte (third operand). If a condition mask bit in Imm8[7:4] is zero, the corresponding multiplication is replaced by a value of 0.0 in the manner described by Section 12.8.4 of Intel® 64 and IA-32 Archi-tectures Software Developer’s Manual, Volume 1.\nThe four resulting single-precision values are summed into an intermediate result. The intermediate result is condi-tionally broadcasted to the destination using a broadcast mask specified by bits [3:0] of the immediate byte.\nIf a broadcast mask bit is \"1\", the intermediate result is copied to the corresponding dword element in the destina-tion operand. If a broadcast mask bit is zero, the corresponding element in the destination is set to zero.\nDPPS follows the NaN forwarding rules stated in the Software Developer’s Manual, vol. 1, table 4.7. These rules do not cover horizontal prioritization of NaNs. Horizontal propagation of NaNs to the destination and the positioning of those NaNs in the destination is implementation dependent. NaNs on the input sources or computationally gener-ated NaNs will have at least one NaN propagated to the destination.\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "EMMS", "Alias": [], "Brief": "Empty MMX Technology State", "Description": "\nSets the values of all the tags in the x87 FPU tag word to empty (all 1s). This operation marks the x87 FPU data registers (which are aliased to the MMX technology registers) as available for use by x87 FPU floating-point instruc-tions. (See Figure 8-7 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for the format of the x87 FPU tag word.) All other MMX instructions (other than the EMMS instruction) set all the tags in x87 FPU tag word to valid (all 0s).\nThe EMMS instruction must be used to clear the MMX technology state at the end of all MMX technology procedures or subroutines and before calling other procedures or subroutines that may execute x87 floating-point instructions. If a floating-point instruction loads one of the registers in the x87 FPU data register stack before the x87 FPU tag word has been reset by the EMMS instruction, an x87 floating-point register stack overflow can occur that will result in an x87 floating-point exception or incorrect result.\nEMMS operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "ENTER", "Alias": [], "Brief": "Make Stack Frame for Procedure Parameters", "Description": "\nCreates a stack frame for a procedure. The first operand (size operand) specifies the size of the stack frame (that is, the number of bytes of dynamic storage allocated on the stack for the procedure). The second operand (nesting level operand) gives the lexical nesting level (0 to 31) of the procedure. The nesting level determines the number of stack frame pointers that are copied into the “display area” of the new stack frame from the preceding frame. Both of these operands are immediate values.\nThe stack-size attribute determines whether the BP (16 bits), EBP (32 bits), or RBP (64 bits) register specifies the current frame pointer and whether SP (16 bits), ESP (32 bits), or RSP (64 bits) specifies the stack pointer. In 64-bit mode, stack-size attribute is always 64-bits.\nThe ENTER and companion LEAVE instructions are provided to support block structured languages. The ENTER instruction (when used) is typically the first instruction in a procedure and is used to set up a new stack frame for a procedure. The LEAVE instruction is then used at the end of the procedure (just before the RET instruction) to release the stack frame.\nIf the nesting level is 0, the processor pushes the frame pointer from the BP/EBP/RBP register onto the stack, copies the current stack pointer from the SP/ESP/RSP register into the BP/EBP/RBP register, and loads the SP/ESP/RSP register with the current stack-pointer value minus the value in the size operand. For nesting levels of 1 or greater, the processor pushes additional frame pointers on the stack before adjusting the stack pointer. These additional frame pointers provide the called procedure with access points to other nested frames on the stack. See “Procedure Calls for Block-Structured Languages” in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more information about the actions of the ENTER instruction.\nThe ENTER instruction causes a page fault whenever a write using the final value of the stack pointer (within the current stack segment) would do so.\nIn 64-bit mode, default operation size is 64 bits; 32-bit operation size cannot be encoded.\n" }, { "Name": "EXTRACTPS", "Alias": [], "Brief": "Extract Packed Single Precision Floating", "Description": "\nExtracts a single-precision floating-point value from the source operand (second operand) at the 32-bit offset spec-ified from imm8. Immediate bits higher than the most significant offset for the vector length are ignored.\nThe extracted single-precision floating-point value is stored in the low 32-bits of the destination operand\nIn 64-bit mode, destination register operand has default operand size of 64 bits. The upper 32-bits of the register are filled with zero. REX.W is ignored.\n128-bit Legacy SSE version: When a REX.W prefix is used in 64-bit mode with a general purpose register (GPR) as a destination operand, the packed single quantity is zero extended to 64 bits.\nVEX.128 encoded version: When VEX.128.66.0F3A.W1 17 form is used in 64-bit mode with a general purpose register (GPR) as a destination operand, the packed single quantity is zero extended to 64 bits. VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\nThe source register is an XMM register. Imm8[1:0] determine the starting DWORD offset from which to extract the 32-bit floating-point value.\nIf VEXTRACTPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n" }, { "Name": "F2XM1", "Alias": [], "Brief": "Compute 2x–1", "Description": "\nComputes the exponential value of 2 to the power of the source operand minus 1. The source operand is located in register ST(0) and the result is also stored in ST(0). The value of the source operand must lie in the range –1.0 to +1.0. If the source value is outside this range, the result is undefined.\nThe following table shows the results obtained when computing the exponential value of various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-26. Results Obtained from F2XM1\n\n\nST(0) SRC\nST(0) DEST\n\n− 1.0 to −0\n− 0.5 to − 0\n\n− 0\n− 0\n\n+ 0\n+ 0\n\n+ 0 to +1.0\n+ 0 to 1.0\nValues other than 2 can be exponentiated using the following formula:\nxy ← 2(y ∗ log2x)\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FABS", "Alias": [], "Brief": "Absolute Value", "Description": "\nClears the sign bit of ST(0) to create the absolute value of the operand. The following table shows the results obtained when creating the absolute value of various classes of numbers.\nTable 3-27. Results Obtained from FABS\n\n\nST(0) SRC\nST(0) DEST\n\n− ∞\n+ ∞\n\n− F\n+ F\n\n− 0\n+ 0\n\n+ 0\n+ 0\n\n+ F\n+ F\n\n+ ∞\n+ ∞\n\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FADD", "Alias": [ "FADDP", "FIADD" ], "Brief": "Add", "Description": "\nAdds the destination and source operands and stores the sum in the destination location. The destination operand is always an FPU register; the source operand can be a register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format or in word or doubleword integer format.\nThe no-operand version of the instruction adds the contents of the ST(0) register to the ST(1) register. The one-operand version adds the contents of a memory location (either a floating-point or an integer value) to the contents of the ST(0) register. The two-operand version, adds the contents of the ST(0) register to the ST(i) register or vice versa. The value in ST(0) can be doubled by coding:\nFADD ST(0), ST(0);\nThe FADDP instructions perform the additional operation of popping the FPU register stack after storing the result. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. (The no-operand version of the floating-point add instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FADD rather than FADDP.)\nThe FIADD instructions convert an integer source operand to double extended-precision floating-point format before performing the addition.\nThe table on the following page shows the results obtained when adding various classes of numbers, assuming that neither overflow nor underflow occurs.\nWhen the sum of two operands with opposite signs is 0, the result is +0, except for the round toward −∞ mode, in which case the result is −0. When the source operand is an integer 0, it is treated as a +0.\nWhen both operand are infinities of the same sign, the result is ∞ of the expected sign. If both operands are infini-ties of opposite signs, an invalid-operation exception is generated. See Table 3-28.\nTable 3-28. FADD/FADDP/FIADD Results\nDEST\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n− ∞\n− ∞\n− ∞\n− ∞\n− ∞\n− ∞\n*\nNaN\n− F or − I\n− ∞\n− F\n± F or ± 0\n+ ∞\nSRC\nSRC\nNaN\n−0\n− ∞\n− 0\n± 0\n+ ∞\nSRC\nDEST\nDEST\nNaN\n+ 0\n− ∞\n± 0\n+ 0\n+ ∞\nDEST\nDEST\nNaN\n+ F or + I\n− ∞\n± F or ± 0\n+ F\n+ ∞\nSRC\nSRC\nNaN\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n*\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nI\nMeans integer.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FBLD", "Alias": [], "Brief": "Load Binary Coded Decimal", "Description": "\nConverts the BCD source operand into double extended-precision floating-point format and pushes the value onto the FPU stack. The source operand is loaded without rounding errors. The sign of the source operand is preserved, including that of −0.\nThe packed BCD digits are assumed to be in the range 0 through 9; the instruction does not check for invalid digits (AH through FH). Attempting to load an invalid encoding produces an undefined result.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FBSTP", "Alias": [], "Brief": "Store BCD Integer and Pop", "Description": "\nConverts the value in the ST(0) register to an 18-digit packed BCD integer, stores the result in the destination operand, and pops the register stack. If the source value is a non-integral value, it is rounded to an integer value, according to rounding mode specified by the RC field of the FPU control word. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.\nThe destination operand specifies the address where the first byte destination value is to be stored. The BCD value (including its sign bit) requires 10 bytes of space in memory.\nThe following table shows the results obtained when storing various classes of numbers in packed BCD format.\nTable 3-29. FBSTP Results\n\n\nST(0)\nDEST\n\n− ∞ or Value Too Large for DEST Format\n*\n\nF ≤ − 1\n− D\n\n−1 < F < -0\n**\n\n− 0\n− 0\n\n+ 0\n+ 0\n\n+ 0 < F < +1\n**\n\nF ≥ +1\n+ D\n\n+ ∞ or Value Too Large for DEST Format\n*\n\nNaN\n*\nNOTES:\nF Means finite floating-point value.\nD Means packed-BCD number.\n*\nIndicates floating-point invalid-operation (#IA) exception.\n** ±0 or ±1, depending on the rounding mode.\nIf the converted value is too large for the destination format, or if the source operand is an ∞, SNaN, QNAN, or is in an unsupported format, an invalid-arithmetic-operand condition is signaled. If the invalid-operation exception is not masked, an invalid-arithmetic-operand exception (#IA) is generated and no value is stored in the destination operand. If the invalid-operation exception is masked, the packed BCD indefinite value is stored in memory.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FCHS", "Alias": [], "Brief": "Change Sign", "Description": "\nComplements the sign bit of ST(0). This operation changes a positive value into a negative value of equal magni-tude or vice versa. The following table shows the results obtained when changing the sign of various classes of numbers.\nTable 3-30. FCHS Results\n\n\nST(0) SRC\nST(0) DEST\n\n− ∞\n+ ∞\n\n− F\n+ F\n\n− 0\n+ 0\n\n+ 0\n− 0\n\n+ F\n− F\n\n+ ∞\n− ∞\n\nNaN\nNaN\nNOTES:\n*\nF means finite floating-point value.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FCLEX", "Alias": [ "FNCLEX" ], "Brief": "Clear Exceptions", "Description": "\nClears the floating-point exception flags (PE, UE, OE, ZE, DE, and IE), the exception summary status flag (ES), the stack fault flag (SF), and the busy flag (B) in the FPU status word. The FCLEX instruction checks for and handles any pending unmasked floating-point exceptions before clearing the exception flags; the FNCLEX instruction does not.\nThe assembler issues two instructions for the FCLEX instruction (an FWAIT instruction followed by an FNCLEX instruction), and the processor executes each of these instructions separately. If an exception is generated for either of these instructions, the save EIP points to the instruction that caused the exception.\n" }, { "Name": "FCMOVcc", "Alias": [ "FCMOVA", "FCMOVAE", "FCMOVB", "FCMOVBE", "FCMOVC", "FCMOVE", "FCMOVG", "FCMOVGE", "FCMOVL", "FCMOVLE", "FCMOVNA", "FCMOVNAE", "FCMOVNB", "FCMOVNBE", "FCMOVNC", "FCMOVNE", "FCMOVNG", "FCMOVNGE", "FCMOVNL", "FCMOVNLE", "FCMOVNO", "FCMOVNP", "FCMOVNS", "FCMOVNZ", "FCMOVO", "FCMOVP", "FCMOVPE", "FCMOVPO", "FCMOVS", "FCMOVZ" ], "Brief": "Floating", "Description": "\nTests the status flags in the EFLAGS register and moves the source operand (second operand) to the destination operand (first operand) if the given test condition is true. The condition for each mnemonic os given in the Descrip-tion column above and in Chapter 8 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1. The source operand is always in the ST(i) register and the destination operand is always ST(0).\nThe FCMOVcc instructions are useful for optimizing small IF constructions. They also help eliminate branching over-head for IF operations and the possibility of branch mispredictions by the processor.\nA processor may not support the FCMOVcc instructions. Software can check if the FCMOVcc instructions are supported by checking the processor’s feature information with the CPUID instruction (see “COMISS—Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS” in this chapter). If both the CMOV and FPU feature bits are set, the FCMOVcc instructions are supported.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FCOM", "Alias": [ "FCOMP", "FCOMPP" ], "Brief": "Compare Floating Point Values", "Description": "\nCompares the contents of register ST(0) and source value and sets condition code flags C0, C2, and C3 in the FPU status word according to the results (see the table below). The source operand can be a data register or a memory location. If no source operand is given, the value in ST(0) is compared with the value in ST(1). The sign of zero is ignored, so that –0.0 is equal to +0.0.\nTable 3-31. FCOM/FCOMP/FCOMPP Results\n\n\nCondition\nC3\nC2\nC0\n\nST(0) > SRC\n0\n0\n0\n\nST(0) < SRC\n0\n0\n1\n\nST(0) = SRC\n1\n0\n0\n\nUnordered*\n1\n1\n1\nNOTES:\n*\nFlags not set if unmasked invalid-arithmetic-operand (#IA) exception is generated.\nThis instruction checks the class of the numbers being compared (see “FXAM—Examine ModR/M” in this chapter). If either operand is a NaN or is in an unsupported format, an invalid-arithmetic-operand exception (#IA) is raised and, if the exception is masked, the condition flags are set to “unordered.” If the invalid-arithmetic-operand excep-tion is unmasked, the condition code flags are not set.\nThe FCOMP instruction pops the register stack following the comparison operation and the FCOMPP instruction pops the register stack twice following the comparison operation. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.\nThe FCOM instructions perform the same operation as the FUCOM instructions. The only difference is how they handle QNaN operands. The FCOM instructions raise an invalid-arithmetic-operand exception (#IA) when either or both of the operands is a NaN value or is in an unsupported format. The FUCOM instructions perform the same operation as the FCOM instructions, except that they do not generate an invalid-arithmetic-operand exception for QNaNs.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FCOMI", "Alias": [ "FCOMIP", " FUCOMI", "FUCOMIP" ], "Brief": "Compare Floating Point Values and Set EFLAGS", "Description": "\nPerforms an unordered comparison of the contents of registers ST(0) and ST(i) and sets the status flags ZF, PF, and CF in the EFLAGS register according to the results (see the table below). The sign of zero is ignored for compari-sons, so that –0.0 is equal to +0.0.\nTable 3-32. FCOMI/FCOMIP/ FUCOMI/FUCOMIP Results\n\n\nComparison Results*\nZF\nPF\nCF\n\nST0 > ST(i)\n0\n0\n0\n\nST0 < ST(i)\n0\n0\n1\n\nST0 = ST(i)\n1\n0\n0\n\nUnordered**\n1\n1\n1\nNOTES:\n*\nSee the IA-32 Architecture Compatibility section below.\n** Flags not set if unmasked invalid-arithmetic-operand (#IA) exception is generated.\nAn unordered comparison checks the class of the numbers being compared (see “FXAM—Examine ModR/M” in this chapter). The FUCOMI/FUCOMIP instructions perform the same operations as the FCOMI/FCOMIP instructions. The only difference is that the FUCOMI/FUCOMIP instructions raise the invalid-arithmetic-operand exception (#IA) only when either or both operands are an SNaN or are in an unsupported format; QNaNs cause the condition code flags to be set to unordered, but do not cause an exception to be generated. The FCOMI/FCOMIP instructions raise an invalid-operation exception when either or both of the operands are a NaN value of any kind or are in an unsup-ported format.\nIf the operation results in an invalid-arithmetic-operand exception being raised, the status flags in the EFLAGS register are set only if the exception is masked.\nThe FCOMI/FCOMIP and FUCOMI/FUCOMIP instructions set the OF, SF and AF flags to zero in the EFLAGS register (regardless of whether an invalid-operation exception is detected).\nThe FCOMIP and FUCOMIP instructions also pop the register stack following the comparison operation. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FCOS", "Alias": [], "Brief": "Cosine", "Description": "\nComputes the cosine of the source operand in register ST(0) and stores the result in ST(0). The source operand must be given in radians and must be within the range −263 to +263. The following table shows the results obtained when taking the cosine of various classes of numbers.\nTable 3-33. FCOS Results\n\n\nST(0) SRC\nST(0) DEST\n\n− ∞\n*\n\n− F\n−1 to +1\n\n− 0\n+ 1\n\n+ 0\n+ 1\n\n+ F\n− 1 to + 1\n\n+ ∞\n*\n\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nIf the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range − 263 to +263 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π or by using the FPREM instruction with a divisor of 2π. See the section titled “Pi” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FDECSTP", "Alias": [], "Brief": "Decrement Stack", "Description": "\nSubtracts one from the TOP field of the FPU status word (decrements the top-of-stack pointer). If the TOP field contains a 0, it is set to 7. The effect of this instruction is to rotate the stack by one position. The contents of the FPU data registers and tag register are not affected.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FDIV", "Alias": [ "FDIVP", "FIDIV" ], "Brief": "Divide", "Description": "\nDivides the destination operand by the source operand and stores the result in the destination location. The desti-nation operand (dividend) is always in an FPU register; the source operand (divisor) can be a register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format, word or doubleword integer format.\nThe no-operand version of the instruction divides the contents of the ST(1) register by the contents of the ST(0) register. The one-operand version divides the contents of the ST(0) register by the contents of a memory location (either a floating-point or an integer value). The two-operand version, divides the contents of the ST(0) register by the contents of the ST(i) register or vice versa.\nThe FDIVP instructions perform the additional operation of popping the FPU register stack after storing the result. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point divide instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FDIV rather than FDIVP.\nThe FIDIV instructions convert an integer source operand to double extended-precision floating-point format before performing the division. When the source operand is an integer 0, it is treated as a +0.\nIf an unmasked divide-by-zero exception (#Z) is generated, no result is stored; if the exception is masked, an ∞ of the appropriate sign is stored in the destination operand.\nThe following table shows the results obtained when dividing various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-34. FDIV/FDIVP/FIDIV Results\nDEST\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n− ∞\n+ 0\n+ 0\n− 0\n− 0\n*\n*\nNaN\n− F\n+ ∞\n+ F\n+ 0\n− 0\n− F\n− ∞\nNaN\n− I\n+ ∞\n+ F\n+ 0\n− 0\n− F\n− ∞\nNaN\n− 0\n+ ∞\n− ∞\nSRC\n**\n*\n*\n**\nNaN\n+ 0\n− ∞\n+ ∞\n**\n*\n*\n**\nNaN\n+ I\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n+ F\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n+ ∞\n− 0\n− 0\n+ 0\n+ 0\n*\n*\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nI\nMeans integer.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\n** Indicates floating-point zero-divide (#Z) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FDIVR", "Alias": [ "FDIVRP", "FIDIVR" ], "Brief": "Reverse Divide", "Description": "\nDivides the source operand by the destination operand and stores the result in the destination location. The desti-nation operand (divisor) is always in an FPU register; the source operand (dividend) can be a register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format, word or doubleword integer format.\nThese instructions perform the reverse operations of the FDIV, FDIVP, and FIDIV instructions. They are provided to support more efficient coding.\nThe no-operand version of the instruction divides the contents of the ST(0) register by the contents of the ST(1) register. The one-operand version divides the contents of a memory location (either a floating-point or an integer value) by the contents of the ST(0) register. The two-operand version, divides the contents of the ST(i) register by the contents of the ST(0) register or vice versa.\nThe FDIVRP instructions perform the additional operation of popping the FPU register stack after storing the result. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point divide instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FDIVR rather than FDIVRP.\nThe FIDIVR instructions convert an integer source operand to double extended-precision floating-point format before performing the division.\nIf an unmasked divide-by-zero exception (#Z) is generated, no result is stored; if the exception is masked, an ∞ of the appropriate sign is stored in the destination operand.\nThe following table shows the results obtained when dividing various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-35. FDIVR/FDIVRP/FIDIVR Results\nDEST\nNaN\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\n*\n*\nNaN\n− ∞\n+ ∞\n+ ∞\n− ∞\n− ∞\n**\n**\nNaN\n− F\n+ 0\n+ F\n− F\n− 0\nSRC\n**\n**\nNaN\n− I\n+ 0\n+ F\n− F\n− 0\n*\n*\nNaN\n− 0\n+ 0\n+ 0\n− 0\n− 0\n*\n*\nNaN\n+ 0\n− 0\n− 0\n+ 0\n+ 0\n**\n**\nNaN\n+ I\n− 0\n− F\n+ F\n+ 0\n**\n**\nNaN\n+ F\n− 0\n− F\n+ F\n+ 0\n*\n*\nNaN\n+ ∞\n− ∞\n− ∞\n+ ∞\n+ ∞\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nI\nMeans integer.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\n** Indicates floating-point zero-divide (#Z) exception.\nWhen the source operand is an integer 0, it is treated as a +0. This instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FFREE", "Alias": [], "Brief": "Free Floating", "Description": "\nSets the tag in the FPU tag register associated with register ST(i) to empty (11B). The contents of ST(i) and the FPU stack-top pointer (TOP) are not affected.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FICOM", "Alias": [ "FICOMP" ], "Brief": "Compare Integer", "Description": "\nCompares the value in ST(0) with an integer source operand and sets the condition code flags C0, C2, and C3 in the FPU status word according to the results (see table below). The integer value is converted to double extended-precision floating-point format before the comparison is made.\nTable 3-36. FICOM/FICOMP Results\n\n\nCondition\nC3\nC2\nC0\n\nST(0) > SRC\n0\n0\n0\n\nST(0) < SRC\n0\n0\n1\n\nST(0) = SRC\n1\n0\n0\n\nUnordered\n1\n1\n1\nThese instructions perform an “unordered comparison.” An unordered comparison also checks the class of the numbers being compared (see “FXAM—Examine ModR/M” in this chapter). If either operand is a NaN or is in an undefined format, the condition flags are set to “unordered.”\nThe sign of zero is ignored, so that –0.0 ← +0.0.\nThe FICOMP instructions pop the register stack following the comparison. To pop the register stack, the processor marks the ST(0) register empty and increments the stack pointer (TOP) by 1.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FILD", "Alias": [], "Brief": "Load Integer", "Description": "\nConverts the signed-integer source operand into double extended-precision floating-point format and pushes the value onto the FPU register stack. The source operand can be a word, doubleword, or quadword integer. It is loaded without rounding errors. The sign of the source operand is preserved.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FMUL", "Alias": [ "FMULP", "FIMUL" ], "Brief": "Multiply", "Description": "\nMultiplies the destination and source operands and stores the product in the destination location. The destination operand is always an FPU data register; the source operand can be an FPU data register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format or in word or doubleword integer format.\nThe no-operand version of the instruction multiplies the contents of the ST(1) register by the contents of the ST(0) register and stores the product in the ST(1) register. The one-operand version multiplies the contents of the ST(0) register by the contents of a memory location (either a floating point or an integer value) and stores the product in the ST(0) register. The two-operand version, multiplies the contents of the ST(0) register by the contents of the ST(i) register, or vice versa, with the result being stored in the register specified with the first operand (the desti-nation operand).\nThe FMULP instructions perform the additional operation of popping the FPU register stack after storing the product. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point multiply instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FMUL rather than FMULP.\nThe FIMUL instructions convert an integer source operand to double extended-precision floating-point format before performing the multiplication.\nThe sign of the result is always the exclusive-OR of the source signs, even if one or more of the values being multi-plied is 0 or ∞. When the source operand is an integer 0, it is treated as a +0.\nThe following table shows the results obtained when multiplying various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-39. FMUL/FMULP/FIMUL Results\n\n\n\n\n\n\nDEST\n\n\n\n\n\n\n\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n\n\n− ∞\n+ ∞\n+ ∞\n*\n*\n− ∞\n− ∞\nNaN\n\n\n− F\n+ ∞\n+ F\n+ 0\n− 0\n− F\n− ∞\nNaN\n\n\n− I\n+ ∞\n+ F\n+ 0\n− 0\n− F\n− ∞\nNaN\n\nSRC\n− 0\n*\n+ 0\n+ 0\n− 0\n− 0\n*\nNaN\n\n\n+ 0\n*\n− 0\n− 0\n+ 0\n+ 0\n*\nNaN\n\n\n+ I\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n\n\n+ F\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n\n\n+ ∞\n− ∞\n− ∞\n*\n*\n+ ∞\n+ ∞\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nI\nMeans Integer.\n*\nIndicates invalid-arithmetic-operand (#IA) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FINCSTP", "Alias": [], "Brief": "Increment Stack", "Description": "\nAdds one to the TOP field of the FPU status word (increments the top-of-stack pointer). If the TOP field contains a 7, it is set to 0. The effect of this instruction is to rotate the stack by one position. The contents of the FPU data registers and tag register are not affected. This operation is not equivalent to popping the stack, because the tag for the previous top-of-stack register is not marked empty.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FINIT", "Alias": [ "FNINIT" ], "Brief": "Initialize Floating", "Description": "\nSets the FPU control, status, tag, instruction pointer, and data pointer registers to their default states. The FPU control word is set to 037FH (round to nearest, all exceptions masked, 64-bit precision). The status word is cleared (no exception flags set, TOP is set to 0). The data registers in the register stack are left unchanged, but they are all tagged as empty (11B). Both the instruction and data pointers are cleared.\nThe FINIT instruction checks for and handles any pending unmasked floating-point exceptions before performing the initialization; the FNINIT instruction does not.\nThe assembler issues two instructions for the FINIT instruction (an FWAIT instruction followed by an FNINIT instruction), and the processor executes each of these instructions in separately. If an exception is generated for either of these instructions, the save EIP points to the instruction that caused the exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FIST", "Alias": [ "FISTP" ], "Brief": "Store Integer", "Description": "\nThe FIST instruction converts the value in the ST(0) register to a signed integer and stores the result in the desti-nation operand. Values can be stored in word or doubleword integer format. The destination operand specifies the address where the first byte of the destination value is to be stored.\nThe FISTP instruction performs the same operation as the FIST instruction and then pops the register stack. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The FISTP instruction also stores values in quadword integer format.\nThe following table shows the results obtained when storing various classes of numbers in integer format.\nTable 3-37. FIST/FISTP Results\n\n\nST(0)\nDEST\n\n− ∞ or Value Too Large for DEST Format\n*\n\nF ≤ −1\n− I\n\n−1 < F < −0\n**\n\n− 0\n0\n\n+ 0\n0\n\n+ 0 < F < + 1\n**\n\nF ≥ + 1\n+ I\n\n+ ∞ or Value Too Large for DEST Format\n*\n\nNaN\n*\n\n\n\nNOTES:\nF Means finite floating-point value.\nI\nMeans integer.\n*\nIndicates floating-point invalid-operation (#IA) exception.\n** 0 or ±1, depending on the rounding mode.\nIf the source value is a non-integral value, it is rounded to an integer value, according to the rounding mode spec-ified by the RC field of the FPU control word.\nIf the converted value is too large for the destination format, or if the source operand is an ∞, SNaN, QNAN, or is in an unsupported format, an invalid-arithmetic-operand condition is signaled. If the invalid-operation exception is not masked, an invalid-arithmetic-operand exception (#IA) is generated and no value is stored in the destination operand. If the invalid-operation exception is masked, the integer indefinite value is stored in memory.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FISTTP", "Alias": [], "Brief": "Store Integer with Truncation", "Description": "\nFISTTP converts the value in ST into a signed integer using truncation (chop) as rounding mode, transfers the result to the destination, and pop ST. FISTTP accepts word, short integer, and long integer destinations.\nThe following table shows the results obtained when storing various classes of numbers in integer format.\nTable 3-38. FISTTP Results\n\n\nST(0)\nDEST\n\n− ∞ or Value Too Large for DEST Format\n*\n\nF ≤ − 1\n− I\n\n− 1 < F < + 1\n0\n\nF Š + 1\n+ I\n\n+ ∞ or Value Too Large for DEST Format\n*\n\nNaN\n*\nNOTES:\nF Means finite floating-point value.\nΙ\nMeans integer.\n∗ Indicates floating-point invalid-operation (#IA) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FSUB", "Alias": [ "FSUBP", "FISUB" ], "Brief": "Subtract", "Description": "\nSubtracts the source operand from the destination operand and stores the difference in the destination location. The destination operand is always an FPU data register; the source operand can be a register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format or in word or doubleword integer format.\nThe no-operand version of the instruction subtracts the contents of the ST(0) register from the ST(1) register and stores the result in ST(1). The one-operand version subtracts the contents of a memory location (either a floating-point or an integer value) from the contents of the ST(0) register and stores the result in ST(0). The two-operand version, subtracts the contents of the ST(0) register from the ST(i) register or vice versa.\nThe FSUBP instructions perform the additional operation of popping the FPU register stack following the subtrac-tion. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point subtract instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FSUB rather than FSUBP.\nThe FISUB instructions convert an integer source operand to double extended-precision floating-point format before performing the subtraction.\nTable 3-48 shows the results obtained when subtracting various classes of numbers from one another, assuming that neither overflow nor underflow occurs. Here, the SRC value is subtracted from the DEST value (DEST − SRC = result).\nWhen the difference between two operands of like sign is 0, the result is +0, except for the round toward −∞ mode, in which case the result is −0. This instruction also guarantees that +0 − (−0) = +0, and that −0 − (+0) = −0. When the source operand is an integer 0, it is treated as a +0.\nWhen one operand is ∞, the result is ∞ of the expected sign. If both operands are ∞ of the same sign, an invalid-operation exception is generated.\nTable 3-48. FSUB/FSUBP/FISUB Results\n\n\n\n\n\n\nSRC\n\n\n\n\n\n\n\n− ∞\n− F or − I\n− 0\n+ 0\n+ F or + I\n+ ∞\nNaN\n\n\n− ∞\n*\n− ∞\n− ∞\n− ∞\n− ∞\n− ∞\nNaN\n\n\n− F\n+ ∞\n±F or ±0\nDEST\nDEST\n− F\n− ∞\nNaN\n\nDEST\n− 0\n+ ∞\n−SRC\n±0\n− 0\n− SRC\n− ∞\nNaN\n\n\n+ 0\n+ ∞\n−SRC\n+ 0\n±0\n− SRC\n− ∞\nNaN\n\n\n+ F\n+ ∞\n+ F\nDEST\nDEST\n±F or ±0\n− ∞\nNaN\n\n\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n*\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nI\nMeans integer.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FSUBR", "Alias": [ "FSUBRP", "FISUBR" ], "Brief": "Reverse Subtract", "Description": "\nSubtracts the destination operand from the source operand and stores the difference in the destination location. The destination operand is always an FPU register; the source operand can be a register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format or in word or doubleword integer format.\nThese instructions perform the reverse operations of the FSUB, FSUBP, and FISUB instructions. They are provided to support more efficient coding.\nThe no-operand version of the instruction subtracts the contents of the ST(1) register from the ST(0) register and stores the result in ST(1). The one-operand version subtracts the contents of the ST(0) register from the contents of a memory location (either a floating-point or an integer value) and stores the result in ST(0). The two-operand version, subtracts the contents of the ST(i) register from the ST(0) register or vice versa.\nThe FSUBRP instructions perform the additional operation of popping the FPU register stack following the subtrac-tion. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point reverse subtract instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FSUBR rather than FSUBRP.\nThe FISUBR instructions convert an integer source operand to double extended-precision floating-point format before performing the subtraction.\nThe following table shows the results obtained when subtracting various classes of numbers from one another, assuming that neither overflow nor underflow occurs. Here, the DEST value is subtracted from the SRC value (SRC − DEST = result).\nWhen the difference between two operands of like sign is 0, the result is +0, except for the round toward −∞ mode, in which case the result is −0. This instruction also guarantees that +0 − (−0) = +0, and that −0 − (+0) = −0. When the source operand is an integer 0, it is treated as a +0.\nWhen one operand is ∞, the result is ∞ of the expected sign. If both operands are ∞ of the same sign, an invalid-operation exception is generated.\nTable 3-49. FSUBR/FSUBRP/FISUBR Results\n\n\n\n\n\n\nSRC\n\n\n\n\n\n\n\n− ∞\n−F or −I\n−0\n+0\n+F or +I\n+ ∞\nNaN\n\n\n− ∞\n*\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n+ ∞\nNaN\n\n\n− F\n− ∞\n±F or ±0\n−DEST\n−DEST\n+ F\n+ ∞\nNaN\n\nDEST\n− 0\n− ∞\nSRC\n±0\n+ 0\nSRC\n+ ∞\nNaN\n\n\n+ 0\n− ∞\nSRC\n− 0\n±0\nSRC\n+ ∞\nNaN\n\n\n+ F\n− ∞\n− F\n−DEST\n−DEST\n±F or ±0\n+ ∞\nNaN\n\n\n+ ∞\n− ∞\n− ∞\n− ∞\n− ∞\n− ∞\n*\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nI\nMeans integer.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FLD", "Alias": [], "Brief": "Load Floating Point Value", "Description": "\nPushes the source operand onto the FPU register stack. The source operand can be in single-precision, double-precision, or double extended-precision floating-point format. If the source operand is in single-precision or double-precision floating-point format, it is automatically converted to the double extended-precision floating-point format before being pushed on the stack.\nThe FLD instruction can also push the value in a selected FPU register [ST(i)] onto the stack. Here, pushing register ST(0) duplicates the stack top.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FLD1", "Alias": [ "FLDL2T", "FLDL2E", "FLDPI", "FLDLG2", "FLDLN2", "FLDZ" ], "Brief": "Load Constant", "Description": "\nPush one of seven commonly used constants (in double extended-precision floating-point format) onto the FPU register stack. The constants that can be loaded with these instructions include +1.0, +0.0, log210, log2e, π, log102, and loge2. For each constant, an internal 66-bit constant is rounded (as specified by the RC field in the FPU control word) to double extended-precision floating-point format. The inexact-result exception (#P) is not generated as a result of the rounding, nor is the C1 flag set in the x87 FPU status word if the value is rounded up.\nSee the section titled “Pi” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for a description of the π constant.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FLDCW", "Alias": [], "Brief": "Load x87 FPU Control Word", "Description": "\nLoads the 16-bit source operand into the FPU control word. The source operand is a memory location. This instruc-tion is typically used to establish or change the FPU’s mode of operation.\nIf one or more exception flags are set in the FPU status word prior to loading a new FPU control word and the new control word unmasks one or more of those exceptions, a floating-point exception will be generated upon execution of the next floating-point instruction (except for the no-wait floating-point instructions, see the section titled “Soft-ware Exception Handling” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1). To avoid raising exceptions when changing FPU operating modes, clear any pending exceptions (using the FCLEX or FNCLEX instruction) before loading the new control word.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FLDENV", "Alias": [], "Brief": "Load x87 FPU Environment", "Description": "\nLoads the complete x87 FPU operating environment from memory into the FPU registers. The source operand spec-ifies the first byte of the operating-environment data in memory. This data is typically written to the specified memory location by a FSTENV or FNSTENV instruction.\nThe FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. Figures 8-9 through 8-12 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, show the layout in memory of the loaded environment, depending on the operating mode of the processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used.\nThe FLDENV instruction should be executed in the same operating mode as the corresponding FSTENV/FNSTENV instruction.\nIf one or more unmasked exception flags are set in the new FPU status word, a floating-point exception will be generated upon execution of the next floating-point instruction (except for the no-wait floating-point instructions, see the section titled “Software Exception Handling” in Chapter 8 of the Intel® 64 and IA-32 Architectures Soft-ware Developer’s Manual, Volume 1). To avoid generating exceptions when loading a new environment, clear all the exception flags in the FPU status word that is being loaded.\nIf a page or limit fault occurs during the execution of this instruction, the state of the x87 FPU registers as seen by the fault handler may be different than the state being loaded from memory. In such situations, the fault handler should ignore the status of the x87 FPU registers, handle the fault, and return. The FLDENV instruction will then complete the loading of the x87 FPU registers with no resulting context inconsistency.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FNOP", "Alias": [], "Brief": "No Operation", "Description": "\nPerforms no FPU operation. This instruction takes up space in the instruction stream but does not affect the FPU or machine context, except the EIP register.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FSAVE", "Alias": [ "FNSAVE" ], "Brief": "Store x87 FPU State", "Description": "\nStores the current FPU state (operating environment and register stack) at the specified destination in memory, and then re-initializes the FPU. The FSAVE instruction checks for and handles pending unmasked floating-point exceptions before storing the FPU state; the FNSAVE instruction does not.\nThe FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. Figures 8-9 through 8-12 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, show the layout in memory of the stored environment, depending on the operating mode of the processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used. The contents of the FPU register stack are stored in the 80 bytes immediately follow the operating environment image.\nThe saved image reflects the state of the FPU after all floating-point instructions preceding the FSAVE/FNSAVE instruction in the instruction stream have been executed.\nAfter the FPU state has been saved, the FPU is reset to the same default values it is set to with the FINIT/FNINIT instructions (see “FINIT/FNINIT—Initialize Floating-Point Unit” in this chapter).\nThe FSAVE/FNSAVE instructions are typically used when the operating system needs to perform a context switch, an exception handler needs to use the FPU, or an application program needs to pass a “clean” FPU to a procedure.\nThe assembler issues two instructions for the FSAVE instruction (an FWAIT instruction followed by an FNSAVE instruction), and the processor executes each of these instructions separately. If an exception is generated for either of these instructions, the save EIP points to the instruction that caused the exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FSTCW", "Alias": [ "FNSTCW" ], "Brief": "Store x87 FPU Control Word", "Description": "\nStores the current value of the FPU control word at the specified destination in memory. The FSTCW instruction checks for and handles pending unmasked floating-point exceptions before storing the control word; the FNSTCW instruction does not.\nThe assembler issues two instructions for the FSTCW instruction (an FWAIT instruction followed by an FNSTCW instruction), and the processor executes each of these instructions in separately. If an exception is generated for either of these instructions, the save EIP points to the instruction that caused the exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FSTENV", "Alias": [ "FNSTENV" ], "Brief": "Store x87 FPU Environment", "Description": "\nSaves the current FPU operating environment at the memory location specified with the destination operand, and then masks all floating-point exceptions. The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. Figures 8-9 through 8-12 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, show the layout in memory of the stored environ-ment, depending on the operating mode of the processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used.\nThe FSTENV instruction checks for and handles any pending unmasked floating-point exceptions before storing the FPU environment; the FNSTENV instruction does not. The saved image reflects the state of the FPU after all floating-point instructions preceding the FSTENV/FNSTENV instruction in the instruction stream have been executed.\nThese instructions are often used by exception handlers because they provide access to the FPU instruction and data pointers. The environment is typically saved in the stack. Masking all exceptions after saving the environment prevents floating-point exceptions from interrupting the exception handler.\nThe assembler issues two instructions for the FSTENV instruction (an FWAIT instruction followed by an FNSTENV instruction), and the processor executes each of these instructions separately. If an exception is generated for either of these instructions, the save EIP points to the instruction that caused the exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FSTSW", "Alias": [ "FNSTSW" ], "Brief": "Store x87 FPU Status Word", "Description": "\nStores the current value of the x87 FPU status word in the destination location. The destination operand can be either a two-byte memory location or the AX register. The FSTSW instruction checks for and handles pending unmasked floating-point exceptions before storing the status word; the FNSTSW instruction does not.\nThe FNSTSW AX form of the instruction is used primarily in conditional branching (for instance, after an FPU comparison instruction or an FPREM, FPREM1, or FXAM instruction), where the direction of the branch depends on the state of the FPU condition code flags. (See the section titled “Branching and Conditional Moves on FPU Condi-tion Codes” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.) This instruction can also be used to invoke exception handlers (by examining the exception flags) in environments that do not use interrupts. When the FNSTSW AX instruction is executed, the AX register is updated before the processor executes any further instructions. The status stored in the AX register is thus guaranteed to be from the completion of the prior FPU instruction.\nThe assembler issues two instructions for the FSTSW instruction (an FWAIT instruction followed by an FNSTSW instruction), and the processor executes each of these instructions separately. If an exception is generated for either of these instructions, the save EIP points to the instruction that caused the exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FPATAN", "Alias": [], "Brief": "Partial Arctangent", "Description": "\nComputes the arctangent of the source operand in register ST(1) divided by the source operand in register ST(0), stores the result in ST(1), and pops the FPU register stack. The result in register ST(0) has the same sign as the source operand ST(1) and a magnitude less than +π.\nThe FPATAN instruction returns the angle between the X axis and the line from the origin to the point (X,Y), where Y (the ordinate) is ST(1) and X (the abscissa) is ST(0). The angle depends on the sign of X and Y independently, not just on the sign of the ratio Y/X. This is because a point (−X,Y) is in the second quadrant, resulting in an angle between π/2 and π, while a point (X,−Y) is in the fourth quadrant, resulting in an angle between 0 and −π/2. A point (−X,−Y) is in the third quadrant, giving an angle between −π/2 and −π.\nThe following table shows the results obtained when computing the arctangent of various classes of numbers, assuming that underflow does not occur.\nTable 3-40. FPATAN Results\n\n\n\n\n\n\nST(0)\n\n\n\n\n\n\n\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n\n\n− ∞\n− 3π/4*\n− π/2\n− π/2\n− π/2\n− π/2\n− π/4*\nNaN\n\nST(1)\n− F\n-p\n−π to −π/2\n−π/2\n−π/2\n−π/2 to −0\n- 0\nNaN\n\n\n− 0\n-p\n-p\n-p*\n− 0*\n− 0\n− 0\nNaN\n\n\n+ 0\n+p\n+ p\n+ π*\n+ 0*\n+ 0\n+ 0\nNaN\n\n\n+ F\n+p\n+π to +π/2\n+ π/2\n+π/2\n+π/2 to +0\n+ 0\nNaN\n\n\n+ ∞\n+3π/4*\n+π/2\n+π/2\n+π/2\n+ π/2\n+ π/4*\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nTable 8-10 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, specifies that the ratios 0/0 and ∞/∞ generate the floating-point invalid arithmetic-operation exception and, if this exception is masked, the floating-point QNaN indefi-nite value is returned. With the FPATAN instruction, the 0/0 or ∞/∞ value is actually not calculated using division. Instead, the arc-tangent of the two variables is derived from a standard mathematical formulation that is generalized to allow complex numbers as arguments. In this complex variable formulation, arctangent(0,0) etc. has well defined values. These values are needed to develop a library to compute transcendental functions with complex arguments, based on the FPU functions that only allow floating-point values as arguments.\nThere is no restriction on the range of source operands that FPATAN can accept.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FPREM", "Alias": [], "Brief": "Partial Remainder", "Description": "\nComputes the remainder obtained from dividing the value in the ST(0) register (the dividend) by the value in the ST(1) register (the divisor or modulus), and stores the result in ST(0). The remainder represents the following value:\nRemainder ← ST(0) − (Q ∗ ST(1))\nHere, Q is an integer value that is obtained by truncating the floating-point number quotient of [ST(0) / ST(1)] toward zero. The sign of the remainder is the same as the sign of the dividend. The magnitude of the remainder is less than that of the modulus, unless a partial remainder was computed (as described below).\nThis instruction produces an exact result; the inexact-result exception does not occur and the rounding control has no effect. The following table shows the results obtained when computing the remainder of various classes of numbers, assuming that underflow does not occur.\nTable 3-41. FPREM Results\n\n\n\n\n\n\nST(1)\n\n\n\n\n\n\n\n-∞\n-F\n-0\n+0\n+F\n+∞\nNaN\n\n\n-∞\n*\n*\n*\n*\n*\n*\nNaN\n\nST(0)\n-F\nST(0)\n-F or -0\n**\n**\n-F or -0\nST(0)\nNaN\n\n\n-0\n-0\n-0\n*\n*\n-0\n-0\nNaN\n\n\n+0\n+0\n+0\n*\n*\n+0\n+0\nNaN\n\n\n+F\nST(0)\n+F or +0\n**\n**\n+F or +0\nST(0)\nNaN\n\n\n+∞\n*\n*\n*\n*\n*\n*\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\n** Indicates floating-point zero-divide (#Z) exception.\nWhen the result is 0, its sign is the same as that of the dividend. When the modulus is ∞, the result is equal to the value in ST(0).\nThe FPREM instruction does not compute the remainder specified in IEEE Std 754. The IEEE specified remainder can be computed with the FPREM1 instruction. The FPREM instruction is provided for compatibility with the Intel 8087 and Intel287 math coprocessors.\nThe FPREM instruction gets its name “partial remainder” because of the way it computes the remainder. This instruction arrives at a remainder through iterative subtraction. It can, however, reduce the exponent of ST(0) by no more than 63 in one execution of the instruction. If the instruction succeeds in producing a remainder that is less than the modulus, the operation is complete and the C2 flag in the FPU status word is cleared. Otherwise, C2 is set, and the result in ST(0) is called the partial remainder. The exponent of the partial remainder will be less than the exponent of the original dividend by at least 32. Software can re-execute the instruction (using the partial remainder in ST(0) as the dividend) until C2 is cleared. (Note that while executing such a remainder-computation loop, a higher-priority interrupting routine that needs the FPU can force a context switch in-between the instruc-tions in the loop.)\nAn important use of the FPREM instruction is to reduce the arguments of periodic functions. When reduction is complete, the instruction stores the three least-significant bits of the quotient in the C3, C1, and C0 flags of the FPU\nstatus word. This information is important in argument reduction for the tangent function (using a modulus of π/4), because it locates the original angle in the correct one of eight sectors of the unit circle.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FPREM1", "Alias": [], "Brief": "Partial Remainder", "Description": "\nComputes the IEEE remainder obtained from dividing the value in the ST(0) register (the dividend) by the value in the ST(1) register (the divisor or modulus), and stores the result in ST(0). The remainder represents the following value:\nRemainder ← ST(0) − (Q ∗ ST(1))\nHere, Q is an integer value that is obtained by rounding the floating-point number quotient of [ST(0) / ST(1)] toward the nearest integer value. The magnitude of the remainder is less than or equal to half the magnitude of the modulus, unless a partial remainder was computed (as described below).\nThis instruction produces an exact result; the precision (inexact) exception does not occur and the rounding control has no effect. The following table shows the results obtained when computing the remainder of various classes of numbers, assuming that underflow does not occur.\nTable 3-42. FPREM1 Results\n\n\n\n\n\n\nST(1)\n\n\n\n\n\n\n\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n\n\n− ∞\n*\n*\n*\n*\n*\n*\nNaN\n\nST(0)\n− F\nST(0)\n±F or −0\n**\n**\n± F or − 0\nST(0)\nNaN\n\n\n− 0\n− 0\n− 0\n*\n*\n− 0\n-0\nNaN\n\n\n+ 0\n+ 0\n+ 0\n*\n*\n+ 0\n+0\nNaN\n\n\n+ F\nST(0)\n± F or + 0\n**\n**\n± F or + 0\nST(0)\nNaN\n\n\n+ ∞\n*\n*\n*\n*\n*\n*\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\n** Indicates floating-point zero-divide (#Z) exception.\nWhen the result is 0, its sign is the same as that of the dividend. When the modulus is ∞, the result is equal to the value in ST(0).\nThe FPREM1 instruction computes the remainder specified in IEEE Standard 754. This instruction operates differ-ently from the FPREM instruction in the way that it rounds the quotient of ST(0) divided by ST(1) to an integer (see the “Operation” section below).\nLike the FPREM instruction, FPREM1 computes the remainder through iterative subtraction, but can reduce the exponent of ST(0) by no more than 63 in one execution of the instruction. If the instruction succeeds in producing a remainder that is less than one half the modulus, the operation is complete and the C2 flag in the FPU status word is cleared. Otherwise, C2 is set, and the result in ST(0) is called the partial remainder. The exponent of the partial remainder will be less than the exponent of the original dividend by at least 32. Software can re-execute the instruction (using the partial remainder in ST(0) as the dividend) until C2 is cleared. (Note that while executing such a remainder-computation loop, a higher-priority interrupting routine that needs the FPU can force a context switch in-between the instructions in the loop.)\nAn important use of the FPREM1 instruction is to reduce the arguments of periodic functions. When reduction is complete, the instruction stores the three least-significant bits of the quotient in the C3, C1, and C0 flags of the FPU\nstatus word. This information is important in argument reduction for the tangent function (using a modulus of π/4), because it locates the original angle in the correct one of eight sectors of the unit circle.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FPTAN", "Alias": [], "Brief": "Partial Tangent", "Description": "\nComputes the tangent of the source operand in register ST(0), stores the result in ST(0), and pushes a 1.0 onto the FPU register stack. The source operand must be given in radians and must be less than ±263. The following table shows the unmasked results obtained when computing the partial tangent of various classes of numbers, assuming that underflow does not occur.\nTable 3-43. FPTAN Results\n\n\nST(0) SRC\nST(0) DEST\n\n− ∞\n*\n\n− F\n− F to + F\n\n− 0\n- 0\n\n+ 0\n+ 0\n\n+ F\n− F to + F\n\n+ ∞\n*\n\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nIf the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range − 263 to +263 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π or by using the FPREM instruction with a divisor of 2π. See the section titled “Pi” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.\nThe value 1.0 is pushed onto the register stack after the tangent has been computed to maintain compatibility with the Intel 8087 and Intel287 math coprocessors. This operation also simplifies the calculation of other trigonometric functions. For instance, the cotangent (which is the reciprocal of the tangent) can be computed by executing a FDIVR instruction after the FPTAN instruction.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FRNDINT", "Alias": [], "Brief": "Round to Integer", "Description": "\nRounds the source value in the ST(0) register to the nearest integral value, depending on the current rounding mode (setting of the RC field of the FPU control word), and stores the result in ST(0).\nIf the source value is ∞, the value is not changed. If the source value is not an integral value, the floating-point inexact-result exception (#P) is generated.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FRSTOR", "Alias": [], "Brief": "Restore x87 FPU State", "Description": "\nLoads the FPU state (operating environment and register stack) from the memory area specified with the source operand. This state data is typically written to the specified memory location by a previous FSAVE/FNSAVE instruc-tion.\nThe FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. Figures 8-9 through 8-12 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, show the layout in memory of the stored environment, depending on the operating mode of the processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used. The contents of the FPU register stack are stored in the 80 bytes immediately following the operating environment image.\nThe FRSTOR instruction should be executed in the same operating mode as the corresponding FSAVE/FNSAVE instruction.\nIf one or more unmasked exception bits are set in the new FPU status word, a floating-point exception will be generated. To avoid raising exceptions when loading a new operating environment, clear all the exception flags in the FPU status word that is being loaded.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FSCALE", "Alias": [], "Brief": "Scale", "Description": "\nTruncates the value in the source operand (toward 0) to an integral value and adds that value to the exponent of the destination operand. The destination and source operands are floating-point values located in registers ST(0) and ST(1), respectively. This instruction provides rapid multiplication or division by integral powers of 2. The following table shows the results obtained when scaling various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-44. FSCALE Results\n\n\nST(1)\nST(1)\n\n\n\n\n\n\n\n\n\n\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n\n\n− ∞\nNaN\n− ∞\n− ∞\n− ∞\n− ∞\n− ∞\nNaN\n\nST(0)\n− F\n− 0\n− F\n− F\n− F\n− F\n− ∞\nNaN\n\n\n− 0\n− 0\n− 0\n− 0\n− 0\n− 0\nNaN\nNaN\n\n\n+ 0\n+ 0\n+ 0\n+ 0\n+ 0\n+ 0\nNaN\nNaN\n\n\n+ F\n+ 0\n+ F\n+ F\n+ F\n+ F\n+ ∞\nNaN\n\n\n+ ∞\nNaN\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n+ ∞\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nIn most cases, only the exponent is changed and the mantissa (significand) remains unchanged. However, when the value being scaled in ST(0) is a denormal value, the mantissa is also changed and the result may turn out to be a normalized number. Similarly, if overflow or underflow results from a scale operation, the resulting mantissa will differ from the source’s mantissa.\nThe FSCALE instruction can also be used to reverse the action of the FXTRACT instruction, as shown in the following example:\nFXTRACT;\nFSCALE;\nFSTP ST(1);\nIn this example, the FXTRACT instruction extracts the significand and exponent from the value in ST(0) and stores them in ST(0) and ST(1) respectively. The FSCALE then scales the significand in ST(0) by the exponent in ST(1), recreating the original value before the FXTRACT operation was performed. The FSTP ST(1) instruction overwrites the exponent (extracted by the FXTRACT instruction) with the recreated value, which returns the stack to its orig-inal state with only one register [ST(0)] occupied.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FSIN", "Alias": [], "Brief": "Sine", "Description": "\nComputes the sine of the source operand in register ST(0) and stores the result in ST(0). The source operand must be given in radians and must be within the range −263 to +263. The following table shows the results obtained when taking the sine of various classes of numbers, assuming that underflow does not occur.\nTable 3-45. FSIN Results\n\n\nSRC (ST(0))\nDEST (ST(0))\n\n− ∞\n*\n\n− F\n− 1 to + 1\n\n− 0\n−0\n\n+ 0\n+ 0\n\n+ F\n− 1 to +1\n\n+ ∞\n*\n\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nIf the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range − 263 to +263 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π or by using the FPREM instruction with a divisor of 2π. See the section titled “Pi” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FSINCOS", "Alias": [], "Brief": "Sine and Cosine", "Description": "\nComputes both the sine and the cosine of the source operand in register ST(0), stores the sine in ST(0), and pushes the cosine onto the top of the FPU register stack. (This instruction is faster than executing the FSIN and FCOS instructions in succession.)\nThe source operand must be given in radians and must be within the range −263 to +263. The following table shows the results obtained when taking the sine and cosine of various classes of numbers, assuming that underflow does not occur.\nTable 3-46. FSINCOS Results\nSRC\nDEST\n\n\n\n\n\n\nST(0)\nST(1) Cosine\nST(0) Sine\n\n− ∞\n*\n*\n\n− F\n− 1 to + 1\n− 1 to + 1\n\n− 0\n+ 1\n− 0\n\n+ 0\n+ 1\n+ 0\n\n+ F\n− 1 to + 1\n− 1 to + 1\n\n+ ∞\n*\n*\n\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nIf the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range − 263 to +263 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π or by using the FPREM instruction with a divisor of 2π. See the section titled “Pi” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FSQRT", "Alias": [], "Brief": "Square Root", "Description": "\nComputes the square root of the source value in the ST(0) register and stores the result in ST(0).\nThe following table shows the results obtained when taking the square root of various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-47. FSQRT Results\n\n\nSRC (ST(0))\nDEST (ST(0))\n\n− ∞\n*\n\n− F\n*\n\n− 0\n− 0\n\n+ 0\n+ 0\n\n+ F\n+ F\n\n+ ∞\n+ ∞\n\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FST", "Alias": [ "FSTP" ], "Brief": "Store Floating Point Value", "Description": "\nThe FST instruction copies the value in the ST(0) register to the destination operand, which can be a memory loca-tion or another register in the FPU register stack. When storing the value in memory, the value is converted to single-precision or double-precision floating-point format.\nThe FSTP instruction performs the same operation as the FST instruction and then pops the register stack. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The FSTP instruction can also store values in memory in double extended-precision floating-point format.\nIf the destination operand is a memory location, the operand specifies the address where the first byte of the desti-nation value is to be stored. If the destination operand is a register, the operand specifies a register in the register stack relative to the top of the stack.\nIf the destination size is single-precision or double-precision, the significand of the value being stored is rounded to the width of the destination (according to the rounding mode specified by the RC field of the FPU control word), and the exponent is converted to the width and bias of the destination format. If the value being stored is too large for the destination format, a numeric overflow exception (#O) is generated and, if the exception is unmasked, no value is stored in the destination operand. If the value being stored is a denormal value, the denormal exception (#D) is not generated. This condition is simply signaled as a numeric underflow exception (#U) condition.\nIf the value being stored is ±0, ±∞, or a NaN, the least-significant bits of the significand and the exponent are trun-cated to fit the destination format. This operation preserves the value’s identity as a 0, ∞, or NaN.\nIf the destination operand is a non-empty register, the invalid-operation exception is not generated.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FTST", "Alias": [], "Brief": "TEST", "Description": "\nCompares the value in the ST(0) register with 0.0 and sets the condition code flags C0, C2, and C3 in the FPU status word according to the results (see table below).\nTable 3-50. FTST Results\n\n\nCondition\nC3\nC2\nC0\n\nST(0) > 0.0\n0\n0\n0\n\nST(0) < 0.0\n0\n0\n1\n\nST(0) = 0.0\n1\n0\n0\n\nUnordered\n1\n1\n1\nThis instruction performs an “unordered comparison.” An unordered comparison also checks the class of the numbers being compared (see “FXAM—Examine ModR/M” in this chapter). If the value in register ST(0) is a NaN or is in an undefined format, the condition flags are set to “unordered” and the invalid operation exception is gener-ated.\nThe sign of zero is ignored, so that (– 0.0 ← +0.0).\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FUCOM", "Alias": [ "FUCOMP", "FUCOMPP" ], "Brief": "Unordered Compare Floating Point Values", "Description": "\nPerforms an unordered comparison of the contents of register ST(0) and ST(i) and sets condition code flags C0, C2, and C3 in the FPU status word according to the results (see the table below). If no operand is specified, the contents of registers ST(0) and ST(1) are compared. The sign of zero is ignored, so that –0.0 is equal to +0.0.\nTable 3-51. FUCOM/FUCOMP/FUCOMPP Results\n\n\nComparison Results*\nC3\nC2\nC0\n\nST0 > ST(i)\n0\n0\n0\n\nST0 < ST(i)\n0\n0\n1\n\nST0 = ST(i)\n1\n0\n0\n\nUnordered\n1\n1\n1\nNOTES:\n*\nFlags not set if unmasked invalid-arithmetic-operand (#IA) exception is generated.\nAn unordered comparison checks the class of the numbers being compared (see “FXAM—Examine ModR/M” in this chapter). The FUCOM/FUCOMP/FUCOMPP instructions perform the same operations as the FCOM/FCOMP/FCOMPP instructions. The only difference is that the FUCOM/FUCOMP/FUCOMPP instructions raise the invalid-arithmetic-operand exception (#IA) only when either or both operands are an SNaN or are in an unsupported format; QNaNs cause the condition code flags to be set to unordered, but do not cause an exception to be generated. The FCOM/FCOMP/FCOMPP instructions raise an invalid-operation exception when either or both of the operands are a NaN value of any kind or are in an unsupported format.\nAs with the FCOM/FCOMP/FCOMPP instructions, if the operation results in an invalid-arithmetic-operand exception being raised, the condition code flags are set only if the exception is masked.\nThe FUCOMP instruction pops the register stack following the comparison operation and the FUCOMPP instruction pops the register stack twice following the comparison operation. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "WAIT", "Alias": [ "FWAIT" ], "Brief": "Wait", "Description": "\nCauses the processor to check for and handle pending, unmasked, floating-point exceptions before proceeding. (FWAIT is an alternate mnemonic for WAIT.)\nThis instruction is useful for synchronizing exceptions in critical sections of code. Coding a WAIT instruction after a floating-point instruction ensures that any unmasked floating-point exceptions the instruction may raise are handled before the processor can modify the instruction’s results. See the section titled “Floating-Point Exception Synchronization” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more information on using the WAIT/FWAIT instruction.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FXAM", "Alias": [], "Brief": "Examine ModR/M", "Description": "\nExamines the contents of the ST(0) register and sets the condition code flags C0, C2, and C3 in the FPU status word to indicate the class of value or number in the register (see the table below).\nTable 3-52. FXAM Results\n.\n\n\nClass\nC3\nC2\nC0\n\nUnsupported\n0\n0\n0\n\nNaN\n0\n0\n1\n\nNormal finite number\n0\n1\n0\n\nInfinity\n0\n1\n1\n\nZero\n1\n0\n0\n\nEmpty\n1\n0\n1\n\nDenormal number\n1\n1\n0\nThe C1 flag is set to the sign of the value in ST(0), regardless of whether the register is empty or full.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FXCH", "Alias": [], "Brief": "Exchange Register Contents", "Description": "\nExchanges the contents of registers ST(0) and ST(i). If no source operand is specified, the contents of ST(0) and ST(1) are exchanged.\nThis instruction provides a simple means of moving values in the FPU register stack to the top of the stack [ST(0)], so that they can be operated on by those floating-point instructions that can only operate on values in ST(0). For example, the following instruction sequence takes the square root of the third register from the top of the register stack:\nFXCH ST(3);\nFSQRT;\nFXCH ST(3);\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FXRSTOR", "Alias": [], "Brief": "Restore x87 FPU, MMX, XMM, and MXCSR State", "Description": "\nReloads the x87 FPU, MMX technology, XMM, and MXCSR registers from the 512-byte memory image specified in the source operand. This data should have been written to memory previously using the FXSAVE instruction, and in the same format as required by the operating modes. The first byte of the data should be located on a 16-byte boundary. There are three distinct layouts of the FXSAVE state map: one for legacy and compatibility mode, a second format for 64-bit mode FXSAVE/FXRSTOR with REX.W=0, and the third format is for 64-bit mode with FXSAVE64/FXRSTOR64. Table 3-53 shows the layout of the legacy/compatibility mode state information in memory and describes the fields in the memory image for the FXRSTOR and FXSAVE instructions. Table 3-56 shows the layout of the 64-bit mode state information when REX.W is set (FXSAVE64/FXRSTOR64). Table 3-57 shows the layout of the 64-bit mode state information when REX.W is clear (FXSAVE/FXRSTOR).\nThe state image referenced with an FXRSTOR instruction must have been saved using an FXSAVE instruction or be in the same format as required by Table 3-53, Table 3-56, or Table 3-57. Referencing a state image saved with an FSAVE, FNSAVE instruction or incompatible field layout will result in an incorrect state restoration.\nThe FXRSTOR instruction does not flush pending x87 FPU exceptions. To check and raise exceptions when loading x87 FPU state information with the FXRSTOR instruction, use an FWAIT instruction after the FXRSTOR instruction.\nIf the OSFXSR bit in control register CR4 is not set, the FXRSTOR instruction may not restore the states of the XMM and MXCSR registers. This behavior is implementation dependent.\nIf the MXCSR state contains an unmasked exception with a corresponding status flag also set, loading the register with the FXRSTOR instruction will not result in a SIMD floating-point error condition being generated. Only the next occurrence of this unmasked exception will result in the exception being generated.\nBits 16 through 32 of the MXCSR register are defined as reserved and should be set to 0. Attempting to write a 1 in any of these bits from the saved state image will result in a general protection exception (#GP) being generated.\nBytes 464:511 of an FXSAVE image are available for software use. FXRSTOR ignores the content of bytes 464:511 in an FXSAVE state image.\n" }, { "Name": "FXSAVE", "Alias": [], "Brief": "Save x87 FPU, MMX Technology, and SSE State", "Description": "\nSaves the current state of the x87 FPU, MMX technology, XMM, and MXCSR registers to a 512-byte memory loca-tion specified in the destination operand. The content layout of the 512 byte region depends on whether the processor is operating in non-64-bit operating modes or 64-bit sub-mode of IA-32e mode.\nBytes 464:511 are available to software use. The processor does not write to bytes 464:511 of an FXSAVE area.\nThe operation of FXSAVE in non-64-bit modes is described first.\n" }, { "Name": "FXTRACT", "Alias": [], "Brief": "Extract Exponent and Significand", "Description": "\nSeparates the source value in the ST(0) register into its exponent and significand, stores the exponent in ST(0), and pushes the significand onto the register stack. Following this operation, the new top-of-stack register ST(0) contains the value of the original significand expressed as a floating-point value. The sign and significand of this value are the same as those found in the source operand, and the exponent is 3FFFH (biased value for a true expo-nent of zero). The ST(1) register contains the value of the original operand’s true (unbiased) exponent expressed as a floating-point value. (The operation performed by this instruction is a superset of the IEEE-recommended logb(x) function.)\nThis instruction and the F2XM1 instruction are useful for performing power and range scaling operations. The FXTRACT instruction is also useful for converting numbers in double extended-precision floating-point format to decimal representations (e.g., for printing or displaying).\nIf the floating-point zero-divide exception (#Z) is masked and the source operand is zero, an exponent value of – ∞ is stored in register ST(1) and 0 with the sign of the source operand is stored in register ST(0).\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FYL2X", "Alias": [], "Brief": "Compute y ∗ log2x", "Description": "\nComputes (ST(1) ∗ log2 (ST(0))), stores the result in resister ST(1), and pops the FPU register stack. The source operand in ST(0) must be a non-zero positive number.\nThe following table shows the results obtained when taking the log of various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-58. FYL2X Results\n\n\n\n\n\n\n\nST(0)\n\n\n\n\n\n\n\n− ∞\n− F\n±0\n+0<+F<+1\n+ 1\n+ F > + 1\n+ ∞\nNaN\n\n\n− ∞\n*\n*\n+ ∞\n+ ∞\n*\n− ∞\n− ∞\nNaN\n\nST(1)\n− F\n*\n*\n**\n+ F\n− 0\n− F\n− ∞\nNaN\n\n\n− 0\n*\n*\n*\n+ 0\n− 0\n− 0\n*\nNaN\n\n\n+ 0\n*\n*\n*\n− 0\n+ 0\n+ 0\n*\nNaN\n\n\n+ F\n*\n*\n**\n− F\n+ 0\n+ F\n+ ∞\nNaN\n\n\n+ ∞\n*\n*\n− ∞\n− ∞\n*\n+ ∞\n+ ∞\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-operation (#IA) exception.\n** Indicates floating-point zero-divide (#Z) exception.\nIf the divide-by-zero exception is masked and register ST(0) contains ±0, the instruction returns ∞ with a sign that is the opposite of the sign of the source operand in register ST(1).\nThe FYL2X instruction is designed with a built-in multiplication to optimize the calculation of logarithms with an arbitrary positive base (b):\nlogbx ← (log2b)–1 ∗ log2x\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "FYL2XP1", "Alias": [], "Brief": "Compute y ∗ log2(x +1)", "Description": "\nComputes (ST(1) ∗ log2(ST(0) + 1.0)), stores the result in register ST(1), and pops the FPU register stack. The source operand in ST(0) must be in the range:\n\n\n\t-\n\t(\n\t1\n\t-\n\t\n\t\t2\n\t\t2\n\t\n\t)\n\t to \n\t(\n\t1\n\t-\n\t\n\t\t2\n\t\t2\n\t\n\t)\n\n\nThe source operand in ST(1) can range from −∞ to +∞. If the ST(0) operand is outside of its acceptable range, the result is undefined and software should not rely on an exception being generated. Under some circumstances exceptions may be generated when ST(0) is out of range, but this behavior is implementation specific and not guaranteed.\nThe following table shows the results obtained when taking the log epsilon of various classes of numbers, assuming that underflow does not occur.\nTable 3-59. FYL2XP1 Results\n\n\n\n\n\n\nST(0)\n\n\n\n\n\n\n\n\t-\n\t(\n\t1\n\t-\n\t\n\t\t2\n\t\t2\n\t\n\t)\n\t to \n\t-\n\t0\n\n\n-0\n+0\n\n\n\t+\n\t0\n\t to \n\t+\n\t(\n\t1\n\t-\n\t\n\t\t2\n\t\t2\n\t\n\t)\n\n\nNaN\n\n\n− ∞\n+∞\n*\n*\n− ∞\nNaN\n\nST(1)\n− F\n+F\n+0\n-0\n− F\nNaN\n\n\n− 0\n+0\n+0\n-0\n− 0\nNaN\n\n\n+0\n− 0\n− 0\n+0\n+0\nNaN\n\n\n+F\n− F\n− 0\n+0\n+F\nNaN\n\n\n+∞\n− ∞\n*\n*\n+∞\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-operation (#IA) exception.\nThis instruction provides optimal accuracy for values of epsilon [the value in register ST(0)] that are close to 0. For small epsilon (ε) values, more significant digits can be retained by using the FYL2XP1 instruction than by using (ε+1) as an argument to the FYL2X instruction. The (ε+1) expression is commonly found in compound interest and annuity calculations. The result can be simply converted into a value in another logarithm base by including a scale factor in the ST(1) source operand. The following equation is used to calculate the scale factor for a particular loga-rithm base, where n is the logarithm base desired for the result of the FYL2XP1 instruction:\nscale factor ← logn 2\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "HADDPD", "Alias": [], "Brief": "Packed Double", "Description": "\nAdds the double-precision floating-point values in the high and low quadwords of the destination operand and stores the result in the low quadword of the destination operand.\nAdds the double-precision floating-point values in the high and low quadwords of the source operand and stores the result in the high quadword of the destination operand.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nSee Figure 3-15 for HADDPD; see Figure 3-16 for VHADDPD.\n\nHADDPD xmm1, xmm2/m128\nxmm2\n[127:64]\n[63:0]\n/m128\n[127:64]\n[63:0]\nxmm1\nxmm2/m128[63:0] +\nResult:\nxmm1[63:0] + xmm1[127:64]\nxmm2/m128[127:64]\nxmm1\n[127:64]\n[63:0]\nOM15993\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 3-15. HADDPD—Packed Double-FP Horizontal Add\nX3\nX2\nX1\nX0\nSRC1\nY3\nY2\nY1\nY0\nSRC2\nDEST\nY2 + Y3\nX2 + X3\nY0 + Y1\nX0 + X1\nFigure 3-16. VHADDPD operation\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "HADDPS", "Alias": [], "Brief": "Packed Single", "Description": "\nAdds the single-precision floating-point values in the first and second dwords of the destination operand and stores the result in the first dword of the destination operand.\nAdds single-precision floating-point values in the third and fourth dword of the destination operand and stores the result in the second dword of the destination operand.\nAdds single-precision floating-point values in the first and second dword of the source operand and stores the result in the third dword of the destination operand.\nAdds single-precision floating-point values in the third and fourth dword of the source operand and stores the result in the fourth dword of the destination operand.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nSee Figure 3-17 for HADDPS; see Figure 3-18 for VHADDPS.\n\nHADDPS xmm1, xmm2/m128\nxmm2/\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nm128\nxmm1\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nxmm2/m128\nxmm2/m128\nRESULT:\nxmm1[95:64] +\nxmm1[31:0] +\n[95:64] + xmm2/\n[31:0] + xmm2/\nxmm1\nxmm1[127:96]\nxmm1[63:32]\nm128[127:96]\nm128[63:32]\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nOM15994\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 3-17. HADDPS—Packed Single-FP Horizontal Add\nX7\nX6\nX5\nX4\nX3\nX2\nX1\nX0\nSRC1\nY7\nY6\nY5\nY4\nY3\nY2\nY1\nY0\nSRC2\nDEST\nY6+Y7\nY4+Y5\nX6+X7\nX4+X5\nY2+Y3\nY0+Y1\nX2+X3\nX0+X1\nFigure 3-18. VHADDPS operation\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "HLT", "Alias": [], "Brief": "Halt", "Description": "\nStops instruction execution and places the processor in a HALT state. An enabled interrupt (including NMI and SMI), a debug exception, the BINIT# signal, the INIT# signal, or the RESET# signal will resume execution. If an interrupt (including NMI) is used to resume execution after a HLT instruction, the saved instruction pointer (CS:EIP) points to the instruction following the HLT instruction.\nWhen a HLT instruction is executed on an Intel 64 or IA-32 processor supporting Intel Hyper-Threading Technology, only the logical processor that executes the instruction is halted. The other logical processors in the physical processor remain active, unless they are each individually halted by executing a HLT instruction.\nThe HLT instruction is a privileged instruction. When the processor is running in protected or virtual-8086 mode, the privilege level of a program or procedure must be 0 to execute the HLT instruction.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "HSUBPD", "Alias": [], "Brief": "Packed Double", "Description": "\nThe HSUBPD instruction subtracts horizontally the packed DP FP numbers of both operands.\nSubtracts the double-precision floating-point value in the high quadword of the destination operand from the low quadword of the destination operand and stores the result in the low quadword of the destination operand.\nSubtracts the double-precision floating-point value in the high quadword of the source operand from the low quad-word of the source operand and stores the result in the high quadword of the destination operand.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nSee Figure 3-19 for HSUBPD; see Figure 3-20 for VHSUBPD.\n\nHSUBPD xmm1, xmm2/m128\nxmm2\n[127:64]\n[63:0]\n/m128\nxmm1\n[127:64]\n[63:0]\nResult:\nxmm2/m128[63:0] -\nxmm1[63:0] - xmm1[127:64]\nxmm1\nxmm2/m128[127:64]\n[127:64]\n[63:0]\nOM15995\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 3-19. HSUBPD—Packed Double-FP Horizontal Subtract\nX3\nX2\nX1\nX0\nSRC1\nY3\nY2\nY1\nY0\nSRC2\nDEST\nY2 - Y3\nX2 - X3\nY0 - Y1\nX0 - X1\nFigure 3-20. VHSUBPD operation\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "HSUBPS", "Alias": [], "Brief": "Packed Single", "Description": "\nSubtracts the single-precision floating-point value in the second dword of the destination operand from the first dword of the destination operand and stores the result in the first dword of the destination operand.\nSubtracts the single-precision floating-point value in the fourth dword of the destination operand from the third dword of the destination operand and stores the result in the second dword of the destination operand.\nSubtracts the single-precision floating-point value in the second dword of the source operand from the first dword of the source operand and stores the result in the third dword of the destination operand.\nSubtracts the single-precision floating-point value in the fourth dword of the source operand from the third dword of the source operand and stores the result in the fourth dword of the destination operand.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nSee Figure 3-21 for HSUBPS; see Figure 3-22 for VHSUBPS.\n\nHSUBPS xmm1, xmm2/m128\nxmm2/\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nm128\nxmm1\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nxmm2/m128\nxmm2/m128\nRESULT:\nxmm1[95:64] -\nxmm1[31:0] -\n[95:64] - xmm2/\n[31:0] - xmm2/\nxmm1\nxmm1[127:96]\nxmm1[63:32]\nm128[127:96]\nm128[63:32]\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nOM15996\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 3-21. HSUBPS—Packed Single-FP Horizontal Subtract\nX7\nX6\nX5\nX4\nX3\nX2\nX1\nX0\nSRC1\nY7\nY6\nY5\nY4\nY3\nY2\nY1\nY0\nSRC2\nDEST\nY6-Y7\nY4-Y5\nX6-X7\nX4-X5\nY2-Y3\nX2-X3\nX0-X1\nFigure 3-22. VHSUBPS operation\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "IDIV", "Alias": [], "Brief": "Signed Divide", "Description": "\nDivides the (signed) value in the AX, DX:AX, or EDX:EAX (dividend) by the source operand (divisor) and stores the result in the AX (AH:AL), DX:AX, or EDX:EAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size (dividend/divisor).\nNon-integral results are truncated (chopped) towards 0. The remainder is always less than the divisor in magni-tude. Overflow is indicated with the #DE (divide error) exception rather than with the CF flag.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. In 64-bit mode when REX.W is applied, the instruction divides the signed value in RDX:RAX by the source operand. RAX contains a 64-bit quotient; RDX contains a 64-bit remainder.\nSee the summary chart at the beginning of this section for encoding data and limits. See Table 3-60.\nTable 3-60. IDIV Results\n\n\nOperand Size\nDividend\nDivisor\nQuotient\nRemainder\nQuotient Range\n\n\nWord/byte\nDoubleword/word\nQuadword/doubleword\nDoublequadword/ quadword\n\nAX\nDX:AX\nEDX:EAX\nRDX:RAX\n\nr/m8\nr/m16\nr/m32\nr/m64\n\nAL\nAX\nEAX\nRAX\n\nAH\nDX\nEDX\nRDX\n\n−128 to +127\n−32,768 to +32,767\n−231 to 231 − 1\n−263 to 263 − 1\n" }, { "Name": "IMUL", "Alias": [], "Brief": "Signed Multiply", "Description": "\nPerforms a signed multiplication of two operands. This instruction has three forms, depending on the number of operands.\nWhen an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.\nThe CF and OF flags are set when the signed integer value of the intermediate product differs from the sign extended operand-size-truncated product, otherwise the CF and OF flags are cleared.\nThe three forms of the IMUL instruction are similar in that the length of the product is calculated to twice the length of the operands. With the one-operand form, the product is stored exactly in the destination. With the two- and three- operand forms, however, the result is truncated to the length of the destination before it is stored in the destination register. Because of this truncation, the CF or OF flag should be tested to ensure that no significant bits are lost.\nThe two- and three-operand forms may also be used with unsigned operands because the lower half of the product is the same regardless if the operands are signed or unsigned. The CF and OF flags, however, cannot be used to determine if the upper half of the result is non-zero.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. Use of REX.W modifies the three forms of the instruction as follows.\n" }, { "Name": "IN", "Alias": [], "Brief": "Input from Port", "Description": "\nCopies the value from the I/O port specified with the second operand (source operand) to the destination operand (first operand). The source operand can be a byte-immediate or the DX register; the destination operand can be register AL, AX, or EAX, depending on the size of the port being accessed (8, 16, or 32 bits, respectively). Using the DX register as a source operand allows I/O port addresses from 0 to 65,535 to be accessed; using a byte imme-diate allows I/O port addresses 0 to 255 to be accessed.\nWhen accessing an 8-bit I/O port, the opcode determines the port size; when accessing a 16- and 32-bit I/O port, the operand-size attribute determines the port size. At the machine code level, I/O instructions are shorter when accessing 8-bit I/O ports. Here, the upper eight bits of the port address will be 0.\nThis instruction is only useful for accessing I/O ports located in the processor’s I/O address space. See Chapter 16, “Input/Output,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more infor-mation on accessing I/O ports in the I/O address space.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "INC", "Alias": [], "Brief": "Increment by 1", "Description": "\nAdds 1 to the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag. (Use a ADD instruction with an immediate operand of 1 to perform an increment operation that does updates the CF flag.)\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, INC r16 and INC r32 are not encodable (because opcodes 40H through 47H are REX prefixes). Otherwise, the instruction’s 64-bit mode default operation size is 32 bits. Use of the REX.R prefix permits access to additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits.\n" }, { "Name": "INS", "Alias": [ "INSB", "INSW", "INSD" ], "Brief": "Input from Port to String", "Description": "\nCopies the data from the I/O port specified with the source operand (second operand) to the destination operand (first operand). The source operand is an I/O port address (from 0 to 65,535) that is read from the DX register. The destination operand is a memory location, the address of which is read from either the ES:DI, ES:EDI or the RDI registers (depending on the address-size attribute of the instruction, 16, 32 or 64, respectively). (The ES segment cannot be overridden with a segment override prefix.) The size of the I/O port being accessed (that is, the size of the source and destination operands) is determined by the opcode for an 8-bit I/O port or by the operand-size attri-bute of the instruction for a 16- or 32-bit I/O port.\nAt the assembly-code level, two forms of this instruction are allowed: the “explicit-operands” form and the “no-operands” form. The explicit-operands form (specified with the INS mnemonic) allows the source and destination operands to be specified explicitly. Here, the source operand must be “DX,” and the destination operand should be a symbol that indicates the size of the I/O port and the destination address. This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the destination operand symbol must specify the correct type (size) of the operand (byte, word, or doubleword), but it does not have to specify the correct location. The location is always specified by the ES:(E)DI registers, which must be loaded correctly before the INS instruction is executed.\nThe no-operands form provides “short forms” of the byte, word, and doubleword versions of the INS instructions. Here also DX is assumed by the processor to be the source operand and ES:(E)DI is assumed to be the destination operand. The size of the I/O port is specified with the choice of mnemonic: INSB (byte), INSW (word), or INSD (doubleword).\nAfter the byte, word, or doubleword is transfer from the I/O port to the memory location, the DI/EDI/RDI register is incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E)DI register is incremented; if the DF flag is 1, the (E)DI register is decremented.) The (E)DI register is incremented or decremented by 1 for byte operations, by 2 for word operations, or by 4 for doubleword operations.\nThe INS, INSB, INSW, and INSD instructions can be preceded by the REP prefix for block input of ECX bytes, words, or doublewords. See “REP/REPE/REPZ /REPNE/REPNZ—Repeat String Operation Prefix” in Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B, for a description of the REP prefix.\nThese instructions are only useful for accessing I/O ports located in the processor’s I/O address space. See Chapter 16, “Input/Output,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more information on accessing I/O ports in the I/O address space.\nIn 64-bit mode, default address size is 64 bits, 32 bit address size is supported using the prefix 67H. The address of the memory destination is specified by RDI or EDI. 16-bit address size is not supported in 64-bit mode. The operand size is not promoted.\n" }, { "Name": "INSERTPS", "Alias": [], "Brief": "Insert Packed Single Precision Floating", "Description": "\n(register source form)\nSelect a single precision floating-point element from second source as indicated by Count_S bits of the immediate operand and insert it into the first source at the location indicated by the Count_D bits of the immediate operand. Store in the destination and zero out destination elements based on the ZMask bits of the immediate operand.\n(memory source form)\nLoad a floating-point element from a 32-bit memory location and insert it into the first source at the location indi-cated by the Count_D bits of the immediate operand. Store in the destination and zero out destination elements based on the ZMask bits of the immediate operand.\n128-bit Legacy SSE version: The first source register is an XMM register. The second source operand is either an XMM register or a 32-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version. The destination and first source register is an XMM register. The second source operand is either an XMM register or a 32-bit memory location. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nIf VINSERTPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n" }, { "Name": "INT n", "Alias": [ "INTO", "INT 3" ], "Brief": "Call to Interrupt Procedure", "Description": "\nThe INT n instruction generates a call to the interrupt or exception handler specified with the destination operand (see the section titled “Interrupts and Exceptions” in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1). The destination operand specifies a vector from 0 to 255, encoded as an 8-bit unsigned intermediate value. Each vector provides an index to a gate descriptor in the IDT. The first 32 vectors are reserved by Intel for system use. Some of these vectors are used for internally generated exceptions.\nThe INT n instruction is the general mnemonic for executing a software-generated call to an interrupt handler. The INTO instruction is a special mnemonic for calling overflow exception (#OF), exception 4. The overflow interrupt checks the OF flag in the EFLAGS register and calls the overflow interrupt handler if the OF flag is set to 1. (The INTO instruction cannot be used in 64-bit mode.)\nThe INT 3 instruction generates a special one byte opcode (CC) that is intended for calling the debug exception handler. (This one byte form is valuable because it can be used to replace the first byte of any instruction with a breakpoint, including other one byte instructions, without over-writing other code). To further support its function as a debug breakpoint, the interrupt generated with the CC opcode also differs from the regular software interrupts as follows:\nNote that the “normal” 2-byte opcode for INT 3 (CD03) does not have these special features. Intel and Microsoft assemblers will not generate the CD03 opcode from any mnemonic, but this opcode can be created by direct numeric code definition or by self-modifying code.\nThe action of the INT n instruction (including the INTO and INT 3 instructions) is similar to that of a far call made with the CALL instruction. The primary difference is that with the INT n instruction, the EFLAGS register is pushed onto the stack before the return address. (The return address is a far address consisting of the current values of the CS and EIP registers.) Returns from interrupt procedures are handled with the IRET instruction, which pops the EFLAGS information and return address from the stack.\nThe vector specifies an interrupt descriptor in the interrupt descriptor table (IDT); that is, it provides index into the IDT. The selected interrupt descriptor in turn contains a pointer to an interrupt or exception handler procedure. In protected mode, the IDT contains an array of 8-byte descriptors, each of which is an interrupt gate, trap gate, or task gate. In real-address mode, the IDT is an array of 4-byte far pointers (2-byte code segment selector and a 2-byte instruction pointer), each of which point directly to a procedure in the selected segment. (Note that in real-address mode, the IDT is called the interrupt vector table, and its pointers are called interrupt vectors.)\nThe following decision table indicates which action in the lower portion of the table is taken given the conditions in the upper portion of the table. Each Y in the lower section of the decision table represents a procedure defined in the “Operation” section for this instruction (except #GP).\nTable 3-61. Decision Table\n\n\nPE\n0\n1\n1\n1\n1\n1\n1\n1\nTable 3-61. Decision Table (Contd.)\n\n\nVM\n–\n–\n–\n–\n–\n0\n1\n1\n\nIOPL\n–\n–\n–\n–\n–\n–\n<3\n=3\n\nDPL/CPL RELATIONSHIP\n–\nDPL< CPL\n–\nDPL> CPL\nDPL= CPL or C\nDPL< CPL & NC\n–\n–\n\nINTERRUPT TYPE\n–\nS/W\n–\n–\n–\n–\n–\n–\n\nGATE TYPE\n–\n–\nTask\nTrap or Interrupt\nTrap or Interrupt\nTrap or Interrupt\nTrap or Interrupt\nTrap or Interrupt\n\nREAL-ADDRESS-MODE\nY\n\n\n\n\n\n\n\n\nPROTECTED-MODE\n\nY\nY\nY\nY\nY\nY\nY\n\nTRAP-OR-INTERRUPT-GATE\n\n\n\nY\nY\nY\nY\nY\n\nINTER-PRIVILEGE-LEVEL-INTERRUPT\n\n\n\n\n\nY\n\n\n\nINTRA-PRIVILEGE-LEVEL-INTERRUPT\n\n\n\n\nY\n\n\n\n\nINTERRUPT-FROM-VIRTUAL-8086-MODE\n\n\n\n\n\n\n\nY\n\nTASK-GATE\n\n\nY\n\n\n\n\n\n\n#GP\n\nY\n\nY\n\n\nY\n\nNOTES:\n−\nDon't Care.\nY\nYes, action taken.\nBlank\nAction not taken.\nWhen the processor is executing in virtual-8086 mode, the IOPL determines the action of the INT n instruction. If the IOPL is less than 3, the processor generates a #GP(selector) exception; if the IOPL is 3, the processor executes a protected mode interrupt to privilege level 0. The interrupt gate's DPL must be set to 3 and the target CPL of the interrupt handler procedure must be 0 to execute the protected mode interrupt to privilege level 0.\nThe interrupt descriptor table register (IDTR) specifies the base linear address and limit of the IDT. The initial base address value of the IDTR after the processor is powered up or reset is 0.\n" }, { "Name": "INVD", "Alias": [], "Brief": "Invalidate Internal Caches", "Description": "\nInvalidates (flushes) the processor’s internal caches and issues a special-function bus cycle that directs external caches to also flush themselves. Data held in internal caches is not written back to main memory.\nAfter executing this instruction, the processor does not wait for the external caches to complete their flushing oper-ation before proceeding with instruction execution. It is the responsibility of hardware to respond to the cache flush signal.\nThe INVD instruction is a privileged instruction. When the processor is running in protected mode, the CPL of a program or procedure must be 0 to execute this instruction.\nThe INVD instruction may be used when the cache is used as temporary memory and the cache contents need to be invalidated rather than written back to memory. When the cache is used as temporary memory, no external device should be actively writing data to main memory.\nUse this instruction with care. Data cached internally and not written back to main memory will be lost. Note that any data from an external device to main memory (for example, via a PCIWrite) can be temporarily stored in the caches; these data can be lost when an INVD instruction is executed. Unless there is a specific requirement or benefit to flushing caches without writing back modified cache lines (for example, temporary memory, testing, or fault recovery where cache coherency with main memory is not a concern), software should instead use the WBINVD instruction.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "INVLPG", "Alias": [], "Brief": "Invalidate TLB Entries", "Description": "\nInvalidates any translation lookaside buffer (TLB) entries specified with the source operand. The source operand is a memory address. The processor determines the page that contains that address and flushes all TLB entries for that page.1\nThe INVLPG instruction is a privileged instruction. When the processor is running in protected mode, the CPL must be 0 to execute this instruction.\nThe INVLPG instruction normally flushes TLB entries only for the specified page; however, in some cases, it may flush more entries, even the entire TLB. The instruction is guaranteed to invalidates only TLB entries associated with the current PCID. (If PCIDs are disabled — CR4.PCIDE = 0 — the current PCID is 000H.) The instruction also invalidates any global TLB entries for the specified page, regardless of PCID.\nFor more details on operations that flush the TLB, see “MOV—Move to/from Control Registers” and Section 4.10.4.1, “Operations that Invalidate TLBs and Paging-Structure Caches,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.\nThis instruction’s operation is the same in all non-64-bit modes. It also operates the same in 64-bit mode, except if the memory address is in non-canonical form. In this case, INVLPG is the same as a NOP.\n" }, { "Name": "INVPCID", "Alias": [], "Brief": "Invalidate Process", "Description": "\nInvalidates mappings in the translation lookaside buffers (TLBs) and paging-structure caches based on process-context identifier (PCID). (See Section 4.10, “Caching Translation Information,” in Intel 64 and IA-32 Architecture Software Developer’s Manual, Volume 3A.) Invalidation is based on the INVPCID type specified in the register operand and the INVPCID descriptor specified in the memory operand.\nOutside 64-bit mode, the register operand is always 32 bits, regardless of the value of CS.D. In 64-bit mode the register operand has 64 bits.\nThere are four INVPCID types currently defined:\nThe INVPCID descriptor comprises 128 bits and consists of a PCID and a linear address as shown in Figure 3-23. For INVPCID type 0, the processor uses the full 64 bits of the linear address even outside 64-bit mode; the linear address is not used for other INVPCID types.\n1.\nIf the paging structures map the linear address using a page larger than 4 KBytes and there are multiple TLB entries for that page (see Section 4.10.2.3, “Details of TLB Use,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A), the instruction invalidates all of them.\n\n127\n64 63\n12\n11\n0\n\nLinear Address\nReserved (must be zero)\nPCID\n\nFigure 3-23. INVPCID Descriptor\nIf CR4.PCIDE = 0, a logical processor does not cache information for any PCID other than 000H. In this case, executions with INVPCID types 0 and 1 are allowed only if the PCID specified in the INVPCID descriptor is 000H; executions with INVPCID types 2 and 3 invalidate mappings only for PCID 000H. Note that CR4.PCIDE must be 0 outside 64-bit mode (see Chapter 4.10.1, “Process-Context Identifiers (PCIDs)‚” of the Intel® 64 and IA-32 Archi-tectures Software Developer’s Manual, Volume 3A).\n" }, { "Name": "IRET", "Alias": [ "IRETD" ], "Brief": "Interrupt Return", "Description": "\nReturns program control from an exception or interrupt handler to a program or procedure that was interrupted by an exception, an external interrupt, or a software-generated interrupt. These instructions are also used to perform a return from a nested task. (A nested task is created when a CALL instruction is used to initiate a task switch or when an interrupt or exception causes a task switch to an interrupt or exception handler.) See the section titled “Task Linking” in Chapter 7 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.\nIRET and IRETD are mnemonics for the same opcode. The IRETD mnemonic (interrupt return double) is intended for use when returning from an interrupt when using the 32-bit operand size; however, most assemblers use the IRET mnemonic interchangeably for both operand sizes.\nIn Real-Address Mode, the IRET instruction preforms a far return to the interrupted program or procedure. During this operation, the processor pops the return instruction pointer, return code segment selector, and EFLAGS image from the stack to the EIP, CS, and EFLAGS registers, respectively, and then resumes execution of the interrupted program or procedure.\nIn Protected Mode, the action of the IRET instruction depends on the settings of the NT (nested task) and VM flags in the EFLAGS register and the VM flag in the EFLAGS image stored on the current stack. Depending on the setting of these flags, the processor performs the following types of interrupt returns:\nIf the NT flag (EFLAGS register) is cleared, the IRET instruction performs a far return from the interrupt procedure, without a task switch. The code segment being returned to must be equally or less privileged than the interrupt handler routine (as indicated by the RPL field of the code segment selector popped from the stack).\nAs with a real-address mode interrupt return, the IRET instruction pops the return instruction pointer, return code segment selector, and EFLAGS image from the stack to the EIP, CS, and EFLAGS registers, respectively, and then resumes execution of the interrupted program or procedure. If the return is to another privilege level, the IRET instruction also pops the stack pointer and SS from the stack, before resuming program execution. If the return is to virtual-8086 mode, the processor also pops the data segment registers from the stack.\nIf the NT flag is set, the IRET instruction performs a task switch (return) from a nested task (a task called with a CALL instruction, an interrupt, or an exception) back to the calling or interrupted task. The updated state of the task executing the IRET instruction is saved in its TSS. If the task is re-entered later, the code that follows the IRET instruction is executed.\nIf the NT flag is set and the processor is in IA-32e mode, the IRET instruction causes a general protection excep-tion.\nIf nonmaskable interrupts (NMIs) are blocked (see Section 6.7.1, “Handling Multiple NMIs” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A), execution of the IRET instruction unblocks NMIs.\nThis unblocking occurs even if the instruction causes a fault. In such a case, NMIs are unmasked before the excep-tion handler is invoked.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.W prefix promotes operation to 64 bits (IRETQ). See the summary chart at the beginning of this section for encoding data and limits.\nSee “Changes to Instruction Behavior in VMX Non-Root Operation” in Chapter 25 of the Intel® 64 and IA-32 Archi-tectures Software Developer’s Manual, Volume 3C, for more information about the behavior of this instruction in VMX non-root operation.\n" }, { "Name": "JMP", "Alias": [], "Brief": "Jump", "Description": "\nTransfers program control to a different point in the instruction stream without recording return information. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location.\nThis instruction can be used to execute four different types of jumps:\nA task switch can only be executed in protected mode (see Chapter 7, in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for information on performing task switches with the JMP instruction).\nNear and Short Jumps. When executing a near jump, the processor jumps to the address (within the current code segment) that is specified with the target operand. The target operand specifies either an absolute offset (that is an offset from the base of the code segment) or a relative offset (a signed displacement relative to the current\nvalue of the instruction pointer in the EIP register). A near jump to a relative offset of 8-bits (rel8) is referred to as a short jump. The CS register is not changed on near and short jumps.\nAn absolute offset is specified indirectly in a general-purpose register or a memory location (r/m16 or r/m32). The operand-size attribute determines the size of the target operand (16 or 32 bits). Absolute offsets are loaded directly into the EIP register. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared, resulting in a maximum instruction pointer size of 16 bits.\nA relative offset (rel8, rel16, or rel32) is generally specified as a label in assembly code, but at the machine code level, it is encoded as a signed 8-, 16-, or 32-bit immediate value. This value is added to the value in the EIP register. (Here, the EIP register contains the address of the instruction following the JMP instruction). When using relative offsets, the opcode (for short vs. near jumps) and the operand-size attribute (for near relative jumps) determines the size of the target operand (8, 16, or 32 bits).\nFar Jumps in Real-Address or Virtual-8086 Mode. When executing a far jump in real-address or virtual-8086 mode, the processor jumps to the code segment and offset specified with the target operand. Here the target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). With the pointer method, the segment and address of the called procedure is encoded in the instruction, using a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address imme-diate. With the indirect method, the target operand specifies a memory location that contains a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address. The far address is loaded directly into the CS and EIP registers. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared.\nFar Jumps in Protected Mode. When the processor is operating in protected mode, the JMP instruction can be used to perform the following three types of far jumps:\n(The JMP instruction cannot be used to perform inter-privilege-level far jumps.)\nIn protected mode, the processor always uses the segment selector part of the far address to access the corre-sponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate, task gate, or TSS) and access rights determine the type of jump to be performed.\nIf the selected descriptor is for a code segment, a far jump to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far jump to the same privilege level in protected mode is very similar to one carried out in real-address or virtual-8086 mode. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The new code segment selector and its descriptor are loaded into CS register, and the offset from the instruction is loaded into the EIP register. Note that a call gate (described in the next paragraph) can also be used to perform far call to a code segment at the same privilege level. Using this mechanism provides an extra level of indirection and is the preferred method of making jumps between 16-bit and 32-bit code segments.\nWhen executing a far jump through a call gate, the segment selector specified by the target operand identifies the call gate. (The offset part of the target operand is ignored.) The processor then jumps to the code segment speci-fied in the call gate descriptor and begins executing the instruction at the offset specified in the call gate. No stack switch occurs. Here again, the target operand can specify the far address of the call gate either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32).\nExecuting a task switch with the JMP instruction is somewhat similar to executing a jump through a call gate. Here the target operand specifies the segment selector of the task gate for the task being switched to (and the offset part of the target operand is ignored). The task gate in turn points to the TSS for the task, which contains the segment selectors for the task’s code and stack segments. The TSS also contains the EIP value for the next instruc-tion that was to be executed before the task was suspended. This instruction pointer value is loaded into the EIP register so that the task begins executing again at this next instruction.\nThe JMP instruction can also specify the segment selector of the TSS directly, which eliminates the indirection of the task gate. See Chapter 7 in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for detailed information on the mechanics of a task switch.\nNote that when you execute at task switch with a JMP instruction, the nested task flag (NT) is not set in the EFLAGS register and the new TSS’s previous task link field is not loaded with the old task’s TSS selector. A return to the previous task can thus not be carried out by executing the IRET instruction. Switching tasks with the JMP instruc-tion differs in this regard from the CALL instruction which does set the NT flag and save the previous task link infor-mation, allowing a return to the calling task with an IRET instruction.\nIn 64-Bit Mode — The instruction’s operation size is fixed at 64 bits. If a selector points to a gate, then RIP equals the 64-bit displacement taken from gate; else RIP equals the zero-extended offset from the far pointer referenced in the instruction.\nSee the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "Jcc", "Alias": [ "JA", "JAE", "JB", "JBE", "JC", "JE", "JG", "JGE", "JL", "JLE", "JNA", "JNAE", "JNB", "JNBE", "JNC", "JNE", "JNG", "JNGE", "JNL", "JNLE", "JNO", "JNP", "JNS", "JNZ", "JO", "JP", "JPE", "JPO", "JS", "JZ" ], "Brief": "Jump if Condition Is Met", "Description": "\nChecks the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and, if the flags are in the specified state (condition), performs a jump to the target instruction specified by the destination operand. A condition code (cc) is associated with each instruction to indicate the condition being tested for. If the condition is not satisfied, the jump is not performed and execution continues with the instruction following the Jcc instruction.\nThe target instruction is specified with a relative offset (a signed offset relative to the current value of the instruc-tion pointer in the EIP register). A relative offset (rel8, rel16, or rel32) is generally specified as a label in assembly code, but at the machine code level, it is encoded as a signed, 8-bit or 32-bit immediate value, which is added to the instruction pointer. Instruction coding is most efficient for offsets of –128 to +127. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared, resulting in a maximum instruction pointer size of 16 bits.\nThe conditions for each Jcc mnemonic are given in the “Description” column of the table on the preceding page. The terms “less” and “greater” are used for comparisons of signed integers and the terms “above” and “below” are used for unsigned integers.\nBecause a particular state of the status flags can sometimes be interpreted in two ways, two mnemonics are defined for some opcodes. For example, the JA (jump if above) instruction and the JNBE (jump if not below or equal) instruction are alternate mnemonics for the opcode 77H.\nThe Jcc instruction does not support far jumps (jumps to other code segments). When the target for the conditional jump is in a different segment, use the opposite condition from the condition being tested for the Jcc instruction, and then access the target with an unconditional far jump (JMP instruction) to the other segment. For example, the following conditional far jump is illegal:\nJZ FARLABEL;\nTo accomplish this far jump, use the following two instructions:\nJNZ BEYOND;\nJMP FARLABEL;\nBEYOND:\nThe JRCXZ, JECXZ and JCXZ instructions differ from other Jcc instructions because they do not check status flags. Instead, they check RCX, ECX or CX for 0. The register checked is determined by the address-size attribute. These instructions are useful when used at the beginning of a loop that terminates with a conditional loop instruction (such as LOOPNE). They can be used to prevent an instruction sequence from entering a loop when RCX, ECX or CX is 0. This would cause the loop to execute 264, 232 or 64K times (not zero times).\nAll conditional jumps are converted to code fetches of one or two cache lines, regardless of jump address or cache-ability.\nIn 64-bit mode, operand size is fixed at 64 bits. JMP Short is RIP = RIP + 8-bit offset sign extended to 64 bits. JMP Near is RIP = RIP + 32-bit offset sign extended to 64-bits.\n" }, { "Name": "LAHF", "Alias": [], "Brief": "Load Status Flags into AH Register", "Description": "\nThis instruction executes as described above in compatibility mode and legacy mode. It is valid in 64-bit mode only if CPUID.80000001H:ECX.LAHF-SAHF[bit 0] = 1.\n" }, { "Name": "LAR", "Alias": [], "Brief": "Load Access Rights Byte", "Description": "\nLoads the access rights from the segment descriptor specified by the second operand (source operand) into the first operand (destination operand) and sets the ZF flag in the flag register. The source operand (which can be a register or a memory location) contains the segment selector for the segment descriptor being accessed. If the source operand is a memory address, only 16 bits of data are accessed. The destination operand is a general-purpose register.\nThe processor performs access checks as part of the loading process. Once loaded in the destination register, soft-ware can perform additional checks on the access rights information.\nThe access rights for a segment descriptor include fields located in the second doubleword (bytes 4–7) of the segment descriptor. The following fields are loaded by the LAR instruction:\n— Bits 19:16 are undefined.\n— Bit 20 returns the software-available bit in the descriptor.\n— Bit 21 returns the L flag.\n— Bit 22 returns the D/B flag.\n— Bit 23 returns the G flag.\n— Bits 31:24 are returned as 0.\nThis instruction performs the following checks before it loads the access rights in the destination register:\nIf the segment descriptor cannot be accessed or is an invalid type for the instruction, the ZF flag is cleared and no access rights are loaded in the destination operand.\nThe LAR instruction can only be executed in protected mode and IA-32e mode.\nTable 3-62. Segment and Gate Types\n\n\nType\nProtected Mode\n\nIA-32e Mode\n\n\n\nName\nValid\nName\nValid\n\n\n0\n1\n2\n3\n4\n5\n6\n7\n8\n9\nA\nB\nC\nD\nE\nF\n\nReserved\nAvailable 16-bit TSS\nLDT\nBusy 16-bit TSS\n16-bit call gate\n16-bit/32-bit task gate\n16-bit interrupt gate\n16-bit trap gate\nReserved\nAvailable 32-bit TSS\nReserved\nBusy 32-bit TSS\n32-bit call gate\nReserved\n32-bit interrupt gate\n32-bit trap gate\n\nNo\nYes\nYes\nYes\nYes\nYes\nNo\nNo\nNo\nYes\nNo\nYes\nYes\nNo\nNo\nNo\n\nReserved\nReserved\nLDT\nReserved\nReserved\nReserved\nReserved\nReserved\nReserved\nAvailable 64-bit TSS\nReserved\nBusy 64-bit TSS\n64-bit call gate\nReserved\n64-bit interrupt gate\n64-bit trap gate\n\nNo\nNo\nNo\nNo\nNo\nNo\nNo\nNo\nNo\nYes\nNo\nYes\nYes\nNo\nNo\nNo\n" }, { "Name": "LDDQU", "Alias": [], "Brief": "Load Unaligned Integer 128 Bits", "Description": "\nThe instruction is functionally similar to (V)MOVDQU ymm/xmm, m256/m128 for loading from memory. That is: 32/16 bytes of data starting at an address specified by the source memory operand (second operand) are fetched from memory and placed in a destination register (first operand). The source operand need not be aligned on a 32/16-byte boundary. Up to 64/32 bytes may be loaded from memory; this is implementation dependent.\nThis instruction may improve performance relative to (V)MOVDQU if the source operand crosses a cache line boundary. In situations that require the data loaded by (V)LDDQU be modified and stored to the same location, use (V)MOVDQU or (V)MOVDQA instead of (V)LDDQU. To move a double quadword to or from memory locations that are known to be aligned on 16-byte boundaries, use the (V)MOVDQA instruction.\n" }, { "Name": "LDMXCSR", "Alias": [], "Brief": "Load MXCSR Register", "Description": "\nLoads the source operand into the MXCSR control/status register. The source operand is a 32-bit memory location. See “MXCSR Control and Status Register” in Chapter 10, of the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 1, for a description of the MXCSR register and its contents.\nThe LDMXCSR instruction is typically used in conjunction with the (V)STMXCSR instruction, which stores the contents of the MXCSR register in memory.\nThe default MXCSR value at reset is 1F80H.\nIf a (V)LDMXCSR instruction clears a SIMD floating-point exception mask bit and sets the corresponding exception flag bit, a SIMD floating-point exception will not be immediately generated. The exception will be generated only upon the execution of the next instruction that meets both conditions below:\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\nIf VLDMXCSR is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "LDS", "Alias": [ "LES", "LFS", "LGS", "LSS" ], "Brief": "Load Far Pointer", "Description": "\nLoads a far pointer (segment selector and offset) from the second operand (source operand) into a segment register and the first operand (destination operand). The source operand specifies a 48-bit or a 32-bit pointer in memory depending on the current setting of the operand-size attribute (32 bits or 16 bits, respectively). The instruction opcode and the destination operand specify a segment register/general-purpose register pair. The 16-bit segment selector from the source operand is loaded into the segment register specified with the opcode (DS, SS, ES, FS, or GS). The 32-bit or 16-bit offset is loaded into the register specified with the destination operand.\nIf one of these instructions is executed in protected mode, additional information from the segment descriptor pointed to by the segment selector in the source operand is loaded in the hidden part of the selected segment register.\nAlso in protected mode, a NULL selector (values 0000 through 0003) can be loaded into DS, ES, FS, or GS registers without causing a protection exception. (Any subsequent reference to a segment whose corresponding segment register is loaded with a NULL selector, causes a general-protection exception (#GP) and no memory reference to the segment occurs.)\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.W promotes operation to specify a source operand referencing an 80-bit pointer (16-bit selector, 64-bit offset) in memory. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "LEA", "Alias": [], "Brief": "Load Effective Address", "Description": "\nComputes the effective address of the second operand (the source operand) and stores it in the first operand (destination operand). The source operand is a memory address (offset part) specified with one of the processors addressing modes; the destination operand is a general-purpose register. The address-size and operand-size attri-butes affect the action performed by this instruction, as shown in the following table. The operand-size attribute of the instruction is determined by the chosen register; the address-size attribute is determined by the attribute of the code segment.\nTable 3-63. Non-64-bit Mode LEA Operation with Address and Operand Size Attributes\n\n\nOperand Size\nAddress Size\nAction Performed\n\n16\n16\n16-bit effective address is calculated and stored in requested 16-bit register destination.\n\n16\n32\n32-bit effective address is calculated. The lower 16 bits of the address are stored in the requested 16-bit register destination.\n\n32\n16\n16-bit effective address is calculated. The 16-bit address is zero-extended and stored in the requested 32-bit register destination.\n\n32\n32\n32-bit effective address is calculated and stored in the requested 32-bit register destination.\nDifferent assemblers may use different algorithms based on the size attribute and symbolic reference of the source operand.\nIn 64-bit mode, the instruction’s destination operand is governed by operand size attribute, the default operand size is 32 bits. Address calculation is governed by address size attribute, the default address size is 64-bits. In 64-bit mode, address size of 16 bits is not encodable. See Table 3-64.\nTable 3-64. 64-bit Mode LEA Operation with Address and Operand Size Attributes\n\n\nOperand Size\nAddress Size\nAction Performed\n\n16\n32\n32-bit effective address is calculated (using 67H prefix). The lower 16 bits of the address are stored in the requested 16-bit register destination (using 66H prefix).\n\n16\n64\n64-bit effective address is calculated (default address size). The lower 16 bits of the address are stored in the requested 16-bit register destination (using 66H prefix).\n\n32\n32\n32-bit effective address is calculated (using 67H prefix) and stored in the requested 32-bit register destination.\n\n32\n64\n64-bit effective address is calculated (default address size) and the lower 32 bits of the address are stored in the requested 32-bit register destination.\n\n64\n32\n32-bit effective address is calculated (using 67H prefix), zero-extended to 64-bits, and stored in the requested 64-bit register destination (using REX.W).\n\n64\n64\n64-bit effective address is calculated (default address size) and all 64-bits of the address are stored in the requested 64-bit register destination (using REX.W).\n" }, { "Name": "LEAVE", "Alias": [], "Brief": "High Level Procedure Exit", "Description": "\nReleases the stack frame set up by an earlier ENTER instruction. The LEAVE instruction copies the frame pointer (in the EBP register) into the stack pointer register (ESP), which releases the stack space allocated to the stack frame. The old frame pointer (the frame pointer for the calling procedure that was saved by the ENTER instruction) is then popped from the stack into the EBP register, restoring the calling procedure’s stack frame.\nA RET instruction is commonly executed following a LEAVE instruction to return program control to the calling procedure.\nSee “Procedure Calls for Block-Structured Languages” in Chapter 7 of the Intel® 64 and IA-32 Architectures Soft-ware Developer’s Manual, Volume 1, for detailed information on the use of the ENTER and LEAVE instructions.\nIn 64-bit mode, the instruction’s default operation size is 64 bits; 32-bit operation cannot be encoded. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "LFENCE", "Alias": [], "Brief": "Load Fence", "Description": "\nPerforms a serializing operation on all load-from-memory instructions that were issued prior the LFENCE instruc-tion. Specifically, LFENCE does not execute until all prior instructions have completed locally, and no later instruc-tion begins execution until LFENCE completes. In particular, an instruction that loads from memory and that precedes an LFENCE receives data from memory prior to completion of the LFENCE. (An LFENCE that follows an instruction that stores to memory might complete before the data being stored have become globally visible.) Instructions following an LFENCE may be fetched from memory before the LFENCE, but they will not execute until the LFENCE completes.\nWeakly ordered memory types can be used to achieve higher processor performance through such techniques as out-of-order issue and speculative reads. The degree to which a consumer of data recognizes or knows that the data is weakly ordered varies among applications and may be unknown to the producer of this data. The LFENCE instruction provides a performance-efficient way of ensuring load ordering between routines that produce weakly-ordered results and routines that consume that data.\nProcessors are free to fetch and cache data speculatively from regions of system memory that use the WB, WC, and WT memory types. This speculative fetching can occur at any time and is not tied to instruction execution. Thus, it is not ordered with respect to executions of the LFENCE instruction; data can be brought into the caches specula-tively just before, during, or after the execution of an LFENCE instruction.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\nSpecification of the instruction's opcode above indicates a ModR/M byte of E8. For this instruction, the processor ignores the r/m field of the ModR/M byte. Thus, LFENCE is encoded by any opcode of the form 0F AE Ex, where x is in the range 8-F.\n" }, { "Name": "LGDT", "Alias": [ "LIDT" ], "Brief": "Load Global/Interrupt Descriptor Table Register", "Description": "\nLoads the values in the source operand into the global descriptor table register (GDTR) or the interrupt descriptor table register (IDTR). The source operand specifies a 6-byte memory location that contains the base address (a linear address) and the limit (size of table in bytes) of the global descriptor table (GDT) or the interrupt descriptor table (IDT). If operand-size attribute is 32 bits, a 16-bit limit (lower 2 bytes of the 6-byte data operand) and a 32-bit base address (upper 4 bytes of the data operand) are loaded into the register. If the operand-size attribute is 16 bits, a 16-bit limit (lower 2 bytes) and a 24-bit base address (third, fourth, and fifth byte) are loaded. Here, the high-order byte of the operand is not used and the high-order byte of the base address in the GDTR or IDTR is filled with zeros.\nThe LGDT and LIDT instructions are used only in operating-system software; they are not used in application programs. They are the only instructions that directly load a linear address (that is, not a segment-relative address) and a limit in protected mode. They are commonly executed in real-address mode to allow processor initialization prior to switching to protected mode.\nIn 64-bit mode, the instruction’s operand size is fixed at 8+2 bytes (an 8-byte base and a 2-byte limit). See the summary chart at the beginning of this section for encoding data and limits.\nSee “SGDT—Store Global Descriptor Table Register” in Chapter 4, Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B, for information on storing the contents of the GDTR and IDTR.\n" }, { "Name": "LLDT", "Alias": [], "Brief": "Load Local Descriptor Table Register", "Description": "\nLoads the source operand into the segment selector field of the local descriptor table register (LDTR). The source operand (a general-purpose register or a memory location) contains a segment selector that points to a local descriptor table (LDT). After the segment selector is loaded in the LDTR, the processor uses the segment selector to locate the segment descriptor for the LDT in the global descriptor table (GDT). It then loads the segment limit and base address for the LDT from the segment descriptor into the LDTR. The segment registers DS, ES, SS, FS, GS, and CS are not affected by this instruction, nor is the LDTR field in the task state segment (TSS) for the current task.\nIf bits 2-15 of the source operand are 0, LDTR is marked invalid and the LLDT instruction completes silently. However, all subsequent references to descriptors in the LDT (except by the LAR, VERR, VERW or LSL instructions) cause a general protection exception (#GP).\nThe operand-size attribute has no effect on this instruction.\nThe LLDT instruction is provided for use in operating-system software; it should not be used in application programs. This instruction can only be executed in protected mode or 64-bit mode.\nIn 64-bit mode, the operand size is fixed at 16 bits.\n" }, { "Name": "LMSW", "Alias": [], "Brief": "Load Machine Status Word", "Description": "\nLoads the source operand into the machine status word, bits 0 through 15 of register CR0. The source operand can be a 16-bit general-purpose register or a memory location. Only the low-order 4 bits of the source operand (which contains the PE, MP, EM, and TS flags) are loaded into CR0. The PG, CD, NW, AM, WP, NE, and ET flags of CR0 are not affected. The operand-size attribute has no effect on this instruction.\nIf the PE flag of the source operand (bit 0) is set to 1, the instruction causes the processor to switch to protected mode. While in protected mode, the LMSW instruction cannot be used to clear the PE flag and force a switch back to real-address mode.\nThe LMSW instruction is provided for use in operating-system software; it should not be used in application programs. In protected or virtual-8086 mode, it can only be executed at CPL 0.\nThis instruction is provided for compatibility with the Intel 286 processor; programs and procedures intended to run on the Pentium 4, Intel Xeon, P6 family, Pentium, Intel486, and Intel386 processors should use the MOV (control registers) instruction to load the whole CR0 register. The MOV CR0 instruction can be used to set and clear the PE flag in CR0, allowing a procedure or program to switch between protected and real-address modes.\nThis instruction is a serializing instruction.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode. Note that the operand size is fixed at 16 bits.\nSee “Changes to Instruction Behavior in VMX Non-Root Operation” in Chapter 25 of the Intel® 64 and IA-32 Archi-tectures Software Developer’s Manual, Volume 3C, for more information about the behavior of this instruction in VMX non-root operation.\n" }, { "Name": "LOCK", "Alias": [], "Brief": "Assert LOCK# Signal Prefix", "Description": "\nCauses the processor’s LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted.\nNote that, in later Intel 64 and IA-32 processors (including the Pentium 4, Intel Xeon, and P6 family processors), locking may occur without the LOCK# signal being asserted. See the “IA-32 Architecture Compatibility” section below.\nThe LOCK prefix can be prepended only to the following instructions and only to those forms of the instructions where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCH8B, CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. If the LOCK prefix is used with one of these instructions and the source operand is a memory operand, an undefined opcode exception (#UD) may be generated. An undefined opcode exception will also be generated if the LOCK prefix is used with any instruction not in the above list. The XCHG instruction always asserts the LOCK# signal regardless of the presence or absence of the LOCK prefix.\nThe LOCK prefix is typically used with the BTS instruction to perform a read-modify-write operation on a memory location in shared memory environment.\nThe integrity of the LOCK prefix is not affected by the alignment of the memory field. Memory locking is observed for arbitrarily misaligned fields.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "LODS", "Alias": [ "LODSB", "LODSW", "LODSD", "LODSQ" ], "Brief": "Load String", "Description": "\nLoads a byte, word, or doubleword from the source operand into the AL, AX, or EAX register, respectively. The source operand is a memory location, the address of which is read from the DS:ESI or the DS:SI registers (depending on the address-size attribute of the instruction, 32 or 16, respectively). The DS segment may be over-ridden with a segment override prefix.\nAt the assembly-code level, two forms of this instruction are allowed: the “explicit-operands” form and the “no-operands” form. The explicit-operands form (specified with the LODS mnemonic) allows the source operand to be specified explicitly. Here, the source operand should be a symbol that indicates the size and location of the source value. The destination operand is then automatically selected to match the size of the source operand (the AL register for byte operands, AX for word operands, and EAX for doubleword operands). This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the source operand symbol must specify the correct type (size) of the operand (byte, word, or doubleword), but it does not have to specify the correct location. The location is always specified by the DS:(E)SI registers, which must be loaded correctly before the load string instruction is executed.\nThe no-operands form provides “short forms” of the byte, word, and doubleword versions of the LODS instructions. Here also DS:(E)SI is assumed to be the source operand and the AL, AX, or EAX register is assumed to be the desti-nation operand. The size of the source and destination operands is selected with the mnemonic: LODSB (byte loaded into register AL), LODSW (word loaded into AX), or LODSD (doubleword loaded into EAX).\nAfter the byte, word, or doubleword is transferred from the memory location into the AL, AX, or EAX register, the (E)SI register is incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E)SI register is incremented; if the DF flag is 1, the ESI register is decremented.) The (E)SI register is incremented or decremented by 1 for byte operations, by 2 for word operations, or by 4 for doubleword operations.\nIn 64-bit mode, use of the REX.W prefix promotes operation to 64 bits. LODS/LODSQ load the quadword at address (R)SI into RAX. The (R)SI register is then incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register.\nThe LODS, LODSB, LODSW, and LODSD instructions can be preceded by the REP prefix for block loads of ECX bytes, words, or doublewords. More often, however, these instructions are used within a LOOP construct because further processing of the data moved into the register is usually necessary before the next transfer can be made. See “REP/REPE/REPZ /REPNE/REPNZ—Repeat String Operation Prefix” in Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B, for a description of the REP prefix.\n" }, { "Name": "LOOP", "Alias": [ "LOOPcc","LOOPE", "LOOPNE", "LOOPNZ", "LOOPZ" ], "Brief": "Loop According to ECX Counter", "Description": "\nPerforms a loop operation using the RCX, ECX or CX register as a counter (depending on whether address size is 64 bits, 32 bits, or 16 bits). Note that the LOOP instruction ignores REX.W; but 64-bit address size can be over-ridden using a 67H prefix.\nEach time the LOOP instruction is executed, the count register is decremented, then checked for 0. If the count is 0, the loop is terminated and program execution continues with the instruction following the LOOP instruction. If the count is not zero, a near jump is performed to the destination (target) operand, which is presumably the instruction at the beginning of the loop.\nThe target instruction is specified with a relative offset (a signed offset relative to the current value of the instruc-tion pointer in the IP/EIP/RIP register). This offset is generally specified as a label in assembly code, but at the machine code level, it is encoded as a signed, 8-bit immediate value, which is added to the instruction pointer. Offsets of –128 to +127 are allowed with this instruction.\nSome forms of the loop instruction (LOOPcc) also accept the ZF flag as a condition for terminating the loop before the count reaches zero. With these forms of the instruction, a condition code (cc) is associated with each instruc-tion to indicate the condition being tested for. Here, the LOOPcc instruction itself does not affect the state of the ZF flag; the ZF flag is changed by other instructions in the loop.\n" }, { "Name": "LSL", "Alias": [], "Brief": "Load Segment Limit", "Description": "\nLoads the unscrambled segment limit from the segment descriptor specified with the second operand (source operand) into the first operand (destination operand) and sets the ZF flag in the EFLAGS register. The source operand (which can be a register or a memory location) contains the segment selector for the segment descriptor being accessed. The destination operand is a general-purpose register.\nThe processor performs access checks as part of the loading process. Once loaded in the destination register, soft-ware can compare the segment limit with the offset of a pointer.\nThe segment limit is a 20-bit value contained in bytes 0 and 1 and in the first 4 bits of byte 6 of the segment descriptor. If the descriptor has a byte granular segment limit (the granularity flag is set to 0), the destination operand is loaded with a byte granular value (byte limit). If the descriptor has a page granular segment limit (the granularity flag is set to 1), the LSL instruction will translate the page granular limit (page limit) into a byte limit before loading it into the destination operand. The translation is performed by shifting the 20-bit “raw” limit left 12 bits and filling the low-order 12 bits with 1s.\nWhen the operand size is 32 bits, the 32-bit byte limit is stored in the destination operand. When the operand size is 16 bits, a valid 32-bit limit is computed; however, the upper 16 bits are truncated and only the low-order 16 bits are loaded into the destination operand.\nThis instruction performs the following checks before it loads the segment limit into the destination register:\nIf the segment descriptor cannot be accessed or is an invalid type for the instruction, the ZF flag is cleared and no value is loaded in the destination operand.\nTable 3-65. Segment and Gate Descriptor Types\n\n\nType\nProtected Mode\n\nIA-32e Mode\n\n\n\nName\nValid\nName\nValid\n\n\n0\n1\n2\n3\n4\n5\n6\n7\n8\n9\nA\nB\nC\nD\nE\nF\n\nReserved\nAvailable 16-bit TSS\nLDT\nBusy 16-bit TSS\n16-bit call gate\n16-bit/32-bit task gate\n16-bit interrupt gate\n16-bit trap gate\nReserved\nAvailable 32-bit TSS\nReserved\nBusy 32-bit TSS\n32-bit call gate\nReserved\n32-bit interrupt gate\n32-bit trap gate\n\nNo\nYes\nYes\nYes\nNo\nNo\nNo\nNo\nNo\nYes\nNo\nYes\nNo\nNo\nNo\nNo\n\nUpper 8 byte of a 16-Byte descriptor\nReserved\nLDT\nReserved\nReserved\nReserved\nReserved\nReserved\nReserved\n64-bit TSS\nReserved\nBusy 64-bit TSS\n64-bit call gate\nReserved\n64-bit interrupt gate\n64-bit trap gate\n\nYes\nNo\nYes\nNo\nNo\nNo\nNo\nNo\nNo\nYes\nNo\nYes\nNo\nNo\nNo\nNo\n" }, { "Name": "LTR", "Alias": [], "Brief": "Load Task Register", "Description": "\nLoads the source operand into the segment selector field of the task register. The source operand (a general-purpose register or a memory location) contains a segment selector that points to a task state segment (TSS). After the segment selector is loaded in the task register, the processor uses the segment selector to locate the segment descriptor for the TSS in the global descriptor table (GDT). It then loads the segment limit and base address for the TSS from the segment descriptor into the task register. The task pointed to by the task register is marked busy, but a switch to the task does not occur.\nThe LTR instruction is provided for use in operating-system software; it should not be used in application programs. It can only be executed in protected mode when the CPL is 0. It is commonly used in initialization code to establish the first task to be executed.\nThe operand-size attribute has no effect on this instruction.\nIn 64-bit mode, the operand size is still fixed at 16 bits. The instruction references a 16-byte descriptor to load the 64-bit base.\n" }, { "Name": "LZCNT", "Alias": [], "Brief": "Count the Number of Leading Zero Bits", "Description": "\nCounts the number of leading most significant zero bits in a source operand (second operand) returning the result into a destination (first operand).\nLZCNT differs from BSR. For example, LZCNT will produce the operand size when the input operand is zero. It should be noted that on processors that do not support LZCNT, the instruction byte encoding is executed as BSR.\nIn 64-bit mode 64-bit operand size requires REX.W=1.\n" }, { "Name": "MASKMOVDQU", "Alias": [], "Brief": "Store Selected Bytes of Double Quadword", "Description": "\nStores selected bytes from the source operand (first operand) into an 128-bit memory location. The mask operand (second operand) selects which bytes from the source operand are written to memory. The source and mask oper-ands are XMM registers. The memory location specified by the effective address in the DI/EDI/RDI register (the default segment register is DS, but this may be overridden with a segment-override prefix). The memory location does not need to be aligned on a natural boundary. (The size of the store address depends on the address-size attribute.)\nThe most significant bit in each byte of the mask operand determines whether the corresponding byte in the source operand is written to the corresponding byte location in memory: 0 indicates no write and 1 indicates write.\nThe MASKMOVDQU instruction generates a non-temporal hint to the processor to minimize cache pollution. The non-temporal hint is implemented by using a write combining (WC) memory type protocol (see “Caching of Temporal vs. Non-Temporal Data” in Chapter 10, of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1). Because the WC protocol uses a weakly-ordered memory consistency model, a fencing opera-tion implemented with the SFENCE or MFENCE instruction should be used in conjunction with MASKMOVDQU instructions if multiple processors might use different memory types to read/write the destination memory loca-tions.\nBehavior with a mask of all 0s is as follows:\nThe MASKMOVDQU instruction can be used to improve performance of algorithms that need to merge data on a byte-by-byte basis. MASKMOVDQU should not cause a read for ownership; doing so generates unnecessary band-width since data is to be written directly using the byte-mask without allocating old data prior to the store.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\nIf VMASKMOVDQU is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n1.ModRM.MOD = 011B required\n" }, { "Name": "MASKMOVQ", "Alias": [], "Brief": "Store Selected Bytes of Quadword", "Description": "\nStores selected bytes from the source operand (first operand) into a 64-bit memory location. The mask operand (second operand) selects which bytes from the source operand are written to memory. The source and mask oper-ands are MMX technology registers. The memory location specified by the effective address in the DI/EDI/RDI register (the default segment register is DS, but this may be overridden with a segment-override prefix). The memory location does not need to be aligned on a natural boundary. (The size of the store address depends on the address-size attribute.)\nThe most significant bit in each byte of the mask operand determines whether the corresponding byte in the source operand is written to the corresponding byte location in memory: 0 indicates no write and 1 indicates write.\nThe MASKMOVQ instruction generates a non-temporal hint to the processor to minimize cache pollution. The non-temporal hint is implemented by using a write combining (WC) memory type protocol (see “Caching of Temporal vs. Non-Temporal Data” in Chapter 10, of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1). Because the WC protocol uses a weakly-ordered memory consistency model, a fencing operation imple-mented with the SFENCE or MFENCE instruction should be used in conjunction with MASKMOVQ instructions if multiple processors might use different memory types to read/write the destination memory locations.\nThis instruction causes a transition from x87 FPU to MMX technology state (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]).\nThe behavior of the MASKMOVQ instruction with a mask of all 0s is as follows:\nThe MASKMOVQ instruction can be used to improve performance for algorithms that need to merge data on a byte-by-byte basis. It should not cause a read for ownership; doing so generates unnecessary bandwidth since data is to be written directly using the byte-mask without allocating old data prior to the store.\nIn 64-bit mode, the memory address is specified by DS:RDI.\n" }, { "Name": "MAXPD", "Alias": [], "Brief": "Return Maximum Packed Double", "Description": "\nPerforms an SIMD compare of the packed double-precision floating-point values in the first source operand and the second source operand and returns the maximum value for each pair of values to the destination operand.\nIf the values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. If a value in the second operand is an SNaN, that SNaN is forwarded unchanged to the destination (that is, a QNaN version of the SNaN is not returned).\nIf only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MAXPD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "MAXPS", "Alias": [], "Brief": "Return Maximum Packed Single", "Description": "\nPerforms an SIMD compare of the packed single-precision floating-point values in the first source operand and the second source operand and returns the maximum value for each pair of values to the destination operand.\nIf the values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. If a value in the second operand is an SNaN, that SNaN is forwarded unchanged to the destination (that is, a QNaN version of the SNaN is not returned).\nIf only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MAXPS can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "MAXSD", "Alias": [], "Brief": "Return Maximum Scalar Double", "Description": "\nCompares the low double-precision floating-point values in the first source operand and second the source operand, and returns the maximum value to the low quadword of the destination operand. The second source operand can be an XMM register or a 64-bit memory location. The first source and destination operands are XMM registers. When the second source operand is a memory operand, only 64 bits are accessed. The high quadword of the destination operand is copied from the same bits of first source operand.\nIf the values being compared are both 0.0s (of either sign), the value in the second source operand is returned. If a value in the second source operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).\nIf only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN of either source operand be returned, the action of MAXSD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.\nThe second source operand can be an XMM register or a 64-bit memory location. The first source and destination operands are XMM registers.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:64) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "MAXSS", "Alias": [], "Brief": "Return Maximum Scalar Single", "Description": "\nCompares the low single-precision floating-point values in the first source operand and the second source operand, and returns the maximum value to the low doubleword of the destination operand.\nIf the values being compared are both 0.0s (of either sign), the value in the second source operand is returned. If a value in the second source operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).\nIf only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN from either source operand be returned, the action of MAXSS can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.\nThe second source operand can be an XMM register or a 32-bit memory location. The first source and destination operands are XMM registers.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:32) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "MFENCE", "Alias": [], "Brief": "Memory Fence", "Description": "\nPerforms a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction.1 The MFENCE instruction is ordered with respect to all load and store instructions, other MFENCE instructions, any LFENCE and SFENCE instructions, and any serializing instructions (such as the CPUID instruction). MFENCE does not serialize the instruction stream.\nWeakly ordered memory types can be used to achieve higher processor performance through such techniques as out-of-order issue, speculative reads, write-combining, and write-collapsing. The degree to which a consumer of data recognizes or knows that the data is weakly ordered varies among applications and may be unknown to the producer of this data. The MFENCE instruction provides a performance-efficient way of ensuring load and store ordering between routines that produce weakly-ordered results and routines that consume that data.\nProcessors are free to fetch and cache data speculatively from regions of system memory that use the WB, WC, and WT memory types. This speculative fetching can occur at any time and is not tied to instruction execution. Thus, it is not ordered with respect to executions of the MFENCE instruction; data can be brought into the caches specula-tively just before, during, or after the execution of an MFENCE instruction.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\nSpecification of the instruction's opcode above indicates a ModR/M byte of F0. For this instruction, the processor ignores the r/m field of the ModR/M byte. Thus, MFENCE is encoded by any opcode of the form 0F AE Fx, where x is in the range 0-7.\n" }, { "Name": "MINPD", "Alias": [], "Brief": "Return Minimum Packed Double", "Description": "\nPerforms an SIMD compare of the packed double-precision floating-point values in the first source operand and the second source operand and returns the minimum value for each pair of values to the destination operand.\nIf the values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. If a value in the second operand is an SNaN, that SNaN is forwarded unchanged to the destination (that is, a QNaN version of the SNaN is not returned).\nIf only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MINPD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\n" }, { "Name": "MINPS", "Alias": [], "Brief": "Return Minimum Packed Single", "Description": "\nPerforms an SIMD compare of the packed single-precision floating-point values in the first source operand and the second source operand and returns the minimum value for each pair of values to the destination operand.\nIf the values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. If a value in the second operand is an SNaN, that SNaN is forwarded unchanged to the destination (that is, a QNaN version of the SNaN is not returned).\nIf only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MINPS can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "MINSD", "Alias": [], "Brief": "Return Minimum Scalar Double", "Description": "\nCompares the low double-precision floating-point values in the first source operand and the second source operand, and returns the minimum value to the low quadword of the destination operand. When the source operand is a memory operand, only the 64 bits are accessed. The high quadword of the destination operand is copied from the same bits in the first source operand.\nIf the values being compared are both 0.0s (of either sign), the value in the second source operand is returned. If a value in the second source operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).\nIf only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second source) be returned, the action of MINSD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.\nThe second source operand can be an XMM register or a 64-bit memory location. The first source and destination operands are XMM registers.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:64) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "MINSS", "Alias": [], "Brief": "Return Minimum Scalar Single", "Description": "\nCompares the low single-precision floating-point values in the first source operand and the second source operand and returns the minimum value to the low doubleword of the destination operand.\nIf the values being compared are both 0.0s (of either sign), the value in the second source operand is returned. If a value in the second operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).\nIf only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN in either source operand be returned, the action of MINSD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.\nThe second source operand can be an XMM register or a 32-bit memory location. The first source and destination operands are XMM registers.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:32) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "MONITOR", "Alias": [], "Brief": "Set Up Monitor Address", "Description": "\nThe MONITOR instruction arms address monitoring hardware using an address specified in EAX (the address range that the monitoring hardware checks for store operations can be determined by using CPUID). A store to an address within the specified address range triggers the monitoring hardware. The state of monitor hardware is used by MWAIT.\nThe content of EAX is an effective address (in 64-bit mode, RAX is used). By default, the DS segment is used to create a linear address that is monitored. Segment overrides can be used.\nECX and EDX are also used. They communicate other information to MONITOR. ECX specifies optional extensions. EDX specifies optional hints; it does not change the architectural behavior of the instruction. For the Pentium 4 processor (family 15, model 3), no extensions or hints are defined. Undefined hints in EDX are ignored by the processor; undefined extensions in ECX raises a general protection fault.\nThe address range must use memory of the write-back type. Only write-back memory will correctly trigger the monitoring hardware. Additional information on determining what address range to use in order to prevent false wake-ups is described in Chapter 8, “Multiple-Processor Management” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.\nThe MONITOR instruction is ordered as a load operation with respect to other memory transactions. The instruction is subject to the permission checking and faults associated with a byte load. Like a load, MONITOR sets the A-bit but not the D-bit in page tables.\nCPUID.01H:ECX.MONITOR[bit 3] indicates the availability of MONITOR and MWAIT in the processor. When set, MONITOR may be executed only at privilege level 0 (use at any other privilege level results in an invalid-opcode exception). The operating system or system BIOS may disable this instruction by using the IA32_MISC_ENABLE MSR; disabling MONITOR clears the CPUID feature flag and causes execution to generate an invalid-opcode excep-tion.\nThe instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "MOV", "Alias": [], "Brief": "Move", "Description": "\nCopies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, general-purpose register, segment register, or memory location; the destination register can be a general-purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, a doubleword, or a quadword.\nThe MOV instruction cannot be used to load the CS register. Attempting to do so results in an invalid opcode excep-tion (#UD). To load the CS register, use the far JMP, CALL, or RET instruction.\nIf the destination operand is a segment register (DS, ES, FS, GS, or SS), the source operand must be a valid segment selector. In protected mode, moving a segment selector into a segment register automatically causes the segment descriptor information associated with that segment selector to be loaded into the hidden (shadow) part of the segment register. While loading this information, the segment selector and segment descriptor information is validated (see the “Operation” algorithm below). The segment descriptor data is obtained from the GDT or LDT entry for the specified segment selector.\nA NULL segment selector (values 0000-0003) can be loaded into the DS, ES, FS, and GS registers without causing a protection exception. However, any subsequent attempt to reference a segment whose corresponding segment register is loaded with a NULL value causes a general protection exception (#GP) and no memory reference occurs.\nLoading the SS register with a MOV instruction inhibits all interrupts until after the execution of the next instruc-tion. This operation allows a stack pointer to be loaded into the ESP register with the next instruction (MOV ESP, stack-pointer value) before an interrupt occurs1. Be aware that the LSS instruction offers a more efficient method of loading the SS and ESP registers.\nWhen operating in 32-bit mode and moving data between a segment register and a general-purpose register, the 32-bit IA-32 processors do not require the use of the 16-bit operand-size prefix (a byte with the value 66H) with\n1.\nIf a code instruction breakpoint (for debug) is placed on an instruction located immediately after a MOV SS instruction, the break-point may not be triggered. However, in a sequence of instructions that load the SS register, only the first instruction in the sequence is guaranteed to delay an interrupt.\nIn the following sequence, interrupts may be recognized before MOV ESP, EBP executes:\nMOV SS, EDX MOV SS, EAX MOV ESP, EBP\nthis instruction, but most assemblers will insert it if the standard form of the instruction is used (for example, MOV DS, AX). The processor will execute this instruction correctly, but it will usually require an extra clock. With most assemblers, using the instruction form MOV DS, EAX will avoid this unneeded 66H prefix. When the processor executes the instruction with a 32-bit general-purpose register, it assumes that the 16 least-significant bits of the general-purpose register are the destination or source operand. If the register is a destination operand, the resulting value in the two high-order bytes of the register is implementation dependent. For the Pentium 4, Intel Xeon, and P6 family processors, the two high-order bytes are filled with zeros; for earlier 32-bit IA-32 processors, the two high order bytes are undefined.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "MOVAPD", "Alias": [], "Brief": "Move Aligned Packed Double", "Description": "\nMoves 2 or 4 double-precision floating-point values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load an XMM or YMM register from an 128-bit or 256-bit memory location, to store the contents of an XMM or YMM register into a 128-bit or 256-bit memory location, or to move data between two XMM or two YMM registers. When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte (128-bit version) or 32-byte (VEX.256 encoded version) boundary or a general-protection exception (#GP) will be generated.\nTo move double-precision floating-point values to and from unaligned memory locations, use the (V)MOVUPD instruction.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit versions: Moves 128 bits of packed double-precision floating-point values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load an XMM register from a 128-bit memory location, to store the contents of an XMM register into a 128-bit memory location, or to move data between two XMM registers. When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated. To move single-precision floating-point values to and from unaligned memory locations, use the VMOVUPD instruction.\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register destination are zeroed.\nVEX.256 encoded version: Moves 256 bits of packed double-precision floating-point values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load a YMM register from a 256-bit memory location, to store the contents of a YMM register into a 256-bit memory location, or to move data between two YMM registers. When the source or destination operand is a memory operand, the operand must be aligned on a 32-byte boundary or a general-protection exception (#GP) will be generated. To move single-precision floating-point values to and from unaligned memory locations, use the VMOVUPD instruc-tion.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "MOVAPS", "Alias": [], "Brief": "Move Aligned Packed Single", "Description": "\nMoves 4 or8 single-precision floating-point values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load an XMM or YMM register from an 128-bit or 256-bit memory location, to store the contents of an XMM or YMM register into a 128-bit or 256-bit memory location, or to move data between two XMM or two YMM registers. When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte (128-bit version) or 32-byte (VEX.256 encoded version) boundary or a general-protection exception (#GP) will be generated.\nTo move single-precision floating-point values to and from unaligned memory locations, use the (V)MOVUPS instruction.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n128-bit versions:\nMoves 128 bits of packed single-precision floating-point values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load an XMM register from a 128-bit memory location, to store the contents of an XMM register into a 128-bit memory location, or to move data between two XMM registers. When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated. To move single-precision floating-point values to and from unaligned memory locations, use the VMOVUPS instruction.\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version:\nMoves 256 bits of packed single-precision floating-point values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load a YMM register from a 256-bit memory location, to store the contents of a YMM register into a 256-bit memory location, or to move data between two YMM registers.\n" }, { "Name": "MOVBE", "Alias": [], "Brief": "Move Data After Swapping Bytes", "Description": "\nPerforms a byte swap operation on the data copied from the second operand (source operand) and store the result in the first operand (destination operand). The source operand can be a general-purpose register, or memory loca-tion; the destination register can be a general-purpose register, or a memory location; however, both operands can not be registers, and only one operand can be a memory location. Both operands must be the same size, which can be a word, a doubleword or quadword.\nThe MOVBE instruction is provided for swapping the bytes on a read from memory or on a write to memory; thus providing support for converting little-endian values to big-endian format and vice versa.\nIn 64-bit mode, the instruction's default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "MOVD", "Alias": [ "MOVQ" ], "Brief": "Move Doubleword/Move Quadword", "Description": "\nCopies a doubleword from the source operand (second operand) to the destination operand (first operand). The source and destination operands can be general-purpose registers, MMX technology registers, XMM registers, or 32-bit memory locations. This instruction can be used to move a doubleword to and from the low doubleword of an MMX technology register and a general-purpose register or a 32-bit memory location, or to and from the low doubleword of an XMM register and a general-purpose register or a 32-bit memory location. The instruction cannot be used to transfer data between MMX technology registers, between XMM registers, between general-purpose registers, or between memory locations.\nWhen the destination operand is an MMX technology register, the source operand is written to the low doubleword of the register, and the register is zero-extended to 64 bits. When the destination operand is an XMM register, the source operand is written to the low doubleword of the register, and the register is zero-extended to 128 bits.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "MOVDDUP", "Alias": [], "Brief": "Move One Double", "Description": "\nThe linear address corresponds to the address of the least-significant byte of the referenced memory data. When a memory address is indicated, the 8 bytes of data at memory location m64 are loaded. When the register-register form of this operation is used, the lower half of the 128-bit source register is duplicated and copied into the 128-bit destination register. See Figure 3-24.\n\nMOVDDUP xmm1, xmm2/m64\nxmm2/m64\n[63:0]\nRESULT:\nxmm1[127:64] xmm2/m64[63:0]\nxmm1[63:0] xmm2/m64[63:0]\nxmm1\n[127:64]\n[63:0]\nOM15997\n\n\n\n\n\n\n\n\nFigure 3-24. MOVDDUP—Move One Double-FP and Duplicate\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n" }, { "Name": "MOVDQ2Q", "Alias": [], "Brief": "Move Quadword from XMM to MMX Technology Register", "Description": "\nMoves the low quadword from the source operand (second operand) to the destination operand (first operand). The source operand is an XMM register and the destination operand is an MMX technology register.\nThis instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the MOVDQ2Q instruction is executed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n" }, { "Name": "MOVDQA", "Alias": [], "Brief": "Move Aligned Double Quadword", "Description": "\n128-bit versions:\nMoves 128 bits of packed integer values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load an XMM register from a 128-bit memory location, to store the contents of an XMM register into a 128-bit memory location, or to move data between two XMM registers.\nWhen the source or destination operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated. To move integer data to and from unaligned memory locations, use the VMOVDQU instruction.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version:\nMoves 256 bits of packed integer values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load a YMM register from a 256-bit memory location, to store the contents of a YMM register into a 256-bit memory location, or to move data between two YMM registers.\nWhen the source or destination operand is a memory operand, the operand must be aligned on a 32-byte boundary or a general-protection exception (#GP) will be generated. To move integer data to and from unaligned memory locations, use the VMOVDQU instruction.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "MOVDQU", "Alias": [], "Brief": "Move Unaligned Double Quadword", "Description": "\n128-bit versions:\nMoves 128 bits of packed integer values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load an XMM register from a 128-bit memory location, to store the contents of an XMM register into a 128-bit memory location, or to move data between two XMM registers. When the source or destination operand is a memory operand, the operand may be unaligned on a 16-byte boundary without causing a general-protection exception (#GP) to be generated.1\nTo move a double quadword to or from memory locations that are known to be aligned on 16-byte boundaries, use the MOVDQA instruction.\nWhile executing in 16-bit addressing mode, a linear address for a 128-bit data access that overlaps the end of a 16-bit segment is not allowed and is defined as reserved behavior. A specific processor implementation may or may not generate a general-protection exception (#GP) in this situation, and the address that spans the end of the segment may or may not wrap around to the beginning of the segment.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nWhen the source or destination operand is a memory operand, the operand may be unaligned to any alignment without causing a general-protection exception (#GP) to be generated\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: Moves 256 bits of packed integer values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load a YMM register from a 256-bit memory\n1.\nIf alignment checking is enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check exception (#AC) may or may not be generated (depending on processor implementation) when the operand is not aligned on an 8-byte boundary.\nlocation, to store the contents of a YMM register into a 256-bit memory location, or to move data between two YMM registers.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "MOVHLPS", "Alias": [], "Brief": "Move Packed Single", "Description": "\nThis instruction cannot be used for memory to register moves.\n128-bit two-argument form:\nMoves two packed single-precision floating-point values from the high quadword of the second XMM argument (second operand) to the low quadword of the first XMM register (first argument). The high quadword of the desti-nation operand is left unchanged. Bits (VLMAX-1:64) of the corresponding YMM destination register are unmodi-fied.\n128-bit three-argument form\nMoves two packed single-precision floating-point values from the high quadword of the third XMM argument (third operand) to the low quadword of the destination (first operand). Copies the high quadword from the second XMM argument (second operand) to the high quadword of the destination (first operand). Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nIf VMOVHLPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n" }, { "Name": "MOVHPD", "Alias": [], "Brief": "Move High Packed Double", "Description": "\nThis instruction cannot be used for register to register or memory to memory moves.\n128-bit Legacy SSE load:\nMoves a double-precision floating-point value from the source 64-bit memory operand and stores it in the high 64-bits of the destination XMM register. The lower 64bits of the XMM register are preserved. The upper 128-bits of the corresponding YMM destination register are preserved.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nVEX.128 encoded load:\nLoads a double-precision floating-point value from the source 64-bit memory operand (third operand) and stores it in the upper 64-bits of the destination XMM register (first operand). The low 64-bits from second XMM register (second operand) are stored in the lower 64-bits of the destination. The upper 128-bits of the destination YMM register are zeroed.\n128-bit store:\nStores a double-precision floating-point value from the high 64-bits of the XMM register source (second operand) to the 64-bit memory location (first operand).\nNote: VMOVHPD (store) (VEX.128.66.0F 17 /r) is legal and has the same behavior as the existing 66 0F 17 store. For VMOVHPD (store) (VEX.128.66.0F 17 /r) instruction version, VEX.vvvv is reserved and must be 1111b other-wise instruction will #UD.\nIf VMOVHPD is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n" }, { "Name": "MOVHPS", "Alias": [], "Brief": "Move High Packed Single", "Description": "\nThis instruction cannot be used for register to register or memory to memory moves.\n128-bit Legacy SSE load:\nMoves two packed single-precision floating-point values from the source 64-bit memory operand and stores them in the high 64-bits of the destination XMM register. The lower 64bits of the XMM register are preserved. The upper 128-bits of the corresponding YMM destination register are preserved.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nVEX.128 encoded load:\nLoads two single-precision floating-point values from the source 64-bit memory operand (third operand) and stores it in the upper 64-bits of the destination XMM register (first operand). The low 64-bits from second XMM register (second operand) are stored in the lower 64-bits of the destination. The upper 128-bits of the destination YMM register are zeroed.\n128-bit store:\nStores two packed single-precision floating-point values from the high 64-bits of the XMM register source (second operand) to the 64-bit memory location (first operand).\nNote: VMOVHPS (store) (VEX.NDS.128.0F 17 /r) is legal and has the same behavior as the existing 0F 17 store.\nFor VMOVHPS (store) (VEX.NDS.128.0F 17 /r) instruction version, VEX.vvvv is reserved and must be 1111b other-\nwise instruction will #UD.\nIf VMOVHPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n" }, { "Name": "MOVLHPS", "Alias": [], "Brief": "Move Packed Single", "Description": "\nThis instruction cannot be used for memory to register moves.\n128-bit two-argument form:\nMoves two packed single-precision floating-point values from the low quadword of the second XMM argument (second operand) to the high quadword of the first XMM register (first argument). The low quadword of the desti-nation operand is left unchanged. The upper 128 bits of the corresponding YMM destination register are unmodi-fied.\n128-bit three-argument form\nMoves two packed single-precision floating-point values from the low quadword of the third XMM argument (third operand) to the high quadword of the destination (first operand). Copies the low quadword from the second XMM argument (second operand) to the low quadword of the destination (first operand). The upper 128-bits of the destination YMM register are zeroed.\nIf VMOVLHPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n" }, { "Name": "MOVLPD", "Alias": [], "Brief": "Move Low Packed Double", "Description": "\nThis instruction cannot be used for register to register or memory to memory moves.\n128-bit Legacy SSE load:\nMoves a double-precision floating-point value from the source 64-bit memory operand and stores it in the low 64-bits of the destination XMM register. The upper 64bits of the XMM register are preserved. The upper 128-bits of the corresponding YMM destination register are preserved.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nVEX.128 encoded load:\nLoads a double-precision floating-point value from the source 64-bit memory operand (third operand), merges it with the upper 64-bits of the first source XMM register (second operand), and stores it in the low 128-bits of the destination XMM register (first operand). The upper 128-bits of the destination YMM register are zeroed.\n128-bit store:\nStores a double-precision floating-point value from the low 64-bits of the XMM register source (second operand) to the 64-bit memory location (first operand).\nNote: VMOVLPD (store) (VEX.128.66.0F 13 /r) is legal and has the same behavior as the existing 66 0F 13 store. For VMOVLPD (store) (VEX.128.66.0F 13 /r) instruction version, VEX.vvvv is reserved and must be 1111b other-wise instruction will #UD.\nIf VMOVLPD is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n" }, { "Name": "MOVLPS", "Alias": [], "Brief": "Move Low Packed Single", "Description": "\nThis instruction cannot be used for register to register or memory to memory moves.\n128-bit Legacy SSE load:\nMoves two packed single-precision floating-point values from the source 64-bit memory operand and stores them in the low 64-bits of the destination XMM register. The upper 64bits of the XMM register are preserved. The upper 128-bits of the corresponding YMM destination register are preserved.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nVEX.128 encoded load:\nLoads two packed single-precision floating-point values from the source 64-bit memory operand (third operand), merges them with the upper 64-bits of the first source XMM register (second operand), and stores them in the low 128-bits of the destination XMM register (first operand). The upper 128-bits of the destination YMM register are zeroed.\n128-bit store:\nLoads two packed single-precision floating-point values from the low 64-bits of the XMM register source (second operand) to the 64-bit memory location (first operand).\nNote: VMOVLPS (store) (VEX.128.0F 13 /r) is legal and has the same behavior as the existing 0F 13 store. For VMOVLPS (store) (VEX.128.0F 13 /r) instruction version, VEX.vvvv is reserved and must be 1111b otherwise instruction will #UD.\nIf VMOVLPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n" }, { "Name": "MOVMSKPD", "Alias": [], "Brief": "Extract Packed Double", "Description": "\nExtracts the sign bits from the packed double-precision floating-point values in the source operand (second operand), formats them into a 2-bit mask, and stores the mask in the destination operand (first operand). The source operand is an XMM register, and the destination operand is a general-purpose register. The mask is stored in the 2 low-order bits of the destination operand. Zero-extend the upper bits of the destination.\nIn 64-bit mode, the instruction can access additional registers (XMM8-XMM15, R8-R15) when used with a REX.R prefix. The default operand size is 64-bit in 64-bit mode.\n128-bit versions: The source operand is a YMM register. The destination operand is a general purpose register.\nVEX.256 encoded version: The source operand is a YMM register. The destination operand is a general purpose register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "MOVMSKPS", "Alias": [], "Brief": "Extract Packed Single", "Description": "\nExtracts the sign bits from the packed single-precision floating-point values in the source operand (second operand), formats them into a 4- or 8-bit mask, and stores the mask in the destination operand (first operand). The source operand is an XMM or YMM register, and the destination operand is a general-purpose register. The mask is stored in the 4 or 8 low-order bits of the destination operand. The upper bits of the destination operand beyond the mask are filled with zeros.\nIn 64-bit mode, the instruction can access additional registers (XMM8-XMM15, R8-R15) when used with a REX.R prefix. The default operand size is 64-bit in 64-bit mode.\n128-bit versions: The source operand is a YMM register. The destination operand is a general purpose register.\nVEX.256 encoded version: The source operand is a YMM register. The destination operand is a general purpose register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "MOVNTDQ", "Alias": [], "Brief": "Store Double Quadword Using Non", "Description": "\nMoves the packed integers in the source operand (second operand) to the destination operand (first operand) using a non-temporal hint to prevent caching of the data during the write to memory. The source operand is an XMM register or YMM register, which is assumed to contain integer data (packed bytes, words, doublewords, or quad-words). The destination operand is a 128-bit or 256-bit memory location. The memory operand must be aligned on a 16-byte (128-bit version) or 32-byte (VEX.256 encoded version) boundary otherwise a general-protection exception (#GP) will be generated.\nThe non-temporal hint is implemented by using a write combining (WC) memory type protocol when writing the data to memory. Using this protocol, the processor does not write the data into the cache hierarchy, nor does it fetch the corresponding cache line from memory into the cache hierarchy. The memory type of the region being written to can override the non-temporal hint, if the memory address specified for the non-temporal store is in an uncacheable (UC) or write protected (WP) memory region. For more information on non-temporal stores, see “Caching of Temporal vs. Non-Temporal Data” in Chapter 10 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\nBecause the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with the SFENCE or MFENCE instruction should be used in conjunction with MOVNTDQ instructions if multiple processors might use different memory types to read/write the destination memory locations.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nNote: In VEX-128 encoded versions, VEX.vvvv is reserved and must be 1111b, VEX.L must be 0; otherwise instruc-tions will #UD.\n" }, { "Name": "MOVNTDQA", "Alias": [], "Brief": "Load Double Quadword Non", "Description": "\n(V)MOVNTDQA loads a double quadword from the source operand (second operand) to the destination operand (first operand) using a non-temporal hint. A processor implementation may make use of the non-temporal hint associated with this instruction if the memory source is WC (write combining) memory type. An implementation may also make use of the non-temporal hint associated with this instruction if the memory source is WB (write back) memory type.\nA processor’s implementation of the non-temporal hint does not override the effective memory type semantics, but the implementation of the hint is processor dependent. For example, a processor implementation may choose to ignore the hint and process the instruction as a normal MOVDQA for any memory type. Another implementation of the hint for WC memory type may optimize data transfer throughput of WC reads. A third implementation may optimize cache reads generated by (V)MOVNTDQA on WB memory type to reduce cache evictions.\nWC Streaming Load Hint\nFor WC memory type in particular, the processor never appears to read the data into the cache hierarchy. Instead, the non-temporal hint may be implemented by loading a temporary internal buffer with the equivalent of an aligned cache line without filling this data to the cache. Any memory-type aliased lines in the cache will be snooped and flushed. Subsequent MOVNTDQA reads to unread portions of the WC cache line will receive data from the temporary internal buffer if data is available. The temporary internal buffer may be flushed by the processor at any time for any reason, for example:\nThe memory type of the region being read can override the non-temporal hint, if the memory address specified for the non-temporal read is not a WC memory region. Information on non-temporal reads and writes can be found in Chapter 11, “Memory Cache Control” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.\nBecause the WC protocol uses a weakly-ordered memory consistency model, an MFENCE or locked instruction should be used in conjunction with MOVNTDQA instructions if multiple processors might reference the same WC memory locations or in order to synchronize reads of a processor with writes by other agents in the system. Because of the speculative nature of fetching due to MOVNTDQA, Streaming loads must not be used to reference memory addresses that are mapped to I/O devices having side effects or when reads to these devices are destruc-\ntive. For additional information on MOVNTDQA usages, see Section 12.10.3 in Chapter 12, “Programming with SSE3, SSSE3 and SSE4” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\nThe 128-bit (V)MOVNTDQA addresses must be 16-byte aligned or the instruction will cause a #GP.\nThe 256-bit VMOVNTDQA addresses must be 32-byte aligned or the instruction will cause a #GP.\nNote: In VEX-128 encoded versions, VEX.vvvv is reserved and must be 1111b, VEX.L must be 0; otherwise instruc-tions will #UD.\n" }, { "Name": "MOVNTI", "Alias": [], "Brief": "Store Doubleword Using Non", "Description": "\nMoves the doubleword integer in the source operand (second operand) to the destination operand (first operand) using a non-temporal hint to minimize cache pollution during the write to memory. The source operand is a general-purpose register. The destination operand is a 32-bit memory location.\nThe non-temporal hint is implemented by using a write combining (WC) memory type protocol when writing the data to memory. Using this protocol, the processor does not write the data into the cache hierarchy, nor does it fetch the corresponding cache line from memory into the cache hierarchy. The memory type of the region being written to can override the non-temporal hint, if the memory address specified for the non-temporal store is in an uncacheable (UC) or write protected (WP) memory region. For more information on non-temporal stores, see “Caching of Temporal vs. Non-Temporal Data” in Chapter 10 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\nBecause the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with the SFENCE or MFENCE instruction should be used in conjunction with MOVNTI instructions if multiple proces-sors might use different memory types to read/write the destination memory locations.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "MOVNTPD", "Alias": [], "Brief": "Store Packed Double", "Description": "\nMoves the packed double-precision floating-point values in the source operand (second operand) to the destination operand (first operand) using a non-temporal hint to prevent caching of the data during the write to memory. The source operand is an XMM register or YMM register, which is assumed to contain packed double-precision, floating-pointing data. The destination operand is a 128-bit or 256-bit memory location. The memory operand must be aligned on a 16-byte (128-bit version) or 32-byte (VEX.256 encoded version) boundary otherwise a general-protection exception (#GP) will be generated.\nThe non-temporal hint is implemented by using a write combining (WC) memory type protocol when writing the data to memory. Using this protocol, the processor does not write the data into the cache hierarchy, nor does it fetch the corresponding cache line from memory into the cache hierarchy. The memory type of the region being written to can override the non-temporal hint, if the memory address specified for the non-temporal store is in an uncacheable (UC) or write protected (WP) memory region. For more information on non-temporal stores, see “Caching of Temporal vs. Non-Temporal Data” in Chapter 10 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\nBecause the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with the SFENCE or MFENCE instruction should be used in conjunction with MOVNTPD instructions if multiple processors might use different memory types to read/write the destination memory locations.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nNote: In VEX-128 encoded versions, VEX.vvvv is reserved and must be 1111b, VEX.L must be 0; otherwise instruc-tions will #UD.\n" }, { "Name": "MOVNTPS", "Alias": [], "Brief": "Store Packed Single", "Description": "\nMoves the packed single-precision floating-point values in the source operand (second operand) to the destination operand (first operand) using a non-temporal hint to prevent caching of the data during the write to memory. The source operand is an XMM register or YMM register, which is assumed to contain packed single-precision, floating-pointing. The destination operand is a 128-bit or 256-bit memory location. The memory operand must be aligned on a 16-byte (128-bit version) or 32-byte (VEX.256 encoded version) boundary otherwise a general-protection exception (#GP) will be generated.\nThe non-temporal hint is implemented by using a write combining (WC) memory type protocol when writing the data to memory. Using this protocol, the processor does not write the data into the cache hierarchy, nor does it fetch the corresponding cache line from memory into the cache hierarchy. The memory type of the region being written to can override the non-temporal hint, if the memory address specified for the non-temporal store is in an uncacheable (UC) or write protected (WP) memory region. For more information on non-temporal stores, see “Caching of Temporal vs. Non-Temporal Data” in Chapter 10 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\nBecause the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with the SFENCE or MFENCE instruction should be used in conjunction with MOVNTPS instructions if multiple processors might use different memory types to read/write the destination memory locations.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "MOVNTQ", "Alias": [], "Brief": "Store of Quadword Using Non", "Description": "\nMoves the quadword in the source operand (second operand) to the destination operand (first operand) using a non-temporal hint to minimize cache pollution during the write to memory. The source operand is an MMX tech-nology register, which is assumed to contain packed integer data (packed bytes, words, or doublewords). The destination operand is a 64-bit memory location.\nThe non-temporal hint is implemented by using a write combining (WC) memory type protocol when writing the data to memory. Using this protocol, the processor does not write the data into the cache hierarchy, nor does it fetch the corresponding cache line from memory into the cache hierarchy. The memory type of the region being written to can override the non-temporal hint, if the memory address specified for the non-temporal store is in an uncacheable (UC) or write protected (WP) memory region. For more information on non-temporal stores, see “Caching of Temporal vs. Non-Temporal Data” in Chapter 10 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\nBecause the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with the SFENCE or MFENCE instruction should be used in conjunction with MOVNTQ instructions if multiple proces-sors might use different memory types to read/write the destination memory locations.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "MOVQ2DQ", "Alias": [], "Brief": "Move Quadword from MMX Technology to XMM Register", "Description": "\nMoves the quadword from the source operand (second operand) to the low quadword of the destination operand (first operand). The source operand is an MMX technology register and the destination operand is an XMM register.\nThis instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the MOVQ2DQ instruction is executed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n" }, { "Name": "MOVS", "Alias": [ "MOVSB", "MOVSW", "MOVSD", "MOVSQ" ], "Brief": "Move Data from String to String", "Description": "\nMoves the byte, word, or doubleword specified with the second operand (source operand) to the location specified with the first operand (destination operand). Both the source and destination operands are located in memory. The address of the source operand is read from the DS:ESI or the DS:SI registers (depending on the address-size attri-bute of the instruction, 32 or 16, respectively). The address of the destination operand is read from the ES:EDI or the ES:DI registers (again depending on the address-size attribute of the instruction). The DS segment may be overridden with a segment override prefix, but the ES segment cannot be overridden.\nAt the assembly-code level, two forms of this instruction are allowed: the “explicit-operands” form and the “no-operands” form. The explicit-operands form (specified with the MOVS mnemonic) allows the source and destination operands to be specified explicitly. Here, the source and destination operands should be symbols that indicate the size and location of the source value and the destination, respectively. This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the source and destination operand symbols must specify the correct type (size) of the operands (bytes, words, or doublewords), but they do not have to specify the correct location. The locations of the source and destination operands are always specified by the DS:(E)SI and ES:(E)DI registers, which must be loaded correctly before the move string instruction is executed.\nThe no-operands form provides “short forms” of the byte, word, and doubleword versions of the MOVS instruc-tions. Here also DS:(E)SI and ES:(E)DI are assumed to be the source and destination operands, respectively. The size of the source and destination operands is selected with the mnemonic: MOVSB (byte move), MOVSW (word move), or MOVSD (doubleword move).\nAfter the move operation, the (E)SI and (E)DI registers are incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E)SI and (E)DI register are incre-\nmented; if the DF flag is 1, the (E)SI and (E)DI registers are decremented.) The registers are incremented or decremented by 1 for byte operations, by 2 for word operations, or by 4 for doubleword operations.\nNOTE\nTo improve performance, more recent processors support modifications to the processor’s operation during the string store operations initiated with MOVS and MOVSB. See Section 7.3.9.3 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 for additional information on fast-string operation.\nThe MOVS, MOVSB, MOVSW, and MOVSD instructions can be preceded by the REP prefix (see “REP/REPE/REPZ /REPNE/REPNZ—Repeat String Operation Prefix” in Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B, for a description of the REP prefix) for block moves of ECX bytes, words, or doublewords.\nIn 64-bit mode, the instruction’s default address size is 64 bits, 32-bit address size is supported using the prefix 67H. The 64-bit addresses are specified by RSI and RDI; 32-bit address are specified by ESI and EDI. Use of the REX.W prefix promotes doubleword operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "MOVSHDUP", "Alias": [], "Brief": "Move Packed Single", "Description": "\nThe linear address corresponds to the address of the least-significant byte of the referenced memory data. When a memory address is indicated, the 16 bytes of data at memory location m128 are loaded and the single-precision elements in positions 1 and 3 are duplicated. When the register-register form of this operation is used, the same operation is performed but with data coming from the 128-bit source register. See Figure 3-25.\n\nMOVSHDUP xmm1, xmm2/m128\nxmm2/\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nm128\nxmm1[127:96]\nxmm1[95:64]\nxmm1[63:32]\nxmm1[31:0]\nRESULT:\nxmm2/\nxmm2/\nxmm2/\nxmm2/\nxmm1\nm128[127:96]\nm128[127:96]\nm128[63:32]\nm128[63:32]\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nOM15998\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 3-25. MOVSHDUP—Move Packed Single-FP High and Duplicate\nIn 64-bit mode, use of the REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "MOVSLDUP", "Alias": [], "Brief": "Move Packed Single", "Description": "\nThe linear address corresponds to the address of the least-significant byte of the referenced memory data. When a memory address is indicated, the 16 bytes of data at memory location m128 are loaded and the single-precision elements in positions 0 and 2 are duplicated. When the register-register form of this operation is used, the same operation is performed but with data coming from the 128-bit source register.\nSee Figure 3-26.\n\nMOVSLDUP xmm1, xmm2/m128\nxmm2/\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nm128\nxmm1[127:96]\nxmm1[95:64]\nxmm1[63:32]\nxmm1[31:0]\nRESULT:\nxmm2/\nxmm2/\nxmm2/\nxmm2/\nxmm1\nm128[95:64]\nm128[95:64]\nm128[31:0]\nm128[31:0]\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nOM15999\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 3-26. MOVSLDUP—Move Packed Single-FP Low and Duplicate\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "MOVSS", "Alias": [], "Brief": "Move Scalar Single", "Description": "\nMoves a scalar single-precision floating-point value from the source operand (second operand) to the destination operand (first operand). The source and destination operands can be XMM registers or 32-bit memory locations. This instruction can be used to move a single-precision floating-point value to and from the low doubleword of an XMM register and a 32-bit memory location, or to move a single-precision floating-point value between the low doublewords of two XMM registers. The instruction cannot be used to transfer data between memory locations.\nFor non-VEX encoded syntax and when the source and destination operands are XMM registers, the high double-words of the destination operand remains unchanged. When the source operand is a memory location and destina-tion operand is an XMM registers, the high doublewords of the destination operand is cleared to all 0s.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nVEX encoded instruction syntax supports two source operands and a destination operand if ModR/M.mod field is 11B. VEX.vvvv is used to encode the first source operand (the second operand). The low 128 bits of the destination operand stores the result of merging the low dword of the second source operand with three dwords in bits 127:32 of the first source operand. The upper bits of the destination operand are cleared.\nNote: For the “VMOVSS m32, xmm1” (memory store form) instruction version, VEX.vvvv is reserved and must be 1111b otherwise instruction will #UD.\nNote: For the “VMOVSS xmm1, m32” (memory load form) instruction version, VEX.vvvv is reserved and must be 1111b otherwise instruction will #UD.\n" }, { "Name": "MOVSX", "Alias": [ "MOVSXD" ], "Brief": "Move with Sign", "Description": "\nCopies the contents of the source operand (register or memory location) to the destination operand (register) and sign extends the value to 16 or 32 bits (see Figure 7-6 in the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 1). The size of the converted value depends on the operand-size attribute.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "MOVUPD", "Alias": [], "Brief": "Move Unaligned Packed Double", "Description": "\n128-bit versions:\nMoves a double quadword containing two packed double-precision floating-point values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load an XMM register from a 128-bit memory location, store the contents of an XMM register into a 128-bit memory location, or move data between two XMM registers.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nWhen the source or destination operand is a memory operand, the operand may be unaligned on a 16-byte boundary without causing a general-protection exception (#GP) to be generated.1\nTo move double-precision floating-point values to and from memory locations that are known to be aligned on 16-byte boundaries, use the MOVAPD instruction.\nWhile executing in 16-bit addressing mode, a linear address for a 128-bit data access that overlaps the end of a 16-bit segment is not allowed and is defined as reserved behavior. A specific processor implementation may or may not generate a general-protection exception (#GP) in this situation, and the address that spans the end of the segment may or may not wrap around to the beginning of the segment.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: Moves 256 bits of packed double-precision floating-point values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load a YMM register from a 256-bit memory location, to store the contents of a YMM register into a 256-bit memory location, or to move data between two YMM registers.\n1.\nIf alignment checking is enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check exception (#AC) may or may not be generated (depending on processor implementation) when the operand is not aligned on an 8-byte boundary.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "MOVUPS", "Alias": [], "Brief": "Move Unaligned Packed Single", "Description": "\n128-bit versions: Moves a double quadword containing four packed single-precision floating-point values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load an XMM register from a 128-bit memory location, store the contents of an XMM register into a 128-bit memory location, or move data between two XMM registers.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nWhen the source or destination operand is a memory operand, the operand may be unaligned on a 16-byte boundary without causing a general-protection exception (#GP) to be generated.1\nTo move packed single-precision floating-point values to and from memory locations that are known to be aligned on 16-byte boundaries, use the MOVAPS instruction.\nWhile executing in 16-bit addressing mode, a linear address for a 128-bit data access that overlaps the end of a 16-bit segment is not allowed and is defined as reserved behavior. A specific processor implementation may or may not generate a general-protection exception (#GP) in this situation, and the address that spans the end of the segment may or may not wrap around to the beginning of the segment.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: Moves 256 bits of packed single-precision floating-point values from the source operand (second operand) to the destination operand (first operand). This instruction can be used to load a YMM register from a 256-bit memory location, to store the contents of a YMM register into a 256-bit memory location, or to move data between two YMM registers.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n1.\nIf alignment checking is enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check exception (#AC) may or may not be generated (depending on processor implementation) when the operand is not aligned on an 8-byte boundary.\n" }, { "Name": "MOVZX", "Alias": [], "Brief": "Move with Zero", "Description": "\nCopies the contents of the source operand (register or memory location) to the destination operand (register) and zero extends the value. The size of the converted value depends on the operand-size attribute.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bit operands. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "MPSADBW", "Alias": [], "Brief": "Compute Multiple Packed Sums of Absolute Difference", "Description": "\n(V)MPSADBW calculates packed word results of sum-absolute-difference (SAD) of unsigned bytes from two blocks of 32-bit dword elements, using two select fields in the immediate byte to select the offsets of the two blocks within the first source operand and the second operand. Packed SAD word results are calculated within each 128-bit lane. Each SAD word result is calculated between a stationary block_2 (whose offset within the second source operand is selected by a two bit select control, multiplied by 32 bits) and a sliding block_1 at consecutive byte-granular position within the first source operand. The offset of the first 32-bit block of block_1 is selectable using a one bit select control, multiplied by 32 bits.\n128-bit Legacy SSE version: Imm8[1:0]*32 specifies the bit offset of block_2 within the second source operand. Imm[2]*32 specifies the initial bit offset of the block_1 within the first source operand. The first source operand and destination operand are the same. The first source and destination operands are XMM registers. The second source operand is either an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. Bits 7:3 of the immediate byte are ignored.\nVEX.128 encoded version: Imm8[1:0]*32 specifies the bit offset of block_2 within the second source operand. Imm[2]*32 specifies the initial bit offset of the block_1 within the first source operand. The first source and desti-nation operands are XMM registers. The second source operand is either an XMM register or a 128-bit memory location. Bits (127:128) of the corresponding YMM register are zeroed. Bits 7:3 of the immediate byte are ignored.\nVEX.256 encoded version: The sum-absolute-difference (SAD) operation is repeated 8 times for MPSADW between the same block_2 (fixed offset within the second source operand) and a variable block_1 (offset is shifted by 8 bits for each SAD operation) in the first source operand. Each 16-bit result of eight SAD operations between block_2 and block_1 is written to the respective word in the lower 128 bits of the destination operand.\nAdditionally, VMPSADBW performs another eight SAD operations on block_4 of the second source operand and block_3 of the first source operand. (Imm8[4:3]*32 + 128) specifies the bit offset of block_4 within the second source operand. (Imm[5]*32+128) specifies the initial bit offset of the block_3 within the first source operand. Each 16-bit result of eight SAD operations between block_4 and block_3 is written to the respective word in the upper 128 bits of the destination operand.\nThe first source operand is a YMM register. The second source register can be a YMM register or a 256-bit memory location. The destination operand is a YMM register. Bits 7:6 of the immediate byte are ignored.\nNote: If VMPSADBW is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n\nImm[4:3]*32+128\n128\n255\n224\n192\nSrc2\nAbs. Diff.\nImm[5]*32+128\nSrc1\n144\n128\n255\nDestination\nImm[1:0]*32\n0\n127\n96\n64\nSrc2\nAbs. Diff.\nImm[2]*32\nSrc1\n16\n0\n127\nDestination\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSum\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSum\nFigure 3-27. 256-bit VMPSADBW Operation\n" }, { "Name": "MUL", "Alias": [], "Brief": "Unsigned Multiply", "Description": "\nPerforms an unsigned multiplication of the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand is an implied operand located in register AL, AX or EAX (depending on the size of the operand); the source operand is located in a general-purpose register or a memory location. The action of this instruction and the location of the result depends on the opcode and the operand size as shown in Table 3-66.\nThe result is stored in register AX, register pair DX:AX, or register pair EDX:EAX (depending on the operand size), with the high-order bits of the product contained in register AH, DX, or EDX, respectively. If the high-order bits of the product are 0, the CF and OF flags are cleared; otherwise, the flags are set.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits.\nSee the summary chart at the beginning of this section for encoding data and limits.\nTable 3-66. MUL Results\n\n\nOperand Size\nSource 1\nSource 2\nDestination\n\nByte\nAL\nr/m8\nAX\n\nWord\nAX\nr/m16\nDX:AX\n\nDoubleword\nEAX\nr/m32\nEDX:EAX\n\nQuadword\nRAX\nr/m64\nRDX:RAX\n" }, { "Name": "MULPD", "Alias": [], "Brief": "Multiply Packed Double", "Description": "\nPerforms a SIMD multiply of the two or four packed double-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the packed double-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. See Figure 11-3 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD double-precision floating-point operation.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the destination YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "MULPS", "Alias": [], "Brief": "Multiply Packed Single", "Description": "\nPerforms a SIMD multiply of the four packed single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the packed single-precision floating-point results in the destination operand. See Figure 10-5 in the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 1, for an illustration of a SIMD single-precision floating-point operation.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the destination YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "MULSD", "Alias": [], "Brief": "Multiply Scalar Double", "Description": "\nMultiplies the low double-precision floating-point value in the source operand (second operand) by the low double-precision floating-point value in the destination operand (first operand), and stores the double-precision floating-point result in the destination operand. The source operand can be an XMM register or a 64-bit memory location. The destination operand is an XMM register. The high quadword of the destination operand remains unchanged. See Figure 11-4 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustra-tion of a scalar double-precision floating-point operation.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "MULSS", "Alias": [], "Brief": "Multiply Scalar Single", "Description": "\nMultiplies the low single-precision floating-point value from the source operand (second operand) by the low single-precision floating-point value in the destination operand (first operand), and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged. See Figure 10-6 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a scalar single-precision floating-point operation.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "MWAIT", "Alias": [], "Brief": "Monitor Wait", "Description": "\nMWAIT instruction provides hints to allow the processor to enter an implementation-dependent optimized state. There are two principal targeted usages: address-range monitor and advanced power management. Both usages of MWAIT require the use of the MONITOR instruction.\nCPUID.01H:ECX.MONITOR[bit 3] indicates the availability of MONITOR and MWAIT in the processor. When set, MWAIT may be executed only at privilege level 0 (use at any other privilege level results in an invalid-opcode exception). The operating system or system BIOS may disable this instruction by using the IA32_MISC_ENABLE MSR; disabling MWAIT clears the CPUID feature flag and causes execution to generate an invalid-opcode excep-tion.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\nECX specifies optional extensions for the MWAIT instruction. EAX may contain hints such as the preferred opti-mized state the processor should enter. The first processors to implement MWAIT supported only the zero value for EAX and ECX. Later processors allowed setting ECX[0] to enable masked interrupts as break events for MWAIT (see below). Software can use the CPUID instruction to determine the extensions and hints supported by the processor.\n" }, { "Name": "MULX", "Alias": [], "Brief": "Unsigned Multiply Without Affecting Flags", "Description": "\nPerforms an unsigned multiplication of the implicit source operand (EDX/RDX) and the specified source operand (the third operand) and stores the low half of the result in the second destination (second operand), the high half of the result in the first destination operand (first operand), without reading or writing the arithmetic flags. This enables efficient programming where the software can interleave add with carry operations and multiplications.\nIf the first and second operand are identical, it will contain the high half of the multiplication result.\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n" }, { "Name": "NEG", "Alias": [], "Brief": "Two's Complement Negation", "Description": "\nReplaces the value of operand (the destination operand) with its two's complement. (This operation is equivalent to subtracting the operand from 0.) The destination operand is located in a general-purpose register or a memory location.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "NOP", "Alias": [], "Brief": "No Operation", "Description": "\nThis instruction performs no operation. It is a one-byte or multi-byte NOP that takes up space in the instruction stream but does not impact machine context, except for the EIP register.\nThe multi-byte form of NOP is available on processors with model encoding:\nThe multi-byte NOP instruction does not alter the content of a register and will not issue a memory operation. The instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "NOT", "Alias": [], "Brief": "One's Complement Negation", "Description": "\nPerforms a bitwise NOT operation (each 1 is set to 0, and each 0 is set to 1) on the destination operand and stores the result in the destination operand location. The destination operand can be a register or a memory location.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "OR", "Alias": [], "Brief": "Logical Inclusive OR", "Description": "\nPerforms a bitwise inclusive OR operation between the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result of the OR instruction is set to 0 if both corresponding bits of the first and second operands are 0; otherwise, each bit is set to 1.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "ORPD", "Alias": [], "Brief": "Bitwise Logical OR of Double", "Description": "\nPerforms a bitwise logical OR of the two or four packed double-precision floating-point values from the first source operand and the second source operand, and stores the result in the destination operand\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the destination YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: If VORPD is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n" }, { "Name": "ORPS", "Alias": [], "Brief": "Bitwise Logical OR of Single", "Description": "\nPerforms a bitwise logical OR of the four or eight packed single-precision floating-point values from the first source operand and the second source operand, and stores the result in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the destination YMM register destination are zeroed.\nVEX.256 Encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: If VORPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n" }, { "Name": "OUT", "Alias": [], "Brief": "Output to Port", "Description": "\nCopies the value from the second operand (source operand) to the I/O port specified with the destination operand (first operand). The source operand can be register AL, AX, or EAX, depending on the size of the port being accessed (8, 16, or 32 bits, respectively); the destination operand can be a byte-immediate or the DX register. Using a byte immediate allows I/O port addresses 0 to 255 to be accessed; using the DX register as a source operand allows I/O ports from 0 to 65,535 to be accessed.\nThe size of the I/O port being accessed is determined by the opcode for an 8-bit I/O port or by the operand-size attribute of the instruction for a 16- or 32-bit I/O port.\nAt the machine code level, I/O instructions are shorter when accessing 8-bit I/O ports. Here, the upper eight bits of the port address will be 0.\nThis instruction is only useful for accessing I/O ports located in the processor’s I/O address space. See Chapter 16, “Input/Output,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more infor-mation on accessing I/O ports in the I/O address space.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "OUTS", "Alias": [ "OUTSB", "OUTSW", "OUTSD" ], "Brief": "Output String to Port", "Description": "\nCopies data from the source operand (second operand) to the I/O port specified with the destination operand (first operand). The source operand is a memory location, the address of which is read from either the DS:SI, DS:ESI or the RSI registers (depending on the address-size attribute of the instruction, 16, 32 or 64, respectively). (The DS segment may be overridden with a segment override prefix.) The destination operand is an I/O port address (from 0 to 65,535) that is read from the DX register. The size of the I/O port being accessed (that is, the size of the source and destination operands) is determined by the opcode for an 8-bit I/O port or by the operand-size attribute of the instruction for a 16- or 32-bit I/O port.\nAt the assembly-code level, two forms of this instruction are allowed: the “explicit-operands” form and the “no-operands” form. The explicit-operands form (specified with the OUTS mnemonic) allows the source and destination operands to be specified explicitly. Here, the source operand should be a symbol that indicates the size of the I/O port and the source address, and the destination operand must be DX. This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the source operand symbol must specify the correct type (size) of the operand (byte, word, or doubleword), but it does not have to specify the correct location. The location is always specified by the DS:(E)SI or RSI registers, which must be loaded correctly before the OUTS instruction is executed.\nThe no-operands form provides “short forms” of the byte, word, and doubleword versions of the OUTS instructions. Here also DS:(E)SI is assumed to be the source operand and DX is assumed to be the destination operand. The size of the I/O port is specified with the choice of mnemonic: OUTSB (byte), OUTSW (word), or OUTSD (doubleword).\nAfter the byte, word, or doubleword is transferred from the memory location to the I/O port, the SI/ESI/RSI register is incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E)SI register is incremented; if the DF flag is 1, the SI/ESI/RSI register is decremented.) The SI/ESI/RSI register is incremented or decremented by 1 for byte operations, by 2 for word operations, and by 4 for doubleword operations.\nThe OUTS, OUTSB, OUTSW, and OUTSD instructions can be preceded by the REP prefix for block input of ECX bytes, words, or doublewords. See “REP/REPE/REPZ /REPNE/REPNZ—Repeat String Operation Prefix” in this chapter for a description of the REP prefix. This instruction is only useful for accessing I/O ports located in the processor’s I/O address space. See Chapter 16, “Input/Output,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more information on accessing I/O ports in the I/O address space.\nIn 64-bit mode, the default operand size is 32 bits; operand size is not promoted by the use of REX.W. In 64-bit mode, the default address size is 64 bits, and 64-bit address is specified using RSI by default. 32-bit address using ESI is support using the prefix 67H, but 16-bit address is not supported in 64-bit mode.\n" }, { "Name": "PABSB", "Alias": [ "PABSW", "PABSD " ], "Brief": "Packed Absolute Value", "Description": "\n(V)PABSB/W/D computes the absolute value of each data element of the source operand (the second operand) and stores the UNSIGNED results in the destination operand (the first operand). (V)PABSB operates on signed bytes, (V)PABSW operates on 16-bit words, and (V)PABSD operates on signed 32-bit integers. The source operand can be an MMX register or a 64-bit memory location, or it can be an XMM register, a YMM register, a 128-bit memory loca-tion, or a 256-bit memory location. The destination operand can be an MMX, an XMM or a YMM register. Both oper-ands can be MMX registers or XMM registers. When the source operand is a 128-bit memory operand, the operand must be aligned on a 16byte boundary or a general-protection exception (#GP) will be generated.\nIn 64-bit mode, use the REX prefix to access additional registers.\n128-bit Legacy SSE version: The source operand can be an XMM register or a 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.vvvv is reserved and must be 1111b, VEX.L must be 0; otherwise instructions will #UD.\n" }, { "Name": "PACKSSWB", "Alias": [ "PACKSSDW" ], "Brief": "Pack with Signed Saturation", "Description": "\nConverts packed signed word integers into packed signed byte integers (PACKSSWB) or converts packed signed doubleword integers into packed signed word integers (PACKSSDW), using saturation to handle overflow condi-tions. See Figure 4-2 for an example of the packing operation.\n\n64-Bit SRC\n64-Bit DEST\n64-Bit DEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nC\nD\nB’\nC’\nD’\nA’\nB\nA\nFigure 4-2. Operation of the PACKSSDW Instruction Using 64-bit Operands\nThe (V)PACKSSWB instruction converts 4, 8 or 16 signed word integers from the destination operand (first operand) and 4, 8 or 16 signed word integers from the source operand (second operand) into 8, 16 or 32 signed byte integers and stores the result in the destination operand. If a signed word integer value is beyond the range of a signed byte integer (that is, greater than 7FH for a positive integer or greater than 80H for a negative integer), the saturated signed byte integer value of 7FH or 80H, respectively, is stored in the destination.\nThe (V)PACKSSDW instruction packs 2, 4 or 8 signed doublewords from the destination operand (first operand) and 2, 4 or 8 signed doublewords from the source operand (second operand) into 4, 8 or 16 signed words in the desti-nation operand (see Figure 4-2). If a signed doubleword integer value is beyond the range of a signed word (that is, greater than 7FFFH for a positive integer or greater than 8000H for a negative integer), the saturated signed word integer value of 7FFFH or 8000H, respectively, is stored into the destination.\nThe (V)PACKSSWB and (V)PACKSSDW instructions operate on either 64-bit, 128-bit operands or 256-bit operands. When operating on 64-bit operands, the destination operand must be an MMX technology register and the source operand can be either an MMX technology register or a 64-bit memory location. In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PACKUSDW", "Alias": [], "Brief": "Pack with Unsigned Saturation", "Description": "\nConverts packed signed doubleword integers into packed unsigned word integers using unsigned saturation to handle overflow conditions. If the signed doubleword value is beyond the range of an unsigned word (that is, greater than FFFFH or less than 0000H), the saturated unsigned word integer value of FFFFH or 0000H, respec-tively, is stored in the destination.\n128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PACKUSWB", "Alias": [], "Brief": "Pack with Unsigned Saturation", "Description": "\nConverts 4, 8 or 16 signed word integers from the destination operand (first operand) and 4, 8 or 16 signed word integers from the source operand (second operand) into 8, 16 or 32 unsigned byte integers and stores the result in the destination operand. (See Figure 4-2 for an example of the packing operation.) If a signed word integer value is beyond the range of an unsigned byte integer (that is, greater than FFH or less than 00H), the saturated unsigned byte integer value of FFH or 00H, respectively, is stored in the destination.\nThe PACKUSWB instruction operates on either 64-bit, 128-bit or 256-bit operands. When operating on 64-bit oper-ands, the destination operand must be an MMX technology register and the source operand can be either an MMX technology register or a 64-bit memory location. In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "PADDB", "Alias": [ "PADDW", "PADDD" ], "Brief": "Add Packed Integers", "Description": "\nPerforms a SIMD add of the packed integers from the source operand (second operand) and the destination operand (first operand), and stores the packed integer results in the destination operand. See Figure 9-4 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD operation. Overflow is handled with wraparound, as described in the following paragraphs.\nAdds the packed byte, word, doubleword, or quadword integers in the first source operand to the second source operand and stores the result in the destination operand. When a result is too large to be represented in the\n8/16/32 integer (overflow), the result is wrapped around and the low bits are written to the destination element (that is, the carry is ignored).\nNote that these instructions can operate on either unsigned or signed (two’s complement notation) integers; however, it does not set bits in the EFLAGS register to indicate overflow and/or a carry. To prevent undetected overflow conditions, software must control the ranges of the values operated on.\nThese instructions can operate on either 64-bit, 128-bit or 256-bit operands. When operating on 64-bit operands, the destination operand must be an MMX technology register and the source operand can be either an MMX tech-nology register or a 64-bit memory location. In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PADDQ", "Alias": [], "Brief": "Add Packed Quadword Integers", "Description": "\nAdds the first operand (destination operand) to the second operand (source operand) and stores the result in the destination operand. The source operand can be a quadword integer stored in an MMX technology register or a 64-bit memory location, or it can be two packed quadword integers stored in an XMM register or an 128-bit memory location. The destination operand can be a quadword integer stored in an MMX technology register or two packed quadword integers stored in an XMM register. When packed quadword operands are used, a SIMD add is performed. When a quadword result is too large to be represented in 64 bits (overflow), the result is wrapped around and the low 64 bits are written to the destination element (that is, the carry is ignored).\nNote that the (V)PADDQ instruction can operate on either unsigned or signed (two’s complement notation) inte-gers; however, it does not set bits in the EFLAGS register to indicate overflow and/or a carry. To prevent undetected overflow conditions, software must control the ranges of the values operated on.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PADDSB", "Alias": [ "PADDSW" ], "Brief": "Add Packed Signed Integers with Signed Saturation", "Description": "\nPerforms a SIMD add of the packed signed integers from the source operand (second operand) and the destination operand (first operand), and stores the packed integer results in the destination operand. See Figure 9-4 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD operation. Overflow is handled with signed saturation, as described in the following paragraphs.\nThe PADDSB instruction adds packed signed byte integers. When an individual byte result is beyond the range of a signed byte integer (that is, greater than 7FH or less than 80H), the saturated value of 7FH or 80H, respectively, is written to the destination operand.\nThe PADDSW instruction adds packed signed word integers. When an individual word result is beyond the range of a signed word integer (that is, greater than 7FFFH or less than 8000H), the saturated value of 7FFFH or 8000H, respectively, is written to the destination operand.\nThese instructions can operate on either 64-bit, 128-bit or 256-bit operands. When operating on 64-bit operands, the destination operand must be an MMX technology register and the source operand can be either an MMX tech-nology register or a 64-bit memory location. In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PADDUSB", "Alias": [ "PADDUSW" ], "Brief": "Add Packed Unsigned Integers with Unsigned Saturation", "Description": "\nPerforms a SIMD add of the packed unsigned integers from the source operand (second operand) and the destina-tion operand (first operand), and stores the packed integer results in the destination operand. See Figure 9-4 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD operation. Overflow is handled with unsigned saturation, as described in the following paragraphs.\nThe (V)PADDUSB instruction adds packed unsigned byte integers. When an individual byte result is beyond the range of an unsigned byte integer (that is, greater than FFH), the saturated value of FFH is written to the destina-tion operand.\nThe (V)PADDUSW instruction adds packed unsigned word integers. When an individual word result is beyond the range of an unsigned word integer (that is, greater than FFFFH), the saturated value of FFFFH is written to the destination operand.\nThese instructions can operate on either 64-bit, 128-bit or 256-bit operands. When operating on 64-bit operands, the destination operand must be an MMX technology register and the source operand can be either an MMX tech-\nnology register or a 64-bit memory location. In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PALIGNR", "Alias": [], "Brief": "Packed Align Right", "Description": "\n(V)PALIGNR concatenates the destination operand (the first operand) and the source operand (the second operand) into an intermediate composite, shifts the composite at byte granularity to the right by a constant imme-diate, and extracts the right-aligned result into the destination. The first and the second operands can be an MMX, XMM or a YMM register. The immediate value is considered unsigned. Immediate shift counts larger than the 2L (i.e. 32 for 128-bit operands, or 16 for 64-bit operands) produce a zero result. Both operands can be MMX regis-ters, XMM registers or YMM registers. When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.\nIn 64-bit mode, use the REX prefix to access additional registers.\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register and contains two 16-byte blocks. The second source operand is a YMM register or a 256-bit memory location containing two 16-byte block. The destination operand is a YMM register and contain two 16-byte results. The imm8[7:0] is the common shift count used for the two lower 16-byte block sources and the two upper 16-byte block sources. The low 16-byte block of the two source\noperands produce the low 16-byte result of the destination operand, the high 16-byte block of the two source oper-ands produce the high 16-byte result of the destination operand.\nConcatenation is done with 128-bit data in the first and second source operand for both 128-bit and 256-bit instructions. The high 128-bits of the intermediate composite 256-bit result came from the 128-bit data from the first source operand; the low 128-bits of the intermediate result came from the 128-bit data of the second source operand.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n0\n127\n0\n127\nSRC1\nSRC2\nImm8[7:0]*8\n255\n128\n128\n255\nSRC1\nSRC2\nImm8[7:0]*8\n127\n0\n128\n255\nDEST\nDEST\nFigure 4-3. 256-bit VPALIGN Instruction Operation\n" }, { "Name": "PAND", "Alias": [], "Brief": "Logical AND", "Description": "\nPerforms a bitwise logical AND operation on the first source operand and second source operand and stores the result in the destination operand. Each bit of the result is set to 1 if the corresponding bits of the first and second operands are 1, otherwise it is set to 0.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE instructions: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand can be an MMX technology register.\n128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PANDN", "Alias": [], "Brief": "Logical AND NOT", "Description": "\nPerforms a bitwise logical NOT operation on the first source operand, then performs bitwise AND with second source operand and stores the result in the destination operand. Each bit of the result is set to 1 if the corre-sponding bit in the first operand is 0 and the corresponding bit in the second operand is 1, otherwise it is set to 0.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE instructions: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand can be an MMX technology register.\n128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PAUSE", "Alias": [], "Brief": "Spin Loop Hint", "Description": "\nImproves the performance of spin-wait loops. When executing a “spin-wait loop,” processors will suffer a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.\nAn additional function of the PAUSE instruction is to reduce the power consumed by a processor while executing a spin loop. A processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop greatly reduces the processor’s power consumption.\nThis instruction was introduced in the Pentium 4 processors, but is backward compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying no-op operation).\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "PAVGB", "Alias": [ "PAVGW" ], "Brief": "Average Packed Integers", "Description": "\nPerforms a SIMD average of the packed unsigned integers from the source operand (second operand) and the destination operand (first operand), and stores the results in the destination operand. For each corresponding pair of data elements in the first and second operands, the elements are added together, a 1 is added to the temporary sum, and that result is shifted right one bit position.\nThe (V)PAVGB instruction operates on packed unsigned bytes and the (V)PAVGW instruction operates on packed unsigned words.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE instructions: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand can be an MMX technology register.\n128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "PBLENDVB", "Alias": [], "Brief": "Variable Blend Packed Bytes", "Description": "\nConditionally copies byte elements from the source operand (second operand) to the destination operand (first operand) depending on mask bits defined in the implicit third register argument, XMM0. The mask bits are the most significant bit in each byte element of the XMM0 register.\nIf a mask bit is “1\", then the corresponding byte element in the source operand is copied to the destination, else the byte element in the destination operand is left unchanged.\nThe register assignment of the implicit third operand is defined to be the architectural register XMM0.\n128-bit Legacy SSE version: The first source operand and the destination operand is the same. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. The mask register operand is implicitly defined to be the architectural register XMM0. An attempt to execute PBLENDVB with a VEX prefix will cause #UD.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand is an XMM register or 128-bit memory location. The mask operand is the third source register, and encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is ignored. The upper bits (VLMAX-1:128) of the corresponding YMM register (destination register) are zeroed. VEX.L must be 0, otherwise the instruction will #UD. VEX.W must be 0, otherwise, the instruction will #UD.\nVEX.256 encoded version: The first source operand and the destination operand are YMM registers. The second source operand is an YMM register or 256-bit memory location. The third source register is an YMM register and encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is ignored.\nVPBLENDVB permits the mask to be any XMM or YMM register. In contrast, PBLENDVB treats XMM0 implicitly as the mask and do not support non-destructive destination operation. An attempt to execute PBLENDVB encoded with a VEX prefix will cause a #UD exception.\n" }, { "Name": "PBLENDW", "Alias": [], "Brief": "Blend Packed Words", "Description": "\nWords from the source operand (second operand) are conditionally written to the destination operand (first operand) depending on bits in the immediate operand (third operand). The immediate bits (bits 7:0) form a mask that determines whether the corresponding word in the destination is copied from the source. If a bit in the mask, corresponding to a word, is “1\", then the word is copied, else the word element in the destination operand is unchanged.\n128-bit Legacy SSE version: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM register are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "PCLMULQDQ", "Alias": [], "Brief": "Carry", "Description": "\nPerforms a carry-less multiplication of two quadwords, selected from the first source and second source operand according to the value of the immediate byte. Bits 4 and 0 are used to select which 64-bit half of each operand to use according to Table 4-10, other bits of the immediate byte are ignored.\nTable 4-10. PCLMULQDQ Quadword Selection of Immediate Byte\n\n\nImm[4]\nImm[0]\nPCLMULQDQ Operation\n\n0\n0\nCL_MUL( SRC21[63:0], SRC1[63:0] )\n\n0\n1\nCL_MUL( SRC2[63:0], SRC1[127:64] )\n\n1\n0\nCL_MUL( SRC2[127:64], SRC1[63:0] )\n\n1\n1\nCL_MUL( SRC2[127:64], SRC1[127:64] )\nNOTES:\n1. SRC2 denotes the second source operand, which can be a register or memory; SRC1 denotes the first source and destination oper-\nand.\n The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nCompilers and assemblers may implement the following pseudo-op syntax to simply programming and emit the required encoding for Imm8.\nTable 4-11. Pseudo-Op and PCLMULQDQ Implementation\n\n\nPseudo-Op\nImm8 Encoding\n\nPCLMULLQLQDQ xmm1, xmm2\n0000_0000B\n\nPCLMULHQLQDQ xmm1, xmm2\n0000_0001B\n\nPCLMULLQHDQ xmm1, xmm2\n0001_0000B\n\nPCLMULHQHDQ xmm1, xmm2\n0001_0001B\n" }, { "Name": "PCMPEQB", "Alias": [ "PCMPEQW", "PCMPEQD" ], "Brief": "Compare Packed Data for Equal", "Description": "\nPerforms a SIMD compare for equality of the packed bytes, words, or doublewords in the destination operand (first operand) and the source operand (second operand). If a pair of data elements is equal, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s.\nThe (V)PCMPEQB instruction compares the corresponding bytes in the destination and source operands; the (V)PCMPEQW instruction compares the corresponding words in the destination and source operands; and the (V)PCMPEQD instruction compares the corresponding doublewords in the destination and source operands.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE instructions: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand can be an MMX technology register.\n128-bit Legacy SSE version: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM register are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PCMPEQQ", "Alias": [], "Brief": "Compare Packed Qword Data for Equal", "Description": "\nPerforms an SIMD compare for equality of the packed quadwords in the destination operand (first operand) and the source operand (second operand). If a pair of data elements is equal, the corresponding data element in the desti-nation is set to all 1s; otherwise, it is set to 0s.\n128-bit Legacy SSE version: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM register are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PCMPESTRI", "Alias": [], "Brief": "Packed Compare Explicit Length Strings, Return Index", "Description": "\nThe instruction compares and processes data from two string fragments based on the encoded value in the Imm8 Control Byte (see Section 4.1, “Imm8 Control Byte Operation for PCMPESTRI / PCMPESTRM / PCMPISTRI / PCMP-ISTRM”), and generates an index stored to the count register (ECX/RCX).\nEach string fragment is represented by two values. The first value is an xmm (or possibly m128 for the second operand) which contains the data elements of the string (byte or word data). The second value is stored in an input length register. The input length register is EAX/RAX (for xmm1) or EDX/RDX (for xmm2/m128). The length repre-sents the number of bytes/words which are valid for the respective xmm/m128 data.\nThe length of each input is interpreted as being the absolute-value of the value in the length register. The absolute-value computation saturates to 16 (for bytes) and 8 (for words), based on the value of imm8[bit3] when the value in the length register is greater than 16 (8) or less than -16 (-8).\nThe comparison and aggregation operations are performed according to the encoded value of Imm8 bit fields (see Section 4.1). The index of the first (or last, according to imm8[6]) set bit of IntRes2 (see Section 4.1.4) is returned in ECX. If no bits are set in IntRes2, ECX is set to 16 (8).\nNote that the Arithmetic Flags are written in a non-standard manner in order to supply the most relevant informa-tion:\nCFlag – Reset if IntRes2 is equal to zero, set otherwise\nZFlag – Set if absolute-value of EDX is < 16 (8), reset otherwise\nSFlag – Set if absolute-value of EAX is < 16 (8), reset otherwise\nOFlag – IntRes2[0]\nAFlag – Reset\nPFlag – Reset\n" }, { "Name": "PCMPESTRM", "Alias": [], "Brief": "Packed Compare Explicit Length Strings, Return Mask", "Description": "\nThe instruction compares data from two string fragments based on the encoded value in the imm8 contol byte (see Section 4.1, “Imm8 Control Byte Operation for PCMPESTRI / PCMPESTRM / PCMPISTRI / PCMPISTRM”), and gener-ates a mask stored to XMM0.\nEach string fragment is represented by two values. The first value is an xmm (or possibly m128 for the second operand) which contains the data elements of the string (byte or word data). The second value is stored in an input length register. The input length register is EAX/RAX (for xmm1) or EDX/RDX (for xmm2/m128). The length repre-sents the number of bytes/words which are valid for the respective xmm/m128 data.\nThe length of each input is interpreted as being the absolute-value of the value in the length register. The absolute-value computation saturates to 16 (for bytes) and 8 (for words), based on the value of imm8[bit3] when the value in the length register is greater than 16 (8) or less than -16 (-8).\nThe comparison and aggregation operations are performed according to the encoded value of Imm8 bit fields (see Section 4.1). As defined by imm8[6], IntRes2 is then either stored to the least significant bits of XMM0 (zero extended to 128 bits) or expanded into a byte/word-mask and then stored to XMM0.\nNote that the Arithmetic Flags are written in a non-standard manner in order to supply the most relevant informa-tion:\nCFlag – Reset if IntRes2 is equal to zero, set otherwise\nZFlag – Set if absolute-value of EDX is < 16 (8), reset otherwise\nSFlag – Set if absolute-value of EAX is < 16 (8), reset otherwise\nOFlag –IntRes2[0]\nAFlag – Reset\nPFlag – Reset\nNote: In VEX.128 encoded versions, bits (VLMAX-1:128) of XMM0 are zeroed. VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PCMPGTB", "Alias": [ "PCMPGTW", "PCMPGTD" ], "Brief": "Compare Packed Signed Integers for Greater Than", "Description": "\nPerforms an SIMD signed compare for the greater value of the packed byte, word, or doubleword integers in the destination operand (first operand) and the source operand (second operand). If a data element in the destination operand is greater than the corresponding date element in the source operand, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s.\nThe PCMPGTB instruction compares the corresponding signed byte integers in the destination and source oper-ands; the PCMPGTW instruction compares the corresponding signed word integers in the destination and source\noperands; and the PCMPGTD instruction compares the corresponding signed doubleword integers in the destina-tion and source operands.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE instructions: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand can be an MMX technology register.\n128-bit Legacy SSE version: The second source operand can be an XMM register or a 128-bit memory location. The first source operand and destination operand are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The second source operand can be an XMM register or a 128-bit memory location. The first source operand and destination operand are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM register are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PCMPGTQ", "Alias": [], "Brief": "Compare Packed Data for Greater Than", "Description": "\nPerforms an SIMD signed compare for the packed quadwords in the destination operand (first operand) and the source operand (second operand). If the data element in the first (destination) operand is greater than the corresponding element in the second (source) operand, the corresponding data element in the destination is set to all 1s; otherwise, it is set to 0s.\n128-bit Legacy SSE version: The second source operand can be an XMM register or a 128-bit memory location. The first source operand and destination operand are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The second source operand can be an XMM register or a 128-bit memory location. The first source operand and destination operand are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM register are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PCMPISTRI", "Alias": [], "Brief": "Packed Compare Implicit Length Strings, Return Index", "Description": "\nThe instruction compares data from two strings based on the encoded value in the Imm8 Control Byte (see Section 4.1, “Imm8 Control Byte Operation for PCMPESTRI / PCMPESTRM / PCMPISTRI / PCMPISTRM”), and generates an index stored to ECX.\nEach string is represented by a single value. The value is an xmm (or possibly m128 for the second operand) which contains the data elements of the string (byte or word data). Each input byte/word is augmented with a valid/invalid tag. A byte/word is considered valid only if it has a lower index than the least significant null byte/word. (The least significant null byte/word is also considered invalid.)\nThe comparison and aggregation operations are performed according to the encoded value of Imm8 bit fields (see Section 4.1). The index of the first (or last, according to imm8[6]) set bit of IntRes2 is returned in ECX. If no bits are set in IntRes2, ECX is set to 16 (8).\nNote that the Arithmetic Flags are written in a non-standard manner in order to supply the most relevant informa-tion:\nCFlag – Reset if IntRes2 is equal to zero, set otherwise\nZFlag – Set if any byte/word of xmm2/mem128 is null, reset otherwise\nSFlag – Set if any byte/word of xmm1 is null, reset otherwise\nOFlag –IntRes2[0]\nAFlag – Reset\nPFlag – Reset\nNote: In VEX.128 encoded version, VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PCMPISTRM", "Alias": [], "Brief": "Packed Compare Implicit Length Strings, Return Mask", "Description": "\nThe instruction compares data from two strings based on the encoded value in the imm8 byte (see Section 4.1, “Imm8 Control Byte Operation for PCMPESTRI / PCMPESTRM / PCMPISTRI / PCMPISTRM”) generating a mask stored to XMM0.\nEach string is represented by a single value. The value is an xmm (or possibly m128 for the second operand) which contains the data elements of the string (byte or word data). Each input byte/word is augmented with a valid/invalid tag. A byte/word is considered valid only if it has a lower index than the least significant null byte/word. (The least significant null byte/word is also considered invalid.)\nThe comparison and aggregation operation are performed according to the encoded value of Imm8 bit fields (see Section 4.1). As defined by imm8[6], IntRes2 is then either stored to the least significant bits of XMM0 (zero extended to 128 bits) or expanded into a byte/word-mask and then stored to XMM0.\nNote that the Arithmetic Flags are written in a non-standard manner in order to supply the most relevant informa-tion:\nCFlag – Reset if IntRes2 is equal to zero, set otherwise\nZFlag – Set if any byte/word of xmm2/mem128 is null, reset otherwise\nSFlag – Set if any byte/word of xmm1 is null, reset otherwise\nOFlag – IntRes2[0]\nAFlag – Reset\nPFlag – Reset\nNote: In VEX.128 encoded versions, bits (VLMAX-1:128) of XMM0 are zeroed. VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PDEP", "Alias": [], "Brief": "Parallel Bits Deposit", "Description": "\nPDEP uses a mask in the second source operand (the third operand) to transfer/scatter contiguous low order bits in the first source operand (the second operand) into the destination (the first operand). PDEP takes the low bits from the first source operand and deposit them in the destination operand at the corresponding bit locations that are set in the second source operand (mask). All other bits (bits not set in mask) in destination are set to zero.\nSRC1\nS31 S30 S29 S28 S27\nS7\nS6\nS5\nS4\nS3\nS2\nS1\nS0\nSRC2\n0\n0\n0\n1\n0\n1\n0\n1\n0\n0\n1\n0\n0\n(mask)\nDEST\n0\n0\n0\n0\n0\n0\n0\n0\n0\nS1\nS0\nS3\nS2\nbit 0\nbit 31\nFigure 4-4. PDEP Example\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n" }, { "Name": "PEXT", "Alias": [], "Brief": "Parallel Bits Extract", "Description": "\nPEXT uses a mask in the second source operand (the third operand) to transfer either contiguous or non-contig-uous bits in the first source operand (the second operand) to contiguous low order bit positions in the destination (the first operand). For each bit set in the MASK, PEXT extracts the corresponding bits from the first source operand and writes them into contiguous lower bits of destination operand. The remaining upper bits of destination are zeroed.\nSRC1\nS31 S30 S29 S28 S27\nS7\nS6\nS5\nS4\nS3\nS2\nS1\nS0\nSRC2\n0\n0\n0\n0\n1\n0\n1\n0\n1\n0\n1\n0\n0\n(mask)\nS2\n0\n0\n0\n0\n0\n0\n0\n0\n0\nS7\nS5\nDEST\nS28\nbit 0\nbit 31\nFigure 4-5. PEXT Example\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n" }, { "Name": "PEXTRB", "Alias": [ "PEXTRD", "PEXTRQ " ], "Brief": "Extract Byte/Dword/Qword", "Description": "\nExtract a byte/dword/qword integer value from the source XMM register at a byte/dword/qword offset determined from imm8[3:0]. The destination can be a register or byte/dword/qword memory location. If the destination is a register, the upper bits of the register are zero extended.\nIn legacy non-VEX encoded version and if the destination operand is a register, the default operand size in 64-bit mode for PEXTRB/PEXTRD is 64 bits, the bits above the least significant byte/dword data are filled with zeros. PEXTRQ is not encodable in non-64-bit modes and requires REX.W in 64-bit mode.\nNote: In VEX.128 encoded versions, VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise the instruction will #UD. If the destination operand is a register, the default operand size in 64-bit mode for VPEXTRB/VPEXTRD is 64 bits, the bits above the least significant byte/word/dword data are filled with zeros. Attempt to execute VPEXTRQ in non-64-bit mode will cause #UD.\n" }, { "Name": "PEXTRW", "Alias": [], "Brief": "Extract Word", "Description": "\nCopies the word in the source operand (second operand) specified by the count operand (third operand) to the destination operand (first operand). The source operand can be an MMX technology register or an XMM register. The destination operand can be the low word of a general-purpose register or a 16-bit memory address. The count operand is an 8-bit immediate. When specifying a word location in an MMX technology register, the 2 least-signifi-cant bits of the count operand specify the location; for an XMM register, the 3 least-significant bits specify the loca-tion. The content of the destination register above bit 16 is cleared (set to all 0s).\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15, R8-15). If the destination operand is a general-purpose register, the default operand size is 64-bits in 64-bit mode.\nNote: In VEX.128 encoded versions, VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise the instruction will #UD. If the destination operand is a register, the default operand size in 64-bit mode for VPEXTRW is 64 bits, the bits above the least significant byte/word/dword data are filled with zeros.\n" }, { "Name": "PHADDW", "Alias": [ "PHADDD " ], "Brief": "Packed Horizontal Add", "Description": "\n(V)PHADDW adds two adjacent 16-bit signed integers horizontally from the source and destination operands and packs the 16-bit signed results to the destination operand (first operand). (V)PHADDD adds two adjacent 32-bit signed integers horizontally from the source and destination operands and packs the 32-bit signed results to the destination operand (first operand). When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.\nNote that these instructions can operate on either unsigned or signed (two’s complement notation) integers; however, it does not set bits in the EFLAGS register to indicate overflow and/or a carry. To prevent undetected overflow conditions, software must control the ranges of the values operated on.\nLegacy SSE instructions: Both operands can be MMX registers. The second source operand can be an MMX register or a 64-bit memory location.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nIn 64-bit mode, use the REX prefix to access additional registers.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM register are zeroed.\nVEX.256 encoded version: Horizontal addition of two adjacent data elements of the low 16-bytes of the first and second source operands are packed into the low 16-bytes of the destination operand. Horizontal addition of two adjacent data elements of the high 16-bytes of the first and second source operands are packed into the high 16-bytes of the destination operand. The first source and destination operands are YMM registers. The second source operand can be an YMM register or a 256-bit memory location.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n\n\nY7\nY6\nY5\nY4\nY3\nY2\nY1\nY0\n\n\nX7\nX6\nX5\nX4\nX3\nX2\nX1\nX0\nSRC2\nSRC1\nS4\nS3\nS0\nS7\nS3\nS3\nS2\nS1\n255\n0\nDest\nFigure 4-6. 256-bit VPHADDD Instruction Operation\n" }, { "Name": "PHADDSW", "Alias": [], "Brief": "Packed Horizontal Add and Saturate", "Description": "\n(V)PHADDSW adds two adjacent signed 16-bit integers horizontally from the source and destination operands and saturates the signed results; packs the signed, saturated 16-bit results to the destination operand (first operand) When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.\nLegacy SSE version: Both operands can be MMX registers. The second source operand can be an MMX register or a 64-bit memory location.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nIn 64-bit mode, use the REX prefix to access additional registers.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The first source and destination operands are YMM registers. The second source operand can be an YMM register or a 256-bit memory location.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PHMINPOSUW", "Alias": [], "Brief": "Packed Horizontal Word Minimum", "Description": "\nDetermine the minimum unsigned word value in the source operand (second operand) and place the unsigned word in the low word (bits 0-15) of the destination operand (first operand). The word index of the minimum value is stored in bits 16-18 of the destination operand. The remaining upper bits of the destination are set to zero.\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PHSUBW", "Alias": [ "PHSUBD " ], "Brief": "Packed Horizontal Subtract", "Description": "\n(V)PHSUBW performs horizontal subtraction on each adjacent pair of 16-bit signed integers by subtracting the most significant word from the least significant word of each pair in the source and destination operands, and packs the signed 16-bit results to the destination operand (first operand). (V)PHSUBD performs horizontal subtraction on each adjacent pair of 32-bit signed integers by subtracting the most significant doubleword from the least signifi-cant doubleword of each pair, and packs the signed 32-bit result to the destination operand. When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.\nLegacy SSE version: Both operands can be MMX registers. The second source operand can be an MMX register or a 64-bit memory location.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nIn 64-bit mode, use the REX prefix to access additional registers.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The first source and destination operands are YMM registers. The second source operand can be an YMM register or a 256-bit memory location.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PHSUBSW", "Alias": [], "Brief": "Packed Horizontal Subtract and Saturate", "Description": "\n(V)PHSUBSW performs horizontal subtraction on each adjacent pair of 16-bit signed integers by subtracting the most significant word from the least significant word of each pair in the source and destination operands. The signed, saturated 16-bit results are packed to the destination operand (first operand). When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.\nLegacy SSE version: Both operands can be MMX registers. The second source operand can be an MMX register or a 64-bit memory location.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nIn 64-bit mode, use the REX prefix to access additional registers.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The first source and destination operands are YMM registers. The second source operand can be an YMM register or a 256-bit memory location.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PINSRB", "Alias": [ "PINSRD", "PINSRQ " ], "Brief": "Insert Byte/Dword/Qword", "Description": "\nCopies a byte/dword/qword from the source operand (second operand) and inserts it in the destination operand (first operand) at the location specified with the count operand (third operand). (The other elements in the desti-nation register are left untouched.) The source operand can be a general-purpose register or a memory location. (When the source operand is a general-purpose register, PINSRB copies the low byte of the register.) The destina-tion operand is an XMM register. The count operand is an 8-bit immediate. When specifying a qword[dword, byte] location in an an XMM register, the [2, 4] least-significant bit(s) of the count operand specify the location.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15, R8-15). Use of REX.W permits the use of 64 bit general purpose registers.\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.L must be 0, other-wise the instruction will #UD. Attempt to execute VPINSRQ in non-64-bit mode will cause #UD.\n" }, { "Name": "PINSRW", "Alias": [], "Brief": "Insert Word", "Description": "\nCopies a word from the source operand (second operand) and inserts it in the destination operand (first operand) at the location specified with the count operand (third operand). (The other words in the destination register are left untouched.) The source operand can be a general-purpose register or a 16-bit memory location. (When the source operand is a general-purpose register, the low word of the register is copied.) The destination operand can be an MMX technology register or an XMM register. The count operand is an 8-bit immediate. When specifying a word location in an MMX technology register, the 2 least-significant bits of the count operand specify the location; for an XMM register, the 3 least-significant bits specify the location.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15, R8-15).\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.L must be 0, other-wise the instruction will #UD.\n" }, { "Name": "PMADDUBSW", "Alias": [], "Brief": "Multiply and Add Packed Signed and Unsigned Bytes", "Description": "\n(V)PMADDUBSW multiplies vertically each unsigned byte of the destination operand (first operand) with the corre-sponding signed byte of the source operand (second operand), producing intermediate signed 16-bit integers. Each adjacent pair of signed words is added and the saturated result is packed to the destination operand. For example, the lowest-order bytes (bits 7-0) in the source and destination operands are multiplied and the intermediate signed word result is added with the corresponding intermediate result from the 2nd lowest-order bytes (bits 15-8) of the operands; the sign-saturated result is stored in the lowest word of the destination register (15-0). The same oper-ation is performed on the other pairs of adjacent bytes. Both operands can be MMX register or XMM registers. When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.\nIn 64-bit mode, use the REX prefix to access additional registers.\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The first source and destination operands are YMM registers. The second source operand can be an YMM register or a 256-bit memory location.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMADDWD", "Alias": [], "Brief": "Multiply and Add Packed Integers", "Description": "\nMultiplies the individual signed words of the destination operand (first operand) by the corresponding signed words of the source operand (second operand), producing temporary signed, doubleword results. The adjacent double-word results are then summed and stored in the destination operand. For example, the corresponding low-order words (15-0) and (31-16) in the source and destination operands are multiplied by one another and the double-word results are added together and stored in the low doubleword of the destination register (31-0). The same operation is performed on the other pairs of adjacent words. (Figure 4-7 shows this operation when using 64-bit operands).\nThe (V)PMADDWD instruction wraps around only in one situation: when the 2 pairs of words being operated on in a group are all 8000H. In this case, the result wraps around to 80000000H.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: The first source and destination operands are MMX registers. The second source operand is an MMX register or a 64-bit memory location.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n\nSRC\nDEST\nTEMP\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX3\nX1\nX0\nY2\nY1\nY0\nY3\nX2\nX3 ∗ Y3\nX2 ∗ Y2\nX0 ∗ Y0\nX1 ∗ Y1\n(X3∗Y3) + (X2∗Y2)\n(X1∗Y1) + (X0∗Y0)\nFigure 4-7. PMADDWD Execution Model Using 64-bit Operands\n" }, { "Name": "PMAXSB", "Alias": [], "Brief": "Maximum of Packed Signed Byte Integers", "Description": "\nCompares packed signed byte integers in the destination operand (first operand) and the source operand (second operand), and returns the maximum for each packed value in the destination operand.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMAXSD", "Alias": [], "Brief": "Maximum of Packed Signed Dword Integers", "Description": "\nCompares packed signed dword integers in the destination operand (first operand) and the source operand (second operand), and returns the maximum for each packed value in the destination operand.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMAXSW", "Alias": [], "Brief": "Maximum of Packed Signed Word Integers", "Description": "\nPerforms a SIMD compare of the packed signed word integers in the destination operand (first operand) and the source operand (second operand), and returns the maximum value for each pair of word integers to the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand can be an MMX technology register.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMAXUB", "Alias": [], "Brief": "Maximum of Packed Unsigned Byte Integers", "Description": "\nPerforms a SIMD compare of the packed unsigned byte integers in the destination operand (first operand) and the source operand (second operand), and returns the maximum value for each pair of byte integers to the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand can be an MMX technology register.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMAXUD", "Alias": [], "Brief": "Maximum of Packed Unsigned Dword Integers", "Description": "\nCompares packed unsigned dword integers in the destination operand (first operand) and the source operand (second operand), and returns the maximum for each packed value in the destination operand.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMAXUW", "Alias": [], "Brief": "Maximum of Packed Word Integers", "Description": "\nCompares packed unsigned word integers in the destination operand (first operand) and the source operand (second operand), and returns the maximum for each packed value in the destination operand.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMINSB", "Alias": [], "Brief": "Minimum of Packed Signed Byte Integers", "Description": "\nCompares packed signed byte integers in the destination operand (first operand) and the source operand (second operand), and returns the minimum for each packed value in the destination operand.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMINSD", "Alias": [], "Brief": "Minimum of Packed Dword Integers", "Description": "\nCompares packed signed dword integers in the destination operand (first operand) and the source operand (second operand), and returns the minimum for each packed value in the destination operand.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMINSW", "Alias": [], "Brief": "Minimum of Packed Signed Word Integers", "Description": "\nPerforms a SIMD compare of the packed signed word integers in the destination operand (first operand) and the source operand (second operand), and returns the minimum value for each pair of word integers to the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand can be an MMX technology register.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMINUB", "Alias": [], "Brief": "Minimum of Packed Unsigned Byte Integers", "Description": "\nPerforms a SIMD compare of the packed unsigned byte integers in the destination operand (first operand) and the source operand (second operand), and returns the minimum value for each pair of byte integers to the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand can be an MMX technology register.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMINUD", "Alias": [], "Brief": "Minimum of Packed Dword Integers", "Description": "\nCompares packed unsigned dword integers in the destination operand (first operand) and the source operand (second operand), and returns the minimum for each packed value in the destination operand.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMINUW", "Alias": [], "Brief": "Minimum of Packed Word Integers", "Description": "\nCompares packed unsigned word integers in the destination operand (first operand) and the source operand (second operand), and returns the minimum for each packed value in the destination operand.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMOVMSKB", "Alias": [], "Brief": "Move Byte Mask", "Description": "\nCreates a mask made up of the most significant bit of each byte of the source operand (second operand) and stores the result in the low byte or word of the destination operand (first operand).\nThe byte mask is 8 bits for 64-bit source operand, 16 bits for 128-bit source operand and 32 bits for 256-bit source operand. The destination operand is a general-purpose register.\nIn 64-bit mode, the instruction can access additional registers (XMM8-XMM15, R8-R15) when used with a REX.R prefix. The default operand size is 64-bit in 64-bit mode.\nLegacy SSE version: The source operand is an MMX technology register.\n128-bit Legacy SSE version: The source operand is an XMM register.\nVEX.128 encoded version: The source operand is an XMM register.\nVEX.256 encoded version: The source operand is a YMM register.\nNote: VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMOVSX", "Alias": [], "Brief": "Packed Move with Sign Extend", "Description": "\nSign-extend the low byte/word/dword values in each word/dword/qword element of the source operand (second operand) to word/dword/qword integers and stored as packed data in the destination operand (first operand).\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The destination register is YMM Register.\nNote: VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMOVZX", "Alias": [], "Brief": "Packed Move with Zero Extend", "Description": "\nZero-extend the low byte/word/dword values in each word/dword/qword element of the source operand (second operand) to word/dword/qword integers and stored as packed data in the destination operand (first operand).\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The destination register is YMM Register.\nNote: VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMULDQ", "Alias": [], "Brief": "Multiply Packed Signed Dword Integers", "Description": "\nMultiplies the first source operand by the second source operand and stores the result in the destination operand.\nFor PMULDQ and VPMULDQ (VEX.128 encoded version), the second source operand is two packed signed double-word integers stored in the first (low) and third doublewords of an XMM register or a 128-bit memory location. The first source operand is two packed signed doubleword integers stored in the first and third doublewords of an XMM register. The destination contains two packed signed quadword integers stored in an XMM register. For 128-bit memory operands, 128 bits are fetched from memory, but only the first and third doublewords are used in the computation.\nFor VPMULDQ (VEX.256 encoded version), the second source operand is four packed signed doubleword integers stored in the first (low), third, fifth and seventh doublewords of an YMM register or a 256-bit memory location. The first source operand is four packed signed doubleword integers stored in the first, third, fifth and seventh double-words of an XMM register. The destination contains four packed signed quadword integers stored in an YMM register. For 256-bit memory operands, 256 bits are fetched from memory, but only the first, third, fifth and seventh doublewords are used in the computation.\nWhen a quadword result is too large to be represented in 64 bits (overflow), the result is wrapped around and the low 64 bits are written to the destination element (that is, the carry is ignored).\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.L must be 0, otherwise the instruction will #UD.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\n" }, { "Name": "PMULHRSW", "Alias": [], "Brief": "Packed Multiply High with Round and Scale", "Description": "\nPMULHRSW multiplies vertically each signed 16-bit integer from the destination operand (first operand) with the corresponding signed 16-bit integer of the source operand (second operand), producing intermediate, signed 32-bit integers. Each intermediate 32-bit integer is truncated to the 18 most significant bits. Rounding is always performed by adding 1 to the least significant bit of the 18-bit intermediate result. The final result is obtained by selecting the 16 bits immediately to the right of the most significant bit of each 18-bit intermediate result and packed to the destination operand.\nWhen the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.\nIn 64-bit mode, use the REX prefix to access additional registers.\nLegacy SSE version: Both operands can be MMX registers. The second source operand is an MMX register or a 64-bit memory location.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.L must be 0, otherwise the instruction will #UD.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\n" }, { "Name": "PMULHUW", "Alias": [], "Brief": "Multiply Packed Unsigned Integers and Store High Result", "Description": "\nPerforms a SIMD unsigned multiply of the packed unsigned word integers in the destination operand (first operand) and the source operand (second operand), and stores the high 16 bits of each 32-bit intermediate results in the destination operand. (Figure 4-8 shows this operation when using 64-bit operands.)\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an MMX technology register.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.L must be 0, otherwise the instruction will #UD.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\n\nSRC\nDEST\nZ0 = X0 ∗ Y0\nTEMP\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX2\nX1\nX0\nY3\nZ3[31:16]\nZ1[31:16]\nZ0[31:16]\nX3\nZ2[31:16]\nZ3 = X3 ∗ Y3\nZ1 = X1 ∗ Y1\nZ2 = X2 ∗ Y2\n\n\n\n\nY2\nY1\nY0\nFigure 4-8. PMULHUW and PMULHW Instruction Operation Using 64-bit Operands\n" }, { "Name": "PMULHW", "Alias": [], "Brief": "Multiply Packed Signed Integers and Store High Result", "Description": "\nPerforms a SIMD signed multiply of the packed signed word integers in the destination operand (first operand) and the source operand (second operand), and stores the high 16 bits of each intermediate 32-bit result in the destina-tion operand. (Figure 4-8 shows this operation when using 64-bit operands.)\nn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an MMX technology register.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.L must be 0, otherwise the instruction will #UD.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\n" }, { "Name": "PMULLD", "Alias": [], "Brief": "Multiply Packed Signed Dword Integers and Store Low Result", "Description": "\nPerforms four signed multiplications from four pairs of signed dword integers and stores the lower 32 bits of the four 64-bit products in the destination operand (first operand). Each dword element in the destination operand is multiplied with the corresponding dword element of the source operand (second operand) to obtain a 64-bit inter-mediate product.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PMULLW", "Alias": [], "Brief": "Multiply Packed Signed Integers and Store Low Result", "Description": "\nPerforms a SIMD signed multiply of the packed signed word integers in the destination operand (first operand) and the source operand (second operand), and stores the low 16 bits of each intermediate 32-bit result in the destina-tion operand. (Figure 4-8 shows this operation when using 64-bit operands.)\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an MMX technology register.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.L must be 0, otherwise the instruction will #UD.\nVEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.\n\nSRC\nDEST\nTEMP\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX3\nX2\nX1\nZ2[15:0]\nZ1[15:0]\nZ3[15:0]\nX0\nZ0[15:0]\nZ2 = X2 ∗ Y2\nZ3 = X3 ∗ Y3\nZ1 = X1 ∗ Y1\nZ0 = X0 ∗ Y0\n\n\n\n\n\nY3\nY2\nY1\nY0\nFigure 4-9. PMULLU Instruction Operation Using 64-bit Operands\n" }, { "Name": "PMULUDQ", "Alias": [], "Brief": "Multiply Packed Unsigned Doubleword Integers", "Description": "\nMultiplies the first operand (destination operand) by the second operand (source operand) and stores the result in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: The source operand can be an unsigned doubleword integer stored in the low doubleword of an MMX technology register or a 64-bit memory location. The destination operand can be an unsigned doubleword integer stored in the low doubleword an MMX technology register. The result is an unsigned quadword integer stored in the destination an MMX technology register. When a quadword result is too large to be represented in 64 bits (overflow), the result is wrapped around and the low 64 bits are written to the destination element (that is, the carry is ignored).\nFor 64-bit memory operands, 64 bits are fetched from memory, but only the low doubleword is used in the compu-tation.\n128-bit Legacy SSE version: The second source operand is two packed unsigned doubleword integers stored in the first (low) and third doublewords of an XMM register or a 128-bit memory location. For 128-bit memory operands, 128 bits are fetched from memory, but only the first and third doublewords are used in the computation.The first source operand is two packed unsigned doubleword integers stored in the first and third doublewords of an XMM register. The destination contains two packed unsigned quadword integers stored in an XMM register. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The second source operand is two packed unsigned doubleword integers stored in the first (low) and third doublewords of an XMM register or a 128-bit memory location. For 128-bit memory operands, 128 bits are fetched from memory, but only the first and third doublewords are used in the computation.The first\nsource operand is two packed unsigned doubleword integers stored in the first and third doublewords of an XMM register. The destination contains two packed unsigned quadword integers stored in an XMM register. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand is four packed unsigned doubleword integers stored in the first (low), third, fifth and seventh doublewords of a YMM register or a 256-bit memory location. For 256-bit memory operands, 256 bits are fetched from memory, but only the first, third, fifth and seventh doublewords are used in the computation.The first source operand is four packed unsigned doubleword integers stored in the first, third, fifth and seventh doublewords of an YMM register. The destination contains four packed unaligned quadword integers stored in an YMM register.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "POP", "Alias": [], "Brief": "Pop a Value from the Stack", "Description": "\nLoads the value from the top of the stack to the location specified with the destination operand (or explicit opcode) and then increments the stack pointer. The destination operand can be a general-purpose register, memory loca-tion, or segment register.\nAddress and operand sizes are determined and used as follows:\nThe address size is used only when writing to a destination operand in memory.\nThe operand size (16, 32, or 64 bits) determines the amount by which the stack pointer is incremented (2, 4 or 8).\nThe stack-address size determines the width of the stack pointer when reading from the stack in memory and when incrementing the stack pointer. (As stated above, the amount by which the stack pointer is incremented is determined by the operand size.)\nIf the destination operand is one of the segment registers DS, ES, FS, GS, or SS, the value loaded into the register must be a valid segment selector. In protected mode, popping a segment selector into a segment register automat-ically causes the descriptor information associated with that segment selector to be loaded into the hidden (shadow) part of the segment register and causes the selector and the descriptor information to be validated (see the “Operation” section below).\nA NULL value (0000-0003) may be popped into the DS, ES, FS, or GS register without causing a general protection fault. However, any subsequent attempt to reference a segment whose corresponding segment register is loaded with a NULL value causes a general protection exception (#GP). In this situation, no memory reference occurs and the saved value of the segment register is NULL.\nThe POP instruction cannot pop a value into the CS register. To load the CS register from the stack, use the RET instruction.\nIf the ESP register is used as a base register for addressing a destination operand in memory, the POP instruction computes the effective address of the operand after it increments the ESP register. For the case of a 16-bit stack where ESP wraps to 0H as a result of the POP instruction, the resulting location of the memory write is processor-family-specific.\nThe POP ESP instruction increments the stack pointer (ESP) before data at the old top of stack is written into the destination.\nA POP SS instruction inhibits all interrupts, including the NMI interrupt, until after execution of the next instruction. This action allows sequential execution of POP SS and MOV ESP, EBP instructions without the danger of having an invalid stack during an interrupt1. However, use of the LSS instruction is the preferred method of loading the SS and ESP registers.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). When in 64-bit mode, POPs using 32-bit operands are not encodable and POPs to DS, ES, SS are not valid. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "POPA", "Alias": [ "POPAD" ], "Brief": "Pop All General", "Description": "\nPops doublewords (POPAD) or words (POPA) from the stack into the general-purpose registers. The registers are loaded in the following order: EDI, ESI, EBP, EBX, EDX, ECX, and EAX (if the operand-size attribute is 32) and DI, SI, BP, BX, DX, CX, and AX (if the operand-size attribute is 16). (These instructions reverse the operation of the PUSHA/PUSHAD instructions.) The value on the stack for the ESP or SP register is ignored. Instead, the ESP or SP register is incremented after each register is loaded.\nThe POPA (pop all) and POPAD (pop all double) mnemonics reference the same opcode. The POPA instruction is intended for use when the operand-size attribute is 16 and the POPAD instruction for when the operand-size attri-bute is 32. Some assemblers may force the operand size to 16 when POPA is used and to 32 when POPAD is used (using the operand-size override prefix [66H] if necessary). Others may treat these mnemonics as synonyms (POPA/POPAD) and use the current setting of the operand-size attribute to determine the size of values to be popped from the stack, regardless of the mnemonic used. (The D flag in the current code segment’s segment descriptor determines the operand-size attribute.)\nThis instruction executes as described in non-64-bit modes. It is not valid in 64-bit mode.\n" }, { "Name": "POPCNT", "Alias": [], "Brief": "Return the Count of Number of Bits Set to 1", "Description": "\nThis instruction calculates of number of bits set to 1 in the second operand (source) and returns the count in the first operand (a destination register).\n" }, { "Name": "POPF", "Alias": [ "POPFD", "POPFQ" ], "Brief": "Pop Stack into EFLAGS Register", "Description": "\nPops a doubleword (POPFD) from the top of the stack (if the current operand-size attribute is 32) and stores the value in the EFLAGS register, or pops a word from the top of the stack (if the operand-size attribute is 16) and stores it in the lower 16 bits of the EFLAGS register (that is, the FLAGS register). These instructions reverse the operation of the PUSHF/PUSHFD instructions.\nThe POPF (pop flags) and POPFD (pop flags double) mnemonics reference the same opcode. The POPF instruction is intended for use when the operand-size attribute is 16; the POPFD instruction is intended for use when the operand-size attribute is 32. Some assemblers may force the operand size to 16 for POPF and to 32 for POPFD. Others may treat the mnemonics as synonyms (POPF/POPFD) and use the setting of the operand-size attribute to determine the size of values to pop from the stack.\nThe effect of POPF/POPFD on the EFLAGS register changes, depending on the mode of operation. See the Table 4-12 and key below for details.\nWhen operating in protected, compatibility, or 64-bit mode at privilege level 0 (or in real-address mode, the equiv-alent to privilege level 0), all non-reserved flags in the EFLAGS register except RF1, VIP, VIF, and VM may be modi-fied. VIP, VIF and VM remain unaffected.\nWhen operating in protected, compatibility, or 64-bit mode with a privilege level greater than 0, but less than or equal to IOPL, all flags can be modified except the IOPL field and RF1, IF, VIP, VIF, and VM; these remain unaf-fected. The AC and ID flags can only be modified if the operand-size attribute is 32. The interrupt flag (IF) is altered only when executing at a level at least as privileged as the IOPL. If a POPF/POPFD instruction is executed with insufficient privilege, an exception does not occur but privileged bits do not change.\nWhen operating in virtual-8086 mode (EFLAGS.VM = 1) without the virtual-8086 mode extensions (CR4.VME = 0), the POPF/POPFD instructions can be used only if IOPL = 3; otherwise, a general-protection exception (#GP) occurs. If the virtual-8086 mode extensions are enabled (CR4.VME = 1), POPF (but not POPFD) can be executed in virtual-8086 mode with IOPL < 3.\nIn 64-bit mode, use REX.W to pop the top of stack to RFLAGS. The mnemonic assigned is POPFQ (note that the 32-bit operand is not encodable). POPFQ pops 64 bits from the stack, loads the lower 32 bits into RFLAGS, and zero extends the upper bits of RFLAGS.\nSee Chapter 3 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more informa-tion about the EFLAGS registers.\n1.\nRF is always zero after the execution of POPF. This is because POPF, like all instructions, clears RF as it begins to execute.\nTable 4-12. Effect of POPF/POPFD on the EFLAGS Register\n\nFlags\nMode\nOperand\nCPL\nIOPL\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nN\nN\nN\nN\nN\nS\nS\nS\nS\nS\nS\nS\nS\nS\nS\nS\nS\nN\nN\nS\nN\nS\nS\nS\nS\nS\nS\nS\nS\nS\nS\nS\nN\nN\nN\nN\nN\nS\nS\nS\nS\nS\nS\nS\nS\nS\nS\nS\nN\nN\nN\nN\nN\nS\nN\nS\nS\nN\nS\nS\nS\nS\nS\nS\nN\nN\nN\nN\nN\nS\nN\nS\nS\nS\nS\nS\nS\nS\nS\nS\nS\nN\nN\nS\nN\nS\nS\nS\nS\nS\nS\nS\nS\nS\nS\nS\nS\nN\nN\nS\nN\nS\nN\nS\nS\nN\nS\nS\nS\nS\nS\nS\nS\nN\nN\nS\nN\nS\nN\nS\nS\nS\nS\nS\nS\nS\nS\nS\n16\n3\n0-2\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nN\nN\nN\nN\nN\nS\nN\nS\nS\nS\nS\nS\nS\nS\nS\nS\n32\n3\n0-2\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nS\nN\nN\nS\nN\nS\nN\nS\nS\nS\nS\nS\nS\nS\nS\nS\nN/\nX\nN/\nX\nSV/\nX\nN/\nX\nN/\nX\nS/\nX\nN/X\nS/\nX\nS/\nX\nN/\nX\nS/\nX\nS/\nX\nS/\nX\nS/\nX\nS/\nX\nS/\nX\nN\nN\nN\nN\nN\nS\nN\nS\nS\nS\nS\nS\nS\nS\nS\nS\n32\n3\n0-2\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nX\nS\nN\nN\nS\nN\nS\nN\nS\nS\nS\nS\nS\nS\nS\nS\nS\nNotes\n21\n20\n19\n18\n17\n16\n14\n13:12\n11\n10\n9\n8\n7\n6\n4\n2\n0\nSize\nID\nVIP\nVIF\nAC\nVM\nRF\nNT\nIOPL\nOF\nDF\nIF\nTF\nSF\nZF\nAF\nPF\nCF\nReal-Address\n16\n0\n0-3\n0\nMode (CR0.PE\n32\n0\n0-3\n0\n= 0)\nProtected,\n16\n0\n0-3\n0\nCompatibility,\n16\n1-3\n<CPL\n0\nand 64-Bit\nModes\n16\n1-3\n≥CPL\n0\n32, 64\n0\n0-3\n0\n(CR0.PE = 1, EFLAGS.VM =\n32, 64\n1-3\n<CPL\n0\n0)\n32, 64\n1-3\n≥CPL\n0\n1\nVirtual-8086 (CR0.PE = 1,\n16\n3\n3\n0\nEFLAGS.VM =\n1\n1,\nCR4.VME = 0)\n32\n3\n3\n0\n2\nVME\n16\n3\n0-2\n0/\n(CR0.PE = 1,\nX\nEFLAGS.VM =\n16\n3\n3\n0\n1,\n1\nCR4.VME = 1)\n32\n3\n3\n0\nNOTES:\n1. #GP fault - no flag update\n2. #GP fault with no flag update if VIP=1 in EFLAGS register and IF=1 in FLAGS value on stack\n\nKey\n\nUpdated from stack\n\nUpdated from IF (bit 9) in FLAGS value on stack\n\nNo change in value\n\nNo EFLAGS update\n0\nValue is cleared\nS\nSV\nN\nX\n" }, { "Name": "POR", "Alias": [], "Brief": "Bitwise Logical OR", "Description": "\nPerforms a bitwise logical OR operation on the source operand (second operand) and the destination operand (first operand) and stores the result in the destination operand. Each bit of the result is set to 1 if either or both of the corresponding bits of the first and second operands are 1; otherwise, it is set to 0.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an MMX technology register.\n128-bit Legacy SSE version: The second source operand is an XMM register or a 128-bit memory location. The first source and destination operands can be XMM registers. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The second source operand is an XMM register or a 128-bit memory location. The first source and destination operands can be XMM registers. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand is an YMM register or a 256-bit memory location. The first source and destination operands can be YMM registers.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PREFETCHW", "Alias": [], "Brief": "Prefetch Data into Caches in Anticipation of a Write", "Description": "\nFetches the cache line of data from memory that contains the byte specified with the source operand to a location in the 1st or 2nd level cache and invalidates all other cached instances of the line.\nThe source operand is a byte memory location. If the line selected is already present in the lowest level cache and is already in an exclusively owned state, no data movement occurs. Prefetches from non-writeback memory are ignored.\nThe PREFETCHW instruction is merely a hint and does not affect program behavior. If executed, this instruction moves data closer to the processor and invalidates any other cached copy in anticipation of the line being written to in the future.\nThe characteristic of prefetch locality hints is implementation-dependent, and can be overloaded or ignored by a processor implementation. The amount of data prefetched is also processor implementation-dependent. It will, however, be a minimum of 32 bytes.\nIt should be noted that processors are free to speculatively fetch and cache data with exclusive ownership from system memory regions that permit such accesses (that is, the WB memory type). A PREFETCHW instruction is considered a hint to this speculative behavior. Because this speculative fetching can occur at any time and is not tied to instruction execution, a PREFETCHW instruction is not ordered with respect to the fence instructions (MFENCE, SFENCE, and LFENCE) or locked memory references. A PREFETCHW instruction is also unordered with respect to CLFLUSH instructions, other PREFETCHW instructions, or any other general instruction\nIt is ordered with respect to serializing instructions such as CPUID, WRMSR, OUT, and MOV CR.\nThis instruction's operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "PREFETCHWT1", "Alias": [], "Brief": "Prefetch Vector Data Into Caches with Intent to Write and T1 Hint", "Description": "\nFetches the line of data from memory that contains the byte specified with the source operand to a location in the cache hierarchy specified by an intent to write hint (so that data is brought into ‘Exclusive’ state via a request for ownership) and a locality hint:\nThe source operand is a byte memory location. (The locality hints are encoded into the machine level instruction using bits 3 through 5 of the ModR/M byte. Use of any ModR/M value other than the specified ones will lead to unpredictable behavior.)\nIf the line selected is already present in the cache hierarchy at a level closer to the processor, no data movement occurs. Prefetches from uncacheable or WC memory are ignored.\nThe PREFETCHh instruction is merely a hint and does not affect program behavior. If executed, this instruction moves data closer to the processor in anticipation of future use.\nThe implementation of prefetch locality hints is implementation-dependent, and can be overloaded or ignored by a processor implementation. The amount of data prefetched is also processor implementation-dependent. It will, however, be a minimum of 32 bytes.\nIt should be noted that processors are free to speculatively fetch and cache data from system memory regions that are assigned a memory-type that permits speculative reads (that is, the WB, WC, and WT memory types). A PREFETCHh instruction is considered a hint to this speculative behavior. Because this speculative fetching can occur at any time and is not tied to instruction execution, a PREFETCHh instruction is not ordered with respect to the fence instructions (MFENCE, SFENCE, and LFENCE) or locked memory references. A PREFETCHh instruction is also unordered with respect to CLFLUSH instructions, other PREFETCHh instructions, or any other general instruction. It is ordered with respect to serializing instructions such as CPUID, WRMSR, OUT, and MOV CR.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "PREFETCHh", "Alias": [], "Brief": "Prefetch Data Into Caches", "Description": "\nFetches the line of data from memory that contains the byte specified with the source operand to a location in the cache hierarchy specified by a locality hint:\n— Pentium III processor—1st- or 2nd-level cache.\n— Pentium 4 and Intel Xeon processors—2nd-level cache.\n— Pentium III processor—2nd-level cache.\n— Pentium 4 and Intel Xeon processors—2nd-level cache.\n— Pentium III processor—2nd-level cache.\n— Pentium 4 and Intel Xeon processors—2nd-level cache.\n— Pentium III processor—1st-level cache\n— Pentium 4 and Intel Xeon processors—2nd-level cache\nThe source operand is a byte memory location. (The locality hints are encoded into the machine level instruction using bits 3 through 5 of the ModR/M byte.)\nIf the line selected is already present in the cache hierarchy at a level closer to the processor, no data movement occurs. Prefetches from uncacheable or WC memory are ignored.\nThe PREFETCHh instruction is merely a hint and does not affect program behavior. If executed, this instruction moves data closer to the processor in anticipation of future use.\nThe implementation of prefetch locality hints is implementation-dependent, and can be overloaded or ignored by a processor implementation. The amount of data prefetched is also processor implementation-dependent. It will, however, be a minimum of 32 bytes.\nIt should be noted that processors are free to speculatively fetch and cache data from system memory regions that are assigned a memory-type that permits speculative reads (that is, the WB, WC, and WT memory types). A PREFETCHh instruction is considered a hint to this speculative behavior. Because this speculative fetching can occur at any time and is not tied to instruction execution, a PREFETCHh instruction is not ordered with respect to the fence instructions (MFENCE, SFENCE, and LFENCE) or locked memory references. A PREFETCHh instruction is also\nunordered with respect to CLFLUSH instructions, other PREFETCHh instructions, or any other general instruction. It is ordered with respect to serializing instructions such as CPUID, WRMSR, OUT, and MOV CR.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "PSADBW", "Alias": [], "Brief": "Compute Sum of Absolute Differences", "Description": "\nComputes the absolute value of the difference of 8 unsigned byte integers from the source operand (second operand) and from the destination operand (first operand). These 8 differences are then summed to produce an unsigned word integer result that is stored in the destination operand. Figure 4-10 shows the operation of the PSADBW instruction when using 64-bit operands.\nWhen operating on 64-bit operands, the word integer result is stored in the low word of the destination operand, and the remaining bytes in the destination operand are cleared to all 0s.\nWhen operating on 128-bit operands, two packed results are computed. Here, the 8 low-order bytes of the source and destination operands are operated on to produce a word result that is stored in the low word of the destination operand, and the 8 high-order bytes are operated on to produce a word result that is stored in bits 64 through 79 of the destination operand. The remaining bytes of the destination operand are cleared.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an MMX technology register.\n128-bit Legacy SSE version: The first source operand and destination register are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The first source operand and destination register are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The first source operand and destination register are YMM registers. The second source operand is an YMM register or a 256-bit memory location.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n\nSRC\nX2\nDEST\nTEMP\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nY3\nY1\nY0\nY2\nX7\nX6\nY5\nABS(X7:Y7)\nX5\nX3\nX1\nY7\nY4\nABS(X4:Y4)\nABS(X1:Y1)\nABS(X0:Y0)\n00H\n00H\n00H\n00H\n00H\nX4\nX0\nY6\n00H\nSUM(TEMP7...TEMP0)\n\n\n\n\nABS(X6:Y6) ABS(X5:Y5)\n\n\nABS(X3:Y3) ABS(X2:Y2)\n\n\nFigure 4-10. PSADBW Instruction Operation Using 64-bit Operands\n" }, { "Name": "PSHUFB", "Alias": [], "Brief": "Packed Shuffle Bytes", "Description": "\nPSHUFB performs in-place shuffles of bytes in the destination operand (the first operand) according to the shuffle control mask in the source operand (the second operand). The instruction permutes the data in the destination operand, leaving the shuffle mask unaffected. If the most significant bit (bit[7]) of each byte of the shuffle control mask is set, then constant zero is written in the result byte. Each byte in the shuffle control mask forms an index to permute the corresponding byte in the destination operand. The value of each index is the least significant 4 bits (128-bit operation) or 3 bits (64-bit operation) of the shuffle control byte. When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.\nIn 64-bit mode, use the REX prefix to access additional registers.\nLegacy SSE version: Both operands can be MMX registers.\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The destination operand is the first operand, the first source operand is the second operand, the second source operand is the third operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: Bits (255:128) of the destination YMM register stores the 16-byte shuffle result of the upper 16 bytes of the first source operand, using the upper 16-bytes of the second source operand as control mask. The value of each index is for the high 128-bit lane is the least significant 4 bits of the respective shuffle control byte. The index value selects a source data element within each 128-bit lane.\nNote: VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PSHUFD", "Alias": [], "Brief": "Shuffle Packed Doublewords", "Description": "\nCopies doublewords from source operand (second operand) and inserts them in the destination operand (first operand) at the locations selected with the order operand (third operand). Figure 4-12 shows the operation of the 256-bit VPSHUFD instruction and the encoding of the order operand. Each 2-bit field in the order operand selects the contents of one doubleword location within a 128-bit lane and copy to the target element in the destination operand. For example, bits 0 and 1 of the order operand targets the first doubleword element in the low and high 128-bit lane of the destination operand for 256-bit VPSHUFD. The encoded value of bits 1:0 of the order operand (see the field encoding in Figure 4-12) determines which doubleword element (from the respective 128-bit lane) of the source operand will be copied to doubleword 0 of the destination operand.\nFor 128-bit operation, only the low 128-bit lane are operative. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The order operand is an 8-bit immediate. Note that this instruction permits a doubleword in the source operand to be copied to more than one doubleword location in the destination operand.\nSRC\nX7\nX6\nX5\nX4\nX3\nX2\nX1\nX0\nDEST\nY7\nY6\nY5\nY4\nY3\nY2\nY1\nY0\n00B - X4\nEncoding\n00B - X0\nEncoding\n01B - X5\nof Fields in\nORDER\n01B - X1\nof Fields in\n10B - X6\nORDER\n10B - X2\nORDER\n11B - X7\nOperand\n7\n56\n4\n3\n12\n0\n11B - X3\nOperand\nFigure 4-12. 256-bit VPSHUFD Instruction Operation\nThe source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The order operand is an 8-bit immediate. Note that this instruction permits a doubleword in the source operand to be copied to more than one doubleword location in the destination operand.\nLegacy SSE instructions: In 64-bit mode using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: Bits (255:128) of the destination stores the shuffled results of the upper 16 bytes of the source operand using the immediate byte as the order operand.\nNote: VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise the instruction will #UD.\n" }, { "Name": "PSHUFHW", "Alias": [], "Brief": "Shuffle Packed High Words", "Description": "\nCopies words from the high quadword of a 128-bit lane of the source operand and inserts them in the high quad-word of the destination operand at word locations (of the respective lane) selected with the immediate operand. This 256-bit operation is similar to the in-lane operation used by the 256-bit VPSHUFD instruction, which is illus-trated in Figure 4-12. For 128-bit operation, only the low 128-bit lane is operative. Each 2-bit field in the immediate operand selects the contents of one word location in the high quadword of the destination operand. The binary encodings of the immediate operand fields select words (0, 1, 2 or 3, 4) from the high quadword of the source operand to be copied to the destination operand. The low quadword of the source operand is copied to the low quadword of the destination operand, for each 128-bit lane.\nNote that this instruction permits a word in the high quadword of the source operand to be copied to more than one word location in the high quadword of the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise the instruction will #UD.\nVEX.256 encoded version: The destination operand is an YMM register. The source operand can be an YMM register or a 256-bit memory location.\nNote: In VEX encoded versions VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "PSHUFLW", "Alias": [], "Brief": "Shuffle Packed Low Words", "Description": "\nCopies words from the low quadword of a 128-bit lane of the source operand and inserts them in the low quadword of the destination operand at word locations (of the respective lane) selected with the immediate operand. The 256-bit operation is similar to the in-lane operation used by the 256-bit VPSHUFD instruction, which is illustrated in Figure 4-12. For 128-bit operation, only the low 128-bit lane is operative. Each 2-bit field in the immediate operand selects the contents of one word location in the low quadword of the destination operand. The binary encodings of the immediate operand fields select words (0, 1, 2 or 3) from the low quadword of the source operand to be copied to the destination operand. The high quadword of the source operand is copied to the high quadword of the destination operand, for each 128-bit lane.\nNote that this instruction permits a word in the low quadword of the source operand to be copied to more than one word location in the low quadword of the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The destination operand is an YMM register. The source operand can be an YMM register or a 256-bit memory location.\nNote: VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "PSHUFW", "Alias": [], "Brief": "Shuffle Packed Words", "Description": "\nCopies words from the source operand (second operand) and inserts them in the destination operand (first operand) at word locations selected with the order operand (third operand). This operation is similar to the opera-tion used by the PSHUFD instruction, which is illustrated in Figure 4-12. For the PSHUFW instruction, each 2-bit field in the order operand selects the contents of one word location in the destination operand. The encodings of the order operand fields select words from the source operand to be copied to the destination operand.\nThe source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an MMX technology register. The order operand is an 8-bit immediate. Note that this instruction permits a word in the source operand to be copied to more than one word location in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n" }, { "Name": "PSIGNB", "Alias": [ "PSIGNW", "PSIGND " ], "Brief": "Packed SIGN", "Description": "\n(V)PSIGNB/(V)PSIGNW/(V)PSIGND negates each data element of the destination operand (the first operand) if the signed integer value of the corresponding data element in the source operand (the second operand) is less than zero. If the signed integer value of a data element in the source operand is positive, the corresponding data element in the destination operand is unchanged. If a data element in the source operand is zero, the corre-sponding data element in the destination operand is set to zero.\n(V)PSIGNB operates on signed bytes. (V)PSIGNW operates on 16-bit signed words. (V)PSIGND operates on signed 32-bit integers. When the source operand is a 128bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.\nLegacy SSE instructions: Both operands can be MMX registers. In 64-bit mode, use the REX prefix to access addi-tional registers.\n128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destina-tion register remain unchanged.\nVEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.L must be 0, otherwise instructions will #UD.\nVEX.256 encoded version: The first source and destination operands are YMM registers. The second source operand is an YMM register or a 256-bit memory location.\n" }, { "Name": "PSLLW", "Alias": [ "PSLLD", "PSLLQ" ], "Brief": "Shift Packed Data Left Logical", "Description": "\nShifts the bits in the individual data elements (words, doublewords, or quadword) in the destination operand (first operand) to the left by the number of bits specified in the count operand (second operand). As the bits in the data elements are shifted left, the empty low-order bits are cleared (set to 0). If the value specified by the count operand is greater than 15 (for words), 31 (for doublewords), or 63 (for a quadword), then the destination operand is set to all 0s. Figure 4-13 gives an example of shifting words in a 64-bit operand.\n\nPre-Shift\nDEST\nShift Left\nwith Zero\nExtension\nPost-Shift\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX3\nX0\nX2\nX1\nX3 << COUNT\n\n\n\n\nX2 << COUNT\nX1 << COUNT\nX0 << COUNT\nFigure 4-13. PSLLW, PSLLD, and PSLLQ Instruction Operation Using 64-bit Operand\nThe (V)PSLLW instruction shifts each of the words in the destination operand to the left by the number of bits spec-ified in the count operand; the (V)PSLLD instruction shifts each of the doublewords in the destination operand; and the (V)PSLLQ instruction shifts the quadword (or quadwords) in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE instructions: The destination operand is an MMX technology register; the count operand can be either an MMX technology register or an 64-bit memory location.\n128-bit Legacy SSE version: The destination and first source operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. The count operand can be either an XMM register or a 128-bit memory location or an 8-bit immediate. If the count operand is a memory address, 128 bits are loaded but the upper 64 bits are ignored.\nVEX.128 encoded version: The destination and first source operands are XMM registers. Bits (VLMAX-1:128) of the destination YMM register are zeroed. The count operand can be either an XMM register or a 128-bit memory loca-tion or an 8-bit immediate. If the count operand is a memory address, 128 bits are loaded but the upper 64 bits are ignored.\nVEX.256 encoded version: The destination and first source operands are YMM registers. The count operand can be either an XMM register or a 128-bit memory location or an 8-bit immediate.\nNote: For shifts with an immediate count (VEX.128.66.0F 71-73 /6), VEX.vvvv encodes the destination register, and VEX.B + ModRM.r/m encodes the source register. VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "PSLLDQ", "Alias": [], "Brief": "Shift Double Quadword Left Logical", "Description": "\nShifts the destination operand (first operand) to the left by the number of bytes specified in the count operand (second operand). The empty low-order bytes are cleared (set to all 0s). If the value specified by the count operand is greater than 15, the destination operand is set to all 0s. The count operand is an 8-bit immediate.\n128-bit Legacy SSE version: The source and destination operands are the same. Bits (VLMAX-1:128) of the corre-sponding YMM destination register remain unchanged.\nVEX.128 encoded version: The source and destination operands are XMM registers. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The source operand is a YMM register. The destination operand is a YMM register. The count operand applies to both the low and high 128-bit lanes.\nNote: VEX.vvvv encodes the destination register, and VEX.B + ModRM.r/m encodes the source register. VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "PSRAW", "Alias": [ "PSRAD" ], "Brief": "Shift Packed Data Right Arithmetic", "Description": "\nShifts the bits in the individual data elements (words or doublewords) in the destination operand (first operand) to the right by the number of bits specified in the count operand (second operand). As the bits in the data elements are shifted right, the empty high-order bits are filled with the initial value of the sign bit of the data element. If the value specified by the count operand is greater than 15 (for words) or 31 (for doublewords), each destination data element is filled with the initial value of the sign bit of the element. (Figure 4-14 gives an example of shifting words in a 64-bit operand.)\n\nPre-Shift\nDEST\nShift Right\nwith Sign\nExtension\nPost-Shift\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX1\nX0\nX3 >> COUNT\nX2 >> COUNT\nX1 >> COUNT\nX3\nX2\nX0 >> COUNT\nFigure 4-14. PSRAW and PSRAD Instruction Operation Using a 64-bit Operand\nNote that only the first 64-bits of a 128-bit count operand are checked to compute the count. If the second source operand is a memory address, 128 bits are loaded.\nThe (V)PSRAW instruction shifts each of the words in the destination operand to the right by the number of bits specified in the count operand, and the (V)PSRAD instruction shifts each of the doublewords in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE instructions: The destination operand is an MMX technology register; the count operand can be either an MMX technology register or an 64-bit memory location.\n128-bit Legacy SSE version: The destination and first source operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. The count operand can be either an XMM register or a 128-bit memory location or an 8-bit immediate. If the count operand is a memory address, 128 bits are loaded but the upper 64 bits are ignored.\nVEX.128 encoded version: The destination and first source operands are XMM registers. Bits (VLMAX-1:128) of the destination YMM register are zeroed. The count operand can be either an XMM register or a 128-bit memory loca-tion or an 8-bit immediate. If the count operand is a memory address, 128 bits are loaded but the upper 64 bits are ignored.\nVEX.256 encoded version: The destination and first source operands are YMM registers. The count operand can be either an XMM register or a 128-bit memory location or an 8-bit immediate.\nNote: For shifts with an immediate count (VEX.128.66.0F 71-73 /4), VEX.vvvv encodes the destination register, and VEX.B + ModRM.r/m encodes the source register. VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "PSRLW", "Alias": [ "PSRLD", "PSRLQ" ], "Brief": "Shift Packed Data Right Logical", "Description": "\nShifts the bits in the individual data elements (words, doublewords, or quadword) in the destination operand (first operand) to the right by the number of bits specified in the count operand (second operand). As the bits in the data elements are shifted right, the empty high-order bits are cleared (set to 0). If the value specified by the count operand is greater than 15 (for words), 31 (for doublewords), or 63 (for a quadword), then the destination operand is set to all 0s. Figure 4-15 gives an example of shifting words in a 64-bit operand.\nNote that only the first 64-bits of a 128-bit count operand are checked to compute the count.\n\nPre-Shift\nDEST\nShift Right\nwith Zero\nExtension\nPost-Shift\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX3\nX2\nX0\nX2 >> COUNT\nX1 >> COUNT\nX0 >> COUNT\nX3 >> COUNT\nX1\nFigure 4-15. PSRLW, PSRLD, and PSRLQ Instruction Operation Using 64-bit Operand\nThe (V)PSRLW instruction shifts each of the words in the destination operand to the right by the number of bits specified in the count operand; the (V)PSRLD instruction shifts each of the doublewords in the destination operand; and the PSRLQ instruction shifts the quadword (or quadwords) in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE instructions: The destination operand is an MMX technology register; the count operand can be either an MMX technology register or an 64-bit memory location.\n128-bit Legacy SSE version: The destination operand is an XMM register; the count operand can be either an XMM register or a 128-bit memory location, or an 8-bit immediate. If the count operand is a memory address, 128 bits\nare loaded but the upper 64 bits are ignored. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The destination operand is an XMM register; the count operand can be either an XMM register or a 128-bit memory location, or an 8-bit immediate. If the count operand is a memory address, 128 bits are loaded but the upper 64 bits are ignored. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The destination and first source operands are YMM registers. The count operand can be either an YMM register or a 128-bit memory location or an 8-bit immediate.\nNote: For shifts with an immediate count (VEX.128.66.0F 71-73 /2), VEX.vvvv encodes the destination register, and VEX.B + ModRM.r/m encodes the source register. VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "PSRLDQ", "Alias": [], "Brief": "Shift Double Quadword Right Logical", "Description": "\nShifts the destination operand (first operand) to the right by the number of bytes specified in the count operand (second operand). The empty high-order bytes are cleared (set to all 0s). If the value specified by the count operand is greater than 15, the destination operand is set to all 0s. The count operand is an 8-bit immediate.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source and destination operands are the same. Bits (VLMAX-1:128) of the corre-sponding YMM destination register remain unchanged.\nVEX.128 encoded version: The source and destination operands are XMM registers. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The source operand is a YMM register. The destination operand is a YMM register. The count operand applies to both the low and high 128-bit lanes.\nNote: VEX.vvvv encodes the destination register, and VEX.B + ModRM.r/m encodes the source register. VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "PSUBB", "Alias": [ "PSUBW", "PSUBD" ], "Brief": "Subtract Packed Integers", "Description": "\nPerforms a SIMD subtract of the packed integers of the source operand (second operand) from the packed integers of the destination operand (first operand), and stores the packed integer results in the destination operand. See Figure 9-4 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD operation. Overflow is handled with wraparound, as described in the following paragraphs.\nThe (V)PSUBB instruction subtracts packed byte integers. When an individual result is too large or too small to be represented in a byte, the result is wrapped around and the low 8 bits are written to the destination element.\nThe (V)PSUBW instruction subtracts packed word integers. When an individual result is too large or too small to be represented in a word, the result is wrapped around and the low 16 bits are written to the destination element.\nThe (V)PSUBD instruction subtracts packed doubleword integers. When an individual result is too large or too small to be represented in a doubleword, the result is wrapped around and the low 32 bits are written to the destination element.\nNote that the (V)PSUBB, (V)PSUBW, and (V)PSUBD instructions can operate on either unsigned or signed (two's complement notation) packed integers; however, it does not set bits in the EFLAGS register to indicate overflow and/or a carry. To prevent undetected overflow conditions, software must control the ranges of values upon which it operates.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: When operating on 64-bit operands, the destination operand must be an MMX technology register and the source operand can be either an MMX technology register or a 64-bit memory location.\n128-bit Legacy SSE version: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM desti-nation register remain unchanged.\nVEX.128 encoded version: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand is an YMM register or a 256-bit memory location. The first source operand and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "PSUBQ", "Alias": [], "Brief": "Subtract Packed Quadword Integers", "Description": "\nSubtracts the second operand (source operand) from the first operand (destination operand) and stores the result in the destination operand. When packed quadword operands are used, a SIMD subtract is performed. When a quadword result is too large to be represented in 64 bits (overflow), the result is wrapped around and the low 64 bits are written to the destination element (that is, the carry is ignored).\nNote that the (V)PSUBQ instruction can operate on either unsigned or signed (two’s complement notation) inte-gers; however, it does not set bits in the EFLAGS register to indicate overflow and/or a carry. To prevent undetected overflow conditions, software must control the ranges of the values upon which it operates.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: The source operand can be a quadword integer stored in an MMX technology register or a 64-bit memory location.\n128-bit Legacy SSE version: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM desti-nation register remain unchanged.\nVEX.128 encoded version: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand is an YMM register or a 256-bit memory location. The first source operand and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "PSUBSB", "Alias": [ "PSUBSW" ], "Brief": "Subtract Packed Signed Integers with Signed Saturation", "Description": "\nPerforms a SIMD subtract of the packed signed integers of the source operand (second operand) from the packed signed integers of the destination operand (first operand), and stores the packed integer results in the destination operand. See Figure 9-4 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD operation. Overflow is handled with signed saturation, as described in the following para-graphs.\nThe (V)PSUBSB instruction subtracts packed signed byte integers. When an individual byte result is beyond the range of a signed byte integer (that is, greater than 7FH or less than 80H), the saturated value of 7FH or 80H, respectively, is written to the destination operand.\nThe (V)PSUBSW instruction subtracts packed signed word integers. When an individual word result is beyond the range of a signed word integer (that is, greater than 7FFFH or less than 8000H), the saturated value of 7FFFH or 8000H, respectively, is written to the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: When operating on 64-bit operands, the destination operand must be an MMX technology register and the source operand can be either an MMX technology register or a 64-bit memory location.\n128-bit Legacy SSE version: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM desti-nation register remain unchanged.\nVEX.128 encoded version: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand is an YMM register or a 256-bit memory location. The first source operand and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "PSUBUSB", "Alias": [ "PSUBUSW" ], "Brief": "Subtract Packed Unsigned Integers with Unsigned Saturation", "Description": "\nPerforms a SIMD subtract of the packed unsigned integers of the source operand (second operand) from the packed unsigned integers of the destination operand (first operand), and stores the packed unsigned integer results in the destination operand. See Figure 9-4 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD operation. Overflow is handled with unsigned saturation, as described in the following paragraphs.\nThese instructions can operate on either 64-bit or 128-bit operands.\nThe (V)PSUBUSB instruction subtracts packed unsigned byte integers. When an individual byte result is less than zero, the saturated value of 00H is written to the destination operand.\nThe (V)PSUBUSW instruction subtracts packed unsigned word integers. When an individual word result is less than zero, the saturated value of 0000H is written to the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE version: When operating on 64-bit operands, the destination operand must be an MMX technology register and the source operand can be either an MMX technology register or a 64-bit memory location.\n128-bit Legacy SSE version: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM desti-nation register remain unchanged.\nVEX.128 encoded version: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand is an YMM register or a 256-bit memory location. The first source operand and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "PTEST", "Alias": [], "Brief": "Logical Compare", "Description": "\nPTEST and VPTEST set the ZF flag if all bits in the result are 0 of the bitwise AND of the first source operand (first operand) and the second source operand (second operand). VPTEST sets the CF flag if all bits in the result are 0 of the bitwise AND of the second source operand (second operand) and the logical NOT of the destination operand.\nThe first source register is specified by the ModR/M reg field.\n128-bit versions: The first source register is an XMM register. The second source register can be an XMM register or a 128-bit memory location. The destination register is not modified.\nVEX.256 encoded version: The first source register is a YMM register. The second source register can be a YMM register or a 256-bit memory location. The destination register is not modified.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "PUNPCKHBW", "Alias": [ "PUNPCKHWD", "PUNPCKHDQ", "PUNPCKHQDQ" ], "Brief": "Unpack High Data", "Description": "\nUnpacks and interleaves the high-order data elements (bytes, words, doublewords, or quadwords) of the destina-tion operand (first operand) and source operand (second operand) into the destination operand. Figure 4-16 shows the unpack operation for bytes in 64-bit operands. The low-order data elements are ignored.\n\nSRC\nDEST\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX7\nX6\nX5\nX4\nX3\nX2\nX1\nX0\nY7\nY6\nY5\nY4\nY3\nY2\nY1\nY0\n\n\nY7\nX7\nY6\nX6\nY5\nX5\nY4\nX4\nFigure 4-16. PUNPCKHBW Instruction Operation Using 64-bit Operands\n31\n0\n255\n255\n31\n0\n\n\nX7\nX6\nX5\nX4\nX3\nX2\nX1\nX0\n\n\nY7\nY6\nY5\nY4\nY3\nY2\nY1\nY0\nSRC\n255\n0\n\n\nY7\nX7\nY6\nX6\nY3\nX3\nY2\nX2\nDEST\nFigure 4-17. 256-bit VPUNPCKHDQ Instruction Operation\nWhen the source data comes from a 64-bit memory operand, the full 64-bit operand is accessed from memory, but the instruction uses only the high-order 32 bits. When the source data comes from a 128-bit memory operand, an implementation may fetch only the appropriate 64 bits; however, alignment to a 16-byte boundary and normal segment checking will still be enforced.\nThe (V)PUNPCKHBW instruction interleaves the high-order bytes of the source and destination operands, the (V)PUNPCKHWD instruction interleaves the high-order words of the source and destination operands, the (V)PUNPCKHDQ instruction interleaves the high-order doubleword (or doublewords) of the source and destination operands, and the (V)PUNPCKHQDQ instruction interleaves the high-order quadwords of the source and destina-tion operands.\nThese instructions can be used to convert bytes to words, words to doublewords, doublewords to quadwords, and quadwords to double quadwords, respectively, by placing all 0s in the source operand. Here, if the source operand contains all 0s, the result (stored in the destination operand) contains zero extensions of the high-order data elements from the original value in the destination operand. For example, with the (V)PUNPCKHBW instruction the high-order bytes are zero extended (that is, unpacked into unsigned word integers), and with the (V)PUNPCKHWD instruction, the high-order words are zero extended (unpacked into unsigned doubleword integers).\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE versions: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an MMX technology register.\n128-bit Legacy SSE versions: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded versions: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand is an YMM register or a 256-bit memory location. The first source operand and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "PUNPCKLBW", "Alias": [ "PUNPCKLWD", "PUNPCKLDQ", "PUNPCKLQDQ" ], "Brief": "Unpack Low Data", "Description": "\nUnpacks and interleaves the low-order data elements (bytes, words, doublewords, and quadwords) of the destina-tion operand (first operand) and source operand (second operand) into the destination operand. (Figure 4-18 shows the unpack operation for bytes in 64-bit operands.). The high-order data elements are ignored.\n\nSRC\nDEST\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nY3\nX3\nY2\nX2\nY1\nX1\nY0\nX0\nX7\nX6\nX5\nX4\nX3\nX2\nX1\nX0\nY7\nY6\nY5\nY4\nY3\nY2\nY1\nY0\nFigure 4-18. PUNPCKLBW Instruction Operation Using 64-bit Operands\n31\n0\n255\n255\n31\n0\n\n\nX7\nX6\nX5\nX4\nX3\nX2\nX1\nX0\n\n\nY7\nY6\nY5\nY4\nY3\nY2\nY1\nY0\nSRC\n255\n0\n\n\nY5\nX5\nY4\nX4\nY1\nX1\nY0\nX0\nDEST\nFigure 4-19. 256-bit VPUNPCKLDQ Instruction Operation\nWhen the source data comes from a 128-bit memory operand, an implementation may fetch only the appropriate 64 bits; however, alignment to a 16-byte boundary and normal segment checking will still be enforced.\nThe (V)PUNPCKLBW instruction interleaves the low-order bytes of the source and destination operands, the (V)PUNPCKLWD instruction interleaves the low-order words of the source and destination operands, the (V)PUNPCKLDQ instruction interleaves the low-order doubleword (or doublewords) of the source and destination operands, and the (V)PUNPCKLQDQ instruction interleaves the low-order quadwords of the source and destination operands.\nThese instructions can be used to convert bytes to words, words to doublewords, doublewords to quadwords, and quadwords to double quadwords, respectively, by placing all 0s in the source operand. Here, if the source operand contains all 0s, the result (stored in the destination operand) contains zero extensions of the high-order data elements from the original value in the destination operand. For example, with the (V)PUNPCKLBW instruction the high-order bytes are zero extended (that is, unpacked into unsigned word integers), and with the (V)PUNPCKLWD instruction, the high-order words are zero extended (unpacked into unsigned doubleword integers).\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE versions: The source operand can be an MMX technology register or a 32-bit memory location. The destination operand is an MMX technology register.\n128-bit Legacy SSE versions: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded versions: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand is an YMM register or a 256-bit memory location. The first source operand and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "PUSH", "Alias": [], "Brief": "Push Word, Doubleword or Quadword Onto the Stack", "Description": "\nDecrements the stack pointer and then stores the source operand on the top of the stack. Address and operand sizes are determined and used as follows:\nThe address size is used only when referencing a source operand in memory.\nThe operand size (16, 32, or 64 bits) determines the amount by which the stack pointer is decremented (2, 4 or 8).\nIf the source operand is an immediate of size less than the operand size, a sign-extended value is pushed on the stack. If the source operand is a segment register (16 bits) and the operand size is 64-bits, a zero-extended value is pushed on the stack; if the operand size is 32-bits, either a zero-extended value is pushed on the stack or the segment selector is written on the stack using a 16-bit move. For the last case, all recent Core and Atom processors perform a 16-bit move, leaving the upper portion of the stack location unmodified.\nThe stack-address size determines the width of the stack pointer when writing to the stack in memory and when decrementing\nthe stack pointer. (As stated above,\nthe amount by which\nthe stack pointer\nis\ndecremented is determined by the operand size.)\nIf the operand size is less than the stack-address size, the PUSH instruction may result in a misaligned stack pointer (a stack pointer that is not aligned on a doubleword or quadword boundary).\nThe PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. If a PUSH instruction uses a memory operand in which the ESP register is used for computing the operand address, the address of the operand is computed before the ESP register is decremented.\nIf the ESP or SP register is 1 when the PUSH instruction is executed in real-address mode, a stack-fault exception (#SS) is generated (because the limit of the stack segment is violated). Its delivery encounters a second stack-fault exception (for the same reason), causing generation of a double-fault exception (#DF). Delivery of the double-fault exception encounters a third stack-fault exception, and the logical processor enters shutdown mode. See the discussion of the double-fault exception in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.\n" }, { "Name": "PUSHA", "Alias": [ "PUSHAD" ], "Brief": "Push All General", "Description": "\nPushes the contents of the general-purpose registers onto the stack. The registers are stored on the stack in the following order: EAX, ECX, EDX, EBX, ESP (original value), EBP, ESI, and EDI (if the current operand-size attribute is 32) and AX, CX, DX, BX, SP (original value), BP, SI, and DI (if the operand-size attribute is 16). These instruc-tions perform the reverse operation of the POPA/POPAD instructions. The value pushed for the ESP or SP register is its value before prior to pushing the first register (see the “Operation” section below).\nThe PUSHA (push all) and PUSHAD (push all double) mnemonics reference the same opcode. The PUSHA instruc-tion is intended for use when the operand-size attribute is 16 and the PUSHAD instruction for when the operand-size attribute is 32. Some assemblers may force the operand size to 16 when PUSHA is used and to 32 when PUSHAD is used. Others may treat these mnemonics as synonyms (PUSHA/PUSHAD) and use the current setting of the operand-size attribute to determine the size of values to be pushed from the stack, regardless of the mnemonic used.\nIn the real-address mode, if the ESP or SP register is 1, 3, or 5 when PUSHA/PUSHAD executes: an #SS exception is generated but not delivered (the stack error reported prevents #SS delivery). Next, the processor generates a #DF exception and enters a shutdown state as described in the #DF discussion in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.\nThis instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n" }, { "Name": "PUSHF", "Alias": [ "PUSHFD" ], "Brief": "Push EFLAGS Register onto the Stack", "Description": "\nDecrements the stack pointer by 4 (if the current operand-size attribute is 32) and pushes the entire contents of the EFLAGS register onto the stack, or decrements the stack pointer by 2 (if the operand-size attribute is 16) and pushes the lower 16 bits of the EFLAGS register (that is, the FLAGS register) onto the stack. These instructions reverse the operation of the POPF/POPFD instructions.\nWhen copying the entire EFLAGS register to the stack, the VM and RF flags (bits 16 and 17) are not copied; instead, the values for these flags are cleared in the EFLAGS image stored on the stack. See Chapter 3 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more information about the EFLAGS register.\nThe PUSHF (push flags) and PUSHFD (push flags double) mnemonics reference the same opcode. The PUSHF instruction is intended for use when the operand-size attribute is 16 and the PUSHFD instruction for when the operand-size attribute is 32. Some assemblers may force the operand size to 16 when PUSHF is used and to 32 when PUSHFD is used. Others may treat these mnemonics as synonyms (PUSHF/PUSHFD) and use the current setting of the operand-size attribute to determine the size of values to be pushed from the stack, regardless of the mnemonic used.\nIn 64-bit mode, the instruction’s default operation is to decrement the stack pointer (RSP) by 8 and pushes RFLAGS on the stack. 16-bit operation is supported using the operand size override prefix 66H. 32-bit operand size cannot be encoded in this mode. When copying RFLAGS to the stack, the VM and RF flags (bits 16 and 17) are not copied; instead, values for these flags are cleared in the RFLAGS image stored on the stack.\nWhen in virtual-8086 mode and the I/O privilege level (IOPL) is less than 3, the PUSHF/PUSHFD instruction causes a general protection exception (#GP).\nIn the real-address mode, if the ESP or SP register is 1 when PUSHF/PUSHFD instruction executes: an #SS excep-tion is generated but not delivered (the stack error reported prevents #SS delivery). Next, the processor generates a #DF exception and enters a shutdown state as described in the #DF discussion in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.\n" }, { "Name": "PXOR", "Alias": [], "Brief": "Logical Exclusive OR", "Description": "\nPerforms a bitwise logical exclusive-OR (XOR) operation on the source operand (second operand) and the destina-tion operand (first operand) and stores the result in the destination operand. Each bit of the result is 1 if the corre-sponding bits of the two operands are different; each bit is 0 if the corresponding bits of the operands are the same.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nLegacy SSE instructions: The source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an MMX technology register.\n128-bit Legacy SSE version: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM desti-nation register remain unchanged.\nVEX.128 encoded version: The second source operand is an XMM register or a 128-bit memory location. The first source operand and destination operands are XMM registers. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nVEX.256 encoded version: The second source operand is an YMM register or a 256-bit memory location. The first source operand and destination operands are YMM registers.\nNote: VEX.L must be 0, otherwise instructions will #UD.\n" }, { "Name": "RCL", "Alias": [ "RCR", "ROL", "ROR" ], "Brief": "", "Description": "\nShifts (rotates) the bits of the first operand (destination operand) the number of bit positions specified in the second operand (count operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the count operand is an unsigned integer that can be an immediate or a value in the CL register. In legacy and compatibility mode, the processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits.\nThe rotate left (ROL) and rotate through carry left (RCL) instructions shift all the bits toward more-significant bit positions, except for the most-significant bit, which is rotated to the least-significant bit location. The rotate right (ROR) and rotate through carry right (RCR) instructions shift all the bits toward less significant bit positions, except for the least-significant bit, which is rotated to the most-significant bit location.\nThe RCL and RCR instructions include the CF flag in the rotation. The RCL instruction shifts the CF flag into the least-significant bit and shifts the most-significant bit into the CF flag. The RCR instruction shifts the CF flag into the most-significant bit and shifts the least-significant bit into the CF flag. For the ROL and ROR instructions, the orig-inal value of the CF flag is not a part of the result, but the CF flag receives a copy of the bit that was shifted from one end to the other.\nThe OF flag is defined only for the 1-bit rotates; it is undefined in all other cases (except RCL and RCR instructions only: a zero-bit rotate does nothing, that is affects no flags). For left rotates, the OF flag is set to the exclusive OR of the CF bit (after the rotate) and the most-significant bit of the result. For right rotates, the OF flag is set to the exclusive OR of the two most-significant bits of the result.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Use of REX.W promotes the first operand to 64 bits and causes the count operand to become a 6-bit counter.\n" }, { "Name": "RCPPS", "Alias": [], "Brief": "Compute Reciprocals of Packed Single", "Description": "\nPerforms a SIMD computation of the approximate reciprocals of the four packed single-precision floating-point values in the source operand (second operand) stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. See Figure 10-5 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD single-precision floating-point operation.\nThe relative error for this approximation is:\n|Relative Error| ≤ 1.5 ∗ 2−12\nThe RCPPS instruction is not affected by the rounding control bits in the MXCSR register. When a source value is a 0.0, an ∞ of the sign of the source value is returned. A denormal source value is treated as a 0.0 (of the same sign). Tiny results are always flushed to 0.0, with the sign of the operand. (Input values greater than or equal to |1.11111111110100000000000B∗2125| are guaranteed to not produce tiny results; input values less than or equal to |1.00000000000110000000001B*2126| are guaranteed to produce tiny results, which are in turn flushed to 0.0; and input values in between this range may or may not produce tiny results, depending on the implementation.) When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "RCPSS", "Alias": [], "Brief": "Compute Reciprocal of Scalar Single", "Description": "\nComputes of an approximate reciprocal of the low single-precision floating-point value in the source operand (second operand) and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged. See Figure 10-6 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a scalar single-precision floating-point operation.\nThe relative error for this approximation is:\n|Relative Error| ≤ 1.5 ∗ 2−12\nThe RCPSS instruction is not affected by the rounding control bits in the MXCSR register. When a source value is a 0.0, an ∞ of the sign of the source value is returned. A denormal source value is treated as a 0.0 (of the same sign). Tiny results are always flushed to 0.0, with the sign of the operand. (Input values greater than or equal to |1.11111111110100000000000B∗2125| are guaranteed to not produce tiny results; input values less than or equal to |1.00000000000110000000001B*2126| are guaranteed to produce tiny results, which are in turn flushed to 0.0; and input values in between this range may or may not produce tiny results, depending on the implementation.) When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "RDFSBASE", "Alias": [ "RDGSBASE" ], "Brief": "Read FS/GS Segment Base", "Description": "\nLoads the general-purpose register indicated by the modR/M:r/m field with the FS or GS segment base address.\nThe destination operand may be either a 32-bit or a 64-bit general-purpose register. The REX.W prefix indicates the operand size is 64 bits. If no REX.W prefix is used, the operand size is 32 bits; the upper 32 bits of the source base address (for FS or GS) are ignored and upper 32 bits of the destination register are cleared.\nThis instruction is supported only in 64-bit mode.\n" }, { "Name": "RDMSR", "Alias": [], "Brief": "Read from Model Specific Register", "Description": "\nReads the contents of a 64-bit model specific register (MSR) specified in the ECX register into registers EDX:EAX. (On processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The EDX register is loaded with the high-order 32 bits of the MSR and the EAX register is loaded with the low-order 32 bits. (On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are cleared.) If fewer than 64 bits are implemented in the MSR being read, the values returned to EDX:EAX in unimplemented bit locations are undefined.\nThis instruction must be executed at privilege level 0 or in real-address mode; otherwise, a general protection exception #GP(0) will be generated. Specifying a reserved or unimplemented MSR address in ECX will also cause a general protection exception.\nThe MSRs control functions for testability, execution tracing, performance-monitoring, and machine check errors. Chapter 35, “Model-Specific Registers (MSRs),” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C, lists all the MSRs that can be read with this instruction and their addresses. Note that each processor family has its own set of MSRs.\nThe CPUID instruction should be used to determine whether MSRs are supported (CPUID.01H:EDX[5] = 1) before using this instruction.\n" }, { "Name": "RDPMC", "Alias": [], "Brief": "Read Performance", "Description": "\nThe EAX register is loaded with the low-order 32 bits. The EDX register is loaded with the supported high-order bits of the counter. The number of high-order bits loaded into EDX is implementation specific on processors that do no support architectural performance monitoring. The width of fixed-function and general-purpose performance coun-ters on processors supporting architectural performance monitoring are reported by CPUID 0AH leaf. See below for the treatment of the EDX register for “fast” reads.\nThe ECX register selects one of two type of performance counters, specifies the index relative to the base of each counter type, and selects “fast” read mode if supported. The two counter types are :\nECX[29:0] specifies the index. The width of general-purpose performance counters are 40-bits for processors that do not support architectural performance monitoring counters.The width of special-purpose performance counters are implementation specific. The width of fixed-function performance counters and general-purpose performance counters on processor supporting architectural performance monitoring are reported by CPUID 0AH leaf.\nTable 4-13 lists valid indices of the general-purpose and special-purpose performance counters according to the derived DisplayFamily_DisplayModel values of CPUID encoding for each processor family (see CPUID instruction in Chapter 3, “Instruction Set Reference, A-M” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A).\nTable 4-13. Valid General and Special Purpose Performance Counter Index Range for RDPMC\n\n\nProcessor Family\nDisplayFamily_DisplayModel/Other Signatures\nValid PMC Index Range\nGeneral-purpose Counters\n\n\nP6\nPentium® 4, Intel® Xeon processors\nPentium 4, Intel Xeon processors\nPentium M processors\n64-bit Intel Xeon processors with L3\nIntel® Core™ Solo and Intel® Core™ Duo processors, Dual-core Intel® Xeon® processor LV\n\n06H_01H, 06H_03H, 06H_05H, 06H_06H, 06H_07H, 06H_08H, 06H_0AH, 06H_0BH\n0FH_00H, 0FH_01H, 0FH_02H\n(0FH_03H, 0FH_04H, 0FH_06H) and (L3 is absent)\n06H_09H, 06H_0DH\n0FH_03H, 0FH_04H) and (L3 is present)\n06H_0EH\n\n0, 1\n≥ 0 and ≤ 17\n≥ 0 and ≤ 17\n0, 1\n≥ 0 and ≤ 25\n0, 1\n\n0, 1\n≥ 0 and ≤ 17\n≥ 0 and ≤ 17\n0, 1\n≥ 0 and ≤ 17\n0, 1\nTable 4-13. Valid General and Special Purpose Performance Counter Index Range for RDPMC (Contd.)\n\n\nProcessor Family\nDisplayFamily_DisplayModel/Other Signatures\nValid PMC Index Range\nGeneral-purpose Counters\n\n\nIntel® Core™2 Duo processor, Intel Xeon processor 3000, 5100, 5300, 7300 Series - general-purpose PMC\nIntel Xeon processors 7100 series with L3\nIntel® Core™2 Duo processor family, Intel Xeon processor family - general-purpose PMC\nIntel Xeon processors 7400 series\nIntel® Atom™ processor family\nIntel® Core™i7 processor, Intel Xeon processors 5500 series\n\n06H_0FH\n(0FH_06H) and (L3 is present)\n06H_17H\n(06H_1DH)\n06H_1CH\n06H_1AH, 06H_1EH, 06H_1FH, 06H_2EH\n\n0, 1\n≥ 0 and ≤ 25\n0, 1\n≥ 0 and ≤ 9\n0, 1\n0-3\n\n0, 1\n≥ 0 and ≤ 17\n0, 1\n0, 1\n0, 1\n0, 1, 2, 3\nThe Pentium 4 and Intel Xeon processors also support “fast” (32-bit) and “slow” (40-bit) reads on the first 18 performance counters. Selected this option using ECX[31]. If bit 31 is set, RDPMC reads only the low 32 bits of the selected performance counter. If bit 31 is clear, all 40 bits are read. A 32-bit result is returned in EAX and EDX is set to 0. A 32-bit read executes faster on Pentium 4 processors and Intel Xeon processors than a full 40-bit read.\nOn 64-bit Intel Xeon processors with L3, performance counters with indices 18-25 are 32-bit counters. EDX is cleared after executing RDPMC for these counters. On Intel Xeon processor 7100 series with L3, performance coun-ters with indices 18-25 are also 32-bit counters.\nIn Intel Core 2 processor family, Intel Xeon processor 3000, 5100, 5300 and 7400 series, the fixed-function perfor-mance counters are 40-bits wide; they can be accessed by RDMPC with ECX between from 4000_0000H and 4000_0002H.\nOn Intel Xeon processor 7400 series, there are eight 32-bit special-purpose counters addressable with indices 2-9, ECX[30]=0.\nWhen in protected or virtual 8086 mode, the performance-monitoring counters enabled (PCE) flag in register CR4 restricts the use of the RDPMC instruction as follows. When the PCE flag is set, the RDPMC instruction can be executed at any privilege level; when the flag is clear, the instruction can only be executed at privilege level 0. (When in real-address mode, the RDPMC instruction is always enabled.)\nThe performance-monitoring counters can also be read with the RDMSR instruction, when executing at privilege level 0.\nThe performance-monitoring counters are event counters that can be programmed to count events such as the number of instructions decoded, number of interrupts received, or number of cache loads. Chapter 19, “Perfor-mance Monitoring Events,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B, lists the events that can be counted for various processors in the Intel 64 and IA-32 architecture families.\nThe RDPMC instruction is not a serializing instruction; that is, it does not imply that all the events caused by the preceding instructions have been completed or that events caused by subsequent instructions have not begun. If an exact event count is desired, software must insert a serializing instruction (such as the CPUID instruction) before and/or after the RDPMC instruction.\nIn the Pentium 4 and Intel Xeon processors, performing back-to-back fast reads are not guaranteed to be mono-tonic. To guarantee monotonicity on back-to-back reads, a serializing instruction must be placed between the two RDPMC instructions.\nThe RDPMC instruction can execute in 16-bit addressing mode or virtual-8086 mode; however, the full contents of the ECX register are used to select the counter, and the event count is stored in the full EAX and EDX registers. The RDPMC instruction was introduced into the IA-32 Architecture in the Pentium Pro processor and the Pentium processor with MMX technology. The earlier Pentium processors have performance-monitoring counters, but they must be read with the RDMSR instruction.\n" }, { "Name": "RDRAND", "Alias": [], "Brief": "Read Random Number", "Description": "\nLoads a hardware generated random value and store it in the destination register. The size of the random value is determined by the destination register size and operating mode. The Carry Flag indicates whether a random value is available at the time the instruction is executed. CF=1 indicates that the data in the destination is valid. Other-wise CF=0 and the data in the destination operand will be returned as zeros for the specified width. All other flags are forced to 0 in either situation. Software must check the state of CF=1 for determining if a valid random value has been returned, otherwise it is expected to loop and retry execution of RDRAND (see Intel® 64 and IA-32 Archi-tectures Software Developer’s Manual, Volume 1, Section 7.3.17, “Random Number Generator Instructions”).\nThis instruction is available at all privilege levels.\nIn 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.B permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bit oper-ands. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "RDSEED", "Alias": [], "Brief": "Read Random SEED", "Description": "\nLoads a hardware generated random value and store it in the destination register. The random value is generated from an Enhanced NRBG (Non Deterministic Random Bit Generator) that is compliant to NIST SP800-90B and NIST SP800-90C in the XOR construction mode. The size of the random value is determined by the destination register size and operating mode. The Carry Flag indicates whether a random value is available at the time the instruction is executed. CF=1 indicates that the data in the destination is valid. Otherwise CF=0 and the data in the destination operand will be returned as zeros for the specified width. All other flags are forced to 0 in either situation. Software must check the state of CF=1 for determining if a valid random seed value has been returned, otherwise it is expected to loop and retry execution of RDSEED (see Section 1.2).\nThe RDSEED instruction is available at all privilege levels. The RDSEED instruction executes normally either inside or outside a transaction region.\nIn 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.B permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bit oper-ands. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "RDTSC", "Alias": [], "Brief": "Read Time", "Description": "\nLoads the current value of the processor’s time-stamp counter (a 64-bit MSR) into the EDX:EAX registers. The EDX register is loaded with the high-order 32 bits of the MSR and the EAX register is loaded with the low-order 32 bits. (On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are cleared.)\nThe processor monotonically increments the time-stamp counter MSR every clock cycle and resets it to 0 whenever the processor is reset. See “Time Stamp Counter” in Chapter 17 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B, for specific details of the time stamp counter behavior.\nWhen in protected or virtual 8086 mode, the time stamp disable (TSD) flag in register CR4 restricts the use of the RDTSC instruction as follows. When the TSD flag is clear, the RDTSC instruction can be executed at any privilege level; when the flag is set, the instruction can only be executed at privilege level 0. (When in real-address mode, the RDTSC instruction is always enabled.)\nThe time-stamp counter can also be read with the RDMSR instruction, when executing at privilege level 0.\nThe RDTSC instruction is not a serializing instruction. It does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the read operation is performed. If software requires RDTSC to be executed only after all previous instructions have completed locally, it can either use RDTSCP (if the processor supports that instruction) or execute the sequence LFENCE;RDTSC.\nThis instruction was introduced by the Pentium processor.\nSee “Changes to Instruction Behavior in VMX Non-Root Operation” in Chapter 25 of the Intel® 64 and IA-32 Archi-tectures Software Developer’s Manual, Volume 3C, for more information about the behavior of this instruction in VMX non-root operation.\n" }, { "Name": "RDTSCP", "Alias": [], "Brief": "Read Time", "Description": "\nLoads the current value of the processor’s time-stamp counter (a 64-bit MSR) into the EDX:EAX registers and also loads the IA32_TSC_AUX MSR (address C000_0103H) into the ECX register. The EDX register is loaded with the high-order 32 bits of the IA32_TSC MSR; the EAX register is loaded with the low-order 32 bits of the IA32_TSC MSR; and the ECX register is loaded with the low-order 32-bits of IA32_TSC_AUX MSR. On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX, RDX, and RCX are cleared.\nThe processor monotonically increments the time-stamp counter MSR every clock cycle and resets it to 0 whenever the processor is reset. See “Time Stamp Counter” in Chapter 17 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B, for specific details of the time stamp counter behavior.\nWhen in protected or virtual 8086 mode, the time stamp disable (TSD) flag in register CR4 restricts the use of the RDTSCP instruction as follows. When the TSD flag is clear, the RDTSCP instruction can be executed at any privilege level; when the flag is set, the instruction can only be executed at privilege level 0. (When in real-address mode, the RDTSCP instruction is always enabled.)\nThe RDTSCP instruction waits until all previous instructions have been executed before reading the counter. However, subsequent instructions may begin execution before the read operation is performed.\nThe presence of the RDTSCP instruction is indicated by CPUID leaf 80000001H, EDX bit 27. If the bit is set to 1 then RDTSCP is present on the processor.\nSee “Changes to Instruction Behavior in VMX Non-Root Operation” in Chapter 25 of the Intel® 64 and IA-32 Archi-tectures Software Developer’s Manual, Volume 3C, for more information about the behavior of this instruction in VMX non-root operation.\n" }, { "Name": "REP", "Alias": [ "REPE", "REPZ", "REPNE", "REPNZ" ], "Brief": "Repeat String Operation Prefix", "Description": "\nRepeats a string instruction the number of times specified in the count register or until the indicated condition of the ZF flag is no longer met. The REP (repeat), REPE (repeat while equal), REPNE (repeat while not equal), REPZ (repeat while zero), and REPNZ (repeat while not zero) mnemonics are prefixes that can be added to one of the string instructions. The REP prefix can be added to the INS, OUTS, MOVS, LODS, and STOS instructions, and the REPE, REPNE, REPZ, and REPNZ prefixes can be added to the CMPS and SCAS instructions. (The REPZ and REPNZ prefixes are synonymous forms of the REPE and REPNE prefixes, respectively.) The F3H prefix is defined for the following instructions and undefined for the rest:\nThe REP prefixes apply only to one string instruction at a time. To repeat a block of instructions, use the LOOP instruction or another looping construct. All of these repeat prefixes cause the associated instruction to be repeated until the count in register is decremented to 0. See Table 4-14.\nTable 4-14. Repeat Prefixes\n\n\nRepeat Prefix\nTermination Condition 1*\nTermination Condition 2\n\n\nREP\nREPE/REPZ\nREPNE/REPNZ\n\nRCX or (E)CX = 0\nRCX or (E)CX = 0\nRCX or (E)CX = 0\n\nNone\nZF = 0\nZF = 1\nNOTES:\n*\nCount register is CX, ECX or RCX by default, depending on attributes of the operating modes.\nThe REPE, REPNE, REPZ, and REPNZ prefixes also check the state of the ZF flag after each iteration and terminate the repeat loop if the ZF flag is not in the specified state. When both termination conditions are tested, the cause of a repeat termination can be determined either by testing the count register with a JECXZ instruction or by testing the ZF flag (with a JZ, JNZ, or JNE instruction).\nWhen the REPE/REPZ and REPNE/REPNZ prefixes are used, the ZF flag does not require initialization because both the CMPS and SCAS instructions affect the ZF flag according to the results of the comparisons they make.\nA repeating string operation can be suspended by an exception or interrupt. When this happens, the state of the registers is preserved to allow the string operation to be resumed upon a return from the exception or interrupt handler. The source and destination registers point to the next string elements to be operated on, the EIP register points to the string instruction, and the ECX register has the value it held following the last successful iteration of the instruction. This mechanism allows long string operations to proceed without affecting the interrupt response time of the system.\nWhen a fault occurs during the execution of a CMPS or SCAS instruction that is prefixed with REPE or REPNE, the EFLAGS value is restored to the state prior to the execution of the instruction. Since the SCAS and CMPS instruc-tions do not use EFLAGS as an input, the processor can resume the instruction after the page fault handler.\nUse the REP INS and REP OUTS instructions with caution. Not all I/O ports can handle the rate at which these instructions execute. Note that a REP STOS instruction is the fastest way to initialize a large block of memory.\nIn 64-bit mode, the operand size of the count register is associated with the address size attribute. Thus the default count register is RCX; REX.W has no effect on the address size and the count register. In 64-bit mode, if 67H is used to override address size attribute, the count register is ECX and any implicit source/destination operand will use the corresponding 32-bit index register. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "RET", "Alias": [], "Brief": "Return from Procedure", "Description": "\nTransfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a CALL instruction, and the return is made to the instruction that follows the CALL instruction.\nThe optional source operand specifies the number of stack bytes to be released after the return address is popped; the default is none. This operand can be used to release parameters from the stack that were passed to the called procedure and are no longer needed. It must be used when the CALL instruction used to switch to a new procedure uses a call gate with a non-zero word count to access the new procedure. Here, the source operand for the RET instruction must specify the same number of bytes as is specified in the word count field of the call gate.\nThe RET instruction can be used to execute three different types of returns:\nThe inter-privilege-level return type can only be executed in protected mode. See the section titled “Calling Proce-dures Using Call and RET” in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for detailed information on near, far, and inter-privilege-level returns.\nWhen executing a near return, the processor pops the return instruction pointer (offset) from the top of the stack into the EIP register and begins program execution at the new instruction pointer. The CS register is unchanged.\nWhen executing a far return, the processor pops the return instruction pointer from the top of the stack into the EIP register, then pops the segment selector from the top of the stack into the CS register. The processor then begins program execution in the new code segment at the new instruction pointer.\nThe mechanics of an inter-privilege-level far return are similar to an intersegment return, except that the processor examines the privilege levels and access rights of the code and stack segments being returned to deter-mine if the control transfer is allowed to be made. The DS, ES, FS, and GS segment registers are cleared by the RET instruction during an inter-privilege-level return if they refer to segments that are not allowed to be accessed at the new privilege level. Since a stack switch also occurs on an inter-privilege level return, the ESP and SS registers are loaded from the stack.\nIf parameters are passed to the called procedure during an inter-privilege level call, the optional source operand must be used with the RET instruction to release the parameters on the return. Here, the parameters are released both from the called procedure’s stack and the calling procedure’s stack (that is, the stack being returned to).\nIn 64-bit mode, the default operation size of this instruction is the stack-address size, i.e. 64 bits. This applies to near returns, not far returns; the default operation size of far returns is 32 bits.\n" }, { "Name": "RORX", "Alias": [], "Brief": "Rotate Right Logical Without Affecting Flags", "Description": "\nRotates the bits of second operand right by the count value specified in imm8 without affecting arithmetic flags. The RORX instruction does not read or write the arithmetic flags.\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n" }, { "Name": "ROUNDPD", "Alias": [], "Brief": "Round Packed Double Precision Floating", "Description": "\nRound the 2 double-precision floating-point values in the source operand (second operand) using the rounding mode specified in the immediate operand (third operand) and place the results in the destination operand (first operand). The rounding process rounds each input floating-point value to an integer value and returns the integer result as a single-precision floating-point value.\nThe immediate operand specifies control fields for the rounding operation, three bit fields are defined and shown in Figure 4-20. Bit 3 of the immediate byte controls processor behavior for a precision exception, bit 2 selects the source of rounding mode control. Bits 1:0 specify a non-sticky rounding-mode value (Table 4-15 lists the encoded values for rounding-mode field).\nThe Precision Floating-Point Exception is signaled according to the immediate operand. If any source operand is an SNaN then it will be converted to a QNaN. If DAZ is set to ‘1 then denormals will be converted to zero before rounding.\n128-bit Legacy SSE version: The second source can be an XMM register or 128-bit memory location. The destina-tion is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the source operand second source operand or a 128-bit memory location. The destina-tion operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n\n8\n3\n1\n0\n2\nP — Precision Mask; 0: normal, 1: inexact\nRS — Rounding select; 1: MXCSR.RC, 0: Imm8.RC\nRC — Rounding mode\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nReserved\nFigure 4-20. Bit Control Fields of Immediate Byte for ROUNDxx Instruction\nTable 4-15. Rounding Modes and Encoding of Rounding Control (RC) Field\n\n\nRounding Mode\nRC Field Setting\nDescription\n\nRound to nearest (even)\n00B\nRounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (i.e., the integer value with the least-significant bit of zero).\n\nRound down (toward −∞)\n01B\nRounded result is closest to but no greater than the infinitely precise result.\n\nRound up (toward +∞)\n10B\nRounded result is closest to but no less than the infinitely precise result.\n\nRound toward zero (Truncate)\n11B\nRounded result is closest to but no greater in absolute value than the infinitely precise result.\n" }, { "Name": "ROUNDPS", "Alias": [], "Brief": "Round Packed Single Precision Floating", "Description": "\nRound the 4 single-precision floating-point values in the source operand (second operand) using the rounding mode specified in the immediate operand (third operand) and place the results in the destination operand (first operand). The rounding process rounds each input floating-point value to an integer value and returns the integer result as a single-precision floating-point value.\nThe immediate operand specifies control fields for the rounding operation, three bit fields are defined and shown in Figure 4-20. Bit 3 of the immediate byte controls processor behavior for a precision exception, bit 2 selects the source of rounding mode control. Bits 1:0 specify a non-sticky rounding-mode value (Table 4-15 lists the encoded values for rounding-mode field).\nThe Precision Floating-Point Exception is signaled according to the immediate operand. If any source operand is an SNaN then it will be converted to a QNaN. If DAZ is set to ‘1 then denormals will be converted to zero before rounding.\n128-bit Legacy SSE version: The second source can be an XMM register or 128-bit memory location. The destina-tion is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the source operand second source operand or a 128-bit memory location. The destina-tion operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "ROUNDSD", "Alias": [], "Brief": "Round Scalar Double Precision Floating", "Description": "\nRound the DP FP value in the lower qword of the source operand (second operand) using the rounding mode spec-ified in the immediate operand (third operand) and place the result in the destination operand (first operand). The rounding process rounds a double-precision floating-point input to an integer value and returns the integer result as a double precision floating-point value in the lowest position. The upper double precision floating-point value in the destination is retained.\nThe immediate operand specifies control fields for the rounding operation, three bit fields are defined and shown in Figure 4-20. Bit 3 of the immediate byte controls processor behavior for a precision exception, bit 2 selects the source of rounding mode control. Bits 1:0 specify a non-sticky rounding-mode value (Table 4-15 lists the encoded values for rounding-mode field).\nThe Precision Floating-Point Exception is signaled according to the immediate operand. If any source operand is an SNaN then it will be converted to a QNaN. If DAZ is set to ‘1 then denormals will be converted to zero before rounding.\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "ROUNDSS", "Alias": [], "Brief": "Round Scalar Single Precision Floating", "Description": "\nRound the single-precision floating-point value in the lowest dword of the source operand (second operand) using the rounding mode specified in the immediate operand (third operand) and place the result in the destination operand (first operand). The rounding process rounds a single-precision floating-point input to an integer value and returns the result as a single-precision floating-point value in the lowest position. The upper three single-precision floating-point values in the destination are retained.\nThe immediate operand specifies control fields for the rounding operation, three bit fields are defined and shown in Figure 4-20. Bit 3 of the immediate byte controls processor behavior for a precision exception, bit 2 selects the source of rounding mode control. Bits 1:0 specify a non-sticky rounding-mode value (Table 4-15 lists the encoded values for rounding-mode field).\nThe Precision Floating-Point Exception is signaled according to the immediate operand. If any source operand is an SNaN then it will be converted to a QNaN. If DAZ is set to ‘1 then denormals will be converted to zero before rounding.\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "RSM", "Alias": [], "Brief": "Resume from System Management Mode", "Description": "\nReturns program control from system management mode (SMM) to the application program or operating-system procedure that was interrupted when the processor received an SMM interrupt. The processor’s state is restored from the dump created upon entering SMM. If the processor detects invalid state information during state restora-tion, it enters the shutdown state. The following invalid information can cause a shutdown:\nThe contents of the model-specific registers are not affected by a return from SMM.\nThe SMM state map used by RSM supports resuming processor context for non-64-bit modes and 64-bit mode.\nSee Chapter 34, “System Management Mode,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C, for more information about SMM and the behavior of the RSM instruction.\n" }, { "Name": "RSQRTPS", "Alias": [], "Brief": "Compute Reciprocals of Square Roots of Packed Single", "Description": "\nPerforms a SIMD computation of the approximate reciprocals of the square roots of the four packed single-preci-sion floating-point values in the source operand (second operand) and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. See Figure 10-5 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD single-precision floating-point operation.\nThe relative error for this approximation is:\n|Relative Error| ≤ 1.5 ∗ 2−12\nThe RSQRTPS instruction is not affected by the rounding control bits in the MXCSR register. When a source value is a 0.0, an ∞ of the sign of the source value is returned. A denormal source value is treated as a 0.0 (of the same sign). When a source value is a negative value (other than −0.0), a floating-point indefinite is returned. When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "RSQRTSS", "Alias": [], "Brief": "Compute Reciprocal of Square Root of Scalar Single", "Description": "\nComputes an approximate reciprocal of the square root of the low single-precision floating-point value in the source operand (second operand) stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged. See Figure 10-6 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a scalar single-precision floating-point operation.\nThe relative error for this approximation is:\n|Relative Error| ≤ 1.5 ∗ 2−12\nThe RSQRTSS instruction is not affected by the rounding control bits in the MXCSR register. When a source value is a 0.0, an ∞ of the sign of the source value is returned. A denormal source value is treated as a 0.0 (of the same sign). When a source value is a negative value (other than −0.0), a floating-point indefinite is returned. When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "SAHF", "Alias": [], "Brief": "Store AH into Flags", "Description": "\nLoads the SF, ZF, AF, PF, and CF flags of the EFLAGS register with values from the corresponding bits in the AH register (bits 7, 6, 4, 2, and 0, respectively). Bits 1, 3, and 5 of register AH are ignored; the corresponding reserved bits (1, 3, and 5) in the EFLAGS register remain as shown in the “Operation” section below.\nThis instruction executes as described above in compatibility mode and legacy mode. It is valid in 64-bit mode only if CPUID.80000001H:ECX.LAHF-SAHF[bit 0] = 1.\n" }, { "Name": "SAL", "Alias": [ "SAR", "SHL", "SHR" ], "Brief": "Shift", "Description": "\nShifts the bits in the first operand (destination operand) to the left or right by the number of bits specified in the second operand (count operand). Bits shifted beyond the destination operand boundary are first shifted into the CF flag, then discarded. At the end of the shift operation, the CF flag contains the last bit shifted out of the destination operand.\nThe destination operand can be a register or a memory location. The count operand can be an immediate value or the CL register. The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31 (or 63 if 64-bit mode and REX.W is used). A special opcode encoding is provided for a count of 1.\nThe shift arithmetic left (SAL) and shift logical left (SHL) instructions perform the same operation; they shift the bits in the destination operand to the left (toward more significant bit locations). For each shift count, the most significant bit of the destination operand is shifted into the CF flag, and the least significant bit is cleared (see Figure 7-7 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1).\nThe shift arithmetic right (SAR) and shift logical right (SHR) instructions shift the bits of the destination operand to the right (toward less significant bit locations). For each shift count, the least significant bit of the destination operand is shifted into the CF flag, and the most significant bit is either set or cleared depending on the instruction type. The SHR instruction clears the most significant bit (see Figure 7-8 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1); the SAR instruction sets or clears the most significant bit to correspond to the sign (most significant bit) of the original value in the destination operand. In effect, the SAR instruction fills the empty bit position’s shifted value with the sign of the unshifted value (see Figure 7-9 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1).\nThe SAR and SHR instructions can be used to perform signed or unsigned division, respectively, of the destination operand by powers of 2. For example, using the SAR instruction to shift a signed integer 1 bit to the right divides the value by 2.\nUsing the SAR instruction to perform a division operation does not produce the same result as the IDIV instruction. The quotient from the IDIV instruction is rounded toward zero, whereas the “quotient” of the SAR instruction is rounded toward negative infinity. This difference is apparent only for negative numbers. For example, when the IDIV instruction is used to divide -9 by 4, the result is -2 with a remainder of -1. If the SAR instruction is used to shift -9 right by two bits, the result is -3 and the “remainder” is +3; however, the SAR instruction stores only the most significant bit of the remainder (in the CF flag).\nThe OF flag is affected only on 1-bit shifts. For left shifts, the OF flag is set to 0 if the most-significant bit of the result is the same as the CF flag (that is, the top two bits of the original operand were the same); otherwise, it is set to 1. For the SAR instruction, the OF flag is cleared for all 1-bit shifts. For the SHR instruction, the OF flag is set to the most-significant bit of the original operand.\nIn 64-bit mode, the instruction’s default operation size is 32 bits and the mask width for CL is 5 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64-bits and sets the mask width for CL to 6 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "SARX", "Alias": [ "SHLX", "SHRX " ], "Brief": "Shift Without Affecting Flags", "Description": "\nShifts the bits of the first source operand (the second operand) to the left or right by a COUNT value specified in the second source operand (the third operand). The result is written to the destination operand (the first operand).\nThe shift arithmetic right (SARX) and shift logical right (SHRX) instructions shift the bits of the destination operand to the right (toward less significant bit locations), SARX keeps and propagates the most significant bit (sign bit) while shifting.\nThe logical shift left (SHLX) shifts the bits of the destination operand to the left (toward more significant bit loca-tions).\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\nIf the value specified in the first source operand exceeds OperandSize -1, the COUNT value is masked.\nSARX,SHRX, and SHLX instructions do not update flags.\n" }, { "Name": "SBB", "Alias": [], "Brief": "Integer Subtraction with Borrow", "Description": "\nAdds the source operand (second operand) and the carry (CF) flag, and subtracts the result from the destination operand (first operand). The result of the subtraction is stored in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) The state of the CF flag represents a borrow from a previous subtraction.\nWhen an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.\nThe SBB instruction does not distinguish between signed or unsigned operands. Instead, the processor evaluates the result for both data types and sets the OF and CF flags to indicate a borrow in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.\nThe SBB instruction is usually executed as part of a multibyte or multiword subtraction in which a SUB instruction is followed by a SBB instruction.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "SCAS", "Alias": [ "SCASB", "SCASW", "SCASD" ], "Brief": "Scan String", "Description": "\nIn non-64-bit modes and in default 64-bit mode: this instruction compares a byte, word, doubleword or quadword specified using a memory operand with the value in AL, AX, or EAX. It then sets status flags in EFLAGS recording the results. The memory operand address is read from ES:(E)DI register (depending on the address-size attribute of the instruction and the current operational mode). Note that ES cannot be overridden with a segment override prefix.\nAt the assembly-code level, two forms of this instruction are allowed. The explicit-operand form and the no-oper-ands form. The explicit-operand form (specified using the SCAS mnemonic) allows a memory operand to be speci-fied explicitly. The memory operand must be a symbol that indicates the size and location of the operand value. The register operand is then automatically selected to match the size of the memory operand (AL register for byte comparisons, AX for word comparisons, EAX for doubleword comparisons). The explicit-operand form is provided to allow documentation. Note that the documentation provided by this form can be misleading. That is, the memory operand symbol must specify the correct type (size) of the operand (byte, word, or doubleword) but it does not have to specify the correct location. The location is always specified by ES:(E)DI.\nThe no-operands form of the instruction uses a short form of SCAS. Again, ES:(E)DI is assumed to be the memory operand and AL, AX, or EAX is assumed to be the register operand. The size of operands is selected by the mnemonic: SCASB (byte comparison), SCASW (word comparison), or SCASD (doubleword comparison).\nAfter the comparison, the (E)DI register is incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. If the DF flag is 0, the (E)DI register is incremented; if the DF flag is 1, the (E)DI register is decremented. The register is incremented or decremented by 1 for byte operations, by 2 for word oper-ations, and by 4 for doubleword operations.\nSCAS, SCASB, SCASW, SCASD, and SCASQ can be preceded by the REP prefix for block comparisons of ECX bytes, words, doublewords, or quadwords. Often, however, these instructions will be used in a LOOP construct that takes\nsome action based on the setting of status flags. See “REP/REPE/REPZ /REPNE/REPNZ—Repeat String Operation Prefix” in this chapter for a description of the REP prefix.\nIn 64-bit mode, the instruction’s default address size is 64-bits, 32-bit address size is supported using the prefix 67H. Using a REX prefix in the form of REX.W promotes operation on doubleword operand to 64 bits. The 64-bit no-operand mnemonic is SCASQ. Address of the memory operand is specified in either RDI or EDI, and AL/AX/EAX/RAX may be used as the register operand. After a comparison, the destination register is incremented or decremented by the current operand size (depending on the value of the DF flag). See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "SETcc", "Alias": [ "SETA", "SETAE", "SETB", "SETBE", "SETC", "SETE", "SETG", "SETGE", "SETL", "SETLE", "SETNA", "SETNAE", "SETNB", "SETNBE", "SETNC", "SETNE", "SETNG", "SETNGE", "SETNL", "SETNLE", "SETNO", "SETNP", "SETNS", "SETNZ", "SETO", "SETP", "SETPE", "SETPO", "SETS", "SETZ" ], "Brief": "Set Byte on Condition", "Description": "\nSets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The condition code suffix (cc) indicates the condition being tested for.\nThe terms “above” and “below” are associated with the CF flag and refer to the relationship between two unsigned integer values. The terms “greater” and “less” are associated with the SF and OF flags and refer to the relationship between two signed integer values.\nMany of the SETcc instruction opcodes have alternate mnemonics. For example, SETG (set byte if greater) and SETNLE (set if not less or equal) have the same opcode and test for the same condition: ZF equals 0 and SF equals OF. These alternate mnemonics are provided to make code more intelligible. Appendix B, “EFLAGS Condition Codes,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, shows the alternate mnemonics for various test conditions.\nSome languages represent a logical one as an integer with all bits set. This representation can be obtained by choosing the logically opposite condition for the SETcc instruction, then decrementing the result. For example, to test for overflow, use the SETNO instruction, then decrement the result.\nIn IA-64 mode, the operand size is fixed at 8 bits. Use of REX prefix enable uniform addressing to additional byte registers. Otherwise, this instruction’s operation is the same as in legacy mode and compatibility mode.\n" }, { "Name": "SFENCE", "Alias": [], "Brief": "Store Fence", "Description": "\nPerforms a serializing operation on all store-to-memory instructions that were issued prior the SFENCE instruction. This serializing operation guarantees that every store instruction that precedes the SFENCE instruction in program order becomes globally visible before any store instruction that follows the SFENCE instruction. The SFENCE instruction is ordered with respect to store instructions, other SFENCE instructions, any LFENCE and MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to load instructions.\nWeakly ordered memory types can be used to achieve higher processor performance through such techniques as out-of-order issue, write-combining, and write-collapsing. The degree to which a consumer of data recognizes or knows that the data is weakly ordered varies among applications and may be unknown to the producer of this data. The SFENCE instruction provides a performance-efficient way of ensuring store ordering between routines that produce weakly-ordered results and routines that consume this data.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\nSpecification of the instruction's opcode above indicates a ModR/M byte of F8. For this instruction, the processor ignores the r/m field of the ModR/M byte. Thus, SFENCE is encoded by any opcode of the form 0F AE Fx, where x is in the range 8-F.\n" }, { "Name": "SGDT", "Alias": [], "Brief": "Store Global Descriptor Table Register", "Description": "\nStores the content of the global descriptor table register (GDTR) in the destination operand. The destination operand specifies a memory location.\nIn legacy or compatibility mode, the destination operand is a 6-byte memory location. If the operand-size attribute is 16 bits, the limit is stored in the low 2 bytes and the 24-bit base address is stored in bytes 3-5, and byte 6 is zero-filled. If the operand-size attribute is 32 bits, the 16-bit limit field of the register is stored in the low 2 bytes of the memory location and the 32-bit base address is stored in the high 4 bytes.\nIn IA-32e mode, the operand size is fixed at 8+2 bytes. The instruction stores an 8-byte base and a 2-byte limit.\nSGDT is useful only by operating-system software. However, it can be used in application programs without causing an exception to be generated. See “LGDT/LIDT—Load Global/Interrupt Descriptor Table Register” in Chapter 3, Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A, for information on loading the GDTR and IDTR.\n" }, { "Name": "SHLD", "Alias": [], "Brief": "Double Precision Shift Left", "Description": "\nThe SHLD instruction is used for multi-precision shifts of 64 bits or more.\nThe instruction shifts the first operand (destination operand) to the left the number of bits specified by the third operand (count operand). The second operand (source operand) provides bits to shift in from the right (starting with bit 0 of the destination operand).\nThe destination operand can be a register or a memory location; the source operand is a register. The count operand is an unsigned integer that can be stored in an immediate byte or in the CL register. If the count operand is CL, the shift count is the logical AND of CL and a count mask. In non-64-bit modes and default 64-bit mode; only bits 0 through 4 of the count are used. This masks the count to a value between 0 and 31. If a count is greater than the operand size, the result is undefined.\nIf the count is 1 or greater, the CF flag is filled with the last bit shifted out of the destination operand. For a 1-bit shift, the OF flag is set if a sign change occurred; otherwise, it is cleared. If the count operand is 0, flags are not affected.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits (upgrading the count mask to 6 bits). See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "SHRD", "Alias": [], "Brief": "Double Precision Shift Right", "Description": "\nThe SHRD instruction is useful for multi-precision shifts of 64 bits or more.\nThe instruction shifts the first operand (destination operand) to the right the number of bits specified by the third operand (count operand). The second operand (source operand) provides bits to shift in from the left (starting with the most significant bit of the destination operand).\nThe destination operand can be a register or a memory location; the source operand is a register. The count operand is an unsigned integer that can be stored in an immediate byte or the CL register. If the count operand is CL, the shift count is the logical AND of CL and a count mask. In non-64-bit modes and default 64-bit mode, the width of the count mask is 5 bits. Only bits 0 through 4 of the count register are used (masking the count to a value between 0 and 31). If the count is greater than the operand size, the result is undefined.\nIf the count is 1 or greater, the CF flag is filled with the last bit shifted out of the destination operand. For a 1-bit shift, the OF flag is set if a sign change occurred; otherwise, it is cleared. If the count operand is 0, flags are not affected.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits (upgrading the count mask to 6 bits). See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "SHUFPD", "Alias": [], "Brief": "Shuffle Packed Double", "Description": "\nMoves either of the two packed double-precision floating-point values from destination operand (first operand) into the low quadword of the destination operand; moves either of the two packed double-precision floating-point values from the source operand into to the high quadword of the destination operand (see Figure 4-21). The select operand (third operand) determines which values are moved to the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source can be an XMM register or an 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n\nDEST\nSRC\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX1 or X0\nX0\nY0\nX1\nY1\nY1 or Y0\nFigure 4-21. SHUFPD Shuffle Operation\nThe source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The select operand is an 8-bit immediate: bit 0 selects which value is moved from the destination operand to the result (where 0 selects the low quadword and 1 selects the high quadword) and bit 1 selects which value is moved from the source operand to the result. Bits 2 through 7 of the select operand are reserved and must be set to 0.\n" }, { "Name": "SHUFPS", "Alias": [], "Brief": "Shuffle Packed Single", "Description": "\nMoves two of the four packed single-precision floating-point values from the destination operand (first operand) into the low quadword of the destination operand; moves two of the four packed single-precision floating-point values from the source operand (second operand) into to the high quadword of the destination operand (see Figure 4-22). The select operand (third operand) determines which values are moved to the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source can be an XMM register or an 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\ndetermines which values are moved to the destination operand.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n\nDEST\nSRC\nDEST\nX3 ... X0\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX2\nY2\nX3\nX1\nX0\nY3\nY1\nY0\nY3 ... Y0\nY3 ... Y0\nX3 ... X0\nFigure 4-22. SHUFPS Shuffle Operation\nThe source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The select operand is an 8-bit immediate: bits 0 and 1 select the value to be moved from the destination operand to the low doubleword of the result, bits 2 and 3 select the value to be moved from the destination operand to the second doubleword of the result, bits 4 and 5 select the value to be moved from the source operand to the third doubleword of the result, and bits 6 and 7 select the value to be moved from the source operand to the high doubleword of the result.\n" }, { "Name": "SIDT", "Alias": [], "Brief": "Store Interrupt Descriptor Table Register", "Description": "\nStores the content the interrupt descriptor table register (IDTR) in the destination operand. The destination operand specifies a 6-byte memory location.\nIn non-64-bit modes, if the operand-size attribute is 32 bits, the 16-bit limit field of the register is stored in the low 2 bytes of the memory location and the 32-bit base address is stored in the high 4 bytes. If the operand-size attri-bute is 16 bits, the limit is stored in the low 2 bytes and the 24-bit base address is stored in the third, fourth, and fifth byte, with the sixth byte filled with 0s.\nIn 64-bit mode, the operand size fixed at 8+2 bytes. The instruction stores 8-byte base and 2-byte limit values.\nSIDT is only useful in operating-system software; however, it can be used in application programs without causing an exception to be generated. See “LGDT/LIDT—Load Global/Interrupt Descriptor Table Register” in Chapter 3, Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A, for information on loading the GDTR and IDTR.\n" }, { "Name": "SLDT", "Alias": [], "Brief": "Store Local Descriptor Table Register", "Description": "\nStores the segment selector from the local descriptor table register (LDTR) in the destination operand. The desti-nation operand can be a general-purpose register or a memory location. The segment selector stored with this instruction points to the segment descriptor (located in the GDT) for the current LDT. This instruction can only be executed in protected mode.\nOutside IA-32e mode, when the destination operand is a 32-bit register, the 16-bit segment selector is copied into the low-order 16 bits of the register. The high-order 16 bits of the register are cleared for the Pentium 4, Intel Xeon, and P6 family processors. They are undefined for Pentium, Intel486, and Intel386 processors. When the destina-tion operand is a memory location, the segment selector is written to memory as a 16-bit quantity, regardless of the operand size.\nIn compatibility mode, when the destination operand is a 32-bit register, the 16-bit segment selector is copied into the low-order 16 bits of the register. The high-order 16 bits of the register are cleared. When the destination operand is a memory location, the segment selector is written to memory as a 16-bit quantity, regardless of the operand size.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). The behavior of SLDT with a 64-bit register is to zero-extend the 16-bit selector and store it in the register. If the desti-nation is memory and operand size is 64, SLDT will write the 16-bit selector to memory as a 16-bit quantity, regardless of the operand size\n" }, { "Name": "SMSW", "Alias": [], "Brief": "Store Machine Status Word", "Description": "\nStores the machine status word (bits 0 through 15 of control register CR0) into the destination operand. The desti-nation operand can be a general-purpose register or a memory location.\nIn non-64-bit modes, when the destination operand is a 32-bit register, the low-order 16 bits of register CR0 are copied into the low-order 16 bits of the register and the high-order 16 bits are undefined. When the destination operand is a memory location, the low-order 16 bits of register CR0 are written to memory as a 16-bit quantity, regardless of the operand size.\nIn 64-bit mode, the behavior of the SMSW instruction is defined by the following examples:\nSMSW is only useful in operating-system software. However, it is not a privileged instruction and can be used in application programs. The is provided for compatibility with the Intel 286 processor. Programs and procedures intended to run on the Pentium 4, Intel Xeon, P6 family, Pentium, Intel486, and Intel386 processors should use the MOV (control registers) instruction to load the machine status word.\nSee “Changes to Instruction Behavior in VMX Non-Root Operation” in Chapter 25 of the Intel® 64 and IA-32 Archi-tectures Software Developer’s Manual, Volume 3C, for more information about the behavior of this instruction in VMX non-root operation.\n" }, { "Name": "SQRTPD", "Alias": [], "Brief": "Compute Square Roots of Packed Double", "Description": "\nPerforms a SIMD computation of the square roots of the two packed double-precision floating-point values in the source operand (second operand) stores the packed double-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. See Figure 11-3 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD double-precision floating-point operation.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or 128-bit memory location. The destina-tion is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the source operand second source operand or a 128-bit memory location. The destina-tion operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "SQRTPS", "Alias": [], "Brief": "Compute Square Roots of Packed Single", "Description": "\nPerforms a SIMD computation of the square roots of the four packed single-precision floating-point values in the source operand (second operand) stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. See Figure 10-5 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD single-precision floating-point operation.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or 128-bit memory location. The destina-tion is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the source operand second source operand or a 128-bit memory location. The destina-tion operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n" }, { "Name": "SQRTSD", "Alias": [], "Brief": "Compute Square Root of Scalar Double", "Description": "\nComputes the square root of the low double-precision floating-point value in the source operand (second operand) and stores the double-precision floating-point result in the destination operand. The source operand can be an XMM register or a 64-bit memory location. The destination operand is an XMM register. The high quadword of the desti-nation operand remains unchanged. See Figure 11-4 in the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 1, for an illustration of a scalar double-precision floating-point operation.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "SQRTSS", "Alias": [], "Brief": "Compute Square Root of Scalar Single", "Description": "\nComputes the square root of the low single-precision floating-point value in the source operand (second operand) and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order double-words of the destination operand remain unchanged. See Figure 10-6 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a scalar single-precision floating-point operation.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "STAC", "Alias": [], "Brief": "Set AC Flag in EFLAGS Register", "Description": "\nSets the AC flag bit in EFLAGS register. This may enable alignment checking of user-mode data accesses. This allows explicit supervisor-mode data accesses to user-mode pages even if the SMAP bit is set in the CR4 register.\nThis instruction's operation is the same in non-64-bit modes and 64-bit mode. Attempts to execute STAC when CPL > 0 cause #UD.\n" }, { "Name": "STC", "Alias": [], "Brief": "Set Carry Flag", "Description": "\nSets the CF flag in the EFLAGS register. Operation is the same in all modes.\n" }, { "Name": "STD", "Alias": [], "Brief": "Set Direction Flag", "Description": "\nSets the DF flag in the EFLAGS register. When the DF flag is set to 1, string operations decrement the index regis-ters (ESI and/or EDI). Operation is the same in all modes.\n" }, { "Name": "STI", "Alias": [], "Brief": "Set Interrupt Flag", "Description": "\nIf protected-mode virtual interrupts are not enabled, STI sets the interrupt flag (IF) in the EFLAGS register. After the IF flag is set, the processor begins responding to external, maskable interrupts after the next instruction is executed. The delayed effect of this instruction is provided to allow interrupts to be enabled just before returning from a procedure (or subroutine). For instance, if an STI instruction is followed by an RET instruction, the RET instruction is allowed to execute before external interrupts are recognized1. If the STI instruction is followed by a CLI instruction (which clears the IF flag), the effect of the STI instruction is negated.\nThe IF flag and the STI and CLI instructions do not prohibit the generation of exceptions and NMI interrupts. NMI interrupts (and SMIs) may be blocked for one macroinstruction following an STI.\nWhen protected-mode virtual interrupts are enabled, CPL is 3, and IOPL is less than 3; STI sets the VIF flag in the EFLAGS register, leaving IF unaffected.\nTable 4-16 indicates the action of the STI instruction depending on the processor’s mode of operation and the CPL/IOPL settings of the running program or procedure.\nOperation is the same in all modes.\nTable 4-16. Decision Table for STI Results\n\n\nCR0.PE\nEFLAGS.VM\nEFLAGS.IOPL\nCS.CPL\nCR4.PVI\nEFLAGS.VIP\nCR4.VME\nSTI Result\n\n0\nX\nX\nX\nX\nX\nX\nIF = 1\n\n1\n0\n≥ CPL\nX\nX\nX\nX\nIF = 1\n\n1\n0\n< CPL\n3\n1\nX\nX\nVIF = 1\n\n1\n0\n< CPL\n< 3\nX\nX\nX\nGP Fault\n\n1\n0\n< CPL\nX\n0\nX\nX\nGP Fault\n\n1\n0\n< CPL\nX\nX\n1\nX\nGP Fault\n\n1\n1\n3\nX\nX\nX\nX\nIF = 1\n\n1\n1\n< 3\nX\nX\n0\n1\nVIF = 1\n\n1\n1\n< 3\nX\nX\n1\nX\nGP Fault\n\n1\n1\n< 3\nX\nX\nX\n0\nGP Fault\nNOTES:\nX = This setting has no impact.\n1.\nThe STI instruction delays recognition of interrupts only if it is executed with EFLAGS.IF = 0. In a sequence of STI instructions, only the first instruction in the sequence is guaranteed to delay interrupts.\nIn the following instruction sequence, interrupts may be recognized before RET executes: STI STI RET\n" }, { "Name": "STMXCSR", "Alias": [], "Brief": "Store MXCSR Register State", "Description": "\nStores the contents of the MXCSR control and status register to the destination operand. The destination operand is a 32-bit memory location. The reserved bits in the MXCSR register are stored as 0s.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\nVEX.L must be 0, otherwise instructions will #UD.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "STOS", "Alias": [ "STOSB", "STOSW", "STOSD", "STOSQ" ], "Brief": "Store String", "Description": "\nIn non-64-bit and default 64-bit mode; stores a byte, word, or doubleword from the AL, AX, or EAX register (respectively) into the destination operand. The destination operand is a memory location, the address of which is read from either the ES:EDI or ES:DI register (depending on the address-size attribute of the instruction and the mode of operation). The ES segment cannot be overridden with a segment override prefix.\nAt the assembly-code level, two forms of the instruction are allowed: the “explicit-operands” form and the “no-operands” form. The explicit-operands form (specified with the STOS mnemonic) allows the destination operand to be specified explicitly. Here, the destination operand should be a symbol that indicates the size and location of the destination value. The source operand is then automatically selected to match the size of the destination operand (the AL register for byte operands, AX for word operands, EAX for doubleword operands). The explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the destination operand symbol must specify the correct type (size) of the operand (byte, word, or doubleword), but it does not have to specify the correct location. The location is always specified by the ES:(E)DI register. These must be loaded correctly before the store string instruction is executed.\nThe no-operands form provides “short forms” of the byte, word, doubleword, and quadword versions of the STOS instructions. Here also ES:(E)DI is assumed to be the destination operand and AL, AX, or EAX is assumed to be the source operand. The size of the destination and source operands is selected by the mnemonic: STOSB (byte read from register AL), STOSW (word from AX), STOSD (doubleword from EAX).\nAfter the byte, word, or doubleword is transferred from the register to the memory location, the (E)DI register is incremented or decremented according to the setting of the DF flag in the EFLAGS register. If the DF flag is 0, the register is incremented; if the DF flag is 1, the register is decremented (the register is incremented or decremented by 1 for byte operations, by 2 for word operations, by 4 for doubleword operations).\nNOTE\nTo improve performance, more recent processors support modifications to the processor’s operation during the string store operations initiated with STOS and STOSB. See Section 7.3.9.3 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 for additional information on fast-string operation.\nIn 64-bit mode, the default address size is 64 bits, 32-bit address size is supported using the prefix 67H. Using a REX prefix in the form of REX.W promotes operation on doubleword operand to 64 bits. The promoted no-operand mnemonic is STOSQ. STOSQ (and its explicit operands variant) store a quadword from the RAX register into the destination addressed by RDI or EDI. See the summary chart at the beginning of this section for encoding data and limits.\nThe STOS, STOSB, STOSW, STOSD, STOSQ instructions can be preceded by the REP prefix for block loads of ECX bytes, words, or doublewords. More often, however, these instructions are used within a LOOP construct because data needs to be moved into the AL, AX, or EAX register before it can be stored. See “REP/REPE/REPZ /REPNE/REPNZ—Repeat String Operation Prefix” in this chapter for a description of the REP prefix.\n" }, { "Name": "STR", "Alias": [], "Brief": "Store Task Register", "Description": "\nStores the segment selector from the task register (TR) in the destination operand. The destination operand can be a general-purpose register or a memory location. The segment selector stored with this instruction points to the task state segment (TSS) for the currently running task.\nWhen the destination operand is a 32-bit register, the 16-bit segment selector is copied into the lower 16 bits of the register and the upper 16 bits of the register are cleared. When the destination operand is a memory location, the segment selector is written to memory as a 16-bit quantity, regardless of operand size.\nIn 64-bit mode, operation is the same. The size of the memory operand is fixed at 16 bits. In register stores, the 2-byte TR is zero extended if stored to a 64-bit register.\nThe STR instruction is useful only in operating-system software. It can only be executed in protected mode.\n" }, { "Name": "SUB", "Alias": [], "Brief": "Subtract", "Description": "\nSubtracts the second operand (source operand) from the first operand (destination operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, register, or memory location. (However, two memory operands cannot be used in one instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.\nThe SUB instruction performs integer subtraction. It evaluates the result for both signed and unsigned integer operands and sets the OF and CF flags to indicate an overflow in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\n" }, { "Name": "SUBPD", "Alias": [], "Brief": "Subtract Packed Double", "Description": "\nPerforms a SIMD subtract of the two packed double-precision floating-point values in the source operand (second operand) from the two packed double-precision floating-point values in the destination operand (first operand), and stores the packed double-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. See Figure 11-3 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD double-precision floating-point operation.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: T second source can be an XMM register or an 128-bit memory location. The destina-tion is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "SUBPS", "Alias": [], "Brief": "Subtract Packed Single", "Description": "\nPerforms a SIMD subtract of the four packed single-precision floating-point values in the source operand (second operand) from the four packed single-precision floating-point values in the destination operand (first operand), and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. See Figure 10-5 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD double-precision floating-point operation.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "SUBSD", "Alias": [], "Brief": "Subtract Scalar Double", "Description": "\nSubtracts the low double-precision floating-point value in the source operand (second operand) from the low double-precision floating-point value in the destination operand (first operand), and stores the double-precision floating-point result in the destination operand. The source operand can be an XMM register or a 64-bit memory location. The destination operand is an XMM register. The high quadword of the destination operand remains unchanged. See Figure 11-4 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a scalar double-precision floating-point operation.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:64) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "SUBSS", "Alias": [], "Brief": "Subtract Scalar Single", "Description": "\nSubtracts the low single-precision floating-point value in the source operand (second operand) from the low single-precision floating-point value in the destination operand (first operand), and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged. See Figure 10-6 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a scalar single-precision floating-point operation.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:32) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n" }, { "Name": "SWAPGS", "Alias": [], "Brief": "Swap GS Base Register", "Description": "\nSWAPGS exchanges the current GS base register value with the value contained in MSR address C0000102H (IA32_KERNEL_GS_BASE). The SWAPGS instruction is a privileged instruction intended for use by system soft-ware.\nWhen using SYSCALL to implement system calls, there is no kernel stack at the OS entry point. Neither is there a straightforward method to obtain a pointer to kernel structures from which the kernel stack pointer could be read. Thus, the kernel cannot save general purpose registers or reference memory.\nBy design, SWAPGS does not require any general purpose registers or memory operands. No registers need to be saved before using the instruction. SWAPGS exchanges the CPL 0 data pointer from the IA32_KERNEL_GS_BASE MSR with the GS base register. The kernel can then use the GS prefix on normal memory references to access kernel data structures. Similarly, when the OS kernel is entered using an interrupt or exception (where the kernel stack is already set up), SWAPGS can be used to quickly get a pointer to the kernel data structures.\nThe IA32_KERNEL_GS_BASE MSR itself is only accessible using RDMSR/WRMSR instructions. Those instructions are only accessible at privilege level 0. The WRMSR instruction ensures that the IA32_KERNEL_GS_BASE MSR contains a canonical address.\n" }, { "Name": "SYSCALL", "Alias": [], "Brief": "Fast System Call", "Description": "\nSYSCALL invokes an OS system-call handler at privilege level 0. It does so by loading RIP from the IA32_LSTAR MSR (after saving the address of the instruction following SYSCALL into RCX). (The WRMSR instruction ensures that the IA32_LSTAR MSR always contain a canonical address.)\nSYSCALL also saves RFLAGS into R11 and then masks RFLAGS using the IA32_FMASK MSR (MSR address C0000084H); specifically, the processor clears in RFLAGS every bit corresponding to a bit that is set in the IA32_FMASK MSR.\nSYSCALL loads the CS and SS selectors with values derived from bits 47:32 of the IA32_STAR MSR. However, the CS and SS descriptor caches are not loaded from the descriptors (in GDT or LDT) referenced by those selectors. Instead, the descriptor caches are loaded with fixed values. See the Operation section for details. It is the respon-sibility of OS software to ensure that the descriptors (in GDT or LDT) referenced by those selector values corre-spond to the fixed values loaded into the descriptor caches; the SYSCALL instruction does not ensure this correspondence.\nThe SYSCALL instruction does not save the stack pointer (RSP). If the OS system-call handler will change the stack pointer, it is the responsibility of software to save the previous value of the stack pointer. This might be done prior to executing SYSCALL, with software restoring the stack pointer with the instruction following SYSCALL (which will be executed after SYSRET). Alternatively, the OS system-call handler may save the stack pointer and restore it before executing SYSRET.\n" }, { "Name": "SYSENTER", "Alias": [], "Brief": "Fast System Call", "Description": "\nExecutes a fast call to a level 0 system procedure or routine. SYSENTER is a companion instruction to SYSEXIT. The instruction is optimized to provide the maximum performance for system calls from user code running at privilege level 3 to operating system or executive procedures running at privilege level 0.\nWhen executed in IA-32e mode, the SYSENTER instruction transitions the logical processor to 64-bit mode; other-wise, the logical processor remains in protected mode.\nPrior to executing the SYSENTER instruction, software must specify the privilege level 0 code segment and code entry point, and the privilege level 0 stack segment and stack pointer by writing values to the following MSRs:\nThese MSRs can be read from and written to using RDMSR/WRMSR. The WRMSR instruction ensures that the IA32_SYSENTER_EIP and IA32_SYSENTER_ESP MSRs always contain canonical addresses.\nWhile SYSENTER loads the CS and SS selectors with values derived from the IA32_SYSENTER_CS MSR, the CS and SS descriptor caches are not loaded from the descriptors (in GDT or LDT) referenced by those selectors. Instead, the descriptor caches are loaded with fixed values. See the Operation section for details. It is the responsibility of OS software to ensure that the descriptors (in GDT or LDT) referenced by those selector values correspond to the fixed values loaded into the descriptor caches; the SYSENTER instruction does not ensure this correspondence.\nThe SYSENTER instruction can be invoked from all operating modes except real-address mode.\nThe SYSENTER and SYSEXIT instructions are companion instructions, but they do not constitute a call/return pair. When executing a SYSENTER instruction, the processor does not save state information for the user code (e.g., the instruction pointer), and neither the SYSENTER nor the SYSEXIT instruction supports passing parameters on the stack.\nTo use the SYSENTER and SYSEXIT instructions as companion instructions for transitions between privilege level 3 code and privilege level 0 operating system procedures, the following conventions must be followed:\nThe SYSENTER and SYSEXIT instructions were introduced into the IA-32 architecture in the Pentium II processor. The availability of these instructions on a processor is indicated with the SYSENTER/SYSEXIT present (SEP) feature flag returned to the EDX register by the CPUID instruction. An operating system that qualifies the SEP flag must also qualify the processor family and model to ensure that the SYSENTER/SYSEXIT instructions are actually present. For example:\nIF CPUID SEP bit is set\nTHEN IF (Family = 6) and (Model < 3) and (Stepping < 3)\nTHEN\nSYSENTER/SYSEXIT_Not_Supported; FI;\nELSE\nSYSENTER/SYSEXIT_Supported; FI;\nFI;\nWhen the CPUID instruction is executed on the Pentium Pro processor (model 1), the processor returns a the SEP flag as set, but does not support the SYSENTER/SYSEXIT instructions.\n" }, { "Name": "SYSEXIT", "Alias": [], "Brief": "Fast Return from Fast System Call", "Description": "\nExecutes a fast return to privilege level 3 user code. SYSEXIT is a companion instruction to the SYSENTER instruc-tion. The instruction is optimized to provide the maximum performance for returns from system procedures executing at protections levels 0 to user procedures executing at protection level 3. It must be executed from code executing at privilege level 0.\nWith a 64-bit operand size, SYSEXIT remains in 64-bit mode; otherwise, it either enters compatibility mode (if the logical processor is in IA-32e mode) or remains in protected mode (if it is not).\nPrior to executing SYSEXIT, software must specify the privilege level 3 code segment and code entry point, and the privilege level 3 stack segment and stack pointer by writing values into the following MSR and general-purpose registers:\nThe IA32_SYSENTER_CS MSR can be read from and written to using RDMSR and WRMSR.\nWhile SYSEXIT loads the CS and SS selectors with values derived from the IA32_SYSENTER_CS MSR, the CS and SS descriptor caches are not loaded from the descriptors (in GDT or LDT) referenced by those selectors. Instead, the descriptor caches are loaded with fixed values. See the Operation section for details. It is the responsibility of OS software to ensure that the descriptors (in GDT or LDT) referenced by those selector values correspond to the fixed values loaded into the descriptor caches; the SYSEXIT instruction does not ensure this correspondence.\nThe SYSEXIT instruction can be invoked from all operating modes except real-address mode and virtual-8086 mode.\nThe SYSENTER and SYSEXIT instructions were introduced into the IA-32 architecture in the Pentium II processor. The availability of these instructions on a processor is indicated with the SYSENTER/SYSEXIT present (SEP) feature flag returned to the EDX register by the CPUID instruction. An operating system that qualifies the SEP flag must also qualify the processor family and model to ensure that the SYSENTER/SYSEXIT instructions are actually present. For example:\nIF CPUID SEP bit is set\nTHEN IF (Family = 6) and (Model < 3) and (Stepping < 3)\nTHEN\nSYSENTER/SYSEXIT_Not_Supported; FI;\nELSE\nSYSENTER/SYSEXIT_Supported; FI;\nFI;\nWhen the CPUID instruction is executed on the Pentium Pro processor (model 1), the processor returns a the SEP flag as set, but does not support the SYSENTER/SYSEXIT instructions.\n" }, { "Name": "SYSRET", "Alias": [], "Brief": "Return From Fast System Call", "Description": "\nSYSRET is a companion instruction to the SYSCALL instruction. It returns from an OS system-call handler to user code at privilege level 3. It does so by loading RIP from RCX and loading RFLAGS from R11.1 With a 64-bit operand size, SYSRET remains in 64-bit mode; otherwise, it enters compatibility mode and only the low 32 bits of the regis-ters are loaded.\nSYSRET loads the CS and SS selectors with values derived from bits 63:48 of the IA32_STAR MSR. However, the CS and SS descriptor caches are not loaded from the descriptors (in GDT or LDT) referenced by those selectors. Instead, the descriptor caches are loaded with fixed values. See the Operation section for details. It is the respon-sibility of OS software to ensure that the descriptors (in GDT or LDT) referenced by those selector values corre-spond to the fixed values loaded into the descriptor caches; the SYSRET instruction does not ensure this correspondence.\nThe SYSRET instruction does not modify the stack pointer (ESP or RSP). For that reason, it is necessary for software to switch to the user stack. The OS may load the user stack pointer (if it was saved after SYSCALL) before executing SYSRET; alternatively, user code may load the stack pointer (if it was saved before SYSCALL) after receiving control from SYSRET.\nIf the OS loads the stack pointer before executing SYSRET, it must ensure that the handler of any interrupt or exception delivered between restoring the stack pointer and successful execution of SYSRET is not invoked with the user stack. It can do so using approaches such as the following:\n— Confirming that the value of RCX is canonical before executing SYSRET.\n— Using paging to ensure that the SYSCALL instruction will never save a non-canonical value into RCX.\n— Using the IST mechanism for gate 13 (#GP) in the IDT.\n" }, { "Name": "TEST", "Alias": [], "Brief": "Logical Compare", "Description": "\nComputes the bit-wise logical AND of first operand (source 1 operand) and the second operand (source 2 operand) and sets the SF, ZF, and PF status flags according to the result. The result is then discarded.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "TZCNT", "Alias": [], "Brief": "Count the Number of Trailing Zero Bits", "Description": "\nTZCNT counts the number of trailing least significant zero bits in source operand (second operand) and returns the result in destination operand (first operand). TZCNT is an extension of the BSF instruction. The key difference between TZCNT and BSF instruction is that TZCNT provides operand size as output when source operand is zero while in the case of BSF instruction, if source operand is zero, the content of destination operand are undefined. On processors that do not support TZCNT, the instruction byte encoding is executed as BSF.\n" }, { "Name": "UCOMISD", "Alias": [], "Brief": "Unordered Compare Scalar Double", "Description": "\nPerforms an unordered compare of the double-precision floating-point values in the low quadwords of source operand 1 (first operand) and source operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unordered, greater than, less than, or equal). The OF, SF and AF flags in the EFLAGS register are set to 0. The unordered result is returned if either source operand is a NaN (QNaN or SNaN). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.\nSource operand 1 is an XMM register; source operand 2 can be an XMM register or a 64 bit memory location.\nThe UCOMISD instruction differs from the COMISD instruction in that it signals a SIMD floating-point invalid oper-ation exception (#I) only when a source operand is an SNaN. The COMISD instruction signals an invalid operation exception if a source operand is either a QNaN or an SNaN.\nThe EFLAGS register is not updated if an unmasked SIMD floating-point exception is generated.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "UCOMISS", "Alias": [], "Brief": "Unordered Compare Scalar Single", "Description": "\nPerforms an unordered compare of the single-precision floating-point values in the low doublewords of the source operand 1 (first operand) and the source operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unordered, greater than, less than, or equal). The OF, SF and AF flags in the EFLAGS register are set to 0. The unordered result is returned if either source operand is a NaN (QNaN or SNaN). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.\nSource operand 1 is an XMM register; source operand 2 can be an XMM register or a 32 bit memory location.\nThe UCOMISS instruction differs from the COMISS instruction in that it signals a SIMD floating-point invalid opera-tion exception (#I) only when a source operand is an SNaN. The COMISS instruction signals an invalid operation exception if a source operand is either a QNaN or an SNaN.\nThe EFLAGS register is not updated if an unmasked SIMD floating-point exception is generated.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "UD2", "Alias": [], "Brief": "Undefined Instruction", "Description": "\nGenerates an invalid opcode exception. This instruction is provided for software testing to explicitly generate an invalid opcode exception. The opcode for this instruction is reserved for this purpose.\nOther than raising the invalid opcode exception, this instruction has no effect on processor state or memory.\nEven though it is the execution of the UD2 instruction that causes the invalid opcode exception, the instruction pointer saved by delivery of the exception references the UD2 instruction (and not the following instruction).\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "UNPCKHPD", "Alias": [], "Brief": "Unpack and Interleave High Packed Double", "Description": "\nPerforms an interleaved unpack of the high double-precision floating-point values from the source operand (second operand) and the destination operand (first operand). See Figure 4-23.\n\nDEST\nSRC\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nY1\nX0\nX1\nY1\nX1\nY0\nFigure 4-23. UNPCKHPD Instruction High Unpack and Interleave Operation\nWhen unpacking from a memory operand, an implementation may fetch only the appropriate 64 bits; however, alignment to 16-byte boundary and normal segment checking will still be enforced.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\n" }, { "Name": "UNPCKHPS", "Alias": [], "Brief": "Unpack and Interleave High Packed Single", "Description": "\nPerforms an interleaved unpack of the high-order single-precision floating-point values from the source operand (second operand) and the destination operand (first operand). See Figure 4-24. The source operand can be an XMM register or a 128-bit memory location; the destination operand is an XMM register.\n\nDEST\nSRC\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX3\nX1\nX0\nY3\nY1\nY0\nY3\nX3\nY2\nX2\nX2\nY2\nFigure 4-24. UNPCKHPS Instruction High Unpack and Interleave Operation\nWhen unpacking from a memory operand, an implementation may fetch only the appropriate 64 bits; however, alignment to 16-byte boundary and normal segment checking will still be enforced.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: T second source can be an XMM register or an 128-bit memory location. The destina-tion is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\n" }, { "Name": "UNPCKLPD", "Alias": [], "Brief": "Unpack and Interleave Low Packed Double", "Description": "\nPerforms an interleaved unpack of the low double-precision floating-point values from the source operand (second operand) and the destination operand (first operand). See Figure 4-25. The source operand can be an XMM register or a 128-bit memory location; the destination operand is an XMM register.\n\nDEST\nSRC\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nY0\nX1\nY1\nX0\nX0\nY0\nFigure 4-25. UNPCKLPD Instruction Low Unpack and Interleave Operation\nWhen unpacking from a memory operand, an implementation may fetch only the appropriate 64 bits; however, alignment to 16-byte boundary and normal segment checking will still be enforced.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: T second source can be an XMM register or an 128-bit memory location. The destina-tion is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\n" }, { "Name": "UNPCKLPS", "Alias": [], "Brief": "Unpack and Interleave Low Packed Single", "Description": "\nPerforms an interleaved unpack of the low-order single-precision floating-point values from the source operand (second operand) and the destination operand (first operand). See Figure 4-26. The source operand can be an XMM register or a 128-bit memory location; the destination operand is an XMM register.\n\nDEST\nSRC\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX2\nY2\nX3\nX1\nX0\nY3\nY1\nY0\nY1\nX1\nY0\nFigure 4-26. UNPCKLPS Instruction Low Unpack and Interleave Operation\nWhen unpacking from a memory operand, an implementation may fetch only the appropriate 64 bits; however, alignment to 16-byte boundary and normal segment checking will still be enforced.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\n" }, { "Name": "VBROADCAST", "Alias": [], "Brief": "Broadcast Floating", "Description": "\nLoad floating point values from the source operand (second operand) and broadcast to all elements of the destina-tion operand (first operand).\nVBROADCASTSD and VBROADCASTF128 are only supported as 256-bit wide versions. VBROADCASTSS is supported in both 128-bit and 256-bit wide versions.\nMemory and register source operand syntax support of 256-bit instructions depend on the processor’s enumeration of the following conditions with respect to CPUID.1:ECX.AVX[bit 28] and CPUID.(EAX=07H, ECX=0H):EBX.AVX2[bit 5]:\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD. An attempt to execute VBROADCASTSD or VBROADCASTF128 encoded with VEX.L= 0 will cause an #UD exception. Attempts to execute any VBROADCAST* instruction with VEX.W = 1 will cause #UD.\n\nm32\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX0\nX0\nX0\nX0\nX0\nX0\nX0\nX0\nX0\nFigure 4-27. VBROADCASTSS Operation (VEX.256 encoded version)\n\nm32\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX0\nX0\n0\nX0\nX0\nX0\n0\n0\n0\nFigure 4-28. VBROADCASTSS Operation (128-bit version)\n\nm64\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX0\nX0\nX0\nX0\nX0\nFigure 4-29. VBROADCASTSD Operation\n\nm128\nDEST\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nX0\nX0\nX0\nFigure 4-30. VBROADCASTF128 Operation\n" }, { "Name": "VCVTPH2PS", "Alias": [], "Brief": "Convert 16", "Description": "\nConverts four/eight packed half precision (16-bits) floating-point values in the low-order 64/128 bits of an XMM/YMM register or 64/128-bit memory location to four/eight packed single-precision floating-point values and writes the converted values into the destination XMM/YMM register.\nIf case of a denormal operand, the correct normal result is returned. MXCSR.DAZ is ignored and is treated as if it 0. No denormal exception is reported on MXCSR.\n128-bit version: The source operand is a XMM register or 64-bit memory location. The destination operand is a XMM register. The upper bits (VLMAX-1:128) of the corresponding destination YMM register are zeroed.\n256-bit version: The source operand is a XMM register or 128-bit memory location. The destination operand is a YMM register.\n The diagram below illustrates how data is converted from four packed half precision (in 64 bits) to four single precision (in 128 bits) FP values. Note: VEX.vvvv is reserved (must be 1111b).\n\nVCVTPH2PS xmm1, xmm2/mem64, imm8\n127 96\n95 64\n63 48\n47 32\n31 16\n15 0\nxmm2/mem64\nVH3\nVH2\nVH1\nVH0\nconvert\nconvert\nconvert\nconvert\n127 96\n95 64\n63 32\n31 0\nVS3\nVS2\nVS1\nVS0\nxmm1\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 4-31. VCVTPH2PS (128-bit Version)\n" }, { "Name": "VCVTPS2PH", "Alias": [], "Brief": "Convert Single", "Description": "\nConvert four or eight packed single-precision floating values in first source operand to four or eight packed half-precision (16-bit) floating-point values. The rounding mode is specified using the immediate field (imm8).\nUnderflow results (i.e. tiny results) are converted to denormals. MXCSR.FTZ is ignored. If a source element is denormal relative to input format with MXCSR.DAZ not set, DM masked and at least one of PM or UM unmasked; a SIMD exception will be raised with DE, UE and PE set.\n128-bit version: The source operand is a XMM register. The destination operand is a XMM register or 64-bit memory location. The upper-bits vector register zeroing behavior of VEX prefix encoding still applies if the destination operand is a xmm register. So the upper bits (255:64) of corresponding YMM register are zeroed.\n256-bit version: The source operand is a YMM register. The destination operand is a XMM register or 128-bit memory location. The upper-bits vector register zeroing behavior of VEX prefix encoding still applies if the destina-tion operand is a xmm register. So the upper bits (255:128) of the corresponding YMM register are zeroed.\nNote: VEX.vvvv is reserved (must be 1111b).\nThe diagram below illustrates how data is converted from four packed single precision (in 128 bits) to four half precision (in 64 bits) FP values.\n\nVCVTPS2PH xmm1/mem64, xmm2, imm8\n127 96\n95 64\n63 32\n31 0\nVS3\nVS2\nVS1\nVS0\nxmm2\nconvert\nconvert\nconvert\nconvert\n127 96\n95 64\n63 48\n47 32\n31 16\n15 0\nVH3\nVH2\nVH1\nVH0\nxmm1/mem64\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 4-32. VCVTPS2PH (128-bit Version)\nThe immediate byte defines several bit fields that controls rounding operation. The effect and encoding of RC field are listed in Table 4-17.\nTable 4-17. Immediate Byte Encoding for 16-bit Floating-Point Conversion Instructions\n\n\nBits\nField Name/value\nDescription\nComment\n\nImm[1:0]\nRC=00B\nRound to nearest even\nIf Imm[2] = 0\n\n\nRC=01B\nRound down\n\n\n\nRC=10B\nRound up\n\n\n\nRC=11B\nTruncate\n\n\nImm[2]\nMS1=0\nUse imm[1:0] for rounding\nIgnore MXCSR.RC\n\n\nMS1=1\nUse MXCSR.RC for rounding\n\n\nImm[7:3]\nIgnored\nIgnored by processor\n\n" }, { "Name": "VERR", "Alias": [ "VERW" ], "Brief": "Verify a Segment for Reading or Writing", "Description": "\nVerifies whether the code or data segment specified with the source operand is readable (VERR) or writable (VERW) from the current privilege level (CPL). The source operand is a 16-bit register or a memory location that contains the segment selector for the segment to be verified. If the segment is accessible and readable (VERR) or writable (VERW), the ZF flag is set; otherwise, the ZF flag is cleared. Code segments are never verified as writable. This check cannot be performed on system segments.\nTo set the ZF flag, the following conditions must be met:\nThe validation performed is the same as is performed when a segment selector is loaded into the DS, ES, FS, or GS register, and the indicated access (read or write) is performed. The segment selector's value cannot result in a protection exception, enabling the software to anticipate possible segment access problems.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode. The operand size is fixed at 16 bits.\n" }, { "Name": "VEXTRACTF128", "Alias": [], "Brief": "Extract Packed Floating", "Description": "\nExtracts 128-bits of packed floating-point values from the source operand (second operand) at an 128-bit offset from imm8[0] into the destination operand (first operand). The destination may be either an XMM register or an 128-bit memory location.\nVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\nThe high 7 bits of the immediate are ignored.\nIf VEXTRACTF128 is encoded with VEX.L= 0, an attempt to execute the instruction encoded with VEX.L= 0 will cause an #UD exception.\n" }, { "Name": "VEXTRACTI128", "Alias": [], "Brief": "Extract packed Integer Values", "Description": "\nExtracts 128-bits of packed integer values from the source operand (second operand) at a 128-bit offset from imm8[0] into the destination operand (first operand). The destination may be either an XMM register or a 128-bit memory location.\nVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\nThe high 7 bits of the immediate are ignored.\nAn attempt to execute VEXTRACTI128 encoded with VEX.L= 0 will cause an #UD exception.\n" }, { "Name": "VFMADD132PD", "Alias": [ "VFMADD213PD", "VFMADD231PD " ], "Brief": "Fused Multiply", "Description": "\nPerforms a set of SIMD multiply-add computation on packed double-precision floating-point values using three source operands and writes the multiply-add results in the destination operand. The destination operand is also the first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD register or a memory location.\nVFMADD132PD: Multiplies the two or four packed double-precision floating-point values from the first source operand to the two or four packed double-precision floating-point values in the third source operand, adds the infi-nite precision intermediate result to the two or four packed double-precision floating-point values in the second source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVFMADD213PD: Multiplies the two or four packed double-precision floating-point values from the second source operand to the two or four packed double-precision floating-point values in the first source operand, adds the infi-nite precision intermediate result to the two or four packed double-precision floating-point values in the third source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVFMADD231PD: Multiplies the two or four packed double-precision floating-point values from the second source to the two or four packed double-precision floating-point values in the third source operand, adds the infinite preci-sion intermediate result to the two or four packed double-precision floating-point values in the first source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.\nVEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFMADD132PS", "Alias": [ "VFMADD213PS", "VFMADD231PS " ], "Brief": "Fused Multiply", "Description": "\nPerforms a set of SIMD multiply-add computation on packed single-precision floating-point values using three source operands and writes the multiply-add results in the destination operand. The destination operand is also the first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD register or a memory location.\nVFMADD132PS: Multiplies the four or eight packed single-precision floating-point values from the first source operand to the four or eight packed single-precision floating-point values in the third source operand, adds the infi-nite precision intermediate result to the four or eight packed single-precision floating-point values in the second source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVFMADD213PS: Multiplies the four or eight packed single-precision floating-point values from the second source operand to the four or eight packed single-precision floating-point values in the first source operand, adds the infi-nite precision intermediate result to the four or eight packed single-precision floating-point values in the third source operand, performs rounding and stores the resulting the four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVFMADD231PS: Multiplies the four or eight packed single-precision floating-point values from the second source operand to the four or eight packed single-precision floating-point values in the third source operand, adds the infi-nite precision intermediate result to the four or eight packed single-precision floating-point values in the first source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.\nVEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the “Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1”.\n" }, { "Name": "VFMADD132SD", "Alias": [ "VFMADD213SD", "VFMADD231SD " ], "Brief": "Fused Multiply", "Description": "\nPerforms a SIMD multiply-add computation on the low packed double-precision floating-point values using three source operands and writes the multiply-add result in the destination operand. The destination operand is also the first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD register or a memory location.\nVFMADD132SD: Multiplies the low packed double-precision floating-point value from the first source operand to the low packed double-precision floating-point value in the third source operand, adds the infinite precision inter-mediate result to the low packed double-precision floating-point values in the second source operand, performs rounding and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).\nVFMADD213SD: Multiplies the low packed double-precision floating-point value from the second source operand to the low packed double-precision floating-point value in the first source operand, adds the infinite precision inter-mediate result to the low packed double-precision floating-point value in the third source operand, performs rounding and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).\nVFMADD231SD: Multiplies the low packed double-precision floating-point value from the second source to the low packed double-precision floating-point value in the third source operand, adds the infinite precision intermediate result to the low packed double-precision floating-point value in the first source operand, performs rounding and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 64-bit memory location and encoded in rm_field. The upper bits ([VLMAX-1:128]) of the YMM destination register are zeroed.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFMADD132SS", "Alias": [ "VFMADD213SS", "VFMADD231SS " ], "Brief": "Fused Multiply", "Description": "\nPerforms a SIMD multiply-add computation on packed single-precision floating-point values using three source operands and writes the multiply-add results in the destination operand. The destination operand is also the first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD register or a memory location.\nVFMADD132SS: Multiplies the low packed single-precision floating-point value from the first source operand to the low packed single-precision floating-point value in the third source operand, adds the infinite precision interme-diate result to the low packed single-precision floating-point value in the second source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).\nVFMADD213SS: Multiplies the low packed single-precision floating-point value from the second source operand to the low packed single-precision floating-point value in the first source operand, adds the infinite precision interme-diate result to the low packed single-precision floating-point value in the third source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).\nVFMADD231SS: Multiplies the low packed single-precision floating-point value from the second source operand to the low packed single-precision floating-point value in the third source operand, adds the infinite precision inter-mediate result to the low packed single-precision floating-point value in the first source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 32-bit memory location and encoded in rm_field. The upper bits ([VLMAX-1:128]) of the YMM destination register are zeroed.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFMADDSUB132PD", "Alias": [ "VFMADDSUB213PD", "VFMADDSUB231PD " ], "Brief": "Fused Multiply", "Description": "\nVFMADDSUB132PD: Multiplies the two or four packed double-precision floating-point values from the first source operand to the two or four packed double-precision floating-point values in the third source operand. From the infi-nite precision intermediate result, adds the odd double-precision floating-point elements and subtracts the even double-precision floating-point values in the second source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVFMADDSUB213PD: Multiplies the two or four packed double-precision floating-point values from the second source operand to the two or four packed double-precision floating-point values in the first source operand. From the infinite precision intermediate result, adds the odd double-precision floating-point elements and subtracts the even double-precision floating-point values in the third source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVFMADDSUB231PD: Multiplies the two or four packed double-precision floating-point values from the second source operand to the two or four packed double-precision floating-point values in the third source operand. From the infinite precision intermediate result, adds the odd double-precision floating-point elements and subtracts the even double-precision floating-point values in the first source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.\nVEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFMADDSUB132PS", "Alias": [ "VFMADDSUB213PS", "VFMADDSUB231PS " ], "Brief": "Fused Multiply", "Description": "\nVFMADDSUB132PS: Multiplies the four or eight packed single-precision floating-point values from the first source operand to the four or eight packed single-precision floating-point values in the third source operand. From the infi-nite precision intermediate result, adds the odd single-precision floating-point elements and subtracts the even single-precision floating-point values in the second source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVFMADDSUB213PS: Multiplies the four or eight packed single-precision floating-point values from the second source operand to the four or eight packed single-precision floating-point values in the first source operand. From the infinite precision intermediate result, adds the odd single-precision floating-point elements and subtracts the even single-precision floating-point values in the third source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVFMADDSUB231PS: Multiplies the four or eight packed single-precision floating-point values from the second source operand to the four or eight packed single-precision floating-point values in the third source operand. From the infinite precision intermediate result, adds the odd single-precision floating-point elements and subtracts the even single-precision floating-point values in the first source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.\nVEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFMSUB132PD", "Alias": [ "VFMSUB213PD", "VFMSUB231PD " ], "Brief": "Fused Multiply", "Description": "\nPerforms a set of SIMD multiply-subtract computation on packed double-precision floating-point values using three source operands and writes the multiply-subtract results in the destination operand. The destination operand is also the first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD register or a memory location.\nVFMSUB132PD: Multiplies the two or four packed double-precision floating-point values from the first source operand to the two or four packed double-precision floating-point values in the third source operand. From the infi-nite precision intermediate result, subtracts the two or four packed double-precision floating-point values in the second source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVFMSUB213PD: Multiplies the two or four packed double-precision floating-point values from the second source operand to the two or four packed double-precision floating-point values in the first source operand. From the infi-nite precision intermediate result, subtracts the two or four packed double-precision floating-point values in the third source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVFMSUB231PD: Multiplies the two or four packed double-precision floating-point values from the second source to the two or four packed double-precision floating-point values in the third source operand. From the infinite preci-sion intermediate result, subtracts the two or four packed double-precision floating-point values in the first source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.\nVEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFMSUB132PS", "Alias": [ "VFMSUB213PS", "VFMSUB231PS " ], "Brief": "Fused Multiply", "Description": "\nPerforms a set of SIMD multiply-subtract computation on packed single-precision floating-point values using three source operands and writes the multiply-subtract results in the destination operand. The destination operand is also the first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD register or a memory location.\nVFMSUB132PS: Multiplies the four or eight packed single-precision floating-point values from the first source operand to the four or eight packed single-precision floating-point values in the third source operand. From the infi-nite precision intermediate result, subtracts the four or eight packed single-precision floating-point values in the second source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVFMSUB213PS: Multiplies the four or eight packed single-precision floating-point values from the second source operand to the four or eight packed single-precision floating-point values in the first source operand. From the infi-nite precision intermediate result, subtracts the four or eight packed single-precision floating-point values in the third source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVFMSUB231PS: Multiplies the four or eight packed single-precision floating-point values from the second source to the four or eight packed single-precision floating-point values in the third source operand. From the infinite preci-sion intermediate result, subtracts the four or eight packed single-precision floating-point values in the first source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.\nVEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFMSUB132SD", "Alias": [ "VFMSUB213SD", "VFMSUB231SD " ], "Brief": "Fused Multiply", "Description": "\nPerforms a SIMD multiply-subtract computation on the low packed double-precision floating-point values using three source operands and writes the multiply-add result in the destination operand. The destination operand is also the first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD register or a memory location.\nVFMSUB132SD: Multiplies the low packed double-precision floating-point value from the first source operand to the low packed double-precision floating-point value in the third source operand. From the infinite precision inter-mediate result, subtracts the low packed double-precision floating-point values in the second source operand, performs rounding and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).\nVFMSUB213SD: Multiplies the low packed double-precision floating-point value from the second source operand to the low packed double-precision floating-point value in the first source operand. From the infinite precision inter-mediate result, subtracts the low packed double-precision floating-point value in the third source operand, performs rounding and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).\nVFMSUB231SD: Multiplies the low packed double-precision floating-point value from the second source to the low packed double-precision floating-point value in the third source operand. From the infinite precision intermediate result, subtracts the low packed double-precision floating-point value in the first source operand, performs rounding and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 64-bit memory location and encoded in rm_field. The upper bits ([VLMAX-1:128]) of the YMM destination register are zeroed.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFMSUB132SS", "Alias": [ "VFMSUB213SS", "VFMSUB231SS " ], "Brief": "Fused Multiply", "Description": "\nPerforms a SIMD multiply-subtract computation on the low packed single-precision floating-point values using three source operands and writes the multiply-add result in the destination operand. The destination operand is also the first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD register or a memory location.\nVFMSUB132SS: Multiplies the low packed single-precision floating-point value from the first source operand to the low packed single-precision floating-point value in the third source operand. From the infinite precision interme-diate result, subtracts the low packed single-precision floating-point values in the second source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).\nVFMSUB213SS: Multiplies the low packed single-precision floating-point value from the second source operand to the low packed single-precision floating-point value in the first source operand. From the infinite precision interme-diate result, subtracts the low packed single-precision floating-point value in the third source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).\nVFMSUB231SS: Multiplies the low packed single-precision floating-point value from the second source to the low packed single-precision floating-point value in the third source operand. From the infinite precision intermediate result, subtracts the low packed single-precision floating-point value in the first source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 32-bit memory location and encoded in rm_field. The upper bits ([VLMAX-1:128]) of the YMM destination register are zeroed.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFMSUBADD132PD", "Alias": [ "VFMSUBADD213PD", "VFMSUBADD231PD " ], "Brief": "Fused Multiply", "Description": "\nVFMSUBADD132PD: Multiplies the two or four packed double-precision floating-point values from the first source operand to the two or four packed double-precision floating-point values in the third source operand. From the infi-nite precision intermediate result, subtracts the odd double-precision floating-point elements and adds the even double-precision floating-point values in the second source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVFMSUBADD213PD: Multiplies the two or four packed double-precision floating-point values from the second source operand to the two or four packed double-precision floating-point values in the first source operand. From the infinite precision intermediate result, subtracts the odd double-precision floating-point elements and adds the even double-precision floating-point values in the third source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVFMSUBADD231PD: Multiplies the two or four packed double-precision floating-point values from the second source operand to the two or four packed double-precision floating-point values in the third source operand. From the infinite precision intermediate result, subtracts the odd double-precision floating-point elements and adds the even double-precision floating-point values in the first source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.\nVEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFMSUBADD132PS", "Alias": [ "VFMSUBADD213PS", "VFMSUBADD231PS " ], "Brief": "Fused Multiply", "Description": "\nVFMSUBADD132PS: Multiplies the four or eight packed single-precision floating-point values from the first source operand to the four or eight packed single-precision floating-point values in the third source operand. From the infi-nite precision intermediate result, subtracts the odd single-precision floating-point elements and adds the even single-precision floating-point values in the second source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVFMSUBADD213PS: Multiplies the four or eight packed single-precision floating-point values from the second source operand to the four or eight packed single-precision floating-point values in the first source operand. From the infinite precision intermediate result, subtracts the odd single-precision floating-point elements and adds the even single-precision floating-point values in the third source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVFMSUBADD231PS: Multiplies the four or eight packed single-precision floating-point values from the second source operand to the four or eight packed single-precision floating-point values in the third source operand. From the infinite precision intermediate result, subtracts the odd single-precision floating-point elements and adds the even single-precision floating-point values in the first source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.\nVEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFNMADD132PD", "Alias": [ "VFNMADD213PD", "VFNMADD231PD " ], "Brief": "Fused Negative Multiply", "Description": "\nVFNMADD132PD: Multiplies the two or four packed double-precision floating-point values from the first source operand to the two or four packed double-precision floating-point values in the third source operand, adds the negated infinite precision intermediate result to the two or four packed double-precision floating-point values in the second source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVFNMADD213PD: Multiplies the two or four packed double-precision floating-point values from the second source operand to the two or four packed double-precision floating-point values in the first source operand, adds the negated infinite precision intermediate result to the two or four packed double-precision floating-point values in the third source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVFNMADD231PD: Multiplies the two or four packed double-precision floating-point values from the second source to the two or four packed double-precision floating-point values in the third source operand, adds the negated infi-nite precision intermediate result to the two or four packed double-precision floating-point values in the first source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a\nXMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.\nVEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFNMADD132PS", "Alias": [ "VFNMADD213PS", "VFNMADD231PS " ], "Brief": "Fused Negative Multiply", "Description": "\nVFNMADD132PS: Multiplies the four or eight packed single-precision floating-point values from the first source operand to the four or eight packed single-precision floating-point values in the third source operand, adds the negated infinite precision intermediate result to the four or eight packed single-precision floating-point values in the second source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVFNMADD213PS: Multiplies the four or eight packed single-precision floating-point values from the second source operand to the four or eight packed single-precision floating-point values in the first source operand, adds the negated infinite precision intermediate result to the four or eight packed single-precision floating-point values in the third source operand, performs rounding and stores the resulting the four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVFNMADD231PS: Multiplies the four or eight packed single-precision floating-point values from the second source operand to the four or eight packed single-precision floating-point values in the third source operand, adds the negated infinite precision intermediate result to the four or eight packed single-precision floating-point values in the first source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFNMADD132SD", "Alias": [ "VFNMADD213SD", "VFNMADD231SD " ], "Brief": "Fused Negative Multiply", "Description": "\nVFNMADD132SD: Multiplies the low packed double-precision floating-point value from the first source operand to the low packed double-precision floating-point value in the third source operand, adds the negated infinite preci-sion intermediate result to the low packed double-precision floating-point values in the second source operand, performs rounding and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).\nVFNMADD213SD: Multiplies the low packed double-precision floating-point value from the second source operand to the low packed double-precision floating-point value in the first source operand, adds the negated infinite preci-sion intermediate result to the low packed double-precision floating-point value in the third source operand, performs rounding and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).\nVFNMADD231SD: Multiplies the low packed double-precision floating-point value from the second source to the low packed double-precision floating-point value in the third source operand, adds the negated infinite precision intermediate result to the low packed double-precision floating-point value in the first source operand, performs rounding and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 64-bit memory location and encoded in rm_field. The upper bits ([VLMAX-1:128]) of the YMM destination register are zeroed.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFNMADD132SS", "Alias": [ "VFNMADD213SS", "VFNMADD231SS " ], "Brief": "Fused Negative Multiply", "Description": "\nVFNMADD132SS: Multiplies the low packed single-precision floating-point value from the first source operand to the low packed single-precision floating-point value in the third source operand, adds the negated infinite precision intermediate result to the low packed single-precision floating-point value in the second source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).\nVFNMADD213SS: Multiplies the low packed single-precision floating-point value from the second source operand to the low packed single-precision floating-point value in the first source operand, adds the negated infinite preci-sion intermediate result to the low packed single-precision floating-point value in the third source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).\nVFNMADD231SS: Multiplies the low packed single-precision floating-point value from the second source operand to the low packed single-precision floating-point value in the third source operand, adds the negated infinite preci-sion intermediate result to the low packed single-precision floating-point value in the first source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 32-bit memory location and encoded in rm_field. The upper bits ([VLMAX-1:128]) of the YMM destination register are zeroed.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFNMSUB132PD", "Alias": [ "VFNMSUB213PD", "VFNMSUB231PD " ], "Brief": "Fused Negative Multiply", "Description": "\nVFNMSUB132PD: Multiplies the two or four packed double-precision floating-point values from the first source operand to the two or four packed double-precision floating-point values in the third source operand. From negated infinite precision intermediate results, subtracts the two or four packed double-precision floating-point values in the second source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVFMSUB213PD: Multiplies the two or four packed double-precision floating-point values from the second source operand to the two or four packed double-precision floating-point values in the first source operand. From negated infinite precision intermediate results, subtracts the two or four packed double-precision floating-point values in the third source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVFMSUB231PD: Multiplies the two or four packed double-precision floating-point values from the second source to the two or four packed double-precision floating-point values in the third source operand. From negated infinite precision intermediate results, subtracts the two or four packed double-precision floating-point values in the first source operand, performs rounding and stores the resulting two or four packed double-precision floating-point values to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a\nXMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.\nVEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFNMSUB132PS", "Alias": [ "VFNMSUB213PS", "VFNMSUB231PS " ], "Brief": "Fused Negative Multiply", "Description": "\nVFNMSUB132PS: Multiplies the four or eight packed single-precision floating-point values from the first source operand to the four or eight packed single-precision floating-point values in the third source operand. From negated infinite precision intermediate results, subtracts the four or eight packed single-precision floating-point values in the second source operand, performs rounding and stores the resulting four or eight packed single-preci-sion floating-point values to the destination operand (first source operand).\nVFNMSUB213PS: Multiplies the four or eight packed single-precision floating-point values from the second source operand to the four or eight packed single-precision floating-point values in the first source operand. From negated infinite precision intermediate results, subtracts the four or eight packed single-precision floating-point values in the third source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVFNMSUB231PS: Multiplies the four or eight packed single-precision floating-point values from the second source to the four or eight packed single-precision floating-point values in the third source operand. From negated infinite precision intermediate results, subtracts the four or eight packed single-precision floating-point values in the first source operand, performs rounding and stores the resulting four or eight packed single-precision floating-point values to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a\nXMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.\nVEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFNMSUB132SD", "Alias": [ "VFNMSUB213SD", "VFNMSUB231SD " ], "Brief": "Fused Negative Multiply", "Description": "\nVFNMSUB132SD: Multiplies the low packed double-precision floating-point value from the first source operand to the low packed double-precision floating-point value in the third source operand. From negated infinite precision intermediate result, subtracts the low double-precision floating-point value in the second source operand, performs rounding and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).\nVFNMSUB213SD: Multiplies the low packed double-precision floating-point value from the second source operand to the low packed double-precision floating-point value in the first source operand. From negated infinite precision intermediate result, subtracts the low double-precision floating-point value in the third source operand, performs rounding and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).\nVFNMSUB231SD: Multiplies the low packed double-precision floating-point value from the second source to the low packed double-precision floating-point value in the third source operand. From negated infinite precision interme-diate result, subtracts the low double-precision floating-point value in the first source operand, performs rounding and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 64-bit memory location and encoded in rm_field. The upper bits ([VLMAX-1:128]) of the YMM destination register are zeroed.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VFNMSUB132SS", "Alias": [ "VFNMSUB213SS", "VFNMSUB231SS " ], "Brief": "Fused Negative Multiply", "Description": "\nVFNMSUB132SS: Multiplies the low packed single-precision floating-point value from the first source operand to the low packed single-precision floating-point value in the third source operand. From negated infinite precision intermediate result, the low single-precision floating-point value in the second source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).\nVFNMSUB213SS: Multiplies the low packed single-precision floating-point value from the second source operand to the low packed single-precision floating-point value in the first source operand. From negated infinite precision intermediate result, the low single-precision floating-point value in the third source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).\nVFNMSUB231SS: Multiplies the low packed single-precision floating-point value from the second source to the low packed single-precision floating-point value in the third source operand. From negated infinite precision interme-diate result, the low single-precision floating-point value in the first source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).\nVEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a XMM register or a 32-bit memory location and encoded in rm_field. The upper bits ([VLMAX-1:128]) of the YMM destination register are zeroed.\nCompiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NANs are governed by the definition of the instruction mnemonic defined in the opcode/instruction column. See also Section 14.5.1, “FMA Instruction Operand Order and Arithmetic Behavior” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "VGATHERDPD", "Alias": [ "VGATHERQPD " ], "Brief": "Gather Packed DP FP Values Using Signed Dword/Qword Indices", "Description": "\nThe instruction conditionally loads up to 2 or 4 double-precision floating-point values from memory addresses specified by the memory operand (the second operand) and using qword indices. The memory operand uses the VSIB form of the SIB byte to specify a general purpose register operand as the common base, a vector register for an array of indices relative to the base and a constant scale factor.\nThe mask operand (the third operand) specifies the conditional load operation from each memory address and the corresponding update of each data element of the destination operand (the first operand). Conditionality is speci-fied by the most significant bit of each data element of the mask register. If an element’s mask bit is not set, the corresponding element of the destination register is left unchanged. The width of data element in the destination register and mask register are identical. The entire mask register will be set to zero by this instruction unless the instruction causes an exception.\nUsing dword indices in the lower half of the mask register, the instruction conditionally loads up to 2 or 4 double-precision floating-point values from the VSIB addressing memory operand, and updates the destination register.\nThis instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination register and the mask operand are partially updated; those elements that have been gathered are placed into the destination register and have their mask bits set to zero. If any traps or interrupts are pending from already gath-ered elements, they will be delivered in lieu of the exception; in this case, EFLAG.RF is set to one so an instruction breakpoint is not re-triggered when the instruction is continued.\nIf the data size and index size are different, part of the destination register and part of the mask register do not correspond to any elements being gathered. This instruction sets those parts to zero. It may do this to one or both of those registers even if the instruction triggers an exception, and even if the instruction triggers the exception before gathering any elements.\nVEX.128 version: The instruction will gather two double-precision floating-point values. For dword indices, only the lower two indices in the vector index register are used.\nVEX.256 version: The instruction will gather four double-precision floating-point values. For dword indices, only the lower four indices in the vector index register are used.\nNote that:\n" }, { "Name": "VGATHERDPS", "Alias": [ "VGATHERQPS " ], "Brief": "Gather Packed SP FP values Using Signed Dword/Qword Indices", "Description": "\nThe instruction conditionally loads up to 4 or 8 single-precision floating-point values from memory addresses spec-ified by the memory operand (the second operand) and using dword indices. The memory operand uses the VSIB form of the SIB byte to specify a general purpose register operand as the common base, a vector register for an array of indices relative to the base and a constant scale factor.\nThe mask operand (the third operand) specifies the conditional load operation from each memory address and the corresponding update of each data element of the destination operand (the first operand). Conditionality is speci-fied by the most significant bit of each data element of the mask register. If an element’s mask bit is not set, the corresponding element of the destination register is left unchanged. The width of data element in the destination register and mask register are identical. The entire mask register will be set to zero by this instruction unless the instruction causes an exception.\nUsing qword indices, the instruction conditionally loads up to 2 or 4 single-precision floating-point values from the VSIB addressing memory operand, and updates the lower half of the destination register. The upper 128 or 256 bits of the destination register are zero’ed with qword indices.\nThis instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination register and the mask operand are partially updated; those elements that have been gathered are placed into the destination register and have their mask bits set to zero. If any traps or interrupts are pending from already gath-ered elements, they will be delivered in lieu of the exception; in this case, EFLAG.RF is set to one so an instruction breakpoint is not re-triggered when the instruction is continued.\nIf the data size and index size are different, part of the destination register and part of the mask register do not correspond to any elements being gathered. This instruction sets those parts to zero. It may do this to one or both of those registers even if the instruction triggers an exception, and even if the instruction triggers the exception before gathering any elements.\nVEX.128 version: For dword indices, the instruction will gather four single-precision floating-point values. For qword indices, the instruction will gather two values and zeroes the upper 64 bits of the destination.\nVEX.256 version: For dword indices, the instruction will gather eight single-precision floating-point values. For qword indices, the instruction will gather four values and zeroes the upper 128 bits of the destination.\nNote that:\n" }, { "Name": "VINSERTF128", "Alias": [], "Brief": "Insert Packed Floating", "Description": "\nPerforms an insertion of 128-bits of packed floating-point values from the second source operand (third operand) into an the destination operand (first operand) at an 128-bit offset from imm8[0]. The remaining portions of the destination are written by the corresponding fields of the first source operand (second operand). The second source operand can be either an XMM register or a 128-bit memory location.\nThe high 7 bits of the immediate are ignored.\n" }, { "Name": "VINSERTI128", "Alias": [], "Brief": "Insert Packed Integer Values", "Description": "\nPerforms an insertion of 128-bits of packed integer data from the second source operand (third operand) into an the destination operand (first operand) at a 128-bit offset from imm8[0]. The remaining portions of the destination are written by the corresponding fields of the first source operand (second operand). The second source operand can be either an XMM register or a 128-bit memory location.\nThe high 7 bits of the immediate are ignored.\nVEX.L must be 1; an attempt to execute this instruction with VEX.L=0 will cause #UD.\n" }, { "Name": "VMASKMOV", "Alias": [], "Brief": "Conditional SIMD Packed Loads and Stores", "Description": "\nConditionally moves packed data elements from the second source operand into the corresponding data element of the destination operand, depending on the mask bits associated with each data element. The mask bits are specified in the first source operand.\nThe mask bit for each data element is the most significant bit of that element in the first source operand. If a mask is 1, the corresponding data element is copied from the second source operand to the destination operand. If the mask is 0, the corresponding data element is set to zero in the load form of these instructions, and unmodified in the store form.\nThe second source operand is a memory address for the load form of these instruction. The destination operand is a memory address for the store form of these instructions. The other operands are both XMM registers (for VEX.128 version) or YMM registers (for VEX.256 version).\nFaults occur only due to mask-bit required memory accesses that caused the faults. Faults will not occur due to referencing any memory location if the corresponding mask bit for that memory location is 0. For example, no faults will be detected if the mask bits are all zero.\nUnlike previous MASKMOV instructions (MASKMOVQ and MASKMOVDQU), a nontemporal hint is not applied to these instructions.\nInstruction behavior on alignment check reporting with mask bits of less than all 1s are the same as with mask bits of all 1s.\nVMASKMOV should not be used to access memory mapped I/O and un-cached memory as the access and the ordering of the individual loads or stores it does is implementation specific.\nIn cases where mask bits indicate data should not be loaded or stored paging A and D bits will be set in an imple-mentation dependent way. However, A and D bits are always set for pages where data is actually loaded/stored.\nNote: for load forms, the first source (the mask) is encoded in VEX.vvvv; the second source is encoded in rm_field, and the destination register is encoded in reg_field.\nNote: for store forms, the first source (the mask) is encoded in VEX.vvvv; the second source register is encoded in reg_field, and the destination memory location is encoded in rm_field.\n" }, { "Name": "VPBLENDD", "Alias": [], "Brief": "Blend Packed Dwords", "Description": "\nDword elements from the source operand (second operand) are conditionally written to the destination operand (first operand) depending on bits in the immediate operand (third operand). The immediate bits (bits 7:0) form a mask that determines whether the corresponding word in the destination is copied from the source. If a bit in the mask, corresponding to a word, is “1\", then the word is copied, else the word is unchanged.\nVEX.128 encoded version: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. Bits (VLMAX-1:128) of the corresponding YMM register are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "VPBROADCAST", "Alias": [], "Brief": "Broadcast Integer Data", "Description": "\nLoad integer data from the source operand (second operand) and broadcast to all elements of the destination operand (first operand).\nThe destination operand is a YMM register. The source operand is 8-bit, 16-bit 32-bit, 64-bit memory location or the low 8-bit, 16-bit 32-bit, 64-bit data in an XMM register. VPBROADCASTB/D/W/Q also support XMM register as the source operand.\nVBROADCASTI128: The destination operand is a YMM register. The source operand is 128-bit memory location. Register source encodings for VBROADCASTI128 are reserved and will #UD.\nVPBROADCASTB/W/D/Q is supported in both 128-bit and 256-bit wide versions.\nVBROADCASTI128 is only supported as a 256-bit wide version.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD. Attempts to execute any VPBROADCAST* instruction with VEX.W = 1 will cause #UD. If VBROADCASTI128 is encoded with VEX.L= 0, an attempt to execute the instruction encoded with VEX.L= 0 will cause an #UD exception.\nX0\nm32\nDEST\nX0\nX0\nX0\nX0\nX0\nX0\nX0\nX0\nFigure 4-33. VPBROADCASTD Operation (VEX.256 encoded version)\nX0\nm32\nDEST\n0\n0\n0\n0\nX0\nX0\nX0\nX0\nFigure 4-34. VPBROADCASTD Operation (128-bit version)\nm64\nX0\nDEST\nX0\nX0\nX0\nX0\nFigure 4-35. VPBROADCASTQ Operation\nm128i\nX0\n\n\nX0\nDEST\nX0\nFigure 4-36. VBROADCASTI128 Operation\n" }, { "Name": "VPERM2F128", "Alias": [], "Brief": "Permute Floating", "Description": "\nPermute 128 bit floating-point-containing fields from the first source operand (second operand) and second source operand (third operand) using bits in the 8-bit immediate and store results in the destination operand (first operand). The first source operand is a YMM register, the second source operand is a YMM register or a 256-bit memory location, and the destination operand is a YMM register.\nY1\nY0\nSRC2\nX1\nX0\nSRC1\nX0, X1, Y0, or Y1\nDEST\nX0, X1, Y0, or Y1\nFigure 4-42. VPERM2F128 Operation\nImm8[1:0] select the source for the first destination 128-bit field, imm8[5:4] select the source for the second destination field. If imm8[3] is set, the low 128-bit field is zeroed. If imm8[7] is set, the high 128-bit field is zeroed.\nVEX.L must be 1, otherwise the instruction will #UD.\n" }, { "Name": "VPERM2I128", "Alias": [], "Brief": "Permute Integer Values", "Description": "\nPermute 128 bit integer data from the first source operand (second operand) and second source operand (third operand) using bits in the 8-bit immediate and store results in the destination operand (first operand). The first source operand is a YMM register, the second source operand is a YMM register or a 256-bit memory location, and the destination operand is a YMM register.\nY1\nY0\nSRC2\nX1\nX0\nSRC1\nX0, X1, Y0, or Y1\nDEST\nX0, X1, Y0, or Y1\nFigure 4-37. VPERM2I128 Operation\nImm8[1:0] select the source for the first destination 128-bit field, imm8[5:4] select the source for the second destination field. If imm8[3] is set, the low 128-bit field is zeroed. If imm8[7] is set, the high 128-bit field is zeroed.\nVEX.L must be 1, otherwise the instruction will #UD.\n" }, { "Name": "VPERMD", "Alias": [], "Brief": "Full Doublewords Element Permutation", "Description": "\nUse the index values in each dword element of the first source operand (the second operand) to select a dword element in the second source operand (the third operand), the resultant dword value from the second source operand is copied to the destination operand (the first operand) in the corresponding position of the index element. Note that this instruction permits a doubleword in the source operand to be copied to more than one doubleword location in the destination operand.\nAn attempt to execute VPERMD encoded with VEX.L= 0 will cause an #UD exception.\n" }, { "Name": "VPERMILPD", "Alias": [], "Brief": "Permute Double", "Description": "\nPermute double-precision floating-point values in the first source operand (second operand) using 8-bit control fields in the low bytes of the second source operand (third operand) and store results in the destination operand (first operand). The first source operand is a YMM register, the second source operand is a YMM register or a 256-bit memory location, and the destination operand is a YMM register.\nX3\nX2\nX1\nX0\nSRC1\nX2..X3\nX2..X3\nX0..X1\nDEST\nFigure 4-38. VPERMILPD operation\nThere is one control byte per destination double-precision element. Each control byte is aligned with the low 8 bits of the corresponding double-precision destination element. Each control byte contains a 1-bit select field (see Figure 4-39) that determines which of the source elements are selected. Source elements are restricted to lie in the same source 128-bit region as the destination.\nBit\n66\n1\n194 193\n65\n2\n127\n255\n63\n. . .\nd\nd\nd\nignored\nignored\nignored\nsel\nsel\nsel\ne\ne\ne\nr\nr\nr\no\no\no\nn\nn\nn\ng\ng\ng\ni\ni\ni\nControl Field 4\nControl Field 2\nControl Field1\nFigure 4-39. VPERMILPD Shuffle Control\n(immediate control version)\nPermute double-precision floating-point values in the first source operand (second operand) using two, 1-bit control fields in the low 2 bits of the 8-bit immediate and store results in the destination operand (first operand). The source operand is a YMM register or 256-bit memory location and the destination operand is a YMM register.\nNote: For the VEX.128.66.0F3A 05 instruction version, VEX.vvvv is reserved and must be 1111b otherwise instruc-tion will #UD.\nNote: For the VEX.256.66.0F3A 05 instruction version, VEX.vvvv is reserved and must be 1111b otherwise instruc-tion will #UD.\n" }, { "Name": "VPERMILPS", "Alias": [], "Brief": "Permute Single", "Description": "\n(variable control version)\nPermute single-precision floating-point values in the first source operand (second operand) using 8-bit control fields in the low bytes of corresponding elements the shuffle control (third operand) and store results in the desti-nation operand (first operand). The first source operand is a YMM register, the second source operand is a YMM register or a 256-bit memory location, and the destination operand is a YMM register.\n\n\n\n\n\n\n\n\n\nX7\nX6\nX5\nX4\nX3\nX2\nX1\nX0\nSRC1\nDEST\nX7 .. X4\nX7 .. X4\nX7 .. X4\nX7 .. X4\nX3 ..X0\nX3 ..X0\nX3 .. X0\nX3 .. X0\nFigure 4-40. VPERMILPS Operation\nThere is one control byte per destination single-precision element. Each control byte is aligned with the low 8 bits of the corresponding single-precision destination element. Each control byte contains a 2-bit select field (see Figure 4-41) that determines which of the source elements are selected. Source elements are restricted to lie in the same source 128-bit region as the destination.\nBit\n31\n226\n225 224\n63\n34\n33 32\n1\n0\n255\n. . .\n\n\n\nignored\nsel\nignored\nignored\nsel\nsel\nControl Field 7\nControl Field 2\nControl Field 1\nFigure 4-41. VPERMILPS Shuffle Control\n(immediate control version)\nPermute single-precision floating-point values in the first source operand (second operand) using four 2-bit control fields in the 8-bit immediate and store results in the destination operand (first operand). The source operand is a YMM register or 256-bit memory location and the destination operand is a YMM register. This is similar to a wider version of PSHUFD, just operating on single-precision floating-point values.\nNote: For the VEX.128.66.0F3A 04 instruction version, VEX.vvvv is reserved and must be 1111b otherwise instruc-tion will #UD.\nNote: For the VEX.256.66.0F3A 04 instruction version, VEX.vvvv is reserved and must be 1111b otherwise instruc-tion will #UD.\n" }, { "Name": "VPERMPD", "Alias": [], "Brief": "Permute Double", "Description": "\nUse two-bit index values in the immediate byte to select a double-precision floating-point element in the source operand; the resultant data from the source operand is copied to the corresponding element of the destination operand in the order of the index field. Note that this instruction permits a qword in the source operand to be copied to multiple location in the destination operand.\nAn attempt to execute VPERMPD encoded with VEX.L= 0 will cause an #UD exception.\n" }, { "Name": "VPERMPS", "Alias": [], "Brief": "Permute Single", "Description": "\nUse the index values in each dword element of the first source operand (the second operand) to select a single-precision floating-point element in the second source operand (the third operand), the resultant data from the second source operand is copied to the destination operand (the first operand) in the corresponding position of the index element. Note that this instruction permits a doubleword in the source operand to be copied to more than one doubleword location in the destination operand.\nAn attempt to execute VPERMPS encoded with VEX.L= 0 will cause an #UD exception.\n" }, { "Name": "VPERMQ", "Alias": [], "Brief": "Qwords Element Permutation", "Description": "\nUse two-bit index values in the immediate byte to select a qword element in the source operand, the resultant qword value from the source operand is copied to the corresponding element of the destination operand in the order of the index field. Note that this instruction permits a qword in the source operand to be copied to multiple locations in the destination operand.\nAn attempt to execute VPERMQ encoded with VEX.L= 0 will cause an #UD exception.\n" }, { "Name": "VPGATHERDD", "Alias": [ "VPGATHERQD " ], "Brief": "Gather Packed Dword Values Using Signed Dword/Qword Indices", "Description": "\nThe instruction conditionally loads up to 4 or 8 dword values from memory addresses specified by the memory operand (the second operand) and using dword indices. The memory operand uses the VSIB form of the SIB byte to specify a general purpose register operand as the common base, a vector register for an array of indices relative to the base and a constant scale factor.\nThe mask operand (the third operand) specifies the conditional load operation from each memory address and the corresponding update of each data element of the destination operand (the first operand). Conditionality is speci-fied by the most significant bit of each data element of the mask register. If an element’s mask bit is not set, the corresponding element of the destination register is left unchanged. The width of data element in the destination register and mask register are identical. The entire mask register will be set to zero by this instruction unless the instruction causes an exception.\nUsing qword indices, the instruction conditionally loads up to 2 or 4 dword values from the VSIB addressing memory operand, and updates the lower half of the destination register. The upper 128 or 256 bits of the destina-tion register are zero’ed with qword indices.\nThis instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination register and the mask operand are partially updated; those elements that have been gathered are placed into the destination register and have their mask bits set to zero. If any traps or interrupts are pending from already gath-ered elements, they will be delivered in lieu of the exception; in this case, EFLAG.RF is set to one so an instruction breakpoint is not re-triggered when the instruction is continued.\nIf the data size and index size are different, part of the destination register and part of the mask register do not correspond to any elements being gathered. This instruction sets those parts to zero. It may do this to one or both of those registers even if the instruction triggers an exception, and even if the instruction triggers the exception before gathering any elements.\nVEX.128 version: For dword indices, the instruction will gather four dword values. For qword indices, the instruc-tion will gather two values and zeroes the upper 64 bits of the destination.\nVEX.256 version: For dword indices, the instruction will gather eight dword values. For qword indices, the instruc-tion will gather four values and zeroes the upper 128 bits of the destination.\nNote that:\n" }, { "Name": "VPGATHERDQ", "Alias": [ "VPGATHERQQ " ], "Brief": "Gather Packed Qword Values Using Signed Dword/Qword Indices", "Description": "\nThe instruction conditionally loads up to 2 or 4 qword values from memory addresses specified by the memory operand (the second operand) and using qword indices. The memory operand uses the VSIB form of the SIB byte to specify a general purpose register operand as the common base, a vector register for an array of indices relative to the base and a constant scale factor.\nThe mask operand (the third operand) specifies the conditional load operation from each memory address and the corresponding update of each data element of the destination operand (the first operand). Conditionality is speci-fied by the most significant bit of each data element of the mask register. If an element’s mask bit is not set, the corresponding element of the destination register is left unchanged. The width of data element in the destination register and mask register are identical. The entire mask register will be set to zero by this instruction unless the instruction causes an exception.\nUsing dword indices in the lower half of the mask register, the instruction conditionally loads up to 2 or 4 qword values from the VSIB addressing memory operand, and updates the destination register.\nThis instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination register and the mask operand are partially updated; those elements that have been gathered are placed into the destination register and have their mask bits set to zero. If any traps or interrupts are pending from already gath-ered elements, they will be delivered in lieu of the exception; in this case, EFLAG.RF is set to one so an instruction breakpoint is not re-triggered when the instruction is continued.\nIf the data size and index size are different, part of the destination register and part of the mask register do not correspond to any elements being gathered. This instruction sets those parts to zero. It may do this to one or both of those registers even if the instruction triggers an exception, and even if the instruction triggers the exception before gathering any elements.\nVEX.128 version: The instruction will gather two qword values. For dword indices, only the lower two indices in the vector index register are used.\nVEX.256 version: The instruction will gather four qword values. For dword indices, only the lower four indices in the vector index register are used.\nNote that:\n" }, { "Name": "VPMASKMOV", "Alias": [], "Brief": "Conditional SIMD Integer Packed Loads and Stores", "Description": "\nConditionally moves packed data elements from the second source operand into the corresponding data element of the destination operand, depending on the mask bits associated with each data element. The mask bits are speci-fied in the first source operand.\nThe mask bit for each data element is the most significant bit of that element in the first source operand. If a mask is 1, the corresponding data element is copied from the second source operand to the destination operand. If the mask is 0, the corresponding data element is set to zero in the load form of these instructions, and unmodified in the store form.\nThe second source operand is a memory address for the load form of these instructions. The destination operand is a memory address for the store form of these instructions. The other operands are either XMM registers (for VEX.128 version) or YMM registers (for VEX.256 version).\nFaults occur only due to mask-bit required memory accesses that caused the faults. Faults will not occur due to referencing any memory location if the corresponding mask bit for that memory location is 0. For example, no faults will be detected if the mask bits are all zero.\nUnlike previous MASKMOV instructions (MASKMOVQ and MASKMOVDQU), a nontemporal hint is not applied to these instructions.\nInstruction behavior on alignment check reporting with mask bits of less than all 1s are the same as with mask bits of all 1s.\nVMASKMOV should not be used to access memory mapped I/O as the ordering of the individual loads or stores it does is implementation specific.\nIn cases where mask bits indicate data should not be loaded or stored paging A and D bits will be set in an imple-mentation dependent way. However, A and D bits are always set for pages where data is actually loaded/stored.\nNote: for load forms, the first source (the mask) is encoded in VEX.vvvv; the second source is encoded in rm_field, and the destination register is encoded in reg_field.\nNote: for store forms, the first source (the mask) is encoded in VEX.vvvv; the second source register is encoded in reg_field, and the destination memory location is encoded in rm_field.\n" }, { "Name": "VPSLLVD", "Alias": [ "VPSLLVQ " ], "Brief": "Variable Bit Shift Left Logical", "Description": "\nShifts the bits in the individual data elements (doublewords, or quadword) in the first source operand to the left by the count value of respective data elements in the second source operand. As the bits in the data elements are shifted left, the empty low-order bits are cleared (set to 0).\nThe count values are specified individually in each data element of the second source operand. If the unsigned integer value specified in the respective data element of the second source operand is greater than 31 (for double-words), or 63 (for a quadword), then the destination data element are written with 0.\nVEX.128 encoded version: The destination and first source operands are XMM registers. The count operand can be either an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM register are zeroed.\nVEX.256 encoded version: The destination and first source operands are YMM registers. The count operand can be either an YMM register or a 256-bit memory location.\n" }, { "Name": "VPSRAVD", "Alias": [], "Brief": "Variable Bit Shift Right Arithmetic", "Description": "\nShifts the bits in the individual doubleword data elements in the first source operand to the right by the count value of respective data elements in the second source operand. As the bits in each data element are shifted right, the empty high-order bits are filled with the sign bit of the source element.\nThe count values are specified individually in each data element of the second source operand. If the unsigned integer value specified in the respective data element of the second source operand is greater than 31, then the destination data element are filled with the corresponding sign bit of the source element.\nVEX.128 encoded version: The destination and first source operands are XMM registers. The count operand can be either an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM register are zeroed.\nVEX.256 encoded version: The destination and first source operands are YMM registers. The count operand can be either an YMM register or a 256-bit memory location.\n" }, { "Name": "VPSRLVD", "Alias": [ "VPSRLVQ " ], "Brief": "Variable Bit Shift Right Logical", "Description": "\nShifts the bits in the individual data elements (doublewords, or quadword) in the first source operand to the right by the count value of respective data elements in the second source operand. As the bits in the data elements are shifted right, the empty high-order bits are cleared (set to 0).\nThe count values are specified individually in each data element of the second source operand. If the unsigned integer value specified in the respective data element of the second source operand is greater than 31 (for double-words), or 63 (for a quadword), then the destination data element are written with 0.\nVEX.128 encoded version: The destination and first source operands are XMM registers. The count operand can be either an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM register are zeroed.\nVEX.256 encoded version: The destination and first source operands are YMM registers. The count operand can be either an YMM register or a 256-bit memory location.\n" }, { "Name": "VTESTPD", "Alias": [ "VTESTPS" ], "Brief": "Packed Bit Test", "Description": "\nVTESTPS performs a bitwise comparison of all the sign bits of the packed single-precision elements in the first source operation and corresponding sign bits in the second source operand. If the AND of the source sign bits with the dest sign bits produces all zeros, the ZF is set else the ZF is clear. If the AND of the source sign bits with the inverted dest sign bits produces all zeros the CF is set else the CF is clear. An attempt to execute VTESTPS with VEX.W=1 will cause #UD.\nVTESTPD performs a bitwise comparison of all the sign bits of the double-precision elements in the first source operation and corresponding sign bits in the second source operand. If the AND of the source sign bits with the dest sign bits produces all zeros, the ZF is set else the ZF is clear. If the AND the source sign bits with the inverted dest sign bits produces all zeros the CF is set else the CF is clear. An attempt to execute VTESTPS with VEX.W=1 will cause #UD.\nThe first source register is specified by the ModR/M reg field.\n128-bit version: The first source register is an XMM register. The second source register can be an XMM register or a 128-bit memory location. The destination register is not modified.\nVEX.256 encoded version: The first source register is a YMM register. The second source register can be a YMM register or a 256-bit memory location. The destination register is not modified.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n" }, { "Name": "VZEROALL", "Alias": [], "Brief": "Zero All YMM Registers", "Description": "\nThe instruction zeros contents of all XMM or YMM registers.\nNote: VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD. In Compatibility and legacy 32-bit mode only the lower 8 registers are modified.\n" }, { "Name": "VZEROUPPER", "Alias": [], "Brief": "Zero Upper Bits of YMM Registers", "Description": "\nThe instruction zeros the bits in position 128 and higher of all YMM registers. The lower 128-bits of the registers (the corresponding XMM registers) are unmodified.\nThis instruction is recommended when transitioning between AVX and legacy SSE code - it will eliminate perfor-mance penalties caused by false dependencies.\nNote: VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD. In Compatibility and legacy 32-bit mode only the lower 8 registers are modified.\n" }, { "Name": "WBINVD", "Alias": [], "Brief": "Write Back and Invalidate Cache", "Description": "\nWrites back all modified cache lines in the processor’s internal cache to main memory and invalidates (flushes) the internal caches. The instruction then issues a special-function bus cycle that directs external caches to also write back modified data and another bus cycle to indicate that the external caches should be invalidated.\nAfter executing this instruction, the processor does not wait for the external caches to complete their write-back and flushing operations before proceeding with instruction execution. It is the responsibility of hardware to respond to the cache write-back and flush signals. The amount of time or cycles for WBINVD to complete will vary due to size and other factors of different cache hierarchies. As a consequence, the use of the WBINVD instruction can have an impact on logical processor interrupt/event response time. Additional information of WBINVD behavior in a cache hierarchy with hierarchical sharing topology can be found in Chapter 2 of the Intel® 64 and IA-32 Architec-tures Software Developer’s Manual, Volume 3A.\nThe WBINVD instruction is a privileged instruction. When the processor is running in protected mode, the CPL of a program or procedure must be 0 to execute this instruction. This instruction is also a serializing instruction (see “Serializing Instructions” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A).\nIn situations where cache coherency with main memory is not a concern, software can use the INVD instruction.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n" }, { "Name": "WRFSBASE", "Alias": [ "WRGSBASE" ], "Brief": "Write FS/GS Segment Base", "Description": "\nLoads the FS or GS segment base address with the general-purpose register indicated by the modR/M:r/m field.\nThe source operand may be either a 32-bit or a 64-bit general-purpose register. The REX.W prefix indicates the operand size is 64 bits. If no REX.W prefix is used, the operand size is 32 bits; the upper 32 bits of the source register are ignored and upper 32 bits of the base address (for FS or GS) are cleared.\nThis instruction is supported only in 64-bit mode.\n" }, { "Name": "WRMSR", "Alias": [], "Brief": "Write to Model Specific Register", "Description": "\nWrites the contents of registers EDX:EAX into the 64-bit model specific register (MSR) specified in the ECX register. (On processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The contents of the EDX register are copied to high-order 32 bits of the selected MSR and the contents of the EAX register are copied to low-order 32 bits of the MSR. (On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are ignored.) Undefined or reserved bits in an MSR should be set to values previously read.\nThis instruction must be executed at privilege level 0 or in real-address mode; otherwise, a general protection exception #GP(0) is generated. Specifying a reserved or unimplemented MSR address in ECX will also cause a general protection exception. The processor will also generate a general protection exception if software attempts to write to bits in a reserved MSR.\nWhen the WRMSR instruction is used to write to an MTRR, the TLBs are invalidated. This includes global entries (see “Translation Lookaside Buffers (TLBs)” in Chapter 3 of the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 3A).\nMSRs control functions for testability, execution tracing, performance-monitoring and machine check errors. Chapter 35, “Model-Specific Registers (MSRs)”, in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C, lists all MSRs that can be written with this instruction and their addresses. Note that each processor family has its own set of MSRs.\nThe WRMSR instruction is a serializing instruction (see “Serializing Instructions” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A). Note that WRMSR to the IA32_TSC_DEADLINE MSR (MSR index 6E0H) and the X2APIC MSRs (MSR indices 802H to 83FH) are not serializing.\nThe CPUID instruction should be used to determine whether MSRs are supported (CPUID.01H:EDX[5] = 1) before using this instruction.\n" }, { "Name": "XABORT", "Alias": [], "Brief": "Transactional Abort", "Description": "\nXABORT forces an RTM abort. Following an RTM abort, the logical processor resumes execution at the fallback address computed through the outermost XBEGIN instruction. The EAX register is updated to reflect an XABORT instruction caused the abort, and the imm8 argument will be provided in bits 31:24 of EAX.\n" }, { "Name": "XACQUIRE", "Alias": [ "XRELEASE " ], "Brief": "Hardware Lock Elision Prefix Hints", "Description": "\nThe XACQUIRE prefix is a hint to start lock elision on the memory address specified by the instruction and the XRELEASE prefix is a hint to end lock elision on the memory address specified by the instruction.\nThe XACQUIRE prefix hint can only be used with the following instructions (these instructions are also referred to as XACQUIRE-enabled when used with the XACQUIRE prefix):\nThe XRELEASE prefix hint can only be used with the following instructions (also referred to as XRELEASE-enabled when used with the XRELEASE prefix):\nThe lock variables must satisfy the guidelines described in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, Section 15.3.3, for elision to be successful, otherwise an HLE abort may be signaled.\nIf an encoded byte sequence that meets XACQUIRE/XRELEASE requirements includes both prefixes, then the HLE semantic is determined by the prefix byte that is placed closest to the instruction opcode. For example, an F3F2C6 will not be treated as a XRELEASE-enabled instruction since the F2H (XACQUIRE) is closest to the instruction opcode C6. Similarly, an F2F3F0 prefixed instruction will be treated as a XRELEASE-enabled instruction since F3H (XRELEASE) is closest to the instruction opcode.\nIntel 64 and IA-32 Compatibility\nThe effect of the XACQUIRE/XRELEASE prefix hint is the same in non-64-bit modes and in 64-bit mode.\nFor instructions that do not support the XACQUIRE hint, the presence of the F2H prefix behaves the same way as prior hardware, according to\nFor instructions that do not support the XRELEASE hint, the presence of the F3H prefix behaves the same way as in prior hardware, according to\n" }, { "Name": "XADD", "Alias": [], "Brief": "Exchange and Add", "Description": "\nExchanges the first operand (destination operand) with the second operand (source operand), then loads the sum of the two values into the destination operand. The destination operand can be a register or a memory location; the source operand is a register.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\n" }, { "Name": "XBEGIN", "Alias": [], "Brief": "Transactional Begin", "Description": "\nThe XBEGIN instruction specifies the start of an RTM code region. If the logical processor was not already in trans-actional execution, then the XBEGIN instruction causes the logical processor to transition into transactional execu-tion. The XBEGIN instruction that transitions the logical processor into transactional execution is referred to as the outermost XBEGIN instruction. The instruction also specifies a relative offset to compute the address of the fallback code path following a transactional abort.\nOn an RTM abort, the logical processor discards all architectural register and memory updates performed during the RTM execution and restores architectural state to that corresponding to the outermost XBEGIN instruction. The fallback address following an abort is computed from the outermost XBEGIN instruction.\n" }, { "Name": "XCHG", "Alias": [], "Brief": "Exchange Register/Memory with Register", "Description": "\nExchanges the contents of the destination (first) and source (second) operands. The operands can be two general-purpose registers or a register and a memory location. If a memory operand is referenced, the processor’s locking protocol is automatically implemented for the duration of the exchange operation, regardless of the presence or absence of the LOCK prefix or of the value of the IOPL. (See the LOCK prefix description in this chapter for more information on the locking protocol.)\nThis instruction is useful for implementing semaphores or similar data structures for process synchronization. (See “Bus Locking” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information on bus locking.)\nThe XCHG instruction can also be used instead of the BSWAP instruction for 16-bit operands.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "XEND", "Alias": [], "Brief": "Transactional End", "Description": "\nThe instruction marks the end of an RTM code region. If this corresponds to the outermost scope (that is, including this XEND instruction, the number of XBEGIN instructions is the same as number of XEND instructions), the logical processor will attempt to commit the logical processor state atomically. If the commit fails, the logical processor will rollback all architectural register and memory updates performed during the RTM execution. The logical processor will resume execution at the fallback address computed from the outermost XBEGIN instruction. The EAX register is updated to reflect RTM abort information.\nXEND executed outside a transactional region will cause a #GP (General Protection Fault).\n" }, { "Name": "XGETBV", "Alias": [], "Brief": "Get Value of Extended Control Register", "Description": "\nReads the contents of the extended control register (XCR) specified in the ECX register into registers EDX:EAX. (On processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The EDX register is loaded with the high-order 32 bits of the XCR and the EAX register is loaded with the low-order 32 bits. (On proces-sors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are cleared.) If fewer than 64 bits are implemented in the XCR being read, the values returned to EDX:EAX in unimplemented bit loca-tions are undefined.\nXCR0 is supported on any processor that supports the XGETBV instruction. If CPUID.(EAX=0DH,ECX=1):EAX.XG1[bit 2] = 1, executing XGETBV with ECX = 1 returns in EDX:EAX the logical-AND of XCR0 and the current value of the XINUSE state-component bitmap. This allows software to discover the state of the init optimization used by XSAVEOPT and XSAVES. See Chapter 13, “Managing State Using the XSAVE Feature Set‚” in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\nUse of any other value for ECX results in a general-protection (#GP) exception.\n" }, { "Name": "XLAT", "Alias": [ "XLATB" ], "Brief": "Table Look", "Description": "\nLocates a byte entry in a table in memory, using the contents of the AL register as a table index, then copies the contents of the table entry back into the AL register. The index in the AL register is treated as an unsigned integer. The XLAT and XLATB instructions get the base address of the table in memory from either the DS:EBX or the DS:BX registers (depending on the address-size attribute of the instruction, 32 or 16, respectively). (The DS segment may be overridden with a segment override prefix.)\nAt the assembly-code level, two forms of this instruction are allowed: the “explicit-operand” form and the “no-operand” form. The explicit-operand form (specified with the XLAT mnemonic) allows the base address of the table to be specified explicitly with a symbol. This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the symbol does not have to specify the correct base address. The base address is always specified by the DS:(E)BX registers, which must be loaded correctly before the XLAT instruction is executed.\nThe no-operands form (XLATB) provides a “short form” of the XLAT instructions. Here also the processor assumes that the DS:(E)BX registers contain the base address of the table.\nIn 64-bit mode, operation is similar to that in legacy or compatibility mode. AL is used to specify the table index (the operand size is fixed at 8 bits). RBX, however, is used to specify the table’s base address. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "XOR", "Alias": [], "Brief": "Logical Exclusive OR", "Description": "\nPerforms a bitwise exclusive OR (XOR) operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory oper-ands cannot be used in one instruction.) Each bit of the result is 1 if the corresponding bits of the operands are different; each bit is 0 if the corresponding bits are the same.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n" }, { "Name": "XORPD", "Alias": [], "Brief": "Bitwise Logical XOR for Double", "Description": "\nPerforms a bitwise logical exclusive-OR of the two packed double-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the result in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "XORPS", "Alias": [], "Brief": "Bitwise Logical XOR for Single", "Description": "\nPerforms a bitwise logical exclusive-OR of the four packed single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the result in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n" }, { "Name": "XRSTOR", "Alias": [], "Brief": "Restore Processor Extended States", "Description": "\nPerforms a full or partial restore of processor state components from the XSAVE area located at the memory address specified by the source operand. The implicit EDX:EAX register pair specifies a 64-bit instruction mask. The specific state components restored correspond to the bits set in the requested-feature bitmap (RFBM), which is the logical-AND of EDX:EAX and XCR0.\nThe format of the XSAVE area is detailed in Section 13.4, “XSAVE Area,” of Intel® 64 and IA-32 Architectures Soft-ware Developer’s Manual, Volume 1.\nSection 13.7, “Operation of XRSTOR,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 provides a detailed description of the operation of the XRSTOR instruction. The following items provide a high-level outline:\nfor which RFBM[i] = 0.\ni\nUse of a source operand not aligned to 64-byte boundary (for 64-bit and 32-bit modes) results in a general-protec-tion (#GP) exception. In 64-bit mode, the upper 32 bits of RDX and RAX are ignored.\n" }, { "Name": "XRSTORS", "Alias": [], "Brief": "Restore Processor Extended States Supervisor", "Description": "\nPerforms a full or partial restore of processor state components from the XSAVE area located at the memory address specified by the source operand. The implicit EDX:EAX register pair specifies a 64-bit instruction mask. The specific state components restored correspond to the bits set in the requested-feature bitmap (RFBM), which is the logical-AND of EDX:EAX and the logical-OR of XCR0 with the IA32_XSS MSR. XRSTORS may be executed only if CPL = 0.\nThe format of the XSAVE area is detailed in Section 13.4, “XSAVE Area,” of Intel® 64 and IA-32 Architectures Soft-ware Developer’s Manual, Volume 1.\nSection 13.11, “Operation of XRSTORS,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 provides a detailed description of the operation of the XRSTOR instruction. The following items provide a high-level outline:\nfor which RFBM[i] = 0.\ni\nUse of a source operand not aligned to 64-byte boundary (for 64-bit and 32-bit modes) results in a general-protec-tion (#GP) exception. In 64-bit mode, the upper 32 bits of RDX and RAX are ignored.\n" }, { "Name": "XSAVE", "Alias": [], "Brief": "Save Processor Extended States", "Description": "\nPerforms a full or partial save of processor state components to the XSAVE area located at the memory address specified by the destination operand. The implicit EDX:EAX register pair specifies a 64-bit instruction mask. The specific state components saved correspond to the bits set in the requested-feature bitmap (RFBM), which is the logical-AND of EDX:EAX and XCR0.\nThe format of the XSAVE area is detailed in Section 13.4, “XSAVE Area,” of Intel® 64 and IA-32 Architectures Soft-ware Developer’s Manual, Volume 1.\nSection 13.6, “Operation of XSAVE,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 provides a detailed description of the operation of the XSAVE instruction. The following items provide a high-level outline:\nUse of a destination operand not aligned to 64-byte boundary (in either 64-bit or 32-bit modes) results in a general-protection (#GP) exception. In 64-bit mode, the upper 32 bits of RDX and RAX are ignored.\n" }, { "Name": "XSAVEC", "Alias": [], "Brief": "Save Processor Extended States with Compaction", "Description": "\nPerforms a full or partial save of processor state components to the XSAVE area located at the memory address specified by the destination operand. The implicit EDX:EAX register pair specifies a 64-bit instruction mask. The specific state components saved correspond to the bits set in the requested-feature bitmap (RFBM), which is the logical-AND of EDX:EAX and XCR0.\nThe format of the XSAVE area is detailed in Section 13.4, “XSAVE Area,” of Intel® 64 and IA-32 Architectures Soft-ware Developer’s Manual, Volume 1.\nSection 13.9, “Operation of XSAVEC,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 provides a detailed description of the operation of the XSAVEC instruction. The following items provide a high-level outline:\nUse of a destination operand not aligned to 64-byte boundary (in either 64-bit or 32-bit modes) results in a general-protection (#GP) exception. In 64-bit mode, the upper 32 bits of RDX and RAX are ignored.\n" }, { "Name": "XSAVEOPT", "Alias": [], "Brief": "Save Processor Extended States Optimized", "Description": "\nPerforms a full or partial save of processor state components to the XSAVE area located at the memory address specified by the destination operand. The implicit EDX:EAX register pair specifies a 64-bit instruction mask. The specific state components saved correspond to the bits set in the requested-feature bitmap (RFBM), which is the logical-AND of EDX:EAX and XCR0.\nThe format of the XSAVE area is detailed in Section 13.4, “XSAVE Area,” of Intel® 64 and IA-32 Architectures Soft-ware Developer’s Manual, Volume 1.\nSection 13.8, “Operation of XSAVEOPT,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 provides a detailed description of the operation of the XSAVEOPT instruction. The following items provide a high-level outline:\nUse of a destination operand not aligned to 64-byte boundary (in either 64-bit or 32-bit modes) will result in a general-protection (#GP) exception. In 64-bit mode, the upper 32 bits of RDX and RAX are ignored.\n" }, { "Name": "XSAVES", "Alias": [], "Brief": "Save Processor Extended States Supervisor", "Description": "\nPerforms a full or partial save of processor state components to the XSAVE area located at the memory address specified by the destination operand. The implicit EDX:EAX register pair specifies a 64-bit instruction mask. The specific state components saved correspond to the bits set in the requested-feature bitmap (RFBM), the logical-AND of EDX:EAX and the logical-OR of XCR0 with the IA32_XSS MSR. XSAVES may be executed only if CPL = 0.\nThe format of the XSAVE area is detailed in Section 13.4, “XSAVE Area,” of Intel® 64 and IA-32 Architectures Soft-ware Developer’s Manual, Volume 1.\nSection 13.10, “Operation of XSAVES,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 provides a detailed description of the operation of the XSAVES instruction. The following items provide a high-level outline:\nUse of a destination operand not aligned to 64-byte boundary (in either 64-bit or 32-bit modes) results in a general-protection (#GP) exception. In 64-bit mode, the upper 32 bits of RDX and RAX are ignored.\n1.\nThere is an exception for state component 1 (SSE). MXCSR is part of SSE state, but XINUSE[1] may be 0 even if MXCSR does not have its initial value of 1F80H. In this case, the init optimization does not apply and XSAVEC will save SSE state as long as RFBM[1] = 1 and the modified optimization is not being applied.\n2.\nThere is an exception for state component 1 (SSE). MXCSR is part of SSE state, but XINUSE[1] may be 0 even if MXCSR does not have its initial value of 1F80H. In this case, XSAVES sets XSTATE_BV[1] to 1 as long as RFBM[1] = 1.\n" }, { "Name": "XSETBV", "Alias": [], "Brief": "Set Extended Control Register", "Description": "\nWrites the contents of registers EDX:EAX into the 64-bit extended control register (XCR) specified in the ECX register. (On processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The contents of the EDX register are copied to high-order 32 bits of the selected XCR and the contents of the EAX register are copied to low-order 32 bits of the XCR. (On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are ignored.) Undefined or reserved bits in an XCR should be set to values previously read.\nThis instruction must be executed at privilege level 0 or in real-address mode; otherwise, a general protection exception #GP(0) is generated. Specifying a reserved or unimplemented XCR in ECX will also cause a general protection exception. The processor will also generate a general protection exception if software attempts to write to reserved bits in an XCR.\nCurrently, only XCR0 is supported. Thus, all other values of ECX are reserved and will cause a #GP(0). Note that bit 0 of XCR0 (corresponding to x87 state) must be set to 1; the instruction will cause a #GP(0) if an attempt is made to clear this bit. In addition, the instruction causes a #GP(0) if an attempt is made to set XCR0[2] (AVX state) while clearing XCR0[1] (SSE state); it is necessary to set both bits to use AVX instructions; Section 13.3, “Enabling the XSAVE Feature Set and XSAVE-Supported Features,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\n" }, { "Name": "XTEST", "Alias": [], "Brief": "Test If In Transactional Execution", "Description": "\nThe XTEST instruction queries the transactional execution status. If the instruction executes inside a transaction-ally executing RTM region or a transactionally executing HLE region, then the ZF flag is cleared, else it is set.\n" } ]