// slow vbroadcastss xmm16, dword ptr [rcx] vbroadcastss xmm17, dword ptr [rcx + 4] vbroadcastss xmm18, dword ptr [rcx + 8] vbroadcastss xmm19, dword ptr [rcx + 12] // fast vmovups xmm31, [rcx] vbroadcastss zmm16, xmm31 valignd xmm17, xmm31, xmm31, 1 vbroadcastss zmm17, xmm17 valignd xmm18, xmm31, xmm31, 2 vbroadcastss zmm18, xmm18 valignd xmm19, xmm31, xmm31, 3 vbroadcastss zmm19, xmm19 // commmon vfmadd231ps zmm0, zmm16, [rax + 0] vfmadd231ps zmm1, zmm17, [rax + 64] vfmadd231ps zmm2, zmm18, [rax + 128] vfmadd231ps zmm3, zmm19, [rax + 192] add rcx, 16 add rax, 256