// Tile size: 1x8
// Accumulators: 0-7
// Col regs: 8-14
// Row regs: 15

// Unrolled step 1.
// Load 8 packed floats from [rax] into the row register (32-byte aligned load).
vmovaps ymm15, [rax]
// For each column j: broadcast the scalar at [rcx + j * 4] to all 8 lanes,
// then accumulate ymm<j> += ymm15 * broadcast.
vbroadcastss ymm8, dword ptr [rcx + 0 * 4]
vfmadd231ps ymm0, ymm15, ymm8
vbroadcastss ymm9, dword ptr [rcx + 1 * 4]
vfmadd231ps ymm1, ymm15, ymm9
vbroadcastss ymm10, dword ptr [rcx + 2 * 4]
vfmadd231ps ymm2, ymm15, ymm10
vbroadcastss ymm11, dword ptr [rcx + 3 * 4]
vfmadd231ps ymm3, ymm15, ymm11
vbroadcastss ymm12, dword ptr [rcx + 4 * 4]
vfmadd231ps ymm4, ymm15, ymm12
vbroadcastss ymm13, dword ptr [rcx + 5 * 4]
vfmadd231ps ymm5, ymm15, ymm13
// Columns 6 and 7 reuse ymm10 and ymm11 as broadcast registers.
vbroadcastss ymm10, dword ptr [rcx + 6 * 4]
vfmadd231ps ymm6, ymm15, ymm10
vbroadcastss ymm11, dword ptr [rcx + 7 * 4]
vfmadd231ps ymm7, ymm15, ymm11

// Unrolled step 2: same instruction sequence and same addresses as step 1.
vmovaps ymm15, [rax]
vbroadcastss ymm8, dword ptr [rcx + 0 * 4]
vfmadd231ps ymm0, ymm15, ymm8
vbroadcastss ymm9, dword ptr [rcx + 1 * 4]
vfmadd231ps ymm1, ymm15, ymm9
vbroadcastss ymm10, dword ptr [rcx + 2 * 4]
vfmadd231ps ymm2, ymm15, ymm10
vbroadcastss ymm11, dword ptr [rcx + 3 * 4]
vfmadd231ps ymm3, ymm15, ymm11
vbroadcastss ymm12, dword ptr [rcx + 4 * 4]
vfmadd231ps ymm4, ymm15, ymm12
vbroadcastss ymm13, dword ptr [rcx + 5 * 4]
vfmadd231ps ymm5, ymm15, ymm13
vbroadcastss ymm10, dword ptr [rcx + 6 * 4]
vfmadd231ps ymm6, ymm15, ymm10
vbroadcastss ymm11, dword ptr [rcx + 7 * 4]
vfmadd231ps ymm7, ymm15, ymm11