Merge pull request #15136 from ChipKerchner:dotProd_unroll
* Unroll multiply and add instructions in dotProd_32f - 35% faster. * Eliminate unnecessary v_reduce_sum instructions.
Showing
Please
register
or
sign in
to comment
* Unroll multiply and add instructions in dotProd_32f - 35% faster. * Eliminate unnecessary v_reduce_sum instructions.