Merge pull request #16556 from ChipKerchner:vectorizeIntegralSumPixels
* Vectorize calculating integral for line for single and multiple channels
* Single vector processing for 4-channels - 25-30% faster
* Fixed AVX512 code for 4 channels
* Disable 3 channel 8UC1 to 32S for SSE2 and SSE3 (slower). Use new version of 8UC1 to 64F for AVX512.
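For context, here is a minimal scalar sketch of the per-row integral computation that a change like this would vectorize. It is not the code from this pull request: the function name `integralRowScalar`, its parameters, and the buffer layout (interleaved channels, one leading zero column) are assumptions chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical reference routine (not OpenCV's API): computes one row of an
// integral image for `cn` interleaved channels.
//   src     : width * cn input bytes for the current image row
//   prevSum : (width + 1) * cn integral values for the row above
//   sum     : (width + 1) * cn output integral values for the current row
void integralRowScalar(const uint8_t* src, const int32_t* prevSum,
                       int32_t* sum, int width, int cn)
{
    for (int c = 0; c < cn; ++c)
    {
        sum[c] = 0;          // leading zero column of the integral image
        int32_t rowAcc = 0;  // running sum of this channel along the row
        for (int x = 0; x < width; ++x)
        {
            rowAcc += src[x * cn + c];
            // integral value = integral of the row above + running row sum
            sum[(x + 1) * cn + c] = prevSum[(x + 1) * cn + c] + rowAcc;
        }
    }
}
```

The running sum `rowAcc` is a serial dependency along the row, so a SIMD version generally has to replace it with a prefix sum computed inside each vector. With 4 channels, one pixel's channel values can occupy the lanes of a single vector, which is presumably what the "single vector processing" item refers to.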