    Merge pull request #10468 from fenrus75:avx512-2 · a75840d1
    Arjan van de Ven authored
    * Add a 512 bit codepath to the AVX512 fastConv function
    
    This patch adds a 512-bit-wide code path to the fastConv() function for
    AVX512 use.
    The basic idea is to process the first N * 16 elements of the vector
    with AVX512, and then run the rest of the vector using the existing
    AVX2 code path.
    
    * dnn: use unaligned AVX512 load (OpenCV aligns data on a 32-byte boundary)
    
    * dnn: change "vecsize" condition for AVX512
    
    * dnn: fix indentation
layers_common.simd.hpp 19.4 KB