    Merge pull request #10468 from fenrus75:avx512-2 · a75840d1
    Arjan van de Ven authored
    * Add a 512-bit codepath to the AVX512 fastConv function
    
    This patch adds a 512-bit-wide codepath to the fastConv() function for
    AVX512 use.
    The basic idea is to process the first N * 16 elements of the vector
    with AVX512 and then run the rest of the vector through the traditional
    AVX2 codepath.
    
    * dnn: use unaligned AVX512 loads (OpenCV aligns data on a 32-byte boundary)
    
    * dnn: change "vecsize" condition for AVX512
    
    * dnn: fix indentation
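    A minimal sketch of the split described in this commit message, assuming a plain dot product stands in for the fastConv() inner loop. `dot_split` is a hypothetical name, not an OpenCV API, and the 8-wide middle section plus scalar tail is an assumption about how the remainder is handled. Loads are unaligned because, per the message above, data is only guaranteed 32-byte alignment while an aligned 512-bit load needs 64 bytes. Each vector path compiles only when the compiler targets that instruction set, so the function also builds (scalar-only) elsewhere.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #if defined(__AVX512F__) || defined(__AVX2__)
    #include <immintrin.h>
    #endif

    // Hypothetical helper (not OpenCV's fastConv): dot product of two float
    // arrays, split into a 16-wide AVX-512 section, an 8-wide AVX2 section and
    // a scalar tail.
    inline float dot_split(const float* a, const float* b, std::size_t n) {
        assert((a != nullptr && b != nullptr) || n == 0);
        std::size_t i = 0;
        float sum = 0.0f;
    #if defined(__AVX512F__)
        // First (n / 16) * 16 elements: 16 floats per 512-bit register.
        const std::size_t n16 = n - n % 16;
        __m512 acc16 = _mm512_setzero_ps();
        for (; i < n16; i += 16) {
            // Unaligned loads: _mm512_load_ps would require 64-byte alignment,
            // but the buffers are only assumed to be 32-byte aligned.
            const __m512 va = _mm512_loadu_ps(a + i);
            const __m512 vb = _mm512_loadu_ps(b + i);
            acc16 = _mm512_fmadd_ps(va, vb, acc16);
        }
        sum += _mm512_reduce_add_ps(acc16);
    #endif
    #if defined(__AVX2__) && defined(__FMA__)
        // Whatever the 512-bit loop left over: 8 floats per 256-bit register.
        const std::size_t n8 = n - n % 8;
        __m256 acc8 = _mm256_setzero_ps();
        for (; i < n8; i += 8) {
            const __m256 va = _mm256_loadu_ps(a + i);
            const __m256 vb = _mm256_loadu_ps(b + i);
            acc8 = _mm256_fmadd_ps(va, vb, acc8);
        }
        float lanes[8];
        _mm256_storeu_ps(lanes, acc8);
        for (float v : lanes) sum += v;
    #endif
        // Scalar tail (or the whole array when no vector path was compiled in).
        for (; i < n; ++i) sum += a[i] * b[i];
        return sum;
    }
    ```

    For a 37-element input built with AVX-512 enabled, the 512-bit loop covers elements 0 to 31, the AVX2 loop is skipped because 32 is already the largest multiple of 8 not exceeding 37, and the scalar loop handles the last 5. Vector accumulation changes the order of additions, so results can differ from a purely scalar sum in the last bits for general floating-point inputs.
    
    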
Name
caffe
darknet
layers
ocl4dnn
opencl
tensorflow
torch
dnn.cpp
halide_scheduler.cpp
halide_scheduler.hpp
init.cpp
nms.cpp
nms.inl.hpp
op_halide.cpp
op_halide.hpp
precomp.hpp