Arjan van de Ven authored
* Add a 512 bit codepath to the AVX512 fastConv function

  This patch adds a 512-bit-wide codepath to the fastConv() function for AVX512 use. The basic idea is to process the first N * 16 elements of the vector with AVX512, and then run the rest of the vector using the traditional AVX2 codepath.
* dnn: use unaligned AVX512 load (OpenCV aligns data on 32-byte boundary)
* dnn: change "vecsize" condition for AVX512
* dnn: fix indentation
a75840d1
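
The split described in the commit message (a 16-float-wide AVX512 stage over the first N * 16 elements, followed by the AVX2 path for the rest) can be sketched as follows. This is a minimal illustration, not the code from the patch: `dotWideThenNarrow` is a hypothetical helper that computes a plain dot product, whereas the real fastConv() does more work per iteration. The intrinsics are guarded by compiler feature macros, so on a build without AVX512 or AVX2 the function falls back to the scalar loop.

```cpp
#include <cstddef>
#include <vector>
#if defined(__AVX512F__) || (defined(__AVX2__) && defined(__FMA__))
#include <immintrin.h>
#endif

// Hypothetical helper (not part of OpenCV): dot product of a[0..n) and b[0..n),
// widest SIMD stage first, narrower stages for whatever is left.
float dotWideThenNarrow(const float* a, const float* b, std::size_t n)
{
    std::size_t i = 0;
    float sum = 0.f;

#if defined(__AVX512F__)
    // Stage 1: the first (n / 16) * 16 elements, 16 floats per iteration.
    __m512 acc512 = _mm512_setzero_ps();
    for (; i + 16 <= n; i += 16)
    {
        // Unaligned loads: the buffers are only assumed to be 32-byte aligned,
        // which is not enough for an aligned 64-byte load.
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        acc512 = _mm512_fmadd_ps(va, vb, acc512);
    }
    sum += _mm512_reduce_add_ps(acc512);
#endif

#if defined(__AVX2__) && defined(__FMA__)
    // Stage 2: the remaining elements, 8 floats per iteration.
    __m256 acc256 = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8)
    {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc256 = _mm256_fmadd_ps(va, vb, acc256);
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc256);
    for (float v : lanes)
        sum += v;
#endif

    // Stage 3: scalar tail (or the whole array when no SIMD path is compiled in).
    for (; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

For example, with two 45-element vectors the AVX512 stage would cover elements 0..31, the AVX2 stage elements 32..39, and the scalar loop the last five. Keeping the narrower path as the remainder handler means the existing AVX2 code needs no changes beyond starting at a later offset.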
Name | Last commit | Last update |
---|---|---|
caffe | ||
darknet | ||
layers | ||
ocl4dnn | ||
opencl | ||
tensorflow | ||
torch | ||
dnn.cpp | ||
halide_scheduler.cpp | ||
halide_scheduler.hpp | ||
init.cpp | ||
nms.cpp | ||
nms.inl.hpp | ||
op_halide.cpp | ||
op_halide.hpp | ||
precomp.hpp |