    core: vectorize dotProd_32s · 33fb253a
    Paul E. Murphy authored
    Use 4x FMA chains to sum on SIMD 128 FP64 targets. On
    x86 this showed about 1.4x improvement.
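
The FMA-chain idea can be illustrated with a scalar sketch. This is not the code from the commit: the function name `dotProdFmaChains` is hypothetical, and plain `std::fma` calls stand in for the 128-bit SIMD intrinsics the commit presumably uses. It only shows why four independent accumulators help: each fused multiply-add depends on its own running sum, so the four chains can overlap in the pipeline.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scalar model of "4x FMA chains" (not the actual OpenCV code).
// Four independent double accumulators break the dependency between
// consecutive fused multiply-adds.
double dotProdFmaChains(const int32_t* a, const int32_t* b, int n)
{
    assert(n >= 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int i = 0;
    // Main loop: one element per chain, four elements per iteration.
    for (; i + 4 <= n; i += 4)
    {
        s0 = std::fma(double(a[i]),     double(b[i]),     s0);
        s1 = std::fma(double(a[i + 1]), double(b[i + 1]), s1);
        s2 = std::fma(double(a[i + 2]), double(b[i + 2]), s2);
        s3 = std::fma(double(a[i + 3]), double(b[i + 3]), s3);
    }
    // Combine the four partial sums, then handle the remaining 0..3 elements.
    double sum = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)
        sum = std::fma(double(a[i]), double(b[i]), sum);
    return sum;
}
```

Every `int32_t` converts to `double` exactly, and each FMA rounds only once after the multiply and add, so the only rounding in this sketch comes from the accumulation itself.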
    
    For PPC, do a full multiply (32x32->64b), convert to DP,
    then accumulate. This may be slightly less precise for
    some inputs, but it is 1.5x faster than the above, which
    is about 1.5x faster than the FMA above, for a ~2.5x speedup.
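
The full-multiply variant described for PPC might look roughly like the scalar sketch below. The name `dotProdFullMultiply` is hypothetical and the real change presumably uses vector instructions; this only models the order of operations (integer multiply, convert to double, accumulate).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the PPC path (not the actual OpenCV code):
// exact 32x32->64-bit integer multiply, convert to double, then accumulate.
double dotProdFullMultiply(const int32_t* a, const int32_t* b, int n)
{
    assert(n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
    {
        // The 64-bit product of two 32-bit integers is always exact.
        const int64_t prod = int64_t(a[i]) * int64_t(b[i]);
        // The conversion rounds when the product needs more than 53
        // significant bits; the addition below then rounds a second time.
        sum += double(prod);
    }
    return sum;
}
```

The comments point at one plausible source of the precision difference the message mentions: this path can round twice per element (conversion, then addition), whereas a fused multiply-add rounds once.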
Name
3rdparty/SoftFloat
doc
include/opencv2
misc
perf
src
test
CMakeLists.txt