modules/core · 33fb253a66275abaa5060ef318c9a5cc87c5fd6e · submodule / opencv

Paul E. Murphy authored Aug 20, 2019

Use 4x FMA chains to sum on SIMD 128 FP64 targets. On
x86 this showed about 1.4x improvement.
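
As an illustration of the "4x FMA chains" idea, here is a minimal scalar sketch. It assumes the operation is a dot product of 32-bit integers accumulated in double precision. The function name `dot_i32_fma4` is hypothetical, and the commit itself uses SIMD intrinsics rather than `std::fma`. Four independent accumulators let consecutive FMAs overlap instead of each one waiting on the previous result.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only; not the code from this commit.
// Dot product of two int32 arrays, accumulated in double precision using
// four independent FMA dependency chains.
double dot_i32_fma4(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    std::size_t i = 0;

    // Main loop: each accumulator depends only on its own previous value,
    // so the four FMAs in one iteration can be in flight at the same time.
    for (; i + 4 <= n; i += 4) {
        acc0 = std::fma(static_cast<double>(a[i]),     static_cast<double>(b[i]),     acc0);
        acc1 = std::fma(static_cast<double>(a[i + 1]), static_cast<double>(b[i + 1]), acc1);
        acc2 = std::fma(static_cast<double>(a[i + 2]), static_cast<double>(b[i + 2]), acc2);
        acc3 = std::fma(static_cast<double>(a[i + 3]), static_cast<double>(b[i + 3]), acc3);
    }

    // Combine the four partial sums.
    double sum = (acc0 + acc1) + (acc2 + acc3);

    // Tail: handle the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        sum = std::fma(static_cast<double>(a[i]), static_cast<double>(b[i]), sum);
    }
    return sum;
}
```

Converting an int32 to double is exact, so each FMA step rounds only once, when the product is added to the accumulator.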

For PPC, do a full multiply (32x32->64b), convert to DP
then accumulate. This may be slightly less precise for
some inputs, but it is 1.5x faster than the above, which
is about 1.5x faster than the FMA above, for ~2.5x speedup.
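
The multiply-then-convert variant can be sketched in scalar form as follows, again assuming an int32 dot product accumulated in double precision. The name `dot_i32_widen` is hypothetical, and the actual PPC code uses vector instructions. The 32x32->64b product is exact as an integer, but converting it to double rounds whenever its magnitude exceeds 2^53, which is one plausible source of the precision loss mentioned above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only; not the code from this commit.
// Dot product of two int32 arrays: widen to 64 bits, multiply exactly,
// convert the product to double, then accumulate.
double dot_i32_widen(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Full 32x32 -> 64-bit multiply: the result always fits in int64.
        const std::int64_t prod = static_cast<std::int64_t>(a[i]) * b[i];

        // Conversion to double rounds if |prod| needs more than 53 bits;
        // the addition below then rounds a second time.
        acc += static_cast<double>(prod);
    }
    return acc;
}
```

Compared with an FMA on converted operands, this path can round twice per element (once at the conversion, once at the addition) instead of once.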

Name
3rdparty/SoftFloat
doc
include/opencv2
misc
perf
src
test
CMakeLists.txt