Paul E. Murphy authored
Use 4x FMA chains to sum on SIMD 128 FP64 targets. On x86 this showed about a 1.4x improvement. For PPC, do a full multiply (32x32->64b), convert to DP, then accumulate. This may be slightly less precise for some inputs, but it is about 1.5x faster than the FMA approach above, which itself gives about 1.5x, for a ~2.5x speedup overall.
33fb253a
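
The two strategies named in the commit message can be sketched in scalar C++ as follows. This is only an illustration, not the committed code: the function names `dot_fma4` and `dot_widen_i32` are hypothetical, the dot-product shape of the computation is an assumption, and the real change presumably uses SIMD intrinsics rather than scalar `std::fma`.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: sum of products using four independent FMA chains.
// Keeping four accumulators lets consecutive FMAs overlap instead of each
// one waiting on the result of the previous one.
double dot_fma4(const double* a, const double* b, std::size_t n) {
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 = std::fma(a[i],     b[i],     acc0);
        acc1 = std::fma(a[i + 1], b[i + 1], acc1);
        acc2 = std::fma(a[i + 2], b[i + 2], acc2);
        acc3 = std::fma(a[i + 3], b[i + 3], acc3);
    }
    // Leftover elements when n is not a multiple of four.
    double tail = 0.0;
    for (; i < n; ++i) {
        tail = std::fma(a[i], b[i], tail);
    }
    return (acc0 + acc1) + (acc2 + acc3) + tail;
}

// Hypothetical sketch: full 32x32->64-bit integer multiply, convert the
// product to double, then accumulate. The conversion rounds products
// wider than 53 bits, which is one way precision can be lost.
double dot_widen_i32(const std::int32_t* a, const std::int32_t* b, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        std::int64_t prod = static_cast<std::int64_t>(a[i]) * b[i];
        acc += static_cast<double>(prod);
    }
    return acc;
}
```

Both functions compute the same quantity for small inputs; they differ in where rounding happens and in how much instruction-level parallelism the loop exposes.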
Name | Last commit | Last update |
---|---|---|
.. | ||
3rdparty/SoftFloat | ||
doc | ||
include/opencv2 | ||
misc | ||
perf | ||
src | ||
test | ||
CMakeLists.txt |