Files · 33fb253a66275abaa5060ef318c9a5cc87c5fd6e · submodule / opencv

Paul E. Murphy authored Aug 20, 2019

Use four parallel FMA chains to accumulate the sum on 128-bit
SIMD FP64 targets. On x86 this gave about a 1.4x speedup.
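As a rough illustration of the multiple-accumulator idea, here is a scalar C++ sketch. It assumes the summed quantity is a dot product of int32 inputs accumulated in double, which the 32x32->64b multiply mentioned for PPC suggests. `dot_fma4` is a hypothetical name, not OpenCV's kernel, and each scalar chain stands in for a 128-bit vector register that would hold two FP64 lanes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not OpenCV code): dot product of two int32 arrays,
// accumulated in double using four independent FMA chains. Scalar model of
// what a 128-bit SIMD (2 x FP64 per register) kernel would do.
double dot_fma4(const int32_t* a, const int32_t* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    // Each chain depends only on its own previous value, so the four FMAs
    // can be in flight at the same time instead of waiting on one register.
    for (; i + 4 <= n; i += 4) {
        s0 = std::fma(static_cast<double>(a[i]),     static_cast<double>(b[i]),     s0);
        s1 = std::fma(static_cast<double>(a[i + 1]), static_cast<double>(b[i + 1]), s1);
        s2 = std::fma(static_cast<double>(a[i + 2]), static_cast<double>(b[i + 2]), s2);
        s3 = std::fma(static_cast<double>(a[i + 3]), static_cast<double>(b[i + 3]), s3);
    }
    // Leftover elements go into the first chain.
    for (; i < n; ++i)
        s0 = std::fma(static_cast<double>(a[i]), static_cast<double>(b[i]), s0);
    // Combine the partial sums once at the end.
    return (s0 + s1) + (s2 + s3);
}
```

With a single accumulator every FMA must wait for the previous result; splitting the sum across four chains lets independent FMAs overlap, at the cost of a different (but still valid) summation order.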

For PPC, do a full 32x32->64-bit integer multiply, convert the
product to double precision (DP), then accumulate. This may be
slightly less precise for some inputs, but it is about 1.5x
faster than the FMA-chain version above, which is itself about
1.5x faster than the previous code, for a ~2.5x total speedup.
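A minimal sketch of the PPC strategy, under the same assumptions (int32 inputs, double accumulators); `dot_mul64_cvt` is a hypothetical name, not OpenCV's kernel. The comments mark one plausible source of the "slightly less precise" note: the 64-bit product is exact, but converting it to double rounds whenever its magnitude exceeds 2^53, and the accumulate then rounds again, whereas `std::fma` rounds the product and sum only once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not OpenCV code) modelling the PPC approach:
// exact 32x32->64-bit integer multiply, convert the product to double,
// then accumulate in double across four accumulators.
double dot_mul64_cvt(const int32_t* a, const int32_t* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    double s[4] = {0.0, 0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < n; ++i) {
        // Exact: |a[i] * b[i]| <= 2^62, which fits in int64_t.
        int64_t p = static_cast<int64_t>(a[i]) * static_cast<int64_t>(b[i]);
        // First rounding: int64 -> double is inexact when |p| > 2^53.
        // Second rounding: the double add into the accumulator.
        s[i % 4] += static_cast<double>(p);
    }
    return (s[0] + s[1]) + (s[2] + s[3]);
}
```

For example, with a single element 2147483647 on both sides, the exact product 2^62 - 2^32 + 1 becomes 2^62 - 2^32 after conversion to double.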


Name
.github
3rdparty
apps
cmake
data
doc
include
modules
platforms
samples
.editorconfig
.gitattributes
.gitignore
CMakeLists.txt
CONTRIBUTING.md
LICENSE
README.md

README.md