    core: vectorize dotProd_32s · 33fb253a
    Paul E. Murphy authored
    Use 4x FMA chains to sum on SIMD 128 FP64 targets. On
    x86 this showed about 1.4x improvement.
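
As a rough illustration of what "4x FMA chains" can mean, the sketch below models the idea in scalar C++. It keeps four independent accumulators, each updated with a fused multiply-add, so that consecutive FMAs do not have to wait on one another. The function name `dot_i32_fma4` is hypothetical, and this is not the code from the commit, which targets 128-bit SIMD registers with FP64 lanes.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not OpenCV's dotProd_32s): scalar model of four
// independent FMA accumulation chains over int32 inputs, summed in double.
double dot_i32_fma4(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    std::size_t i = 0;

    // Main loop: each accumulator forms its own dependency chain.
    for (; i + 4 <= n; i += 4)
    {
        acc0 = std::fma(static_cast<double>(a[i]),     static_cast<double>(b[i]),     acc0);
        acc1 = std::fma(static_cast<double>(a[i + 1]), static_cast<double>(b[i + 1]), acc1);
        acc2 = std::fma(static_cast<double>(a[i + 2]), static_cast<double>(b[i + 2]), acc2);
        acc3 = std::fma(static_cast<double>(a[i + 3]), static_cast<double>(b[i + 3]), acc3);
    }

    // Combine the four partial sums, then handle the remaining 0..3 elements.
    double sum = (acc0 + acc1) + (acc2 + acc3);
    for (; i < n; ++i)
        sum = std::fma(static_cast<double>(a[i]), static_cast<double>(b[i]), sum);
    return sum;
}
```

In a SIMD version, each accumulator would presumably be a vector register rather than a scalar, but the dependency structure is the same.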
    
    For PPC, do a full multiply (32x32->64b), convert to DP
    then accumulate. This may be slightly less precise for
    some inputs, but it is 1.5x faster than the above, which
    is about 1.5x faster than the FMA above, for a ~2.5x speedup.
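
The PPC approach described above (full 32x32->64-bit multiply, convert to double precision, accumulate) can be sketched in scalar C++ as follows. The name `dot_i32_widen` is hypothetical and the code is only a model of the arithmetic, not the vectorized implementation from the commit. One plausible source of the precision difference mentioned above: the 64-bit product can need up to 62 significant bits, more than the 53 bits a double holds, so the conversion may round before the addition rounds again.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not OpenCV's dotProd_32s): scalar model of
// "multiply to 64-bit, convert to double, accumulate".
double dot_i32_widen(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        // Exact integer product: two int32 values always fit in int64.
        const std::int64_t prod = static_cast<std::int64_t>(a[i]) * b[i];

        // The conversion may round when |prod| exceeds 2^53; the addition
        // below may then round a second time.
        sum += static_cast<double>(prod);
    }
    return sum;
}
```

For small inputs both this sketch and an FMA-based sum give the same result; differences can only appear once products or partial sums exceed what a double represents exactly.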
Name
cuda
opencl
utils
algorithm.cpp
alloc.cpp
arithm.cpp
arithm.dispatch.cpp
arithm.simd.hpp
arithm_ipp.hpp
array.cpp
async.cpp
batch_distance.cpp
bindings_utils.cpp
bufferpool.impl.hpp
channels.cpp
check.cpp
command_line_parser.cpp
conjugate_gradient.cpp
convert.dispatch.cpp
convert.hpp
convert.simd.hpp
convert_c.cpp
convert_scale.dispatch.cpp
convert_scale.simd.hpp
copy.cpp
count_non_zero.dispatch.cpp
count_non_zero.simd.hpp
cuda_gpu_mat.cpp
cuda_host_mem.cpp
cuda_info.cpp
cuda_stream.cpp
datastructs.cpp
directx.cpp
directx.inc.hpp
downhill_simplex.cpp
dxt.cpp
gl_core_3_1.cpp
gl_core_3_1.hpp
glob.cpp
hal_internal.cpp
hal_internal.hpp
hal_replacement.hpp
intel_gpu_gemm.inl.hpp
kmeans.cpp
lapack.cpp
lda.cpp
logger.cpp
lpsolver.cpp
lut.cpp
mathfuncs.cpp
mathfuncs.hpp
mathfuncs_core.dispatch.cpp
mathfuncs_core.simd.hpp
matmul.dispatch.cpp
matmul.simd.hpp
matrix.cpp
matrix_c.cpp
matrix_decomp.cpp
matrix_expressions.cpp
matrix_iterator.cpp
matrix_operations.cpp
matrix_sparse.cpp
matrix_wrap.cpp
mean.dispatch.cpp
mean.simd.hpp
merge.dispatch.cpp
merge.simd.hpp
minmax.cpp
norm.cpp
ocl.cpp
ocl_deprecated.hpp
opengl.cpp
out.cpp
ovx.cpp
parallel.cpp
parallel_impl.cpp
parallel_impl.hpp
pca.cpp
persistence.cpp
persistence.hpp
persistence_base64.cpp
persistence_c.cpp
persistence_cpp.cpp
persistence_json.cpp
persistence_types.cpp
persistence_xml.cpp
persistence_yml.cpp
precomp.hpp
rand.cpp
softfloat.cpp
split.dispatch.cpp
split.simd.hpp
stat.dispatch.cpp
stat.hpp
stat.simd.hpp
stat_c.cpp
stl.cpp
sum.dispatch.cpp
sum.simd.hpp
system.cpp
tables.cpp
trace.cpp
types.cpp
umatrix.cpp
umatrix.hpp
va_intel.cpp