    further improvements in split & merge; started using non-temporal store instructions (#12063) · 43820d89
    Vadim Pisarevsky authored
    * 1. changed static const __m128/256 to const __m128/256 to avoid weird instructions and calls inserted by the compiler.
    2. added universal intrinsics that wrap MOVNTPS and other such (non-temporal or "no cache" store) instructions. v_store_interleave() and v_store() got respective flags/overloaded variants.
    3. rewrote split & merge to use the "no cache" store instructions. This resulted in a dramatic performance improvement when processing big arrays.
    
    * hopefully, fixed some test failures where 4-channel v_store_interleave() is used
    
    * added missing implementation of the new universal intrinsics (v_store_aligned_nocache() etc.)
    
    * fixed silly typo in the new intrinsics in intrin_vsx.hpp
    
    * still trying to fix VSX compiler errors
    
    * still trying to fix VSX compiler errors
    
    * still trying to fix VSX compiler errors
    
    * still trying to fix VSX compiler errors
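
The "no cache" stores mentioned in the commit message can be illustrated with a small sketch. It does not reproduce the code in split.cpp or the universal intrinsics themselves (such as v_store_aligned_nocache()); it assumes raw SSE intrinsics instead, and the helper name split2_nocache is hypothetical. It de-interleaves a 2-channel float buffer and writes each plane with MOVNTPS via _mm_stream_ps, falling back to plain scalar stores where SSE is unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper (not an OpenCV function): split interleaved 2-channel
// float data (x0 y0 x1 y1 ...) into two planes of n elements each.
// Assumption: dst0 and dst1 are 16-byte aligned, as _mm_stream_ps requires.
void split2_nocache(const float* src, float* dst0, float* dst1, std::size_t n)
{
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    assert(reinterpret_cast<std::uintptr_t>(dst0) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst1) % 16 == 0);
    for (; i + 4 <= n; i += 4)
    {
        __m128 a = _mm_loadu_ps(src + 2 * i);      // x0 y0 x1 y1
        __m128 b = _mm_loadu_ps(src + 2 * i + 4);  // x2 y2 x3 y3
        __m128 x = _mm_shuffle_ps(a, b, _MM_SHUFFLE(2, 0, 2, 0)); // x0 x1 x2 x3
        __m128 y = _mm_shuffle_ps(a, b, _MM_SHUFFLE(3, 1, 3, 1)); // y0 y1 y2 y3
        // Non-temporal stores: written data bypasses the cache hierarchy,
        // which helps when the output is large and not read back soon.
        _mm_stream_ps(dst0 + i, x);
        _mm_stream_ps(dst1 + i, y);
    }
    // Make the streamed data globally visible before any later stores.
    _mm_sfence();
#endif
    // Scalar tail (and full fallback when SSE is not available).
    for (; i < n; ++i)
    {
        dst0[i] = src[2 * i];
        dst1[i] = src[2 * i + 1];
    }
}
```

Non-temporal stores tend to pay off only for buffers much larger than the cache, which matches the commit's note about big arrays; for small arrays that are read again immediately, ordinary stores are usually the better choice.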
Name
cuda
opencl
utils
algorithm.cpp
alloc.cpp
arithm.cpp
arithm_core.hpp
arithm_simd.hpp
array.cpp
batch_distance.cpp
bufferpool.impl.hpp
channels.cpp
check.cpp
command_line_parser.cpp
conjugate_gradient.cpp
convert.avx2.cpp
convert.cpp
convert.fp16.cpp
convert.hpp
convert.sse4_1.cpp
convert_c.cpp
convert_scale.cpp
copy.cpp
count_non_zero.cpp
cuda_gpu_mat.cpp
cuda_host_mem.cpp
cuda_info.cpp
cuda_stream.cpp
datastructs.cpp
directx.cpp
directx.inc.hpp
downhill_simplex.cpp
dxt.cpp
gl_core_3_1.cpp
gl_core_3_1.hpp
glob.cpp
hal_internal.cpp
hal_internal.hpp
hal_replacement.hpp
intel_gpu_gemm.inl.hpp
kmeans.cpp
lapack.cpp
lda.cpp
logger.cpp
lpsolver.cpp
lut.cpp
mathfuncs.cpp
mathfuncs_core.dispatch.cpp
mathfuncs_core.simd.hpp
matmul.cpp
matrix.cpp
matrix_c.cpp
matrix_decomp.cpp
matrix_expressions.cpp
matrix_iterator.cpp
matrix_operations.cpp
matrix_sparse.cpp
matrix_wrap.cpp
mean.cpp
merge.cpp
minmax.cpp
norm.cpp
ocl.cpp
ocl_deprecated.hpp
opengl.cpp
out.cpp
ovx.cpp
parallel.cpp
parallel_impl.cpp
parallel_impl.hpp
pca.cpp
persistence.cpp
persistence.hpp
persistence_base64.cpp
persistence_c.cpp
persistence_cpp.cpp
persistence_json.cpp
persistence_types.cpp
persistence_xml.cpp
persistence_yml.cpp
precomp.hpp
rand.cpp
softfloat.cpp
split.cpp
stat.dispatch.cpp
stat.hpp
stat.simd.hpp
stat_c.cpp
stl.cpp
sum.cpp
system.cpp
tables.cpp
trace.cpp
types.cpp
umatrix.cpp
umatrix.hpp
va_intel.cpp