1. 13 Sep, 2018 1 commit
  2. 07 Aug, 2018 1 commit
  3. 02 Aug, 2018 1 commit
  4. 26 Jul, 2018 1 commit
    • further improvements in split & merge; started using non-temporal store instructions (#12063) · 43820d89
      Vadim Pisarevsky authored
      * 1. changed static const __m128/256 to const __m128/256 to avoid weird instructions and calls inserted by the compiler.
      2. added universal intrinsics that wrap MOVNTPS and similar non-temporal ("no cache") store instructions. v_store_interleave() and v_store() gained corresponding flags/overloaded variants.
      3. rewrote split & merge to use the "no cache" store instructions. This gave a dramatic performance improvement when processing big arrays.
      
      * hopefully, fixed some test failures where 4-channel v_store_interleave() is used
      
      * added missing implementation of the new universal intrinsics (v_store_aligned_nocache() etc.)
      
      * fixed silly typo in the new intrinsics in intrin_vsx.hpp
      
      * still trying to fix VSX compiler errors
      
      * still trying to fix VSX compiler errors
      
      * still trying to fix VSX compiler errors
      
      * still trying to fix VSX compiler errors
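To illustrate the "no cache" store idea from the commit above, here is a minimal sketch, not the OpenCV implementation. It splits a 2-channel interleaved float array into two planes and writes the results with `_mm_stream_ps` (the intrinsic for MOVNTPS) when SSE is available and the destinations are 16-byte aligned. The function name `split2_nocache` and the macro `SKETCH_HAVE_SSE` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE 1  // hypothetical macro, used only in this sketch
#endif

// Hypothetical helper: split n interleaved (x, y) pairs into two planes.
// Non-temporal stores are used only if both destinations are 16-byte aligned;
// otherwise the plain scalar loop does all the work.
void split2_nocache(const float* src, float* dst0, float* dst1, std::size_t n)
{
    assert(src && dst0 && dst1);
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE
    const bool aligned = reinterpret_cast<std::uintptr_t>(dst0) % 16 == 0 &&
                         reinterpret_cast<std::uintptr_t>(dst1) % 16 == 0;
    if (aligned) {
        for (; i + 4 <= n; i += 4) {
            __m128 a = _mm_loadu_ps(src + 2 * i);      // x0 y0 x1 y1
            __m128 b = _mm_loadu_ps(src + 2 * i + 4);  // x2 y2 x3 y3
            __m128 x = _mm_shuffle_ps(a, b, _MM_SHUFFLE(2, 0, 2, 0)); // x0 x1 x2 x3
            __m128 y = _mm_shuffle_ps(a, b, _MM_SHUFFLE(3, 1, 3, 1)); // y0 y1 y2 y3
            _mm_stream_ps(dst0 + i, x);  // store that bypasses the cache
            _mm_stream_ps(dst1 + i, y);
        }
        _mm_sfence();  // make the streamed stores visible before returning
    }
#endif
    // Scalar tail (and full fallback when SSE or alignment is unavailable).
    for (; i < n; ++i) {
        dst0[i] = src[2 * i];
        dst1[i] = src[2 * i + 1];
    }
}
```

Non-temporal stores tend to help only when the output is large and not read again soon; for small arrays that stay in cache, ordinary stores are usually the better choice.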
  5. 24 Jul, 2018 1 commit
    • converted split() & merge() to wide universal intrinsics (#12044) · 9c704080
      Vadim Pisarevsky authored
      * fixed/updated v_load_deinterleave and v_store_interleave intrinsics; modified split() and merge() functions to use those intrinsics
      
      * fixed a few compile errors and a bug in v_load_deinterleave(ptr, v_uint32x4& a, v_uint32x4& b)
      
      * fixed a few more compile errors
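As a scalar reference for what the deinterleave/interleave intrinsics named in this commit compute, the following sketch models split() and merge() on plain vectors. It is not OpenCV code; `split_ref` and `merge_ref` are hypothetical names, and the vectorized versions would process several elements per step instead of one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of split(): interleaved data with `cn` channels -> `cn` planes.
template <typename T>
std::vector<std::vector<T>> split_ref(const std::vector<T>& src, std::size_t cn)
{
    assert(cn > 0 && src.size() % cn == 0);
    const std::size_t len = src.size() / cn;
    std::vector<std::vector<T>> planes(cn, std::vector<T>(len));
    for (std::size_t i = 0; i < len; ++i)
        for (std::size_t k = 0; k < cn; ++k)
            planes[k][i] = src[i * cn + k];  // element i of channel k
    return planes;
}

// Scalar model of merge(): `cn` equally sized planes -> interleaved data.
template <typename T>
std::vector<T> merge_ref(const std::vector<std::vector<T>>& planes)
{
    if (planes.empty()) return {};
    const std::size_t cn = planes.size();
    const std::size_t len = planes[0].size();
    std::vector<T> dst(cn * len);
    for (std::size_t k = 0; k < cn; ++k) {
        assert(planes[k].size() == len);
        for (std::size_t i = 0; i < len; ++i)
            dst[i * cn + k] = planes[k][i];
    }
    return dst;
}
```

A round trip such as `merge_ref(split_ref(src, 3))` should reproduce `src`, which makes a convenient check when comparing a vectorized path against this reference.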
  6. 04 Jul, 2018 1 commit
  7. 12 Feb, 2018 1 commit
  8. 17 Dec, 2015 1 commit
  9. 03 Dec, 2015 1 commit
    • HAL: improvements · b4bcdd10
      Maksim Shabunin authored
      - added new functions from core module: split, merge, add, sub, mul, div, ...
      - added function replacement mechanism
      - added example of HAL replacement library
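The "function replacement mechanism" mentioned in this commit can be pictured roughly as follows. This is a sketch of the general override-by-macro pattern, not OpenCV's actual headers: every identifier here (`SKETCH_HAL_OK`, `sketch_hal_split8u`, `my_split8u`, and so on) is hypothetical. A default stub reports "not implemented", a replacement library redefines the macro to point at its own function, and the caller falls back to built-in code whenever the replacement declines.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical status codes for this sketch.
enum { SKETCH_HAL_OK = 0, SKETCH_HAL_NOT_IMPLEMENTED = 1 };

// Default stub: signals that no replacement is available.
inline int sketch_ni_split8u(const unsigned char*, unsigned char**, int, int)
{
    return SKETCH_HAL_NOT_IMPLEMENTED;
}
#define sketch_hal_split8u sketch_ni_split8u

// What a replacement library's header might provide (assumed example):
// it handles only the 2-channel case and declines everything else.
inline int my_split8u(const unsigned char* src, unsigned char** dst, int len, int cn)
{
    if (cn != 2) return SKETCH_HAL_NOT_IMPLEMENTED;
    for (int i = 0; i < len; ++i) {
        dst[0][i] = src[2 * i];
        dst[1][i] = src[2 * i + 1];
    }
    return SKETCH_HAL_OK;
}
#undef sketch_hal_split8u
#define sketch_hal_split8u my_split8u

// Library-side entry point: try the replacement first, then fall back.
void split8u(const unsigned char* src, unsigned char** dst, int len, int cn)
{
    if (sketch_hal_split8u(src, dst, len, cn) == SKETCH_HAL_OK)
        return;
    for (int i = 0; i < len; ++i)       // generic built-in fallback
        for (int k = 0; k < cn; ++k)
            dst[k][i] = src[i * cn + k];
}
```

The status-code return lets a replacement library cover only the cases it accelerates well, while the caller keeps a single code path for everything else.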