1. 24 Feb, 2018 1 commit
  2. 07 Feb, 2018 1 commit
  3. 24 Jan, 2018 1 commit
  4. 23 Jan, 2018 1 commit
  5. 02 Jan, 2018 2 commits
    • I420ToI010 for 8 to 10 bit YUV conversion. · 2ed2402f
      Frank Barchard authored
      Convert planar 8 bit formats to planar 16 bit formats.
      
      Includes msan fix for Convert8To16Row_Opt unittest.
      
      I420 is YUV bt.601 8 bits per channel with 420 subsampling.
      I010 is YUV bt.601 10 bits per channel with 420 subsampling.
      I = color space, bt.601.  The function does no color space
      conversion, so H420ToI010 is aliased to this function as well.
      0 = 420 subsampling.  The chroma channels are half width and half height.
      10 = 10 bits per channel, stored in the low 10 bits of 16 bit samples.
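
To illustrate the sample layout described above, here is a minimal C sketch that widens an 8 bit plane into 16 bit samples carrying 10 significant bits in their low bits. The names `widen_sample_8_to_10` and `widen_plane_8_to_10` and the bit-replication scaling are assumptions made for this example. They are not taken from libyuv, whose Convert8To16Row may scale differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a libyuv API: widen one 8 bit sample to 10 bits.
 * Shifting left by 2 and replicating the top 2 bits maps 0 to 0 and
 * 255 to 1023, so the full 10 bit range is used. */
static uint16_t widen_sample_8_to_10(uint8_t v) {
  return (uint16_t)(((unsigned)v << 2) | ((unsigned)v >> 6));
}

/* Widen one plane row by row. Strides are counted in samples of each
 * plane's own type and are assumed to be at least `width`. */
static void widen_plane_8_to_10(const uint8_t* src, int src_stride,
                                uint16_t* dst, int dst_stride,
                                int width, int height) {
  assert(src != NULL && dst != NULL);
  assert(width >= 0 && height >= 0);
  assert(src_stride >= width && dst_stride >= width);
  for (int y = 0; y < height; ++y) {
    const uint8_t* s = src + (size_t)y * (size_t)src_stride;
    uint16_t* d = dst + (size_t)y * (size_t)dst_stride;
    for (int x = 0; x < width; ++x) {
      d[x] = widen_sample_8_to_10(s[x]);
    }
  }
}
```

For a 420 image the same routine would be called three times: once for the Y plane at full size, and once each for the U and V planes at half width and half height (rounded up).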
      
      For SSSE3 version:
      out/Release/libyuv_unittest --gtest_filter=*LibYUVConvertTest.I420ToI010_Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
      [ RUN      ] LibYUVConvertTest.I420ToI010_Opt
      [       OK ] LibYUVConvertTest.I420ToI010_Opt (276 ms)
      
      Bug: libyuv:751
      Test: LibYUVConvertTest.I420ToI010_Opt
      Change-Id: I072876ee4fd74a2b74f459b628838bc808f9bdd2
      Reviewed-on: https://chromium-review.googlesource.com/846421
      Reviewed-by: Miguel Casas <mcasas@chromium.org>
      Commit-Queue: Frank Barchard <fbarchard@chromium.org>
    • Remove LIBYUV_SSSE3_ONLY and ARGBSHUFFLEROW_SSE2 · 140fc0a2
      Frank Barchard authored
      LIBYUV_SSSE3_ONLY was for functions that have both SSE2 and SSSE3 versions in builds that target SSSE3, where the SSE2 version will never be used.
      Remove the ARGBSHUFFLEROW_SSE2 implementation and rely on SSSE3.
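
For readers unfamiliar with the row function involved, the following portable C sketch shows what a 4 byte per pixel shuffle computes. It is a scalar reference written for this note, not libyuv code: the name `shuffle_row_4bpp` is hypothetical, and the ARGB to ABGR table assumes ARGB is stored as B, G, R, A in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: for every 4 byte pixel, output byte i is
 * taken from input byte shuffler[i]. Each shuffler entry must be 0..3. */
static void shuffle_row_4bpp(const uint8_t* src, uint8_t* dst,
                             const uint8_t shuffler[4], int width) {
  assert(src != NULL && dst != NULL && shuffler != NULL && width >= 0);
  for (int i = 0; i < 4; ++i) {
    assert(shuffler[i] < 4);
  }
  for (int x = 0; x < width; ++x) {
    const uint8_t* p = src + (size_t)x * 4;
    /* Read all four bytes before writing so src == dst also works. */
    uint8_t b0 = p[shuffler[0]];
    uint8_t b1 = p[shuffler[1]];
    uint8_t b2 = p[shuffler[2]];
    uint8_t b3 = p[shuffler[3]];
    uint8_t* q = dst + (size_t)x * 4;
    q[0] = b0;
    q[1] = b1;
    q[2] = b2;
    q[3] = b3;
  }
}
```

Under the stated byte-order assumption, swapping the first and third bytes with the table `{2, 1, 0, 3}` turns ARGB into ABGR. A byte-shuffle instruction can apply such a table to a whole vector at once, which is the kind of work the SSSE3 path handles.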
      
      Bug: libyuv:769
      Test: ~/intelsde/sde -p4 -- out/Release/libyuv_unittest --gtest_filter=LibYUVConvertTest.ARGBToABGR_Opt
      Change-Id: I7443f4d8ee3c6f47edd2cf1d5a1eb0f8d7a1eeeb
      Reviewed-on: https://chromium-review.googlesource.com/846541
      Reviewed-by: Weiyong Yao <braveyao@chromium.org>
      Reviewed-by: Frank Barchard <fbarchard@chromium.org>
  6. 15 Dec, 2017 1 commit
  7. 14 Dec, 2017 1 commit
  8. 11 Sep, 2017 1 commit
  9. 29 Aug, 2017 1 commit
  10. 17 Aug, 2017 1 commit
  11. 14 Aug, 2017 1 commit
  12. 08 Mar, 2017 1 commit
  13. 03 Mar, 2017 1 commit
  14. 23 Feb, 2017 1 commit
    • Add MSA optimized Interpolate/MergeUV/Misc functions · 45b176d1
      Manojkumar Bhosale authored
      BUG=libyuv:634
      
      Performance Gain (vs C auto-vectorized)
      InterpolateRow_MSA      - ~3.3x
      InterpolateRow_Any_MSA  - ~2.5x
      ARGBSetRow_MSA          - ~1.0x
      ARGBSetRow_Any_MSA      - ~1.0x
      ARGBToRGB24Row_MSA      - ~1.9x
      ARGBToRGB24Row_Any_MSA  - ~1.6x
      MergeUVRow_MSA          - ~1.6x
      MergeUVRow_Any_MSA      - ~1.2x
      
      Performance Gain (vs C non-vectorized)
      InterpolateRow_MSA      - ~11.3x
      InterpolateRow_Any_MSA  - ~ 7.9x
      ARGBSetRow_MSA          - ~ 6.2x
      ARGBSetRow_Any_MSA      - ~ 4.0x
      ARGBToRGB24Row_MSA      - ~ 9.9x
      ARGBToRGB24Row_Any_MSA  - ~ 8.4x
      MergeUVRow_MSA          - ~12.7x
      MergeUVRow_Any_MSA      - ~ 8.0x
      
      Change-Id: If8d60bd57f01fe95bc2fd26196466574195cc126
      Reviewed-on: https://chromium-review.googlesource.com/445817
      Reviewed-by: Frank Barchard <fbarchard@google.com>
      Commit-Queue: Frank Barchard <fbarchard@google.com>
  15. 14 Feb, 2017 1 commit
  16. 01 Feb, 2017 1 commit
    • Add MSA optimized ARGB/ABGR/BGRA/RGBA To Y/UV row functions · 54ce8f23
      Manojkumar Bhosale authored
      R=fbarchard@google.com
      BUG=libyuv:634
      
      Performance Gain (vs C auto-vectorized)
      ARGBToYJRow_MSA       - ~3.2x
      ARGBToYJRow_Any_MSA   - ~2.7x
      BGRAToYRow_MSA        - ~3.2x
      BGRAToYRow_Any_MSA    - ~2.7x
      ABGRToYRow_MSA        - ~3.2x
      ABGRToYRow_Any_MSA    - ~2.6x
      RGBAToYRow_MSA        - ~3.1x
      RGBAToYRow_Any_MSA    - ~2.7x
      ARGBToUVJRow_MSA      - ~5.5x
      ARGBToUVJRow_Any_MSA  - ~4.5x
      BGRAToUVRow_MSA       - ~2.1x
      BGRAToUVRow_Any_MSA   - ~2.0x
      ABGRToUVRow_MSA       - ~2.1x
      ABGRToUVRow_Any_MSA   - ~1.9x
      RGBAToUVRow_MSA       - ~2.2x
      RGBAToUVRow_Any_MSA   - ~1.9x
      
      Performance Gain (vs C non-vectorized)
      ARGBToYJRow_MSA       - ~10.9x
      ARGBToYJRow_Any_MSA   -  ~9.2x
      BGRAToYRow_MSA        - ~10.9x
      BGRAToYRow_Any_MSA    -  ~9.3x
      ABGRToYRow_MSA        - ~11.0x
      ABGRToYRow_Any_MSA    -  ~9.3x
      RGBAToYRow_MSA        - ~10.9x
      RGBAToYRow_Any_MSA    -  ~9.1x
      ARGBToUVJRow_MSA      - ~12.4x
      ARGBToUVJRow_Any_MSA  - ~10.5x
      BGRAToUVRow_MSA       -  ~4.7x
      BGRAToUVRow_Any_MSA   -  ~4.4x
      ABGRToUVRow_MSA       -  ~4.7x
      ABGRToUVRow_Any_MSA   -  ~4.5x
      RGBAToUVRow_MSA       -  ~4.8x
      RGBAToUVRow_Any_MSA   -  ~4.4x
      
      Review-Url: https://codereview.chromium.org/2641153003 .
  17. 18 Jan, 2017 1 commit
    • Add MSA optimized NV12/21 To RGB row functions · 09b8c971
      Manojkumar Bhosale authored
      R=fbarchard@google.com
      BUG=libyuv:634
      
      Performance Gain (vs C auto-vectorized)
      NV12ToARGBRow_MSA       - ~1.5x
      NV12ToARGBRow_Any_MSA   - ~1.4x
      NV12ToRGB565Row_MSA     - ~1.4x
      NV12ToRGB565Row_Any_MSA - ~1.4x
      NV21ToARGBRow_MSA       - ~1.5x
      NV21ToARGBRow_Any_MSA   - ~1.5x
      SobelRow_MSA            - ~4.3x
      SobelRow_Any_MSA        - ~3.4x
      SobelToPlaneRow_MSA     - ~8.0x
      SobelToPlaneRow_Any_MSA - ~4.7x
      SobelXYRow_MSA          - ~3.0x
      SobelXYRow_Any_MSA      - ~2.5x
      
      Performance Gain (vs C non-vectorized)
      NV12ToARGBRow_MSA       - ~6.5x
      NV12ToARGBRow_Any_MSA   - ~6.5x
      NV12ToRGB565Row_MSA     - ~6.2x
      NV12ToRGB565Row_Any_MSA - ~6.1x
      NV21ToARGBRow_MSA       - ~6.5x
      NV21ToARGBRow_Any_MSA   - ~6.5x
      SobelRow_MSA            - ~14.5x
      SobelRow_Any_MSA        - ~11.3x
      SobelToPlaneRow_MSA     - ~34.2x
      SobelToPlaneRow_Any_MSA - ~19.4x
      SobelXYRow_MSA          - ~11.1x
      SobelXYRow_Any_MSA      - ~9.1x
      
      Review-Url: https://codereview.chromium.org/2636483002 .
  18. 15 Dec, 2016 1 commit
    • Add MSA optimized ARGB Attenuate/RGB565/Shuffle/Shader/Gray/Sepia row functions · a899dea2
      Manojkumar Bhosale authored
      R=fbarchard@google.com
      BUG=libyuv:634
      
      Performance Gain (vs C vectorized)
      ARGBAttenuateRow_MSA          - ~1.1x
      ARGBAttenuateRow_Any_MSA      - ~1.1x
      ARGBToRGB565DitherRow_MSA     - ~6.4x
      ARGBToRGB565DitherRow_Any_MSA - ~6.2x
      ARGBShuffleRow_MSA            - ~5.1x
      ARGBShuffleRow_Any_MSA        - ~1.9x
      ARGBShadeRow_MSA              - ~1.1x
      ARGBGrayRow_MSA               - ~2.6x
      ARGBSepiaRow_MSA              - ~11.6x
      
      Performance Gain (vs C non-vectorized)
      ARGBAttenuateRow_MSA          - ~2.46x
      ARGBAttenuateRow_Any_MSA      - ~2.45x
      ARGBToRGB565DitherRow_MSA     - ~9.4x
      ARGBToRGB565DitherRow_Any_MSA - ~12.5x
      ARGBShuffleRow_MSA            - ~5.2x
      ARGBShuffleRow_Any_MSA        - ~1.9x
      ARGBShadeRow_MSA              - ~4.3x
      ARGBGrayRow_MSA               - ~10.5x
      ARGBSepiaRow_MSA              - ~12.2x
      
      Review-Url: https://codereview.chromium.org/2559693002 .
  19. 02 Dec, 2016 1 commit
    • Add MSA optimized ARGB Multiply/Add/Subtract row functions · 83f460be
      Manojkumar Bhosale authored
      R=fbarchard@google.com
      BUG=libyuv:634
      
      Performance Gain (vs C vectorized)
      ARGBMultiplyRow_MSA       - 1.4x
      ARGBAddRow_MSA            - 8.6x
      ARGBSubtractRow_MSA       - 8.6x
      
      ARGBMultiplyRow_Any_MSA   - 1.35x
      ARGBAddRow_Any_MSA        - 7.3x
      ARGBSubtractRow_Any_MSA   - 7.2x
      
      Performance Gain (vs C non-vectorized)
      ARGBMultiplyRow_MSA       - 4.4x
      ARGBAddRow_MSA            - 27x
      ARGBSubtractRow_MSA       - 22x
      
      ARGBMultiplyRow_Any_MSA   - 3.5x
      ARGBAddRow_Any_MSA        - 23x
      ARGBSubtractRow_Any_MSA   - 18x
      
      Review URL: https://codereview.chromium.org/2529983002 .
  20. 08 Nov, 2016 1 commit
  21. 24 Oct, 2016 1 commit
    • Add MSA optimized I422ToARGBRow_MSA and I422ToRGBARow_MSA functions · f5d5bd88
      Frank Barchard authored
      R=fbarchard@google.com
      BUG=libyuv:634
      
      Performance Gains (vs C vectorized)
      
      I422ToARGBRow_MSA     : ~1.6x
      I422ToRGBARow_MSA     : ~1.6x
      
      I422ToARGBRow_Any_MSA : ~1.58x
      I422ToRGBARow_Any_MSA : ~1.6x
      
      Performance Gains (vs C non-vectorized)
      
      I422ToARGBRow_MSA     : ~7x
      I422ToRGBARow_MSA     : ~7x
      
      I422ToARGBRow_Any_MSA : ~6.9x
      I422ToRGBARow_Any_MSA : ~6.8x
      
      Regarding performance measurement: we created standalone tests that pass row data from a filled 1920x1080 buffer to both the C and MSA functions. Each test runs N iterations to get more accurate timings of C vs MSA.
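
The measurement approach described above could be sketched roughly as follows. This is not the authors' test code: the `RowFn` signature, `copy_row` stand-in, and `time_row_fn` helper are hypothetical names introduced for illustration, and `clock()` is just one possible timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical row function signature used only for this sketch. */
typedef void (*RowFn)(const uint8_t* src, uint8_t* dst, int width);

/* Trivial stand-in for a C or SIMD row function under test. */
static void copy_row(const uint8_t* src, uint8_t* dst, int width) {
  memcpy(dst, src, (size_t)width);
}

/* Call fn on every row of a width x height buffer, `iterations` times,
 * and return the elapsed CPU time in seconds. */
static double time_row_fn(RowFn fn, const uint8_t* src, uint8_t* dst,
                          int width, int height, int iterations) {
  assert(fn != NULL && src != NULL && dst != NULL);
  assert(width > 0 && height > 0 && iterations > 0);
  clock_t start = clock();
  for (int i = 0; i < iterations; ++i) {
    for (int y = 0; y < height; ++y) {
      fn(src + (size_t)y * (size_t)width, dst + (size_t)y * (size_t)width, width);
    }
  }
  return (double)(clock() - start) / (double)CLOCKS_PER_SEC;
}
```

Timing the C version and the optimized version over the same buffer with the same iteration count, then dividing the first time by the second, gives a speedup factor of the kind reported in the tables above.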
      
      Review URL: https://codereview.chromium.org/2430313005 .
  22. 21 Oct, 2016 1 commit
    • scale by 1 for neon implemented · 451af5e9
      Frank Barchard authored
      void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
        asm volatile (
        "1:                                          \n"
          MEMACCESS(0)
          "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
          "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
          "uxtl       v2.4s, v1.4h                   \n"  // 8 ints
          "uxtl2      v1.4s, v1.8h                   \n"
          "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
          "scvtf      v1.4s, v1.4s                   \n"
          "fcvtn      v4.4h, v2.4s                   \n"  // 8 half floats
          "fcvtn2     v4.8h, v1.4s                   \n"
          MEMACCESS(1)
          "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
          "b.gt       1b                             \n"
        : "+r"(src),    // %0
          "+r"(dst),    // %1
          "+r"(width)   // %2
        :
        : "cc", "memory", "v1", "v2", "v4"
        );
      }
      
      void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
        asm volatile (
        "1:                                          \n"
          MEMACCESS(0)
          "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
          "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
          "uxtl       v2.4s, v1.4h                   \n"  // 8 ints
          "uxtl2      v1.4s, v1.8h                   \n"
          "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
          "scvtf      v1.4s, v1.4s                   \n"
          "fmul       v2.4s, v2.4s, %3.s[0]          \n"  // adjust exponent
          "fmul       v1.4s, v1.4s, %3.s[0]          \n"
          "uqshrn     v4.4h, v2.4s, #13              \n"  // isolate halffloat
          "uqshrn2    v4.8h, v1.4s, #13              \n"
          MEMACCESS(1)
          "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
          "b.gt       1b                             \n"
        : "+r"(src),    // %0
          "+r"(dst),    // %1
          "+r"(width)   // %2
        : "w"(scale * 1.9259299444e-34f)    // %3
        : "cc", "memory", "v1", "v2", "v4"
        );
      }
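
The constant 1.9259299444e-34f in HalfFloatRow_NEON equals 2^-112, the gap between the float exponent bias (127) and the half float exponent bias (15). The scalar C sketch below mirrors the multiply-then-shift idea for one sample. It is an illustration only: the name `uint16_to_half_bits` is hypothetical, and it assumes IEEE 754 binary32 floats, a non-negative scale, and truncation of the low mantissa bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar equivalent of one lane of the NEON loop.
 * Multiplying by 2^-112 rebiases the float exponent to the half float
 * bias; shifting the float bits right by 13 then drops the extra
 * mantissa bits (23 - 10), leaving the half float bit pattern. */
static uint16_t uint16_to_half_bits(uint16_t v, float scale) {
  assert(scale >= 0.0f);
  float f = (float)v * (scale * 1.9259299444e-34f);
  uint32_t bits;
  memcpy(&bits, &f, sizeof(bits)); /* reinterpret the float as raw bits */
  uint32_t h = bits >> 13;
  /* Saturate to 16 bits, similar in spirit to the uqshrn narrowing. */
  return (uint16_t)(h > 0xFFFFu ? 0xFFFFu : h);
}
```

With a scale of 1.0f, an input of 1 yields 0x3C00, which is the half float encoding of 1.0. When the scale is exactly 1, the multiply is unnecessary, which is why HalfFloat1Row_NEON can use the direct float to half conversion instructions instead.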
      
      TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
      BUG=libyuv:560
      R=hubbe@chromium.org
      
      Review URL: https://codereview.chromium.org/2430313008 .
  23. 20 Oct, 2016 1 commit
  24. 15 Oct, 2016 1 commit
  25. 14 Oct, 2016 1 commit
  26. 13 Oct, 2016 1 commit
  27. 08 Oct, 2016 2 commits
  28. 07 Oct, 2016 1 commit
  29. 06 Oct, 2016 1 commit
    • YUY2ToI422_Any_Neon clean up to not require 16 pixels · 3b88a19a
      Frank Barchard authored
      YUY2ToI422_Any_Neon previously required 16 pixels and duplicated
      the last pixel.  The replication is no longer necessary after a previous
      change to treat YUY2 as 4 byte macro pixels.
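
To show why macro pixel handling removes the need to duplicate the last pixel, here is a scalar C sketch that splits one YUY2 row into Y, U and V rows. It is not libyuv code: the name `yuy2_row_to_i422` is hypothetical, and the sketch assumes the byte order Y0 U Y1 V and a source row holding (width + 1) / 2 complete 4 byte macro pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference. Each 4 byte macro pixel carries two luma
 * samples and one shared U, V pair, so odd widths are handled by reading a
 * final macro pixel and ignoring its second luma sample. */
static void yuy2_row_to_i422(const uint8_t* src_yuy2, uint8_t* dst_y,
                             uint8_t* dst_u, uint8_t* dst_v, int width) {
  assert(src_yuy2 != NULL && dst_y != NULL);
  assert(dst_u != NULL && dst_v != NULL && width >= 0);
  int macro_pixels = (width + 1) / 2;
  for (int i = 0; i < macro_pixels; ++i) {
    const uint8_t* mp = src_yuy2 + (size_t)i * 4;
    dst_y[2 * i] = mp[0];
    if (2 * i + 1 < width) { /* an odd width has no second luma sample */
      dst_y[2 * i + 1] = mp[2];
    }
    dst_u[i] = mp[1];
    dst_v[i] = mp[3];
  }
}
```

A width of 17, as used in the test command of this commit, would read 9 macro pixels and write 17 luma samples plus 9 samples in each chroma row.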
      
      TBR=harryjin@google.com
      BUG=libyuv:648
      TESTED=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*YUY2ToI422* -a "--libyuv_width=17 --libyuv_height=7 --libyuv_repeat=999 --libyuv_flags=1"
      
      Review URL: https://codereview.chromium.org/2399143002 .
  30. 03 Oct, 2016 1 commit
  31. 30 Sep, 2016 1 commit
  32. 29 Sep, 2016 1 commit
  33. 26 Sep, 2016 1 commit
  34. 22 Sep, 2016 1 commit
  35. 24 Aug, 2016 2 commits
  36. 07 Jun, 2016 1 commit
    • ARGBExtractAlpha 16 pixels at a time for ARM · 65460962
      Frank Barchard authored
      arm64   8     TestARGBExtractAlpha (10019 ms) <- original 64 bit code
      arm64   8 x2  TestARGBExtractAlpha (7639 ms)
      arm64   16    TestARGBExtractAlpha (7369 ms) <- new 64 bit code
      thumb32 8     TestARGBExtractAlpha (9505 ms) <- original 32 bit code
      thumb32 8 x2  TestARGBExtractAlpha (7400 ms)
      thumb32 8 x2i TestARGBExtractAlpha (7266 ms) <- new 32 bit code
      arm32   8     TestARGBExtractAlpha (10002 ms)
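
For context on what is being timed, the following scalar C sketch extracts the alpha channel from a row of 4 byte pixels, with a main loop that handles 16 pixels per iteration and a tail loop for the remainder. It is an illustration rather than libyuv code: the name `extract_alpha_row` is hypothetical, and it assumes alpha is the last byte of each pixel (B, G, R, A in memory).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for alpha extraction. */
static void extract_alpha_row(const uint8_t* src_argb, uint8_t* dst_a,
                              int width) {
  assert(src_argb != NULL && dst_a != NULL && width >= 0);
  int x = 0;
  /* Main loop: 16 pixels per iteration, mirroring a wider SIMD loop. */
  for (; x + 16 <= width; x += 16) {
    for (int i = 0; i < 16; ++i) {
      dst_a[x + i] = src_argb[(size_t)(x + i) * 4 + 3];
    }
  }
  /* Tail loop: remaining pixels one at a time. */
  for (; x < width; ++x) {
    dst_a[x] = src_argb[(size_t)x * 4 + 3];
  }
}
```

In a SIMD version, processing more pixels per iteration reduces loop overhead per pixel, which is consistent with the shorter times reported for the 16 pixel and x2 variants above.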
      
      BUG=libyuv:572
      TESTED=local test on nexus 9
      R=harryjin@google.com, wangcheng@google.com
      
      Review URL: https://codereview.chromium.org/2035573002 .
  37. 28 May, 2016 1 commit