1. 18 Jan, 2017 1 commit
    • Add MSA optimized NV12/21 To RGB row functions · 09b8c971
      Manojkumar Bhosale authored
      R=fbarchard@google.com
      BUG=libyuv:634
      
      Performance Gain (vs C auto-vectorized)
      NV12ToARGBRow_MSA       - ~1.5x
      NV12ToARGBRow_Any_MSA   - ~1.4x
      NV12ToRGB565Row_MSA     - ~1.4x
      NV12ToRGB565Row_Any_MSA - ~1.4x
      NV21ToARGBRow_MSA       - ~1.5x
      NV21ToARGBRow_Any_MSA   - ~1.5x
      SobelRow_MSA            - ~4.3x
      SobelRow_Any_MSA        - ~3.4x
      SobelToPlaneRow_MSA     - ~8.0x
      SobelToPlaneRow_Any_MSA - ~4.7x
      SobelXYRow_MSA          - ~3.0x
      SobelXYRow_Any_MSA      - ~2.5x
      
      Performance Gain (vs C non-vectorized)
      NV12ToARGBRow_MSA       - ~6.5x
      NV12ToARGBRow_Any_MSA   - ~6.5x
      NV12ToRGB565Row_MSA     - ~6.2x
      NV12ToRGB565Row_Any_MSA - ~6.1x
      NV21ToARGBRow_MSA       - ~6.5x
      NV21ToARGBRow_Any_MSA   - ~6.5x
      SobelRow_MSA            - ~14.5x
      SobelRow_Any_MSA        - ~11.3x
      SobelToPlaneRow_MSA     - ~34.2x
      SobelToPlaneRow_Any_MSA - ~19.4x
      SobelXYRow_MSA          - ~11.1x
      SobelXYRow_Any_MSA      - ~9.1x
      
      Review-Url: https://codereview.chromium.org/2636483002 .
  2. 13 Jan, 2017 1 commit
    • Add MSA optimized RAW/RGB/ARGB to ARGB/Y/UV row functions · 7c64163f
      Manojkumar Bhosale authored
      R=fbarchard@google.com
      BUG=libyuv:634
      
      Performance Gain (vs C vectorized)
      ARGB1555ToARGBRow_MSA     - 1.85
      ARGB1555ToARGBRow_Any_MSA - 1.82
      RGB565ToARGBRow_MSA       - 2.14
      RGB565ToARGBRow_Any_MSA   - 2.08
      RGB24ToARGBRow_MSA        - 8.57
      RGB24ToARGBRow_Any_MSA    - 7.42
      RAWToARGBRow_MSA          - 8.57
      RAWToARGBRow_Any_MSA      - 7.42
      ARGB1555ToYRow_MSA        - 2.60
      ARGB1555ToYRow_Any_MSA    - 2.47
      RGB565ToYRow_MSA          - 2.45
      RGB565ToYRow_Any_MSA      - 2.33
      RGB24ToYRow_MSA           - 2.23
      RGB24ToYRow_Any_MSA       - 2.01
      RAWToYRow_MSA             - 2.25
      RAWToYRow_Any_MSA         - 2.02
      ARGB1555ToUVRow_MSA       - 1.40
      ARGB1555ToUVRow_Any_MSA   - 1.37
      RGB565ToUVRow_MSA         - 1.68
      RGB565ToUVRow_Any_MSA     - 1.63
      RGB24ToUVRow_MSA          - 3.02
      RGB24ToUVRow_Any_MSA      - 2.87
      RAWToUVRow_MSA            - 3.04
      RAWToUVRow_Any_MSA        - 2.85
      
      Performance Gain (vs C non-vectorized)
      ARGB1555ToARGBRow_MSA     - 4.66
      ARGB1555ToARGBRow_Any_MSA - 4.45
      RGB565ToARGBRow_MSA       - 5.58
      RGB565ToARGBRow_Any_MSA   - 5.34
      RGB24ToARGBRow_MSA        - 8.57
      RGB24ToARGBRow_Any_MSA    - 7.42
      RAWToARGBRow_MSA          - 8.57
      RAWToARGBRow_Any_MSA      - 7.42
      ARGB1555ToYRow_MSA        - 6.38
      ARGB1555ToYRow_Any_MSA    - 5.98
      RGB565ToYRow_MSA          - 6.42
      RGB565ToYRow_Any_MSA      - 6.05
      RGB24ToYRow_MSA           - 7.87
      RGB24ToYRow_Any_MSA       - 7.01
      RAWToYRow_MSA             - 7.98
      RAWToYRow_Any_MSA         - 7.01
      ARGB1555ToUVRow_MSA       - 5.39
      ARGB1555ToUVRow_Any_MSA   - 5.06
      RGB565ToUVRow_MSA         - 6.39
      RGB565ToUVRow_Any_MSA     - 5.90
      RGB24ToUVRow_MSA          - 3.04
      RGB24ToUVRow_Any_MSA      - 2.87
      RAWToUVRow_MSA            - 3.04
      RAWToUVRow_Any_MSA        - 2.88
      
      Review-Url: https://codereview.chromium.org/2600713002 .
  3. 11 Jan, 2017 1 commit
    • Libyuv MIPS DSPR2 optimizations. · 000d2fa9
      Frank Barchard authored
      Optimized functions:
      
      I444ToARGBRow_DSPR2
      I422ToARGB4444Row_DSPR2
      I422ToARGB1555Row_DSPR2
      NV12ToARGBRow_DSPR2
      BGRAToUVRow_DSPR2
      BGRAToYRow_DSPR2
      ABGRToUVRow_DSPR2
      ARGBToYRow_DSPR2
      ABGRToYRow_DSPR2
      RGBAToUVRow_DSPR2
      RGBAToYRow_DSPR2
      ARGBToUVRow_DSPR2
      RGB24ToARGBRow_DSPR2
      RAWToARGBRow_DSPR2
      RGB565ToARGBRow_DSPR2
      ARGB1555ToARGBRow_DSPR2
      ARGB4444ToARGBRow_DSPR2
      ScaleAddRow_DSPR2
      
      Bug-fixes in functions:
      
      ScaleRowDown2_DSPR2
      ScaleRowDown4_DSPR2
      
      BUG=
      
      Review-Url: https://codereview.chromium.org/2626123003 .
  4. 15 Dec, 2016 1 commit
    • Add MSA optimized ARGB Attenuate/RGB565/Shuffle/Shader/Gray/Sepia row functions · a899dea2
      Manojkumar Bhosale authored
      R=fbarchard@google.com
      BUG=libyuv:634
      
      Performance Gain (vs C vectorized)
      ARGBAttenuateRow_MSA          - ~1.1x
      ARGBAttenuateRow_Any_MSA      - ~1.1x
      ARGBToRGB565DitherRow_MSA     - ~6.4x
      ARGBToRGB565DitherRow_Any_MSA - ~6.2x
      ARGBShuffleRow_MSA            - ~5.1x
      ARGBShuffleRow_Any_MSA        - ~1.9x
      ARGBShadeRow_MSA              - ~1.1x
      ARGBGrayRow_MSA               - ~2.6x
      ARGBSepiaRow_MSA              - ~11.6x
      
      Performance Gain (vs C non-vectorized)
      ARGBAttenuateRow_MSA          - ~2.46x
      ARGBAttenuateRow_Any_MSA      - ~2.45x
      ARGBToRGB565DitherRow_MSA     - ~9.4x
      ARGBToRGB565DitherRow_Any_MSA - ~12.5x
      ARGBShuffleRow_MSA            - ~5.2x
      ARGBShuffleRow_Any_MSA        - ~1.9x
      ARGBShadeRow_MSA              - ~4.3x
      ARGBGrayRow_MSA               - ~10.5x
      ARGBSepiaRow_MSA              - ~12.2x
      
      Review-Url: https://codereview.chromium.org/2559693002 .
  5. 02 Dec, 2016 1 commit
    • Add MSA optimized ARGB Multiply/Add/Subtract row functions · 83f460be
      Manojkumar Bhosale authored
      R=fbarchard@google.com
      BUG=libyuv:634
      
      Performance Gain (vs C vectorized)
      ARGBMultiplyRow_MSA       - 1.4x
      ARGBAddRow_MSA            - 8.6x
      ARGBSubtractRow_MSA       - 8.6x
      
      ARGBMultiplyRow_Any_MSA   - 1.35x
      ARGBAddRow_Any_MSA        - 7.3x
      ARGBSubtractRow_Any_MSA   - 7.2x
      
      Performance Gain (vs C non-vectorized)
      ARGBMultiplyRow_MSA       - 4.4x
      ARGBAddRow_MSA            - 27x
      ARGBSubtractRow_MSA       - 22x
      
      ARGBMultiplyRow_Any_MSA   - 3.5x
      ARGBAddRow_Any_MSA        - 23x
      ARGBSubtractRow_Any_MSA   - 18x
      
      Review URL: https://codereview.chromium.org/2529983002 .
  6. 22 Nov, 2016 1 commit
    • Add MSA optimized ARGBToRGB565Row_MSA, ARGBToARGB1555Row_MSA,… · da0c29da
      Frank Barchard authored
      Add MSA optimized ARGBToRGB565Row_MSA, ARGBToARGB1555Row_MSA, ARGBToARGB4444Row_MSA, ARGBToUV444Row_MSA functions
      
      R=fbarchard@google.com
      BUG=libyuv:634
      
      Performance Gain (vs C vectorized)
      ARGBToRGB565Row_MSA       - ~1.6x
      ARGBToRGB565Row_Any_MSA   - ~1.6x
      ARGBToARGB1555Row_MSA     - ~1.3x
      ARGBToARGB1555Row_Any_MSA - ~1.3x
      ARGBToARGB4444Row_MSA     - ~3.8x
      ARGBToARGB4444Row_Any_MSA - ~3.8x
      ARGBToUV444Row_MSA        - ~2.4x
      ARGBToUV444Row_Any_MSA    - ~2.4x
      
      Performance Gain (vs C non-vectorized)
      ARGBToRGB565Row_MSA       - ~2.8x
      ARGBToRGB565Row_Any_MSA   - ~2.8x
      ARGBToARGB1555Row_MSA     - ~2.2x
      ARGBToARGB1555Row_Any_MSA - ~2.2x
      ARGBToARGB4444Row_MSA     - ~6.8x
      ARGBToARGB4444Row_Any_MSA - ~6.6x
      ARGBToUV444Row_MSA        - ~6.7x
      ARGBToUV444Row_Any_MSA    - ~6.7x
      
      Review URL: https://codereview.chromium.org/2520003004 .
  7. 18 Nov, 2016 1 commit
  8. 08 Nov, 2016 1 commit
  9. 27 Oct, 2016 1 commit
  10. 26 Oct, 2016 1 commit
  11. 24 Oct, 2016 1 commit
    • Add MSA optimized I422ToARGBRow_MSA and I422ToRGBARow_MSA functions · f5d5bd88
      Frank Barchard authored
      R=fbarchard@google.com
      BUG=libyuv:634
      
      Performance Gains :- (vs C vectorized)
      
      I422ToARGBRow_MSA     : ~1.6x
      I422ToRGBARow_MSA     : ~1.6x
      
      I422ToARGBRow_Any_MSA : ~1.58x
      I422ToRGBARow_Any_MSA : ~1.6x
      
      Performance Gains :- (vs C non-vectorized)
      
      I422ToARGBRow_MSA     : ~7x
      I422ToRGBARow_MSA     : ~7x
      
      I422ToARGBRow_Any_MSA : ~6.9x
      I422ToRGBARow_Any_MSA : ~6.8x
      
      Regarding performance measurement: we created standalone tests that pass row data from a filled 1920x1080 buffer to both the C and MSA functions. Each test runs N such iterations to get more accurate C vs MSA timings.
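The measurement approach described above can be sketched roughly as follows. This is only an illustration: `RowFunc`, `CopyRow_Ref`, `CopyRow_Opt`, and `TimeRows` are hypothetical names that are not part of libyuv, and the two copy functions merely stand in for a C row function and its MSA counterpart.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical signature shared by the two implementations under test.
using RowFunc = void (*)(const uint8_t* src, uint8_t* dst, int width);

// Stand-in for the plain C row function.
static void CopyRow_Ref(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) dst[x] = src[x];
}

// Stand-in for the optimized row function (a real test would call SIMD code).
static void CopyRow_Opt(const uint8_t* src, uint8_t* dst, int width) {
  std::memcpy(dst, src, static_cast<size_t>(width));
}

// Feeds every row of a width x height buffer to `fn`, repeats that
// `iterations` times, and returns the elapsed time in milliseconds.
static double TimeRows(RowFunc fn, const std::vector<uint8_t>& src,
                       std::vector<uint8_t>& dst, int width, int height,
                       int iterations) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    for (int y = 0; y < height; ++y) {
      const size_t offset = static_cast<size_t>(y) * width;
      fn(&src[offset], &dst[offset], width);
    }
  }
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Dividing the reference time by the optimized time for the same buffer and iteration count gives a gain figure of the kind listed in the tables above; comparing the two output buffers also confirms that both functions produce the same result.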
      
      Review URL: https://codereview.chromium.org/2430313005 .
  12. 21 Oct, 2016 1 commit
    • scale by 1 for neon implemented · 451af5e9
      Frank Barchard authored
      void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
        asm volatile (
        "1:                                          \n"
          MEMACCESS(0)
          "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
          "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
          "uxtl       v2.4s, v1.4h                   \n"  // 8 int's
          "uxtl2      v1.4s, v1.8h                   \n"
          "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
          "scvtf      v1.4s, v1.4s                   \n"
          "fcvtn      v4.4h, v2.4s                   \n"  // 8 half floats
          "fcvtn2     v4.8h, v1.4s                   \n"
         MEMACCESS(1)
          "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
          "b.gt       1b                             \n"
        : "+r"(src),    // %0
          "+r"(dst),    // %1
          "+r"(width)   // %2
        :
        : "cc", "memory", "v1", "v2", "v4"
        );
      }
      
      void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
        asm volatile (
        "1:                                          \n"
          MEMACCESS(0)
          "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
          "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
          "uxtl       v2.4s, v1.4h                   \n"  // 8 int's
          "uxtl2      v1.4s, v1.8h                   \n"
          "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
          "scvtf      v1.4s, v1.4s                   \n"
          "fmul       v2.4s, v2.4s, %3.s[0]          \n"  // adjust exponent
          "fmul       v1.4s, v1.4s, %3.s[0]          \n"
          "uqshrn     v4.4h, v2.4s, #13              \n"  // isolate halffloat
          "uqshrn2    v4.8h, v1.4s, #13              \n"
         MEMACCESS(1)
          "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
          "b.gt       1b                             \n"
        : "+r"(src),    // %0
          "+r"(dst),    // %1
          "+r"(width)   // %2
        : "w"(scale * 1.9259299444e-34f)    // %3
        : "cc", "memory", "v1", "v2", "v4"
        );
      }
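As a reading aid, the multiply-then-shift path of HalfFloatRow_NEON can be modeled in scalar C++. The constant 1.9259299444e-34f is 2^-112, the difference between the float exponent bias (127) and the half-float bias (15). After that multiply, shifting the float's bit pattern right by 13 leaves the half-float bits, truncating the low mantissa bits. The function names below are hypothetical, and this is a sketch of the idea, not libyuv's C implementation.

```cpp
#include <cstdint>
#include <cstring>

// Scalar model of one lane: convert, rescale, then take the float bit
// pattern shifted right by 13 with unsigned saturation (like uqshrn).
static uint16_t HalfFloatFromUint16(uint16_t v, float scale) {
  const float kRebias = 1.9259299444e-34f;  // 2^-112
  const float f = static_cast<float>(v) * (scale * kRebias);
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));  // reinterpret float as raw bits
  const uint32_t half = bits >> 13;      // drops 13 mantissa bits (truncates)
  return static_cast<uint16_t>(half > 0xFFFFu ? 0xFFFFu : half);
}

// Row version with the same argument order as the NEON function.
void HalfFloatRow_Sketch(const uint16_t* src, uint16_t* dst, float scale,
                         int width) {
  for (int i = 0; i < width; ++i) {
    dst[i] = HalfFloatFromUint16(src[i], scale);
  }
}
```

With a scale of 1, an input of 1 maps to 0x3C00, the half-float encoding of 1.0.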
      
      TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
      BUG=libyuv:560
      R=hubbe@chromium.org
      
      Review URL: https://codereview.chromium.org/2430313008 .
  13. 20 Oct, 2016 1 commit
  14. 19 Oct, 2016 1 commit
  15. 15 Oct, 2016 1 commit
  16. 13 Oct, 2016 1 commit
  17. 11 Oct, 2016 1 commit
    • Remove I411 support. · d363ea65
      Frank Barchard authored
      YUV 411 is a very uncommon format.  Remove support.
      
      Update documentation to reflect that 411 is deprecated.
      
      Simplify the YUV tests to only test with the new side-by-side YUV, but keep the old 3-plane test around behind a macro for now.
      
      BUG=libyuv:645
      R=kjellander@chromium.org
      
      Review URL: https://codereview.chromium.org/2406123002 .
  18. 08 Oct, 2016 1 commit
  19. 07 Oct, 2016 1 commit
  20. 06 Oct, 2016 1 commit
    • YUY2ToI422_Any_Neon clean up to not require 16 pixels · 3b88a19a
      Frank Barchard authored
      YUY2ToI422_Any_Neon previously required 16 pixels and duplicated
      the last pixel.  The replication is no longer necessary after a previous
      change to treat YUY2 as 4 byte macro pixels.
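For reference, a YUY2 macro pixel is 4 bytes (Y0, U, Y1, V) covering two horizontally adjacent pixels. A scalar split into I422 planes might look like the sketch below. The function name is hypothetical, and the handling of an odd width (the last macro pixel contributes only its first Y) is an assumption for illustration, not a statement about libyuv's code.

```cpp
#include <cstdint>

// Splits one YUY2 row into separate Y, U and V rows (I422 layout).
// src_yuy2 must hold (width + 1) / 2 macro pixels of 4 bytes each.
void YUY2ToI422Row_Sketch(const uint8_t* src_yuy2, uint8_t* dst_y,
                          uint8_t* dst_u, uint8_t* dst_v, int width) {
  for (int x = 0; x < width; x += 2) {
    dst_y[x] = src_yuy2[0];        // Y0
    dst_u[x / 2] = src_yuy2[1];    // U shared by both pixels
    if (x + 1 < width) {
      dst_y[x + 1] = src_yuy2[2];  // Y1 (absent for an odd final pixel)
    }
    dst_v[x / 2] = src_yuy2[3];    // V shared by both pixels
    src_yuy2 += 4;                 // advance one macro pixel
  }
}
```

Because each step consumes a whole macro pixel, no source pixel has to be replicated to pad the row.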
      
      TBR=harryjin@google.com
      BUG=libyuv:648
      TESTED=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*YUY2ToI422* -a "--libyuv_width=17 --libyuv_height=7 --libyuv_repeat=999 --libyuv_flags=1"
      
      Review URL: https://codereview.chromium.org/2399143002 .
  21. 04 Oct, 2016 1 commit
  22. 30 Sep, 2016 1 commit
  23. 29 Sep, 2016 1 commit
  24. 26 Sep, 2016 1 commit
  25. 22 Sep, 2016 1 commit
  26. 07 Jun, 2016 1 commit
    • ARGBExtractAlpha 16 pixels at a time for ARM · 65460962
      Frank Barchard authored
      arm64   8     TestARGBExtractAlpha (10019 ms) <- original 64 bit code
      arm64   8 x2  TestARGBExtractAlpha (7639 ms)
      arm64   16    TestARGBExtractAlpha (7369 ms) <- new 64 bit code
      thumb32 8     TestARGBExtractAlpha (9505 ms) <- original 32 bit code
      thumb32 8 x2  TestARGBExtractAlpha (7400 ms)
      thumb32 8 x2i TestARGBExtractAlpha (7266 ms) <- new 32 bit code
      arm32   8     TestARGBExtractAlpha (10002 ms)
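For context, the operation being timed copies the alpha byte of each 4-byte ARGB pixel into a single-channel row. The scalar sketch below assumes libyuv's ARGB memory order of B, G, R, A, and steps 16 pixels per main-loop iteration to mirror the step size in the commit title. The function name is hypothetical and the NEON code itself is not reproduced here.

```cpp
#include <cstdint>

// Copies the alpha channel (byte 3 of each pixel) out of an ARGB row.
void ARGBExtractAlphaRow_Sketch(const uint8_t* src_argb, uint8_t* dst_a,
                                int width) {
  int x = 0;
  // Main loop: 16 pixels (64 source bytes) per iteration.
  for (; x + 16 <= width; x += 16) {
    for (int i = 0; i < 16; ++i) {
      dst_a[x + i] = src_argb[(x + i) * 4 + 3];
    }
  }
  // Tail: any remaining pixels, one at a time.
  for (; x < width; ++x) {
    dst_a[x] = src_argb[x * 4 + 3];
  }
}
```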
      
      BUG=libyuv:572
      TESTED=local test on nexus 9
      R=harryjin@google.com, wangcheng@google.com
      
      Review URL: https://codereview.chromium.org/2035573002 .
  27. 26 May, 2016 1 commit
  28. 23 May, 2016 1 commit
  29. 18 Feb, 2016 1 commit
  30. 05 Feb, 2016 1 commit
  31. 13 Jan, 2016 1 commit
  32. 17 Dec, 2015 1 commit
  33. 09 Dec, 2015 1 commit
    • BlendPlane any width. · a2ea9056
      Frank Barchard authored
      Benchmark
      out\release\libyuv_unittest --libyuv_width=1279 --libyuv_height=719 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
      
      Was
      I420Blend_Any (2321 ms)
      I420Blend_Unaligned (1684 ms)
      I420Blend_Opt (1675 ms)
      I420Blend_Invert (1653 ms)
      BlendPlane_Invert (1556 ms)
      BlendPlane_Any (1552 ms)
      BlendPlane_Unaligned (1548 ms)
      BlendPlane_Opt (1535 ms)
      ARGBBlend_Unaligned (659 ms)
      ARGBBlend_Any (596 ms)
      ARGBBlend_Invert (591 ms)
      ARGBBlend_Opt (508 ms)
      BlendPlaneRow_Unaligned (186 ms)
      BlendPlaneRow_Opt (171 ms)
      
      Now
      ARGBBlend_Any (621 ms)
      ARGBBlend_Unaligned (585 ms)
      ARGBBlend_Invert (564 ms)
      ARGBBlend_Opt (512 ms)
      I420Blend_Unaligned (347 ms)
      I420Blend_Invert (345 ms)
      I420Blend_Any (337 ms)
      I420Blend_Opt (327 ms)
      BlendPlane_Unaligned (187 ms)
      BlendPlaneRow_Unaligned (187 ms)
      BlendPlane_Invert (186 ms)
      BlendPlane_Any (186 ms)
      BlendPlaneRow_Opt (173 ms)
      BlendPlane_Opt (171 ms)
      
      which is comparable to aligned case
      out\release\libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
      ARGBBlend_Any (625 ms)
      ARGBBlend_Unaligned (602 ms)
      ARGBBlend_Invert (508 ms)
      ARGBBlend_Opt (506 ms)
      I420Blend_Any (353 ms)
      I420Blend_Unaligned (322 ms)
      I420Blend_Invert (304 ms)
      I420Blend_Opt (301 ms)
      BlendPlaneRow_Unaligned (188 ms)
      BlendPlane_Unaligned (186 ms)
      BlendPlane_Invert (185 ms)
      BlendPlane_Any (184 ms)
      BlendPlaneRow_Opt (173 ms)
      BlendPlane_Opt (169 ms)
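One way to support any width, such as the 1279-pixel rows in the benchmark above, is to run a fixed-step kernel on the largest multiple of its step and finish the remainder separately. The sketch below illustrates that idea only. The names, the 8-pixel step, the scalar tail, and the blend formula are all assumptions for illustration and may differ from what libyuv does.

```cpp
#include <cstdint>

// Blends one sample: alpha 255 selects fg, alpha 0 selects bg.
static inline uint8_t BlendPixel(uint8_t fg, uint8_t bg, uint8_t alpha) {
  return static_cast<uint8_t>((alpha * fg + (255 - alpha) * bg + 255) >> 8);
}

// Stand-in for a SIMD kernel that requires width to be a multiple of 8.
static void BlendPlaneRow_8(const uint8_t* src0, const uint8_t* src1,
                            const uint8_t* alpha, uint8_t* dst, int width) {
  for (int x = 0; x < width; x += 8) {
    for (int i = 0; i < 8; ++i) {
      dst[x + i] = BlendPixel(src0[x + i], src1[x + i], alpha[x + i]);
    }
  }
}

// Any-width wrapper: kernel on the aligned part, scalar code on the tail.
void BlendPlaneRow_AnySketch(const uint8_t* src0, const uint8_t* src1,
                             const uint8_t* alpha, uint8_t* dst, int width) {
  const int n = width & ~7;  // largest multiple of 8 not exceeding width
  if (n > 0) {
    BlendPlaneRow_8(src0, src1, alpha, dst, n);
  }
  for (int x = n; x < width; ++x) {
    dst[x] = BlendPixel(src0[x], src1[x], alpha[x]);
  }
}
```

With this structure, only the last few pixels of an odd-width row leave the fast path.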
      
      R=dhrosa@google.com, harryjin@google.com
      BUG=libyuv:527
      
      Review URL: https://codereview.chromium.org/1513443002 .
  34. 19 Nov, 2015 1 commit
  35. 10 Nov, 2015 1 commit
  36. 04 Nov, 2015 1 commit
  37. 03 Nov, 2015 1 commit
  38. 02 Nov, 2015 1 commit
  39. 30 Oct, 2015 1 commit
  40. 27 Oct, 2015 1 commit