1. 02 Feb, 2018 2 commits
      I422ToUYVYRow_AVX2 use vpmovzxbd instead of vpermq · 5790a765
      Frank Barchard authored
      I422ToUYVYRow_AVX2 optimized from 7 cycles per 32 pixels to 4.6 cycles.
      Instead of 2 vpermq and vpunpcklbw:
      vmovdqu    (%1),%%xmm2
      vmovdqu    0x00(%1,%2,1),%%xmm3
      vpermq     $0xd8,%%ymm2,%%ymm2
      vpermq     $0xd8,%%ymm3,%%ymm3
      vpunpcklbw %%ymm3,%%ymm2,%%ymm2
      
      ..use vpmovzxbd to expand the bytes to shorts, then vpslld and vpor
      vpmovzxbd  (%1),%%ymm2
      vpmovzxbd  0x00(%1,%2,1),%%ymm3
      vpslld     $0x10,%%ymm3,%%ymm3
      vpor       %%ymm3,%%ymm2,%%ymm2
      which reduces the port 5 bottleneck by 1 cycle.
      
      Bug: libyuv:556
      Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt
      
      Change-Id: I53799e53cc6b090a1a695c839094c193be3eecaf
      Reviewed-on: https://chromium-review.googlesource.com/899873
      Commit-Queue: Frank Barchard <fbarchard@chromium.org>
      Reviewed-by: richard winterton <rrwinterton@gmail.com>
      Reviewed-by: Cheng Wang <wangcheng@google.com>
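The replacement described in this commit rests on one observation: zero-extending each byte into a wider lane, shifting one operand into the upper part of the lane, and ORing the two gives the same bytes as an unpack-style interleave. The following is a minimal sketch in plain Python, not libyuv code. The function names are hypothetical, and the 16-bit lane with an 8-bit shift is an assumption chosen so that the result is a plain U,V byte interleave; it does not model the exact lane widths or the per-128-bit-lane behaviour of the AVX2 instructions.

```python
def interleave_by_unpack(u: bytes, v: bytes) -> bytes:
    """Reference byte interleave: U0 V0 U1 V1 ... (what a byte unpack produces)."""
    out = bytearray()
    for u_byte, v_byte in zip(u, v):
        out.append(u_byte)
        out.append(v_byte)
    return bytes(out)


def interleave_by_widen_shift_or(u: bytes, v: bytes) -> bytes:
    """Same bytes via zero-extend, shift, OR on 16-bit little-endian lanes."""
    out = bytearray()
    for u_byte, v_byte in zip(u, v):
        u_lane = u_byte        # zero-extended U: low byte = U, high byte = 0
        v_lane = v_byte << 8   # zero-extended V moved into the high byte
        out += (u_lane | v_lane).to_bytes(2, "little")
    return bytes(out)


u_row = bytes([0x10, 0x11, 0x12, 0x13])
v_row = bytes([0xA0, 0xA1, 0xA2, 0xA3])
assert interleave_by_unpack(u_row, v_row) == interleave_by_widen_shift_or(u_row, v_row)
print(interleave_by_widen_shift_or(u_row, v_row).hex())
```

Because the widening load does the work that the cross-lane permute did before, the shuffle count drops, which is consistent with the port 5 relief the message reports.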
      I422ToYUY2Row_AVX2 use vpmovzxbd instead of vpermq · 7ff53f32
      Frank Barchard authored
      I422ToYUY2Row_AVX2 optimized from 7 cycles per 32 pixels to 6 cycles.
      Instead of 2 vpermq and vpunpcklbw:
      vmovdqu    (%1),%%xmm2
      vmovdqu    0x00(%1,%2,1),%%xmm3
      lea        0x10(%1),%1
      vpermq     $0xd8,%%ymm2,%%ymm2
      vpermq     $0xd8,%%ymm3,%%ymm3
      vpunpcklbw %%ymm3,%%ymm2,%%ymm2
      
      ..use vpmovzxbd to expand the bytes to shorts, then vpslld and vpor
      vpmovzxbd  (%1),%%ymm2
      vpmovzxbd  0x00(%1,%2,1),%%ymm3
      vpslld     $0x10,%%ymm3,%%ymm3
      vpor       %%ymm3,%%ymm2,%%ymm2
      which reduces the port 5 bottleneck by 1 cycle.
      
      Bug: libyuv:556
      Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt
      
      I422ToYUY2Row_AVX2 optimization
      
      Improve performance of AVX2 code by avoiding vpermq
      
      Bug: libyuv:556
      Test: /usr/local/google/home/fbarchard/iaca-lin64/bin/iaca.sh -reduceout -arch BDW out/Release/obj/libyuv_internal/row_gcc.o
      Change-Id: Ie36732da23ecea1ffcc6b297bacc962780b59ef1
      Reviewed-on: https://chromium-review.googlesource.com/898067
      Commit-Queue: Frank Barchard <fbarchard@chromium.org>
      Reviewed-by: richard winterton <rrwinterton@gmail.com>
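For readers unfamiliar with the two packed formats these row functions write, here is a scalar sketch of the byte order. It is an illustration in Python rather than libyuv's C reference code; `i422_row_to_packed` is a hypothetical name, and the sketch assumes an even row width (two Y samples per U/V pair).

```python
def i422_row_to_packed(y: bytes, u: bytes, v: bytes, layout: str) -> bytes:
    """Pack one I422 row (two Y samples per U/V pair) as 'YUY2' or 'UYVY'."""
    if len(y) != 2 * len(u) or len(u) != len(v):
        raise ValueError("expected len(y) == 2 * len(u) == 2 * len(v)")
    out = bytearray()
    for i in range(len(u)):
        y0, y1 = y[2 * i], y[2 * i + 1]
        if layout == "YUY2":
            out += bytes((y0, u[i], y1, v[i]))   # Y0 U Y1 V
        elif layout == "UYVY":
            out += bytes((u[i], y0, v[i], y1))   # U Y0 V Y1
        else:
            raise ValueError(f"unknown layout: {layout}")
    return bytes(out)


y_row = bytes([1, 2, 3, 4])
u_row = bytes([10, 11])
v_row = bytes([20, 21])
print(list(i422_row_to_packed(y_row, u_row, v_row, "YUY2")))
print(list(i422_row_to_packed(y_row, u_row, v_row, "UYVY")))
```

Both layouts need the same interleaved U,V stream; they differ only in whether luma or chroma comes first in each byte pair.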
  2. 01 Feb, 2018 1 commit
      I420ToYUY2_AVX2 port · 664c7356
      Frank Barchard authored
      I420 and I422 To YUY2 and UYVY ported from SSE2 to AVX2.
      
      Was SSE2
      I420ToYUY2_Opt (135 ms)
      I420ToUYVY_Opt (148 ms)
      I422ToYUY2_Opt (145 ms)
      I422ToUYVY_Opt (142 ms)
      
      Now AVX2
      I420ToYUY2_Opt (133 ms)
      I420ToUYVY_Opt (130 ms)
      I422ToYUY2_Opt (127 ms)
      I422ToUYVY_Opt (137 ms)
      
      Bug: libyuv:556
      Test: out/Release/libyuv_unittest --sandbox_unittests --gtest_filter=*I42?To*UY*Opt
      Change-Id: Ic35f97cee02dc009fd98785589ba17c7cf50bb35
      Reviewed-on: https://chromium-review.googlesource.com/892493
      Commit-Queue: Frank Barchard <fbarchard@chromium.org>
      Reviewed-by: richard winterton <rrwinterton@gmail.com>
  3. 29 Jan, 2018 2 commits
  4. 27 Jan, 2018 1 commit
  5. 26 Jan, 2018 1 commit
  6. 25 Jan, 2018 1 commit
  7. 24 Jan, 2018 2 commits
  8. 23 Jan, 2018 4 commits
  9. 19 Jan, 2018 2 commits
      H010ToAR30 in 1 step with SSSE3 assembly · 09db0c4c
      Frank Barchard authored
      Switch YUV conversion macro to output 16 bits per channel.
      STOREAR30 macro to output AR30.
      
      [ RUN      ] LibYUVConvertTest.TestH420ToARGB
      uniques: B 220, G, 220, R 220
      [       OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)
      [ RUN      ] LibYUVConvertTest.TestH010ToARGB
      uniques: B 256, G, 256, R 256
      [       OK ] LibYUVConvertTest.TestH010ToARGB (0 ms)
      [ RUN      ] LibYUVConvertTest.TestH010ToAR30
      uniques: B 883, G, 883, R 883
      [       OK ] LibYUVConvertTest.TestH010ToAR30 (0 ms)
      
      Bug: libyuv:751
      Test: LibYUVConvertTest.H010ToAR30_Opt
      Change-Id: I902b718e2c8b68ede69625ccafebc6519d5af70d
      Reviewed-on: https://chromium-review.googlesource.com/869511
      Reviewed-by: Frank Barchard <fbarchard@chromium.org>
      Reviewed-by: Miguel Casas <mcasas@chromium.org>
      Reviewed-by: richard winterton <rrwinterton@gmail.com>
      Commit-Queue: Frank Barchard <fbarchard@chromium.org>
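Writing AR30 directly keeps 10 bits per colour channel, which is why the log above can report more than 256 unique shades. The sketch below packs one pixel into a 32-bit little-endian word. The bit layout is an assumption (blue in bits 0-9, green in bits 10-19, red in bits 20-29, alpha in the top 2 bits), and `pack_ar30` / `unpack_ar30` are hypothetical helper names, not libyuv functions; check the library headers before relying on the layout.

```python
import struct


def pack_ar30(b10: int, g10: int, r10: int, a2: int = 3) -> bytes:
    """Pack 10-bit B, G, R and 2-bit A into a little-endian 32-bit word (assumed layout)."""
    if not all(0 <= c <= 1023 for c in (b10, g10, r10)):
        raise ValueError("colour channels must fit in 10 bits")
    if not 0 <= a2 <= 3:
        raise ValueError("alpha must fit in 2 bits")
    word = b10 | (g10 << 10) | (r10 << 20) | (a2 << 30)
    return struct.pack("<I", word)


def unpack_ar30(pixel: bytes):
    """Inverse of pack_ar30: returns (b10, g10, r10, a2)."""
    (word,) = struct.unpack("<I", pixel)
    return word & 0x3FF, (word >> 10) & 0x3FF, (word >> 20) & 0x3FF, word >> 30


pixel = pack_ar30(1, 512, 1023, 2)
assert unpack_ar30(pixel) == (1, 512, 1023, 2)
```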
      Add LibYUVConvertTest.TestH010ToAR30 unittest · 37f97210
      Frank Barchard authored
      Tests accuracy of H010ToAR30 on grey scale ramp against float
      and computes a histogram to detect number of unique shades for
      each channel.
      
      With the 2-step conversion, which goes through an 8-bit RGB
      intermediate, the test shows 256 unique values.
      
      [ RUN      ] LibYUVConvertTest.TestH420ToARGB
      uniques: B 220, G, 220, R 220
      [       OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)
      [ RUN      ] LibYUVConvertTest.TestH010ToARGB
      uniques: B 256, G, 256, R 256
      [       OK ] LibYUVConvertTest.TestH010ToARGB (0 ms)
      [ RUN      ] LibYUVConvertTest.TestH010ToAR30
      uniques: B 256, G, 256, R 256
      [       OK ] LibYUVConvertTest.TestH010ToAR30 (0 ms)
      
      Bug: libyuv:751
      Test: LibYUVConvertTest.TestH010ToAR30 unittest
      
      Change-Id: I6b1e1209247cb00b79b594127b02dae5217dc400
      Reviewed-on: https://chromium-review.googlesource.com/875317
      Reviewed-by: Miguel Casas <mcasas@chromium.org>
      Commit-Queue: Frank Barchard <fbarchard@chromium.org>
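The idea of counting unique shades on a grey ramp can be sketched without any of the colour math. The Python below is a simplified illustration, not the unit test itself: it skips the YUV-to-RGB matrix and range clamping, so its absolute counts differ from the log, and the helper names are hypothetical. It only shows why an 8-bit intermediate caps the result at 256 distinct values.

```python
def count_unique(values) -> int:
    """Number of distinct shades, i.e. the non-empty bins of a histogram."""
    return len(set(values))


def expand_8_to_10(v8: int) -> int:
    """Widen 8 bits to 10 by replicating the top bits, so 0..255 spans 0..1023."""
    return (v8 << 2) | (v8 >> 6)


ramp_10bit = range(1024)                                  # every 10-bit grey level
direct = list(ramp_10bit)                                 # 10 bits kept end to end
via_8bit = [expand_8_to_10(v >> 2) for v in ramp_10bit]   # squeezed through 8 bits

print("direct uniques:  ", count_unique(direct))
print("via 8-bit uniques:", count_unique(via_8bit))
```

Widening back to 10 bits cannot restore the levels that the 8-bit step merged, so the histogram is the natural place to detect the loss.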
  10. 16 Jan, 2018 1 commit
      Remove MEMOPREG x64 NaCL macros · ecab5430
      Frank Barchard authored
      MEMOPREG macros are deprecated in row.h
      
      Regular expressions to remove MEMOPREG macros:
      
      MEMOPREG(movd, 0x00, [u_buf], [v_buf], 1, xmm1)                            \
      MEMOPREG\((.*), (.*), (.*), (.*), (.*), (.*)\)
      "\1    \2(%\3,%\4,\5),%%\6            \\n"
      
      MEMOPREG(movdqu,0x00,1,4,1,xmm2)
      MEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*)\)
      "\1    \2(%\3,%\4,\5),%%\6            \\n"
      
      MEMOPREG(movdqu,0x00,1,4,1,xmm2)
      MEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
      "\1    \2(%\3,%\4,\5),%%\6           \\n"
      
      TBR=braveyao@chromium.org
      
      Bug: libyuv:702
      Test: try bots pass
      Change-Id: If8743abd9af2e8c549d0c7d3d49733a9b0f0ca86
      Reviewed-on: https://chromium-review.googlesource.com/865964
      Reviewed-by: Frank Barchard <fbarchard@chromium.org>
      Commit-Queue: Frank Barchard <fbarchard@chromium.org>
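The search-and-replace pairs above can be tried out with Python's `re` module. This is a sketch of the unspaced variant only; the commit does not say which tool was used, so Python's regex semantics are an assumption, and `rewrite_memopreg` is a hypothetical helper name.

```python
import re

# Pattern and replacement copied from the commit message (unspaced variant).
PATTERN = re.compile(r'MEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*)\)')
# In a Python replacement template, \\n emits a literal backslash followed by n,
# which is what the C string literal needs.
REPLACEMENT = r'"\1    \2(%\3,%\4,\5),%%\6            \\n"'


def rewrite_memopreg(line: str) -> str:
    """Turn one MEMOPREG(...) invocation into the equivalent asm string literal."""
    return PATTERN.sub(REPLACEMENT, line)


rewritten = rewrite_memopreg("MEMOPREG(movdqu,0x00,1,4,1,xmm2)")
print(rewritten)
```

The greedy `(.*)` groups split correctly here because the invocation contains exactly five commas; a line with extra commas inside an argument would need a stricter pattern such as `([^,]*)`.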
  11. 13 Jan, 2018 1 commit
      Remove MEMOPMEM x64 NaCL macros · b33e0f97
      Frank Barchard authored
      MEMOPMEM macros are deprecated in row.h
      
      Usage examples
          MEMOPMEM(vmovdqu,ymm0,0x00,0,1,1)          //  vmovdqu %%ymm0,(%0,%1)
          MEMOPMEM(movdqu,xmm2,0x00,1,0,1)
      
      Regular expressions to remove MEMOPMEM macros:
      
      MEMOPMEM\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
      "\1    %%\2,\3(%\4,%\5,\6)\7 \\n"
      
      MEMOPMEM\((.*),(.*),(.*),(.*),(.*),(.*)\)
      "\1    %%\2,\3(%\4,%\5,\6)            \\n"
      
      TBR=braveyao@chromium.org
      Bug: libyuv:702
      Test: try bots pass
      Change-Id: Id8c6963d544d16e39bb6a9a0536babfb7f554b3a
      Reviewed-on: https://chromium-review.googlesource.com/865934
      Reviewed-by: Frank Barchard <fbarchard@chromium.org>
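The first pattern above differs from the second in that it also captures a trailing `//` comment and leaves it out of the replacement. A small Python sketch of that behaviour follows, applied to the first usage example; Python's `re` semantics are an assumption about the tool, and the variable names are hypothetical.

```python
import re

# Comment-aware pattern and replacement copied from the commit message.
WITH_COMMENT = re.compile(r'MEMOPMEM\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)')
REPLACEMENT = r'"\1    %%\2,\3(%\4,%\5,\6)\7 \\n"'

line = "MEMOPMEM(vmovdqu,ymm0,0x00,0,1,1)          //  vmovdqu %%ymm0,(%0,%1)"

match = WITH_COMMENT.search(line)
assert match is not None
# Group 7 is the padding before the comment; group 8 is the comment itself,
# which the replacement never references and therefore drops.
padding, comment = match.group(7), match.group(8)

rewritten = WITH_COMMENT.sub(REPLACEMENT, line)
print(rewritten)
```

Keeping group 7 preserves the original column padding, so the `\n"` terminators stay roughly aligned after the rewrite.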
  12. 12 Jan, 2018 4 commits
      Remove VMEMOPREG x64 NaCL macros · a875ed17
      Frank Barchard authored
      VMEMOPREG macros are deprecated in row.h
      
      Usage examples
          VMEMOPREG(vpavgb,0x00,0,4,1,ymm0,ymm0)     // vpavgb (%0,%4,1),%%ymm0,%%ymm0
          VMEMOPREG(vpavgb,0x20,0,4,1,ymm1,ymm1)
      
      Regular expressions to remove VMEMOPREG macros:
      
      VMEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
      "\1    \2(%\3,%\4,\5),%%\6,%%\7      \\n"
      
      VMEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)
      "\1    \2(%\3,%\4,\5),%%\6,%%\7            \\n"
      
      TBR=braveyao@chromium.org
      
      Bug: libyuv:702
      Test: try bots pass
      Change-Id: I472446606f7fd568fdf33aaacc22d5ed78673dab
      Reviewed-on: https://chromium-review.googlesource.com/865640
      Reviewed-by: Frank Barchard <fbarchard@chromium.org>
      Commit-Queue: Frank Barchard <fbarchard@chromium.org>
      Remove VEXTOPMEM x64 NaCL macros · 030042a2
      Frank Barchard authored
      VEXTOPMEM macros are deprecated in row.h
      
      Usage examples
          VEXTOPMEM(vextractf128,1,ymm0,0x0,1,2,1) // vextractf128 $1,%%ymm0,(%1,%2,1)
      
      Regular expression to remove VEXTOPMEM macros:
      
      VEXTOPMEM\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)(.*//.*)
      "\1 $\2,%%\3,\4(%\5,%\6,\7)        \\n"
      
      Bug: libyuv:702
      Test: try bots pass
      Change-Id: I177edf9813128408e74816672dd25abb03a5e1ca
      Reviewed-on: https://chromium-review.googlesource.com/865283
      Reviewed-by: Frank Barchard <fbarchard@chromium.org>
      Commit-Queue: Frank Barchard <fbarchard@chromium.org>
      Remove MEMACCESS x64 NaCL macros · 5088f001
      Frank Barchard authored
      MEMACCESS macros are deprecated in row.h
      
      Usage examples
          "movdqu    " MEMACCESS(0) ",%%xmm0         \n"
          "movdqu    " MEMACCESS2(0x10,0) ",%%xmm1   \n"
      
      Regular expressions to remove MEMACCESS macros:
      
      " MEMACCESS2\((.*),(.*)\) "(.*)\\n"
      \1(%\2)\3              \\n"
      
      " MEMACCESS\((.*)\) "(.*)\\n"
      (%\1)\2            \\n"
      
      Bug: libyuv:702
      Test: try bots pass
      Change-Id: I42f62d5dede8ef2ea643e78c204371a7659d25e6
      Reviewed-on: https://chromium-review.googlesource.com/862803
      Reviewed-by: Frank Barchard <fbarchard@chromium.org>
      Commit-Queue: Frank Barchard <fbarchard@chromium.org>
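Unlike the other macros, MEMACCESS sits between two adjacent string literals, so the patterns above start at the closing quote of the first literal and end at the closing quote of the second, merging both into one. The Python sketch below applies the two pairs to the usage examples; Python's `re` semantics are an assumption about the tool, and `remove_memaccess` is a hypothetical helper name.

```python
import re

# Pattern/replacement pairs copied from the commit message.
MEMACCESS2 = re.compile(r'" MEMACCESS2\((.*),(.*)\) "(.*)\\n"')
MEMACCESS2_REPL = r'\1(%\2)\3              \\n"'
MEMACCESS = re.compile(r'" MEMACCESS\((.*)\) "(.*)\\n"')
MEMACCESS_REPL = r'(%\1)\2            \\n"'


def remove_memaccess(line: str) -> str:
    """Apply both rewrites; each pattern only matches its own macro name."""
    line = MEMACCESS2.sub(MEMACCESS2_REPL, line)
    return MEMACCESS.sub(MEMACCESS_REPL, line)


# Raw strings keep the backslash-n of the C source as two literal characters.
plain = remove_memaccess(r'"movdqu    " MEMACCESS(0) ",%%xmm0         \n"')
offset = remove_memaccess(r'"movdqu    " MEMACCESS2(0x10,0) ",%%xmm1   \n"')
print(plain)
print(offset)
```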
      Remove MEMOPARG x64 NaCL macros · e3797d17
      Frank Barchard authored
      MEMOPARG macros are deprecated in row.h
      
        #opcode " " #offset "(%" #base ",%" #index "," #scale "),%" #arg "\n"
      
      Usage examples
          MEMOPARG(movzwl,0x00,1,3,1,k2)             //  movzwl  (%1,%3,1),%k2
      
      Regular expression to remove MEMOPARG macro:
      
      MEMOPARG\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*//.*)
      "\1    \2(%\3,%\4,\5),%\6                \\n"
      
      Bug: libyuv:702
      Test: try bots pass
      Change-Id: I4a5ad2abf5017e651576f4c8c784be1c8dbf5a83
      Reviewed-on: https://chromium-review.googlesource.com/863108
      Reviewed-by: Frank Barchard <fbarchard@chromium.org>
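One way to sanity-check a rewrite like this is to compare the regex output with what the macro body quoted above would have expanded to. The Python sketch below emulates the C stringification (`#param` becomes the parameter's text) and compares the two while ignoring column padding. `memoparg` is a hypothetical stand-in for the preprocessor, and Python's `re` semantics are an assumption about the tool.

```python
import re


def memoparg(opcode, offset, base, index, scale, arg) -> str:
    """Emulate the macro body: each stringified parameter is pasted into the asm text."""
    return f"{opcode} {offset}(%{base},%{index},{scale}),%{arg}\n"


# Pattern and replacement copied from the commit message.
PATTERN = re.compile(r'MEMOPARG\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*//.*)')
REPLACEMENT = r'"\1    \2(%\3,%\4,\5),%\6                \\n"'

line = "MEMOPARG(movzwl,0x00,1,3,1,k2)             //  movzwl  (%1,%3,1),%k2"
literal = PATTERN.sub(REPLACEMENT, line)

# Turn the C string literal into the text the assembler would see.
as_text = literal.strip('"').replace("\\n", "\n")

# Same tokens either way; only the whitespace padding differs.
assert as_text.split() == memoparg("movzwl", "0x00", 1, 3, 1, "k2").split()
print(literal)
```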
  13. 10 Jan, 2018 4 commits
  14. 09 Jan, 2018 1 commit
  15. 08 Jan, 2018 2 commits
  16. 07 Jan, 2018 1 commit
      H010ToAR30 optimized to 2 step conversion · 9d2cd6a3
      Frank Barchard authored
      Previously H010ToAR30 was done in a 3 step conversion:
      H010ToH420, H420ToARGB, ARGBToAR30.
      This CL merges the first 2 steps into H010ToARGB, to
      improve performance.
      Caveat - only 10 bit YUV is supported at this time.
      Previously the low level code supported different numbers
      of bits - 9, 10, 12 or 16.
      
      Was 3 step conversion:
      LibYUVConvertTest.H010ToAR30_Any (1263 ms)
      LibYUVConvertTest.H010ToAR30_Unaligned (951 ms)
      LibYUVConvertTest.H010ToAR30_Invert (913 ms)
      LibYUVConvertTest.H010ToAR30_Opt (901 ms)
      
      Now 2 step conversion:
      LibYUVConvertTest.H010ToAR30_Any (853 ms)
      LibYUVConvertTest.H010ToAR30_Unaligned (811 ms)
      LibYUVConvertTest.H010ToAR30_Invert (781 ms)
      LibYUVConvertTest.H010ToAR30_Opt (755 ms)
      
      Bug: libyuv:751
      Test: LibYUVConvertTest.H010ToAR30_Opt
      Change-Id: Ica7574040401cd57145a4827acdf3c0e58346a2a
      Reviewed-on: https://chromium-review.googlesource.com/853288
      Reviewed-by: Frank Barchard <fbarchard@chromium.org>
      Reviewed-by: Miguel Casas <mcasas@chromium.org>
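To put the timings above on a common scale, the relative reduction per test variant can be computed directly from the reported milliseconds. This is plain arithmetic on the numbers in the message; the helper name `percent_less_time` is hypothetical.

```python
# Wall-clock times in ms, copied from the commit message.
before_ms = {"Any": 1263, "Unaligned": 951, "Invert": 913, "Opt": 901}
after_ms = {"Any": 853, "Unaligned": 811, "Invert": 781, "Opt": 755}


def percent_less_time(before: float, after: float) -> float:
    """Relative reduction in run time, as a percentage of the original time."""
    return 100.0 * (before - after) / before


reductions = {
    name: percent_less_time(before_ms[name], after_ms[name]) for name in before_ms
}
for name, pct in reductions.items():
    print(f"H010ToAR30_{name}: {pct:.1f}% less time")
```

These are single benchmark runs as reported, so small differences between variants should not be over-interpreted.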
  17. 05 Jan, 2018 2 commits
  18. 04 Jan, 2018 1 commit
  19. 03 Jan, 2018 1 commit
  20. 02 Jan, 2018 4 commits
  21. 28 Dec, 2017 1 commit
  22. 21 Dec, 2017 1 commit