    I422ToYUY2Row_AVX2 use vpmovzxbd instead of vpermq · 7ff53f32
    Frank Barchard authored
    I422ToYUY2Row_AVX2 optimized from 7 cycles per 32 pixels to 6 cycles.
    Instead of 2 vpermq and vpunpcklbw:
    vmovdqu    (%1),%%xmm2
    vmovdqu    0x00(%1,%2,1),%%xmm3
    lea        0x10(%1),%1
    vpermq     $0xd8,%%ymm2,%%ymm2
    vpermq     $0xd8,%%ymm3,%%ymm3
    vpunpcklbw %%ymm3,%%ymm2,%%ymm2
    
    use vpmovzxbd to zero-extend the bytes to dwords, then vpslld and vpor:
    vpmovzxbd  (%1),%%ymm2
    vpmovzxbd  0x00(%1,%2,1),%%ymm3
    vpslld     $0x10,%%ymm3,%%ymm3
    vpor       %%ymm3,%%ymm2,%%ymm2
    which reduces the port 5 bottleneck by 1 cycle.
    
    Bug: libyuv:556
    Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt
    
    I422ToYUY2Row_AVX2 optimization
    
    Improve performance of AVX2 code by avoiding vpermq
    
    Bug: libyuv:556
    Test: /usr/local/google/home/fbarchard/iaca-lin64/bin/iaca.sh -reduceout -arch BDW out/Release/obj/libyuv_internal/row_gcc.o
    Change-Id: Ie36732da23ecea1ffcc6b297bacc962780b59ef1
    Reviewed-on: https://chromium-review.googlesource.com/898067
    Commit-Queue: Frank Barchard <fbarchard@chromium.org>
    Reviewed-by: richard winterton <rrwinterton@gmail.com>
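
The zero-extend, shift, and OR idea from the commit message can be modeled in scalar C++ to see where each input byte lands. This is only an illustrative sketch, not the libyuv source: the function names `InterleaveBytes` and `ZeroExtendShiftOr` and the `lane_bytes` / `shift_bits` parameters are hypothetical, and the model ignores the 128-bit lane behavior of the real AVX2 instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte interleave, the layout an unpack-based sequence builds: u0 v0 u1 v1 ...
std::vector<uint8_t> InterleaveBytes(const std::vector<uint8_t>& u,
                                     const std::vector<uint8_t>& v) {
  assert(u.size() == v.size());
  std::vector<uint8_t> out;
  out.reserve(u.size() * 2);
  for (size_t i = 0; i < u.size(); ++i) {
    out.push_back(u[i]);
    out.push_back(v[i]);
  }
  return out;
}

// Zero-extend each byte to a lane of `lane_bytes` bytes, shift the second
// input left by `shift_bits`, and OR the two lanes together. Lanes are then
// written out in little-endian byte order, as they would sit in memory on x86.
std::vector<uint8_t> ZeroExtendShiftOr(const std::vector<uint8_t>& u,
                                       const std::vector<uint8_t>& v,
                                       unsigned lane_bytes,
                                       unsigned shift_bits) {
  assert(u.size() == v.size());
  assert(lane_bytes >= 1 && lane_bytes <= 4 && shift_bits < 8 * lane_bytes);
  std::vector<uint8_t> out;
  out.reserve(u.size() * lane_bytes);
  for (size_t i = 0; i < u.size(); ++i) {
    const uint32_t lane = static_cast<uint32_t>(u[i]) |
                          (static_cast<uint32_t>(v[i]) << shift_bits);
    for (unsigned b = 0; b < lane_bytes; ++b) {
      out.push_back(static_cast<uint8_t>(lane >> (8 * b)));
    }
  }
  return out;
}
```

With 16-bit lanes and a shift of 8, `ZeroExtendShiftOr(u, v, 2, 8)` produces the same bytes as `InterleaveBytes(u, v)`. With 32-bit lanes and a shift of 16, the operand sizes written in the listing above, each dword carries the two input bytes at byte offsets 0 and 2, with zeros in between.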
Name
compare.cc
compare_common.cc
compare_gcc.cc
compare_msa.cc
compare_neon.cc
compare_neon64.cc
compare_win.cc
convert.cc
convert_argb.cc
convert_from.cc
convert_from_argb.cc
convert_jpeg.cc
convert_to_argb.cc
convert_to_i420.cc
cpu_id.cc
mjpeg_decoder.cc
mjpeg_validate.cc
planar_functions.cc
rotate.cc
rotate_any.cc
rotate_argb.cc
rotate_common.cc
rotate_gcc.cc
rotate_msa.cc
rotate_neon.cc
rotate_neon64.cc
rotate_win.cc
row_any.cc
row_common.cc
row_gcc.cc
row_msa.cc
row_neon.cc
row_neon64.cc
row_win.cc
scale.cc
scale_any.cc
scale_argb.cc
scale_common.cc
scale_gcc.cc
scale_msa.cc
scale_neon.cc
scale_neon64.cc
scale_win.cc
video_common.cc