    I422ToUYVYRow_AVX2 use vpmovzxbd instead of vpermq · 5790a765
    Frank Barchard authored
    I422ToUYVYRow_AVX2 optimized from 7 cycles per 32 pixels to 4.6 cycles per 32 pixels.
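For context, here is a minimal scalar sketch of the packing the row function performs. It assumes the standard UYVY byte order (U, Y0, V, Y1 for each pair of pixels) and 4:2:2 input, meaning one U and one V sample per two Y samples. `i422_to_uyvy_row_ref` is a hypothetical name, not libyuv's C row function, and to keep it simple it only accepts even widths.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference (not libyuv code): pack one row of planar
 * I422 (full-width Y, half-width U and V) into packed UYVY.
 * Every 2 pixels become 4 bytes: U, Y0, V, Y1. Assumes width is even. */
void i422_to_uyvy_row_ref(const uint8_t* src_y, const uint8_t* src_u,
                          const uint8_t* src_v, uint8_t* dst_uyvy,
                          int width) {
  assert(width % 2 == 0);
  for (int x = 0; x < width; x += 2) {
    dst_uyvy[0] = src_u[0]; /* U shared by pixels x and x+1 */
    dst_uyvy[1] = src_y[0];
    dst_uyvy[2] = src_v[0]; /* V shared by pixels x and x+1 */
    dst_uyvy[3] = src_y[1];
    src_y += 2;
    src_u += 1;
    src_v += 1;
    dst_uyvy += 4;
  }
}
```

At 2 bytes per pixel, every 32 pixels consume 32 Y, 16 U and 16 V bytes and produce 64 bytes of UYVY.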
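    Instead of 2 vpermq and a vpunpcklbw to interleave the U and V bytes:
    vmovdqu    (%1),%%xmm2              # load 16 U bytes
    vmovdqu    0x00(%1,%2,1),%%xmm3     # load 16 V bytes at %1 + %2
    vpermq     $0xd8,%%ymm2,%%ymm2      # qwords 0,2,1,3: U bytes 8-15 to high lane
    vpermq     $0xd8,%%ymm3,%%ymm3      # same for V
    vpunpcklbw %%ymm3,%%ymm2,%%ymm2     # per-lane interleave: U0 V0 U1 V1 ...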
    
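    ..use vpmovzxbw to zero-extend the bytes to shorts, then vpsllw and vpor:
    vpmovzxbw  (%1),%%ymm2              # 16 U bytes -> 16 shorts
    vpmovzxbw  0x00(%1,%2,1),%%ymm3     # 16 V bytes -> 16 shorts
    vpsllw     $0x8,%%ymm3,%%ymm3       # move V into the high byte of each short
    vpor       %%ymm3,%%ymm2,%%ymm2     # U | V << 8: interleaved U,V pairs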
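    which reduces the port 5 bottleneck by 1 cycle: the old sequence needs 3 port 5
    shuffles (2 vpermq + vpunpcklbw), the new one only the 2 zero-extending loads.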
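The two sequences can be checked against each other with a small byte-level model. The sketch below is plain C, not libyuv code. It assumes VEX-encoded semantics (an xmm load zeroes the upper half of the ymm register, and vpunpcklbw interleaves only within each 128-bit lane) and little-endian 16-bit words as on x86. It models the replacement with vpmovzxbw and a shift by 8, which is the byte-to-short expansion described above. All helper names (`ymm_t`, `interleave_uv_vpermq`, `interleave_uv_vpmovzx`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-level model of a 256-bit ymm register (little-endian). */
typedef struct { uint8_t b[32]; } ymm_t;

/* vmovdqu m128 -> xmm: a VEX-encoded load zeroes the upper 128 bits. */
static ymm_t load_xmm(const uint8_t* p) {
  ymm_t r;
  memset(r.b, 0, sizeof r.b);
  memcpy(r.b, p, 16);
  return r;
}

/* vpermq $imm: qword i of the result is qword ((imm >> 2*i) & 3) of a. */
static ymm_t vpermq(uint8_t imm, ymm_t a) {
  ymm_t r;
  for (int i = 0; i < 4; ++i)
    memcpy(r.b + 8 * i, a.b + 8 * ((imm >> (2 * i)) & 3), 8);
  return r;
}

/* vpunpcklbw b,a,dst (AT&T operand order): within each 128-bit lane,
 * interleave the low 8 bytes of a with the low 8 bytes of b. */
static ymm_t vpunpcklbw(ymm_t b, ymm_t a) {
  ymm_t r;
  for (int lane = 0; lane < 2; ++lane)
    for (int k = 0; k < 8; ++k) {
      r.b[16 * lane + 2 * k] = a.b[16 * lane + k];
      r.b[16 * lane + 2 * k + 1] = b.b[16 * lane + k];
    }
  return r;
}

/* vpmovzxbw m128 -> ymm: zero-extend 16 bytes to 16 little-endian words. */
static ymm_t vpmovzxbw(const uint8_t* p) {
  ymm_t r;
  for (int k = 0; k < 16; ++k) {
    r.b[2 * k] = p[k];
    r.b[2 * k + 1] = 0;
  }
  return r;
}

/* vpsllw $8: shift each 16-bit word left by 8 (low byte -> high byte). */
static ymm_t vpsllw8(ymm_t a) {
  ymm_t r;
  for (int k = 0; k < 16; ++k) {
    r.b[2 * k + 1] = a.b[2 * k];
    r.b[2 * k] = 0;
  }
  return r;
}

/* vpor: bitwise OR of all 32 bytes. */
static ymm_t vpor(ymm_t a, ymm_t b) {
  ymm_t r;
  for (int i = 0; i < 32; ++i) r.b[i] = a.b[i] | b.b[i];
  return r;
}

/* Old sequence: 2 x vmovdqu, 2 x vpermq $0xd8, vpunpcklbw. */
ymm_t interleave_uv_vpermq(const uint8_t* u, const uint8_t* v) {
  ymm_t y2 = vpermq(0xd8, load_xmm(u));
  ymm_t y3 = vpermq(0xd8, load_xmm(v));
  return vpunpcklbw(y3, y2);
}

/* New sequence: 2 x vpmovzxbw, vpsllw $8, vpor. */
ymm_t interleave_uv_vpmovzx(const uint8_t* u, const uint8_t* v) {
  ymm_t y2 = vpmovzxbw(u);
  ymm_t y3 = vpsllw8(vpmovzxbw(v));
  return vpor(y3, y2);
}
```

Both functions return U0 V0 U1 V1 ... U15 V15. The old sequence needs the vpermq pair because vpunpcklbw only interleaves within each 128-bit lane, so U and V bytes 8-15 must first be moved into the high lane. vpmovzxbw spreads its 16 source bytes across both lanes by itself, so no separate permute is needed.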
    
    Bug: libyuv:556
    Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt
    
    Change-Id: I53799e53cc6b090a1a695c839094c193be3eecaf
    Reviewed-on: https://chromium-review.googlesource.com/899873
    Commit-Queue: Frank Barchard <fbarchard@chromium.org>
    Reviewed-by: richard winterton <rrwinterton@gmail.com>
    Reviewed-by: Cheng Wang <wangcheng@google.com>
compare.cc
compare_common.cc
compare_gcc.cc
compare_msa.cc
compare_neon.cc
compare_neon64.cc
compare_win.cc
convert.cc
convert_argb.cc
convert_from.cc
convert_from_argb.cc
convert_jpeg.cc
convert_to_argb.cc
convert_to_i420.cc
cpu_id.cc
mjpeg_decoder.cc
mjpeg_validate.cc
planar_functions.cc
rotate.cc
rotate_any.cc
rotate_argb.cc
rotate_common.cc
rotate_gcc.cc
rotate_msa.cc
rotate_neon.cc
rotate_neon64.cc
rotate_win.cc
row_any.cc
row_common.cc
row_gcc.cc
row_msa.cc
row_neon.cc
row_neon64.cc
row_win.cc
scale.cc
scale_any.cc
scale_argb.cc
scale_common.cc
scale_gcc.cc
scale_msa.cc
scale_neon.cc
scale_neon64.cc
scale_win.cc
video_common.cc