    Author: Frank Barchard
    RAWToJ400 and RGBToJ400 use a 2-step row function on Intel. RAWToJ400 was 3996 ms, now 3309 ms: 20.7% faster.
    
    Call a row function for each row, as the ARGBToI400 code does, but
    implement each row function as a 2-step conversion: RAW or RGB24 to
    ARGB in a small temporary row buffer, then ARGB to YJ. Adds the row
    functions RAWToYJ and RGBToYJ, in SSSE3 and AVX2 versions, plus Any
    versions.
    The smaller row buffer is more cache friendly on large images.
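As a rough scalar illustration of the 2-step idea, the C sketch below converts a RAW row to YJ by first expanding up to MAXTWIDTH pixels into an ARGB row buffer, then reducing that buffer to luma. All function names are hypothetical, not libyuv's. It assumes libyuv's in-memory byte orders (RAW as R, G, B; ARGB as B, G, R, A) and uses 8-bit fixed-point weights approximating full-range BT.601 luma (0.299, 0.587, 0.114), which may not match libyuv's exact coefficients. The real row functions run each step with SSSE3 or AVX2; this version is plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for libyuv's MAXTWIDTH (8192-byte buffer / 4 bytes per pixel). */
#define SKETCH_MAXTWIDTH 2048

/* Step 1 (hypothetical): RAW stores R, G, B bytes; ARGB is stored as B, G, R, A. */
static void RawToArgbRowSketch(const uint8_t* src_raw, uint8_t* dst_argb, int width) {
  for (int x = 0; x < width; ++x) {
    uint8_t r = src_raw[0], g = src_raw[1], b = src_raw[2];
    dst_argb[0] = b;
    dst_argb[1] = g;
    dst_argb[2] = r;
    dst_argb[3] = 255; /* opaque alpha */
    src_raw += 3;
    dst_argb += 4;
  }
}

/* Step 2 (hypothetical): full-range luma with weights 77/150/29 out of 256,
   an approximation of 0.299/0.587/0.114. The weights sum to 256, so 255 maps to 255. */
static void ArgbToYjRowSketch(const uint8_t* src_argb, uint8_t* dst_yj, int width) {
  for (int x = 0; x < width; ++x) {
    int b = src_argb[0], g = src_argb[1], r = src_argb[2];
    dst_yj[x] = (uint8_t)((77 * r + 150 * g + 29 * b + 128) >> 8);
    src_argb += 4;
  }
}

/* 2-step row function: process at most SKETCH_MAXTWIDTH pixels at a time
   through a fixed-size ARGB buffer, so the intermediate data stays small.
   (libyuv declares its buffer SIMD_ALIGNED; this scalar sketch does not need that.) */
void RawToYjRowSketch(const uint8_t* src_raw, uint8_t* dst_yj, int width) {
  uint8_t row[SKETCH_MAXTWIDTH * 4];
  assert(width >= 0);
  while (width > 0) {
    int twidth = width > SKETCH_MAXTWIDTH ? SKETCH_MAXTWIDTH : width;
    RawToArgbRowSketch(src_raw, row, twidth);
    ArgbToYjRowSketch(row, dst_yj, twidth);
    src_raw += twidth * 3; /* 3 bytes per RAW pixel */
    dst_yj += twidth;      /* 1 byte per YJ pixel */
    width -= twidth;
  }
}
```

The intermediate buffer is reused for every chunk, so its size depends only on the chunk width, not on the image width.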
    
    The maximum width processed at a time can be configured, and is currently:
      // Maximum temporary width for wrappers to process at a time, in pixels.
      #define MAXTWIDTH 2048
    and the row buffer is
      SIMD_ALIGNED(uint8_t row[MAXTWIDTH * 4]);
    so 2048 * 4 = 8192 bytes are used for the row buffer, leaving the rest of
    the cache for the source and destination buffers.
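To make the buffer arithmetic concrete for the benchmark width of 3600 pixels, the sketch below (hypothetical helper names, with MAXTWIDTH fixed at the 2048 implied by the 8192-byte buffer, 8192 / 4) splits a row into chunks and counts the bytes one full chunk touches for a 3-byte-per-pixel source: 6144 bytes of RAW or RGB24 input, the 8192-byte ARGB buffer, and 2048 bytes of YJ output.

```c
#include <stddef.h>

/* Hypothetical stand-in for MAXTWIDTH: 8192-byte buffer / 4 bytes per ARGB pixel. */
#define SKETCH_MAXTWIDTH 2048

/* Size of the intermediate ARGB row buffer in bytes. */
enum { kRowBufferBytes = SKETCH_MAXTWIDTH * 4 };

/* Splits a row of `width` pixels into chunks of at most SKETCH_MAXTWIDTH,
   stores each chunk width in chunks[], and returns the number of chunks. */
int SplitRowSketch(int width, int* chunks, int max_chunks) {
  int n = 0;
  while (width > 0 && n < max_chunks) {
    int twidth = width > SKETCH_MAXTWIDTH ? SKETCH_MAXTWIDTH : width;
    chunks[n++] = twidth;
    width -= twidth;
  }
  return n;
}

/* Bytes touched by one chunk: 3-byte source pixels, 4-byte ARGB buffer
   pixels, and 1-byte YJ output pixels. */
size_t BytesPerChunkSketch(int twidth) {
  return (size_t)twidth * 3 + (size_t)twidth * 4 + (size_t)twidth;
}
```

A 3600-pixel row therefore runs as two chunks, 2048 and 1552 pixels, reusing the same 8192-byte buffer, and a full chunk touches 16384 bytes in total.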
    
    blaze-bin/third_party/libyuv/libyuv_test '--gunit_filter=*R*To?400_Opt' --libyuv_width=3600 --libyuv_height=2500 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1 | sortms
    
    Was
    RAWToJ400_Opt (3996 ms)
    ARGBToI400_Opt (3964 ms)
    RGB24ToJ400_Opt (3960 ms)
    ARGBToJ400_Opt (3909 ms)
    RGBAToJ400_Opt (3885 ms)
    
    Now
    ARGBToJ400_Opt (4091 ms)
    ARGBToI400_Opt (3936 ms)
    RGBAToJ400_Opt (3428 ms)
    RGB24ToJ400_Opt (3324 ms)
    RAWToJ400_Opt (3309 ms)
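The "20.7% faster" figure for RAWToJ400 follows from the times above if speed is read as throughput (old time divided by new time, minus one). Read as a reduction in wall time, the same numbers give about 17.2%. The two small C helpers below (hypothetical names, not part of libyuv) make both calculations explicit.

```c
/* Throughput gain in percent: how much more work per unit time the new code does. */
double SpeedupPercent(double old_ms, double new_ms) {
  return (old_ms / new_ms - 1.0) * 100.0;
}

/* Wall-time reduction in percent, relative to the old time. */
double TimeReductionPercent(double old_ms, double new_ms) {
  return (old_ms - new_ms) / old_ms * 100.0;
}
```

For example, SpeedupPercent(3996, 3309) is about 20.76 and TimeReductionPercent(3996, 3309) about 17.19; for RGB24ToJ400_Opt, SpeedupPercent(3960, 3324) is about 19.1.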
    
    Bug: libyuv:854, b/147753855
    Change-Id: Ieb65fbda94e812c737f4c3c74107354b73c4bcd2
    Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2016203
    Reviewed-by: richard winterton <rrwinterton@gmail.com>
    Commit-Queue: Frank Barchard <fbarchard@chromium.org>
    Commit: 3db22ebc