    Add MSA optimized remaining scale row functions · 288bfbef
    Manojkumar Bhosale authored
    R=fbarchard@google.com
    BUG=libyuv:634
    
    Performance Gain (vs C vectorized)
    ScaleRowDown2_MSA            - ~22.3x
    ScaleRowDown2_Any_MSA        - ~19.9x
    ScaleRowDown2Linear_MSA      - ~31.2x
    ScaleRowDown2Linear_Any_MSA  - ~29.4x
    ScaleRowDown2Box_MSA         - ~20.1x
    ScaleRowDown2Box_Any_MSA     - ~19.6x
    ScaleRowDown4_MSA            - ~11.7x
    ScaleRowDown4_Any_MSA        - ~11.2x
    ScaleRowDown4Box_MSA         - ~15.1x
    ScaleRowDown4Box_Any_MSA     - ~15.1x
    ScaleRowDown38_MSA           - ~1x
    ScaleRowDown38_Any_MSA       - ~1x
    ScaleRowDown38_2_Box_MSA     - ~1.7x
    ScaleRowDown38_2_Box_Any_MSA - ~1.7x
    ScaleRowDown38_3_Box_MSA     - ~1.7x
    ScaleRowDown38_3_Box_Any_MSA - ~1.7x
    ScaleAddRow_MSA              - ~1.2x
    ScaleAddRow_Any_MSA          - ~1.15x
    
    Performance Gain (vs C non-vectorized)
    ScaleRowDown2_MSA            - ~22.4x
    ScaleRowDown2_Any_MSA        - ~19.8x
    ScaleRowDown2Linear_MSA      - ~31.6x
    ScaleRowDown2Linear_Any_MSA  - ~29.4x
    ScaleRowDown2Box_MSA         - ~20.1x
    ScaleRowDown2Box_Any_MSA     - ~19.6x
    ScaleRowDown4_MSA            - ~11.7x
    ScaleRowDown4_Any_MSA        - ~11.2x
    ScaleRowDown4Box_MSA         - ~15.1x
    ScaleRowDown4Box_Any_MSA     - ~15.1x
    ScaleRowDown38_MSA           - ~3.2x
    ScaleRowDown38_Any_MSA       - ~3.2x
    ScaleRowDown38_2_Box_MSA     - ~2.4x
    ScaleRowDown38_2_Box_Any_MSA - ~2.3x
    ScaleRowDown38_3_Box_MSA     - ~2.9x
    ScaleRowDown38_3_Box_Any_MSA - ~2.8x
    ScaleAddRow_MSA              - ~8x
    ScaleAddRow_Any_MSA          - ~7.46x
    
    Review-Url: https://codereview.chromium.org/2559683002 .
scale_msa.cc 19.6 KB