Commits · 56b5bbb0be6368c50dca5e120a3d72379e1ffbad · submodule / libyuv

07 Dec, 2016 1 commit

Add MSA optimized ARGB scaling functions · 56b5bbb0

Manojkumar Bhosale authored Dec 07, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ScaleARGBRowDown2_MSA           - ~2.6x
ScaleARGBRowDown2Linear_MSA     - ~7.9x
ScaleARGBRowDown2Box_MSA        - ~3.7x
ScaleARGBRowDownEven_MSA        - ~1.2x
ScaleARGBRowDownEvenBox_MSA     - ~3.5x

ScaleARGBRowDown2_Any_MSA       - ~2.6x
ScaleARGBRowDown2Linear_Any_MSA - ~7.9x
ScaleARGBRowDown2Box_Any_MSA    - ~3.6x
ScaleARGBRowDownEven_Any_MSA    - ~1.2x
ScaleARGBRowDownEvenBox_Any_MSA - ~3.5x

Performance Gain (vs C non-vectorized)
ScaleARGBRowDown2_MSA           - 2.6x
ScaleARGBRowDown2Linear_MSA     - 13.5x
ScaleARGBRowDown2Box_MSA        - 5.8x
ScaleARGBRowDownEven_MSA        - 1.2x
ScaleARGBRowDownEvenBox_MSA     - 3.7x

ScaleARGBRowDown2_Any_MSA       - 2.6x
ScaleARGBRowDown2Linear_Any_MSA - 13.5x
ScaleARGBRowDown2Box_Any_MSA    - 5.3x
ScaleARGBRowDownEven_Any_MSA    - 1.2x
ScaleARGBRowDownEvenBox_Any_MSA - 3.7x

Review URL: https://codereview.chromium.org/2527983002 .

56b5bbb0
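
For orientation, a 2x2 box downscale of an ARGB row can be sketched in portable C as below. This is only an illustrative scalar reference under stated assumptions (4 bytes per pixel, two adjacent source rows, round-to-nearest averaging); the function name is hypothetical and the code is not the MSA implementation from this commit.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical scalar reference: average each 2x2 block of ARGB pixels
// (4 bytes per pixel) taken from two adjacent source rows into one
// destination pixel, rounding to nearest.
static void ScaleARGBRowDown2BoxSketch(const uint8_t* src_argb,
                                       ptrdiff_t src_stride,
                                       uint8_t* dst_argb,
                                       int dst_width) {
  const uint8_t* row0 = src_argb;               // upper source row
  const uint8_t* row1 = src_argb + src_stride;  // lower source row
  for (int x = 0; x < dst_width; ++x) {
    for (int c = 0; c < 4; ++c) {  // one pass per channel byte
      int sum = row0[c] + row0[c + 4] + row1[c] + row1[c + 4];
      dst_argb[c] = (uint8_t)((sum + 2) >> 2);
    }
    row0 += 8;  // advance two source pixels
    row1 += 8;
    dst_argb += 4;  // advance one destination pixel
  }
}
```

A SIMD version processes several such pixels per iteration, which is where the gains listed in the commit message would come from.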

02 Dec, 2016 1 commit

Add MSA optimized ARGB Multiply/Add/Subtract row functions · 83f460be

Manojkumar Bhosale authored Dec 02, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ARGBMultiplyRow_MSA       - 1.4x
ARGBAddRow_MSA            - 8.6x
ARGBSubtractRow_MSA       - 8.6x

ARGBMultiplyRow_Any_MSA   - 1.35x
ARGBAddRow_Any_MSA        - 7.3x
ARGBSubtractRow_Any_MSA   - 7.2x

Performance Gain (vs C non-vectorized)
ARGBMultiplyRow_MSA       - 4.4x
ARGBAddRow_MSA            - 27x
ARGBSubtractRow_MSA       - 22x

ARGBMultiplyRow_Any_MSA   - 3.5x
ARGBAddRow_Any_MSA        - 23x
ARGBSubtractRow_Any_MSA   - 18x

Review URL: https://codereview.chromium.org/2529983002 .

83f460be
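
As a rough illustration of what the add and subtract row functions compute, here is a scalar sketch. It assumes per-channel saturating arithmetic on 8-bit ARGB data; the function names are hypothetical and this is not the MSA code from the commit.

```c
#include <stdint.h>

// Hypothetical scalar reference: per-channel saturating add of two ARGB rows.
static void ARGBAddRowSketch(const uint8_t* src0, const uint8_t* src1,
                             uint8_t* dst, int width) {
  for (int i = 0; i < width * 4; ++i) {  // 4 channel bytes per pixel
    int sum = src0[i] + src1[i];
    dst[i] = (uint8_t)(sum > 255 ? 255 : sum);  // clamp to 255
  }
}

// Hypothetical scalar reference: per-channel saturating subtract (src0 - src1).
static void ARGBSubtractRowSketch(const uint8_t* src0, const uint8_t* src1,
                                  uint8_t* dst, int width) {
  for (int i = 0; i < width * 4; ++i) {
    int diff = src0[i] - src1[i];
    dst[i] = (uint8_t)(diff < 0 ? 0 : diff);  // clamp to 0
  }
}
```

Saturating byte arithmetic of this kind maps naturally onto vector instructions, which is one plausible reason the add and subtract gains above are larger than the multiply gain.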

22 Nov, 2016 1 commit

Add MSA optimized ARGBToRGB565Row_MSA, ARGBToARGB1555Row_MSA,… · da0c29da

Frank Barchard authored Nov 22, 2016

Add MSA optimized ARGBToRGB565Row_MSA, ARGBToARGB1555Row_MSA, ARGBToARGB4444Row_MSA, ARGBToUV444Row_MSA functions

R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ARGBToRGB565Row_MSA       - ~1.6x
ARGBToRGB565Row_Any_MSA   - ~1.6x
ARGBToARGB1555Row_MSA     - ~1.3x
ARGBToARGB1555Row_Any_MSA - ~1.3x
ARGBToARGB4444Row_MSA     - ~3.8x
ARGBToARGB4444Row_Any_MSA - ~3.8x
ARGBToUV444Row_MSA        - ~2.4x
ARGBToUV444Row_Any_MSA    - ~2.4x

Performance Gain (vs C non-vectorized)
ARGBToRGB565Row_MSA       - ~2.8x
ARGBToRGB565Row_Any_MSA   - ~2.8x
ARGBToARGB1555Row_MSA     - ~2.2x
ARGBToARGB1555Row_Any_MSA - ~2.2x
ARGBToARGB4444Row_MSA     - ~6.8x
ARGBToARGB4444Row_Any_MSA - ~6.6x
ARGBToUV444Row_MSA        - ~6.7x
ARGBToUV444Row_Any_MSA    - ~6.7x

Review URL: https://codereview.chromium.org/2520003004 .

da0c29da
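
The ARGB to RGB565 packing can be sketched in scalar C as follows. This is a minimal sketch, assuming ARGB pixels are stored as B, G, R, A bytes in memory and that the 16-bit result is written low byte first; the function name is hypothetical and the code is not taken from the commit.

```c
#include <stdint.h>

// Hypothetical scalar reference: pack 8-bit B, G, R into 5:6:5 bits.
// Alpha is dropped. Output bytes are written low byte first.
static void ARGBToRGB565RowSketch(const uint8_t* src_argb,
                                  uint8_t* dst_rgb565, int width) {
  for (int x = 0; x < width; ++x) {
    unsigned b = src_argb[0] >> 3;  // keep top 5 bits of blue
    unsigned g = src_argb[1] >> 2;  // keep top 6 bits of green
    unsigned r = src_argb[2] >> 3;  // keep top 5 bits of red
    unsigned v = b | (g << 5) | (r << 11);
    dst_rgb565[0] = (uint8_t)(v & 0xff);
    dst_rgb565[1] = (uint8_t)(v >> 8);
    src_argb += 4;
    dst_rgb565 += 2;
  }
}
```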

18 Nov, 2016 1 commit

Add MSA optimized ARGBToRGB24Row_MSA and ARGBToRAWRow_MSA functions · b1504a8e

Frank Barchard authored Nov 18, 2016

R=fbarchard@google.com
BUG=libyuv:634

Review URL: https://codereview.chromium.org/2487913004 .

b1504a8e

09 Nov, 2016 1 commit

disable I422AlphaToARGBRow_SSSE3 for 32 bit fpic · 97fb18b8

Frank Barchard authored Nov 09, 2016

BUG=libyuv:658
TEST=g++ -I include  -fPIC -m32 -msse2 -Os -fno-omit-frame-pointer -c source/row_gcc.cc -o row_gcc.o
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2482263003 .

97fb18b8

08 Nov, 2016 3 commits

clang-format row_gcc.cc with some functions disabled · 3028e1bd

Frank Barchard authored Nov 08, 2016

BUG=libyuv:654
TEST=try bots build
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2484083003 .

3028e1bd

Remove unused time variables · c2bc1561

Frank Barchard authored Nov 08, 2016

BUG=None
TEST=None

Review URL: https://codereview.chromium.org/2487603002 .

c2bc1561

clang-format libyuv · e62309f2

Frank Barchard authored Nov 08, 2016

BUG=libyuv:654
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2469353005 .

e62309f2

07 Nov, 2016 1 commit

HalfFloat neon armv7 fix for destination pointer. · f2c27daf

Frank Barchard authored Nov 07, 2016

Improved unittests detect a difference in arm64 rounding.

TEST=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*Half* -a "--libyuv_width=640 --libyuv_height=360"
BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2478313004 .

f2c27daf

01 Nov, 2016 1 commit

HalfFloat Neon for ARMv7. · eca08525

Frank Barchard authored Nov 01, 2016

The 64 bit version is made similar to the 32 bit version: register 1 is used for the load and for the stored results, and registers 2 and 3 hold the expanded float temporary values.

TEST=out/Release/libyuv_unittest --gtest_filter=*Half*

BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2467723002 .

eca08525

27 Oct, 2016 1 commit

Add MSA optimized I422ToRGB565Row_MSA, I422ToARGB4444Row_MSA and I422ToARGB1555Row_MSA functions · 10ce829b

Frank Barchard authored Oct 27, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
I422ToRGB565Row_MSA             : ~1.5x
I422ToRGB565Row_Any_MSA         : ~1.5x
I422ToARGB4444Row_MSA           : ~1.4x
I422ToARGB4444Row_Any_MSA       : ~1.4x
I422ToARGB1555Row_MSA           : ~1.4x
I422ToARGB1555Row_Any_MSA       : ~1.4x

Performance Gain (vs C non-vectorized)
I422ToRGB565Row_MSA             : ~6.8x
I422ToRGB565Row_Any_MSA         : ~6.8x
I422ToARGB4444Row_MSA           : ~6.6x
I422ToARGB4444Row_Any_MSA       : ~6.6x
I422ToARGB1555Row_MSA           : ~6.6x
I422ToARGB1555Row_Any_MSA       : ~6.6x

Review URL: https://codereview.chromium.org/2445343007 .

10ce829b

26 Oct, 2016 3 commits

Add MSA optimized I422AlphaToARGBRow_MSA and I422ToRGB24Row_MSA functions · 532f5708

Frank Barchard authored Oct 26, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
I422AlphaToARGBRow_MSA      : ~1.4x
I422AlphaToARGBRow_Any_MSA  : ~1.4x
I422ToRGB24Row_MSA          : ~4.8x
I422ToRGB24Row_Any_MSA      : ~4.8x

Performance Gain (vs C non-vectorized)
I422AlphaToARGBRow_MSA      : ~7.0x
I422AlphaToARGBRow_Any_MSA  : ~7.0x
I422ToRGB24Row_MSA          : ~7.9x
I422ToRGB24Row_Any_MSA      : ~7.7x

Review URL: https://codereview.chromium.org/2454433003 .

532f5708

Line continuation at end of line with NOLINT before that. · 02ae8b60

Frank Barchard authored Oct 26, 2016

BUG=libyuv:634
TEST=git cl lint
TBR=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2453013003 .

02ae8b60

document GN for ios · 2c94d6bd

Frank Barchard authored Oct 26, 2016

BUG=libyuv:643
TEST=gn gen out/Release "--args=is_debug=false target_os=\"ios\" ios_enable_code_signing=false target_cpu=\"arm64\"" && ninja -v -C out/Release libyuv_unittest
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2450853003 .

2c94d6bd

25 Oct, 2016 3 commits

cherry picking changes needed for deps roll. · 7c309c45

Frank Barchard authored Oct 25, 2016

DEPS roll is needed for mips builds.  These additional changes are also
needed for that DEPS roll.  These can be done separately.

TBR=kjellander@chromium.org
BUG=libyuv:634
TEST=try bots

Review URL: https://codereview.chromium.org/2446043003 .

7c309c45

White spaces, comments and lint fixes for msa. · 2488b310

Frank Barchard authored Oct 25, 2016

no functional changes.

TBR=kjellander@chromium.org
BUG=libyuv:634

Review URL: https://codereview.chromium.org/2446313002 .

2488b310

use __OPTIMIZE__ macro to determine debug vs release. · c2073823

Frank Barchard authored Oct 25, 2016

Debug builds of x86 gcc/clang can run out of registers.
Previously NDEBUG or _DEBUG was used to detect a debug build.
But those macros are not set by gentoo builds.
This CL switches to the compiler predefine __OPTIMIZE__ which is
built into clang and gcc.

BUG=libyuv:602
TEST=untested
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2451503002 .

c2073823
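
A minimal sketch of the detection described in this commit message is shown below. `__OPTIMIZE__` is predefined by GCC and Clang when optimization is enabled; the macro and function names introduced here are hypothetical and only illustrate the idea.

```c
// __OPTIMIZE__ is predefined by GCC and Clang when compiling with
// optimization enabled, independent of whether NDEBUG or _DEBUG is set.
#if defined(__OPTIMIZE__)
#define SKETCH_IS_OPTIMIZED_BUILD 1
#else
#define SKETCH_IS_OPTIMIZED_BUILD 0
#endif

// Hypothetical use: only select a register-hungry inline assembly path
// when the compiler is optimizing and has enough registers available.
static int UseRegisterHeavyPath(void) {
  return SKETCH_IS_OPTIMIZED_BUILD;
}
```

Because the compiler defines the macro itself, this check does not depend on how a distribution's build system sets its own debug or release flags.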

24 Oct, 2016 1 commit

Add MSA optimized I422ToARGBRow_MSA and I422ToRGBARow_MSA functions · f5d5bd88

Frank Barchard authored Oct 24, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance Gains :- (vs C vectorized)

I422ToARGBRow_MSA     : ~1.6x
I422ToRGBARow_MSA     : ~1.6x

I422ToARGBRow_Any_MSA : ~1.58x
I422ToRGBARow_Any_MSA : ~1.6x

Performance Gains :- (vs C non-vectorized)

I422ToARGBRow_MSA     : ~7x
I422ToRGBARow_MSA     : ~7x

I422ToARGBRow_Any_MSA : ~6.9x
I422ToRGBARow_Any_MSA : ~6.8x

Regarding performance measurement: we created standalone tests that pass row data from a filled 1920x1080 buffer to both the C and MSA functions. Each test runs N iterations to get more accurate C vs MSA timings.

Review URL: https://codereview.chromium.org/2430313005 .

f5d5bd88
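
The measurement approach described above could look roughly like the following harness. This is only a sketch under stated assumptions: the `RowFunc` signature, the stand-in copy function, and the use of `clock()` are all hypothetical choices, not the authors' actual test code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

// Hypothetical row function signature shared by the variants being compared.
typedef void (*RowFunc)(const uint8_t* src, uint8_t* dst, int width);

// Trivial stand-in row function so the sketch is self-contained.
static void CopyRowSketch(const uint8_t* src, uint8_t* dst, int width) {
  for (int i = 0; i < width; ++i) dst[i] = src[i];
}

// Runs fn over every row of a filled width x height buffer, `iterations`
// times, and returns the elapsed CPU time in seconds (-1.0 if allocation
// fails).
static double TimeRowFunc(RowFunc fn, int width, int height, int iterations) {
  size_t size = (size_t)width * (size_t)height;
  uint8_t* src = (uint8_t*)malloc(size);
  uint8_t* dst = (uint8_t*)malloc((size_t)width);
  if (!src || !dst) {
    free(src);
    free(dst);
    return -1.0;
  }
  for (size_t i = 0; i < size; ++i) src[i] = (uint8_t)i;  // fill the buffer

  clock_t start = clock();
  for (int n = 0; n < iterations; ++n) {
    for (int y = 0; y < height; ++y) {
      fn(src + (size_t)y * (size_t)width, dst, width);
    }
  }
  clock_t end = clock();

  volatile uint8_t sink = dst[0];  // keep the output observable
  (void)sink;
  free(src);
  free(dst);
  return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Timing the C and SIMD variants with the same arguments and dividing the two results gives a speedup ratio of the kind quoted in these commit messages.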

21 Oct, 2016 1 commit

scale by 1 for neon implemented · 451af5e9

Frank Barchard authored Oct 21, 2016

void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
  asm volatile (
  "1:                                          \n"
    MEMACCESS(0)
    "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
    "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
    "uxtl       v2.4s, v1.4h                   \n"  // 8 int's
    "uxtl2      v1.4s, v1.8h                   \n"
    "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
    "scvtf      v1.4s, v1.4s                   \n"
    "fcvtn      v4.4h, v2.4s                   \n"  // 8 half floats
    "fcvtn2     v4.8h, v1.4s                   \n"
    MEMACCESS(1)
    "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
    "b.gt       1b                             \n"
  : "+r"(src),    // %0
    "+r"(dst),    // %1
    "+r"(width)   // %2
  :
  : "cc", "memory", "v1", "v2", "v4"
  );
}

void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
  asm volatile (
  "1:                                          \n"
    MEMACCESS(0)
    "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
    "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
    "uxtl       v2.4s, v1.4h                   \n"  // 8 int's
    "uxtl2      v1.4s, v1.8h                   \n"
    "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
    "scvtf      v1.4s, v1.4s                   \n"
    "fmul       v2.4s, v2.4s, %3.s[0]          \n"  // adjust exponent
    "fmul       v1.4s, v1.4s, %3.s[0]          \n"
    "uqshrn     v4.4h, v2.4s, #13              \n"  // isolate halffloat
    "uqshrn2    v4.8h, v1.4s, #13              \n"
    MEMACCESS(1)
    "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
    "b.gt       1b                             \n"
  : "+r"(src),    // %0
    "+r"(dst),    // %1
    "+r"(width)   // %2
  : "w"(scale * 1.9259299444e-34f)    // %3
  : "cc", "memory", "v1", "v2", "v4"
  );
}

TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
BUG=libyuv:560
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2430313008 .

451af5e9
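
The multiply-and-shift narrowing used by HalfFloatRow_NEON above can be sketched in scalar C. The constant 1.9259299444e-34f is 2^-112, which rebiases the float exponent (bias 127) to the half float exponent (bias 15); shifting the float bits right by 13 then drops the 13 extra mantissa bits. The function name is hypothetical, and this sketch truncates the mantissa rather than rounding and does not reproduce the saturation of uqshrn.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical scalar version of the multiply-and-shift conversion:
// scale the value, rebias the exponent by 2^-112, then take the float
// bit pattern shifted right by 13 as the half float bit pattern.
static uint16_t HalfFloatScalarSketch(uint16_t v, float scale) {
  float f = (float)v * (scale * 1.9259299444e-34f);
  uint32_t bits;
  memcpy(&bits, &f, sizeof(bits));  // reinterpret float bits safely
  return (uint16_t)(bits >> 13);
}
```

For example, with a scale of 1.0f the input 1 maps to the bit pattern 0x3C00, which is 1.0 in half float.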

20 Oct, 2016 2 commits

HalfFloat avx2 unpack bug fix. · 550cf829

Frank Barchard authored Oct 20, 2016

The AVX unpack parameters were in reverse order, causing incorrect results
on AVX2 hardware.

TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Half*

BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2438893002 .

550cf829

HalfFloatPlane unittest for denormal half floats · f553db2d

Frank Barchard authored Oct 20, 2016

Half floats have a limited range. It shouldn't normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flushed to zero. This test ensures they behave the same in C and SIMD and tests the performance of denormals.

TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2424233004 .

f553db2d

19 Oct, 2016 1 commit

Add MSA optimized ARGB4444ToI420 and ARGB4444ToARGB functions · 78c58ab8

Frank Barchard authored Oct 19, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance gains : (Auto-vectorized C vs MSA SIMD)

ARGB4444ToYRow_MSA        : ~3.0x
ARGB4444ToUVRow_MSA       : ~1.8x
ARGB4444ToARGBRow_MSA     : ~3.4x

ARGB4444ToYRow_Any_MSA    : ~2.8x
ARGB4444ToUVRow_Any_MSA   : ~1.7x
ARGB4444ToARGBRow_Any_MSA : ~3.2x

Review URL: https://codereview.chromium.org/2421843002 .

78c58ab8

18 Oct, 2016 3 commits

cpu_id cleanup. no functional change. · e16e3a62

Frank Barchard authored Oct 18, 2016

remove old comment about initialize to zero.
remove ifdef and replace with macro defined to zero.

BUG=None
TEST=try bots
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2425623004 .

e16e3a62

landmine to clobber old GYP build artifacts to enable moving to GN. · 93f47948

Henrik Kjellander authored Oct 18, 2016

BUG=chromium:652188
TBR=fbarchard@chromium.org

Review URL: https://codereview.chromium.org/2427643003 .

93f47948

PRESUBMIT: Remove GYP trybots · e0056693

Henrik Kjellander authored Oct 18, 2016

As they're being removed from the try server.

BUG=chromium:652188
TBR=fbarchard@chromium.org

Review URL: https://codereview.chromium.org/2426693003 .

e0056693

17 Oct, 2016 3 commits

landmine to clobber old GYP build artifacts to enable moving to GN. · a0a549c5

Henrik Kjellander authored Oct 17, 2016

BUG=chromium:652188
TBR=ehmaldonado@chromium.org

Review URL: https://codereview.chromium.org/2421343002 .

a0a549c5

Add landmine support · 3d047196

Henrik Kjellander authored Oct 17, 2016

After switching bots from GYP to GN, build artifacts are left behind that make
the next builds fail. Since it's infeasible to clean out all bot machines,
it's better to have an automated system for this, which is what landmines are.

By adding a line to tools/get_landmines.py it is possible to clobber each bot
that syncs past that "landmine CL".

BUG=chromium:652188
TBR=ehmaldonado@chromium.org

Review URL: https://codereview.chromium.org/2427633003 .

3d047196

PRESUBMIT: rename trybots from gn to gyp. · fcbb30f5

Henrik Kjellander authored Oct 17, 2016

After switching the default bots from GYP to GN,
we now only have a few GYP bots left, so rename the trybots
accordingly

BUG=chromium:652188
TBR=fbarchard@chromium.org

Review URL: https://codereview.chromium.org/2425693002 .

fcbb30f5

15 Oct, 2016 1 commit

Port HalfFloatRow_SSE2 to AVX2 but not using F16C. · 2d80fc31

Frank Barchard authored Oct 15, 2016

R=wangcheng@google.com, hubbe@chromium.org
BUG=libyuv:560

Review URL: https://codereview.chromium.org/2421993002 .

2d80fc31

14 Oct, 2016 2 commits

Add f16c (halffloat) cpuid · fdcf524a

Frank Barchard authored Oct 14, 2016

R=wangcheng@google.com, hubbe@chromium.org
BUG=libyuv:560

Review URL: https://codereview.chromium.org/2418763006 .

fdcf524a

Port ARGBExtractAlpha_AVX2 function to windows. · 5333e94e

Frank Barchard authored Oct 14, 2016

BUG=libyuv:572
TEST=try bots
R=wangcheng@google.com, magjed@chromium.org

Review URL: https://codereview.chromium.org/2416783004 .

5333e94e

13 Oct, 2016 2 commits

Add ARGBExtractAlpha_AVX2 function · a5e93766

Frank Barchard authored Oct 13, 2016

Port SSE2 version to AVX2.
BUG=libyuv:572
TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Extract*
R=wangcheng@google.com, magjed@chromium.org

Review URL: https://codereview.chromium.org/2420553002 .

a5e93766

Add linux_use_bundled_binutils_override = true to build_overrides. · 9fb3c31b

Frank Barchard authored Oct 13, 2016

This variable was introduced in https://codereview.chromium.org/2293853002
and causes builds to fail, since it is not defined in WebRTC.

BUG=webrtc:6281
TBR=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2418643003 .

9fb3c31b

12 Oct, 2016 2 commits

Cast for clang-cl 64 bit build warnings in unittests · 198bce39

Frank Barchard authored Oct 12, 2016

R=kjellander@chromium.org
BUG=libyuv:649

Review URL: https://codereview.chromium.org/2414763002 .

198bce39

Add GN files that need exec_script to list for win64 clang-cl · a7166c33

Frank Barchard authored Oct 12, 2016

TBR=kjellander@chromium.org
BUG=libyuv:649
TEST=call gn gen out\Release "--args=is_debug=false is_clang=true"

Review URL: https://codereview.chromium.org/2414783002 .

a7166c33

11 Oct, 2016 2 commits

Remove I411 support. · d363ea65

Frank Barchard authored Oct 11, 2016

YUV 411 is a very uncommon format.  Remove support.

Update documentation to reflect that 411 is deprecated.

Simplify tests for YUV to only test with the new side by side YUV but keep old 3 plane test around with a macro for now.

BUG=libyuv:645
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2406123002 .

d363ea65

Side by side 420 test · 0071f46a

Frank Barchard authored Oct 11, 2016

I420 output can be slow due to multi-channel writes.
Putting the U and V planes into a single side-by-side buffer can improve performance.

TBR=wangcheng@google.com
BUG=None

Review URL: https://codereview.chromium.org/2403223003 .

0071f46a
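
One way to read "side by side" is that each row of a single buffer holds the U samples in its left half and the V samples in its right half, with both planes sharing one stride. The sketch below illustrates that layout under this assumption; the struct and function names are hypothetical and not taken from the test code.

```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical descriptor for a side-by-side UV buffer: each row holds
// half_width U bytes followed by half_width V bytes.
typedef struct {
  uint8_t* base;  // single allocation holding both planes
  uint8_t* u;     // start of the U samples (left half of each row)
  uint8_t* v;     // start of the V samples (right half of each row)
  int stride;     // shared row stride in bytes: 2 * half_width
} SideBySideUV;

// Allocates a zeroed side-by-side UV buffer for a width x height I420
// image. Returns 0 on success, -1 on allocation failure.
static int AllocSideBySideUV(int width, int height, SideBySideUV* out) {
  int half_width = (width + 1) / 2;
  int half_height = (height + 1) / 2;
  out->stride = half_width * 2;
  out->base = (uint8_t*)calloc((size_t)out->stride * (size_t)half_height, 1);
  if (!out->base) return -1;
  out->u = out->base;
  out->v = out->base + half_width;
  return 0;
}
```

A conversion function that takes separate U and V pointers and strides could then be given `u` and `v` with the same `stride`, so both chroma writes for a row land in one contiguous region.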

08 Oct, 2016 2 commits

YUY2ToI422 coalesce rows for small images · af87c11c

Frank Barchard authored Oct 08, 2016

TBR=wangcheng@google.com
BUG=libyuv:647
TESTED=LibYUVConvertTest.YUY2ToI422_Opt

Review URL: https://codereview.chromium.org/2393393006 .

af87c11c
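
The commit message does not spell out the mechanism, but row coalescing generally means treating an image whose rows are contiguous in memory as one long row, so per-row overhead is paid once. A minimal sketch of that idea, with hypothetical names and parameters, might look like this:

```c
// Hypothetical plan describing how many rows to process and how wide.
typedef struct {
  int width;
  int height;
  int src_stride;
  int dst_stride;
} RowPlan;

// If both source and destination rows are tightly packed (stride equals
// width times bytes per pixel), collapse the image into a single row of
// width * height pixels. Otherwise leave the dimensions unchanged.
static RowPlan CoalesceRowsSketch(int width, int height,
                                  int src_stride, int dst_stride,
                                  int src_bytes_per_pixel,
                                  int dst_bytes_per_pixel) {
  RowPlan plan = {width, height, src_stride, dst_stride};
  if (src_stride == width * src_bytes_per_pixel &&
      dst_stride == width * dst_bytes_per_pixel) {
    plan.width = width * height;
    plan.height = 1;
    plan.src_stride = 0;  // only one row, so strides are unused
    plan.dst_stride = 0;
  }
  return plan;
}
```

For small images this matters more, because the fixed cost of each row call is large relative to the work done per row.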

libyuv::YUY2ToY for isolating Y channel of YUY2. · edd3a84d

Frank Barchard authored Oct 08, 2016

This function is the first step of YUY2 To I420.
Provided primarily for diagnostics.

TBR=wangcheng@google.com
BUG=libyuv:647
TESTED=LibYUVConvertTest.YUY2ToY_Opt

Review URL: https://codereview.chromium.org/2399153004 .

edd3a84d
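
Isolating the Y channel of YUY2 amounts to taking every other byte, since YUY2 stores pixels as Y0 U Y1 V. A scalar sketch, with a hypothetical function name, could be:

```c
#include <stdint.h>

// Hypothetical scalar reference: YUY2 is packed as Y0 U Y1 V, so the Y
// sample for pixel x sits at byte offset 2 * x.
static void YUY2ToYRowSketch(const uint8_t* src_yuy2, uint8_t* dst_y,
                             int width) {
  for (int x = 0; x < width; ++x) {
    dst_y[x] = src_yuy2[2 * x];
  }
}
```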

07 Oct, 2016 1 commit

Add MSA optimized YUY2ToI422, YUY2ToI420, UYVYToI422, UYVYToI420 functions · a2891ec7

Frank Barchard authored Oct 07, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance gains as below,

YUY2ToI422, YUY2ToI420 :-

YUY2ToYRow_MSA          : ~10x
YUY2ToUVRow_MSA         : ~11x
YUY2ToUV422Row_MSA      : ~9x
YUY2ToYRow_Any_MSA      : ~6x
YUY2ToUVRow_Any_MSA     : ~5x
YUY2ToUV422Row_Any_MSA  : ~4x

UYVYToI422, UYVYToI420 :-

UYVYToYRow_MSA          : ~10x
UYVYToUVRow_MSA         : ~11x
UYVYToUV422Row_MSA      : ~9x
UYVYToYRow_Any_MSA      : ~6x
UYVYToUVRow_Any_MSA     : ~5x
UYVYToUV422Row_Any_MSA  : ~4x

Review URL: https://codereview.chromium.org/2397693002 .

a2891ec7