Commits · f2c27dafa2950510ba767cd59937ddf5d1974937 · submodule / libyuv

07 Nov, 2016 1 commit

HalfFloat neon armv7 fix for destination pointer. · f2c27daf

Frank Barchard authored Nov 07, 2016

Improved unittests detect differences in arm64 rounding.

TEST=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*Half* -a "--libyuv_width=640 --libyuv_height=360"
BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2478313004 .

f2c27daf

01 Nov, 2016 1 commit

HalfFloat Neon for ARMv7. · eca08525

Frank Barchard authored Nov 01, 2016

The 64 bit version is made similar to the 32 bit version: register 1 holds the load and store results, and registers 2 and 3 hold the expanded float temporary values.

TEST=out/Release/libyuv_unittest --gtest_filter=*Half*

BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2467723002 .

eca08525

27 Oct, 2016 1 commit

Add MSA optimized I422ToRGB565Row_MSA, I422ToARGB4444Row_MSA and I422ToARGB1555Row_MSA functions · 10ce829b

Frank Barchard authored Oct 27, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
I422ToRGB565Row_MSA             : ~1.5x
I422ToRGB565Row_Any_MSA         : ~1.5x
I422ToARGB4444Row_MSA           : ~1.4x
I422ToARGB4444Row_Any_MSA       : ~1.4x
I422ToARGB1555Row_MSA           : ~1.4x
I422ToARGB1555Row_Any_MSA       : ~1.4x

Performance Gain (vs C non-vectorized)
I422ToRGB565Row_MSA             : ~6.8x
I422ToRGB565Row_Any_MSA         : ~6.8x
I422ToARGB4444Row_MSA           : ~6.6x
I422ToARGB4444Row_Any_MSA       : ~6.6x
I422ToARGB1555Row_MSA           : ~6.6x
I422ToARGB1555Row_Any_MSA       : ~6.6x

Review URL: https://codereview.chromium.org/2445343007 .

10ce829b

26 Oct, 2016 3 commits

Add MSA optimized I422AlphaToARGBRow_MSA and I422ToRGB24Row_MSA functions · 532f5708

Frank Barchard authored Oct 26, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
I422AlphaToARGBRow_MSA      : ~1.4x
I422AlphaToARGBRow_Any_MSA  : ~1.4x
I422ToRGB24Row_MSA          : ~4.8x
I422ToRGB24Row_Any_MSA      : ~4.8x

Performance Gain (vs C non-vectorized)
I422AlphaToARGBRow_MSA      : ~7.0x
I422AlphaToARGBRow_Any_MSA  : ~7.0x
I422ToRGB24Row_MSA          : ~7.9x
I422ToRGB24Row_Any_MSA      : ~7.7x

Review URL: https://codereview.chromium.org/2454433003 .

532f5708

Line continuation at end of line with NOLINT before that. · 02ae8b60

Frank Barchard authored Oct 26, 2016

BUG=libyuv:634
TEST=git cl lint
TBR=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2453013003 .

02ae8b60

document GN for ios · 2c94d6bd

Frank Barchard authored Oct 26, 2016

BUG=libyuv:643
TEST=gn gen out/Release "--args=is_debug=false target_os=\"ios\" ios_enable_code_signing=false target_cpu=\"arm64\"" && ninja -v -C out/Release libyuv_unittest
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2450853003 .

2c94d6bd

25 Oct, 2016 3 commits

cherry picking changes needed for deps roll. · 7c309c45

Frank Barchard authored Oct 25, 2016

DEPS roll is needed for mips builds.  These additional changes are also
needed for that DEPS roll.  These can be done separately.

TBR=kjellander@chromium.org
BUG=libyuv:634
TEST=try bots

Review URL: https://codereview.chromium.org/2446043003 .

7c309c45

White spaces, comments and lint fixes for msa. · 2488b310

Frank Barchard authored Oct 25, 2016

no functional changes.

TBR=kjellander@chromium.org
BUG=libyuv:634

Review URL: https://codereview.chromium.org/2446313002 .

2488b310

use __OPTIMIZE__ macro to determine debug vs release. · c2073823

Frank Barchard authored Oct 25, 2016

Debug builds of x86 gcc/clang can run out of registers.
Previously NDEBUG or _DEBUG was used to detect a debug build,
but those macros are not set by Gentoo builds.
This CL switches to the compiler predefine __OPTIMIZE__, which is
built into clang and gcc.
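
A minimal sketch of the kind of detection this commit describes, assuming GCC or Clang. Only `__OPTIMIZE__` comes from the compilers; the `SKETCH_*` macro names and `sketch_build_kind` are hypothetical and are not libyuv identifiers.

```c
/* __OPTIMIZE__ is predefined by GCC and Clang when optimization is enabled
   (for example -O1, -O2 or -Os). It is not defined at -O0. */
#if defined(__OPTIMIZE__)
#define SKETCH_OPTIMIZED_BUILD 1
#else
#define SKETCH_OPTIMIZED_BUILD 0
#endif

/* Hypothetical gate: enable register-hungry inline assembly only when the
   compiler is optimizing, since unoptimized x86 builds can run out of
   registers. */
#define SKETCH_ENABLE_REGISTER_HEAVY_ASM SKETCH_OPTIMIZED_BUILD

/* Reports which path the preprocessor selected. */
const char* sketch_build_kind(void) {
  return SKETCH_ENABLE_REGISTER_HEAVY_ASM ? "optimized" : "unoptimized";
}
```

Unlike NDEBUG or _DEBUG, this check does not depend on the build system passing a define, so it follows the optimization level actually in use.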

BUG=libyuv:602
TEST=untested
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2451503002 .

c2073823

24 Oct, 2016 1 commit

Add MSA optimized I422ToARGBRow_MSA and I422ToRGBARow_MSA functions · f5d5bd88

Frank Barchard authored Oct 24, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance Gains :- (vs C vectorized)

I422ToARGBRow_MSA     : ~1.6x
I422ToRGBARow_MSA     : ~1.6x

I422ToARGBRow_Any_MSA : ~1.58x
I422ToRGBARow_Any_MSA : ~1.6x

Performance Gains :- (vs C non-vectorized)

I422ToARGBRow_MSA     : ~7x
I422ToRGBARow_MSA     : ~7x

I422ToARGBRow_Any_MSA : ~6.9x
I422ToRGBARow_Any_MSA : ~6.8x

For performance measurement, we created standalone tests that pass row data from a filled 1920x1080 buffer to both the C and MSA functions. Each test runs N iterations to get more accurate timings of C vs MSA.

Review URL: https://codereview.chromium.org/2430313005 .

f5d5bd88

21 Oct, 2016 1 commit

scale by 1 for neon implemented · 451af5e9

Frank Barchard authored Oct 21, 2016

void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
  asm volatile (
  "1:                                          \n"
    MEMACCESS(0)
    "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
    "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
    "uxtl       v2.4s, v1.4h                   \n"  // 8 ints
    "uxtl2      v1.4s, v1.8h                   \n"
    "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
    "scvtf      v1.4s, v1.4s                   \n"
    "fcvtn      v4.4h, v2.4s                   \n"  // 8 half floats
    "fcvtn2     v4.8h, v1.4s                   \n"
    MEMACCESS(1)
    "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
    "b.gt       1b                             \n"
  : "+r"(src),    // %0
    "+r"(dst),    // %1
    "+r"(width)   // %2
  :
  : "cc", "memory", "v1", "v2", "v4"
  );
}

void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
  asm volatile (
  "1:                                          \n"
    MEMACCESS(0)
    "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
    "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
    "uxtl       v2.4s, v1.4h                   \n"  // 8 int's
    "uxtl2      v1.4s, v1.8h                   \n"
    "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
    "scvtf      v1.4s, v1.4s                   \n"
    "fmul       v2.4s, v2.4s, %3.s[0]          \n"  // adjust exponent
    "fmul       v1.4s, v1.4s, %3.s[0]          \n"
    "uqshrn     v4.4h, v2.4s, #13              \n"  // isolate halffloat
    "uqshrn2    v4.8h, v1.4s, #13              \n"
    MEMACCESS(1)
    "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
    "b.gt       1b                             \n"
  : "+r"(src),    // %0
    "+r"(dst),    // %1
    "+r"(width)   // %2
  : "w"(scale * 1.9259299444e-34f)    // %3
  : "cc", "memory", "v1", "v2", "v4"
  );
}
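
The scaled path above (multiply, then shift right by 13) can be sketched in scalar C. This is an illustrative reference under stated assumptions, not libyuv's code: the name `half_float_row_sketch` is hypothetical, and the sketch assumes the scaled values fit in the half float range.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference for the multiply-and-shift conversion. */
void half_float_row_sketch(const uint16_t* src, uint16_t* dst, float scale,
                           int width) {
  /* 0x1p-112f is 2^-112, about 1.9259299444e-34, the constant used in the
     NEON code. Multiplying by it lowers the float exponent by 112, the
     difference between the float bias (127) and the half float bias (15). */
  const float mult = scale * 0x1p-112f;
  for (int i = 0; i < width; ++i) {
    float value = (float)src[i] * mult;
    uint32_t bits;
    memcpy(&bits, &value, sizeof(bits)); /* well-defined type punning */
    /* Dropping the low 13 mantissa bits (23 - 10) leaves the half float
       exponent and mantissa in the low 16 bits. This truncates; it does
       not round. */
    dst[i] = (uint16_t)(bits >> 13);
  }
}
```

Note that the plain cast here wraps on overflow, whereas `uqshrn` in the NEON version saturates, so the two only agree for inputs that stay in range.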

TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
BUG=libyuv:560
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2430313008 .

451af5e9

20 Oct, 2016 2 commits

HalfFloat avx2 unpack bug fix. · 550cf829

Frank Barchard authored Oct 20, 2016

AVX unpack parameters were reverse ordered causing incorrect results
on AVX2 hardware.

TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Half*

BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2438893002 .

550cf829

HalfFloatPlane unittest for denormal half floats · f553db2d

Frank Barchard authored Oct 20, 2016

Half floats have a limited range. It shouldn't normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flush to zero. This test ensures they behave the same in C and SIMD and tests the performance of denormals.

TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2424233004 .

f553db2d

19 Oct, 2016 1 commit

Add MSA optimized ARGB4444ToI420 and ARGB4444ToARGB functions · 78c58ab8

Frank Barchard authored Oct 19, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance gains : (Auto-vectorized C vs MSA SIMD)

ARGB4444ToYRow_MSA        : ~3.0x
ARGB4444ToUVRow_MSA       : ~1.8x
ARGB4444ToARGBRow_MSA     : ~3.4x

ARGB4444ToYRow_Any_MSA    : ~2.8x
ARGB4444ToUVRow_Any_MSA   : ~1.7x
ARGB4444ToARGBRow_Any_MSA : ~3.2x

Review URL: https://codereview.chromium.org/2421843002 .

78c58ab8

18 Oct, 2016 3 commits

cpu_id cleanup. no functional change. · e16e3a62

Frank Barchard authored Oct 18, 2016

remove old comment about initialize to zero.
remove ifdef and replace with macro defined to zero.

BUG=None
TEST=try bots
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2425623004 .

e16e3a62

landmine to clobber old GYP build artifacts to enable moving to GN. · 93f47948

Henrik Kjellander authored Oct 18, 2016

BUG=chromium:652188
TBR=fbarchard@chromium.org

Review URL: https://codereview.chromium.org/2427643003 .

93f47948

PRESUBMIT: Remove GYP trybots · e0056693

Henrik Kjellander authored Oct 18, 2016

As they're being removed from the try server.

BUG=chromium:652188
TBR=fbarchard@chromium.org

Review URL: https://codereview.chromium.org/2426693003 .

e0056693

17 Oct, 2016 3 commits

landmine to clobber old GYP build artifacts to enable moving to GN. · a0a549c5

Henrik Kjellander authored Oct 17, 2016

BUG=chromium:652188
TBR=ehmaldonado@chromium.org

Review URL: https://codereview.chromium.org/2421343002 .

a0a549c5

Add landmine support · 3d047196

Henrik Kjellander authored Oct 17, 2016

After switching bots from GYP to GN, build artifacts are left behind that fail
the next builds. Since it's infeasible to clean out all bot machines,
it's better to have an automated system for this, which is what landmines are.

By adding a line to tools/get_landmines.py it is possible to clobber each bot
that syncs past that "landmine CL".

BUG=chromium:652188
TBR=ehmaldonado@chromium.org

Review URL: https://codereview.chromium.org/2427633003 .

3d047196

PRESUBMIT: rename trybots from gn to gyp. · fcbb30f5

Henrik Kjellander authored Oct 17, 2016

After switching the default bots from GYP to GN,
we now only have a few GYP bots left, so rename the trybots
accordingly.

BUG=chromium:652188
TBR=fbarchard@chromium.org

Review URL: https://codereview.chromium.org/2425693002 .

fcbb30f5

15 Oct, 2016 1 commit

Port HalfFloatRow_SSE2 to AVX2 but not using F16C. · 2d80fc31

Frank Barchard authored Oct 15, 2016

R=wangcheng@google.com, hubbe@chromium.org
BUG=libyuv:560

Review URL: https://codereview.chromium.org/2421993002 .

2d80fc31

14 Oct, 2016 2 commits

Add f16c (halffloat) cpuid · fdcf524a

Frank Barchard authored Oct 14, 2016

R=wangcheng@google.com, hubbe@chromium.org
BUG=libyuv:560

Review URL: https://codereview.chromium.org/2418763006 .

fdcf524a

Port ARGBExtractAlpha_AVX2 function to windows. · 5333e94e

Frank Barchard authored Oct 14, 2016

BUG=libyuv:572
TEST=try bots
R=wangcheng@google.com, magjed@chromium.org

Review URL: https://codereview.chromium.org/2416783004 .

5333e94e

13 Oct, 2016 2 commits

Add ARGBExtractAlpha_AVX2 function · a5e93766

Frank Barchard authored Oct 13, 2016

Port SSE2 version to AVX2.
BUG=libyuv:572
TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Extract*
R=wangcheng@google.com, magjed@chromium.org

Review URL: https://codereview.chromium.org/2420553002 .

a5e93766

Add linux_use_bundled_binutils_override = true to build_overrides. · 9fb3c31b

Frank Barchard authored Oct 13, 2016

This variable was introduced in https://codereview.chromium.org/2293853002
and causes builds to fail, since it is not defined in WebRTC.

BUG=webrtc:6281
TBR=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2418643003 .

9fb3c31b

12 Oct, 2016 2 commits

Cast for clang-cl 64 bit build warnings in unittests · 198bce39

Frank Barchard authored Oct 12, 2016

R=kjellander@chromium.org
BUG=libyuv:649

Review URL: https://codereview.chromium.org/2414763002 .

198bce39

Add GN files that need exec_script to list for win64 clang-cl · a7166c33

Frank Barchard authored Oct 12, 2016

TBR=kjellander@chromium.org
BUG=libyuv:649
TEST=call gn gen out\Release "--args=is_debug=false is_clang=true"

Review URL: https://codereview.chromium.org/2414783002 .

a7166c33

11 Oct, 2016 2 commits

Remove I411 support. · d363ea65

Frank Barchard authored Oct 11, 2016

YUV 411 is a very uncommon format.  Remove support.

Update documentation to reflect that 411 is deprecated.

Simplify tests for YUV to only test with the new side by side YUV but keep old 3 plane test around with a macro for now.

BUG=libyuv:645
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2406123002 .

d363ea65

Side by side 420 test · 0071f46a

Frank Barchard authored Oct 11, 2016

I420 output can be slow due to multi channel write.
Putting the U and V into a single side by side buffer can improve performance.

TBR=wangcheng@google.com
BUG=None

Review URL: https://codereview.chromium.org/2403223003 .

0071f46a

08 Oct, 2016 2 commits

YUY2ToI422 coalesce rows for small images · af87c11c

Frank Barchard authored Oct 08, 2016

TBR=wangcheng@google.com
BUG=libyuv:647
TESTED=LibYUVConvertTest.YUY2ToI422_Opt

Review URL: https://codereview.chromium.org/2393393006 .

af87c11c

libyuv::YUY2ToY for isolating Y channel of YUY2. · edd3a84d

Frank Barchard authored Oct 08, 2016

This function is the first step of YUY2 To I420.
Provided primarily for diagnostics.
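
As a rough illustration of what isolating the Y channel means, here is a minimal scalar sketch. It assumes the usual YUY2 byte order (Y0 U Y1 V per pair of pixels); the helper name `yuy2_row_to_y_sketch` is hypothetical and is not the libyuv API.

```c
#include <stdint.h>

/* Hypothetical helper: copy the luma samples out of one packed YUY2 row.
   YUY2 stores each pair of pixels as Y0 U Y1 V, so luma sits at every
   even byte offset. */
void yuy2_row_to_y_sketch(const uint8_t* src_yuy2, uint8_t* dst_y, int width) {
  for (int x = 0; x < width; ++x) {
    dst_y[x] = src_yuy2[2 * x];
  }
}
```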

TBR=wangcheng@google.com
BUG=libyuv:647
TESTED=LibYUVConvertTest.YUY2ToY_Opt

Review URL: https://codereview.chromium.org/2399153004 .

edd3a84d

07 Oct, 2016 1 commit

Add MSA optimized YUY2ToI422, YUY2ToI420, UYVYToI422, UYVYToI420 functions · a2891ec7

Frank Barchard authored Oct 07, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance gains as below,

YUY2ToI422, YUY2ToI420 :-

YUY2ToYRow_MSA          : ~10x
YUY2ToUVRow_MSA         : ~11x
YUY2ToUV422Row_MSA      : ~9x
YUY2ToYRow_Any_MSA      : ~6x
YUY2ToUVRow_Any_MSA     : ~5x
YUY2ToUV422Row_Any_MSA  : ~4x

UYVYToI422, UYVYToI420 :-

UYVYToYRow_MSA          : ~10x
UYVYToUVRow_MSA         : ~11x
UYVYToUV422Row_MSA      : ~9x
UYVYToYRow_Any_MSA      : ~6x
UYVYToUVRow_Any_MSA     : ~5x
UYVYToUV422Row_Any_MSA  : ~4x

Review URL: https://codereview.chromium.org/2397693002 .

a2891ec7

06 Oct, 2016 1 commit

YUY2ToI422_Any_Neon clean up to not require 16 pixels · 3b88a19a

Frank Barchard authored Oct 06, 2016

YUY2ToI422_Any_Neon previously required 16 pixels and duplicated
the last pixel. The replication was not necessary after a previous
change to treat YUY2 as 4 byte macro pixels.

TBR=harryjin@google.com
BUG=libyuv:648
TESTED=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*YUY2ToI422* -a "--libyuv_width=17 --libyuv_height=7 --libyuv_repeat=999 --libyuv_flags=1"

Review URL: https://codereview.chromium.org/2399143002 .

3b88a19a

05 Oct, 2016 1 commit

GN: Add default target · 1cd38414

Frank Barchard authored Oct 05, 2016

This reduces the number of objects when not specifying a
build target during compile. This is especially significant for Android
where the number of objects decreases from 3322 to 1761.

BUG=libyuv:644
R=fbarchard@google.com

Review URL: https://codereview.chromium.org/2395743002 .

1cd38414

04 Oct, 2016 2 commits

Enable optimize max for GN builds + update docs · 4b3b310e

Frank Barchard authored Oct 04, 2016

Optimize max enables O2 for official builds.  Normally release builds
are O2 but the official build is Os, affecting performance.
The GYP file was previously updated to enable optimize max,
which enables ltcg and O2.

Documentation updated to show GN builds in docs/getting_started.md

BUG=libyuv:642
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2386093003 .

4b3b310e

Add MSA optimized I422ToYUY2Row, I422ToUYVYRow functions · 7018f5be

Frank Barchard authored Oct 04, 2016

R=fbarchard@google.com
BUG=libyuv:634

Performance gains :-

I422ToYUY2Row_MSA     - ~12x
I422ToYUY2Row_Any_MSA - ~7x

I422ToUYVYRow_MSA     - ~12x
I422ToUYVYRow_Any_MSA - ~7x

Review URL: https://codereview.chromium.org/2378753004 .

7018f5be

03 Oct, 2016 1 commit

HalfFloat_SSE2 for Visual C · aa197ee1

Frank Barchard authored Oct 03, 2016

Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.

BUG=libyuv:560, chromium:445071
TEST=LibYUVPlanarTest.TestHalfFloatPlane on windows
R=hubbe@chromium.org, wangcheng@google.com

Review URL: https://codereview.chromium.org/2387713002 .

aa197ee1

30 Sep, 2016 1 commit

HalfFloat_SSE2 port from C algorithm to SSE2 · 4a14cb2e

Frank Barchard authored Sep 30, 2016

Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.

BUG=libyuv:560, chromium:445071
TEST=untested
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2381493006 .

4a14cb2e

29 Sep, 2016 1 commit

Add low level support for 12 bit 420, 422 and 444 YUV video frame conversion. · 7fc932dd

Frank Barchard authored Sep 29, 2016

BUG=libyuv:560,chromium:445071
TEST=untested
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2371293002 .

7fc932dd

28 Sep, 2016 1 commit

bt709 coefficients for video constrained space · c11e9b7f

Frank Barchard authored Sep 28, 2016

The original bt709 color space coefficients were full range yuv for higher
quality.  This change makes the coefficients use the video constrained
range, the same as bt601, which is 16 to 235 for Y and 16 to 240 for the
chroma channels.
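
To make the constrained range concrete, here is a floating point sketch of a video range BT.709 YUV to RGB conversion for one pixel. It assumes the standard BT.709 luma weights (Kr = 0.2126, Kb = 0.0722), Y in 16 to 235 and chroma in 16 to 240. The function names are hypothetical, and this is not libyuv's implementation or its coefficient tables.

```c
#include <stdint.h>

/* Clamp to 0..255 and round to nearest. */
static uint8_t clamp_to_u8(double v) {
  if (v < 0.0) return 0;
  if (v > 255.0) return 255;
  return (uint8_t)(v + 0.5);
}

/* Hypothetical helper: convert one video range BT.709 pixel to 8 bit RGB. */
void bt709_limited_to_rgb_sketch(uint8_t y, uint8_t u, uint8_t v,
                                 uint8_t rgb[3]) {
  const double kr = 0.2126;           /* BT.709 red luma weight */
  const double kb = 0.0722;           /* BT.709 blue luma weight */
  const double kg = 1.0 - kr - kb;    /* green luma weight */
  const double y_scale = 255.0 / 219.0; /* Y spans 16..235 */
  const double c_scale = 255.0 / 224.0; /* chroma spans 16..240 */

  const double yf = ((double)y - 16.0) * y_scale;
  const double uf = ((double)u - 128.0) * c_scale;
  const double vf = ((double)v - 128.0) * c_scale;

  rgb[0] = clamp_to_u8(yf + 2.0 * (1.0 - kr) * vf);
  rgb[1] = clamp_to_u8(yf - (2.0 * kb * (1.0 - kb) / kg) * uf -
                       (2.0 * kr * (1.0 - kr) / kg) * vf);
  rgb[2] = clamp_to_u8(yf + 2.0 * (1.0 - kb) * uf);
}
```

With these ranges, Y = 16 with neutral chroma maps to black and Y = 235 maps to white; values outside the constrained range are clamped.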

BUG=libyuv:639
TEST=libyuv unittests run locally
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2367253003 .

c11e9b7f