- 07 Dec, 2016 1 commit
-
-
Manojkumar Bhosale authored
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ScaleARGBRowDown2_MSA - ~2.6x
ScaleARGBRowDown2Linear_MSA - ~7.9x
ScaleARGBRowDown2Box_MSA - ~3.7x
ScaleARGBRowDownEven_MSA - ~1.2x
ScaleARGBRowDownEvenBox_MSA - ~3.5x
ScaleARGBRowDown2_Any_MSA - ~2.6x
ScaleARGBRowDown2Linear_Any_MSA - ~7.9x
ScaleARGBRowDown2Box_Any_MSA - ~3.6x
ScaleARGBRowDownEven_Any_MSA - ~1.2x
ScaleARGBRowDownEvenBox_Any_MSA - ~3.5x

Performance Gain (vs C non-vectorized)
ScaleARGBRowDown2_MSA - 2.6x
ScaleARGBRowDown2Linear_MSA - 13.5x
ScaleARGBRowDown2Box_MSA - 5.8x
ScaleARGBRowDownEven_MSA - 1.2x
ScaleARGBRowDownEvenBox_MSA - 3.7x
ScaleARGBRowDown2_Any_MSA - 2.6x
ScaleARGBRowDown2Linear_Any_MSA - 13.5x
ScaleARGBRowDown2Box_Any_MSA - 5.3x
ScaleARGBRowDownEven_Any_MSA - 1.2x
ScaleARGBRowDownEvenBox_Any_MSA - 3.7x

Review URL: https://codereview.chromium.org/2527983002 .
-
- 02 Dec, 2016 1 commit
-
-
Manojkumar Bhosale authored
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ARGBMultiplyRow_MSA - 1.4x
ARGBAddRow_MSA - 8.6x
ARGBSubtractRow_MSA - 8.6x
ARGBMultiplyRow_Any_MSA - 1.35x
ARGBAddRow_Any_MSA - 7.3x
ARGBSubtractRow_Any_MSA - 7.2x

Performance Gain (vs C non-vectorized)
ARGBMultiplyRow_MSA - 4.4x
ARGBAddRow_MSA - 27x
ARGBSubtractRow_MSA - 22x
ARGBMultiplyRow_Any_MSA - 3.5x
ARGBAddRow_Any_MSA - 23x
ARGBSubtractRow_Any_MSA - 18x

Review URL: https://codereview.chromium.org/2529983002 .
-
- 22 Nov, 2016 1 commit
-
-
Frank Barchard authored
Add MSA optimized ARGBToRGB565Row_MSA, ARGBToARGB1555Row_MSA, ARGBToARGB4444Row_MSA, ARGBToUV444Row_MSA functions

R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ARGBToRGB565Row_MSA - ~1.6x
ARGBToRGB565Row_Any_MSA - ~1.6x
ARGBToARGB1555Row_MSA - ~1.3x
ARGBToARGB1555Row_Any_MSA - ~1.3x
ARGBToARGB4444Row_MSA - ~3.8x
ARGBToARGB4444Row_Any_MSA - ~3.8x
ARGBToUV444Row_MSA - ~2.4x
ARGBToUV444Row_Any_MSA - ~2.4x

Performance Gain (vs C non-vectorized)
ARGBToRGB565Row_MSA - ~2.8x
ARGBToRGB565Row_Any_MSA - ~2.8x
ARGBToARGB1555Row_MSA - ~2.2x
ARGBToARGB1555Row_Any_MSA - ~2.2x
ARGBToARGB4444Row_MSA - ~6.8x
ARGBToARGB4444Row_Any_MSA - ~6.6x
ARGBToUV444Row_MSA - ~6.7x
ARGBToUV444Row_Any_MSA - ~6.7x

Review URL: https://codereview.chromium.org/2520003004 .
-
- 18 Nov, 2016 1 commit
-
-
Frank Barchard authored
R=fbarchard@google.com BUG=libyuv:634 Review URL: https://codereview.chromium.org/2487913004 .
-
- 09 Nov, 2016 1 commit
-
-
Frank Barchard authored
BUG=libyuv:658 TEST=g++ -I include -fPIC -m32 -msse2 -Os -fno-omit-frame-pointer -c source/row_gcc.cc -o row_gcc.o R=wangcheng@google.com Review URL: https://codereview.chromium.org/2482263003 .
-
- 08 Nov, 2016 3 commits
-
-
Frank Barchard authored
BUG=libyuv:654 TEST=try bots build R=kjellander@chromium.org Review URL: https://codereview.chromium.org/2484083003 .
-
Frank Barchard authored
BUG=None TEST=None Review URL: https://codereview.chromium.org/2487603002 .
-
Frank Barchard authored
BUG=libyuv:654 R=kjellander@chromium.org Review URL: https://codereview.chromium.org/2469353005 .
-
- 07 Nov, 2016 1 commit
-
-
Frank Barchard authored
Improved unittests detect a difference in arm64 rounding. TEST=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*Half* -a "--libyuv_width=640 --libyuv_height=360" BUG=libyuv:560 R=wangcheng@google.com Review URL: https://codereview.chromium.org/2478313004 .
-
- 01 Nov, 2016 1 commit
-
-
Frank Barchard authored
64 bit version made similar to 32 bit with registers 1 for load and store results, and 2 and 3 as expanded float temporary values. TEST=out/Release/libyuv_unittest --gtest_filter=*Half* BUG=libyuv:560 R=wangcheng@google.com Review URL: https://codereview.chromium.org/2467723002 .
-
- 27 Oct, 2016 1 commit
-
-
Frank Barchard authored
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
I422ToRGB565Row_MSA : ~1.5x
I422ToRGB565Row_Any_MSA : ~1.5x
I422ToARGB4444Row_MSA : ~1.4x
I422ToARGB4444Row_Any_MSA : ~1.4x
I422ToARGB1555Row_MSA : ~1.4x
I422ToARGB1555Row_Any_MSA : ~1.4x

Performance Gain (vs C non-vectorized)
I422ToRGB565Row_MSA : ~6.8x
I422ToRGB565Row_Any_MSA : ~6.8x
I422ToARGB4444Row_MSA : ~6.6x
I422ToARGB4444Row_Any_MSA : ~6.6x
I422ToARGB1555Row_MSA : ~6.6x
I422ToARGB1555Row_Any_MSA : ~6.6x

Review URL: https://codereview.chromium.org/2445343007 .
-
- 26 Oct, 2016 3 commits
-
-
Frank Barchard authored
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
I422AlphaToARGBRow_MSA : ~1.4x
I422AlphaToARGBRow_Any_MSA : ~1.4x
I422ToRGB24Row_MSA : ~4.8x
I422ToRGB24Row_Any_MSA : ~4.8x

Performance Gain (vs C non-vectorized)
I422AlphaToARGBRow_MSA : ~7.0x
I422AlphaToARGBRow_Any_MSA : ~7.0x
I422ToRGB24Row_MSA : ~7.9x
I422ToRGB24Row_Any_MSA : ~7.7x

Review URL: https://codereview.chromium.org/2454433003 .
-
Frank Barchard authored
BUG=libyuv:634 TEST=git cl lint TBR=kjellander@chromium.org Review URL: https://codereview.chromium.org/2453013003 .
-
Frank Barchard authored
BUG=libyuv:643 TEST=gn gen out/Release "--args=is_debug=false target_os=\"ios\" ios_enable_code_signing=false target_cpu=\"arm64\"" && ninja -v -C out/Release libyuv_unittest R=kjellander@chromium.org Review URL: https://codereview.chromium.org/2450853003 .
-
- 25 Oct, 2016 3 commits
-
-
Frank Barchard authored
DEPS roll is needed for mips builds. These additional changes are also needed for that DEPS roll. These can be done separately. TBR=kjellander@chromium.org BUG=libyuv:634 TEST=try bots Review URL: https://codereview.chromium.org/2446043003 .
-
Frank Barchard authored
no functional changes. TBR=kjellander@chromium.org BUG=libyuv:634 Review URL: https://codereview.chromium.org/2446313002 .
-
Frank Barchard authored
Debug builds of x86 gcc/clang can run out of registers. Previously NDEBUG or _DEBUG was used to detect a debug build, but those macros are not set by Gentoo builds. This CL switches to the compiler predefine __OPTIMIZE__, which is built into clang and gcc. BUG=libyuv:602 TEST=untested R=wangcheng@google.com Review URL: https://codereview.chromium.org/2451503002 .
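The switch to `__OPTIMIZE__` described above can be illustrated with a minimal sketch. It is not the code from this CL: the macro `EXAMPLE_OPTIMIZED_BUILD` and the function `example_is_optimized_build` are hypothetical names used only for illustration. The one fact relied on is that gcc and clang predefine `__OPTIMIZE__` when optimization is enabled, independent of NDEBUG or _DEBUG.

```c
/* EXAMPLE_OPTIMIZED_BUILD is a hypothetical macro name for this sketch.
 * gcc and clang predefine __OPTIMIZE__ when optimization is enabled,
 * so the check does not depend on NDEBUG or _DEBUG being set. */
#if defined(__OPTIMIZE__)
#define EXAMPLE_OPTIMIZED_BUILD 1
#else
#define EXAMPLE_OPTIMIZED_BUILD 0
#endif

/* Returns 1 for an optimized build, 0 for an unoptimized (debug) build.
 * Register-hungry inline assembly could be guarded the same way. */
int example_is_optimized_build(void) {
  return EXAMPLE_OPTIMIZED_BUILD;
}
```

Compiling this with `-O0` and with `-O2` should yield 0 and 1 respectively on gcc and clang.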
-
- 24 Oct, 2016 1 commit
-
-
Frank Barchard authored
R=fbarchard@google.com
BUG=libyuv:634

Performance Gains (vs C vectorized)
I422ToARGBRow_MSA : ~1.6x
I422ToRGBARow_MSA : ~1.6x
I422ToARGBRow_Any_MSA : ~1.58x
I422ToRGBARow_Any_MSA : ~1.6x

Performance Gains (vs C non-vectorized)
I422ToARGBRow_MSA : ~7x
I422ToRGBARow_MSA : ~7x
I422ToARGBRow_Any_MSA : ~6.9x
I422ToRGBARow_Any_MSA : ~6.8x

Regarding performance measurement: we created standalone tests that pass each row's data from a filled 1920x1080 buffer to both the C and MSA functions. These tests are executed for N iterations to get more accurate C vs MSA timings.

Review URL: https://codereview.chromium.org/2430313005 .
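The measurement approach described in this message (rows from a filled 1920x1080 buffer, repeated for N iterations) might look roughly like the sketch below. The actual standalone tests are not part of this log, so everything here is an assumption: `ExampleRowFn`, `example_copy_row`, and `example_time_rows` are hypothetical names, and a plain copy stands in for the C or MSA row function being measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical row-function signature, for illustration only. */
typedef void (*ExampleRowFn)(const uint8_t* src, uint8_t* dst, int width);

/* Stand-in row function: a plain byte copy. */
static void example_copy_row(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    dst[x] = src[x];
  }
}

/* Runs fn over every row of a filled width x height buffer, repeated
 * `iterations` times. Returns elapsed CPU seconds, or -1.0 if an
 * allocation fails. */
static double example_time_rows(ExampleRowFn fn, int width, int height,
                                int iterations) {
  size_t size = (size_t)width * (size_t)height;
  uint8_t* src = malloc(size);
  uint8_t* dst = malloc((size_t)width);
  if (!src || !dst) {
    free(src);
    free(dst);
    return -1.0;
  }
  for (size_t i = 0; i < size; ++i) {
    src[i] = (uint8_t)i; /* fill with a repeating pattern */
  }
  clock_t start = clock();
  for (int n = 0; n < iterations; ++n) {
    for (int y = 0; y < height; ++y) {
      fn(src + (size_t)y * (size_t)width, dst, width);
    }
  }
  clock_t end = clock();
  free(src);
  free(dst);
  return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A speedup figure such as "~1.6x" would then be the ratio of the time measured for the C function to the time measured for the SIMD function over the same number of iterations.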
-
- 21 Oct, 2016 1 commit
-
-
Frank Barchard authored
void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
  asm volatile (
  "1: \n"
    MEMACCESS(0)
    "ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts
    "subs %w2, %w2, #8 \n" // 8 pixels per loop
    "uxtl v2.4s, v1.4h \n" // 8 int's
    "uxtl2 v1.4s, v1.8h \n"
    "scvtf v2.4s, v2.4s \n" // 8 floats
    "scvtf v1.4s, v1.4s \n"
    "fcvtn v4.4h, v2.4s \n" // 8 floats
    "fcvtn2 v4.8h, v1.4s \n"
    MEMACCESS(1)
    "st1 {v4.16b}, [%1], #16 \n" // store 8 shorts
    "b.gt 1b \n"
  : "+r"(src), // %0
    "+r"(dst), // %1
    "+r"(width) // %2
  :
  : "cc", "memory", "v1", "v2", "v4"
  );
}

void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
  asm volatile (
  "1: \n"
    MEMACCESS(0)
    "ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts
    "subs %w2, %w2, #8 \n" // 8 pixels per loop
    "uxtl v2.4s, v1.4h \n" // 8 int's
    "uxtl2 v1.4s, v1.8h \n"
    "scvtf v2.4s, v2.4s \n" // 8 floats
    "scvtf v1.4s, v1.4s \n"
    "fmul v2.4s, v2.4s, %3.s[0] \n" // adjust exponent
    "fmul v1.4s, v1.4s, %3.s[0] \n"
    "uqshrn v4.4h, v2.4s, #13 \n" // isolate halffloat
    "uqshrn2 v4.8h, v1.4s, #13 \n"
    MEMACCESS(1)
    "st1 {v4.16b}, [%1], #16 \n" // store 8 shorts
    "b.gt 1b \n"
  : "+r"(src), // %0
    "+r"(dst), // %1
    "+r"(width) // %2
  : "w"(scale * 1.9259299444e-34f) // %3
  : "cc", "memory", "v1", "v2", "v4"
  );
}

TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
BUG=libyuv:560
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2430313008 .
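The "adjust exponent" and "isolate halffloat" steps in HalfFloatRow_NEON can be followed in scalar C. This is a sketch of the same arithmetic, not libyuv's C reference code; `example_half_from_u16` is a hypothetical name. The constant 1.9259299444e-34f is 2^-112, the difference between the float exponent bias (127) and the half-float bias (15), so after the multiply a right shift by 13 leaves the half-float exponent and the top 10 mantissa bits in the low 16 bits. Unlike `uqshrn`, the cast below does not saturate, and like the shift it truncates rather than rounds.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar equivalent of one lane of HalfFloatRow_NEON. */
static uint16_t example_half_from_u16(uint16_t v, float scale) {
  /* Multiply by scale * 2^-112 so the float exponent field lines up
   * with the half-float exponent field. */
  float f = (float)v * (scale * 1.9259299444e-34f);
  uint32_t bits;
  memcpy(&bits, &f, sizeof(bits)); /* reinterpret the float as raw bits */
  /* Drop the 13 extra mantissa bits (23 - 10); the low 16 bits of the
   * result are the half float. */
  return (uint16_t)(bits >> 13);
}
```

With a scale of 1.0f, an input of 1 produces 0x3C00, the half-float encoding of 1.0, and an input of 2 produces 0x4000, the encoding of 2.0.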
-
- 20 Oct, 2016 2 commits
-
-
Frank Barchard authored
AVX unpack parameters were reverse ordered causing incorrect results on AVX2 hardware. TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Half* BUG=libyuv:560 R=wangcheng@google.com Review URL: https://codereview.chromium.org/2438893002 .
-
Frank Barchard authored
Half floats have a limited range. It shouldn't normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flushed to zero. This test ensures they behave the same in C and SIMD and tests the performance of denormals. TEST=TestHalfFloatPlane_denormal BUG=libyuv:560 R=hubbe@chromium.org Review URL: https://codereview.chromium.org/2424233004 .
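For reference, a half float is a denormal when its 5-bit exponent field is zero and its 10-bit mantissa is nonzero. The helper below is a small sketch of that check; `example_half_is_denormal` is a hypothetical name and is not taken from the libyuv test.

```c
#include <stdint.h>

/* Returns 1 if the 16-bit pattern h encodes a denormal (subnormal)
 * half float: exponent bits (14..10) all zero, mantissa bits (9..0)
 * not all zero. The sign bit (15) is ignored. */
static int example_half_is_denormal(uint16_t h) {
  return (h & 0x7C00) == 0 && (h & 0x03FF) != 0;
}
```

Zero (0x0000) and the smallest normal value (0x0400) are not denormals under this definition, while 0x0001 is.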
-
- 19 Oct, 2016 1 commit
-
-
Frank Barchard authored
R=fbarchard@google.com
BUG=libyuv:634

Performance gains (Auto-vectorized C vs MSA SIMD)
ARGB4444ToYRow_MSA : ~3.0x
ARGB4444ToUVRow_MSA : ~1.8x
ARGB4444ToARGBRow_MSA : ~3.4x
ARGB4444ToYRow_Any_MSA : ~2.8x
ARGB4444ToUVRow_Any_MSA : ~1.7x
ARGB4444ToARGBRow_Any_MSA : ~3.2x

Review URL: https://codereview.chromium.org/2421843002 .
-
- 18 Oct, 2016 3 commits
-
-
Frank Barchard authored
remove old comment about initialize to zero. remove ifdef and replace with macro defined to zero. BUG=None TEST=try bots R=kjellander@chromium.org Review URL: https://codereview.chromium.org/2425623004 .
-
Henrik Kjellander authored
BUG=chromium:652188 TBR=fbarchard@chromium.org Review URL: https://codereview.chromium.org/2427643003 .
-
Henrik Kjellander authored
As they're being removed from the try server. BUG=chromium:652188 TBR=fbarchard@chromium.org Review URL: https://codereview.chromium.org/2426693003 .
-
- 17 Oct, 2016 3 commits
-
-
Henrik Kjellander authored
BUG=chromium:652188 TBR=ehmaldonado@chromium.org Review URL: https://codereview.chromium.org/2421343002 .
-
Henrik Kjellander authored
After switching bots from GYP to GN, build artifacts are left behind that break subsequent builds. Since it's infeasible to clean out all bot machines, it's better to have an automated system for this, which is what landmines are for. By adding a line to tools/get_landmines.py it is possible to clobber each bot that syncs past that "landmine CL". BUG=chromium:652188 TBR=ehmaldonado@chromium.org Review URL: https://codereview.chromium.org/2427633003 .
-
Henrik Kjellander authored
After switching the default bots from GYP to GN, we now only have a few GYP bots left, so rename the trybots accordingly. BUG=chromium:652188 TBR=fbarchard@chromium.org Review URL: https://codereview.chromium.org/2425693002 .
-
- 15 Oct, 2016 1 commit
-
-
Frank Barchard authored
R=wangcheng@google.com, hubbe@chromium.org BUG=libyuv:560 Review URL: https://codereview.chromium.org/2421993002 .
-
- 14 Oct, 2016 2 commits
-
-
Frank Barchard authored
R=wangcheng@google.com, hubbe@chromium.org BUG=libyuv:560 Review URL: https://codereview.chromium.org/2418763006 .
-
Frank Barchard authored
BUG=libyuv:572 TEST=try bots R=wangcheng@google.com, magjed@chromium.org Review URL: https://codereview.chromium.org/2416783004 .
-
- 13 Oct, 2016 2 commits
-
-
Frank Barchard authored
Port SSE2 version to AVX2. BUG=libyuv:572 TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Extract* R=wangcheng@google.com, magjed@chromium.org Review URL: https://codereview.chromium.org/2420553002 .
-
Frank Barchard authored
This variable was introduced in https://codereview.chromium.org/2293853002 and causes builds to fail, since it is not defined in WebRTC. BUG=webrtc:6281 TBR=kjellander@chromium.org Review URL: https://codereview.chromium.org/2418643003 .
-
- 12 Oct, 2016 2 commits
-
-
Frank Barchard authored
R=kjellander@chromium.org BUG=libyuv:649 Review URL: https://codereview.chromium.org/2414763002 .
-
Frank Barchard authored
TBR=kjellander@chromium.org BUG=libyuv:649 TEST=call gn gen out\Release "--args=is_debug=false is_clang=true" Review URL: https://codereview.chromium.org/2414783002 .
-
- 11 Oct, 2016 2 commits
-
-
Frank Barchard authored
YUV 411 is a very uncommon format. Remove support. Update documentation to reflect that 411 is deprecated. Simplify tests for YUV to only test with the new side by side YUV but keep old 3 plane test around with a macro for now. BUG=libyuv:645 R=kjellander@chromium.org Review URL: https://codereview.chromium.org/2406123002 .
-
Frank Barchard authored
I420 output can be slow due to multi channel write. Putting the U and V into a single side by side buffer can improve performance. TBR=wangcheng@google.com BUG=None Review URL: https://codereview.chromium.org/2403223003 .
-
- 08 Oct, 2016 2 commits
-
-
Frank Barchard authored
TBR=wangcheng@google.com BUG=libyuv:647 TESTED=LibYUVConvertTest.YUY2ToI422_Opt Review URL: https://codereview.chromium.org/2393393006 .
-
Frank Barchard authored
This function is the first step of YUY2 To I420. Provided primarily for diagnostics. TBR=wangcheng@google.com BUG=libyuv:647 TESTED=LibYUVConvertTest.YUY2ToY_Opt Review URL: https://codereview.chromium.org/2399153004 .
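As a rough illustration of this first step: YUY2 packs two pixels into four bytes in the order Y0, U, Y1, V, so extracting the Y plane means taking every second byte of each row. The row function below is a scalar sketch under that assumption; `example_yuy2_to_y_row` is a hypothetical name, not the libyuv function added by this CL.

```c
#include <stdint.h>

/* Copies the luma samples of one YUY2 row into dst_y.
 * YUY2 byte order is Y0 U Y1 V, so luma sits at even byte offsets.
 * `width` is the number of pixels (and Y samples) in the row. */
static void example_yuy2_to_y_row(const uint8_t* src_yuy2, uint8_t* dst_y,
                                  int width) {
  for (int x = 0; x < width; ++x) {
    dst_y[x] = src_yuy2[2 * x];
  }
}
```

Calling this once per row, advancing the source and destination by their respective strides, would produce the full Y plane.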
-
- 07 Oct, 2016 1 commit
-
-
Frank Barchard authored
R=fbarchard@google.com
BUG=libyuv:634

Performance gains as below.

YUY2ToI422, YUY2ToI420:
YUY2ToYRow_MSA : ~10x
YUY2ToUVRow_MSA : ~11x
YUY2ToUV422Row_MSA : ~9x
YUY2ToYRow_Any_MSA : ~6x
YUY2ToUVRow_Any_MSA : ~5x
YUY2ToUV422Row_Any_MSA : ~4x

UYVYToI422, UYVYToI420:
UYVYToYRow_MSA : ~10x
UYVYToUVRow_MSA : ~11x
UYVYToUV422Row_MSA : ~9x
UYVYToYRow_Any_MSA : ~6x
UYVYToUVRow_Any_MSA : ~5x
UYVYToUV422Row_Any_MSA : ~4x

Review URL: https://codereview.chromium.org/2397693002 .
-