- 09 Mar, 2018 1 commit
-
-
fenglei.tian authored
-
- 08 Mar, 2018 15 commits
-
-
fenglei.tian authored
-
fenglei.tian authored
-
fenglei.tian authored
-
Fenglei Tian authored
Merge branch 'tfl/gpu_emitter_template' of github.com:NervanaSystems/private-ngraph-cpp into tfl/gpu_emitter_template
-
Fenglei Tian authored
-
fenglei.tian authored
-
Fenglei Tian authored
-
Fenglei Tian authored
-
Chris Sullivan authored
* straightforward gpu.cos implementation following previous patterns prior to refactor
* Generalized unary elementwise gpu op impl. New unary elementwise ops can be added to the type annotations in gpu_cuda_kernel_ops.hpp. Next step is to refactor the llvm interface in gpu_emitters.hpp for similar generality.
* Added gpu_emitter.hpp:EmitUnaryElementwise. Function adds cuda kernel based on ngraph::op::op_type::description. This can service all unary elementwise ops run on the gpu.
* The following elementwise unary ops now use the EmitUnaryElementwise emitter:
  * GPU.abs
  * GPU.acos
  * GPU.asin
  * GPU.atan
  * GPU.ceiling
  * GPU.cos
  * GPU.cosh
  * GPU.exp
  * GPU.floor
  * GPU.log
  * GPU.not
  * GPU.sign
  * GPU.sin
  * GPU.sinh
  * GPU.tan
  * GPU.tanh
  Unary elementwise ops Sign and Not need extra consideration.
* tanh test changed to test::all_close for fp comparison (also done for tan in commit 65fa7c6de34c8277fe2a4801644f6bb64574f4ff).
* GPU backend skips added for recent softmax test and updated aliased output test that uses op::Constant.
* code format update
* changed cuda builder interface names to unary/binary/arbitrary, added impl. note to gpu_cuda_kernel_ops, cleaned code format
* updated ngraph-cpp reference
* Fixing incorrect github conflict resolution.
* Added GPU emitter for op::Result. For now it simply copies the output tensor. All but 3 tests now pass. The remaining failing tests are:
  * GPU.dot_0_0
  * GPU.dot_matrix_2x0_0x2
  * GPU.dot_2x0_0
* Removed call to handle memory aliasing in gpu_external_function.
* fix gpu emitter bug that will return in the middle of function
* Merge pull request #609 from NervanaSystems/tfl/fix_return_bug fix gpu emitter bug that will return in the middle of function
* GPU backend skips added for recent softmax test and updated aliased output test that uses op::Constant.
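The commit message above describes a single emitter that adds a CUDA kernel based on the op's description string. A minimal sketch of that idea follows, assuming the kernel is generated as source text from a lookup table. The names `unary_op_table` and `make_unary_kernel_source`, the `cuda_` kernel-name prefix, and the table entries are all hypothetical; they are not taken from gpu_emitter.hpp or gpu_cuda_kernel_ops.hpp.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical table mapping an op description to a CUDA device math function.
// The commit keeps its real mapping in gpu_cuda_kernel_ops.hpp; these entries
// are illustrative only.
inline const std::map<std::string, std::string>& unary_op_table()
{
    static const std::map<std::string, std::string> table = {
        {"Abs", "fabsf"}, {"Cos", "cosf"}, {"Exp", "expf"}, {"Tanh", "tanhf"}};
    return table;
}

// Build CUDA C source for a float elementwise kernel named "cuda_<op>".
// Throws std::invalid_argument when the op has no table entry.
inline std::string make_unary_kernel_source(const std::string& op_description)
{
    auto it = unary_op_table().find(op_description);
    if (it == unary_op_table().end())
    {
        throw std::invalid_argument("unsupported unary op: " + op_description);
    }
    assert(!it->second.empty());

    std::ostringstream src;
    src << "extern \"C\" __global__ void cuda_" << op_description
        << "(const float* in, float* out, size_t n)\n"
        << "{\n"
        << "    size_t tid = blockIdx.x * blockDim.x + threadIdx.x;\n"
        << "    if (tid < n)\n"
        << "    {\n"
        << "        out[tid] = " << it->second << "(in[tid]);\n"
        << "    }\n"
        << "}\n";
    return src.str();
}
```

With a table like this, supporting another unary op would only need a new entry, which matches the intent stated in the commit message. Ops such as Sign and Not have no single matching math function, which may be why the message flags them as needing extra consideration.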
-
fenglei.tian authored
-
fenglei.tian authored
-
Fenglei Tian authored
-
Fenglei authored
Fix constant bug on GPU
-
Robert Kimball authored
-
Chris Sullivan authored
* Added GPU emitter for op::Result. For now it simply copies the output tensor. All but 3 tests now pass. The remaining failing tests are:
  * GPU.dot_0_0
  * GPU.dot_matrix_2x0_0x2
  * GPU.dot_2x0_0
* Removed call to handle memory aliasing in gpu_external_function.
* fix gpu emitter bug that will return in the middle of function
* Merge pull request #609 from NervanaSystems/tfl/fix_return_bug fix gpu emitter bug that will return in the middle of function
* GPU backend skips added for recent softmax test and updated aliased output test that uses op::Constant.
-
- 07 Mar, 2018 10 commits
-
-
Chris Sullivan authored
-
Pruthvi authored
* - Added support for optimized bn mkldnn implementation in cpu emitter
  - modified bn unit_test to support new implementation
  - added layout assignment for bn op
  - Style Fix
  (cherry picked from commit 7747a40806d62c126059d5c873adcd2e61a0adb0)
* modified value initialization in cpu_fusion to be float explicit (cherry picked from commit 03499d380073d0197ab8cbd154eb03f63b042a48)
* fix compilation issue
* Addressed PR comments
  - added exception if gamma and beta layout is not equal to memory::format::x
  - throw exception if bn Op is not mkldnn op
* fix compilation issue
* added support to handle multiple o/ps in fprop bn fusion
* - Removed layout pass for bn
  - fixed autodiff bug in bn
  - added "Add" for the dispatcher in cpu-layout pass
* style fix
* Fix bprop batchnorm test with get_output_elements
* Style fix
-
fenglei.tian authored
-
fenglei.tian authored
-
Scott Cyphers authored
-
L.S. Cook authored
-
Jai Menon authored
-
fenglei.tian authored
-
fenglei.tian authored
-
Fenglei Tian authored
-
- 06 Mar, 2018 13 commits
-
-
Jai Menon authored
* CPU: Padded Convolution fusion
* CPU: Non-reshaped fusion pattern for zero-padded convolutions
* CPU: Refactor consistency checks
* CPU: Rewrite hoisted reshape expression and add tests
* CPU: Merge leftovers
-
DawnStone authored
-
Nick Korovaiko authored
* generalize matmulbias fixes disable logging
* unit-test failures
-
Nick Korovaiko authored
* the first stab at op::Result format fixes disabling logging op::Result, 2nd attempt purge stale code disable logging fix copyright header
* initial cleanup
* cleanup2
* remove dead code
* result.cpp, fix comments
* fix comment
-
Fenglei authored
-
Robert Kimball authored
* patch working
* wip
* fix patcher
* remove debug message:
* cleanup
* fix typo
-
fenglei.tian authored
-
fenglei.tian authored
Merge branch 'tfl/gpu_fix_constant_bug' of github.com:NervanaSystems/ngraph-cpp into tfl/gpu_fix_constant_bug
-
fenglei.tian authored
-
Fenglei authored
-
Fenglei authored
* add gpu broadcast
* add broadcast kernel
* fix bug for cuMemcpyDtoD usage in gpu_external_function.cpp
-
fenglei.tian authored
-
L.S. Cook authored
-
- 05 Mar, 2018 1 commit
-
-
DawnStone authored
* limited parallel make processes to make -j 16 by default for contrib/docker/Makefile
* set the default to make -j 22 for parallel make in contrib/docker/Makefile
-