- 11 Mar, 2018 3 commits
-
-
Robert Kimball authored
-
Robert Kimball authored
Use op::Constant's data rather than emitting the data in the generated C++ code. This makes compile times for trained models roughly 100x faster. (#624)
-
Jayaram Bobba authored
-
- 10 Mar, 2018 2 commits
-
-
Jayaram Bobba authored
Add mkldnn layouts to Maxpool and Maxpoolbackprop
-
Jayaram Bobba authored
-
- 09 Mar, 2018 10 commits
-
-
Chris Sullivan authored
* Refactored unary elementwise ops into a single interface that is adaptable to elementwise ops with an arbitrary number of inputs.
* Renamed EmitUnaryElementwise -> EmitElementwise. Implemented first binary elementwise op (Power).
* Refactored some of the boilerplate code for emitting CUDA kernels to nvrtc out of the emit functions and into the CudaFunctionPool static singleton. CodeWriter now saves CUDA kernels to ./gpu_codegen.
* Added ops Divide, Subtract & Sign to the GPU transformer. Subtract and Sign both use custom device helper functions which have math kernels defined for the op in gpu_cuda_kernel_ops.hpp, and which are built by a new get_device_helper function.
-
Louis Feng authored
NGMX-296 Convolution + Bias with MKLDNN
-
Louis Feng authored
-
Louis Feng authored
-
Louis Feng authored
Also fixed conv+bias cpu layout bugs.
-
Louis Feng authored
* fixed memory leak.
* clang format.
-
Fenglei authored
* update gpu_emitter use template
* add template
-
Nick Korovaiko authored
-
Nick Korovaiko authored
* removing extra copies due to op::Result
* remove comment
* fix comment
* switch to a flag version
* add copyright header #pragma once
* add impl file, rename result_elimination.hpp to result_copy_elimination.hpp to match the opt name
* add cpp suffix to result_copy_elimination
* use in-class member init
-
Pruthvi authored
* - Added sigmoid fusion pass
  - added mkldnn emitter code for sigmoid
* - corrected sigmoid expected values
  - add layout assignment for sigmoid op
* - added asserts in cpu fusion for sigmoid
  - style fix
* remove debug prints
* NGMX-371 #comment addressed PR comments
  - Added sigmoid unit test case with 3D input
  - support in cpu_emitter for sigmoid to handle all input shapes
* NGMX-371 #comment use shape_size() to calculate the 1d input size
-
- 08 Mar, 2018 16 commits
-
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
Jayaram Bobba authored
Jbobba/batchnorm layouts
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
Louis Feng authored
-
Jayaram Bobba authored
-
Nick Korovaiko authored
* remove broadcast from matmulbias
* fix comments
* working gemm-based broadcast
* fix clang warning
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
Chris Sullivan authored
* straightforward gpu.cos implementation following previous patterns prior to refactor
* Generalized unary elementwise gpu op impl. New unary elementwise ops can be added to the type annotations in gpu_cuda_kernel_ops.hpp. Next step is to refactor the llvm interface in gpu_emitters.hpp for similar generality.
* Added gpu_emitter.hpp:EmitUnaryElementwise. Function adds cuda kernel based on ngraph::op::op_type::description. This can service all unary elementwise ops run on the gpu.
* The following elementwise unary ops now use the EmitUnaryElementwise emitter:
  * GPU.abs
  * GPU.acos
  * GPU.asin
  * GPU.atan
  * GPU.ceiling
  * GPU.cos
  * GPU.cosh
  * GPU.exp
  * GPU.floor
  * GPU.log
  * GPU.not
  * GPU.sign
  * GPU.sin
  * GPU.sinh
  * GPU.tan
  * GPU.tanh
  Unary elementwise ops Sign and Not need extra consideration.
* tanh test changed to test::all_close for fp comparison (also done for tan in commit 65fa7c6de34c8277fe2a4801644f6bb64574f4ff).
* GPU backend skips added for recent softmax test and updated aliased output test that uses op::Constant.
* code format update
* changed cuda builder interface names to unary/binary/arbitrary, added impl. note to gpu_cuda_kernel_ops, cleaned code format
* updated ngraph-cpp reference
* Fixing incorrect github conflict resolution.
* Added GPU emitter for op::Result. For now it simply copies the output tensor. All but 3 tests now pass. The remaining failing tests are:
  * GPU.dot_0_0
  * GPU.dot_matrix_2x0_0x2
  * GPU.dot_2x0_0
* Removed call to handle memory aliasing in gpu_external_function.
* fix gpu emitter bug that will return in the middle of function
* Merge pull request #609 from NervanaSystems/tfl/fix_return_bug: fix gpu emitter bug that will return in the middle of function
* GPU backend skips added for recent softmax test and updated aliased output test that uses op::Constant.
-
Fenglei authored
Fix constant bug on GPU
-
Robert Kimball authored
-
Chris Sullivan authored
* Added GPU emitter for op::Result. For now it simply copies the output tensor. All but 3 tests now pass. The remaining failing tests are:
  * GPU.dot_0_0
  * GPU.dot_matrix_2x0_0x2
  * GPU.dot_2x0_0
* Removed call to handle memory aliasing in gpu_external_function.
* fix gpu emitter bug that will return in the middle of function
* Merge pull request #609 from NervanaSystems/tfl/fix_return_bug: fix gpu emitter bug that will return in the middle of function
* GPU backend skips added for recent softmax test and updated aliased output test that uses op::Constant.
-
- 07 Mar, 2018 9 commits
-
-
Chris Sullivan authored
-
Pruthvi authored
* - Added support for optimized bn mkldnn implementation in cpu emitter
  - modified bn unit_test to support new implementation
  - added layout assignment for bn op
  - Style Fix
  (cherry picked from commit 7747a40806d62c126059d5c873adcd2e61a0adb0)
* modified value initialization in cpu_fusion to be float explicit
  (cherry picked from commit 03499d380073d0197ab8cbd154eb03f63b042a48)
* fix compilation issue
* Addressed PR comments
  - added exception if gamma and beta layout is not equal to memory::format::x
  - throw exception if bn Op is not mkldnn op
* fix compilation issue
* added support to handle multiple o/ps in fprop bn fusion
* - Removed layout pass for bn
  - fixed autodiff bug in bn
  - added "Add" for the dispatcher in cpu-layout pass
* style fix
* Fix bprop batchnorm test with get_output_elements
* Style fix
-
Scott Cyphers authored
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
Jayaram Bobba authored
- Pass mkldnn workspaces through runtime context
- Move maxpool ops and avgpool backprop to mkldnn emitter
-
Louis Feng authored
-
L.S. Cook authored
-