- 14 Mar, 2018 4 commits
-
Fenglei authored
* Add onehot op
* Refactor broadcast and onehot op
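For intuition, a one-hot op expands an index tensor into a 0/1 tensor with one extra axis of size depth. A minimal reference sketch (names hypothetical, not the op's actual implementation):

    #include <cstddef>
    #include <vector>

    // Map each index to a row with a single 1.0 at that position.
    // Assumes every index is < depth.
    std::vector<float> one_hot(const std::vector<std::size_t>& indices, std::size_t depth)
    {
        std::vector<float> out(indices.size() * depth, 0.0f);
        for (std::size_t i = 0; i < indices.size(); ++i)
        {
            out[i * depth + indices[i]] = 1.0f;
        }
        return out;
    }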
-
Chris Sullivan authored
* Added a corresponding cudaFree for the cudaMalloc of the CUDA pool_base_ptr memory buffer.
* Check for temporary buffer allocation prior to freeing; add a null check on cudaFree.
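A minimal sketch of the guarded allocate/free pairing described above (class and member names hypothetical): the pool buffer is freed only if it was actually allocated.

    #include <cuda_runtime.h>
    #include <cstddef>

    class GPUMemoryPool
    {
    public:
        void allocate(std::size_t bytes)
        {
            if (bytes > 0)
            {
                cudaMalloc(&m_pool_base_ptr, bytes);
            }
        }

        ~GPUMemoryPool()
        {
            // Check for a temporary buffer allocation prior to freeing;
            // cudaFree(nullptr) is a no-op, but the explicit null check
            // documents the intent.
            if (m_pool_base_ptr != nullptr)
            {
                cudaFree(m_pool_base_ptr);
                m_pool_base_ptr = nullptr;
            }
        }

    private:
        void* m_pool_base_ptr = nullptr;
    };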
-
Robert Kimball authored
* Add cpio file read/write class and unit tests
* Add reserializer
* Add unit test for serializing constants to a cpio file
* Fix bug in the serializer when a function has no parameters
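For context, cpio's portable ASCII ("odc") variant is a simple headers-plus-data stream, which makes it a convenient container for serialized constants. A rough sketch of writing one entry (function names hypothetical; field layout per the odc format as I understand it, not this commit's actual code):

    #include <cstddef>
    #include <cstdio>
    #include <string>

    // odc headers store numeric fields as fixed-width octal text.
    static void write_octal(std::FILE* f, unsigned long value, int width)
    {
        std::fprintf(f, "%0*lo", width, value);
    }

    void write_cpio_entry(std::FILE* f, const std::string& name,
                          const char* data, std::size_t size)
    {
        std::fputs("070707", f);            // c_magic for the odc format
        write_octal(f, 0, 6);               // c_dev
        write_octal(f, 0, 6);               // c_ino
        write_octal(f, 0100644, 6);         // c_mode: regular file
        write_octal(f, 0, 6);               // c_uid
        write_octal(f, 0, 6);               // c_gid
        write_octal(f, 1, 6);               // c_nlink
        write_octal(f, 0, 6);               // c_rdev
        write_octal(f, 0, 11);              // c_mtime
        write_octal(f, name.size() + 1, 6); // c_namesize, includes NUL
        write_octal(f, size, 11);           // c_filesize
        std::fwrite(name.c_str(), 1, name.size() + 1, f);
        std::fwrite(data, 1, size, f);
    }

    // A complete archive ends with an empty entry named "TRAILER!!!".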
-
Jayaram Bobba authored
Jbobba/mkldnn v0.13
-
- 13 Mar, 2018 13 commits
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
Robert Kimball authored
-
Jayaram Bobba authored
-
Chris Sullivan authored
* GPU elementwise emitters now respect input and output tensor types. This enables the use of binary comparison ops and op::Convert.
* Removed comments.
* All kernels now carry a type signature, even when the input and output tensors have the same type, so that kernels for specific tensor types are unique. NGMX-391 #close
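A sketch of how per-type-unique kernel names might be formed (helper name hypothetical): the element types of all inputs and outputs are mangled into the kernel name, so tensors of different types never share a compiled kernel.

    #include <string>
    #include <vector>

    // e.g. kernel_name("add", {"float", "float", "float"})
    //      -> "add_float_float_float"
    std::string kernel_name(const std::string& op,
                            const std::vector<std::string>& dtypes)
    {
        std::string name = op;
        for (const auto& t : dtypes)
        {
            name += "_" + t;
        }
        return name;
    }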
-
Pruthvi authored
* Fix bn constructor:
  - assert if gamma or beta don't have rank 1
  - remove redundant checks
* Added guards to check that the input and delta shapes to the mkldnn bn fprop and bprop ops have rank 4
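A minimal sketch of the kind of rank guards described (names hypothetical): gamma and beta must be rank 1, and the mkldnn batchnorm fprop/bprop inputs rank 4.

    #include <cstddef>
    #include <stdexcept>
    #include <vector>

    using Shape = std::vector<std::size_t>;

    void check_bn_args(const Shape& gamma, const Shape& beta,
                       const Shape& input, const Shape& delta)
    {
        if (gamma.size() != 1 || beta.size() != 1)
        {
            throw std::invalid_argument("gamma and beta must have rank 1");
        }
        // mkldnn batchnorm fprop/bprop expect 4D (e.g. NCHW) tensors.
        if (input.size() != 4 || delta.size() != 4)
        {
            throw std::invalid_argument("input and delta must have rank 4");
        }
    }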
-
Fenglei authored
gpu dot bug fix for bprop
-
fenglei.tian authored
-
Chris Sullivan authored
* Updated namespace use in cpp files.
-
Fenglei authored
-
Pruthvi authored
* Added:
  - pattern matcher for bprop sigmoid
  - mkldnn emitter code for sigmoid bprop
  - fusion pass unit test for sigmoid bprop
  - style fix
* Added test case for bprop sigmoid
* Fixed sigmoid bprop test case failure
* Fixed bprop unit test values for sigmoid
* Style fix
* Fix typo
* Addressed PR comments: added layout assignment pass to ensure delta and input have the same layout for SigmoidBprop
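The math being fused here: for s = sigmoid(x), the derivative is s * (1 - s), so the backprop output is delta * s * (1 - s). A scalar reference version for intuition (not the fused mkldnn kernel):

    #include <cmath>

    float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

    float sigmoid_bprop(float x, float delta)
    {
        float s = sigmoid(x);
        return delta * s * (1.0f - s);
    }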
-
fenglei.tian authored
-
- 12 Mar, 2018 7 commits
-
Fenglei authored
-
Jayaram Bobba authored
Batchnorm bprop layouts and move the last few mkldnn ops to mkldnn_emitter
-
Jayaram Bobba authored
-
fenglei.tian authored
-
fenglei.tian authored
-
fenglei.tian authored
-
Christian Convey authored
-
- 11 Mar, 2018 6 commits
-
Robert Kimball authored
* Fix detailed timing flag
* More detailed info
-
Robert Kimball authored
-
Robert Kimball authored
Use op::Constant's data rather than emitting the data in the generated cpp code. This makes compile times for trained models roughly 100x faster. (#624)
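The reason this helps: a trained model's weights, emitted as C++ array literals, force the compiler to parse megabytes of numbers; handing the generated function a pointer to op::Constant's existing buffer avoids that entirely. A hypothetical illustration of the "after" shape (names are not the actual codegen API):

    #include <cstddef>
    #include <vector>

    // The compiled function receives the constants' existing buffers
    // (rather than source text containing the values), so the C++
    // compiler never parses the weight data at all.
    const float* bind_constant(const std::vector<const void*>& constants,
                               std::size_t index)
    {
        return static_cast<const float*>(constants[index]);
    }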
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
- 10 Mar, 2018 9 commits
-
Jayaram Bobba authored
-
Jayaram Bobba authored
Add mkldnn layouts to Maxpool and Maxpoolbackprop
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
Jayaram Bobba authored
-
fenglei.tian authored
-
fenglei.tian authored
-
fenglei.tian authored
-
- 09 Mar, 2018 1 commit
-
Chris Sullivan authored
* Refactored unary elementwise ops into a single interface that is adaptable to elementwise ops with an arbitrary number of inputs.
* Renamed EmitUnaryElementwise -> EmitElementwise. Implemented the first binary elementwise op (Power).
* Refactored some of the boilerplate code for emitting cuda kernels to nvrtc out of the emit functions and into the CudaFunctionPool static singleton. CodeWriter now saves cuda kernels to ./gpu_codegen.
* Added ops Divide, Subtract & Sign to the GPU transformer. Subtract and Sign both use custom device helper functions, which have math kernels defined for the op in gpu_cuda_kernel_ops.hpp and are built by a new get_device_helper function.
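A sketch of what a device helper for op::Sign might look like (function name hypothetical), following the mechanism described above of emitting CUDA source strings that are then compiled with nvrtc:

    #include <string>

    // Returns CUDA source for a small __device__ math helper that the
    // generated elementwise kernel body can call per element.
    std::string get_sign_helper()
    {
        return "__device__ float sign_helper(float x)\n"
               "{\n"
               "    return (x > 0.0f) - (x < 0.0f);\n"
               "}\n";
    }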
-