- 08 Jul, 2018 2 commits
-
-
Robert Kimball authored
* Memory Layout pass optimizations
* rename SIMPLE memory allocator
-
Robert Kimball authored
-
- 07 Jul, 2018 4 commits
-
-
shssf authored
-
Robert Kimball authored
* complete the new backend construction/destruction API
* close each dlopen
* don't close libraries for now as it causes python to segfault
-
Nick Korovaiko authored
-
Pruthvi authored
-
- 06 Jul, 2018 4 commits
-
-
Jayaram Bobba authored
* inplace compute
* fix warnings
* Initial support for convolution sum fusion
* Added in-place support for conv sum fusion and test cases
* reverting spurious changes
* Bug fix to account for inplace input in conv sum fusion
* fix compilation error
* Addressed PR feedback
* Handle corner cases for conv sum fusion. Skip computation reuse while using an inplace kernel
* Check node argument for in-place relu assignment
* Addressed PR comments
* Addressed PR feedback
-
Nishant Patel authored
* Usage of mkldnn reshape updated
* update reshape condition for mkldnn
* Add a test case and order in which conditions are checked
-
Nick Korovaiko authored
* collect matched nodes
* clear m_matched_list
* tests
* address feedback
-
Adam Rogowiec authored
-
- 05 Jul, 2018 4 commits
-
-
Scott Cyphers authored
* Fix short markup
* Minor adjustments, license requirements.
-
Nick Korovaiko authored
-
Fenglei authored
* extra *block_size
* change grid_size to threads
-
Yixing Lao authored
-
- 04 Jul, 2018 1 commit
-
-
Artur Wojcik authored
-
- 03 Jul, 2018 6 commits
-
-
Adam Procter authored
-
Louis Feng authored
* hacking to support dot of 3 by 2 inputs with gemm_batch.
* clean up.
-
Robert Kimball authored
* nbench cleanup
* update style
-
Nick Korovaiko authored
* tf group convolution
* change perms
-
tsocha authored
-
Artur Wojcik authored
* onnx: add core wrappers
  Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
* onnx: add '\n' at end of files
  Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
* onnx: fix compilation with clang
  Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
* onnx: fix code style
  Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
-
- 02 Jul, 2018 5 commits
-
-
Sandeep authored
* declare sigmoid for core fusion
* add simple test for sigmoid
* info fusion status
* cp op as main op
* builds as expected
* move sigmoid fusion code
* add reference kernel
* sigmoid bprop reference kernel and clang-format
* add delta to bprop
* fprop called
* compiles bprop
* move tests
* serializer support
* address comments in code
* add doc
* naming similar to core ops
* fix failing test
* fix failing test
* address clang issue
* more changes
* change test macro
-
L.S. Cook authored
-
Pruthvi authored
* 1. Added MKLDNN BoundedRelu op support for Relu6
  2. CpuLayout && CPU assignment pass for BoundedRelu Op
  3. Unit test inter v/s CPU for BoundedReluOp
  4. MKLDNN and default emitter code for BoundedReluOp
* Removed Debug prints
* 1. Added support for boundedrelu to work on any constant literal
  2. unit test case for rank2, rank3, rank4 for bounded relu without serialized graph
* Removed is_six() method
-
Louis Feng authored
* Reshape bias to 1D for conv + bias bprop fusion
* Reshape goe2 back to 2D before replacing
* added shape checks to validate conv+bias op.
* removed conv+bias backprop merge for separate PR review.
* fixed conv_bias_bprop test.
* minor changes to error messages.
-
Fenglei authored
* add gpu_timer to external function
* compiled version
* working version
* using block_begin and block_end
* add the missing ' ;'
* move slice to cuda emitter
* change size_t to uint32_t in kernel
* working version
* change block size from 1 to 64
* fix bugs
* nthreads needs to be size_t in broadcast op
* add rank to kernel name hash
* update slice in convolution
* resolve index conflict
* change align to align_to_blocksize, add overflow check
* add grid size check and fix pool merge bug
* code style, change names
-
- 30 Jun, 2018 2 commits
-
-
Pruthvi authored
* - Fixed replace output for the multi layer recurrent cell state tensor output
  - Modified rnn add_output to consider direction and n_layer while calculating the output size for mkldnn dst_layer and dst_iter
* fix unit test failure
-
Nick Korovaiko authored
* collector
* keeping track of inputs; simplifying a merging strategy; adding LKGraph
* LoopKernel Collector
* address feedback
* address feedback 2
* address feedback 3
-
- 29 Jun, 2018 4 commits
-
-
Yixing Lao authored
* add lambda handler support for logger
* reuse logger function
-
Chris Sullivan authored
* Added blank convolution kernel and refactored coordinate transform kernel helper.
* Added op::Reshape to the CUDAEmitter.
* Added 2-Nd tiled convolution.
* Bug fixes with data_dilation and filter loop. Still need to add test for coverage of register tiling.
* Styling.
* Removed some comments and code added for testing.
* Some tests became enabled in merge, removing them.
-
shssf authored
* IntelGPUBackend: create_tensor
* 9 tests pass. List updated
-
Nick Korovaiko authored
* workaround for depthwise convolution
* fix error msg
-
- 28 Jun, 2018 8 commits
-
-
Nishant Patel authored
* Reshape bias to 1D for conv + bias bprop fusion
* Reshape goe2 back to 2D before replacing
-
Fenglei authored
-
Nishant Patel authored
* Reshape 4d
* Support dimshuffles/transpose with MKLDNN
* Addressing PR Feedback
* Use Eigen for 3D dimshuffles
-
Pruthvi authored
- fixes segfault issue for GNMT model execution through ngraph-mxnet
-
Matthew Brookhart authored
-
Fenglei authored
* enable multi datatype support for Cudnn. refactor binary ops using cudnn
* fix bugs
* add tests to skip list that CUDNN does not support
* no int support on cudnn for backward pooling
* no GPU.dot_4d_5d_multi_axis_big_fp64_VERY_SLOW test anymore
* clang format
* throw if datatype is int8 or int32 for backward pooling
* comments
* fix list in unit_test.manifest
* add type support for alpha, beta
* fix bugs
* datatype support for alpha, beta
* missing ()
* clang format
* batchnorm backward bug fix
* remove debug info
* change member function name to snake case. remove comments
* use nullptr instead of NULL
* code style, use cuDNN everywhere in comments
* add cudnn host parameters memory manager.
* change name to allocate_by_datatype
* compiled
* debug
* fix bug: using list instead of vector, vector address will change each time it resizes
* add CUDNN_DATA_UINT8 and CUDNN_DATA_UINT8x4
-
Adam Straw authored
* constant broadcast folding
* code review feedback
-
Chris Sullivan authored
* Move maxpool and avgpool into CudaKernelBuilder and add cache parameters to kernel name for broadcast which are required for correct lookup.
* Styling.
* Add space before avg_pool.
-