- 10 Aug, 2018 3 commits
-
-
Jaikrishnan Menon authored
-
Robert Kimball authored
-
Chris Sullivan authored
* Support GPUKernelArgs in Elementwise-collective and Nd-Convolution.
* Update op::ReplaceSlice to use GPUKernelArgs and unroll coordinate transform loop.
* Formatting.
* Moved function signature for global kernels back to emitter body.
* Formatting.
-
- 09 Aug, 2018 6 commits
-
-
Robert Kimball authored
-
Jaikrishnan Menon authored
-
Pruthvi authored
* added DEX support for BatchNormRelu
* templatized build_batchnorm_emitter
-
shssf authored
-
Scott Cyphers authored
-
dmyershov authored
* IntelGPU backend: Concat operation implementation
* Several remarks were fixed
* Remaining remarks were fixed; List of tests for INTELGPU was updated
* PR1363: Minor Fixes
-
- 08 Aug, 2018 19 commits
-
-
Robert Kimball authored
-
Jaikrishnan Menon authored
* Make pool alignment a constexpr
* Fix ODR-use
-
Jaikrishnan Menon authored
* Add an option to exclude the first iteration
* Switch to warmup iterations
* Cleanup
-
Pruthvi authored
-
Jaikrishnan Menon authored
-
Pruthvi authored
* Added DEX support for BoundedRelu
* Refactored bounded_relu in cpu_emitter to use mkldnn_emitter helper methods
* remove unwanted templatization for bounded_relu mkldnn_emitter
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
-
Pruthvi authored
* Added DEX execution support for Lstm
* Added DEX execution support for Rnn
* style fix
* used mkldnn_utils helper function for building DEX Lstm memory desc
* used mkldnn_utils helper function for building DEX Rnn memory desc
* addressed PR comments
* Refactored rnn & lstm cpu_emitter code to use the mkldnn_emitter helper methods
-
L.S. Cook authored
* Re-align docs with code example for mnist
* Also fix dist_mnist and add context highlight
-
Jaikrishnan Menon authored
-
shssf authored
* IntelGPU backend: Tests updated. Code refactored. No algorithms changed.
* PR1362. debug code removed
-
Chris Sullivan authored
* GPUShape(int32_t) -> NVShape(uint32_t), NVDiff(int32_t)
* Update code merged from master.
* Add nvshape.hpp and nvdiff.hpp.
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
* CPU Direct Execution: Implement ReplaceSlice
* Remove scalar variant
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
* CPU Direct Execution: Implement Reduce
* Workarounds for ancient CI compilers
* Fix return types
* Review comments
-
Nick Korovaiko authored
* batchdot with debug statements
* clean up
* address feedback
-
Jaikrishnan Menon authored
-
- 07 Aug, 2018 12 commits
-
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
-
Nick Korovaiko authored
* DEX LRN
* merge after jbobba's changes
-
Matthew Brookhart authored
* reduce fprop cache outputs
* refactor traverse nodes
* Slight refactor, add test, address PR comments
* fix formatting
-
Jaikrishnan Menon authored
* Add helper macros to select from a partial set of ranks and element types
* CPU Direct Execution: Implement Softmax
* Add softmax builder to the build script
* Update
-
Jaikrishnan Menon authored
-
dmyershov authored
-
Anna Alberska authored
* IntelGPU backend: And, Or operations
* Code format update: intelgpu_backend.cpp and intelgpu_op_custom_kernels.cpp
* Update logical operations
-
Fenglei authored
* Updated softmax.
* Formatting.
* Updated convolution.
* Use build_primitive overloading. Add helper to emit type_string given a node.
* Formatting.
* Update ConvolutionBackpropData.
* convolution backprop & max pool memory primitive caching (#1303)
* Updated ConvolutionBackpropFilters.
* Update MaxPool.
* Update Max and Min. (#1307)
* softmax optimization
* fix bug
* fix bugs
* clang format
* remove comments
* add softmax divide
* fix bugs
* fix bug
* fix bug
* clang format
* remove unused header
* register
* using single parameters instead of array
* using build_elementwise instead of build_elementwise_collective
* remove workspace as csullivan suggested
-
Anna Alberska authored
* IntelGPU backend: AvgPool operation (partially)
* Code format update intelgpu_backend.cpp
* Delete code duplication in pooling ops intelgpu_backend.cpp
-
Chris Sullivan authored
* Add GPUKernelArgs for storing kernel arguments.
* Formatting.
* Resolve tensor addresses when extracting arg list via GPUKernelArgs.
* Updated arg list resolution so that placeholder arguments can be added anywhere in the argument list.
* const ref. args and changed add_args to use add_arg. also expanded type_names map.
* GPUKernelArgs bug fix for return values.
* add_placeholders expects pointers for later resolution
* Formatting.
* Add comments to GPUKernelArgs
* Changed GPUKernelArgs interface to use a runtime variable number of arguments.
* Removed/updated comment.
* Address review comments: Remove combined address resolution and argument list retrieval. Remove unnecessary extra type entries in type_map.
* Add space between pragma once and includes.
* Broadcast optimization (#1322)
* Implement GPUKernelArgs with op::Broadcast.
* Removed excess type insertion in kernel signature for broadcast impl.
* Support new auto kernel signature generation for op::Broadcast. Add boolean to helpers to determine if parameters are registers or arrays.
* Removed commented code.
* Update broadcast impl. for new GPUKernelArgs interface.
* Updated based on interface change to GPUKernelArgs.
* Formatting.
* CUDNNHostParameters now implement GPUHostParameters. (#1324)
* Formatting.
-