- 07 Aug, 2018 6 commits
-
-
Anna Alberska authored
* IntelGPU backend: And, Or operations * Code format update: intelgpu_backend.cpp and intelgpu_op_custom_kernels.cpp * Update logical operations
-
Fenglei authored
* Updated softmax.
* Formatting.
* Updated convolution.
* Use build_primitive overloading. Add helper to emit type_string given a node.
* Formatting.
* Update ConvolutionBackpropData.
* convolution backprop & max pool memory primitive caching (#1303)
* Updated ConvolutionBackpropFilters.
* Update MaxPool.
* Update Max and Min. (#1307)
* softmax optimization
* fix bug
* fix bugs
* clang format
* remove comments
* add softmax divide
* fix bugs
* fix bug
* fix bug
* clang format
* remove unused header
* register
* using single parameters instead of array
* using build_elementwise instead of build_elementwise_collective
* remove workspace as csullivan suggested
-
Anna Alberska authored
* IntelGPU backend: AvgPool operation (partially) * Code format update intelgpu_backend.cpp * Delete code duplication in pooling ops intelgpu_backend.cpp
-
Chris Sullivan authored
* Add GPUKernelArgs for storing kernel arguments.
* Formatting.
* Resolve tensor addresses when extracting arg list via GPUKernelArgs.
* Updated arg list resolution so that placeholder arguments can be added anywhere in the argument list.
* const ref. args and changed add_args to use add_arg. also expanded type_names map.
* GPUKernelArgs bug fix for return values.
* add_placeholders expects pointers for later resolution
* Formatting.
* Add comments to GPUKernelArgs
* Changed GPUKernelArgs interface to use a runtime variable number of arguments.
* Removed/updated comment.
* Address review comments: Remove combined address resolution and argument list retrieval. Remove unnecessary extra type entries in type_map.
* Add space between pragma once and includes.
* Broadcast optimization (#1322)
* Implement GPUKernelArgs with op::Broadcast.
* Removed excess type insertion in kernel signature for broadcast impl.
* Support new auto kernel signature generation for op::Broadcast. Add boolean to helpers to determine if parameters are registers or arrays.
* Removed commented code.
* Update broadcast impl. for new GPUKernelArgs interface.
* Updated based on interface change to GPUKernelArgs.
* Formatting.
* CUDNNHostParameters now implement GPUHostParameters. (#1324)
* Formatting.
-
Jayaram Bobba authored
* Switch to using mkldnn memory descriptors for layout
* More changes for using mkldnn descriptor instead of format
* Removed mkldnn format from cpu layout descriptor. TODO - shuffle folding
* Rotate mkldnn layouts on transpose
* Modifications to builder reshape to skip rotated layouts
* More fixes to layouts and removes axis order from cpu layout descriptor
* Code cleanup
* Removed shuffle folding pass since the functionality is subsumed by the layout pass
* Canonicalize a few more formats to keep MKLDNN happy.
* Style fixes
* Style fixes
* Style fixes
* Addressed PR feedback and added reshape passthrough for non-transpose cases
* Adjust named formats for weights tensors to keep MKLDNN happy
* Style fixes
* resolved merge issues
-
Jaikrishnan Menon authored
-
- 06 Aug, 2018 3 commits
-
-
Jaikrishnan Menon authored
* CPU Direct Execution: Implement Pad * Add Pad builder to the build script * Add missed changes during commit
-
shssf authored
-
shssf authored
* IntelGPU backend: Sum operation bug fix * PR1330. Style fix
-
- 05 Aug, 2018 4 commits
- 04 Aug, 2018 2 commits
-
-
Chris Sullivan authored
* Bug fix: StaticInitializer. * Make CudaContextManager a member of GPU_Backend::BackendContext. * fix formatting
-
shssf authored
-
- 03 Aug, 2018 15 commits
-
-
Robert Kimball authored
* add option to run all models in a directory * add print for exception from benchmark
-
Nick Korovaiko authored
-
Chris Sullivan authored
* Utilize GPUMemoryManager/Allocator for preallocation of intermediate tensor buffer memory.
* Formatting.
* Merge with master required rework of memory due to CFE pass. Moved function memory pool allocation to pass as a result.
* Formatting.
* Added pass source files.
* Updated tests to account for new assert check. All GPUAllocators should be deconstructed before allocation is made in GPUMemoryManager.
* GPUAllocator::close() can be used to close the allocator prior to destruction
* Removed open allocators. Replaced check with inspection of pass::MemoryManager node list.
* Formatting.
* Rename m_memory_buffers -> m_tensor_memory_buffers. Use full path to static alignment variable.
* FunctionMemoryReservation -> TensorMemoryReservation. Only return true in pass if reservation is made (bug fix).
* Moved static compilation mutex.
* Update external function with new pass name.
* GPU_ExternalFunction: Add s_memory_pool_alignment, remove optimize_and_assemble method.
-
shssf authored
-
Robert Kimball authored
-
L.S. Cook authored
* update frameworkdocs * revise docs with new MXNet bridge code instructions * revise docs with new MXNet bridge code instructions * remove broken merge conflict
-
dmyershov authored
-
L.S. Cook authored
* update frameworkdocs * revise docs with new MXNet bridge code instructions * revise docs with new MXNet bridge code instructions
-
Pruthvi authored
-
Pruthvi authored
* Added DEX support for MaxPoolBackprop op for CPU backend * Added DEX execution support for AvgPoolBackprop
-
Jayaram Bobba authored
-
Robert Kimball authored
* compiles but does not link
-
shssf authored
-
Nick Korovaiko authored
* dex max_pool_with_indices * maxpoolwithindices (#1300)
-
Nick Korovaiko authored
-
- 02 Aug, 2018 10 commits
-
-
Jaikrishnan Menon authored
-
Nick Korovaiko authored
* lrn init * fix comment * mkldnn lrn (#1295) * add serializer + fix compiler warnings
-
Jaikrishnan Menon authored
* Fix the first_iteration flag so it works when more than one call-frame exists. Static variables defined in lambda expressions are not private to a lambda, so move this to the runtime context * Shave off a few microseconds by initializing intermediates exactly once * Make all execution paths use first_iteration in the runtime context
-
Michał Karzyński authored
* [Py] Add __repr__ to Strides and CoordDiff * Apply clang-format * Repr fix * Apply clang-format
-
Michał Karzyński authored
* [Py] Add convolution_backprop_data to API * Conv fix
-
Chris Sullivan authored
* Updated softmax.
* Formatting.
* Updated convolution.
* Use build_primitive overloading. Add helper to emit type_string given a node.
* Formatting.
* Update ConvolutionBackpropData.
* convolution backprop & max pool memory primitive caching (#1303)
* Updated ConvolutionBackpropFilters.
* Update MaxPool.
* Update Max and Min. (#1307)
-
Fenglei authored
* move add,mult,min,max,sqrt to elementwise_op, increase op per threads
-
Amy Zhuang authored
* Implement trigonometric ops for direct execution. * Rename files.
-
Robert Kimball authored
* build on suse w/gcc 4.8.5 * fix SUSE build error * add comments * remove template function * update per review comment * fix nan check emitted code
-
varun-intel authored
* updated * type prop * disable test in manifest * try to exclude * style * double * dobule * more * style * more * vecs * fix goe
-