Commits · 8476dea071fc08039aa59e5db6b0d5bd2b122660 · submodule / ngraph

07 Aug, 2018 3 commits

Auto. gen. kernel signatures and argument expansion (#1326) · 8476dea0

Chris Sullivan authored Aug 07, 2018

* Add GPUKernelArgs for storing kernel arguments.

* Formatting.

* Resolve tensor addresses when extracting arg list via GPUKernelArgs.

* Updated arg list resolution so that placeholder arguments can be added anywhere in the argument list.

* const ref. args and changed add_args to use add_arg. also expanded type_names map.

* GPUKernelArgs bug fix for return values.

* add_placeholders expects pointers for later resolution

* Formatting.

* Add comments to GPUKernelArgs

* Changed GPUKernelArgs interface to use a runtime variable number of arguments.

* Removed/updated comment.

* Address review comments: Remove combined address resolution and argument list retrieval. Remove unnecessary extra type entries in type_map.

* Add space between pragma once and includes.

* Broadcast optimization (#1322)

* Implement GPUKernelArgs with op::Broadcast.

* Removed excess type insertion in kernel signature for broadcast impl.

* Support new auto kernel signature generation for op::Broadcast. Add boolean to helpers to determine if parameters are registers or arrays.

* Removed commented code.

* Update broadcast impl. for new GPUKernelArgs interface.

* Updated based on interface change to GPUKernelArgs.

* Formatting.

* CUDNNHostParameters now implement GPUHostParameters. (#1324)

* Formatting.

Switch to using more expressive layout descriptors instead of numeric layout names (#1278) · 69c51c27

Jayaram Bobba authored Aug 07, 2018

* Switch to using mkldnn memory descriptors for layout

* More changes for using mkldnn descriptor instead of format

* Removed mkldnn format from cpu layout descriptor. TODO - shuffle folding

* Rotate mkldnn layouts on transpose

* Modifications to builder reshape to skip rotated layouts

* More fixes to layouts and removes axis order from cpu layout descriptor

* Code cleanup

* Removed shuffle folding pass since the functionality is subsumed by the layout pass

* Canonicalize a few more formats to keep MKLDNN happy.

* Style fixes

* Style fixes

* Style fixes

* Addressed PR feedback and added reshape passthrough for non-transpose cases

* Adjust named formats for weights tensors to keep MKLDNN happy

* Style fixes

* resolved merge issues

Fix date in license header (#1342) · 5f77fe86
Jaikrishnan Menon authored Aug 07, 2018

06 Aug, 2018 3 commits
- CPU Direct Execution: Implement Pad (#1320) · e2064cc2
  Jaikrishnan Menon authored Aug 06, 2018
```
* CPU Direct Execution: Implement Pad

* Add Pad builder to the build script

* Add missed changes during commit
```
- IntelGPU backend: Product operation (#1334) · f1c3e4ab
  shssf authored Aug 06, 2018
- IntelGPU backend: Sum operation bug fix (#1330) · 81216a9e
  shssf authored Aug 06, 2018
```
* IntelGPU backend: Sum operation bug fix

* PR1330. Style fix
```
05 Aug, 2018 4 commits
- IntelGPU backend: Max and Min operations (#1333) · 4f26640b
  shssf authored Aug 05, 2018
- IntelGPU backend: Greater, Less, Equal operations (#1331) · f9ded0b1
  shssf authored Aug 05, 2018
- IntelGPU backend: Dot_2x2 operation bug fix (#1329) · c5889b2b
  shssf authored Aug 05, 2018
- IntelGPU backend: Allow zero size Shape (#1332) · 0405a870
  shssf authored Aug 05, 2018
04 Aug, 2018 2 commits
- Fix bugs in StaticInitializer and CudaContextManager (#1321) · 8009b475
  Chris Sullivan authored Aug 04, 2018
```
* Bug fix: StaticInitializer.

* Make CudaContextManager a member of GPU_Backend::BackendContext.

* fix formatting
```
- IntelGPU backend: Code refactored. No algo changed. (#1328) · 8ab89b29
  shssf authored Aug 04, 2018
03 Aug, 2018 15 commits

nbench: add option to run all models in a directory (#1279) · 2b26df18
Robert Kimball authored Aug 03, 2018
```
* add option to run all models in a directory

* add print for exception from benchmark
```
bn bprop test fix, comments and throws (#1325) · 11b992a7
Nick Korovaiko authored Aug 03, 2018

Preallocate intermediate buffers (#1231) · 0599a628

Chris Sullivan authored Aug 03, 2018

* Utilize GPUMemoryManager/Allocator for preallocation of intermediate tensor buffer memory.

* Formatting.

* Merge with master required rework of memory due to CFE pass. Moved function memory pool allocation to pass as a result.

* Formatting.

* Added pass source files.

* Updated tests to account for new assert check. All GPUAllocators should be destroyed before allocation is made in GPUMemoryManager.

* GPUAllocator::close() can be used to close the allocator prior to destruction

* Removed open allocators. Replaced check with inspection of pass::MemoryManager node list.

* Formatting.

* Rename m_memory_buffers -> m_tensor_memory_buffers. Use full path to static alignment variable.

* FunctionMemoryReservation -> TensorMemoryReservation. Only return true in pass if reservation is made (bug fix).

* Moved static compilation mutex.

* Update external function with new pass name.

* GPU_ExternalFunction: Add s_memory_pool_alignment, remove optimize_and_assemble method.

IntelGPU backend: BatchNorm operation completely redeveloped (#1318) · 45b50d06
shssf authored Aug 03, 2018

fix travis build...I hope (#1317) · 39278e7d
Robert Kimball authored Aug 03, 2018

Upstream for versioning (#1316) · b99dc1ef

L.S. Cook authored Aug 03, 2018

* update frameworkdocs

* revise docs with new MXNet bridge code instructions

* revise docs with new MXNet bridge code instructions

* remove broken merge conflict

IntelGPU backend: Select operation (#1314) · 77a703c2
dmyershov authored Aug 03, 2018

Upstream for versioning (#1309) · f8926a7b

L.S. Cook authored Aug 03, 2018

* update frameworkdocs

* revise docs with new MXNet bridge code instructions

* revise docs with new MXNet bridge code instructions

Added DEX execution support for ReluBprop (#1305) · 87b5758d
Pruthvi authored Aug 03, 2018

Added DEX support for (MaxPool + AvgPool) Backprop op for CPU backend (#1302) · 1fdf2d98
Pruthvi authored Aug 03, 2018
```
* Added DEX support for MaxPoolBackprop op for CPU backend

* Added DEX execution support for AvgPoolBackprop
```
Propagate input buffers for passthrough kernels (#1312) · 2a0e43ef
Jayaram Bobba authored Aug 03, 2018

Start of windows build (#1306) · ef309cf6
Robert Kimball authored Aug 03, 2018
```
* compiles but does not link
```
IntelGPU backend: Slice operation (#1304) · 7d6a41f3
shssf authored Aug 03, 2018

DEX MaxPoolWithIndices (#1299) · c38c76a7
Nick Korovaiko authored Aug 03, 2018
```
* dex max_pool_with_indices

* maxpoolwithindices (#1300)
```
dex group convolution (#1297) · b1239af4
Nick Korovaiko authored Aug 03, 2018

02 Aug, 2018 12 commits

CPU Direct Execution: Implement product reductions (#1296) · 1011f6c7
Jaikrishnan Menon authored Aug 02, 2018

LRN (#1282) · 237c4803

Nick Korovaiko authored Aug 02, 2018

* lrn init

* fix comment

* mkldnn lrn (#1295)

* add serializer + fix compiler warnings

Fix first_iteration (#1294) · 83a9d252

Jaikrishnan Menon authored Aug 02, 2018

* Fix the first_iteration flag so it works when more than one call-frame exists

Static variables defined in lambda expressions are shared by every closure object created from that lambda (they are not private to one instance), so move the flag to the runtime context

* Shave off a few microseconds by initializing intermediates exactly once

* Make all execution paths use first_iteration in the runtime context

[Py] Add __repr__ to Strides and CoordDiff (#1291) · 870ab827
Michał Karzyński authored Aug 02, 2018
```
* [Py] Add __repr__ to Strides and CoordDiff

* Apply clang-format

* Repr fix

* Apply clang-format
```
[Py] Add convolution_backprop_data to API (#1292) · 5c56923a
Michał Karzyński authored Aug 02, 2018
```
* [Py] Add convolution_backprop_data to API

* Conv fix
```

softmax & convolution memory primitive cacheing (#1290) · bb94fa85

Chris Sullivan authored Aug 02, 2018

* Updated softmax.

* Formatting.

* Updated convolution.

* Use build_primitive overloading. Add helper to emit type_string given a node.

* Formatting.

* Update ConvolutionBackpropData.

* convolution backprop & max pool memory primitive cacheing (#1303)

* Updated ConvolutionBackpropFilters.

* Update MaxPool.

* Update Max and Min. (#1307)

gpu element op optimize (#1287) · eba9439b
Fenglei authored Aug 02, 2018
```
* move add,mult,min,max,sqrt to elementwise_op, increase op per threads
```
Implement trigonometric ops for direct execution. (#1289) · bcd1daa2
Amy Zhuang authored Aug 02, 2018
```
* Implement trigonometric ops for direct execution.

* Rename files.
```

Fix SUSE build and run errors (#1284) · 54e5a816

Robert Kimball authored Aug 02, 2018

* build on suse w/gcc 4.8.5

* fix SUSE build error

* add comments

* remove template function

* update per review comment

* fix nan check emitted code

Interpreter implementation of batch norm bprop (#934) · c6a0fae3

varun-intel authored Aug 02, 2018

* updated

* type prop

* disable test in manifest

* try to exclude

* style

* double

* dobule

* more

* style

* more

* vecs

* fix goe

wip (#1293) · 2a64baca
Robert Kimball authored Aug 02, 2018

Work around some buggy (and deprecated) rpath directives (#1256) · 84546bbc

Jaikrishnan Menon authored Aug 02, 2018

* Work around some buggy (and deprecated) rpath directives

* Add missing newline

* Revert "Add missing newline"

This reverts commit 95aebb7f14850afcd59c53ece0bb4663b8c38660.

* Encoding fixes

01 Aug, 2018 1 commit

More efficient sum for some cases (#1251) · f8941a12

Louis Feng authored Aug 01, 2018

* hacking to support dot of 3 by 2 inputs with gemm_batch.

* clean up.

* testing inplace reshape.

* fixed a compile error.

* added comments on todo.

* check for output.

* check for annotation.

* more optimizations WIP.

* sum simd.

* moved parallel for

* testing sum vectorization.

* fixed merge errors.

* sum wip.

* more logic.

* sum refactor and clean up.

* clean up.

* removed unrelated changes.

* removed related changes from merge.

* fixed clang compile errors.