Commits · 55a25d415d1c68d0f2802fa60af01c7f37ad45e0 · submodule / ngraph

13 Jul, 2018 11 commits

Refactored GPU backend state into BackendContext (#1186) · 55a25d41

Chris Sullivan authored Jul 13, 2018

* Refactored GPU backend state into BackendContext and moved it to the highest level GPU_Backend.
Some bugs have appeared in so doing. Needs investigation.

* extra *block_size

* change grid_size to threads

* Bug fix in softmax cache parameters.

* Additional bug fix for maxpool1d cache parameters.

* Bug fix in softmax cache parameters.

* Additional bug fix for maxpool1d cache parameters.

* Remove temporary print statements.

* Use nthreads in primitive hash.

* Switched from using stack references for cudnn and cublas handles to heap pointers held only in the C struct GPURuntimeContext but managed by the GPU_Backend.

* Refactored the use of GPURuntimeContext* ctx throughout the emitters.

* Use std::prev instead of operator-- for memory iterator capture

* bug fix from abaf1d7

Backend/API: Implementation of the call method for IntelGPU (#1199) · 8bde818c

dmyershov authored Jul 13, 2018

* Backend/API: Implementation of the call method for IntelGPU

* intel_gpu_style_fix_1199

* Copy memory from clDNN to Tensor

* Code style fix in 1199.2

get_subgraph_outputs (towards checking that intermediate nodes in a matched graph are not used) (#1207) · 83e7dba5
Nick Korovaiko authored Jul 13, 2018
```
* get_subgraph_outputs

* simplify the condition
```
minor speed increase (#1218) · 33b54ce1
Robert Kimball authored Jul 13, 2018

CPU Direct Execution: Implement Reshape (#1225) · 346f480f
Jaikrishnan Menon authored Jul 13, 2018

Jbobba/dex computation reuse (#1219) · 7d59542d

Jayaram Bobba authored Jul 13, 2018

* CPU Direct Execution: Implement ConvertLayout and refactor

* CPU Direct Execution: Implement Convolution

* 1) Adds computation reuse to direct execution
2) Add avg_pool, broadcast and convolution_bias to direct execution
3) Moved some computation reuse utility functions to graph_utils

* Use lists instead of vectors to avoid reallocation overheads

* - Style fix

* style fix

gpu_external_function and gpu constant memory refactor (#1189) · 260cb90d

Fenglei authored Jul 13, 2018

* refactor external function

* working version

* fix bug

* add emit_functions, emit_declare_constants, emit_declare_functions

* add std::

* add functions declaration

* fix bugs

* fix bugs

* separate temp memory allocation and release

* add invoke_constant_ptr function, clean up outputs for function

* fix bugs, compiled ok

* add ctx to emit_declare_constant

* cleanup code, code style

* remove using std, code style

* revert std changes

* change function names based on Chris's comments

* add ResultCopyElimination to pass_manager

* clang format

Backend/API: Implementation of ADD and MUL operations in the compile() (#1200) · 2c345798

shssf authored Jul 13, 2018

* Backend/API: Implementation of ADD and MUL operations in the compile method for IntelGPU

* Branch merge conflicts resolved

* Parameters number check moved to function. RESULT operation handling added.

reshape inplace without copying data if possible. (#1206) · 268853d0
Louis Feng authored Jul 13, 2018

Fix incorrect hash strings for softmax and 1d maxpool. (#1195) · 4659d60d

Chris Sullivan authored Jul 13, 2018

* Bug fix in softmax cache parameters.

* Additional bug fix for maxpool1d cache parameters.

* Formatting.

* Use nthreads in primitive hash.

gpu reshape optimization (#1174) · b5e69eaa

Fenglei authored Jul 13, 2018

* add gpu_timer to external function

* compiled version

* working version

* using block_begin and block_end

* add the missing '\n;'

* move slice to cuda emitter

* change size_t to uint32_t in kernel

* working version

* change block size from 1 to 64

* fix bugs

* nthreads needs to be size_t in broadcast op

* add rank to kernel name hash

* change reshape to cuda_emitter

* fix bugs

* bug, remove rank from kernel

* clang format

* update slice in convolution

* resolve index conflict

* change align to align_to_blocksize, add overflow check

* add grid size check and fix pool merge bug

* code style, change names

* fix merge conflict

* change kernel_runner to kernel_launch

12 Jul, 2018 4 commits

Added reshape and broadcast to CSE (#1221) · cf568ef9

Louis Feng authored Jul 12, 2018

* reshape inplace without copying data if possible.

* added reshape and broadcast to CSE.

* Fixed debug messages.

remove custom install path (#1164) · 41942f8b

Robert Kimball authored Jul 12, 2018

* remove custom install path

* fix travis build

* Add NGRAPH_INSTALL_PREFIX as an alias for CMAKE_INSTALL_PREFIX to make our unit tests pass.

* change install path setting

Bob/backend list (#1220) · 8e1954d0

Robert Kimball authored Jul 12, 2018

* open only the unversioned library but check that it is built against the correct version of ngraph

* review comments

gpu safe call - add CUDA_RT_SAFE_CALL (#1222) · 97b19515

Fenglei authored Jul 12, 2018

* add CUDA_SAFE_CALL to all cuda calls

* add CUDA_RT_SAFE_CALL

* add null ptr check before free

* init pointer to nullptr

* consolidate conditions

11 Jul, 2018 2 commits
- DEX Part 3 (#1184) · d37fa712
  Jaikrishnan Menon authored Jul 11, 2018
```
* CPU Direct Execution: Implement ConvertLayout and refactor

* CPU Direct Execution: Implement Convolution
```
- Disabled RNN fusion pass in IA transformer (#1217) · 4cd2c602
  Pruthvi authored Jul 11, 2018
  
10 Jul, 2018 1 commit
- [Py] Enable retrieving data from constant node. (#1214) · 785c1ce7
  Adam Rogowiec authored Jul 10, 2018
```
* Enable retrieving data from Constant in python.

* Test on wide value range.
```
09 Jul, 2018 4 commits
- Liveness optimizations (#1210) · 0c721561
  Robert Kimball authored Jul 09, 2018
```
* Faster liveness.

Memory manager optimized for non-sharing of tensors.
Add pass manager profiler.

* Move pass profiler to a separate PR

* Move Memory Layout optimizations to a separate PR

* use find instead of count
```
- Cache functions so the backend does not need to recompile (#1209) · ffe3a631
  Robert Kimball authored Jul 09, 2018
```
* Cache some generated functions in backwards tests to speed performance

* more caching
```
- [ONNX] Apply code review comments (#1213) · 9fecc560
  Michał Karzyński authored Jul 09, 2018
  
- Support for multiple precompiled header files (#1208) · 198431b6
  Robert Kimball authored Jul 09, 2018
```
Better CI performance
```
08 Jul, 2018 2 commits
- Memory Layout pass optimizations (#1212) · 0165b27e
  Robert Kimball authored Jul 08, 2018
```
* Memory Layout pass optimizations

* rename SIMPLE memory allocator
```
- add pass profiler (#1211) · e3d95453
  Robert Kimball authored Jul 08, 2018
  
07 Jul, 2018 4 commits
- Backend/API: cmake module to find Intel clDNN (#1155) · 26645912
  shssf authored Jul 07, 2018
  
- New backend construction/destruction API (#1171) · ad4dd5b0
  Robert Kimball authored Jul 07, 2018
```
* complete the new backend construction/destruction API
* close each dlopen
* don't close libraries for now as it causes python to segfault
```
- adding comment (#1193) · 21d22459
  Nick Korovaiko authored Jul 07, 2018
  
- Added predicate for alpha in BoundedRelu (#1205) · f2b73a76
  Pruthvi authored Jul 07, 2018
  
06 Jul, 2018 4 commits

Jbobba/conv sum cleanup (#1167) · 0768a969

Jayaram Bobba authored Jul 06, 2018

* inplace compute

* fix warnings

* Initial support for convolution sum fusion

* Added in-place support for conv sum fusion and test cases

* reverting spurious changes

* Bug fix to account for inplace input in conv sum fusion

* fix compilation error

* Addressed PR feedback

* Handle corner cases for conv sum fusion. Skip computation reuse while using an inplace kernel

* Check node argument for in-place relu assignment

* Addressed PR comments

* Addressed PR feedback

Use mkldnn reorder only for transpose/dimshuffles. (#1188) · 5be99c0a

Nishant Patel authored Jul 06, 2018

* Usage of mkldnn reshape updated

* update reshape condition for mkldnn

* Add a test case and order in which conditions are checked

Collect matched nodes (#1166) · e07637c0
Nick Korovaiko authored Jul 06, 2018
```
* collect matched nodes

* clear m_matched_list

* tests

* address feedback
```
[Py] Expose logical And, Or operations. (#1198) · 137f002b
Adam Rogowiec authored Jul 06, 2018

05 Jul, 2018 4 commits
- Cyphers/contrib (#1202) · 7cd38322
  Scott Cyphers authored Jul 05, 2018
```
* Fix short markup

* Minor adjustments, license requirements.
```
- make logical ops input type aware (#1203) · a7c5eb01
  Nick Korovaiko authored Jul 05, 2018
  
- fix bugs in align_to_block_size function (#1191) · f1ebcd3e
  Fenglei authored Jul 05, 2018
```
* extra *block_size

* change grid_size to threads
```
- fix namespace error in macro (#1194) · af956916
  Yixing Lao authored Jul 05, 2018
  
04 Jul, 2018 1 commit
- [ONNX] add 'Add' operator (#1192) · 15d743f1
  Artur Wojcik authored Jul 04, 2018
  
03 Jul, 2018 3 commits
- Update documentation link to new ngraph-tf (#1185) · 08cabb12
  Adam Procter authored Jul 03, 2018
  
- Batch dot operation for rank 3 multiply with rank 2 tensors (#1180) · 238ce788
  Louis Feng authored Jul 03, 2018
```
* hacking to support dot of 3 by 2 inputs with gemm_batch.

* clean up.
```
- nbench cleanup (#1183) · 9d09c7e5
  Robert Kimball authored Jul 03, 2018
```
* nbench cleanup

* update style
```