1. 09 May, 2018 2 commits
    • 
      RTLD_GLOBAL fix codegen link (#984) · 7bc6b785
      Yixing Lao authored
    • 
      CUDNN and CUDA kernels for AvgPool (forward/backward) (#951) · b1b3d4d6
      Chris Sullivan authored
      * Added op::AvgPool cudnn impl. which works for 2-3 spatial dimensions and no/symmetric padding. Enabled tests.
      
      * Added cuda-c implementation of average pool which handles 1-3 spatial
      dimensions as well as asymmetric padding. This commit also introduces
      several helper functions for performing fast integer division and
      fast constant memory access.
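The fast-integer-division helper mentioned above is a standard GPU trick: hot index arithmetic replaces a hardware divide by a fixed divisor with a multiply and a shift against a precomputed reciprocal. A minimal sketch of the idea (illustrative only, not the actual nGraph helper; the name and bound are assumptions):

```cpp
#include <cstdint>

// Illustrative sketch, not nGraph code. Divides by a fixed divisor with one
// 64-bit multiply and a shift instead of a hardware divide. Exact whenever
// n * divisor < 2^32, which covers typical spatial-index arithmetic.
struct FastDiv
{
    uint64_t magic;   // ceil(2^32 / divisor)
    uint32_t divisor;

    explicit FastDiv(uint32_t d)
        : magic(((1ull << 32) + d - 1) / d)
        , divisor(d)
    {
    }

    uint32_t div(uint32_t n) const
    {
        // floor(n / divisor) == high 32 bits of n * magic
        return static_cast<uint32_t>((n * magic) >> 32);
    }

    uint32_t mod(uint32_t n) const { return n - div(n) * divisor; }
};
```

On a GPU the multiply-high would typically map to an intrinsic such as `__umulhi`, with the magic constant computed once on the host.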
      
      * Formatting. Removed bool that was used for testing to force the cuda impl. over cudnn.
      
      * Added CUDNN AvgPoolBackprop implementation.
      
      * Removed inline enum in preference of a helper struct. Removed instances of multiple declarations on a single line. Updated comments.
      
      * Removed _ prefix from helper functions in anonymous namespace.
  2. 08 May, 2018 8 commits
    • 
      [cuDNN:Part 1] minimal refactoring of op::reduce (#965) · 682f7b04
      Chris Sullivan authored
      * Refactored the cudnn reduce kernel to use the nGraph Shape -> cudnnTensorDescriptor cudnn helpers that the other kernels use.
      
      * Added caching to cudnn reduce op.
      
      * Adding back hashing call before returning primitive index to op::Reduce (bug fix).
      
      * [cuDNN:Part 2] Descriptor Creation/Destruction refactoring (#969)
      
      * Added a cuDNN descriptor factory which manages the construction and destruction of cuDNN descriptors.
      It correctly calls Create/Destroy based on the cuDNN descriptor type. Previously the Destroy functions were not being called.
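The factory described above is essentially RAII over cuDNN's paired Create/Destroy calls. A hedged, cuDNN-free sketch of the pattern (not the actual nGraph class; the callables stand in for pairs such as cudnnCreateTensorDescriptor / cudnnDestroyTensorDescriptor):

```cpp
#include <functional>
#include <utility>

// Illustrative sketch of the Create/Destroy pairing the factory enforces --
// not the actual nGraph implementation.
template <typename Desc>
class ScopedDescriptor
{
public:
    ScopedDescriptor(std::function<void(Desc*)> create,
                     std::function<void(Desc&)> destroy)
        : m_destroy(std::move(destroy))
    {
        create(&m_desc); // paired Create on construction...
    }
    ~ScopedDescriptor() { m_destroy(m_desc); } // ...guaranteed Destroy on scope exit
    ScopedDescriptor(const ScopedDescriptor&) = delete;
    ScopedDescriptor& operator=(const ScopedDescriptor&) = delete;
    Desc& get() { return m_desc; }

private:
    Desc m_desc{};
    std::function<void(Desc&)> m_destroy;
};
```

Because the destroy callable runs in the destructor, the leak described in the commit (Destroy never being called) cannot occur, even on early returns or exceptions.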
      
      * Removed commented code and changed class to struct on cudnn_descriptor.
      
      * Added comments and updated a few variable names.
      
      * Clang compiled cuDNN kernels (those not part of CUDNNEmitter)
      now use the CUDNNDescriptor factory.
    • 
      add gpu concat op (#931) · 57d58e50
      Fenglei authored
      * add concat op
      
      * change to concat
      
      * add more code for gpu concat
      
      * compiles successfully, with a bug
      
      * add emit_concat_op
      
      * runnable with wrong result
      
      * working version
      
      * add some comments
      
      * delete old comments.
      
      * delete old comments.
      
      * remove buggy doxygen comments
    • 
      Fixes compiler warning. (#974) · acc4e46d
      Christian Convey authored
    • 
      Algebraic Simplification for Product (#949) · 659d2565
      Nick Korovaiko authored
      * product simplifier
      
      * char -> signed char
    • 
      Computation reuse (#945) · 41c50b44
      Jayaram Bobba authored
      * Make temp memory pools static to avoid memory allocation overheads
      
      * Initial implementation for graph control to enable caching and computation reuse
      
      * Added sphinx documentation
      
      * Turned off memory buffer reuse in CPU transformer to support computation reuse. Added unit test
      
      * Change memoizable to cacheable
      
      * Change memoizable to cacheable
      
      * Rename variables
    • 
      MaxPoolWithIndices (#900) · a174c8c9
      Nick Korovaiko authored
      * MaxPoolWithIndices CPU Fusion
      
      * fix test to pass checks in cpu_fusion
      
      * pass test
      
      * clean up
      
      * add a new pass, add layouts
      
      * remove the opt from cpu_fusion
      
      * refactor cpu_layout logic for maxpool, clean up comments
      
      * add comment w.r.t. indices tensor
      
      * rename to cpu_workspace_insertion
      
      * add CPUWorkspaceInsertion pass for TF
  3. 07 May, 2018 2 commits
  4. 06 May, 2018 1 commit
  5. 05 May, 2018 1 commit
    • 
      add gpu reverse (#952) · af946b7d
      Fenglei authored
      * add code to gpu reverse
      
      * add reverse emitter and kernel builder
      
      * working version
  6. 04 May, 2018 9 commits
  7. 03 May, 2018 2 commits
  8. 01 May, 2018 3 commits
  9. 30 Apr, 2018 2 commits
  10. 27 Apr, 2018 2 commits
  11. 26 Apr, 2018 6 commits
  12. 25 Apr, 2018 2 commits
    • 
      CUDNN BatchNorm (inference/forward/backward) (#893) · 23ac5e5a
      Chris Sullivan authored
      * Added cudnn batch norm operation to GPU transformer.
      Brought batchnorm tests out of cpu_tests and into
      backend_tests. Need to add JIRA ticket for interpreter
      SKIPS.
      
      * CUDNN batchnorm is implemented. In the ForwardTraining branch
      CUDNN seems to calculate the batch mean correctly but the batch variance incorrectly.
      Currently the batchnorm output and mean are calculated correctly for tests:
      * GPU.batchnorm_fprop_b2c2h3w3_mean_var
      * GPU.batchnorm_fprop_b1c2h2w2
      * GPU.batchnorm_fprop_b2c2h2w1
      but the variance for the batches in these tests is calculated incorrectly by CUDNN.
      
      Also added an additional test and cleaned up some of the old tests.
      
      * MKLDNN internally utilizes the biased estimate of the population variance
      and the tests have been crafted to suit MKLDNN. According to the original
      batchnorm publication (https://arxiv.org/pdf/1502.03167v3.pdf), population
      (unbiased) statistics should be used for inference, and mini-batch (biased)
      statistics should be used during training (forward/backward). For the variance this
      means utilizing the following equations, respectively:
      
        (biased)   Var[X] = 1/m * Sum_i(x_i-mu)^2      :: used in training
        (unbiased) Var[X] = 1/(m-1) * Sum_i(x_i-mu)^2  :: used in inference
      
        s.t. x_i are elements of X and m = N*D*H*W.
      
      For large batch sizes in inference this may not impact convergence as m >> 1,
      but for small batch sizes it will. CUDNN internally utilizes the unbiased
      variance.
      
      Changes:
      * Added Multiply op to Forward pass of batchnorm to convert
        the unbiased variance to a biased one. The op utilizes the
        blending scaling factors to apply the bias factor.
      * Adds emission for the BatchNormBackprop kernel and cleans up
        the emitter implementation.
      
      * Added hashing to cudnn::batchnorm op.
      
      * Formatting.
      
      * Changed hashing of epsilon in cudnn batchnorm.
      
      * Remove implicit conversion and default case in switch for bn.
      
      * Added skips for IE transformer on batchnorm.
      
      * add cudnn include path to compiler.cpp
      
      * separate two paths
      
      * PRs #892 and #825, which were recently merged, both forgot skips for the GPU backend.
      Adding them in as they are unimplemented ops.
      
      * The allocation and deletion of primitives was occurring in separate
      translation units with raw C pointers. Because of this, it was not
      clear that these were being freed appropriately, nor did it indicate
      ownership of the pointers.
      
      In this commit these raw pointers have been converted over to
      std::unique_ptrs such that the construction/destruction is managed
      automatically. Furthermore, GPUPrimitiveEmitter::insert now only
      takes an r-value reference, requiring move-semantics to indicate
      that when inserting a primitive, the GPUPrimitiveEmitter takes
      ownership of the pointer.
      
      All instances of primitive creation have been modified.
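The move-only insert described above can be sketched as follows (a hypothetical simplification; the real GPUPrimitiveEmitter differs):

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical simplification, not the real nGraph class. insert() takes
// the unique_ptr by r-value reference, so callers must write std::move(...),
// making the transfer of ownership explicit; destruction is then automatic.
struct GPUPrimitive
{
    // stand-in for a compiled GPU primitive
};

class GPUPrimitiveEmitter
{
public:
    size_t insert(std::unique_ptr<GPUPrimitive>&& primitive)
    {
        m_primitives.push_back(std::move(primitive));
        return m_primitives.size() - 1; // index later used to invoke the primitive
    }

private:
    std::vector<std::unique_ptr<GPUPrimitive>> m_primitives; // freed automatically
};
```

A caller writes `emitter.insert(std::move(p))`; passing the pointer without std::move fails to compile, which is exactly the ownership signal the commit describes.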
      
      * CUDNN_SAFE_CALL
      
      * Removed redundant comment and made variable names more verbose.
      
      * Change from conditionals to case-switch in pooling to conform to
      batchnorm per @fengleitian's suggestion.
    • 
      add cudnn include path to compiler.cpp (#902) · b0421577
      Fenglei authored
      * add cudnn include path to compiler.cpp
      
      * separate two paths
      
      * Skipping one_hot tests for CPU as
      CI is failing. JIRA bug report: https://jira01.devtools.intel.com/browse/NGRAPH-1682.