Commits · 8fdefa525595e9f9fcd4f5ea577965bc35ec0be6 · submodule / ngraph

29 Aug, 2018 1 commit

Change license header to use single-line comment (#1508) · a17ec605

Robert Kimball authored 6 years ago

* use line comments instead of multiline comments for license header

* update more

* update new files

* more header updates

* style

a17ec605

13 Aug, 2018 1 commit

enable parameter validation for all unit tests (#1385) · 24b41844

Robert Kimball authored 6 years ago

* enable parameter validation for all unit tests

24b41844

18 Jul, 2018 1 commit

Pool tests updated to check all backends (#1245) · e2255fbd

Robert Kimball authored 6 years ago

* make pool test check backends other than CPU

* more unit test cleanup

e2255fbd

25 Jun, 2018 1 commit

inplace compute (#1141) · 88aa9e9c

Nick Korovaiko authored 6 years ago

* inplace compute

* fix warnings

* address bob's feedback

* bob's feedback 2

* bobs feedback 3

* address bob's feedback 4

88aa9e9c

15 Jun, 2018 1 commit

move tbb test from backend_test to cpu_test because it is CPU only (#1102) · 7d6a0d1c

Robert Kimball authored 6 years ago

7d6a0d1c

23 May, 2018 1 commit

LSTM fusion + RNN fusion across time slices for single layer (#826) · 1d08f073

Pruthvi authored 6 years ago

* - Added pattern matcher for LSTM cell

* WIP added support to replace lstm cell instead of subgraph

* WIP LSTM pattern matcher, fuses recurrent cells

* WIP added RNN CPU op

* WIP mkldnn emitter code for fprop RNN

* WIP RNN mkldnn integration
- Added mkldnn kernel for uni directional LSTM in the CPU emitter

* add a getter for root node

* recurrent graph rewrite

* fix perms, rename match_root -> get_match_root

* fix comp errors

* make match_root return the topmost match; fix tests

* - WIP GetOutputElement for handling multiple LSTM o/ps
- use RecurrentGraphRewrite for replacing node after matching LSTM cells

* WIP LSTM multi Output + debug prints

* moved LSTM fusion to cpu_fusion

* WIP added RNN superfused OP

* WIP towards RNN layer fusion

* WIP multiple output slicing RNN

* WIP RNN multiple o/ps fusion across layer

* WIP corrected input params for fused RNN OP

* concat corresponding params across different LSTMs to form inputs to RNN fused op

* i) Added test case for RNN kernel ii) runs without errors

* refactored and moved LSTM class to standalone file

* Rename RNN -> Rnn , LSTM -> Lstm

* WIP replace lstm slices to the consumer op

* Slicing works on multiple RNN layers

* fixed all bugs

* - Added CPU RNN Recurrent Fusion
- Added CPU LSTM fusion
- removed debug code
- style fix

* - Added support to compute src_iter and dst_iter instead of taking zero_memory_desc
- Added unit test to compute one LSTM cell

* changed RNN op signature to accept number of states in basic unit of RNN (GRU / LSTM / vanilla RNN) cell

* added sanity checks for RNN op

* Fixed issue related to patching the graph while replacing the RNN sliced outputs

* Fixed issue to feed the input symbols in the order X0, X1, ...Xt to the RNN op

* Added unit test for multi layer RNN fusion

* Removed debug statements

* i) Added multilayered serialized graph ii) fixed compilation issue

* Addressed PR comments

* i) WIP MKLDNN layout for RNN Op ii) added test case for INTERPRETER v/s CPU Rnn results

* - Fixed bug w.r.t. src_layer feature size in rnn mkldnn emitter code
- Refactored cpu_fusion rnn test case

* merge origin/master with branch pruthvi/lstm_fusion

* style fix

* Added test case for multiple RNN layers

* i) make rnn as mkldnn op if it meets the constraints ii) assert if rnn is not mkldnn op

* fix unit test failure

* - Added support to reliably identify the hidden state and input symbols from the nodes collected by Pattern matcher
- Fixed failing unit tests

* style fix

* - removed "node type" dependency to replace the intermediate LSTM outputs

* Addressed PR comments

* Fix unit test

* - added MKLDNN emitter for LSTM op
- graph pass to concat LSTM input recurrent state tensors
- CPU layout assignment for LSTM Op
- Fixed bug in rnn/lstm unit tests
- made changes to use replace_output instead of replace_node for replacing matched graph nodes in LSTM/RNN fusion pass

(cherry picked from commit d16fc709265cc0a73e60c6d5f6d2878e7b908aca)

* style fix

* Renamed passes and style fixes

1d08f073

18 May, 2018 1 commit

Enable reverse_sequence for Interpreter (#977) · cd59bfe4

Nick Korovaiko authored 6 years ago

* use reference kernel for reverse_sequence for int

* move tests

* resolve CI errors

* TEST to NGRAPH_TEST

cd59bfe4

07 May, 2018 1 commit

Reverse Sequence (#920) · 23913010

Nick Korovaiko authored 6 years ago

* sequence reverse

* fix test

* more tests for reverse_sequence

* remove debug prints, change perms

* fix formatting; remove dead code

* make seq_lengths a parameter

* autodiff + tests

23913010

25 Apr, 2018 1 commit

CUDNN BatchNorm (inference/forward/backward) (#893) · 23ac5e5a

Chris Sullivan authored 6 years ago

* Added cudnn batch norm operation to GPU transformer.
Brought batchnorm tests out of cpu_tests and into
backend_tests. Need to add JIRA ticket for interpreter
SKIPS.

* CUDNN batchnorm is implemented. In the ForwardTraining branch
CUDNN seems to calculate the batch mean correctly but the batch variance incorrectly.
Currently the batchnorm output and mean are calculated correctly for tests:
* GPU.batchnorm_fprop_b2c2h3w3_mean_var
* GPU.batchnorm_fprop_b1c2h2w2
* GPU.batchnorm_fprop_b2c2h2w1
but CUDNN calculates the variance for the batches in these tests incorrectly.

Also added an additional test and cleaned up some of the old tests.

* MKLDNN internally utilizes the biased estimate of the population variance
and the tests have been crafted to suit MKLDNN. According to the original
batchnorm publication (https://arxiv.org/pdf/1502.03167v3.pdf), population
(unbiased) statistics should be used for inference, and mini-batch (biased)
statistics should be used for training (forward/backward). For the variance this
means utilizing the following equations, respectively:

(biased) Var[X] = 1/m * Sum_i(x_i-mu)^2 :: used in training
(unbiased) Var[X] = 1/(m-1) * Sum_i(x_i-mu)^2 :: used in inference

s.t. x_i are elements of X and m = N*D*H*W.

For large batch sizes in inference this may not impact convergence as m >> 1,
but for small batch sizes it will. CUDNN internally utilizes the unbiased
variance.

Changes:
* Added Multiply op to Forward pass of batchnorm to convert
the unbiased variance to a biased one. The op utilizes the
blending scaling factors to apply the bias factor.
* Adds emission for the BatchNormBackprop kernel and cleans up
the emitter implementation.
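
The relationship between the two estimates above can be sketched in plain C++. This is only an illustration of the arithmetic, not the ngraph or CUDNN implementation: the function names are hypothetical, and the scale factor (m-1)/m is simply what follows algebraically from dividing the biased equation by the unbiased one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum_i (x_i - mu)^2, shared by both variance estimates.
// Hypothetical helper, not part of ngraph.
double sum_squared_deviations(const std::vector<double>& x)
{
    assert(!x.empty());
    double sum = 0.0;
    for (double v : x)
    {
        sum += v;
    }
    const double mu = sum / static_cast<double>(x.size());

    double ss = 0.0;
    for (double v : x)
    {
        ss += (v - mu) * (v - mu);
    }
    return ss;
}

// (biased) Var[X] = 1/m * Sum_i (x_i - mu)^2
double biased_variance(const std::vector<double>& x)
{
    return sum_squared_deviations(x) / static_cast<double>(x.size());
}

// (unbiased) Var[X] = 1/(m-1) * Sum_i (x_i - mu)^2
double unbiased_variance(const std::vector<double>& x)
{
    assert(x.size() > 1); // m-1 must be non-zero
    return sum_squared_deviations(x) / static_cast<double>(x.size() - 1);
}

// Factor that turns an unbiased variance into a biased one:
// biased = unbiased * (m-1)/m.
double unbiased_to_biased_factor(std::size_t m)
{
    assert(m > 0);
    return static_cast<double>(m - 1) / static_cast<double>(m);
}
```

Multiplying `unbiased_variance(x)` by `unbiased_to_biased_factor(x.size())` gives `biased_variance(x)` up to rounding, which is the kind of correction a single multiply can apply. For small m the factor is far from 1, which is why the difference matters for small batches.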

* Added hashing to cudnn::batchnorm op.

* Formatting.

* Changed hashing of epsilon in cudnn batchnorm.

* Remove implicit conversion and default case in switch for bn.

* Added skips for IE transformer on batchnorm.

* add cudnn include path to compiler.cpp

* separate two paths

* PRs #892 and #825, which were recently merged, both forgot skips for the GPU backend.
Adding them in as they are unimplemented ops.

* The allocation and deletion of primitives was occurring in separate
translation units with raw C pointers. Because of this, it was not
clear that these were being freed appropriately, nor was ownership
of the pointers indicated.

In this commit these raw pointers have been converted over to
std::unique_ptrs such that the construction/destruction is managed
automatically. Furthermore, GPUPrimitiveEmitter::insert now only
takes an r-value reference, requiring move-semantics to indicate
that when inserting a primitive, the GPUPrimitiveEmitter takes
ownership of the pointer.

All instances of primitive creation have been modified.
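
The ownership pattern described above can be sketched as follows. This is a minimal stand-alone illustration, not the actual GPUPrimitiveEmitter code: `Primitive` and `PrimitiveStore` are hypothetical stand-ins, and the real `insert` signature and return type may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiled GPU primitive.
struct Primitive
{
    int id;
};

// Hypothetical container that owns every primitive inserted into it.
class PrimitiveStore
{
public:
    // Taking an r-value reference forces callers to write std::move(p),
    // which makes the transfer of ownership visible at the call site.
    std::size_t insert(std::unique_ptr<Primitive>&& p)
    {
        assert(p != nullptr);
        m_primitives.push_back(std::move(p));
        return m_primitives.size() - 1;
    }

    const Primitive& get(std::size_t index) const { return *m_primitives.at(index); }
    std::size_t size() const { return m_primitives.size(); }

private:
    // Primitives are destroyed automatically when the store is destroyed.
    std::vector<std::unique_ptr<Primitive>> m_primitives;
};
```

A caller would create the primitive with `std::unique_ptr<Primitive> p(new Primitive{7});` and hand it over with `store.insert(std::move(p));`. Passing `p` without `std::move` does not compile, and after the call `p` is empty, so there is no second owner that could free the primitive.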

* CUDNN_SAFE_CALL

* Removed redundant comment and made variable names more verbose.

* Change from conditionals to case-switch in pooling to conform to
batchnorm per @fengleitian's suggestion.

23ac5e5a

13 Apr, 2018 1 commit

Remove legacy Backend API (#848) · ec501913

Robert Kimball authored 6 years ago

* remove deprecated

* remove all legacy Backend API usage

remove deprecated files

* pull in changes from master

* fix GPU calls

* disable tests in convolution generator

* update per PR comments. Enable performance counter feature.

* update per PR comments

* fix build error

* fix conditionally compiled test :(

ec501913

04 Apr, 2018 1 commit

Support multi-output ops in Adjoints (#796) · 5f0e8dc3

Nick Korovaiko authored 6 years ago

* refactor Adjoints to support multi-output ops

* passing tests

* switch to generate_adjoints(deltas) and backprop_node

* remove debugging code

* fix error msg

* fix typo adjoitns

* fix comp errors in mnist_mlp

5f0e8dc3

02 Apr, 2018 1 commit

Pruthvi/bn to support globalstats (#783) · 1d80cabe

Pruthvi authored 6 years ago

* WIP support bn training for global_stats

(cherry picked from commit eb81a37328ea177b1d58c9eebdbb345e0fa25f0d)

* - Style fix
- Fix test case

* Addressed PR comments
- added support for bn training/inference with the same ctor
- added more verbose comments in bn header

* Fixed bn serializer and default value in bn ctor for bwd compatibility

* proposed docs change

* - Addressed PR comments
  - added support to compute bn inference/training using same mkldnn kernel with global stats

* fix bn relu unit test

1d80cabe

28 Mar, 2018 1 commit

Split cpu_fusion.cpp into cpu_fusion.cpp and cpu_test.cpp; clean up headers (#748) · 23195230

Nick Korovaiko authored 6 years ago

* split cpu_fusion into cpu_fusion and cpu_test; clean up headers

* fix formatting

* add new line to tensor_mask.hpp

23195230