- 27 Feb, 2019 1 commit
Amy Zhuang authored
* Reuse memory for CPU backend.
* Use NGRAPH_REUSE_MEMORY to enable memory reuse.
* Add a test.
* Move make_function to test_tools.cpp.
* Add more comments.
* Address PR feedback: add a method to CPU backend.
* Add a member to CPUOpAnnotations to remove redundant code.
* Overload compile function for CPU backend.
* Move make_function out of test_tools.
* Address PR feedback.
* Use modified liveness analysis in CPUMemoryAssignment pass.
* Use lambda expression.
* Fix style error.
* Check if any user of the tensor has destructive oi when building tensor alias map.
* Fix a bug.
* Check if tensor has multiple users.
* Allow tensor alias for destructive oi node.
* Update multiple_users_tensor set along the chain of in-place ops.
* No tensor alias if input is parameter or constant.
* Use buffer sets in CPU memory assignment; tensors sharing the same memory buffer are put into the same set.
* Add more checks and do not combine sets when allowing destructive oi.
* Style fix.
* Do not allow destructive oi if the input tensor uses function input memory. Update set label.
* Add unit tests.
* Style fix.
* Get the correct size for memcpy when the input is padded.
* Style fix.
* Address PR feedback.
* Address PR feedback.
* Move make_function in cpu_test after #if 0 and before the disabled test.
* Add utility functions. Use iterator. Rename variables.
* Add pass attributes and move CPU memory assignment to common passes (#2504)
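The liveness-based buffer reuse described above can be sketched as a greedy assignment over tensor live ranges: tensors whose live intervals do not overlap may share a buffer (land in the same buffer set). This is an illustrative sketch only, not nGraph's actual CPUMemoryAssignment code; all names are hypothetical.

```python
def assign_buffers(tensors):
    """tensors: list of (name, size, first_use, last_use), sorted by first_use.
    Returns {name: buffer_id}; tensors with disjoint live ranges may share a
    buffer, mimicking the 'buffer set' idea in the commits above."""
    free = []                    # (size, buffer_id) of buffers whose tensor died
    live = []                    # (last_use, buffer_id, size) of in-use buffers
    assignment, next_id = {}, 0
    for name, size, start, end in tensors:
        # Release buffers whose tensors are no longer live at 'start'.
        still_live = []
        for last, bid, sz in live:
            if last < start:
                free.append((sz, bid))
            else:
                still_live.append((last, bid, sz))
        live = still_live
        # Reuse the first freed buffer large enough, else allocate a new one.
        for i, (sz, bid) in enumerate(free):
            if sz >= size:
                free.pop(i)
                break
        else:
            bid, next_id = next_id, next_id + 1
        assignment[name] = bid
        live.append((end, bid, size))
    return assignment
```

With three tensors `a` (live 0-1), `b` (live 1-2), `c` (live 2-3), `a` and `c` share a buffer because `a` is dead before `c` is defined, while `b` overlaps both.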

- 25 Feb, 2019 1 commit
Pruthvi authored
* Added reorder support for rnn weights_layer/iter
* i) Fixed compilation issues ii) working but still observing precision error
* i) Fixed failing rnn unit test for DEX ii) refactored workspace in RNN mkldnn emitter
* Added support for src reorder to TNC from NTC
* Reorder support for rnn output from NTC to TNC
* Added support for rnn weight reorder ldgoi -> ldigo; code refactor for lstm/rnn kernel in mkldnn emitter
* Refactor rnn mkldnn kernel, change variable names
* Fix RNN codegen kernel
* Disable layer rnn fusion pass, to test CI
* Method to validate recurrent rnn inputs
* Add correlated matches for Recurrent RNN PM
* Simplify reorder logic for rnn_weights; fix graph pattern for fusing rnn cell across time steps
* Do weights reorders in rnn timesteps fusion
* Refactored LSTM graph pass
* Bug fix for finding the lstm inputs deterministically; refactored LSTM graph pass to single pass; made changes to LSTM RNN time step fusion graph pass
* Use replace_node instead of replace_output in Lstm_step_wise fusion graph pass
* Fix compilation error
* Fix GNMT rnn fusion
* Check if the node is in use before replacing in RNN graph passes
* i) Fix style ii) fix topo sort issue in RNN graph pass
* Style fix
* Fix bug in simplify_concat pass
* Replaces Lstm1 -> {GOE1, GOE2} -> {Slice1, Slice2} -> Concat -> Lstm2 with Lstm1 -> Lstm2
* CSE for convert layout
* Addressed PR comments
* Optimization pass to remove Lstm1 -> {GOE1, GOE2} -> {Slice1, Slice2} -> Lstm2; conditional fusing of LSTM cells only for the decoder
* Made changes to multi-layer RNN fusion callback
* Fix asserts in RNN op
* Added support to fuse layers when slc == dlc for RNN cells; bug fix on the sanity checks for RNN Op
* Support RNN layer fusion till slc == dlc; bug fixes in multi-layer rnn fusion callback
* Capture reshape in the RNN weights
* Addressed PR comments
* Added comments in multi-layer PM callback; fuse only if slc == dlc across layers
* Restore deleted 3_lstm_cell_forward.json file
* Fix typo
* Fix failing unit tests
* When processing in-place slice, do not change the offset of the slice node if the argument pointer comes from function input
* Address PR feedback: process in-place slice after propagating in-place input
* Set INTERMEDIATE role before propagating in-place input
* Do not add temporaries to the variable name map before propagating in-place input in codegen
* Fix a bug in codegen
* Fix a bug in codegen slice
* Re-enable disabled rnn unit test
* Fix compiler error
* Bug fix in the slicing logic for the layer-fused rnn cell; fix failing rnn unit test
* Addressed PR comments; removed redundant checks from the rnn graph pass; simplified rnn callback replace node logic
* Added new multilayer rnn *.json file; fix test case
* [PRIVATE BRANCH] Style fixes (#2080)
* Style fixes
* Change order of lstm gates
* WIP bi rnn
* [PRIVATE BRANCH] Jbobba/rnn fusion review (#2113)
* Style fixes for single-layer RNN fusion
* Style fixes to multi-layer RNN
* Added callback routine for bi-directional rnn
* Fix rnn op ctor and rnn mkldnn emitter to accommodate bi-directional rnn
* Style fix
* Added helper function for rnn's to query direction and cell_type
* Fix clang error
* Unit test case for bi-rnn fusion; style fix
* Updated bi-rnn graph pass to handle reverse and reverse_seq ops in the predicate; added bi-rnn INTERPRETER v/s CPU unit test case; add support in mkldnn_utils to create_md with tnc/ntc format
* Added enum type to deduce rnn_type
* Addressed PR comments: handle reshapes from {t, n, c} to {n, t, c} in the graph pass
* Fix style
* Fix clang error
* Fix style
* Move enum specific to rnn to separate header
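The NTC -> TNC reorder these commits mention is a transpose of the first two axes: mkldnn RNN primitives consume time-major {t, n, c} data, while frameworks commonly produce batch-major {n, t, c}. A minimal sketch of the transform (illustration only, not nGraph or mkldnn code):

```python
def ntc_to_tnc(x):
    """x: nested list of shape [n][t][c] (batch-major).
    Returns the same data as shape [t][n][c] (time-major)."""
    n, t = len(x), len(x[0])
    # Swap the batch and time axes; the channel vectors move unchanged.
    return [[x[b][s] for b in range(n)] for s in range(t)]
```

The inverse (TNC -> NTC) is the same swap applied again, which is why a single reorder routine can serve both directions.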

- 02 Feb, 2019 1 commit
Pruthvi authored
* Check to verify if the data_slices share the same weights
* Add the serialized graph
* Explicitly fuse the data slices, so all the parameters partitioned by slices are in contiguous memory locations; fixes all the failing test cases

- 06 Dec, 2018 1 commit
Pruthvi authored
* Added reorder support for rnn weights_layer/iter
* i) Fixed compilation issues ii) working but still observing precision error
* i) Fixed failing rnn unit test for DEX ii) refactored workspace in RNN mkldnn emitter
* Added support for src reorder to TNC from NTC
* Reorder support for rnn output from NTC to TNC
* Added support for rnn weight reorder ldgoi -> ldigo; code refactor for lstm/rnn kernel in mkldnn emitter
* Refactor rnn mkldnn kernel, change variable names
* Fix RNN codegen kernel
* Disable layer rnn fusion pass, to test CI
* Method to validate recurrent rnn inputs
* Add correlated matches for Recurrent RNN PM
* Simplify reorder logic for rnn_weights; fix graph pattern for fusing rnn cell across time steps
* Do weights reorders in rnn timesteps fusion
* Refactored LSTM graph pass
* Bug fix for finding the lstm inputs deterministically; refactored LSTM graph pass to single pass; made changes to LSTM RNN time step fusion graph pass
* Use replace_node instead of replace_output in Lstm_step_wise fusion graph pass
* Fix compilation error
* Fix GNMT rnn fusion
* Check if the node is in use before replacing in RNN graph passes
* i) Fix style ii) fix topo sort issue in RNN graph pass
* Style fix
* Fix bug in simplify_concat pass
* Replaces Lstm1 -> {GOE1, GOE2} -> {Slice1, Slice2} -> Concat -> Lstm2 with Lstm1 -> Lstm2
* CSE for convert layout
* Addressed PR comments
* Optimization pass to remove Lstm1 -> {GOE1, GOE2} -> {Slice1, Slice2} -> Lstm2; conditional fusing of LSTM cells only for the decoder
* Made changes to multi-layer RNN fusion callback
* Fix asserts in RNN op
* Added support to fuse layers when slc == dlc for RNN cells; bug fix on the sanity checks for RNN Op
* Support RNN layer fusion till slc == dlc; bug fixes in multi-layer rnn fusion callback
* Capture reshape in the RNN weights
* Addressed PR comments
* Added comments in multi-layer PM callback; fuse only if slc == dlc across layers
* Restore deleted 3_lstm_cell_forward.json file
* Fix typo
* Fix failing unit tests
* When processing in-place slice, do not change the offset of the slice node if the argument pointer comes from function input
* Address PR feedback: process in-place slice after propagating in-place input
* Set INTERMEDIATE role before propagating in-place input
* Do not add temporaries to the variable name map before propagating in-place input in codegen
* Fix a bug in codegen
* Fix a bug in codegen slice
* Re-enable disabled rnn unit test
* Fix compiler error
* Bug fix in the slicing logic for the layer-fused rnn cell; fix failing rnn unit test
* Addressed PR comments; removed redundant checks from the rnn graph pass; simplified rnn callback replace node logic
* Added new multilayer rnn *.json file; fix test case
* [PRIVATE BRANCH] Style fixes (#2080)
* Style fixes
* Change order of lstm gates
* [PRIVATE BRANCH] Jbobba/rnn fusion review (#2113)
* Style fixes for single-layer RNN fusion
* Style fixes to multi-layer RNN
* Style fix
* Disable GPU test

- 26 Sep, 2018 1 commit
Adam Straw authored
* Adding nGraph Quantize op
* Unit test failing for floating point exception
* Unit test working in float
* Unit test working in uint8
* Improved type checking and polished unit test - passing
* Quantized axes working
* Inclusive project method
* Add round mode
* TODO cleanup
* Code format
* Adding serializer support - fails build
* Add serializer support
* Make CPU quantize op work; new tests for int8, clamp
* Fix build failure
* Fix GPU build issue
* Fix GPU unit test manifest
* Use quantized offset
* Add is_quantized field to element::Type
* Add reduce function to coordinate.hpp
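The pieces above (round mode, quantized offset, clamping to uint8) combine into the usual affine quantization recipe: divide by the scale, round, add the zero-point offset, clamp to the target range. The sketch below is an assumption-laden illustration, not nGraph's actual Quantize signature; the round-mode names are hypothetical.

```python
import math

def quantize_uint8(values, scale, offset, round_mode="HALF_AWAY_FROM_ZERO"):
    """Quantize real values to uint8: round(v / scale) + offset, clamped
    to [0, 255]. 'offset' plays the role of the quantized zero point."""
    out = []
    for v in values:
        q = v / scale
        if round_mode == "HALF_AWAY_FROM_ZERO":
            q = math.floor(q + 0.5) if q >= 0 else math.ceil(q - 0.5)
        else:  # HALF_TO_EVEN; Python's round() is banker's rounding
            q = round(q)
        q += offset
        out.append(max(0, min(255, q)))  # clamp to the uint8 range
    return out
```

The clamp step is what the "new tests for int8, clamp" bullet exercises: values whose quantized form falls outside the integer range saturate instead of wrapping.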

- 30 Jun, 2018 1 commit
Pruthvi authored
* Fixed replace output for the multi-layer recurrent cell state tensor output; modified rnn add_output to consider direction and n_layer while calculating the output size for mkldnn dst_layer and dst_iter
* Fix unit test failure

- 15 Jun, 2018 1 commit
Pruthvi authored
* Added graph pass for fusing RNN op across layers; added test case for INTERPRETER v/s CPU for verifying layer-fused RNN; more sanity checks in the RNN fusion graph pass; added support to replace the recurrent cell state correctly in the fused RNN op
* Fixed multi-layer rnn fusion unit test failure
* Addressed PR comments

- 07 Jun, 2018 1 commit
Louis Feng authored
* Batch dot pattern WIP
* Batch dot pattern WIP
* Added batch dot op
* Batch dot compute testing
* Correct gemm parameters
* Renaming matrix fusion passes and update tests
* Clean up
* Clang format
* More clean ups
* Clang format
* Added CPUBatchDotFusion to default cpu passes
* Added missing header
* Added element type check
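The "batch dot" op these commits introduce computes one independent matrix product per batch entry, which is typically lowered to a batched/strided gemm call (hence "correct gemm parameters"). A pure-Python reference sketch of the computation, for illustration only:

```python
def batch_dot(a, b):
    """a: nested lists [batch][m][k], b: [batch][k][n].
    Returns [batch][m][n]: an independent matrix product per batch entry."""
    out = []
    for A, B in zip(a, b):
        m, k, n = len(A), len(B), len(B[0])
        C = [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
             for i in range(m)]
        out.append(C)
    return out
```

A fusion pass that recognizes this pattern can replace a loop of per-slice Dot ops with one call, amortizing dispatch overhead across the batch.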

- 31 May, 2018 1 commit
Louis Feng authored

- 23 May, 2018 1 commit
Pruthvi authored
* Added pattern matcher for LSTM cell
* WIP added support to replace lstm cell instead of subgraph
* WIP LSTM pattern matcher, fuses recurrent cells
* WIP added RNN CPU op
* WIP mkldnn emitter code for fprop RNN
* WIP RNN mkldnn integration; added mkldnn kernel for uni-directional LSTM in the CPU emitter
* Add a getter for root node
* Recurrent graph rewrite
* Fix perms, rename match_root -> get_match_root
* Fix comp errors
* Make match_root return the topmost match; fix tests
* WIP GetOutputElement for handling multiple LSTM o/ps; use RecurrentGraphRewrite for replacing node after matching LSTM cells
* WIP LSTM multi output + debug prints
* Moved LSTM fusion to cpu_fusion
* WIP added RNN superfused op
* WIP towards RNN layer fusion
* WIP multiple output slicing RNN
* WIP RNN multiple o/ps fusion across layers
* WIP corrected input params for fused RNN op
* Concat corresponding params across different LSTMs to form inputs to RNN fused op
* i) Added test case for RNN kernel ii) runs without errors
* Refactored and moved LSTM class to standalone file
* Rename RNN -> Rnn, LSTM -> Lstm
* WIP replace lstm slices to the consumer op
* Slicing works on multiple RNN layers
* Fixed all bugs
* Added CPU RNN recurrent fusion; added CPU LSTM fusion; removed debug code; style fix
* Added support to compute src_iter and dst_iter instead of taking zero_memory_desc; added unit test to compute one LSTM cell
* Changed RNN op signature to accept number of states in basic unit of RNN (GRU/LSTM/vanilla RNN) cell
* Added sanity checks for RNN op
* Fixed issue related to patching the graph while replacing the RNN sliced outputs
* Fixed issue to feed the input symbols in the order X0, X1, ...Xt to the RNN op
* Added unit test for multi-layer RNN fusion
* Removed debug statements
* i) Added multilayered serialized graph ii) fixed compilation issue
* Addressed PR comments
* i) WIP MKLDNN layout for RNN op ii) added test case for INTERPRETER v/s CPU Rnn results
* Fixed bug w.r.t. src_layer feature size in rnn mkldnn emitter code; refactored cpu_fusion rnn test case
* Merge origin/master with branch pruthvi/lstm_fusion
* Style fix
* Added test case for multiple RNN layers
* i) Make rnn an mkldnn op if it meets the constraints ii) assert if rnn is not an mkldnn op
* Fix unit test failure
* Added support to reliably identify the hidden state and input symbols from the nodes collected by pattern matcher; fixed failing unit tests
* Style fix
* Removed "node type" dependency to replace the intermediate LSTM outputs
* Addressed PR comments
* Fix unit test
* Added MKLDNN emitter for LSTM op; graph pass to concat LSTM input recurrent state tensors; CPU layout assignment for LSTM op; fixed bug in rnn/lstm unit tests; made changes to use replace_output instead of replace_node for replacing matched graph nodes in LSTM/RNN fusion pass (cherry picked from commit d16fc709265cc0a73e60c6d5f6d2878e7b908aca)
* Style fix
* Renamed passes and style fixes
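The "compute one LSTM cell" unit test referenced above exercises the standard LSTM recurrence: four gates computed from the current input and previous hidden state, followed by the new cell and hidden states. A scalar-weight sketch of that recurrence (illustrative only; gate order, names, and the omission of biases are assumptions, not nGraph's RNN op):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h_prev, c_prev, w):
    """One LSTM cell step with scalar state for clarity.
    w: dict of scalar weights for gates i (input), f (forget),
    g (candidate), o (output); biases omitted for brevity."""
    i = sigmoid(w["i_x"] * x + w["i_h"] * h_prev)   # input gate
    f = sigmoid(w["f_x"] * x + w["f_h"] * h_prev)   # forget gate
    g = math.tanh(w["g_x"] * x + w["g_h"] * h_prev) # candidate cell state
    o = sigmoid(w["o_x"] * x + w["o_h"] * h_prev)   # output gate
    c = f * c_prev + i * g                          # new cell state
    h = o * math.tanh(c)                            # new hidden state
    return h, c
```

The src_iter/dst_iter tensors mentioned in the commits carry exactly the (h, c) pair between time steps, which is why computing them explicitly (rather than passing a zero_memory_desc) matters for multi-step fusion.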

- 30 Mar, 2018 1 commit
Nick Korovaiko authored
* Initial refactoring using PM
* Unit test pass
* Cosmetic changes
* Add another rnn test
* Address Louis' feedback
* Lower-case labels

- 09 Mar, 2018 1 commit
Pruthvi authored
* Added sigmoid fusion pass; added mkldnn emitter code for sigmoid
* Corrected sigmoid expected values; add layout assignment for sigmoid op
* Added asserts in cpu fusion for sigmoid; style fix
* Remove debug prints
* NGMX-371 #comment Addressed PR comments: i) added sigmoid unit test case with 3D input ii) support in cpu_emitter for sigmoid to handle all input shapes
* NGMX-371 #comment Use shape_size() to calculate the 1d input size
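The shape_size() fix above works because sigmoid is elementwise: an input of any rank can be processed as a flat run of shape_size(shape) = product-of-dimensions elements. A small sketch of that idea (illustration only, not the nGraph kernel):

```python
import math
import operator
from functools import reduce

def shape_size(shape):
    """Total element count of a shape: the product of its dimensions."""
    return reduce(operator.mul, shape, 1)

def sigmoid_flat(data, shape):
    """Apply sigmoid elementwise to 'data' viewed as a flat 1-d buffer
    of shape_size(shape) elements, regardless of the logical rank."""
    assert len(data) == shape_size(shape)
    return [1.0 / (1.0 + math.exp(-v)) for v in data]
```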

- 20 Feb, 2018 1 commit
Ashok Emani authored
* Add mxnet sockeye Seq2Seq model
* Update test with sockeye model

- 14 Feb, 2018 1 commit
Pruthvi authored
* Fuse dot(a,b) + c; cblas_gemm working on mlp; rebase & small fixes; enable debug output; support replacing function's outputs
* WIP pattern matching for variance
* Added pattern matcher graph to look up variance (sub-graph) in bn; added test case to verify the variance graph pattern
* Added batch norm mean pattern matcher
* Remove reshapes (cherry picked from commit ecad321fb1b1bc3f7facda229beb940118ca0701)
* Fixed mean test to use Matcher
* Resolve merge conflict in test/pattern.cpp
* WIP bn fprop pattern
* Fprop bn fusion working
* Added unit test case to read the bn serialized *.json file and run bn fprop fusion pass; added batchnorm header file and defined the bn class to emit the mkldnn kernel; added pattern matcher for fprop bn in CPU graph_rewrite pass
* WIP MKLDNN fprop bn emitter code
* Completed fprop batchnorm kernel in CPU emitter
* Fixed bug in the emitter code for fprop bn
* Fixed compilation issues; unit tests are passing for bn emitter fprop code
* Added support to compute fprop bn with mean and variance as input
* Resolved compilation issues
* Refactored bn fprop code
* Added batchnorm src file to the CMakeFilelist; moved bn fusion under CPU runtime/pass/cpu_fusion; fixed compilation issue
* Resolved compilation issues in bn emitted code
* Added debug statements in fprop bn emitted code
* Added batchnorm.cpp src file
* Added test case to test fprop batchnorm with known tensor values; fixed bug related to defining weights in fprop bn
* Added test case for fprop batchnorm op; added test case for mean and variance pattern matcher; added fprop bn *.json file with input having 4 dims mb2c3h2w2; refactored fprop bn op class
* Style fix
* Removed debug symbols
* Fixed header template with correct year; appended mkldnn.hpp in the CPU generated code
* Addressed PR review comments: added support for batchnorm op in serializer and de-serializer; added more sanity in bn constructor; renamed "BatchnormFprop" -> BatchNorm
* Addressed PR review comments: replaced auto with specific mkldnn::type in emitted bn kernel; modified function signature to take 'eps' as double instead of <Node> type
* Added missing header files, resolved compilation issue
* Style fix
* Addressed PR comments: 1. initialized member variables for bn in the same order as they are defined 2. renamed bn member variables to start with m_* as per coding convention 3. moved bn fusion test to test/cpu_fusion.cpp 4. style fix 5. added more checks to evaluate type and shape of inputs to bn
* Added support for EMITDECL macro for batchnorm
* Made correction to batchnorm src file name batchnorm -> batch_norm as per coding guidelines; corrected bn copy_with_new_args() method
* Removed redundant SqrtOp support in serializer
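The fprop batchnorm these commits implement, including the variant that takes mean and variance as inputs rather than computing them, follows the standard per-channel normalization y = gamma * (x - mean) / sqrt(var + eps) + beta. A minimal reference sketch (illustration only, not the mkldnn kernel):

```python
import math

def batchnorm_fprop(x, gamma, beta, mean, var, eps):
    """Normalize one channel's values with given statistics.
    x: list of values in one channel; gamma, beta, mean, var, eps: scalars."""
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (v - mean) * inv_std + beta for v in x]
```

Passing mean and variance in (instead of deriving them from the batch) is what allows the same kernel to serve inference with pre-computed running statistics.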

- 01 Feb, 2018 1 commit
Nick Korovaiko authored
* Simplification pass
* Serializer change to test models
* Some small test fixes
* Addressing Scott's feedback
* Missed one nn
* Formatting fixes
* simplification -> reshape_elimination

- 17 Jan, 2018 1 commit
Robert Kimball authored
* Add mxnet seq2seq forward and backward
* Add benchmarks for seq2seq forward and backward

- 29 Dec, 2017 1 commit
Scott Cyphers authored
* Function can have multiple results; remove external use of ValueType, TupleType, Tuple; remove many external uses of Output and Input
* Corresponding CPU backend changes
* Update master changes
* Remove type arg from Function, add changes.md
* Merge changes
* Move bodies to .cpp, add brief doc
* Merge CPU changes
* Remove xla includes from non-xla files
* Remove xla from tests
* First part of xla tuple support
* Change fprop_cache to assume multi-output bprop functions
* New wrappers for handling tuples with XLA
* Review comments
* Remove old xla files
* Fix merge errors
* Hand-edit models to use multi output instead of tuples

- 28 Dec, 2017 1 commit
Robert Kimball authored
* add larger test models

- 12 Dec, 2017 1 commit
Robert Kimball authored
LSTM benchmark test performance counters