    LSTM fusion + RNN fusion across time slices for single layer (#826) · 1d08f073
    Pruthvi authored
    * - Added pattern matcher for LSTM cell
    
    * WIP added support to replace lstm cell instead of subgraph
    
    * WIP LSTM pattern matcher, fuses recurrent cells
    
    * WIP added RNN CPU op
    
    * WIP mkldnn emitter code for fprop RNN
    
    * WIP RNN mkldnn integration
    - Added mkldnn kernel for unidirectional LSTM in the CPU emitter
    
    * add a getter for root node
    
    * recurrent graph rewrite
    
    * fix perms, rename match_root -> get_match_root
    
    * fix comp errors
    
    * make match_root return the topmost match; fix tests
    
    * - WIP GetOutputElement for handling multiple LSTM o/ps
    - use RecurrentGraphRewrite for replacing node after matching LSTM cells
    
    * WIP LSTM multi Output + debug prints
    
    * moved LSTM fusion to cpu_fusion
    
    * WIP added RNN superfused OP
    
    * WIP towards RNN layer fusion
    
    * WIP multiple output slicing RNN
    
    * WIP RNN multiple o/ps fusion across layer
    
    * WIP corrected input params for fused RNN OP
    
    * concat corresponding params across different LSTMs to form inputs to RNN fused op
    
    * i) Added test case for RNN kernel ii) runs without errors
    
    * refactored and moved LSTM class to standalone file
    
    * Rename RNN -> Rnn , LSTM -> Lstm
    
    * WIP replace lstm slices to the consumer op
    
    * Slicing works on multiple RNN layers
    
    * fixed all bugs
    
    * - Added CPU RNN Recurrent Fusion
    - Added CPU LSTM fusion
    - removed debug code
    - style fix
    
    * - Added support to compute src_iter and dst_iter instead of taking zero_memory_desc
    - Added unit test to compute one LSTM cell
    
    * changed RNN op signature to accept number of states in basic unit of RNN (GRU/LSTM/vanilla RNN) cell
    
    * added sanity checks for RNN op
    
    * Fixed issue related to patching the graph while replacing the RNN sliced outputs
    
    * Fixed issue to feed the input symbols in the order X0, X1, ...Xt to the RNN op
    
    * Added unit test for multi layer RNN fusion
    
    * Removed debug statements
    
    * i) Added multilayered serialized graph ii) fixed compilation issue
    
    * Addressed PR comments
    
    * i) WIP MKLDNN layout for RNN Op ii) added test case for INTERPRETER vs. CPU Rnn results
    
    * - Fixed bug w.r.t. src_layer feature size in rnn mkldnn emitter code
    - Refactored cpu_fusion rnn test case
    
    * merge origin/master with branch pruthvi/lstm_fusion
    
    * style fix
    
    * Added test case for multiple RNN layers
    
    * i) make rnn as mkldnn op if it meets the constraints ii) assert if rnn is not mkldnn op
    
    * fix unit test failure
    
    * - Added support to reliably identify the hidden state and input symbols from the nodes collected by Pattern matcher
    - Fixed failing unit tests
    
    * style fix
    
    * - removed "node type" dependency to replace the intermediate LSTM outputs
    
    * Addressed PR comments
    
    * Fix unit test
    
    * - added MKLDNN emitter for LSTM op
    - graph pass to concat LSTM input recurrent state tensors
    - CPU layout assignment for LSTM Op
    - Fixed bug in rnn/lstm unit tests
    - made changes to use replace_output instead of replace_node for replacing matched graph nodes in LSTM/RNN fusion pass
    
    (cherry picked from commit d16fc709265cc0a73e60c6d5f6d2878e7b908aca)
    
    * style fix
    
    * Renamed passes and style fixes
cpu_fusion.cpp 62.1 KB