05 Oct, 2018 · 1 commit
      RNN fusion (inference) (#1459) · 4df5ea8b
      Chris Sullivan authored
      * Add op::Sigmoid to nvgpu.
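      A minimal sketch of the kind of element-wise kernel this implies; the nvgpu backend generates such kernels through its CUDA codegen, so the kernel and launcher names below are illustrative rather than the backend's actual code.

      ```cpp
      #include <cstddef>

      // Illustrative element-wise sigmoid kernel: out[i] = 1 / (1 + exp(-in[i])).
      __global__ void sigmoid_kernel(const float* in, float* out, size_t n)
      {
          size_t i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n)
          {
              out[i] = 1.0f / (1.0f + expf(-in[i]));
          }
      }

      // Illustrative launcher: one thread per element.
      void launch_sigmoid(const float* in, float* out, size_t n)
      {
          const unsigned block = 256;
          const unsigned grid = static_cast<unsigned>((n + block - 1) / block);
          sigmoid_kernel<<<grid, block>>>(in, out, n);
      }
      ```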
      
      * Bring rnn fusion and concat passes over into GPU from IA. This is a temporary move until generalization and gpu specification can occur.
      
      * Add LSTM fusion and cudnn inference kernel. Next need recurrent fusion and layer fusion.
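      As a rough sketch of what the cuDNN side involves (descriptor configuration with the cuDNN 7.x API): the real emitter also handles the packed weight filter, workspaces, and data layout, so the helper below is illustrative only.

      ```cpp
      #include <cudnn.h>

      // Illustrative: configure a unidirectional LSTM descriptor for inference
      // with the cuDNN 7.x API. Dropout-descriptor setup and error checking omitted.
      cudnnRNNDescriptor_t make_lstm_descriptor(cudnnHandle_t handle,
                                                cudnnDropoutDescriptor_t dropout_desc,
                                                int hidden_size,
                                                int num_layers)
      {
          cudnnRNNDescriptor_t rnn_desc;
          cudnnCreateRNNDescriptor(&rnn_desc);
          cudnnSetRNNDescriptor(handle,
                                rnn_desc,
                                hidden_size,
                                num_layers,
                                dropout_desc,
                                CUDNN_LINEAR_INPUT,
                                CUDNN_UNIDIRECTIONAL,
                                CUDNN_LSTM,
                                CUDNN_RNN_ALGO_STANDARD,
                                CUDNN_DATA_FLOAT);
          // The forward pass itself is issued with cudnnRNNForwardInference, which
          // consumes per-timestep x/y tensor descriptors, the packed weights, and
          // the initial hidden (hx) and cell (cx) states, producing y, hy, and cy.
          return rnn_desc;
      }
      ```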
      
      * Formatting
      
      * Removed unnecessary extra output from LSTM op (rnn with seq. length = 1, so y = hy).
      
      * Add RNN fusion of LSTM cells within a recurrent layer.
      
      * Formatting.
      
      * Add fusion across RNN layers.
      
      * Formatting.
      
      * Add algebraic simplification.
      
      * Added rnn fusion tests.
      
      * Updated conditional on LSTM fusion to better distinguish bound nodes as ht vs xt.
      
      * Formatting.
      
      * Removed print statements.
      
      * Formatting.
      
      * Committing missing file.
      
      * Remove concat inputs pass and mkldnn references.
      
      * fix cmake paths
      
      * conflict resolution with merge from master.
      
      * remove explicit lstm op support. bare LSTM ops are converted to RNN ops for emission.
      
      * Formatting.
      
      * Use NGRAPH_ASSERT. Formatting of Intel copyright.
      
      * Add check on the feature size (shape) of the recurrent (hidden) input and cell state, to ensure they are the same size.
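      A hedged sketch of such a check, using the NGRAPH_ASSERT style adopted above; the helper name, header paths, and shape indexing are assumptions for illustration rather than the pass's actual code.

      ```cpp
      #include <memory>

      #include "ngraph/assertion.hpp"
      #include "ngraph/node.hpp"

      using namespace ngraph;

      // Hypothetical helper: assert that the recurrent (hidden) input and the
      // cell state carry the same feature size before fusing the LSTM cell.
      void check_recurrent_feature_sizes(const std::shared_ptr<Node>& hidden_state,
                                         const std::shared_ptr<Node>& cell_state)
      {
          const Shape& h_shape = hidden_state->get_shape();
          const Shape& c_shape = cell_state->get_shape();
          NGRAPH_ASSERT(h_shape.back() == c_shape.back())
              << "LSTM fusion: hidden state feature size " << h_shape.back()
              << " does not match cell state feature size " << c_shape.back();
      }
      ```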
      
      * fix wrong rnn header
      
      * Formatting.
      
      * Add back lstm op to dispatch table.
      
      * Added RNN test which shows cudnn rnn kernel is not producing correct results.
      
      * With the update to AlgebraicSimplification to simplify concat-reshape-slice, the check modified in this commit needed to be relaxed.
      
      * Bug fix in parameter tensor packing.
      
      * Alias third output element of RNN for cell state (bug fix).
      
      * Resolve numerical correctness issue with negative values in RNN (bug fix).
      Add minimal test to evaluate LSTM and compare with values calculated by hand.
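      For reference, the sort of by-hand calculation such a test checks against: one LSTM cell step evaluated directly from the standard gate equations (a standalone sketch, not the test's code).

      ```cpp
      #include <cmath>

      // One LSTM cell step for a single feature: i, f, o use sigmoid, the
      // candidate g uses tanh, c_t = f*c_{t-1} + i*g, h_t = o*tanh(c_t).
      static float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

      void lstm_cell_step(float x, float h_prev, float c_prev,
                          const float w[4], const float r[4], const float b[4],
                          float& h_out, float& c_out)
      {
          float i = sigmoid(w[0] * x + r[0] * h_prev + b[0]);   // input gate
          float f = sigmoid(w[1] * x + r[1] * h_prev + b[1]);   // forget gate
          float g = std::tanh(w[2] * x + r[2] * h_prev + b[2]); // candidate cell
          float o = sigmoid(w[3] * x + r[3] * h_prev + b[3]);   // output gate
          c_out = f * c_prev + i * g;
          h_out = o * std::tanh(c_out);
      }
      ```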
      
      * Add tensor parameter sizes to the kernel hash, as they are kernel-specific.
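      The idea, as an illustrative sketch (the names are not the backend's): the compiled-kernel cache key must fold in every tensor parameter size, since a kernel specialized for one set of shapes cannot be reused for another.

      ```cpp
      #include <cstddef>
      #include <sstream>
      #include <string>
      #include <vector>

      // Illustrative cache-key construction: append every tensor parameter size
      // to the kernel name so differently shaped invocations do not collide.
      std::string kernel_cache_key(const std::string& kernel_name,
                                   const std::vector<size_t>& param_sizes)
      {
          std::stringstream ss;
          ss << kernel_name;
          for (size_t s : param_sizes)
          {
              ss << "_" << s;
          }
          return ss.str();
      }
      ```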
      
      * Add 2 layer lstm fusion test against by-hand solution.
      
      * Export param concatenation to the graph for the cudnn kernel at both the single-rnn-layer and multi-layer levels.
      
      * Formatting.
      
      * Finishing touches after merge: add support for macro-expanded dispatch via op_tbl.
      
      * Simplify macro support for gpu ops.
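      The macro-expanded dispatch follows the familiar X-macro pattern: the op table header carries only the op list and each consumer defines the expansion macro before including it. A self-contained sketch under assumed names (the real op_tbl and NGRAPH_GPU_OP macros differ in detail, and the list normally lives in its own header):

      ```cpp
      #include <iostream>
      #include <map>
      #include <string>

      // Illustrative X-macro op list, inlined here for a self-contained example.
      #define GPU_OP_LIST(X) \
          X(Rnn)             \
          X(Sigmoid)         \
          X(Add)

      // First expansion: declare one emitter function per op.
      #define DECLARE_EMITTER(OP) void emit_##OP() { std::cout << "emit " #OP "\n"; }
      GPU_OP_LIST(DECLARE_EMITTER)
      #undef DECLARE_EMITTER

      // Second expansion: build the name -> emitter dispatch table from the same list.
      #define TABLE_ENTRY(OP) {#OP, &emit_##OP},
      static const std::map<std::string, void (*)()> dispatch_table = {
          GPU_OP_LIST(TABLE_ENTRY)
      };
      #undef TABLE_ENTRY

      int main()
      {
          dispatch_table.at("Rnn")(); // prints "emit Rnn"
      }
      ```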
      
      * Add CUDNN_VERSION >= 7200 defguards for RNN fusion.
      Need to decide how to notify the user of the increased performance with >= 7200.
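      The guard itself is a plain preprocessor check against cuDNN's version macro; a sketch (the guarded function is illustrative):

      ```cpp
      #include <cudnn.h>

      // cuDNN defines CUDNN_VERSION as MAJOR*1000 + MINOR*100 + PATCHLEVEL, so
      // 7.2.x reports >= 7200. The RNN fusion path (and the
      // cudnnRNNDataDescriptor_t support it relies on) is only compiled when
      // that API is available.
      #if CUDNN_VERSION >= 7200
      void build_rnn_inference_kernel(/* illustrative placeholder */)
      {
          // cuDNN >= 7.2 path: cudnnRNNDataDescriptor_t, cudnnSetRNNDataDescriptor, etc.
      }
      #endif
      ```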
      
      * Revert lstm_analytic test to explicitly copy data to tensor params.
      
      * Removed namespace arg from NGRAPH_GPU_OP.
      
      * Refactored macros to different header so op_tbl only contains op list.
      
      * Defguard on cudnn_descriptor<cudnnRNNDataDescriptor_t>.
      
      * doubles -> floats
      
      * Reorganize pass asserts; prepare to replace them with non-throwing pass failures.
      
      * Remove Lstm op and replace it with Rnn.
      
      * Format
      
      * Utilize RETURN_IF_FALSE in the rnn pass to avoid any runtime asserts.
      Note that falling back to the raw (no passes) graph for the 2rnn_3lstm json from mxnet models
      results in a double free inside the memory layout pass. This appears to be a bug
      in Reshape pass-through.
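      The pattern replaces throwing asserts inside the fusion callback with early "no match" returns, so an unexpected graph shape skips the fusion instead of aborting compilation. A sketch under assumed names (not nGraph's exact macro):

      ```cpp
      #include <cstddef>
      #include <iostream>

      // Illustrative macro: log the reason and bail out of the fusion callback
      // with "no match" instead of throwing a runtime assertion.
      #define RETURN_IF_FALSE(cond, msg)                                       \
          do                                                                   \
          {                                                                    \
              if (!(cond))                                                     \
              {                                                                \
                  std::cerr << "rnn fusion skipped: " << (msg) << "\n";        \
                  return false;                                                \
              }                                                                \
          } while (0)

      // Hypothetical fusion-callback fragment: validate the matched subgraph
      // and return false (no fusion) on any violated expectation.
      bool fuse_lstm_layer(size_t hidden_feature_size, size_t cell_feature_size)
      {
          RETURN_IF_FALSE(hidden_feature_size == cell_feature_size,
                          "hidden and cell state feature sizes differ");
          // ... perform the rewrite ...
          return true;
      }
      ```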
      
      * Removed print statements. Add check on input data and recurrent data.
      
      * Don't reuse memory for non-destructive ops.
      
      * Add back Rnn test.
      
      * Formatting.
      
      * Clean up comments.
      
      * Update test per review comments.