    RNN fusion (inference) (#1459) · 4df5ea8b
    Chris Sullivan authored
    * Add op::Sigmoid to nvgpu.
    
    * Bring rnn fusion and concat passes over into GPU from IA. This is a temporary move until generalization and gpu specification can occur.
    
    * Add LSTM fusion and cudnn inference kernel. Next need recurrent fusion and layer fusion.
    
    * Formatting
    
    * Removed unnecessary extra output from LSTM op (rnn with seq. length = 1, so y = hy).
    
    * Add RNN fusion of LSTM cells within a recurrent layer.
    
    * Formatting.
    
    * Add fusion across RNN layers.
    
    * Formatting.
    
    * Add algebraic simplification.
    
    * Added rnn fusion tests.
    
    * Updated conditional on LSTM fusion to better distinguish bound nodes as ht vs xt.
    
    * Formatting.
    
    * Removed print statements.
    
    * Formatting.
    
    * Committing missing file.
    
    * Remove concat inputs pass and mkldnn references.
    
    * fix cmake paths
    
    * conflict resolution with merge from master.
    
    * remove explicit lstm op support. bare LSTM ops are converted to RNN ops for emission.
    
    * Formatting.
    
    * Use NGRAPH_ASSERT. Formatting of intel copyright.
    
    * Add check on the feature size (shape) of the recurrent (hidden) input and cell state, to ensure they are the same size.
    
    * fix wrong rnn header
    
    * Formatting.
    
    * Add back lstm op to dispatch table.
    
    * Added RNN test which shows cudnn rnn kernel is not producing correct results.
    
    * With update to AlgSimpl. to simplify concat-reshape-slice, the check modified in this commit needed to be relaxed.
    
    * Bug fix in parameter tensor packing.
    
    * Alias third output element of RNN for cell state (bug fix).
    
    * Resolve numerical correctness issue with negative values in RNN (bug fix).
    Add minimal test to evaluate LSTM and compare with values calculated by hand.
    
    * Add tensor parameter sizes to kernel hash as they are kernel-specific.
    
    * Add 2 layer lstm fusion test against by-hand solution.
    
    * Export param concatenation to graph for cudnn kernel at both the single rnn layer and multi-layer.
    
    * Formatting.
    
    * Finishing touches after merge: add support for macro-expanded dispatch via op_tbl.
    
    * Simplify macro support for gpu ops.
    
    * Add CUDNN_VERSION >= 7200 defguards for RNN fusion.
    Need to decide how to notify user of increased performance with >= 7200.
    
    * Revert lstm_analytic test to explicitly copy data to tensor params.
    
    * Removed namespace arg from NGRAPH_GPU_OP.
    
    * Refactored macros to different header so op_tbl only contains op list.
    
    * Defguard on cudnn_descriptor<cudnnRNNDataDescriptor_t>.
    
    * doubles -> floats
    
    * Reorg. pass asserts, prepare to replace with non-throwing pass failures.
    
    * Remove Lstm op and replace it with Rnn.
    
    * Format
    
    * Utilize RETURN_IF_FALSE in rnn pass to avoid any RT asserts.
    Note that falling back to raw (no passes) graph for 2rnn_3lstm json from mxnet models
    results in a double free inside of the memory layout pass. Appears to be a bug
    in Reshape pass through.
    
    * Removed print statements. Add check on input data and recurrent data.
    
    * Don't reuse memory for non-destructive ops.
    
    * Add back Rnn test.
    
    * Formatting.
    
    * Clean up comments.
    
    * Update test per review comments.
gpu_fusion.cpp 23.7 KB