RNN fusion (inference) (#1459)
* Add op::Sigmoid to nvgpu.
* Bring the RNN fusion and concat passes over into GPU from IA. This is a temporary move until generalization and GPU specialization can occur.
* Add LSTM fusion and a cuDNN inference kernel. Recurrent fusion and layer fusion are still needed.
* Formatting.
* Removed an unnecessary extra output from the LSTM op (an RNN with sequence length 1, so the full output sequence y equals the final hidden state hy).
* Add RNN fusion of LSTM cells within a recurrent layer.
* Formatting.
* Add fusion across RNN layers.
* Formatting.
* Add algebraic simplification (the concat-reshape-slice identity is sketched after this list).
* Added RNN fusion tests.
* Updated the conditional on LSTM fusion to better distinguish bound nodes as ht vs. xt.
* Formatting.
* Removed print statements.
* Formatting.
* Committing a missing file.
* Remove the concat-inputs pass and MKL-DNN references.
* Fix CMake paths.
* Resolve conflicts from the merge with master.
* Remove explicit LSTM op support; bare LSTM ops are converted to RNN ops for emission.
* Formatting.
* Use NGRAPH_ASSERT. Format the Intel copyright headers.
* Add a check that the feature sizes (shapes) of the recurrent (hidden) input and the cell state are the same.
* Fix a wrong RNN header.
* Formatting.
* Add the LSTM op back to the dispatch table.
* Added an RNN test which shows that the cuDNN RNN kernel was not producing correct results.
* With the algebraic-simplification update to simplify concat-reshape-slice, the check modified in this commit needed to be relaxed.
* Bug fix in parameter tensor packing.
* Alias the third output element of RNN to the cell state (bug fix).
* Resolve a numerical correctness issue with negative values in the RNN (bug fix). Add a minimal test that evaluates an LSTM and compares it against values calculated by hand (the cell equations are sketched after this list).
* Add tensor parameter sizes to the kernel hash, as they are kernel-specific (a sketch follows this list).
* Add a two-layer LSTM fusion test against a by-hand solution.
* Export parameter concatenation to the graph for the cuDNN kernel, at both the single-layer and multi-layer RNN levels.
* Formatting.
* Finishing touches after the merge: add support for macro-expanded dispatch via op_tbl.
* Simplify macro support for GPU ops.
* Add CUDNN_VERSION >= 7200 #if guards for RNN fusion (see the sketch after this list). Still need to decide how to notify the user of the increased performance available with >= 7200.
* Revert the lstm_analytic test to explicitly copy data to tensor params.
* Removed the namespace arg from NGRAPH_GPU_OP.
* Refactored macros into a separate header so that op_tbl contains only the op list.
* Add an #if guard on cudnn_descriptor<cudnnRNNDataDescriptor_t>.
* Change doubles to floats.
* Reorganize pass asserts; prepare to replace them with non-throwing pass failures.
* Remove the Lstm op and replace it with Rnn.
* Format.
* Use RETURN_IF_FALSE in the RNN pass to avoid any runtime asserts (a sketch of this helper follows the list). Note that falling back to the raw (no-passes) graph for the 2rnn_3lstm JSON from MXNet models results in a double free inside the memory layout pass; this appears to be a bug in Reshape pass-through.
* Removed print statements. Add a check on the input data and recurrent data.
* Don't reuse memory for non-destructive ops.
* Add the Rnn test back.
* Formatting.
* Clean up comments.
* Update the test per review comments.
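The algebraic simplification mentioned in the list collapses concat-reshape-slice chains produced when the fusion pass packs parameters. As a minimal sketch of the underlying identity (ignoring the intervening reshape, which only changes layout), slicing a concatenation at the exact bounds of one of its inputs recovers that input:

```latex
\mathrm{Slice}_{[l_k,\,u_k)}\bigl(\mathrm{Concat}(x_1, \ldots, x_n)\bigr) = x_k,
\quad \text{where } [l_k, u_k) \text{ spans exactly } x_k \text{ along the concatenation axis.}
```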
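For the by-hand LSTM tests, the reference values follow the standard LSTM cell. The formulation below is the conventional one (the exact gate ordering and weight layout used by the cuDNN kernel in this PR are not shown here); note that tanh is odd, so negative inputs directly exercise the sign handling that the negative-values bug fix addresses:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + R_i h_{t-1} + b_i) &\quad f_t &= \sigma(W_f x_t + R_f h_{t-1} + b_f) \\
g_t &= \tanh(W_g x_t + R_g h_{t-1} + b_g) &\quad o_t &= \sigma(W_o x_t + R_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t &\quad h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```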
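Regarding the kernel-hash change: because the compiled kernels are specialized to their argument shapes, the cache key must include those shapes or two RNNs of different sizes could resolve to the same kernel. A minimal sketch of the idea, with illustrative names rather than the PR's actual API:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Fold the parameter tensor shapes into the kernel cache key so that kernels
// compiled for one set of tensor sizes are never reused for another.
std::string rnn_kernel_hash(const std::string& base_name,
                            const std::vector<std::vector<size_t>>& arg_shapes)
{
    std::stringstream ss;
    ss << base_name;
    for (const auto& shape : arg_shapes)
    {
        ss << "_s";
        for (size_t dim : shape)
        {
            ss << dim << "_";
        }
    }
    return ss.str(); // e.g. "rnn_fprop_s10_512__s2_512_"
}
```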
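The CUDNN_VERSION >= 7200 guards gate the fusion path on cuDNN 7.2, which introduced cudnnRNNDataDescriptor_t. A minimal sketch of the guard follows; the NGRAPH_GPU_RNN_FUSION_ENABLED macro is illustrative, not taken from this PR:

```cpp
#include <cudnn.h>

// cudnnRNNDataDescriptor_t only exists in cuDNN >= 7.2. CUDNN_VERSION is
// defined as CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL.
#if CUDNN_VERSION >= 7200
#define NGRAPH_GPU_RNN_FUSION_ENABLED 1
#else
// Fall back to the unfused per-cell graph; a diagnostic could be emitted here
// telling the user that upgrading to cuDNN >= 7.2 enables the fused RNN kernel.
#define NGRAPH_GPU_RNN_FUSION_ENABLED 0
#endif
```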
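Finally, a minimal sketch of the RETURN_IF_FALSE pattern in the fusion pass, assuming nGraph's matcher callbacks return bool (the exact macro in the PR may differ): instead of throwing on a malformed match, the callback logs why the fusion was declined and returns false, so the graph falls back to the unfused path.

```cpp
#include "ngraph/log.hpp"

// Bail out of a matcher callback without throwing: log the reason the fusion
// was declined and return false so the original (unfused) graph is kept.
#define RETURN_IF_FALSE(cond, msg)                                            \
    do                                                                        \
    {                                                                         \
        if (!(cond))                                                          \
        {                                                                     \
            NGRAPH_DEBUG << "rnn fusion skipped: " << (msg);                  \
            return false;                                                     \
        }                                                                     \
    } while (0)

// Usage inside the LSTM fusion callback (names illustrative):
// RETURN_IF_FALSE(hidden_feature_size == cell_state_feature_size,
//                 "hidden input and cell state feature sizes differ");
```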
New file: test/gpu_fusion.cpp (0 → 100644)