1. 18 Jan, 2019 1 commit
    • Adds backprop to BatchDot op, allows fusion in training. (#2297) · ef778693
      Louis Feng authored
      * batch dot bprop WIP.
      
      * WIP.
      
      * testing.
      
      * clean up debug code.
      
      * comments and var name change.
      
      * clean up.
      
      * format style, batch dot differentiable pass.
      
      * removed debug output.
      
      * added unit test to autodiff, refactored make_function -> make_function_from_file.
      
      * fixed build warning.
      
      * fixed gpu build error.
      
      * clang format fix.
      
* allow test_tools.cpp to find SERIALIZED_ZOO
      
      * remove cmake redef.
      
      * fix unused macro.
      
      * making test cpu only.
      
      * testing build var
      
      * macro test
      
      * verbose makefile test
      
      * style fix
      
      * verbose make
      
      * test/util needs test/models.
      
      * removed debug output.
      
      * refactor fusion type.
      
      * refactor fusion type.
      ef778693
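
The commit above adds a backward pass for BatchDot (a batched matrix product). A minimal plain-C++ sketch of the underlying math, not nGraph's implementation: given C[b] = A[b] · B[b] for each batch b, backprop computes dA[b] = dC[b] · B[b]^T and dB[b] = A[b]^T · dC[b]. The row-major layouts and the function name below are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// Reference gradients for a batched matrix product (hypothetical helper).
// A: [batch, m, k], B: [batch, k, n], dC: [batch, m, n], all row-major.
void batch_dot_backprop(const std::vector<float>& A,
                        const std::vector<float>& B,
                        const std::vector<float>& dC,
                        std::vector<float>& dA, // out: [batch, m, k]
                        std::vector<float>& dB, // out: [batch, k, n]
                        size_t batch, size_t m, size_t k, size_t n)
{
    dA.assign(batch * m * k, 0.0f);
    dB.assign(batch * k * n, 0.0f);
    for (size_t b = 0; b < batch; ++b)
    {
        const float* a = &A[b * m * k];
        const float* w = &B[b * k * n];
        const float* dc = &dC[b * m * n];
        float* da = &dA[b * m * k];
        float* db = &dB[b * k * n];
        for (size_t i = 0; i < m; ++i)
            for (size_t j = 0; j < n; ++j)
                for (size_t p = 0; p < k; ++p)
                {
                    da[i * k + p] += dc[i * n + j] * w[p * n + j]; // dA = dC * B^T
                    db[p * n + j] += a[i * k + p] * dc[i * n + j]; // dB = A^T * dC
                }
    }
}
```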
  2. 07 Dec, 2018 1 commit
    • Backend API change pre-work (#2064) · e0933553
      Robert Kimball authored
      * change compile call to return Handle
      
      * make CPU require compile() before call()
      
      * fix unit tests to call compile() before call()
      
      * fix failing ops
      
      * update unit test
      
      * revert some changes
      
      * more fixups
      
      * more diff cleanup
      
      * a few more issues addressed
      
      * more fixes
      
      * update API
      
      * more updates
      
      * fix test_ops.py
      
      * fix
      
      * another attempt to fix
      
      * fix unit test
      
      * fix test error
      e0933553
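
The messages above describe a two-step calling convention: compile() returns a handle and call() consumes it, with the CPU backend rejecting call() before compile(). A self-contained toy model of that pattern follows; the types are stand-ins, not nGraph's real declarations.

```cpp
#include <memory>
#include <stdexcept>

// Stand-in types modeling the API change; not nGraph's actual classes.
struct Function {};
struct Handle {};

struct Backend
{
    // compile() now returns a Handle rather than compiling lazily inside call().
    std::shared_ptr<Handle> compile(const std::shared_ptr<Function>&)
    {
        return std::make_shared<Handle>();
    }
    // call() requires the Handle; calling without a prior compile() is an error.
    void call(const std::shared_ptr<Handle>& handle)
    {
        if (!handle) throw std::runtime_error("call() before compile()");
        // ... dispatch using the precompiled artifact ...
    }
};

int main()
{
    auto f = std::make_shared<Function>();
    Backend backend;
    auto handle = backend.compile(f); // compile first
    backend.call(handle);             // then call with the returned handle
}
```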
  3. 06 Dec, 2018 1 commit
    • Pruthvi/fix rnn precision (#1874) · 73da681a
      Pruthvi authored
      * - Added reorder support for rnn weights_layer/iter
      
      * i) fixed compilation issues ii) working but still observing precision error
      
      * i) fixed failing rnn unit test for DEX ii) refactored workspace in RNN mkldnn emitter
      
      * i) added support for src reorder to TNC from NTC
      
* reorder support for rnn output from NTC to TNC
      
      * - added support for rnn weight reorder ldgoi -> ldigo
      - code refactor for lstm/rnn kernel in mkldnn emitter
      
* - refactor rnn mkldnn kernel, change variable names
      
      * fix RNN codegen kernel
      
* disable layer rnn fusion pass, to test CI
      
      * method to validate recurrent rnn inputs
      
* add correlated matches for Recurrent RNN PM
      
      * - simplify reorder logic for rnn_weights
      - fix graph pattern for fusing rnn cell across time steps
      
      * do weights reorders in rnn timesteps fusion
      
      * refactored LSTM graph pass
      
* - Bug fix for finding the lstm inputs deterministically
      - Refactored LSTM graph pass to single pass
      - made changes to LSTM RNN time step fusion graph pass
      
      * - use replace_node instead of replace_output in Lstm_step_wise fusion graph pass
      
      * fix compilation error
      
      * Fix GNMT rnn fusion
      
      * check if the node is in use before replacing in RNN graph passes
      
* i) fix style ii) fix topo sort issue in RNN graph pass
      
      * style fix
      
      * fix bug in simplify_concat pass
      
      * replaces Lstm1 -> {GOE1, GOE2} -> {Slice1, Slice2} -> Concat -> Lstm2 with Lstm1 -> Lstm2
      
      * cse for convert layout
      
      * addressed PR comments
      
* - optimization pass to remove Lstm1 -> {GOE1, GOE2} -> {Slice1, Slice2} -> Lstm2
      - conditional fusing of LSTM cells only for the decoder
      
      * made changes to multi layer RNN fusion callback
      
      * fix asserts in RNN op
      
      * - added support to fuse layers when slc=dlc for RNN cells
      - bug fix on the sanity checks for RNN Op
      
      * - support RNN layer fusion till slc = dlc
      - bug fixes in multi layer rnn fusion call back
      
      * capture reshape in the RNN weights
      
      * Addressed PR comments
      
      * - added comments in multi layer PM call back
- fuse only if slc == dlc across layers
      
      * restore deleted 3_lstm_cell_forward.json file
      
      * fix typo
      
* fix failing unit tests
      
      * When processing in place slice, do not change the offset of the slice node if the argument pointer comes from function input.
      
      * Address PR feedback: process in place slice after propagating in place input.
      
      * Set INTERMEDIATE role before propagating in place input.
      
      * Do not add temporaries to the variable name map before propagating in place input in codegen.
      
      * Fix a bug in codegen.
      
      * Fix a bug in codegen slice.
      
      * reenable disabled rnn unit test
      
      * fix compiler error
      
      * - bug fix in the slicing logic for the layer fused rnn cell
      - fix failing rnn unit test
      
      * - Addressed PR comments
      - removed redundant checks from the rnn graph pass
      - simplified rnn call back replace node logic
      
      * - added new multilayer rnn *.json file
      - fix test case
      
      * [PRIVATE BRANCH] Style fixes (#2080)
      
      * Style fixes
      
      * change order of lstm gates
      
      * [PRIVATE BRANCH] Jbobba/rnn fusion review (#2113)
      
      * Style fixes for single-layer RNN fusion
      
      * Style fixes to multi-layer RNN
      
      * style fix
      
      * disable GPU test
      73da681a
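
Several of the messages above concern layout reorders around MKL-DNN's RNN primitives, e.g. moving activations from batch-major NTC to the time-major TNC layout the primitives expect. A hand-rolled sketch of that NTC-to-TNC reorder for reference; the real code routes this through MKL-DNN reorder primitives rather than a loop like this.

```cpp
#include <cstddef>
#include <vector>

// Reorder activations from [N, T, C] (batch-major) to [T, N, C] (time-major).
// Illustrative only; nGraph uses MKL-DNN reorder primitives for this.
std::vector<float> ntc_to_tnc(const std::vector<float>& src,
                              size_t N, size_t T, size_t C)
{
    std::vector<float> dst(N * T * C);
    for (size_t n = 0; n < N; ++n)
        for (size_t t = 0; t < T; ++t)
            for (size_t c = 0; c < C; ++c)
                dst[(t * N + n) * C + c] = src[(n * T + t) * C + c];
    return dst;
}
```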
  4. 11 Nov, 2018 1 commit
    • add isfinite check for all_close (#2028) · 702d465a
      Fenglei authored
      * add isfinite check
      
      * style
      
      * output 5 diff and total diff
      
      * output limit of diff for all_close_f
      
* fix bug
      
      * disable tests
      
      * remove failing unit test that does not make sense.
      702d465a
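
A minimal sketch of what an element-wise closeness check with the isfinite guard described above might look like; the tolerances are illustrative, and the commit's diff reporting (printing the first few diffs and the total) is omitted.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Two vectors are "all close" only if every element is finite and within
// atol + rtol * |expected| of its counterpart. Sketch, not nGraph's all_close_f.
bool all_close(const std::vector<float>& actual,
               const std::vector<float>& expected,
               float rtol = 1e-5f, float atol = 1e-8f)
{
    if (actual.size() != expected.size()) return false;
    for (size_t i = 0; i < actual.size(); ++i)
    {
        if (!std::isfinite(actual[i]) || !std::isfinite(expected[i])) return false;
        if (std::fabs(actual[i] - expected[i]) >
            atol + rtol * std::fabs(expected[i])) return false;
    }
    return true;
}
```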
  5. 05 Oct, 2018 1 commit
    • RNN fusion (inference) (#1459) · 4df5ea8b
      Chris Sullivan authored
      * Add op::Sigmoid to nvgpu.
      
      * Bring rnn fusion and concat passes over into GPU from IA. This is a temporary move until generalization and gpu specification can occur.
      
      * Add LSTM fusion and cudnn inference kernel. Next need recurrent fusion and layer fusion.
      
      * Formatting
      
* Removed unnecessary extra output from LSTM op (rnn with seq. length = 1, so y = hy).
      
      * Add RNN fusion of LSTM cells within a recurrent layer.
      
      * Formatting.
      
      * Add fusion across RNN layers.
      
      * Formatting.
      
      * Add algebraic simplification.
      
      * Added rnn fusion tests.
      
      * Updated conditional on LSTM fusion to better distinguish bound nodes as ht vs xt.
      
      * Formatting.
      
      * Removed print statements.
      
      * Formatting.
      
      * Committing missing file.
      
      * Remove concat inputs pass and mkldnn references.
      
      * fix cmake paths
      
      * conflict resolution with merge from master.
      
      * remove explicit lstm op support. bare LSTM ops are converted to RNN ops for emission.
      
      * Formatting.
      
      * Use NGRAPH_ASSERT. Formatting of intel copyright.
      
      * Add check on the feature size (shape) of the recurrent (hidden) input and cell state, to ensure they are the same size.
      
      * fix wrong rnn header
      
      * Formatting.
      
      * Add back lstm op to dispatch table.
      
      * Added RNN test which shows cudnn rnn kernel is not producing correct results.
      
* With the update to AlgSimpl. to simplify concat-reshape-slice, the check modified in this commit needed to be relaxed.
      
      * Bug fix in parameter tensor packing.
      
      * Alias third output element of RNN for cell state (bug fix).
      
      * Resolve numerical correctness issue with negative values in RNN (bug fix).
      Add minimal test to evaluate LSTM and compare with values calculated by hand.
      
      * Add tensor parameter sizes to kernel hash as
      they are kernel-specific.
      
      * Add 2 layer lstm fusion test against by-hand solution.
      
* Export param concatenation to graph for the cudnn kernel at both the single rnn layer and multi-layer levels.
      
      * Formatting.
      
* Finishing touches after merge: add support for macro-expanded dispatch via op_tbl.
      
      * Simplify macro support for gpu ops.
      
      * Add CUDNN_VERSION >= 7200 defguards for RNN fusion.
      Need to decide how to notify user of increased performance with >= 7200.
      
      * Revert lstm_analytic test to explicitly copy data to tensor params.
      
      * Removed namespace arg from NGRAPH_GPU_OP.
      
      * Refactored macros to different header so op_tbl only contains op list.
      
      * Defguard on cudnn_descriptor<cudnnRNNDataDescriptor_t>.
      
      * doubles -> floats
      
      * Reorg. pass asserts, prepare to replace with non-throwing pass failures.
      
      * Remove Lstm op and replace it with Rnn.
      
      * Format
      
      * Utilize RETURN_IF_FALSE in rnn pass to avoid any RT asserts.
Note that falling back to the raw (no passes) graph for the 2rnn_3lstm json from mxnet models
results in a double free inside the memory layout pass. Appears to be a bug
in Reshape pass-through.
      
      * Removed print statements. Add check on input data and recurrent data.
      
      * Don't reuse memory for non-destructive ops.
      
      * Add back Rnn test.
      
      * Formatting.
      
      * Clean up comments.
      
      * Update test per review comments.
      4df5ea8b
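
The commit above mentions a minimal test that evaluates an LSTM against values calculated by hand. For reference, one step of a standard LSTM cell in plain C++; the i, f, g, o gate order and row-major weight layouts are assumptions here, and real cuDNN kernels pack weights differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One LSTM cell step: h and c are updated in place. Gate order i, f, g, o.
// W: [4H, I] input weights, R: [4H, H] recurrent weights, b: [4H] bias.
void lstm_cell_step(const std::vector<float>& x, std::vector<float>& h,
                    std::vector<float>& c, const std::vector<float>& W,
                    const std::vector<float>& R, const std::vector<float>& b,
                    size_t I, size_t H)
{
    auto sigmoid = [](float v) { return 1.0f / (1.0f + std::exp(-v)); };
    std::vector<float> gates(4 * H);
    for (size_t g = 0; g < 4 * H; ++g)
    {
        float acc = b[g];
        for (size_t i = 0; i < I; ++i) acc += W[g * I + i] * x[i];
        for (size_t j = 0; j < H; ++j) acc += R[g * H + j] * h[j];
        gates[g] = acc;
    }
    for (size_t j = 0; j < H; ++j)
    {
        float i_t = sigmoid(gates[0 * H + j]);
        float f_t = sigmoid(gates[1 * H + j]);
        float g_t = std::tanh(gates[2 * H + j]);
        float o_t = sigmoid(gates[3 * H + j]);
        c[j] = f_t * c[j] + i_t * g_t; // new cell state
        h[j] = o_t * std::tanh(c[j]); // new hidden state
    }
}
```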