1. 06 Dec, 2018 7 commits
    • Graph comparison - isolated per op testing (#2144) · 1feb49f1
      gcwenger authored
      * Isolated per op testing when comparing graphs for better determination of source of accuracy divergence.
      
      * Improve clarity of comment
    • [Py] Update README for PyPI (#2151) · 8a9cf8aa
      Michał Karzyński authored
      * Update README for PyPI
      
      * Update README for PyPI
      
      * Remove redundant newlines
      
      * Fix links
    • [Py] setup.py code style formatting. (#2164) · 8249bf9f
      Adam Rogowiec authored
      * Uniform quote style.
      
      * Fix comment style.
      
      * Check setup.py with flake8.
      
      - Fix flake8 errors.
      
      * Move function out of class scope.
      
      * Fix function parameter list
      
      * Fix formatting.
    • nvgpu cuda reduce with stable sum (#2076) · 606f3f93
      Fenglei authored
      * add some helper function
      
      * update with new helper function
      
      * update reduce to nd with new helper function
      
      * update float sum to stable sum
      
      * fix bug
      
      * update all reduce to stable sum for float
      
      * fix bug and pass the sum stable test
      
      * remove debug info
      
      * style
      
      * update with shape
      
      * fix bug
      
      * add host parameters to cuda_emitter
      
      * clang format
      
      * fix bugs
      
      * add element::type support
      
      * format
      
      * add a cached value with datatype name
      
      * add init_reduce_value
      
      * unroll loop
      
      * optimization
      
      * remove the need for init_value
      
      * add memset kernel
      
      * add memcpy
      
      * working version
      
      * remove debug info
      
      * add comments, clean up code.
      
      * change in_idx to input_idx
      
      * fix bug
      
      * change args name for memset in emitter
      
      * pass element::Type instead of string
      
      * op::reduce comes with an init value; add support for it
      
      * resolve codacy-bot comment
      
      * fix bug
      
      * resolve codacy-bot comment
      
      * remove unused comments, resolve comments
      
      * cuda reduce for max, min, mul, reduce op init value, format
      
      * use type::info
      
      * use type info for numeric_limits
      
      * remove code from gpu_host_parameters
      
      * header
      
      * remove outdated comments
      
      * add helper to check if stable sum is needed
      
      * add stable sum test for double
      
      * remove extra line
      
      * consolidate helper functions
      
      * no need list now.
      
      * remove extra ;
      
      * clang format
      
      * style
      
      * add skip test for cpu and intelGPU side
      
      * add line between groups of headers
      
      * add two simple stable sum test for float and double
      
      * skip test for intelGPU
    • Fix compiler error with GCC 7.1 (#2155) · 4b0445d1
      Fabian Boemer authored
    • Pruthvi/fix rnn precision (#1874) · 73da681a
      Pruthvi authored
      * - Added reorder support for rnn weights_layer/iter
      
      * i) fixed compilation issues ii) working but still observing precision error
      
      * i) fixed failing rnn unit test for DEX ii) refactored workspace in RNN mkldnn emitter
      
      * i) added support for src reorder to TNC from NTC
      
      * reorder support for rnn output from NTC to TNC
      
      * - added support for rnn weight reorder ldgoi -> ldigo
      - code refactor for lstm/rnn kernel in mkldnn emitter
      
      * - refactor rnn mkldnn kernel, change variable names
      
      * fix RNN codegen kernel
      
      * disable layer rnn fusion pass, to test CI
      
      * method to validate recurrent rnn inputs
      
      * add correlated matches for Recurrent RNN PM
      
      * - simplify reorder logic for rnn_weights
      - fix graph pattern for fusing rnn cell across time steps
      
      * do weights reorders in rnn timesteps fusion
      
      * refactored LSTM graph pass
      
      * - Bug fix for finding the lstm inputs deterministically
      - Refactored LSTM graph pass to single pass
      - made changes to LSTM RNN time step fusion graph pass
      
      * - use replace_node instead of replace_output in Lstm_step_wise fusion graph pass
      
      * fix compilation error
      
      * Fix GNMT rnn fusion
      
      * check if the node is in use before replacing in RNN graph passes
      
      * i) fix style ii) fix topo sort issue in RNN graph pass
      
      * style fix
      
      * fix bug in simplify_concat pass
      
      * replaces Lstm1 -> {GOE1, GOE2} -> {Slice1, Slice2} -> Concat -> Lstm2 with Lstm1 -> Lstm2
      
      * cse for convert layout
      
      * addressed PR comments
      
      * - optimization pass to remove  Lstm1 -> {GOE1, GOE2} -> {Slice1, Slice2} -> Lstm2
      - conditional fusing of LSTM cells only for the decoder
      
      * made changes to multi layer RNN fusion callback
      
      * fix asserts in RNN op
      
      * - added support to fuse layers when slc=dlc for RNN cells
      - bug fix on the sanity checks for RNN Op
      
      * - support RNN layer fusion till slc = dlc
      - bug fixes in multi layer rnn fusion call back
      
      * capture reshape in the RNN weights
      
      * Addressed PR comments
      
      * - added comments in multi layer PM call back
      - fuse only if slc == DLC across layers
      
      * restore deleted 3_lstm_cell_forward.json file
      
      * fix typo
      
      * fix failing unit tests
      
      * When processing in place slice, do not change the offset of the slice node if the argument pointer comes from function input.
      
      * Address PR feedback: process in place slice after propagating in place input.
      
      * Set INTERMEDIATE role before propagating in place input.
      
      * Do not add temporaries to the variable name map before propagating in place input in codegen.
      
      * Fix a bug in codegen.
      
      * Fix a bug in codegen slice.
      
      * reenable disabled rnn unit test
      
      * fix compiler error
      
      * - bug fix in the slicing logic for the layer fused rnn cell
      - fix failing rnn unit test
      
      * - Addressed PR comments
      - removed redundant checks from the rnn graph pass
      - simplified rnn call back replace node logic
      
      * - added new multilayer rnn *.json file
      - fix test case
      
      * [PRIVATE BRANCH] Style fixes (#2080)
      
      * Style fixes
      
      * change order of lstm gates
      
      * [PRIVATE BRANCH] Jbobba/rnn fusion review (#2113)
      
      * Style fixes for single-layer RNN fusion
      
      * Style fixes to multi-layer RNN
      
      * style fix
      
      * disable GPU test
    • fix failing bn test (#2175) · 86b783c6
      Pruthvi authored
      * fix failing bn test
      
      * fix style
  2. 05 Dec, 2018 9 commits
  3. 04 Dec, 2018 11 commits
  4. 03 Dec, 2018 4 commits
  5. 01 Dec, 2018 9 commits
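
Note on "stable sum" (#2076, 606f3f93): the commit messages do not say which algorithm the nvgpu reduce kernel uses. A common meaning of "stable sum" is compensated (Kahan) summation, so the sketch below assumes that technique. It is a plain-Python illustration, not the CUDA code from the commit, and the names `kahan_sum` and `naive_sum` are hypothetical.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation.

    Assumption: this is one common reading of "stable sum"; the
    commit log does not confirm the algorithm used on the GPU.
    """
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # re-inject the previously lost part
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


if __name__ == "__main__":
    # One large term followed by many terms too small to change it one at a time.
    data = [1.0] + [1e-16] * 100_000
    print("naive:", naive_sum(data))
    print("kahan:", kahan_sum(data))
    print("fsum :", math.fsum(data))  # correctly rounded reference
```

With this input the naive loop never moves off 1.0, because each small addend is below half a unit in the last place, while the compensated loop stays close to the `math.fsum` reference. That is the kind of divergence a stable-sum test for float and double is meant to catch.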