1. 11 Dec, 2018 12 commits
    • Framework for Hybrid GPU backend (#2196) · af2c4c7d
      Robert Kimball authored
      * add empty framework for hybrid GPU, or GPUH
      
      * move placement to the runtime directory
      
      * wip
      
      * skeleton for hybrid GPU backend. most unit tests pass.
      
      * cleanup
      
      * move hybrid code into hybrid dir/namespace
      
      * move hybrid functions
      
      * move more hybrid functions to hybrid directory
      
      * fix placement after compile. All unit tests passing
      
      * fix gpu backend ctor
    • Windows build support (#2177) · 9234cc69
      Robert Kimball authored
      * files pulled from bob/winbuild
      
      * fix compile problems
      
      * fix a few windows build errors
      
      * add windows file to exclude from git
      
      * add comment why change was made
      
      * revert obsolete change
      
      * more cleanup
      
      * building interpreter and unit test on windows with DLLs
      
      * Add flag for windows to export all symbols. Short term fix.
      
      * enable MD build
      
      * address warnings
      
      * dump all windows build results to a single directory
      
      * fix windows backend dll open issue
      
      * remove debug
      
      * fix file iterator for windows
      
      * fix merge error
      
      * fix test failure
      
      * change header from h to hpp in hopes of making python happy
      
      * address more linux build issues
      
      * fix visibility enable
    • nvgpu cuda softmax optimization (#2101) · a3133482
      Fenglei authored
      * add some helper function
      
      * update with new helper function
      
      * update reduce to nd with new helper function
      
      * update float sum to stable sum
      
      * fix bug
      
      * update all reduce to stable sum for float
      
      * fix bug and pass the sum stable test
      
      * remove debug info
      
      * style
      
      * update with shape
      
      * fix bug
      
      * add host parameters to cuda_emitter
      
      * clang format
      
      * fix bugs
      
      * add element::type support
      
      * format
      
      * add a cached value with datatype name
      
      * add init_reduce_value
      
      * unroll loop
      
      * optimization
      
      * remove the need for init_value
      
      * add memset kernel
      
      * add memcpy
      
      * working version
      
      * remove debug info
      
      * add comments, clean up code.
      
      * change in_idx to input_idx
      
      * fix bug
      
      * change args name for memset in emitter
      
      * pass element::Type instead of string
      
      * the op::reduce come with init value, add support
      
      * resolve codacy-bot comment
      
      * fix bug
      
      * resolve codacy-bot comment
      
      * add soft_max_block_reduce kernel
      
      * fix bugs
      
      * add softmax_block_reduce to cuda_emitter
      
      * compiling ok, result wrong
      
      * fix bug in kernel
      
      * working version
      
      * removed unused code
      
      * remove unused comments, resolve comments
      
      * cuda reduce for max, min, mul, reduce op init value, format
      
      * use type::info
      
      * use type info for numeric_limits
      
      * remove code from gpu_host_parameters
      
      * header
      
      * remove outdated comments
      
      * add helper to check if stable sum is needed
      
      * add stable sum test for double
      
      * remove extra line
      
      * consolidate helper functions
      
      * no need list now.
      
      * remove extra ;
      
      * clang format
      
      * style
      
      * add skip test for cpu and intelGPU side
      
      * resolve more conflict
      
      * update comment
      
      * fix a warning
      
      * Update src/ngraph/runtime/gpu/gpu_cuda_kernel_builder.cpp
      
      using load.
      Co-Authored-By: fengleitian <35274053+fengleitian@users.noreply.github.com>
      
      * using WARPSIZE instead of 32, using lambda
      
      * more WARPSIZE instead of 32
      
      * fix block_size_x bug
      
      * using __expf
    • fix crash in ReshapeConvertLayout (#2205) · 6584306c
      gaurides authored
      * fix crash in ngraph-tf test conv_ops_test.Conv2DTest.testConv2DKernelSmallerThanStrideSame
      
      * fix file perms
      
      * correct checks
    • Sergey Shalnov's avatar
      24bd105f
    • Bind cuda context to thread prior to compilation (#2199) · 31210402
      Chris Sullivan authored
      * Bind cuda context to thread prior to compilation. Small refactoring.
      
      * bind_cuda_context_to_thread in source
      
      * bind_cuda_context_to_thread header
    • [Py]Add version to ngraph python (#2193) · ec0a3f5c
      tsocha authored
      * [Py]Add version to ngraph python
      
      * FIX
    • Reshape SoftMax Reshape (#2188) · b77fd922
      Nick Korovaiko authored
      * reshape softmax reshape
      
      * add new line
      
      * add new line
      
      * fix style errors
    • Matcher skip (#2169) · c8bc3edc
      Nick Korovaiko authored
      * Update cpu_external_function.cpp
      
      * fix test case failures
      
      * env var to abort matching
      
      * Update matcher.cpp
      
      * Update matcher.cpp
      
      * add a comment
      
      * give an env var a better name
    • Fix setup.py for CentOS (#2163) · f46e56ec
      Adam Rogowiec authored
      * Fix installing numpy dependency on CentOS.
      
      * Check whether nGraph library directory exists.
    • Fix TF test failures on Mac. (#2210) · 1640d21e
      Amy Zhuang authored
      * Bug fixes to unordered map checks
      
      * No in-place slice for non-native MKLDNN layouts
      
      * is_op
    • is_op (#2203) · c9eef901
      Nick Korovaiko authored
  2. 10 Dec, 2018 1 commit
    • Harryk remove winml ref (#2204) · 90aa7336
      harryskim authored
      * Removed winml from stack diagram
      
      * Removed winml from full stack diagram
      
      * Update README.md
      
      * update the diagram without winml
      
      * Changed sentence about WinML
      
      * Removed duplication
  3. 08 Dec, 2018 4 commits
  4. 07 Dec, 2018 6 commits
  5. 06 Dec, 2018 14 commits
    • QCBiasAdd and QCBiasSignedAdd for mkldnn (#2062) · 1f40160d
      Nishant Patel authored
      * Quantize the bias to int32
      
      * Bias scale fix
      
      * mnist works
      
      * Quantize Bias
      
      * Introduce Quantize op in the graph to quantize bias & feedback
      
      * Add QuantizedConvBiasAdd
      
      * Comments and some refactoring
      
      * Add test case with float bias and enable int32 as quantized type in ngraph
      
      * Change shape of scale from Shape{} to Shape{1} in the backend
      
      * Add QuantizedConvBiasSignedAdd
      
      * Fix Layouts, clean up and a test case for QCBA
      
      * Test case for QCBSA
      
      * cleanup mkldnn_emitter.hpp
      
      * fix build error
      
      * Constant fold
    • Sergey Shalnov's avatar
    • DEX Loop Kernel (updated) (#2156) · 8fc481a3
      Nick Korovaiko authored
      * one output
      
      passing tests
      
      clean up
      
      fix build breaks
      
      * move generators into a separate file
    • Nick Korovaiko's avatar
      56980738
    • an env var to disable individual fusions (#2185) · 504e78f8
      Nick Korovaiko authored
      * an env var to disable individual fusions
      
      * fix env var name
    • Give Fusions Names (#2178) · a09d5f88
      Nick Korovaiko authored
      * give fusions names
      
      * fix build breaks
      
      * fix perms
    • Abort messages in Matcher to better understand cases where we fail to match (#2179) · 06916cbc
      Nick Korovaiko authored
      *  abort messages in matcher.cpp
      
      * style fixes
    • Graph comparison - isolated per op testing (#2144) · 1feb49f1
      gcwenger authored
      * Isolated per op testing when comparing graphs for better determination of source of accuracy divergence.
      
      * Improve clarity of comment
    • [Py] Update README for PyPI (#2151) · 8a9cf8aa
      Michał Karzyński authored
      * Update README for PyPI
      
      * Update README for PyPI
      
      * Remove redundant newlines
      
      * Fix links
    • [Py] setup.py code style formatting. (#2164) · 8249bf9f
      Adam Rogowiec authored
      * Uniform quotes style.
      
      * Fix comment style.
      
      * Check setup.py with flake8.
      
      - Fix flake8 errors.
      
      * Move function out of class scope.
      
      * Fix function parameter list
      
      * Fix formatting.
    • nvgpu cuda reduce with stable sum (#2076) · 606f3f93
      Fenglei authored
      * add some helper function
      
      * update with new helper function
      
      * update reduce to nd with new helper function
      
      * update float sum to stable sum
      
      * fix bug
      
      * update all reduce to stable sum for float
      
      * fix bug and pass the sum stable test
      
      * remove debug info
      
      * style
      
      * update with shape
      
      * fix bug
      
      * add host parameters to cuda_emitter
      
      * clang format
      
      * fix bugs
      
      * add element::type support
      
      * format
      
      * add a cached value with datatype name
      
      * add init_reduce_value
      
      * unroll loop
      
      * optimization
      
      * remove the need for init_value
      
      * add memset kernel
      
      * add memcpy
      
      * working version
      
      * remove debug info
      
      * add comments, clean up code.
      
      * change in_idx to input_idx
      
      * fix bug
      
      * change args name for memset in emitter
      
      * pass element::Type instead of string
      
      * the op::reduce come with init value, add support
      
      * resolve codacy-bot comment
      
      * fix bug
      
      * resolve codacy-bot comment
      
      * remove unused comments, resolve comments
      
      * cuda reduce for max, min, mul, reduce op init value, format
      
      * use type::info
      
      * use type info for numeric_limits
      
      * remove code from gpu_host_parameters
      
      * header
      
      * remove outdated comments
      
      * add helper to check if stable sum is needed
      
      * add stable sum test for double
      
      * remove extra line
      
      * consolidate helper functions
      
      * no need list now.
      
      * remove extra ;
      
      * clang format
      
      * style
      
      * add skip test for cpu and intelGPU side
      
      * add line between groups of headers
      
      * add two simple stable sum test for float and double
      
      * skip test for intelGPU
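The commit above mentions switching float reductions to a "stable sum" but does not name the algorithm. One common way to get a numerically stable sum is Kahan (compensated) summation. The sketch below illustrates that general technique in plain Python; it is an assumption about what "stable sum" may refer to, not the CUDA kernel from the commit, and the function names are hypothetical.

```python
def naive_sum(values):
    """Plain left-to-right accumulation; small terms can be lost to rounding."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan (compensated) summation.

    Illustrative only: assumed to resemble the idea behind "stable sum",
    not taken from the nGraph GPU backend.
    """
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-inject the previously lost part
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


# One large term followed by many terms too small to change it individually.
values = [1.0] + [1e-16] * 100_000

print(naive_sum(values))  # the small terms are dropped entirely
print(kahan_sum(values))  # close to 1.0 + 1e-11
```

The compensated version costs a few extra floating-point operations per element, which is the usual trade-off for keeping the accumulated rounding error roughly independent of the number of terms.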
    • Fix compiler error GCC with 7.1 (#2155) · 4b0445d1
      Fabian Boemer authored
    • Pruthvi/fix rnn precision (#1874) · 73da681a
      Pruthvi authored
      * - Added reorder support for rnn weights_layer/iter
      
      * i) fixed compilation issues ii) working but still observing precision error
      
      * i) fixed failing rnn unit test for DEX ii) refactored workspace in RNN mkldnn emitter
      
      * i) added support for src reorder to TNC from NTC
      
      * reorder support for rnn output from NTC to TNC
      
      * - added support for rnn weight reorder ldgoi -> ldigo
      - code refactor for lstm/rnn kernel in mkldnn emitter
      
      * - refactor rnn mkldnn kernel, change variable names
      
      * fix RNN codegen kernel
      
      * disable layer rnn fusion pass, to test CI
      
      * method to validate recurrent rnn inputs
      
      * add correlated matches for Recurrent RNN PM
      
      * - simplify reorder logic for rnn_weights
      - fix graph pattern for fusing rnn cell across time steps
      
      * do weights reorders in rnn timesteps fusion
      
      * refactored LSTM graph pass
      
      * - Bug fix for finding the lstm inputs deterministically
      - Refactored LSTM graph pass to single pass
      - made changes to LSTM RNN time step fusion graph pass
      
      * - use replace_node instead of replace_output in Lstm_step_wise fusion graph pass
      
      * fix compilation error
      
      * Fix GNMT rnn fusion
      
      * check if the node is in use before replacing in RNN graph passes
      
      *  i) fix style ii) fix topo sort issue in RNN graph pass
      
      * style fix
      
      * fix bug in simplify_concat pass
      
      * replaces Lstm1 -> {GOE1, GOE2} -> {Slice1, Slice2} -> Concat -> Lstm2 with Lstm1 -> Lstm2
      
      * cse for convert layout
      
      * addressed PR comments
      
      * - optimization pass to remove  Lstm1 -> {GOE1, GOE2} -> {Slice1, Slice2} -> Lstm2
      - conditional fusing of LSTM cells only for the decoder
      
      * made changes to multi layer RNN fusion callback
      
      * fix asserts in RNN op
      
      * - added support to fuse layers when slc=dlc for RNN cells
      - bug fix on the sanity checks for RNN Op
      
      * - support RNN layer fusion till slc = dlc
      - bug fixes in multi layer rnn fusion call back
      
      * capture reshape in the RNN weights
      
      * Addressed PR comments
      
      * - added comments in multi layer PM call back
      - fuse only if slc == DLC across layers
      
      * restore deleted 3_lstm_cell_forward.json file
      
      * fix typo
      
      * fix failing unit tests
      
      * When processing in place slice, do not change the offset of the slice node if the argument pointer comes from function input.
      
      * Address PR feedback: process in place slice after propagating in place input.
      
      * Set INTERMEDIATE role before propagating in place input.
      
      * Do not add temporaries to the variable name map before propagating in place input in codegen.
      
      * Fix a bug in codegen.
      
      * Fix a bug in codegen slice.
      
      * reenable disabled rnn unit test
      
      * fix compiler error
      
      * - bug fix in the slicing logic for the layer fused rnn cell
      - fix failing rnn unit test
      
      * - Addressed PR comments
      - removed redundant checks from the rnn graph pass
      - simplified rnn call back replace node logic
      
      * - added new multilayer rnn *.json file
      - fix test case
      
      * [PRIVATE BRANCH] Style fixes (#2080)
      
      * Style fixes
      
      * change order of lstm gates
      
      * [PRIVATE BRANCH] Jbobba/rnn fusion review (#2113)
      
      * Style fixes for single-layer RNN fusion
      
      * Style fixes to multi-layer RNN
      
      * style fix
      
      * disable GPU test
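Several messages in the commit above refer to reordering RNN data between the NTC and TNC layouts. Assuming the usual reading of those tags (N = batch, T = time steps, C = channels), the reorder amounts to swapping the first two axes. The sketch below shows that index mapping on nested Python lists; the helper name is hypothetical, and it does not represent the MKL-DNN reorder primitive used by the backend.

```python
def ntc_to_tnc(batch_major):
    """Swap the first two axes of a nested list shaped [N][T][C].

    Illustrative only: assumes N = batch, T = time, C = channels.
    """
    n = len(batch_major)
    t = len(batch_major[0]) if n else 0
    # Element [b][step][c] of the input becomes element [step][b][c].
    return [[list(batch_major[b][step]) for b in range(n)] for step in range(t)]


# N=2 sequences, T=3 time steps, C=1 channel.
ntc = [[[1], [2], [3]],
       [[10], [20], [30]]]
tnc = ntc_to_tnc(ntc)
print(tnc)
```

In the time-major result, each outer entry holds one time step for every sequence in the batch, which is the form a step-by-step recurrent kernel typically consumes.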
    • fix failing bn test (#2175) · 86b783c6
      Pruthvi authored
      * fix failing bn test
      
      * fix style
  6. 05 Dec, 2018 3 commits