1. 26 Oct, 2018 2 commits
      Add builder for {de}quantize to make APIs consistent and support {de}quantize with mkldnn (#1839) · 6b36a480
      Nishant Patel authored
      * Add builder for {de}quantize
      
      * Add declaration in header
      
      * Add mkldnn support for {de}quantize
      
      * Add support for {de}quantize with mkldnn
      
      * Add Dex support
      
      * Generalizing some APIs and adding a test case for DQ in backend_test.in.cpp
      
      * Unify scale between ngraph and mkldnn
      
      * Check for nullptrs
      
      * PR feedback
      
      * fix unit test failure
      
      * Adding tests for builder and deleting the backend tests
      
      * curly braces
      
      * test rename
      DEX Debugger (#1798) · fc5842d9
      Nick Korovaiko authored
      * gdb-like interface + tests
      
      * fix not being able to run call twice without call
      
      * fix continue bug
      
      * fix enables; rename kontinue to resume
      
      * switch from lists of functors, enables to vectors
      
      * address scott's feedback
      
      * adding a debugger object
      
      * address jayaram's feedback
  2. 23 Oct, 2018 1 commit
      hybrid at core (#1821) · 2e88d948
      Sandeep authored
      * skeleton backend
      
      * Code owner from if conditioning
      
      * add simple placement for interpreter and register pass in hybrid
      
      * placement policy applied
      
      * clone the function if needed
      
      * split the function
      
      * Compile subfunctions in corresponding backends
      
      * hybrid backend works as-is for abc test
      
      * cleanup
      
      * add placement policy for CPU
      
      * cleanup a little
      
      * add simple op cost method to backend
      
      * enable CPU pass via flag
      
      * address clang-format PR issue
      
      * resolve build
      
      * clean-up
      
      * update manifest
      
      * disable HYBRID as default build
      
      * style
      
      * addressing offline discussion
      
      * more offline discussion
  3. 22 Oct, 2018 1 commit
  4. 19 Oct, 2018 2 commits
  5. 18 Oct, 2018 1 commit
  6. 13 Oct, 2018 1 commit
  7. 12 Oct, 2018 2 commits
  8. 10 Oct, 2018 1 commit
      Reshape Sinking (#1701) · f642bc4c
      Nick Korovaiko authored
      * reshape sinking working on mnist_conv
      
      * forgot to add reshape_sinking files
      
      * refactoring of binary case
      
      * Quantize/Dequantize case, fix add case, add assert
      
      * address bob and scott's feedback
      
      * debug
      
      * fix a bug where reshapes are removed too early
  9. 05 Oct, 2018 1 commit
      RNN fusion (inference) (#1459) · 4df5ea8b
      Chris Sullivan authored
      * Add op::Sigmoid to nvgpu.
      
      * Bring rnn fusion and concat passes over into GPU from IA. This is a temporary move until generalization and gpu specification can occur.
      
      * Add LSTM fusion and cudnn inference kernel. Next need recurrent fusion and layer fusion.
      
      * Formatting
      
      * Removed unnecessary extra output from LSTM op (rnn with seq. length = 1, so y = hy).
      
      * Add RNN fusion of LSTM cells within a recurrent layer.
      
      * Formatting.
      
      * Add fusion across RNN layers.
      
      * Formatting.
      
      * Add algebraic simplification.
      
      * Added rnn fusion tests.
      
      * Updated conditional on LSTM fusion to better distinguish bound nodes as ht vs xt.
      
      * Formatting.
      
      * Removed print statements.
      
      * Formatting.
      
      * Committing missing file.
      
      * Remove concat inputs pass and mkldnn references.
      
      * fix cmake paths
      
      * conflict resolution with merge from master.
      
      * remove explicit lstm op support. bare LSTM ops are converted to RNN ops for emission.
      
      * Formatting.
      
      * Use NGRAPH_ASSERT. Formatting of intel copyright.
      
      * Add check on the feature size (shape) of the recurrent (hidden) input and cell state, to ensure they are the same size.
      
      * fix wrong rnn header
      
      * Formatting.
      
      * Add back lstm op to dispatch table.
      
      * Added RNN test which shows cudnn rnn kernel is not producing correct results.
      
      * With update to AlgSimpl. to simplify concat-reshape-slice, the check modified in this commit needed to be relaxed.
      
      * Bug fix in parameter tensor packing.
      
      * Alias third output element of RNN for cell state (bug fix).
      
      * Resolve numerical correctness issue with negative values in RNN (bug fix).
      Add minimal test to evaluate LSTM and compare with values calculated by hand.
      
      * Add tensor parameter sizes to kernel hash as
      they are kernel-specific.
      
      * Add 2 layer lstm fusion test against by-hand solution.
      
      * Export param concatenation to graph for cudnn kernel at both the single rnn layer and multi-layer.
      
      * Formatting.
      
      * Finishing touches after merge: add support for macro-expanded dispatch via op_tbl.
      
      * Simplify macro support for gpu ops.
      
      * Add CUDNN_VERSION >= 7200 defguards for RNN fusion.
      Need to decide how to notify user of increased performance with >= 7200.
      
      * Revert lstm_analytic test to explicitly copy data to tensor params.
      
      * Removed namespace arg from NGRAPH_GPU_OP.
      
      * Refactored macros to different header so op_tbl only contains op list.
      
      * Defguard on cudnn_descriptor<cudnnRNNDataDescriptor_t>.
      
      * doubles -> floats
      
      * Reorg. pass asserts, prepare to replace with non-throwing pass failures.
      
      * Remove Lstm op and replace it with Rnn.
      
      * Format
      
      * Utilize RETURN_IF_FALSE in rnn pass to avoid any RT asserts.
      Note that falling back to raw (no passes) graph for 2rnn_3lstm json from mxnet models
      results in a double free inside of the memory layout pass. Appears to be a bug
      in Reshape pass through.
      
      * Removed print statements. Add check on input data and recurrent data.
      
      * Don't reuse memory for non-destructive ops.
      
      * Add back Rnn test.
      
      * Formatting.
      
      * Clean up comments.
      
      * Update test per review comments.
  10. 26 Sep, 2018 1 commit
  11. 15 Sep, 2018 1 commit
  12. 13 Sep, 2018 1 commit
      Control dependencies (#1445) · 58f9af01
      Nick Korovaiko authored
      * topological sort with cdeps
      
      * add control deps API, fix unit tests
      
      * rollback adjoints changes
      
      * fix test failures,add more tests
      
      * remove dead code
      
      * address scott's feedback
  13. 07 Sep, 2018 1 commit
  14. 06 Sep, 2018 1 commit
  15. 30 Aug, 2018 1 commit
  16. 24 Aug, 2018 1 commit
  17. 22 Aug, 2018 1 commit
  18. 21 Aug, 2018 1 commit
      Statically link cpu backend into ngraph shared library (#1444) · 5ab5a129
      Robert Kimball authored
      * static link cpu library to ngraph
      
      * remove debug
      
      * link ngraph and cpu backend into a single shared object
      
      * add -fPIC and whole-archive for CPU backend
      
      * Added conditional for --whole-archive for Mac OS.
      
      * Added more conditionals for macOS.
      
      * fix linking problem and unit test failures caused by multiple copies of the same function in CPU backend and INTERPRETER
      
      * fix nbench build
      
      * add nbench to unit test build
      
      * add version number to libngraph
  19. 17 Aug, 2018 1 commit
      Enable DEX only build of ngraph (#1424) · 64ac3775
      Jayaram Bobba authored
      * Optionally get rid of codegen from the CPU backend
      
      * Rename option variable
      
      * Merge fixes
      
      * Merge
      
      * Remove extra changes
      
      * remove dex only exclusions (#1429)
      
      * Unconditionally pick  m_direct_execution if NGRAPH_DEX_ONLY
      
      * Style fix
  20. 12 Aug, 2018 1 commit
  21. 11 Aug, 2018 1 commit
  22. 10 Aug, 2018 1 commit
  23. 27 Jul, 2018 1 commit
      Add some convenience macros/classes for error messages (#1258) · deacf29a
      Adam Procter authored
      * Testing out some ideas for better error messages on AvgPool
      
      * Add uncaught_exception() check to ConstructionAssertLogger dtor
      
      * More general assertion class, not homed inside Node
      
      * Minor formatting change
      
      * NODE_ASSERT for type prop failure
      
      * Produce lighter-weight DummyAssertionHandler when assertion succeeds
      
      * New ctor for AssertionHelper that takes a single location arg; more const&-ness for the constructors
      
      * Remove move constructor for AssertionHelper; fix broken test in assertion.cpp
      
      * Miscellaneous improvements
      
      * Templatized AssertionHelper so different exception classes can be used; implemented TYPE_CHECK_ASSERT around this
      * Changed from a "stack" of locations to a single location (the stack was too complicated)
      * Added "FAIL" classes/macros which do not take a condition
      
      * Rename a helper function
      
      * Cleanup, cruft removal
      
      * Add test to make sure the assert helper has the lifetime we expect
      
      * Missing includes
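The commits above describe an assertion helper that accumulates an error message via streaming and reports it once the full statement has run. A minimal sketch of that pattern follows; the names are hypothetical and simplified (the real ngraph macros are NODE_ASSERT/TYPE_CHECK_ASSERT, carry source locations, and return a lighter-weight dummy handler on success):

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical, simplified assertion helper: on a failed condition it
// collects a message via operator<< and throws from its destructor, so
// callers can write:  MY_ASSERT(cond) << "details " << value;
class AssertionHelper
{
public:
    explicit AssertionHelper(bool failed)
        : m_failed(failed)
    {
    }
    ~AssertionHelper() noexcept(false)
    {
        // Throw only if the assertion failed and we are not already
        // unwinding from another exception (C++17 counterpart of the
        // uncaught_exception() check mentioned in the commits).
        if (m_failed && std::uncaught_exceptions() == 0)
        {
            throw std::runtime_error(m_stream.str());
        }
    }
    template <typename T>
    AssertionHelper& operator<<(const T& value)
    {
        if (m_failed) m_stream << value;
        return *this;
    }

private:
    bool m_failed;
    std::ostringstream m_stream;
};

#define MY_ASSERT(cond) AssertionHelper(!(cond))
```

Throwing from the destructor is safe here only because the helper is a temporary destroyed at the end of the full expression, after all the streamed operands have been appended; the uncaught-exception check keeps the destructor from terminating the program during stack unwinding.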
  24. 14 Jul, 2018 1 commit
  25. 21 Jun, 2018 1 commit
      Constant folding for Reshapes (#1130) · b9a77a9d
      Adam Straw authored
      * adding constant propagation pass
      
      * adding test/constant_propagation.cpp
      
      * template make_constant_reshape function
      
      * code review feedback
      
      * add missing files
  26. 19 Jun, 2018 1 commit
      Bob/cmake (#1118) · 4847b2de
      Robert Kimball authored
      * fix mkldnn rpath
      
      * fix compile warning
      
      * close backends when exiting
      
      * set backend output directory of backends to the ngraph output directory
      
      * Aprocter/patch patch (#1119)
      
      * Move more rpath stuff inside if(NOT APPLE)
      
      * fix repatch problem with mkldnn library
      
      * add updated patch command for older versions of cmake
  27. 13 Jun, 2018 1 commit
  28. 04 Jun, 2018 1 commit
      Modernize cmake usage (#1032) · eef750df
      Robert Kimball authored
      * Update cmake files to more modern approach
      
      * disable building libraries that are not required
      
      * handle more build cases
      
      * add versions to backend libs. add start of package target.
      
      * add create_backend to backends
      
      * temporary workaround to tbb not linking correctly with gcc
      
      * install codegen lib
      
      * force tbb to link to the cpu backend so that it is available for codegen
      
      * fix clang build error
      
      * fix warning for codegen build
      
      * update cuda header paths
      
      * change error message for opening backend shared library
      
      * set lib path
  29. 02 Jun, 2018 1 commit
  30. 29 May, 2018 1 commit
      [CS:GPU::Part 1] Add GPUShape type, conversion operators, and generalized shape helpers. (#1031) · d051f5fa
      Chris Sullivan authored
      * Added GPUShape and reworked Shape helpers to be
      compatible with different shape types.
Shape is now implicitly convertible to GPUShape.
      
      * Updated shape helpers signature and add conversion operators/constructors for GPUShape.
      
      * Adjust row_major_strides to avoid reversed-copy.
      
      * Moved declaration out of loop for clang.
      
      * Moved gpu_shape to gpu transformer.
      
      * Removed no longer necessary headers.
      
      * Added stdexcept header to gpu_shape.hpp
      
      * Changed check on 64bit shape to check if high bits are set.
      
      * Added spacing between functions in GPUShape and boolean operators in shape.hpp.
      
      * Template parameters are UPPER_SNAKE_CASE.
      
      * Return type of shape_size should be large enough to encapsulate the full stride of the tensor.
      This should be 64bits wide regardless of the underlying value_type of the ShapeType.
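The point about the return type can be illustrated with a minimal, hypothetical shape_size: even when every dimension fits in 32 bits, their product may not, so the accumulation must be done in 64 bits:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: a GPU shape may store 32-bit dimensions, but the
// element count (product of dims) can exceed 32 bits, so accumulate in
// a 64-bit type regardless of the shape's value_type.
uint64_t shape_size(const std::vector<uint32_t>& shape)
{
    uint64_t size = 1;
    for (uint32_t d : shape)
    {
        size *= static_cast<uint64_t>(d);
    }
    return size;
}
```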
      
      * [CS:GPU::Part 2] Add GPUMemoryManager, GPUAllocator, and memory primitives. (#1034)
      
      This is a big PR which introduces the GPUMemoryManager, GPUAllocator, and the concept of memory primitives.
      
      A memory primitive is a closure which yields the device memory address for a reserved memory space. When a memory reservation is made, the request is recorded along with the data that should be copied (for kernel arguments, but not for workspace memory). The reservation does not yield an address eagerly but instead does so lazily by returning an index which can be used to look up the memory_primitive at runtime. This allows the GPUMemoryManager to delay resolution of the memory address until all reservations have been made. 
      
      Ideally, the temporary allocations needed by each kernel could be captured by the liveness lists in the GPU_External_Function. This way the pass::MemoryManager would capture these allocations along with the needed tensor allocations.
      
      For now, rather than rearchitect the gpu_emitter and external function, we utilize the GPUMemoryManager, which maintains its own internal pass::MemoryManager, and the GPUAllocator. Liveness is handled by the GPUAllocator: all workspace allocations/reservations created in the same (or sub)scope as the GPUAllocator persist until the GPUAllocator goes out of scope and is destroyed. At that time, the GPUAllocator marks the requested temporary buffers as free, effectively ending their liveness. That way the next kernels that construct a GPUAllocator can reuse the workspace memory that was needed for previous kernels.
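The reservation-then-lazy-resolution scheme described above can be sketched as follows. The names are hypothetical and heavily simplified: there are no CUDA calls, and a plain byte pool stands in for device memory.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: a memory primitive is a closure that yields the
// address of a reservation. reserve() returns an index, not a pointer;
// the base address is resolved only after all reservations are made.
class MemoryManager
{
public:
    // Record a reservation; no address is available yet.
    size_t reserve(size_t bytes)
    {
        size_t index = m_primitives.size();
        size_t offset = m_size;
        m_size += bytes;
        m_primitives.push_back([this, offset]() { return m_base + offset; });
        return index;
    }
    // Called once after all reservations have been made, e.g. after a
    // single backing allocation (cudaMalloc on a real GPU).
    void allocate(char* base) { m_base = base; }
    // Invoke a memory primitive at runtime to resolve its address.
    char* address(size_t index) const { return m_primitives[index](); }
    size_t size() const { return m_size; }

private:
    char* m_base = nullptr;
    size_t m_size = 0;
    std::vector<std::function<char*()>> m_primitives;
};
```

Because reserve() hands back an index rather than an address, compiled kernels can bake the index in at compile time and resolve the actual device address only after the single backing allocation exists, which is the delayed resolution the paragraph above describes.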
      
      Additional notes:
      * This PR updates the CUDAEmitter to exclusively utilize GPUShape instead of Shape.
      
         Commits:
         * Added GPUMemoryManager for aggregating memory allocations and copies into a single operation for kernel arguments, and a reusable memory space for workspace allocations.
      
         * Added GPUShape and reworked Shape helpers to be
      compatible with different shape types.
      
        * Removed several unnecessary static_casts now that GPUShape is utilized. GPUTensorViewWrapper had a few functions returning std::vector<size_t> instead of Shape/Strides. These were updated as well to take advantage of GPUShape conversion operators.
      
         * Coordinate->GPUShape
      
         * Refactored replace_slice into CudaKernelBuilder. Simplified allocations using new GPUAllocator and GPUMemoryManager.
      
        * Refactor allocations to make use of primitive emitter. Now memory primitives are registered at compile time and the gpu memory address is resolved at runtime by invoking the primitive.
      
         * Added const qualifier to data being copied in GPUAllocator::reserve_argspace
      
         * Removed more replace_slice diffs.
      
         * Added unit tests for GPUMemoryManager and added checks that ensure the
      device memory is allocated prior to address resolution by the memory_primitives.
      Also exposed the allocation size of the memory manager.
      
         * Added explicit function for queueing kernel argument data rather than inline in the reservation function per @fengleitian recommendation.
      
      [CS:GPU::Part 3] Refactoring of several ops to use GPUMemoryManager (#1035)
      
      This PR implements the new GPUMemoryManager and allocator for all the ops which were previously implemented but required allocations and copies for kernel arguments at runtime. 
      
      Limitations:
      The convolution workspaces could not be added because the relevant descriptors were not available at compile time due to the codegen. If convolution is later added to the CUDNN emitter, the GPUAllocator can be used to avoid workspace allocation at runtime.
      
         Commits:
         * Replaced runtime host to device memcpys with GPUAllocator reservations in order to move them to compile time.
      
         * Forgot to remove no longer necessary buffer freeing from op emitters.
      
      [CS:GPU::Part4] Added op::ReplaceSlice and enabled respective tests. (#999)
      
       This PR implements ReplaceSlice using the coordinate transformation strategy. A thread for each tensor element of the input tensor is chosen and its position in the source tensor coordinate system is calculated. If it is within the source slice, the source is loaded and written out; otherwise the input tensor is loaded.
      
      * Relevant tests are enabled.
      
      * This op was refactored to utilize the new GPUAllocator and memory manager.
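The coordinate-transformation strategy described above can be sketched on the host for the 1-D case. This is a hypothetical helper for illustration; the real kernel assigns one GPU thread per output element and handles arbitrary rank.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side sketch of the per-element ReplaceSlice logic
// (1-D case): each output element checks whether its coordinate falls
// inside the strided slice [lower, upper); if so it reads from the
// replacement tensor, otherwise it passes the input value through.
std::vector<float> replace_slice_1d(const std::vector<float>& input,
                                    const std::vector<float>& repl,
                                    size_t lower,
                                    size_t upper,
                                    size_t stride)
{
    std::vector<float> out(input.size());
    for (size_t i = 0; i < input.size(); ++i)
    {
        bool in_slice = i >= lower && i < upper && (i - lower) % stride == 0;
        out[i] = in_slice ? repl[(i - lower) / stride] : input[i];
    }
    return out;
}
```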
      
         Commits: 
      
         * Updated replace_slice op to utilize GPUShape and GPUMemoryManager.
      
         * Added back missing changes after timeline resolution.
      
      * Fixed clang warnings and bug. The cudnn_handle was not initialized ahead of emission time and so any eager cudnn calls would fail.
      To fix this, the cudnn and cublas handle creation was moved to the external function constructor.
      
      * Changed row_major_strides to always return vector<size_t> to avoid overflow for tensors with many dimensions. Handle the conversion to 32 bits for GPU shapes with an explicit conversion constructor from vector<size_t>.
      
      * During merge the allocation line from external_function was left out. Adding it back.
  31. 14 May, 2018 1 commit
  32. 11 May, 2018 1 commit
  33. 10 May, 2018 2 commits
  34. 09 May, 2018 1 commit
  35. 07 May, 2018 1 commit
  36. 28 Apr, 2018 1 commit