1. 15 Jun, 2018 2 commits
    • RNN fusion across layers (#1085) · f75b8006
      Pruthvi authored
      * - Added graph pass for fusing the RNN op across layers
      - Added INTERPRETER vs CPU test case for verifying the layer-fused RNN
      - More sanity checks in the RNN fusion graph pass
      - Added support to replace the recurrent cell state correctly in the fused RNN op
      
      * Fixed multi layer rnn fusion unit test failure
      
      * Addressed PR comments
    • gpu function call (#1111) · 7c8e9250
      Fenglei authored
      * enable tests
      
      * add function call
      
      * working version
      
      * remove test from skip list
  2. 14 Jun, 2018 2 commits
  3. 13 Jun, 2018 3 commits
    • Ubuntu 18 build support (#1101) · 838ba3f1
      Robert Kimball authored
      * backend libraries now found in tree
      
      dynamically read header search paths
      
      fix running from install
    • Group Convolution (#1041) · 4a2c3c9c
      Nick Korovaiko authored
      * group conv init
      
      * add GroupConvolution op; refine checks in fusion logic (see the grouped-convolution sketch after this entry)
      
      * add an emitter, cpu assignment
      
      * cpu_layout
      
      * add checks to algebraic simplification
      
      * updating emitter logic for groupconvolution
      
      * working before refactoring
      
      * moving primitive creation logic to mkldnn_emitter
      
      * group convolution graph test
      
      * rename an opt
      
      * address jbobba's feedback
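
      A minimal, hypothetical sketch of the grouped convolution that the GroupConvolution op implements (standalone C++, not the actual ngraph emitter or MKL-DNN primitive): the input and filter channels are split into `groups` independent slices and each slice is convolved on its own, shown here for 1-D data in NCW layout with stride 1 and no padding.

      ```cpp
      #include <cstddef>
      #include <vector>

      // Naive 1-D grouped convolution for a single batch element.
      // in:      [c_in x w]                filters: [c_out x (c_in/groups) x k]
      // returns: [c_out x (w - k + 1)]
      std::vector<float> group_conv1d(const std::vector<float>& in,
                                      const std::vector<float>& filters,
                                      size_t c_in, size_t w,
                                      size_t c_out, size_t k, size_t groups)
      {
          const size_t c_in_g = c_in / groups;   // input channels per group
          const size_t c_out_g = c_out / groups; // output channels per group
          const size_t w_out = w - k + 1;
          std::vector<float> out(c_out * w_out, 0.f);

          for (size_t g = 0; g < groups; ++g)         // each group is an independent
              for (size_t oc = 0; oc < c_out_g; ++oc) // convolution over its own channels
                  for (size_t x = 0; x < w_out; ++x)
                  {
                      float acc = 0.f;
                      for (size_t ic = 0; ic < c_in_g; ++ic)
                          for (size_t i = 0; i < k; ++i)
                              acc += in[(g * c_in_g + ic) * w + x + i] *
                                     filters[((g * c_out_g + oc) * c_in_g + ic) * k + i];
                      out[(g * c_out_g + oc) * w_out + x] = acc;
                  }
          return out;
      }
      ```

      With groups == 1 this degenerates to an ordinary convolution, and with groups == c_in == c_out it is a depthwise convolution.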
    • gpu deconvolution (#1099) · 40069d27
      Fenglei authored
      * add pad_dilation function (see the data-dilation sketch after this entry)
      
      * add dilation to gpu_emitter
      
      * add CoordinateDiff constructor to GPUShape
      
      * remove unnecessary cast
      
      * working version for forward
      
      * forward working
      
      * forward test all pass
      
      * deconvolution forward
      
      * backward data dilation
      
      * forward test passed
      
      * initial to 0
      
      * fix bug for get_padded_shape and clang format
      
      * code style, change variable names
      
      * refactor convolution conditions
      
      * fix bug padding_below_diff
      
      * change pad_dilation to pad_dynamic, compare to pad
      
      * remove passed convolution test from skip list, clang format
      
      * change pad to use GPUShape
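
      A rough sketch of the data-dilation padding this entry builds on (a hypothetical 1-D helper, not the actual pad_dynamic kernel): for the backward-data / deconvolution path the input is expanded by inserting dilation - 1 zeros between neighbouring elements plus exterior padding, after which a regular forward convolution can be emitted over the padded buffer.

      ```cpp
      #include <cstddef>
      #include <vector>

      // Pad-and-dilate a 1-D input: insert (dilation - 1) zeros between elements
      // and add pad_below / pad_above zeros on the outside. The output buffer is
      // zero-initialized, so only the original values need to be scattered in.
      std::vector<float> pad_dilate_1d(const std::vector<float>& in,
                                       size_t pad_below, size_t pad_above, size_t dilation)
      {
          const size_t dilated = in.empty() ? 0 : (in.size() - 1) * dilation + 1;
          std::vector<float> out(pad_below + dilated + pad_above, 0.f);
          for (size_t i = 0; i < in.size(); ++i)
              out[pad_below + i * dilation] = in[i];
          return out;
      }

      // Example: pad_dilate_1d({1, 2, 3}, 1, 1, 2) yields {0, 1, 0, 2, 0, 3, 0}.
      ```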
  4. 12 Jun, 2018 2 commits
    • CUDA softmax kernel and broadcast kernel support for multiple non-consecutive axes (#1070) · 83e6aa5f
      Chris Sullivan authored
      * Added op::ReplaceSlice and enabled respective tests.
      
      * div64 -> division_by_invariant_multiplication
      
      * Added GPUMemoryManager for aggregating memory allocations and copies into a single operation for kernel arguments, and a reusable memory space for workspace allocations (sketches of this memory-manager pattern, of GPUShape, and of the softmax split follow this entry).
      
      * Added GPUShape and reworked Shape helpers to be
      compatible with different shape types.
      Shape is now implicitly convertible to GPUShape.
      
      * Updated shape helpers signature and added conversion operators/constructors for GPUShape.
      
      * Removed several unnecessary static_casts now that GPUShape is utilized. GPUTensorViewWrapper had a few functions returning std::vector<size_t> instead of Shape/Strides. These were updated as well to take advantage of GPUShape conversion operators.
      
      * Forgot to fix lambda for workspace allocations to match that of argspace allocations.
      
      * Added GPUShape and reworked Shape helpers to be
      compatible with different shape types.
      Shape is now implicitly convertible to GPUShape.
      
      * Updated shape helpers signature and added conversion operators/constructors for GPUShape.
      
      * Adjust row_major_strides to avoid reversed-copy.
      
      * Moved declaration out of loop for clang.
      
      * Moved gpu_shape to gpu transformer.
      
      * Removed no longer necessary headers.
      
      * Added stdexcept header to gpu_shape.hpp
      
      * Coordinate->GPUShape
      
      * Refactored replace_slice into CudaKernelBuilder. Simplified allocations using new GPUAllocator and GPUMemoryManager.
      
      * Refactor allocations to make use of primitive emitter.
      Now memory primitives are registered at compile time and
      the gpu memory address is resolved at runtime by invoking
      the primitive.
      
      * Changed check on 64bit shape to check if high bits are set.
      
      * Added const qualifier to data being copied in GPUAllocator::reserve_argspace
      
      * Added const qualifier to data being copied in GPUAllocator::reserve_argspace
      
      * Replaced runtime host to device memcpys with GPUAllocator reservations in order to move them to compile time.
      
      * Forgot to remove no longer necessary buffer freeing from op emitters.
      
      * Removed replace slice.
      
      * Removed more replace_slice diffs.
      
      * Updated replace_slice op to utilize GPUShape and GPUMemoryManager.
      
      * Added back missing changes after timeline resolution.
      
      * Added spacing between functions in GPUShape and boolean operators in shape.hpp.
      
      * Template parameters are UPPER_SNAKE_CASE.
      
      * Added unit tests for GPUMemoryManager and added checks that ensure the
      device memory is allocated prior to address resolution by the memory_primitives.
      Also exposed the allocation size of the memory manager.
      
      * Return type of shape_size should be large enough to encapsulate the full stride of the tensor.
      This should be 64bits wide regardless of the underlying value_type of the ShapeType.
      
      * Upstreaming changes to shape_size (which returns size_t).
      
      * cuDNN softmax impl. for all axis activation.
      
      * Added catch for per-axis activations.
      
      * Removed commented-out headers.
      
      * Added explicit function for queueing kernel argument data rather than inline in the reservation function per @fengleitian recommendation.
      
      * Add softmax cuda kernel. It relies on atomic memory addition to global
      memory; this will add contention and should be optimized in the
      future. A multilevel reduction can be found in
      cs/gpu_softmax_cuda_shfl but it requires some further engineering.
      
      * Refactored reduce coordinate transform code into a helper and applied it to broadcast.
      Broadcast added to CUDAEmitter, now supports multiple non-consecutive axes.
      
      * Removed change to data_types variable and updated/removed comments.
      
      * Refactored softmax into the emission of two fused elementwise collective ops.
      Added fused elementwise + collective kernels. Softmax is then just the combination of exp_sum_reduce + div_broadcast.
      
      * Added default param to GPUAllocator::reserve_workspace to request memory initialization for each invocation of the memory primitive.
      
      * GPU workspace memory is zero initialized by default but can be turned off if desired.
      
      * Added template parameter to CUDAEmitter::build_elementwise, REDUCE_OP_TYPE,
      to specify the ngraph op type to use for the reduction in the fused ew_collective kernel.
      
      * Renamed variables and updated a comment.
      
      * Removed outdated softmax kernel to avoid confusion. Can be added later when atomic reduce is replaced.
      
      * Clang complained about lack of explicit destructor for AxisSet. Since cuda_emitter doesn't need AxisSet specifically, switch to std::set<size_t>.
      This also has the benefit that in the future, if we wish to emit kernels without ngraph core (for example in a standalone binary via a
      serialized graph manifest), we don't depend on AxisSet.
      
      * softmax -> broadcast in build_broadcast.
      
      * Separate elementwise and elementwise_collective.
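
      A toy, host-only model of the reserve-at-compile-time / resolve-at-runtime pattern described above for GPUMemoryManager (a byte vector stands in for device memory; names are illustrative, not the real API): reservations only record an offset, a single pooled allocation is made afterwards, and each memory primitive resolves its concrete address when invoked.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <functional>
      #include <vector>

      // Stand-in for GPUMemoryManager: aggregate many reservations into one
      // allocation and hand back "memory primitives" that resolve addresses lazily.
      class MemoryManager
      {
      public:
          using MemoryPrimitive = std::function<void*()>;

          // Compile time: record the reservation, return a primitive for later use.
          MemoryPrimitive reserve_workspace(size_t size_in_bytes)
          {
              const size_t offset = m_pool_size;
              m_pool_size += size_in_bytes;
              return [this, offset]() -> void* {
                  assert(!m_pool.empty() && "pool must be allocated before resolution");
                  return m_pool.data() + offset; // run time: address resolved by invoking
              };
          }

          // One aggregated, zero-initialized allocation for all reservations.
          void allocate() { m_pool.resize(m_pool_size, 0); }

      private:
          std::vector<unsigned char> m_pool;
          size_t m_pool_size = 0;
      };
      ```

      The unit tests mentioned above would then assert that the pooled buffer exists before any primitive is invoked, and can check the total allocation size.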
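
      A condensed sketch of the GPUShape idea (simplified, and assumed to use 32-bit dimensions; not the real class): Shape's 64-bit dimensions are narrowed for the device with a check that no high bits are set, while shape_size keeps a 64-bit result regardless of the shape's value type.

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <stdexcept>
      #include <vector>

      using Shape = std::vector<size_t>; // host-side shape with 64-bit dimensions

      // Device-side shape with 32-bit dimensions; constructible implicitly from Shape.
      class GPUShape : public std::vector<uint32_t>
      {
      public:
          GPUShape(const Shape& shape) // non-explicit on purpose: Shape converts implicitly
          {
              for (size_t dim : shape)
              {
                  if ((dim >> 32) != 0) // high bits set: dimension cannot be narrowed
                      throw std::invalid_argument("Shape dimension does not fit in GPUShape");
                  push_back(static_cast<uint32_t>(dim));
              }
          }
      };

      // Element count; 64 bits wide regardless of the underlying value type.
      template <typename SHAPE_TYPE>
      size_t shape_size(const SHAPE_TYPE& shape)
      {
          size_t size = 1;
          for (auto d : shape)
              size *= d;
          return size;
      }
      ```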
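
      The exp_sum_reduce + div_broadcast split mentioned above, written out as plain host code for the last-axis case (a sketch of the math only, not the fused CUDA kernels, and without the usual max-subtraction for numerical stability): the first pass fuses the elementwise exp with a reduction along the softmax axis, the second divides by the reduced sums broadcast back over that axis.

      ```cpp
      #include <cmath>
      #include <cstddef>
      #include <vector>

      // Softmax over the last axis of a [rows x cols] buffer in two passes:
      //   1) exp_sum_reduce: y = exp(x), s[row] = sum of y along the axis
      //   2) div_broadcast:  y = y / s[row], the per-row sum broadcast back
      void softmax_rows(std::vector<float>& data, size_t rows, size_t cols)
      {
          std::vector<float> sums(rows, 0.f);
          for (size_t r = 0; r < rows; ++r)     // pass 1: elementwise op fused with reduce
              for (size_t c = 0; c < cols; ++c)
              {
                  const float e = std::exp(data[r * cols + c]);
                  data[r * cols + c] = e;
                  sums[r] += e;
              }
          for (size_t r = 0; r < rows; ++r)     // pass 2: divide by the broadcast sum
              for (size_t c = 0; c < cols; ++c)
                  data[r * cols + c] /= sums[r];
      }
      ```

      On the GPU, the accumulation in the first pass is where the atomicAdd contention noted above comes from, since many threads add into the same reduced element.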
    • Replace Check (#1097) · 692101a7
      Nick Korovaiko authored
  5. 11 Jun, 2018 2 commits
  6. 08 Jun, 2018 2 commits
    • Optimized eigen kernel for spatial mean (#1094) · 0b95efa6
      Jayaram Bobba authored
      * Optimized eigen kernel for 2D reduction on a 4D tensor used for spatial mean (see the sketch after this entry)
      
      * revert change to serializer
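
      For reference, the reduction being specialized above, written as plain loops rather than the optimized Eigen kernel (NCHW layout is an assumption; the commit only says a 2-D reduction over a 4-D tensor): the two spatial axes are averaged away, leaving an [N x C] result.

      ```cpp
      #include <cstddef>
      #include <vector>

      // Spatial mean of a 4-D NCHW tensor: reduce over H and W, producing [N x C].
      std::vector<float> spatial_mean(const std::vector<float>& in,
                                      size_t n, size_t c, size_t h, size_t w)
      {
          std::vector<float> out(n * c, 0.f);
          for (size_t i = 0; i < n; ++i)
              for (size_t j = 0; j < c; ++j)
              {
                  float sum = 0.f;
                  const float* plane = in.data() + (i * c + j) * h * w;
                  for (size_t k = 0; k < h * w; ++k)
                      sum += plane[k];
                  out[i * c + j] = sum / static_cast<float>(h * w);
              }
          return out;
      }
      ```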
    • Jmenon/dexec (#1092) · abb68627
      Jaikrishnan Menon authored
      * CPU: Direct Execution
      Part 1 with bare minimum infrastructure
      
      * Refactor: Move build related functionality to a separate TU
      and external function method
      
      * Add TU back after merge
      
      * Remove an assert
      
      * Remove commented-out code
  7. 07 Jun, 2018 2 commits
  8. 06 Jun, 2018 7 commits
  9. 05 Jun, 2018 6 commits
  10. 04 Jun, 2018 3 commits
  11. 02 Jun, 2018 1 commit
  12. 01 Jun, 2018 1 commit
  13. 31 May, 2018 4 commits
  14. 30 May, 2018 3 commits
    • add gpu reduce_window op (#1020) · f6b84d67
      Fenglei authored
      * add reduce op
      
      * hack solution to get reduction function in reduce op
      
      * hack version working on all tests
      
      * add reduce_window op
      
      * fixed the reduction checking process
      
      * add reduce window op, save progress, not compilable yet
      
      * change push_back to pre-allocate for vector
      
      * fixed datatype vector
      
      * datatype and comments
      
      * pass op instead of using template (see the reduce_window sketch after this entry)
      
      * using new GPUShape and allocator
      
      * using GPUShape
      
      * add comment, change map inside function.
      
      * change to more meaningful name
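
      A 1-D sketch of the reduce_window idea in this entry (hypothetical signature; the real op is N-dimensional): each output element is the reduction of one sliding window of the input, and, matching the "pass op instead of using template" commit, the reduction is passed as a runtime callable rather than a template parameter.

      ```cpp
      #include <cstddef>
      #include <functional>
      #include <limits>
      #include <vector>

      // Sliding-window reduction over a 1-D sequence. The reduction op is a runtime
      // argument (std::function) instead of a compile-time template parameter.
      std::vector<float> reduce_window_1d(const std::vector<float>& in,
                                          size_t window_size, size_t stride, float init,
                                          const std::function<float(float, float)>& reduce_op)
      {
          std::vector<float> out;
          if (in.size() < window_size)
              return out;
          out.reserve((in.size() - window_size) / stride + 1); // pre-allocate instead of growing
          for (size_t start = 0; start + window_size <= in.size(); start += stride)
          {
              float acc = init;
              for (size_t i = 0; i < window_size; ++i)
                  acc = reduce_op(acc, in[start + i]);
              out.push_back(acc);
          }
          return out;
      }

      // Example: max-pooling with window 2, stride 2:
      //   reduce_window_1d(x, 2, 2, std::numeric_limits<float>::lowest(),
      //                    [](float a, float b) { return b > a ? b : a; });
      ```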
    • Refactor CPUWorkspaceInsertion to simplify its use in MxNet (#988) · fa221c5f
      Nick Korovaiko authored
      * refactor CPUWorkspaceInsertion for MxNet
      
      * rename mxnet functions to adhere to ngraph naming convention
      
      * define a member static const int in a cpp file to resolve a linking issue