1. 04 Jun, 2018 1 commit
    • Modernize cmake usage (#1032) · eef750df
      Robert Kimball authored
      * Update cmake files to more modern approach
      
      * disable building libraries that are not required
      
      * handle more build cases
      
      * add versions to backend libs. add start of package target.
      
      * add create_backend to backends
      
      * temporary workaround to tbb not linking correctly with gcc
      
      * install codegen lib
      
      * force tbb to link to the cpu backend so that it is available for codegen
      
      * fix clang build error
      
      * fix warning for codegen build
      
      * update cuda header paths
      
      * change error message for opening backend shared library
      
      * set lib path
  2. 02 Jun, 2018 1 commit
  3. 26 May, 2018 1 commit
  4. 25 May, 2018 2 commits
  5. 18 May, 2018 1 commit
  6. 10 May, 2018 2 commits
  7. 09 May, 2018 2 commits
    • Add op::Or and op::And to GPU transformer (#979) · 8508410f
      Chris Sullivan authored
      * Moved emit_elementwise implementation into CUDAEmitter and added logical_and and logical_or ops.
      
      * Updated comment and formatting.
      
      * Added check for multi-output elementwise ops.
    • CUDNN and CUDA kernels for AvgPool (forward/backward) (#951) · b1b3d4d6
      Chris Sullivan authored
      * Added op::AvgPool cudnn impl. which works for 2-3 spatial dimensions and no/symmetric padding. Enabled tests.
      
      * Added cuda-c implementation of average pool which handles 1-3 spatial
      dimensions as well as asymmetric padding. This commit also introduces
      several helper functions for performing fast integer division and
      fast constant memory access.
      
      * Formatting. Removed bool that was used for testing to force the cuda impl. over cudnn.
      
      * Added CUDNN AvgPoolBackprop implementation.
      
      * Removed inline enum in preference of a helper struct. Removed instances of multiple declarations on a single line. Updated comments.
      
      * Removed _prefix to helper functions in anonymous namespace.
  8. 08 May, 2018 2 commits
    • add gpu concat op (#931) · 57d58e50
      Fenglei authored
      * add concat op
      
      * change to concat
      
      * add more code for gpu concat
      
      * compiles successfully, but still has a bug
      
      * add emit_concat_op
      
      * runnable, but wrong result
      
      * working version
      
      * add some comments
      
      * delete old comments.
      
      * remove buggy doxygen comments
    • Computation reuse (#945) · 41c50b44
      Jayaram Bobba authored
      * Make temp memory pools static to avoid memory allocation overheads
      
      * Initial implementation for graph control to enable caching and computation reuse
      
      * Added sphinx documentation
      
      * Turned off memory buffer reuse in CPU transformer to support computation reuse. Added unit test
      
      * Change memoizable to cacheable
      
      * Rename variables
  9. 05 May, 2018 1 commit
    • add gpu reverse (#952) · af946b7d
      Fenglei authored
      * add code to gpu reverse
      
      * add reverse emitter and kernel builder
      
      * working version
  10. 30 Apr, 2018 1 commit
  11. 27 Apr, 2018 1 commit
    • gpu select (#919) · 30d24597
      Fenglei authored
      * add select op, pass data type for each operand
      
      * fix bugs and apply clang format
      
      * fix index bug
  12. 25 Apr, 2018 2 commits
    • CUDNN BatchNorm (inference/forward/backward) (#893) · 23ac5e5a
      Chris Sullivan authored
      * Added cudnn batch norm operation to GPU transformer.
      Brought batchnorm tests out of cpu_tests and into
      backend_tests. Need to add JIRA ticket for interpreter
      SKIPS.
      
      * CUDNN batchnorm is implemented. In the ForwardTraining branch
      CUDNN seems to calculate the batch mean correctly but the batch variance incorrectly.
      Currently the batchnorm output and mean are calculated correctly for tests:
      * GPU.batchnorm_fprop_b2c2h3w3_mean_var
      * GPU.batchnorm_fprop_b1c2h2w2
      * GPU.batchnorm_fprop_b2c2h2w1
      but the variance calculated for the batches in these tests is incorrectly calculated by CUDNN.
      
      Also added an additional test and cleaned up some of the old tests.
      
      * MKLDNN internally utilizes the biased estimate of the population variance
      and the tests have been crafted to suit MKLDNN. According to the original
      batchnorm publication (https://arxiv.org/pdf/1502.03167v3.pdf), population
      (unbiased) statistics should be used for inference, and mini-batch (biased)
      statistics should be used during training (forward/backward). For the variance
      this means utilizing the following equations, respectively:
      
        (biased)   Var[X] = 1/m * Sum_i(x_i-mu)^2      :: used in training
        (unbiased) Var[X] = 1/(m-1) * Sum_i(x_i-mu)^2  :: used in inference
      
        s.t. x_i are elements of X and m = N*D*H*W.
      
      For large batch sizes in inference this may not impact convergence as m >> 1,
      but for small batch sizes it will. CUDNN internally utilizes the unbiased
      variance.
      
      Changes:
      * Added Multiply op to Forward pass of batchnorm to convert
        the unbiased variance to a biased one. The op utilizes the
        blending scaling factors to apply the bias factor.
      * Adds emission for the BatchNormBackprop kernel and cleans up
        the emitter implementation.
      
      * Added hashing to cudnn::batchnorm op.
      
      * Formatting.
      
      * Changed hashing of epsilon in cudnn batchnorm.
      
      * Remove implicit conversion and default case in switch for bn.
      
      * Added skips for IE transformer on batchnorm.
      
      * add cudnn include path to compiler.cpp
      
      * separate the two paths
      
      * PRs #892 and #825, which were recently merged, both forgot skips for the GPU backend.
      Adding them in as they are unimplemented ops.
      
      * The allocation and deletion of primitives was occurring in separate
      translation units with raw C pointers. Because of this, it was not
      clear that these were being freed appropriately, nor did it indicate
      ownership of the pointers.
      
      In this commit these raw pointers have been converted over to
      std::unique_ptrs such that the construction/destruction is managed
      automatically. Furthermore, GPUPrimitiveEmitter::insert now only
      takes an r-value reference, requiring move-semantics to indicate
      that when inserting a primitive, the GPUPrimitiveEmitter takes
      ownership of the pointer.
      
      All instances of primitive creation have been modified.
      
      * CUDNN_SAFE_CALL
      
      * Removed redundant comment and made variable names more verbose.
      
      * Change from conditionals to case-switch in pooling to conform to
      batchnorm per @fengleitian's suggestion.
    • add cudnn include path to compiler.cpp (#902) · b0421577
      Fenglei authored
      * add cudnn include path to compiler.cpp
      
      * separate the two paths
      
      * Skipping one_hot tests for CPU as
      CI is failing. JIRA bug report: https://jira01.devtools.intel.com/browse/NGRAPH-1682.
  13. 24 Apr, 2018 1 commit
  14. 23 Apr, 2018 2 commits
  15. 21 Apr, 2018 2 commits
    • Add Inference Engine (IE) backend (#883) · 3d590dea
      Adam Straw authored
      * ie backend and manager with passing unit tests except for select/function
      
      * fix function_call and select
      
      * simplify implementation by removing support for convert and select
      
      * remove manager
    • Support concat with mkldnn and add a test case (#825) · 1a73f10c
      Nishant Patel authored
      * Support Concat with mkldnn (two inputs)
      
      * Support concat with mkldnn (multiple inputs)
      
      * Address feedback
      
      * Remove unused variable
      
      * Allow rank two tensor to mkldnn for concat & add a test case for 2D inputs
      
      * Add mkldnn_any layout to concat
      
      * Make API changes to get consistent with master
  16. 18 Apr, 2018 1 commit
    • GPU Padding - add support for custom pad value and interior padding (#860) · 0be581c0
      Chris Sullivan authored
      * cuda_emitter::build_pad now utilizes pad_value.
      
      * Added TypeInfo class for dispatching c-type information from the underlying ngraph element::Type.
        Adjusted test to use all_close when comparing floating point values (max_pool_2d_1channel_1image_overpadded).
      
      * Refactored max_pool_1d into cuda_emitter so that numeric_limits<c_type>::lowest() could be used for initial max value.
      Test max_pool_2d_1channel_1image_padded_negative_values now enabled and passes.
      
      * Removed old function and switch to size_t to match ngraph.
      
      * Added virtual dtor.
      
      * Adding support for interior padding. All op::Pad functionality is now included.
      
      * More info in runtime_error for checking of tensor dimensions. Removed commented code.
  17. 16 Apr, 2018 1 commit
  18. 13 Apr, 2018 3 commits
    • Remove legacy Backend API (#848) · ec501913
      Robert Kimball authored
      * remove deprecated
      
      * remove all legacy Backend API usage
      
      remove deprecated files
      
      * pull in changes from master
      
      * fix GPU calls
      
      * disable tests in convolution generator
      
      * update per PR comments. Enable performance counter feature.
      
      * update per PR comments
      
      * fix build error
      
      * fix conditionally compiled test :(
    • Robert Kimball · e7cf2662
    • Add GPURuntimeContext and GPUPrimitiveEmitter to the gpu transformer (#837) · 026bede0
      Chris Sullivan authored
      * Begin prototype of cudnn_emitter.
      
      * Added GPURuntimeContext to gpu_external_function for passing through to JIT functions.
      
      * gpu_emitters now utilize gpu runtime context.
      
      * Moved cublas and cudnn handles into GPURuntimeContext pointer and out of callframe EntryPoint.
      
      * Added CUDNNEmitter, comparable to MKLDNNEmitter,
      which allows for cudnn kernels to be defined via
      lambda primitives that are emitted and
      subsequently called during graph execution.
      An example implementation is provided for op::Sum.
      
      * GPURuntimeContext should be stored as unique_ptr in external function.
      
      * Extract raw pointer from unique for cudnn_emitter.
      
      * Removing unrelated code from PR.
      
      * GPURuntimeContext needs to be a strict C interface in case
      the native compiler and clang are utilizing different glibc ABIs.
      Updated to reflect this.
      
      * Added cudnn::primitive typedef for better readability.
      
      * Moved allocation of CudaFunctionPool to external function
      so that it is available during gpu emission.
      
      * Fixed too-late initialization of cudart.
      
      
      * CUDNNEmitter moved into superset class GPUPrimitiveEmitter.
      The GPUPrimitiveEmitter handles the emission of all gpu primitives,
      including cudnn, cuda, and cublas. CUBLASEmitter support not yet included.
      
      * Added unordered_map for caching primitives in the gpu_emitter.
      
      * Added dtor to GPUPrimitiveEmitter to cleanup compiled functions.
      
      * Adding back a serialized model graph that was accidentally removed.

      * Added a few additional helpers to use ngraph::row_major_strides.
      
      * added whitespace per @fengleitian's comment
      
      * Remove implicit type conversions from size_t to int.
      
      * Add op::MaxPool, op::MaxPoolBackprop and op::Pad to GPU transformer (#817)
      
      * Added pooling for 1 and 2 dimensions. 1d uses a cuda kernel and 2d utilizes cudnn.
      Padding is not yet supported.
      
      * Normalized call signature on gpu emission for 1d max pool. Added a few comments.
      
      * Max pool backprop impl. in progress. Amend this commit.
      
      * Max pool backprop implemented. Note that cuDNN
      requests the output tensor for the maxpool operation but it is not required for computation.
      
      * Formatting and invocation for maxpool changed.
      
      * Fixed too-late initialization of cudart.
      
      * Added padding kernel that is used with maxpool. Need to investigate remaining tests.
      
      * Changed dimensionality check to correctly
      determine if data is 1d or not.
      
      * Added 3d MaxPooling (forward), verified by forcing 2d case to use Nd pooling routines.
      
      * Added 3d MaxPooling (backward), verified by forcing 2d case to use Nd pooling routines.
      
      * Moved cudnn prologues for maxpool into ngraph runtime and out of primitive so
      that the only execution occurring on the JIT runtime is the evaluation of the op kernel.
      
      * Refactored forward and backward pooling into single CUDNNEmitter::build_pooling interface
      with a runtime switch to determine if the op is forward or backward propagation.
      
      * Cache preconstructed cudnn kernel for maxpool if it has already been constructed.
      
      * Forgot to add padding arrays back into cudnn kernel for MaxPool in the 2d case.
      
      * Fixed namespace issues and use join(...,'_')
      
      * Refactored 4d/Nd tensor descriptor builder into single function.
      
      * Changed conditionals and comments. Now throws if MaxPool on more than 3 spatial dimensions is requested.
      
      * Fixed forward declare for GPURuntimeContext (class -> struct).
      
      * Clang complains about missing braces on brace-initializer. Fixed implicit conversions.
      
      * Fixed implicit conversions (clang).
      
      * Reverting changes on autodiff test for maxpool. @Krovatkin will update later.
  19. 12 Apr, 2018 2 commits
    • gpu slice (#843) · 041dd524
      Fenglei authored
      * add slice op, first version
      
      * change size to output size
      
      * fix bugs
      
      * working version
      
      * using exist function for join and strides
      
      * clang format
      
      * revert accidental change
    • gpu convolution support nd(n<4) (#824) · b9b7845c
      Fenglei authored
      * add convolution in progress
      
      * enable 1 test
      
      * convolution in progress
      
      * use filter descriptor
      
      * filter descriptor bug fix
      
      * tensor format
      
      * add missed dimension calculator
      
      * forward convolution 4d without dilation and padding working
      
      * data dilation(deconvolution) and enable some test
      
      * add backprop convolution data and filter
      
      * backprop can compile
      
      * passes unit tests, but still has a problem with padding
      
      * 2d, symmetric padding, no data dilation works now
      
      * clean up code
      
      * extend gpu convolution to nd
      
      * fix some bugs
      
      * working version for up to 3D convolution, code format.
      
      * remove unnecessary changes
      
      * add restriction for data dilation and asymmetric padding
      
      * clang format
      
      * support up to 3D convolution for now
      
      * change comments to not implemented
      
      * add query for additional GPU workspace for convolution
      
      * clang format
      
      * code format
      
      * using row_major_strides
      
      * using join
      
      * fix bug for join
      
      * refactor dimension calculation
  20. 10 Apr, 2018 1 commit
  21. 09 Apr, 2018 1 commit
    • New backend/transformer API (#739) · 777600c6
      Robert Kimball authored
      * force backend compile() to make a copy of the graph
      
      fix copy_with_new_args on ops that have internal function pointers
      
      update unit test for new backend API
      
      add unit test for multiple simultaneous backends
      
      * move get_subdevices virtual method to Manager class
      
      * update GPU to latest
      
      * update call methods
      
      * add remove_compiled_function()
  22. 05 Apr, 2018 1 commit
    • enable TensorView to use pre-allocated mem (#795) · e189f9c6
      Ashok Emani authored
      * enable TensorView to use pre-allocated mem
      
      * proper check for nullptr
      
      * add unit test for custom mem with tensorview and feedback
      
      * minor fix from feedback
      
      * support GPU TensorView custom mem
      
      * feedback fix and code format
  23. 04 Apr, 2018 1 commit
    • Support multi-output ops in Adjoints (#796) · 5f0e8dc3
      Nick Korovaiko authored
      * refactor Adjoints to support multi-output ops
      
      * passing tests
      
      * switch to generate_adjoints(deltas) and backprop_node
      
      * remove debugging code
      
      * fix error msg
      
      * fix typo adjoitns
      
      * fix comp errors in mnist_mlp
  24. 28 Mar, 2018 1 commit
  25. 27 Mar, 2018 1 commit
    • gpu reshape n-dimension (n>2) (#716) · 4e78f25d
      Fenglei authored
      * add nd reshape
      
      * compiles and doesn't crash, but wrong result
      
      * change output_stride to trans_stride, which transforms input idx to output idx
      
      * using vector instead of c array
      
      * remove delete
      
      * using const and reference to pass string and array
      
      * change 'unimplement' comments, remove extra indents
      
      * format and cast size_t to int
  26. 26 Mar, 2018 1 commit
  27. 22 Mar, 2018 3 commits
    • Dot op that can handle more than 2D on GPU (#645) · 6ebc3c8c
      Fenglei authored
      * general dot for gpu
    • Add reduce sum to the GPU transformer (op::Sum) (#671) · bae77590
      Chris Sullivan authored
      * Current cudnn implementations use only
      a single dimension for the ngraph tensor data (width).
      In this case the tensor format should be set to
      
      CUDNN_TENSOR_NCHW
      
      so that adjacent memory accesses are coalesced (stride=1 for width).
      
      * Added some kernel emitter helpers that are reused often.
      * Renamed EmitElementwise -> emit_elementwise to match emit<T>.
      * op::Sum now handles trivial case of dim(input_tensor) = dim(output_tensor)
        by performing a memcpy as no axes are reduced.
      
      * Added general case for Nd descriptors which is used when the tensor
        has more than 4 dimensions. Currently a naive reduce is performed,
        in the future a coordinate transformation could be performed to
        improve the memory layout for the reduction.
      
      * Switched to codegen::CodeWriter::block_begin/end.
      It appears that CodeWriter::block_begin/end is not frequently used for emitters (in cpu and gpu transformers)
      because a block comment is often desired. To this end I added prefix/suffix default parameters to CodeWriter::block_begin/end
      so that this functionality is captured.
    • Add op::ReluBackprop to GPU transformer (#712) · 72f4d661
      Chris Sullivan authored
      * Added backprop op for relu and enabled tests.
  28. 21 Mar, 2018 1 commit