1. 09 May, 2018 2 commits
    • 
      RTLD_GLOBAL fix codegen link (#984) · 7bc6b785
      Yixing Lao authored
    • 
      CUDNN and CUDA kernels for AvgPool (forward/backward) (#951) · b1b3d4d6
      Chris Sullivan authored
      * Added op::AvgPool cudnn impl. which works for 2-3 spatial dimensions and no/symmetric padding. Enabled tests.
      
      * Added cuda-c implementation of average pool which handles 1-3 spatial
      dimensions as well as asymmetric padding. This commit also introduces
      several helper functions for performing fast integer division and
      fast constant memory access.
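The fast-integer-division helper mentioned above is a standard GPU trick: hot index arithmetic replaces a hardware divide by a fixed divisor with a multiply and a shift against a precomputed reciprocal. A minimal sketch of the idea (illustrative only, not the actual nGraph helper; the name and bound are assumptions):

```cpp
#include <cstdint>

// Illustrative sketch, not nGraph code. Divides by a fixed divisor with one
// 64-bit multiply and a shift instead of a hardware divide. Exact whenever
// n * divisor < 2^32, which covers typical spatial-index arithmetic.
struct FastDiv
{
    uint64_t magic;   // ceil(2^32 / divisor)
    uint32_t divisor;

    explicit FastDiv(uint32_t d)
        : magic(((1ull << 32) + d - 1) / d)
        , divisor(d)
    {
    }

    uint32_t div(uint32_t n) const
    {
        // floor(n / divisor) == high 32 bits of n * magic
        return static_cast<uint32_t>((n * magic) >> 32);
    }

    uint32_t mod(uint32_t n) const { return n - div(n) * divisor; }
};
```

On a GPU the multiply-high would typically map to an intrinsic such as `__umulhi`, with the magic constant computed once on the host.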
      
      * Formatting. Removed bool that was used for testing to force the cuda impl. over cudnn.
      
      * Added CUDNN AvgPoolBackprop implementation.
      
      * Removed inline enum in preference of a helper struct. Removed instances of multiple declarations on a single line. Updated comments.
      
      * Removed _ prefix from helper functions in anonymous namespace.
  2. 08 May, 2018 8 commits
    • 
      [cuDNN:Part 1] minimal refactoring of op::reduce (#965) · 682f7b04
      Chris Sullivan authored
      * Refactored the cudnn reduce kernel to use the nGraph Shape -> cudnnTensorDescriptor cudnn helpers that the other kernels use.
      
      * Added caching to cudnn reduce op.
      
      * Adding back hashing call before returning primitive index to op::Reduce (bug fix).
      
      * [cuDNN:Part 2] Descriptor Creation/Destruction refactoring (#969)
      
      * Added a cuDNN descriptor factory which manages the construction and destruction of cuDNN descriptors.
      It correctly calls Create/Destroy based on the cuDNN descriptor type. Previously the Destroy functions were not being called.
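The factory described above is essentially RAII over cuDNN's paired Create/Destroy calls. A hedged, cuDNN-free sketch of the pattern (not the actual nGraph class; the callables stand in for pairs such as cudnnCreateTensorDescriptor / cudnnDestroyTensorDescriptor):

```cpp
#include <functional>
#include <utility>

// Illustrative sketch of the Create/Destroy pairing the factory enforces --
// not the actual nGraph implementation.
template <typename Desc>
class ScopedDescriptor
{
public:
    ScopedDescriptor(std::function<void(Desc*)> create,
                     std::function<void(Desc&)> destroy)
        : m_destroy(std::move(destroy))
    {
        create(&m_desc); // paired Create on construction...
    }
    ~ScopedDescriptor() { m_destroy(m_desc); } // ...guaranteed Destroy on scope exit
    ScopedDescriptor(const ScopedDescriptor&) = delete;
    ScopedDescriptor& operator=(const ScopedDescriptor&) = delete;
    Desc& get() { return m_desc; }

private:
    Desc m_desc{};
    std::function<void(Desc&)> m_destroy;
};
```

Because the destroy callable runs in the destructor, the leak described in the commit (Destroy never being called) cannot occur, even on early returns or exceptions.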
      
      * Removed commented code and changed class to struct on cudnn_descriptor.
      
      * Added comments and updated a few variable names.
      
      * Clang compiled cuDNN kernels (those not part of CUDNNEmitter)
      now use the CUDNNDescriptor factory.
    • 
      add gpu concat op (#931) · 57d58e50
      Fenglei authored
      * add concat op
      
      * change to concat
      
      * add more code for gpu concat
      
      * compiles successfully, with a bug
      
      * add emit_concat_op
      
      * runnable with wrong result
      
      * working version
      
      * add some comments
      
      * delete old comments.
      
      * delete old comments.
      
      * remove buggy doxygen comments
    • 
      Fixes compiler warning. (#974) · acc4e46d
      Christian Convey authored
    • 
      Algebraic Simplification for Product (#949) · 659d2565
      Nick Korovaiko authored
      * product simplifier
      
      * char -> signed char
    • 
      Computation reuse (#945) · 41c50b44
      Jayaram Bobba authored
      * Make temp memory pools static to avoid memory allocation overheads
      
      * Initial implementation for graph control to enable caching and computation reuse
      
      * Added sphinx documentation
      
      * Turned off memory buffer reuse in CPU transformer to support computation reuse. Added unit test
      
      * Change memoizable to cacheable
      
      * Change memoizable to cacheable
      
      * Rename variables
    • 
      MaxPoolWithIndices (#900) · a174c8c9
      Nick Korovaiko authored
      * MaxPoolWithIndices CPU Fusion
      
      * fix test to pass checks in cpu_fusion
      
      * pass test
      
      * clean up
      
      * add a new pass, add layouts
      
      * remove the opt from cpu_fusion
      
      * refactor cpu_layout logic for maxpool, clean up comments
      
      * add comment w.r.t. indices tensor
      
      * rename to cpu_workspace_insertion
      
      * add CPUWorkspaceInsertion pass for TF
  3. 07 May, 2018 2 commits
  4. 06 May, 2018 1 commit
  5. 05 May, 2018 1 commit
    • 
      add gpu reverse (#952) · af946b7d
      Fenglei authored
      * add code to gpu reverse
      
      * add reverse emitter and kernel builder
      
      * working version
  6. 04 May, 2018 9 commits
  7. 03 May, 2018 2 commits
  8. 01 May, 2018 3 commits
  9. 30 Apr, 2018 2 commits
  10. 27 Apr, 2018 2 commits
  11. 26 Apr, 2018 6 commits
  12. 25 Apr, 2018 2 commits
    • 
      CUDNN BatchNorm (inference/forward/backward) (#893) · 23ac5e5a
      Chris Sullivan authored
      * Added cudnn batch norm operation to GPU transformer.
      Brought batchnorm tests out of cpu_tests and into
      backend_tests. Need to add JIRA ticket for interpreter
      SKIPS.
      
      * CUDNN batchnorm is implemented. In the ForwardTraining branch
      CUDNN seems to calculate the batch mean correctly but the batch variance incorrectly.
      Currently the batchnorm output and mean are calculated correctly for tests:
      * GPU.batchnorm_fprop_b2c2h3w3_mean_var
      * GPU.batchnorm_fprop_b1c2h2w2
      * GPU.batchnorm_fprop_b2c2h2w1
      but the variance for the batches in these tests is calculated incorrectly by CUDNN.
      
      Also added an additional test and cleaned up some of the old tests.
      
      * MKLDNN internally utilizes the biased estimate of the population variance
      and the tests have been crafted to suit MKLDNN. According to the original
      batchnorm publication (https://arxiv.org/pdf/1502.03167v3.pdf), population
      (unbiased) statistics should be used for inference, and mini-batch (biased)
      statistics should be used during training (forward/backward). For the variance this
      means utilizing the following equations, respectively:
      
        (biased)   Var[X] = 1/m * Sum_i(x_i-mu)^2      :: used in training
        (unbiased) Var[X] = 1/(m-1) * Sum_i(x_i-mu)^2  :: used in inference
      
        s.t. x_i are elements of X and m = N*D*H*W.
      
      For large batch sizes in inference this may not impact convergence as m >> 1,
      but for small batch sizes it will. CUDNN internally utilizes the unbiased
      variance.
      
      Changes:
      * Added Multiply op to Forward pass of batchnorm to convert
        the unbiased variance to a biased one. The op utilizes the
        blending scaling factors to apply the bias factor.
      * Adds emission for the BatchNormBackprop kernel and cleans up
        the emitter implementation.
      
      * Added hashing to cudnn::batchnorm op.
      
      * Formatting.
      
      * Changed hashing of epsilon in cudnn batchnorm.
      
      * Remove implicit conversion and default case in switch for bn.
      
      * Added skips for IE transformer on batchnorm.
      
      * add cudnn include path to compiler.cpp
      
      * separate two paths
      
      * PRs #892 and #825, which were recently merged, both forgot skips for the GPU backend.
      Adding them in as they are unimplemented ops.
      
      * The allocation and deletion of primitives was occurring in separate
      translation units with raw C pointers. Because of this, it was not
      clear that these were being freed appropriately, nor did it indicate
      ownership of the pointers.
      
      In this commit these raw pointers have been converted over to
      std::unique_ptrs such that the construction/destruction is managed
      automatically. Furthermore, GPUPrimitiveEmitter::insert now only
      takes an r-value reference, requiring move-semantics to indicate
      that when inserting a primitive, the GPUPrimitiveEmitter takes
      ownership of the pointer.
      
      All instances of primitive creation have been modified.
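The move-only insert described above can be sketched as follows (a hypothetical simplification; the real GPUPrimitiveEmitter differs):

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical simplification, not the real nGraph class. insert() takes
// the unique_ptr by r-value reference, so callers must write std::move(...),
// making the transfer of ownership explicit; destruction is then automatic.
struct GPUPrimitive
{
    // stand-in for a compiled GPU primitive
};

class GPUPrimitiveEmitter
{
public:
    size_t insert(std::unique_ptr<GPUPrimitive>&& primitive)
    {
        m_primitives.push_back(std::move(primitive));
        return m_primitives.size() - 1; // index later used to invoke the primitive
    }

private:
    std::vector<std::unique_ptr<GPUPrimitive>> m_primitives; // freed automatically
};
```

A caller writes `emitter.insert(std::move(p))`; passing the pointer without std::move fails to compile, which is exactly the ownership signal the commit describes.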
      
      * CUDNN_SAFE_CALL
      
      * Removed redundant comment and made variable names more verbose.
      
      * Change from conditionals to case-switch in pooling to conform to
      batchnorm per @fengleitian's suggestion.
    • 
      add cudnn include path to compiler.cpp (#902) · b0421577
      Fenglei authored
      * add cudnn include path to compiler.cpp
      
      * separate two paths
      
      * Skipping one_hot tests for CPU as
      CI is failing. JIRA bug report: https://jira01.devtools.intel.com/browse/NGRAPH-1682.