    CUDNN BatchNorm (inference/forward/backward) (#893) · 23ac5e5a
    Chris Sullivan authored
    * Added the CUDNN batchnorm operation to the GPU transformer.
    Moved the batchnorm tests out of cpu_tests and into
    backend_tests. A JIRA ticket still needs to be filed for the interpreter
    SKIPs.
    
    * CUDNN batchnorm is implemented. In the ForwardTraining branch,
    CUDNN appears to compute the batch mean correctly but the batch variance incorrectly.
    The batchnorm output and mean are currently computed correctly for these tests:
    * GPU.batchnorm_fprop_b2c2h3w3_mean_var
    * GPU.batchnorm_fprop_b1c2h2w2
    * GPU.batchnorm_fprop_b2c2h2w1
    but CUDNN computes the batch variance for these tests incorrectly.
    
    Also added an additional test and cleaned up some of the old tests.
    
    * MKLDNN internally uses the biased estimate of the population variance,
    and the tests were crafted to suit MKLDNN. According to the original
    batchnorm publication (https://arxiv.org/pdf/1502.03167v3.pdf), population
    (unbiased) statistics should be used for inference, and mini-batch (biased)
    statistics should be used for training (forward/backward). For the variance,
    this means using the following equations:
    
      (biased)   Var[X] = 1/m * Sum_i(x_i-mu)^2      :: used in training
      (unbiased) Var[X] = 1/(m-1) * Sum_i(x_i-mu)^2  :: used in inference
    
      where x_i are the elements of X, mu is their mean, and m = N*D*H*W.
    
    For large batch sizes (m >> 1) the two estimates are nearly identical, since
    they differ only by the factor m/(m-1), but for small batch sizes the
    difference is significant. CUDNN internally uses the unbiased variance.
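To make the size of the gap concrete, both estimates can be computed side by side. The following is a minimal C++ sketch, not code from this commit; the function names are hypothetical, and `x` stands for the m elements that are normalized together.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Mean (mu) of the m elements normalized together.
double mean(const std::vector<double>& x) {
    double sum = 0.0;
    for (double v : x) sum += v;
    return sum / static_cast<double>(x.size());
}

// Sum_i (x_i - mu)^2, shared by both estimates.
double sum_sq_dev(const std::vector<double>& x) {
    const double mu = mean(x);
    double ss = 0.0;
    for (double v : x) ss += (v - mu) * (v - mu);
    return ss;
}

// Biased (mini-batch) estimate, used in training: 1/m * Sum_i (x_i - mu)^2.
double biased_variance(const std::vector<double>& x) {
    return sum_sq_dev(x) / static_cast<double>(x.size());
}

// Unbiased (population) estimate, used in inference: 1/(m-1) * Sum_i (x_i - mu)^2.
// Requires m > 1.
double unbiased_variance(const std::vector<double>& x) {
    return sum_sq_dev(x) / static_cast<double>(x.size() - 1);
}
```

For x = {1, 2, 3, 4} (m = 4), the biased estimate is 1.25 and the unbiased estimate is 5/3, a ratio of m/(m-1) = 4/3; for m = 1000 the ratio is about 1.001.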
    
    Changes:
    * Added a Multiply op to the forward pass of batchnorm to convert
      the unbiased variance to a biased one. The op uses the
      blending scaling factors to apply the bias factor (m-1)/m.
    * Added emission for the BatchNormBackprop kernel and cleaned up
      the emitter implementation.
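The variance conversion above amounts to scaling each unbiased variance by (m-1)/m. The sketch below emulates this on the host, assuming an element-wise multiply with blending factors of the form out = (alpha1*a)*(alpha2*b) + beta*out, modeled loosely on CUDNN's alpha/beta convention. `blended_multiply` and `unbiased_to_biased` are hypothetical names, and passing a vector of ones as the second operand is an assumption about how the factor could be folded in, not necessarily what the emitter does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side emulation of a blended element-wise multiply:
//   out[i] = (alpha1 * a[i]) * (alpha2 * b[i]) + beta * out[i]
void blended_multiply(const std::vector<double>& a, const std::vector<double>& b,
                      double alpha1, double alpha2, double beta,
                      std::vector<double>& out) {
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = (alpha1 * a[i]) * (alpha2 * b[i]) + beta * out[i];
    }
}

// Convert per-channel unbiased variances to biased ones by folding the
// (m - 1) / m bias factor into alpha1. b is all ones, so it passes a through.
std::vector<double> unbiased_to_biased(const std::vector<double>& unbiased_var,
                                       std::size_t m) {
    const double bias_factor =
        static_cast<double>(m - 1) / static_cast<double>(m);
    const std::vector<double> ones(unbiased_var.size(), 1.0);
    std::vector<double> biased(unbiased_var.size(), 0.0);
    // beta = 0, so the previous contents of `biased` do not contribute.
    blended_multiply(unbiased_var, ones, bias_factor, 1.0, 0.0, biased);
    return biased;
}
```

With m = 4, unbiased variances {5/3, 4} become biased variances {1.25, 3}.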
    
    * Added hashing to cudnn::batchnorm op.
    
    * Formatting.
    
    * Changed hashing of epsilon in cudnn batchnorm.
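The commit does not say how epsilon is hashed. As an illustration only, the hypothetical key builder below serializes epsilon exactly with `std::hexfloat`. This avoids a pitfall of `std::to_string(double)`, whose fixed six-decimal format maps both 1e-7 and 1e-8 to "0.000000", so two primitives differing only in epsilon would share a key. The key layout and function name are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical cache key for a batchnorm primitive built from its direction,
// tensor shape, and epsilon. Primitives with equal keys could be reused.
std::string batchnorm_hash(const std::string& direction,
                           const std::vector<std::size_t>& shape,
                           double epsilon) {
    std::ostringstream key;
    key << "batchnorm_" << direction << "_s";
    for (std::size_t dim : shape) key << dim << "_";
    // hexfloat writes the exact binary value, so distinct epsilons
    // produce distinct keys.
    key << "eps" << std::hexfloat << epsilon;
    return key.str();
}
```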
    
    * Removed the implicit conversion and the default case in the batchnorm switch.
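Assuming the switch dispatches on the batchnorm direction, the sketch below shows why dropping the default case helps: with a scoped enum (no implicit conversion to or from int) and no default, GCC and Clang warn under -Wswitch (enabled by -Wall) when an enumerator goes unhandled. `BatchNormProp` and `emit_batchnorm` are hypothetical; the returned strings are the names of the corresponding CUDNN entry points.

```cpp
#include <cassert>
#include <string>

// Hypothetical enumeration of the three batchnorm paths named in the title.
// An enum class does not convert implicitly to or from int.
enum class BatchNormProp { Inference, Forward, Backward };

// No default case: if a new enumerator is added but not handled here,
// -Wswitch reports it at compile time instead of silently falling through.
std::string emit_batchnorm(BatchNormProp prop) {
    switch (prop) {
    case BatchNormProp::Inference: return "cudnnBatchNormalizationForwardInference";
    case BatchNormProp::Forward: return "cudnnBatchNormalizationForwardTraining";
    case BatchNormProp::Backward: return "cudnnBatchNormalizationBackward";
    }
    return ""; // Unreachable for valid enumerators; avoids a missing-return warning.
}
```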
    
    * Added skips for IE transformer on batchnorm.
    
    * Added the CUDNN include path to compiler.cpp.
    
    * Separated the two paths.
    
    * PRs #892 and #825, which were recently merged, both omitted skips for the GPU backend.
    Added them here, as these ops are not yet implemented on the GPU.
    
    * The allocation and deletion of primitives were occurring in separate
    translation units with raw C pointers. Because of this, it was not
    clear that these were being freed appropriately, nor was the
    ownership of the pointers indicated.
    
    In this commit these raw pointers have been converted over to
    std::unique_ptrs such that the construction/destruction is managed
    automatically. Furthermore, GPUPrimitiveEmitter::insert now only
    takes an r-value reference, requiring move-semantics to indicate
    that when inserting a primitive, the GPUPrimitiveEmitter takes
    ownership of the pointer.
    
    All instances of primitive creation have been modified.
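The ownership pattern described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the actual GPUPrimitiveEmitter; `Primitive`, `AddPrimitive`, and `PrimitiveEmitter` are hypothetical stand-ins. Because `insert` takes only an rvalue reference, a caller must `std::move` its `std::unique_ptr` in, which makes the ownership transfer visible at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiled GPU primitive.
struct Primitive {
    virtual ~Primitive() = default;
    virtual int run() const = 0;
};

struct AddPrimitive : Primitive {
    int run() const override { return 1; }
};

// Emitter that owns every primitive it stores.
class PrimitiveEmitter {
public:
    // Accepts only an rvalue reference: callers must std::move ownership in.
    std::size_t insert(std::unique_ptr<Primitive>&& primitive) {
        m_primitives.push_back(std::move(primitive));
        return m_primitives.size() - 1; // Index used to look the primitive up later.
    }
    const Primitive& at(std::size_t index) const { return *m_primitives.at(index); }
    std::size_t size() const { return m_primitives.size(); }

private:
    // Each primitive is freed automatically when the emitter is destroyed.
    std::vector<std::unique_ptr<Primitive>> m_primitives;
};
```

Calling `emitter.insert(p)` with a named `std::unique_ptr` does not compile; the caller has to write `emitter.insert(std::move(p))`, after which `p` is null and the emitter owns the primitive.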
    
    * CUDNN_SAFE_CALL
    
    * Removed redundant comment and made variable names more verbose.
    
    * Changed pooling from conditionals to a switch statement to conform to
    batchnorm, per @fengleitian's suggestion.