1. 13 Jul, 2018 11 commits
    • Refactored GPU backend state into BackendContext (#1186) · 55a25d41
      Chris Sullivan authored
      * Refactored GPU backend state into BackendContext and moved it to the highest level GPU_Backend.
      Some bugs have appeared in so doing. Needs investigation.
      
      * extra *block_size
      
      * change grid_size to threads
      
      * Bug fix in softmax cache parameters.
      
      * Additional bug fix for maxpool1d cache parameters.
      
      * Remove temporary print statements.
      
      * Use nthreads in primitive hash.
      
      * Switched from stack references to heap pointers for the cuDNN and cuBLAS handles; the pointers are held only by the C struct GPURuntimeContext but are managed by the GPU_Backend.
      
      * Refactored the use of GPURuntimeContext* ctx throughout the emitters.
      
      * Use std::prev instead of operator-- for memory iterator capture
      
      * bug fix from abaf1d7
    • Backend/API: Implementation of the call method for IntelGPU (#1199) · 8bde818c
      dmyershov authored
      * Backend/API: Implementation of the call method for IntelGPU
      
      * intel_gpu_style_fix_1199
      
      * Copy memory from clDNN to Tensor
      
      * Code style fix in 1199.2
    • get_subgraph_outputs (towards checking that intermediate nodes in a matched graph are not used) (#1207) · 83e7dba5
      Nick Korovaiko authored
      * get_subgraph_outputs
      
      * simplify the condition
    • minor speed increase (#1218) · 33b54ce1
      Robert Kimball authored
    • 346f480f
      Jaikrishnan Menon authored
    • Jbobba/dex computation reuse (#1219) · 7d59542d
      Jayaram Bobba authored
      * CPU Direct Execution: Implement ConvertLayout and refactor
      
      * CPU Direct Execution: Implement Convolution
      
      * 1) Adds computation reuse to direct execution
      2) Add avg_pool, broadcast and convolution_bias to direct execution
      3) Moved some computation reuse utility functions to graph_utils
      
      * Use lists instead of vectors to avoid reallocation overheads
      
      * Style fix
      
      * style fix
    • gpu_external_function and gpu constant memory refactor (#1189) · 260cb90d
      Fenglei authored
      * refactor external function
      
      * working version
      
      * fix bug
      
      * add emit_functions, emit_declare_constants, emit_declare_functions
      
      * add std::
      
      * add functions declaration
      
      * fix bugs
      
      * fix bugs
      
      * separate temp memory allocation and release
      
      * add invoke_constant_ptr function, clean up outputs for function
      
      * fix bugs, compiled ok
      
      * add ctx to emit_declare_constant
      
      * cleanup code, code style
      
      * remove using std, code style
      
      * revert std changes
      
      * change function names based on Chris's comments
      
      * add ResultCopyElimination to pass_manager
      
      * clang format
    • Backend/API: Implementation of ADD and MUL operations in the compile() (#1200) · 2c345798
      shssf authored
      * Backend/API: Implementation of ADD and MUL operations in the compile method for IntelGPU
      
      * Branch merge conflicts resolved
      
      * Parameter count check moved to a function. RESULT operation handling added.
    • 268853d0
      Louis Feng authored
    • Fix incorrect hash strings for softmax and 1d maxpool. (#1195) · 4659d60d
      Chris Sullivan authored
      * Bug fix in softmax cache parameters.
      
      * Additional bug fix for maxpool1d cache parameters.
      
      * Formatting.
      
      * Use nthreads in primitive hash.
    • gpu reshape optimization (#1174) · b5e69eaa
      Fenglei authored
      * add gpu_timer to external function
      
      * compiled version
      
      * working version
      
      * using block_begin and block_end
      
      * add the missing ';'
      
      * move slice to cuda emitter
      
      * change size_t to uint32_t in kernel
      
      * working version
      
      * change block size from 1 to 64
      
      * fix bugs
      
      * nthreads need to be size_t in broadcast op
      
      * add rank to kernel name hash
      
      * change reshape to cuda_emitter
      
      * fix bugs
      
      * bug, remove rank from kernel
      
      * clang format
      
      * update slice in convolution
      
      * resolve index conflict
      
      * change align to align_to_blocksize, add overflow check
      
      * add grid size check and fix pool merge bug
      
      * code style, change names
      
      * fix merge conflict
      
      * change kernel_runner to kernel_launch
  2. 12 Jul, 2018 4 commits
  3. 11 Jul, 2018 2 commits
  4. 10 Jul, 2018 1 commit
  5. 09 Jul, 2018 4 commits
  6. 08 Jul, 2018 2 commits
  7. 07 Jul, 2018 4 commits
  8. 06 Jul, 2018 4 commits
  9. 05 Jul, 2018 4 commits
  10. 04 Jul, 2018 1 commit
  11. 03 Jul, 2018 3 commits