- 13 Jul, 2018 11 commits
-
-
Chris Sullivan authored
* Refactored GPU backend state into BackendContext and moved it to the highest level GPU_Backend. Some bugs have appeared in so doing. Needs investigation. * extra *block_size * change grid_size to threads * Bug fix in softmax cache parameters. * Additional bug fix for maxpool1d cache parameters. * Remove temporary print statements. * Use nthreads in primitive hash. * Switched from using stack references for cudnn and cublas handles to heap pointers held only by the C struct GPURuntimeContext but managed by the GPU_Backend. * Refactored the use of GPURuntimeContext* ctx throughout the emitters. * Use std::prev instead of operator-- for memory iterator capture * bug fix from abaf1d7
-
dmyershov authored
* Backend/API: Implementation of the call method for IntelGPU * intel_gpu_style_fix_1199 * Copy memory from clDNN to Tensor * Code style fix in 1199.2
-
Nick Korovaiko authored
* get_subgraph_outputs * simplify the condition
-
Robert Kimball authored
-
Jaikrishnan Menon authored
-
Jayaram Bobba authored
* CPU Direct Execution: Implement ConvertLayout and refactor * CPU Direct Execution: Implement Convolution * 1) Adds computation reuse to direct execution 2) Add avg_pool, broadcast and convolution_bias to direct execution 3) Moved some computation reuse utility functions to graph_utils * Use lists instead of vectors to avoid reallocation overheads * - Style fix * style fix
-
Fenglei authored
* refactor external function * working version * fix bug * add emit_functions, emit_declare_constants, emit_declare_functions * add std:: * add functions declaration * fix bugs * fix bugs * separate temp memory allocation and release * add invoke_constant_ptr function, clean up outputs for function * fix bugs, compiled ok * add ctx to emit_declare_constant * cleanup code, code style * remove using std, code style * revert std changes * change function names based on Chris's comments * add ResultCopyElimination to pass_manager * clang format
-
shssf authored
* Backend/API: Implementation of ADD and MUL operations in the compile method for IntelGPU * Branch merge conflicts resolved * Parameters number check moved to function. RESULT operation handling added.
-
Louis Feng authored
-
Chris Sullivan authored
* Bug fix in softmax cache parameters. * Additional bug fix for maxpool1d cache parameters. * Formatting. * Use nthreads in primitive hash.
-
Fenglei authored
* add gpu_timer to external function * compiled version * working version * using block_begin and block_end * add the missing ' ;' * move slice to cuda emitter * change size_t to uint32_t in kernel * working version * change block size from 1 to 64 * fix bugs * nthreads need to be size_t in broadcast op * add rank to kernel name hash * change reshape to cuda_emitter * fix bugs * bug, remove rank from kernel * clang format * update slice in convolution * resolve index conflict * change align to align_to_blocksize, add overflow check * add grid size check and fix pool merge bug * code style, change names * fix merge conflict * change kernel_runner to kernel_launch
-
- 12 Jul, 2018 4 commits
-
-
Louis Feng authored
* reshape in place without copying data if possible. * added reshape and broadcast to CSE. * Fixed debug messages.
-
Robert Kimball authored
* remove custom install path * fix travis build * Add NGRAPH_INSTALL_PREFIX as an alias for CMAKE_INSTALL_PREFIX to make our unit tests pass. * change install path setting
-
Robert Kimball authored
* open only the unversioned library but check that it is built against the correct version of ngraph * review comments
-
Fenglei authored
* add CUDA_SAFE_CALL to all cuda calls * add CUDA_RT_SAFE_CALL * add null ptr check before free * init pointer to nullptr * consolidate conditions
-
- 11 Jul, 2018 2 commits
-
-
Jaikrishnan Menon authored
* CPU Direct Execution: Implement ConvertLayout and refactor * CPU Direct Execution: Implement Convolution
-
Pruthvi authored
-
- 10 Jul, 2018 1 commit
-
-
Adam Rogowiec authored
* Enable retrieving data from Constant in python. * Test on wide value range.
-
- 09 Jul, 2018 4 commits
-
-
Robert Kimball authored
* Faster liveness. Memory manager optimized for non-sharing of tensors. Add pass manager profiler. * Move pass profiler to a separate PR * Move Memory Layout optimizations to a separate PR * use find instead of count
-
Robert Kimball authored
* Cache some generated functions in backwards tests to speed performance * more caching
-
Michał Karzyński authored
-
Robert Kimball authored
Better CI performance
-
- 08 Jul, 2018 2 commits
-
-
Robert Kimball authored
* Memory Layout pass optimizations * rename SIMPLE memory allocator
-
Robert Kimball authored
-
- 07 Jul, 2018 4 commits
-
-
shssf authored
-
Robert Kimball authored
* complete the new backend construction/destruction API * close each dlopen * don't close libraries for now as it causes python to segfault
-
Nick Korovaiko authored
-
Pruthvi authored
-
- 06 Jul, 2018 4 commits
-
-
Jayaram Bobba authored
* inplace compute * fix warnings * Initial support for convolution sum fusion * Added in-place support for conv sum fusion and test cases * reverting spurious changes * Bug fix to account for inplace input in conv sum fusion * fix compilation error * Addressed PR feedback * Handle corner cases for conv sum fusion. Skip computation reuse while using an inplace kernel * Check node argument for in-place relu assignment * Addressed PR comments * Addressed PR feedback
-
Nishant Patel authored
* Usage of mkldnn reshape updated * update reshape condition for mkldnn * Add a test case and order in which conditions are checked
-
Nick Korovaiko authored
* collect matched nodes * clear m_matched_list * tests * address feedback
-
Adam Rogowiec authored
-
- 05 Jul, 2018 4 commits
-
-
Scott Cyphers authored
* Fix short markup * Minor adjustments, license requirements.
-
Nick Korovaiko authored
-
Fenglei authored
* extra *block_size * change grid_size to threads
-
Yixing Lao authored
-
- 04 Jul, 2018 1 commit
-
-
Artur Wojcik authored
-
- 03 Jul, 2018 3 commits
-
-
Adam Procter authored
-
Louis Feng authored
* hacking to support dot of 3 by 2 inputs with gemm_batch. * clean up.
-
Robert Kimball authored
* nbench cleanup * update style
-