- 18 Jul, 2018 8 commits
-
-
shssf authored
* IntelGPUBackend: BatchNorm 5x1 operation
* Update intelgpu_op_batchnorm.cpp
* PR1244 Comments are addressed
-
Robert Kimball authored
-
Jayaram Bobba authored
-
Adam Procter authored
* Fix a segfault in the strided conv optimization
* Only bail if all *live* users are not Convolution
-
L.S. Cook authored
* Draft of updates for JIRA tasks WIP
* meta to correct spelling
-
Amy Zhuang authored
* Modify TBB graph nodes creation and deletion
* Add a graph* member to CPURuntimeContext.
* Create nodes the first time a function is called; all subsequent calls only execute the computation.
* Delete nodes when cleanup_runtime_context is called.
* Add TBB global_control* and task_scheduler_init* members to CPURuntimeContext.
* Remove one comment. Do not write two TBB header files and one #define to generated C++ source code.
* Move TBB header file and #define before other header files in generated C++ source code.
* Move one comment to the top in generated C++ source code.
-
Nick Korovaiko authored
* inplace results
* fix parameter propagation
* fix python tests
-
Robert Kimball authored
* change GPU to use cfe pass
* update per review comments
-
- 17 Jul, 2018 2 commits
-
-
Jaikrishnan Menon authored
-
Jayaram Bobba authored
* CPU Direct Execution: Implement ConvertLayout and refactor
* CPU Direct Execution: Implement Convolution
* 1) Adds computation reuse to direct execution
  2) Add avg_pool, broadcast and convolution_bias to direct execution
  3) Moved some computation reuse utility functions to graph_utils
* Use lists instead of vectors to avoid reallocation overheads
* - Added convolution variants to direct execution
  - Removed ConvolutionBiasRelu, use ConvolutionBias instead
  - Reduced code duplication by moving functionality to mkldnn_emitter from cpu_emitter
* Style fix
* Moved mkldnn build_convolution to a templated method
* Style fix
* refactored mkldnn conv bprop builders
* Style fix
-
- 14 Jul, 2018 4 commits
-
-
Robert Kimball authored
Move long-building tests to be the first tests built, in the hope of reducing build time. (#1229)
-
Robert Kimball authored
-
Fenglei authored
* using async gpu timers
* remove sync for cuda calls, add async gpu stopwatch, add count to timing-detail
* add debug sync
* make timer static
* move timer to runtime context
-
L.S. Cook authored
* Draft of updates for JIRA tasks WIP
* correct typo
* more cleanup
* more cleanup
-
- 13 Jul, 2018 11 commits
-
-
Chris Sullivan authored
* Refactored GPU backend state into BackendContext and moved it to the highest level GPU_Backend. Some bugs have appeared in so doing. Needs investigation.
* extra *block_size
* change grid_size to threads
* Bug fix in softmax cache parameters.
* Additional bug fix for maxpool1d cache parameters.
* Bug fix in softmax cache parameters.
* Additional bug fix for maxpool1d cache parameters.
* Remove temporary print statements.
* Use nthreads in primitive hash.
* Switched from using stack references for cudnn and cublas handles to heap pointers held only by the c-struct GPURuntimeContext but managed by the GPU_Backend.
* Refactored the use of GPURuntimeContext* ctx throughout the emitters.
* Use std::prev instead of operator-- for memory iterator capture
* bug fix from abaf1d7
-
dmyershov authored
* Backend/API: Implementation of the call method for IntelGPU
* intel_gpu_style_fix_1199
* Copy memory from clDNN to Tensor
* Code style fix in 1199.2
-
Nick Korovaiko authored
* get_subgraph_outputs
* simplify the condition
-
Robert Kimball authored
-
Jaikrishnan Menon authored
-
Jayaram Bobba authored
* CPU Direct Execution: Implement ConvertLayout and refactor
* CPU Direct Execution: Implement Convolution
* 1) Adds computation reuse to direct execution
  2) Add avg_pool, broadcast and convolution_bias to direct execution
  3) Moved some computation reuse utility functions to graph_utils
* Use lists instead of vectors to avoid reallocation overheads
* Style fix
* style fix
-
Fenglei authored
* refactor external function
* working version
* fix bug
* add emit_functions, emit_declare_constants, emit_declare_functions
* add std::
* add functions declaration
* fix bugs
* fix bugs
* separate temp memory allocation and release
* add invoke_constant_ptr function, clean up outputs for function
* fix bugs, compiled ok
* add ctx to emit_declare_constant
* cleanup code, code style
* remove using std, code style
* revert std changes
* change function names based on Chris's comments
* add ResultCopyElimination to pass_manager
* clang format
-
shssf authored
* Backend/API: Implementation of ADD and MUL operations in the compile method for IntelGPU
* Branch merge conflicts resolved
* Parameters number check moved to function. RESULT operation handling added.
-
Louis Feng authored
-
Chris Sullivan authored
* Bug fix in softmax cache parameters.
* Additional bug fix for maxpool1d cache parameters.
* Formatting.
* Use nthreads in primitive hash.
-
Fenglei authored
* add gpu_timer to external function
* compiled version
* working version
* using block_begin and block_end
* add the missing ' ;'
* move slice to cuda emitter
* change size_t to uint32_t in kernel
* working version
* change block size from 1 to 64
* fix bugs
* nthreads need to be size_t in broadcast op
* add rank to kernel name hash
* change reshape to cuda_emitter
* fix bugs
* bug, remove rank from kernel
* clang format
* update slice in convolution
* resolve index conflict
* change align to align_to_blocksize, add overflow check
* add grid size check and fix pool merge bug
* code style, change names
* fix merge conflict
* change kernel_runner to kernel_launch
-
- 12 Jul, 2018 4 commits
-
-
Louis Feng authored
* reshape inplace without copying data if possible.
* added reshape and broadcast to CSE.
* Fixed debug messages.
-
Robert Kimball authored
* remove custom install path
* fix travis build
* Add NGRAPH_INSTALL_PREFIX as an alias for CMAKE_INSTALL_PREFIX to make our unit tests pass.
* change install path setting
-
Robert Kimball authored
* open only the unversioned library but check that it is built against the correct version of ngraph
* review comments
-
Fenglei authored
* add CUDA_SAFE_CALL to all cuda calls
* add CUDA_RT_SAFE_CALL
* add null ptr check before free
* init pointer to nullptr
* consolidate conditions
-
- 11 Jul, 2018 2 commits
-
-
Jaikrishnan Menon authored
* CPU Direct Execution: Implement ConvertLayout and refactor
* CPU Direct Execution: Implement Convolution
-
Pruthvi authored
-
- 10 Jul, 2018 1 commit
-
-
Adam Rogowiec authored
* Enable retrieving data from Constant in python.
* Test on wide value range.
-
- 09 Jul, 2018 4 commits
-
-
Robert Kimball authored
* Faster liveness. Memory manager optimized for non-sharing of tensors. Add pass manager profiler.
* Move pass profiler to a separate PR
* Move Memory Layout optimizations to a separate PR
* use find instead of count
-
Robert Kimball authored
* Cache some generated functions in backwards tests to speed performance
* more caching
-
Michał Karzyński authored
-
Robert Kimball authored
Better CI performance
-
- 08 Jul, 2018 2 commits
-
-
Robert Kimball authored
* Memory Layout pass optimizations
* rename SIMPLE memory allocator
-
Robert Kimball authored
-
- 07 Jul, 2018 2 commits
-
-
shssf authored
-
Robert Kimball authored
* complete the new backend construction/destruction API
* close each dlopen
* don't close libraries for now as it causes python to segfault
-