- 20 Jul, 2018 13 commits
-
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
Also modify existing kernel so it works within the builder framework
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
and remove broadcast, which will be replaced with an Eigen implementation
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
This allows op builders to be self-contained changesets
-
- 19 Jul, 2018 2 commits
-
-
L.S. Cook authored
* update version and add glossary defs
* clean up graph rewrite code blocks
* PR feedback
* add better details to LSTM def
* RNN def generalized
* adding fancy formulas to RNN def glossary entry
* Address API breaking change in PR 1164
* all of the documentation re default install path needed to be updated with PR 1164
* Assert manual compilation process to build ngraph_dist locally as a sensible default
-
shssf authored
* IntelGPUBackend: const, div, maxpool and max operations
* IntelGPUBackend: negative, abs, relu, sqrt, tanh and subtract operations
* Update intelgpu_backend.cpp
-
- 18 Jul, 2018 13 commits
-
-
Robert Kimball authored
* make pool test check backends other than CPU
* more unit test cleanup
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
-
Artur Wojcik authored
* onnx: add 'constant' operator
* onnx: getting attribute value by name
* onnx: fix code style
* onnx: fix clang compilation warnings
* onnx: incorporate review comments
Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
-
Nick Korovaiko authored
* cpu loop kernel fusion pass
* remove extra code
* bounded relu test
* address Scott's feedback
-
shssf authored
* IntelGPUBackend: BatchNorm 5x1 operation
* Update intelgpu_op_batchnorm.cpp
* PR1244 comments are addressed
-
Robert Kimball authored
-
Jayaram Bobba authored
-
Adam Procter authored
* Fix a segfault in the strided conv optimization
* Only bail if all *live* users are not Convolution
-
L.S. Cook authored
* Draft of updates for JIRA tasks WIP
* meta to correct spelling
-
Amy Zhuang authored
* Modify TBB graph nodes creation and deletion
* Add a graph* member to CPURuntimeContext.
* Create nodes the first time a function is called; all following calls only execute the computation.
* Delete nodes when cleanup_runtime_context is called.
* Add TBB global_control* and task_scheduler_init* members to CPURuntimeContext.
* Remove one comment. Do not write two TBB header files and one #define to generated C++ source code.
* Move TBB header file and #define before other header files in generated C++ source code.
* Move one comment to the top in generated C++ source code.
-
Nick Korovaiko authored
* inplace results
* fix parameter propagation
* fix python tests
-
Robert Kimball authored
* change GPU to use cfe pass
* update per review comments
-
- 17 Jul, 2018 2 commits
-
-
Jaikrishnan Menon authored
-
Jayaram Bobba authored
* CPU Direct Execution: Implement ConvertLayout and refactor
* CPU Direct Execution: Implement Convolution
* 1) Adds computation reuse to direct execution
  2) Add avg_pool, broadcast and convolution_bias to direct execution
  3) Moved some computation reuse utility functions to graph_utils
* Use lists instead of vectors to avoid reallocation overheads
* - Added convolution variants to direct execution
  - Removed ConvolutionBiasRelu, use ConvolutionBias instead
  - Reduced code duplication by moving functionality to mkldnn_emitter from cpu_emitter
* Style fix
* Moved mkldnn build_convolution to a templated method
* Style fix
* refactored mkldnn conv bprop builders
* Style fix
-
- 14 Jul, 2018 4 commits
-
-
Robert Kimball authored
Move long-building tests to be the first tests built, in the hope of reducing build time. (#1229)
-
Robert Kimball authored
-
Fenglei authored
* using async gpu timers
* remove sync for cuda calls, add async gpu stopwatch, add count to timing-detail
* add debug sync
* make timer static
* move timer to runtime context
-
L.S. Cook authored
* Draft of updates for JIRA tasks WIP
* correct typo
* more cleanup
* more cleanup
-
- 13 Jul, 2018 6 commits
-
-
Chris Sullivan authored
* Refactored GPU backend state into BackendContext and moved it to the highest level GPU_Backend. Some bugs have appeared in so doing. Needs investigation.
* extra *block_size
* change grid_size to threads
* Bug fix in softmax cache parameters.
* Additional bug fix for maxpool1d cache parameters.
* Bug fix in softmax cache parameters.
* Additional bug fix for maxpool1d cache parameters.
* Remove temporary print statements.
* Use nthreads in primitive hash.
* Switched from using stack references for cudnn and cublas handles to heap pointers held only in the c-struct GPURuntimeContext but managed by the GPU_Backend.
* Refactored the use of GPURuntimeContext* ctx throughout the emitters.
* Use std::prev instead of operator-- for memory iterator capture
* bug fix from abaf1d7
-
dmyershov authored
* Backend/API: Implementation of the call method for IntelGPU
* intel_gpu_style_fix_1199
* Copy memory from clDNN to Tensor
* Code style fix in 1199.2
-
Nick Korovaiko authored
* get_subgraph_outputs
* simplify the condition
-
Robert Kimball authored
-
Jaikrishnan Menon authored
-
Jayaram Bobba authored
* CPU Direct Execution: Implement ConvertLayout and refactor
* CPU Direct Execution: Implement Convolution
* 1) Adds computation reuse to direct execution
  2) Add avg_pool, broadcast and convolution_bias to direct execution
  3) Moved some computation reuse utility functions to graph_utils
* Use lists instead of vectors to avoid reallocation overheads
* Style fix
* style fix
-