Fenglei authored
* add some helper function
* update with new helper function
* update reduce to nd with new helper function
* update float sum to stable sum
* fix bug
* update all reduce to stable sum for float
* fix bug and pass the sum stable test
* remove debug info
* style
* update with shape
* fix bug
* add host parameters to cuda_emitter
* clang format
* fix bugs
* add element::type support
* format
* add a cached value with datatype name
* add init_reduce_value
* unroll loop
* optimization
* remove the need for init_value
* add memset kernel
* add memcpy
* working version
* remove debug info
* add comments, clean up code.
* change in_idx to input_idx
* fix bug
* change args name for memset in emitter
* pass element::Type instead of string
* the op::reduce come with init value, add support
* resolve codacy-bot comment
* fix bug
* resolve codacy-bot comment
* add soft_max_block_reduce kernel
* fix bugs
* add softmax_block_reduce to cuda_emitter
* compiling ok, result wrong
* fix bug in kernel
* working version
* removed unused code
* remove unused comments, resolve comments
* cuda reduce for max, min, mul, reduce op init value, format
* use type::info
* use type info for numeric_limits
* remove code from gpu_host_parameters
* header
* remove outdated comments
* add helper to check if stable sum is needed
* add stable sum test for double
* remove extra line
* consolidate helper functions
* no need list now.
* remove extra ;
* clang format
* style
* add skip test for cpu and intelGPU side
* resolve more conflict
* update comment
* fix a warning
* Update src/ngraph/runtime/gpu/gpu_cuda_kernel_builder.cpp using load. Co-Authored-By: fengleitian <35274053+fengleitian@users.noreply.github.com>
* using WARPSIZE instead of 32, using lambda
* more WARPSIZE instead of 32
* fix block_size_x bug
* using __expf
a3133482
Name | Last commit | Last update |
---|---|---|
.. | ||
ngraph | ||
resource | ||
tools | ||
CMakeLists.txt |