Commit a3133482 authored by Fenglei, committed by Scott Cyphers

nvgpu cuda softmax optimization (#2101)

* add some helper function

* update with new helper function

* update reduce to nd with new helper function

* update float sum to stable sum (see the sketch after this commit list)

* fix bug

* update all reduce ops to use stable sum for float

* fix bug and pass the stable sum test

* remove debug info

* style

* update with shape

* fix bug

* add host parameters to cuda_emitter

* clang format

* fix bugs

* add element::type support

* format

* add a cached value with datatype name

* add init_reduce_value

* unroll loop

* optimization

* remove the need for init_value

* add memset kernel

* add memcpy

* working version

* remove debug info

* add comments, clean up code.

* change in_idx to input_idx

* fix bug

* change args name for memset in emitter

* pass element::Type instead of string

* the op::reduce comes with an init value; add support

* resolve codacy-bot comment

* fix bug

* resolve codacy-bot comment

* add soft_max_block_reduce kernel

* fix bugs

* add softmax_block_reduce to cuda_emitter

* compiling ok, result wrong

* fix bug in kernel

* working version

* removed unused code

* remove unused comments, resolve comments

* cuda reduce for max, min, mul, reduce op init value, format

* use type::info

* use type info for numeric_limits

* remove code from gpu_host_parameters

* header

* remove outdated comments

* add helper to check if stable sum is needed

* add stable sum test for double

* remove extra line

* consolidate helper functions

* no need for the list now

* remove extra ;

* clang format

* style

* add skip test for cpu and intelGPU side

* resolve more conflicts

* update comment

* fix a warning

* Update src/ngraph/runtime/gpu/gpu_cuda_kernel_builder.cpp

using load.
Co-Authored-By: fengleitian <35274053+fengleitian@users.noreply.github.com>

* using WARPSIZE instead of 32, using lambda

* more WARPSIZE instead of 32

* fix block_size_x bug

* using __expf
parent 6584306c
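
A note on the "stable sum" commits above: a minimal sketch of compensated (Kahan) summation inside a grid-stride CUDA loop, assuming that is the flavor of stable accumulation the float reduce kernels switched to. The kernel and helper names below are illustrative only, not the strings emitted by the kernel builder.

    #include <cstddef>

    // Accumulate one value into (sum, compensation), carrying the rounding error forward.
    __device__ void stable_sum_step(float in, float& sum, float& compensation)
    {
        float y = in - compensation;  // fold the previously lost low-order bits back in
        float t = sum + y;            // this add may again drop low-order bits of y
        compensation = (t - sum) - y; // recover exactly what was dropped
        sum = t;
    }

    // Each thread builds a compensated partial sum over a grid-stride range; the
    // partials are then combined (shown here with a plain atomicAdd for brevity,
    // whereas the real kernels reduce within the block first).
    __global__ void reduce_sum_stable(const float* in, float* out, size_t n)
    {
        float sum = 0.0f;
        float c = 0.0f;
        for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += (size_t)gridDim.x * blockDim.x)
        {
            stable_sum_step(in[i], sum, c);
        }
        atomicAdd(out, sum);
    }

The compensation term keeps the accumulated rounding error bounded independently of the number of summands, which is what the "stable sum test for double" commit exercises.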
Diff excerpts (collapsed context omitted):
@@ -190,7 +190,7 @@ namespace ngraph
                                  size_t concat_axis,
                                  NVShape output_shape);
-            size_t build_softmax(const std::vector<std::string>& dtypes,
+            size_t build_softmax(const std::vector<element::Type>& dtypes,
                                  NVShape input_shape,
                                  NVShape reduce_axis);
@@ -208,6 +208,14 @@ namespace ngraph
                                                size_t out_rank,
                                                size_t reduce_rank);
+            static void get_softmax_block_reduce_op(codegen::CodeWriter& writer,
+                                                    const std::string& name,
+                                                    runtime::gpu::GPUKernelArgs& args,
+                                                    const std::vector<std::string>& data_types,
+                                                    size_t non_reduce_rank,
+                                                    size_t reduce_rank,
+                                                    size_t block_size_x);
+
             static void add_pod_typedefs(codegen::CodeWriter& writer);
             static void coordinate_transform_to_multi_d(codegen::CodeWriter& writer,
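
get_softmax_block_reduce_op emits the block-reduce softmax kernel as a string through codegen::CodeWriter, so the generated source never appears in the repository directly. A rough standalone sketch of the pattern the commits describe (one block per non-reduced slice, warp shuffles sized by WARPSIZE, __expf for the exponential); the names and the omitted max-subtraction pass are simplifications, not the emitted code.

    #define WARPSIZE 32

    // Sum a value across the lanes of one warp using shuffle-down.
    __device__ float warp_reduce_sum(float v)
    {
        for (int offset = WARPSIZE / 2; offset > 0; offset /= 2)
            v += __shfl_down_sync(0xffffffff, v, offset);
        return v;
    }

    // One block handles one non-reduced slice of length reduce_count.
    // blockDim.x is assumed to be a multiple of WARPSIZE.
    __global__ void softmax_block_reduce(const float* in, float* out, int reduce_count)
    {
        const float* row_in = in + blockIdx.x * reduce_count;
        float* row_out = out + blockIdx.x * reduce_count;

        // Per-thread partial sum of exp(x). The real kernel first subtracts the
        // row max for numerical range; that pass is omitted to keep the sketch short.
        float partial = 0.0f;
        for (int i = threadIdx.x; i < reduce_count; i += blockDim.x)
            partial += __expf(row_in[i]);

        // Reduce the partials across the block: shuffle within each warp, then once
        // more across the per-warp results staged in shared memory.
        __shared__ float warp_sums[WARPSIZE];
        __shared__ float block_sum;
        float v = warp_reduce_sum(partial);
        if (threadIdx.x % WARPSIZE == 0)
            warp_sums[threadIdx.x / WARPSIZE] = v;
        __syncthreads();
        float total = (threadIdx.x < blockDim.x / WARPSIZE) ? warp_sums[threadIdx.x] : 0.0f;
        total = warp_reduce_sum(total);
        if (threadIdx.x == 0)
            block_sum = total;
        __syncthreads();

        // Normalize.
        for (int i = threadIdx.x; i < reduce_count; i += blockDim.x)
            row_out[i] = __expf(row_in[i]) / block_sum;
    }

Shuffle-based reduction avoids a shared-memory round trip per step, which is presumably why the block_size_x parameter and WARPSIZE constant show up in the generated kernel's signature.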
@@ -1477,9 +1477,9 @@ void runtime::gpu::GPU_Emitter::emit_Softmax(EMIT_ARGS)
     writer.block_begin();
     {
         auto axes_set = softmax->get_axes();
-        std::vector<string> dtypes;
-        dtypes.push_back(args[0].get_type());
-        dtypes.push_back(out[0].get_type());
+        std::vector<element::Type> dtypes;
+        dtypes.push_back(args[0].get_element_type());
+        dtypes.push_back(out[0].get_element_type());
         auto& cuda_emitter = external_function->get_primitive_emitter()->get_cuda_emitter();
         size_t index = cuda_emitter->build_softmax(dtypes, args[0].get_shape(), axes_set);
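
The last hunk threads element::Type through instead of type-name strings; combined with the "use type info for numeric_limits" and "add init_reduce_value" commits, this lets the emitter derive a reduction's init value from the element type rather than hard-coding it per dtype. A small illustrative C++ sketch of that idea; the enum and helper below are hypothetical, not nGraph's API.

    #include <limits>
    #include <string>

    enum class ReduceOp { Sum, Product, Max, Min };

    // Render the identity element for a reduction as a literal that could be
    // spliced into generated CUDA source. Hypothetical helper, shown only to
    // illustrate the numeric_limits-per-type idea.
    template <typename T>
    std::string init_reduce_value(ReduceOp op)
    {
        switch (op)
        {
        case ReduceOp::Sum: return std::to_string(T{0});
        case ReduceOp::Product: return std::to_string(T{1});
        case ReduceOp::Max: return std::to_string(std::numeric_limits<T>::lowest());
        case ReduceOp::Min: return std::to_string(std::numeric_limits<T>::max());
        }
        return std::to_string(T{0});
    }

For example, init_reduce_value<float>(ReduceOp::Max) renders the most negative finite float, which becomes the starting value each thread reduces against.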