Adding support for GPU elementwise ops for arbitrarily many inputs (#618)
* Refactored unary elementwise ops into a single interface that adapts to elementwise ops with an arbitrary number of inputs.
* Renamed EmitUnaryElementwise -> EmitElementwise. Implemented the first binary elementwise op (Power).
* Moved some of the boilerplate code for emitting CUDA kernels to NVRTC out of the emit functions and into the CudaFunctionPool static singleton. CodeWriter now saves CUDA kernels to ./gpu_codegen.
* Added the ops Divide, Subtract and Sign to the GPU transformer. Subtract and Sign both use custom device helper functions, built by a new get_device_helper function, whose math kernels for the op are defined in gpu_cuda_kernel_ops.hpp.
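The idea of a single emitter for any number of inputs can be illustrated with a minimal sketch. This is not the code from this commit: the function name emit_elementwise_source, its parameters, the float-only signature and the generated kernel layout are all assumptions made for illustration. It only builds the CUDA source string; compiling it with NVRTC and caching the result is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (an assumption, not nGraph's actual API): builds the
// CUDA source for an elementwise kernel that applies `op` to `num_inputs`
// float arrays. `device_helper` is optional __device__ code placed before
// the kernel, for ops that have no CUDA math function of their own.
std::string emit_elementwise_source(const std::string& kernel_name,
                                    const std::string& op,
                                    std::size_t num_inputs,
                                    const std::string& device_helper = "")
{
    assert(num_inputs >= 1 && "an elementwise op needs at least one input");

    std::ostringstream src;
    if (!device_helper.empty())
    {
        src << device_helper << "\n";
    }

    // Kernel signature: one const input pointer per operand, then the output.
    src << "extern \"C\" __global__ void " << kernel_name << "(";
    for (std::size_t i = 0; i < num_inputs; ++i)
    {
        src << "const float* in" << i << ", ";
    }
    src << "float* out, unsigned int n)\n{\n";

    // One thread per element, guarded against running past the end.
    src << "    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;\n";
    src << "    if (tid < n)\n    {\n";
    src << "        out[tid] = " << op << "(";
    for (std::size_t i = 0; i < num_inputs; ++i)
    {
        if (i > 0)
        {
            src << ", ";
        }
        src << "in" << i << "[tid]";
    }
    src << ");\n    }\n}\n";
    return src.str();
}
```

Under these assumptions, a binary op such as Power would be emitted with emit_elementwise_source("cuda_power", "powf", 2), while an op such as Sign would pass a one-line __device__ helper as the fourth argument and use that helper's name as op.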