    Adding support for GPU elementwise ops for arbitrarily many inputs (#618) · 89da71d3
    Chris Sullivan authored
    * Refactored unary elementwise ops into a single interface
    that adapts to elementwise ops with an arbitrary number of inputs.
    
    * Renamed EmitUnaryElementwise -> EmitElementwise.
    Implemented first binary elementwise op (Power).
    
    * Moved some of the boilerplate code for emitting CUDA kernels to NVRTC
    out of the emit functions and into the CudaFunctionPool static singleton.
    CodeWriter now saves CUDA kernels to ./gpu_codegen.
    
    * Added ops Divide, Subtract & Sign to the GPU transformer.
    Subtract and Sign both use custom device helper functions, whose
    math kernels are defined in gpu_cuda_kernel_ops.hpp and which are
    built by a new get_device_helper function.
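
The idea behind the commit (one emitter for any number of inputs, plus optional device helpers for ops without a matching CUDA intrinsic) can be illustrated with a small source generator. This is only a sketch under stated assumptions: `emit_device_helper` and `emit_elementwise_source` are hypothetical names and signatures, not the repository's `EmitElementwise` or `get_device_helper`, and the generated kernel layout is a guess at a typical one-thread-per-element design.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not the repository's get_device_helper): wraps a math
// expression over x0, x1, ... in a __device__ function a kernel can call.
std::string emit_device_helper(const std::string& name,
                               const std::string& data_type,
                               const std::string& expr,
                               std::size_t num_inputs)
{
    assert(num_inputs >= 1 && "a helper needs at least one argument");

    std::ostringstream src;
    src << "__device__ " << data_type << " " << name << "(";
    for (std::size_t i = 0; i < num_inputs; ++i)
    {
        if (i > 0)
        {
            src << ", ";
        }
        src << data_type << " x" << i;
    }
    src << ")\n{\n    return " << expr << ";\n}\n";
    return src.str();
}

// Hypothetical emitter (not the repository's EmitElementwise): builds CUDA C
// source for a kernel that applies `op` to `num_inputs` input arrays,
// one element per thread.
std::string emit_elementwise_source(const std::string& kernel_name,
                                    const std::string& data_type,
                                    const std::string& op,
                                    std::size_t num_inputs)
{
    assert(num_inputs >= 1 && "an elementwise op needs at least one input");

    std::ostringstream src;
    src << "extern \"C\" __global__ void " << kernel_name << "(";
    for (std::size_t i = 0; i < num_inputs; ++i)
    {
        src << "const " << data_type << "* in" << i << ", ";
    }
    src << data_type << "* out, size_t n)\n";
    src << "{\n";
    src << "    size_t tid = blockIdx.x * blockDim.x + threadIdx.x;\n";
    src << "    if (tid < n)\n";
    src << "    {\n";
    src << "        out[tid] = " << op << "(";
    for (std::size_t i = 0; i < num_inputs; ++i)
    {
        if (i > 0)
        {
            src << ", ";
        }
        src << "in" << i << "[tid]";
    }
    src << ");\n";
    src << "    }\n";
    src << "}\n";
    return src.str();
}
```

Under these assumptions, a binary op such as Power would be `emit_elementwise_source("cuda_power", "float", "powf", 2)`, while Sign would first prepend `emit_device_helper("sign", "float", "(x0 > 0) - (x0 < 0)", 1)` and then pass `"sign"` as the op name. The resulting string is what would be handed to a runtime compiler such as NVRTC.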
Name
cmake
contrib/docker
doc
maint
src
test
third-party
.clang-format
.gitignore
CMakeLists.txt
INSTALL
LICENSE
README.md
changes.md