-
Tristan Webb authored
* Initial GPU_ExternalFunction implementation Other changes: Add GPU runtime to same cmake block as GPU, include CUDA headers if GPU enabled Initial passing (a+b)*c test Properly link cuda libraries Simple GPUTensorView implementation Initial GPU emitter GPU codegen initial function gen, no kernels yet Rename GPU emitter and tensor_view_wrapper to match naming convention * GPU external function based on BASE * Fix stray base -> gpu * TensorViewWrapper -> GPU_TensorViewWrapper * Copy over emitter from base transformer * Fix for naming dense layout * Copy kernel emitters from base -> gpu and strip out kernel_utils * Add aliases to GPU_TensorViewWrappers * More fixes for naming descriptor::TensorViews * Move in call_frame implementation from base -> gpu * apply code format * GPU codegen running A+B*C gpu emitters gpu ctx setup cuda_module kernels Remove GPU_CF perf counters Use gpu kernels in external function Add GPU 1d dot test Review Changes: * Remove CPU specific kernel emitting method bodies * Use copy_data from test/util.cpp, uncomment compileTest * Use test_utils copy_data function * Grab function name from pass manager for def, clean up indentation
Name |
Last commit
|
Last update |
---|---|---|
cmake | ||
contrib/docker | ||
doc | ||
maint | ||
src | ||
test | ||
third-party | ||
.clang-format | ||
.gitignore | ||
CMakeLists.txt | ||
README-RESNET.rst | ||
README.md | ||
changes.md |