Tristan Webb authored
* Drwebb/gpu backend dot op (#387)
* GPU Dot prod emitter switch statement
* cuBLAS dot kernel call
* Flesh out arg substitution into gpu dot kernel call
* Drwebb/gpu backend dot op (#392)
* Take in CodeWriter into gpu op emitters
* Introduce GPU function gen based on pass functions
* Additional gpu emitter stubs
* link cublas in to unit test and ngraph
* Use static code gen methods for GPU, add new GPU op stubs
* use pass manager to declare functions / cublas Updates
* Prune down gpu_external_function wip
* Switch back to GPU tensor views in GPU backend
* Pass in cublas handle to GPU external function
* cuMalloc memory in gpu tensor view
* Use cuda runtime malloc and free for tensor view management
* change GPU tensor view init, and use GPU tensor view for GPU call frame
* include headers as system dirs
* GPU tensor printing utility function
* cublasSetPointer to device mode / Fix copyright notification lowercasing
* Passing GPU dot product test using cuBLAS Clean up
* Changes from review
Name | Last commit | Last update |
---|---|---|
cmake | ||
contrib/docker | ||
doc | ||
maint | ||
src | ||
test | ||
third-party | ||
.clang-format | ||
.gitignore | ||
CMakeLists.txt | ||
README-RESNET.rst | ||
README.md | ||
changes.md |