- 09 Feb, 2018 1 commit
Tristan Webb authored
* GPU kernels for reshape, GEMM, EW ADD/Mult, Maximum
  (A + B) * C test now with cuBLAS
  Additional gemm and gemv calls
  cmake updates for cuDNN calls
  memcpy wrappers in gpu_util
  Additional passing tests: aliased outputs, parameter, constant tensor memcopy
-
- 08 Feb, 2018 1 commit
Jennifer Myers authored
-
- 02 Feb, 2018 2 commits
Tristan Webb authored
-
Tristan Webb authored
GPU ew add and mult cuBLAS calls GPU (A + B) * C with cuBLAS Additional gemm and gemv calls cmake updates for cuDNN calls kernels WIP params for dot gemm more kernel WIP memcpy wrappers aliased outputs, parameter, constant tensor memcopy comment cleanup remove cruft gpu faster gemm MNIST WIP Cleanup
-
- 24 Jan, 2018 1 commit
Tristan Webb authored
* Drwebb/gpu backend dot op (#387)
* GPU Dot prod emitter switch statement
* cuBLAS dot kernel call
* Flesh out arg substitution into gpu dot kernel call
* Drwebb/gpu backend dot op (#392)
* Take in CodeWriter into gpu op emitters
* Introduce GPU function gen based on pass functions
* Additional gpu emitter stubs
* link cublas in to unit test and ngraph
* Use static code gen methods for GPU, add new GPU op stubs
* use pass manager to declare functions / cublas Updates
* Prune down gpu_external_function wip
* Switch back to GPU tensor views in GPU backend
* Pass in cublas handle to GPU external function
* cuMalloc memory in gpu tensor view
* Use cuda runtime malloc and free for tensor view management
* change GPU tensor view init, and use GPU tensor view for GPU call frame
* include headers as system dirs
* GPU tensor printing utility function
* cublasSetPointer to device mode / Fix copyright notification lowercasing
* Passing GPU dot product test using cuBLAS
  Clean up
* Changes from review
-
- 20 Jan, 2018 1 commit
Robert Kimball authored
* wip
* wip
* remove get_vector from runtime::TensorView class as it was for unit test only
* cleanup
* move writing vector to runtime::TensorView to the unit test dir
* merge fix
* PR review change
* update from PR comment
* update changes file
-
- 19 Jan, 2018 1 commit
Tristan Webb authored
* Add mention of blob ref of original file from caffe2
* Mention location of source listing originally from LLVM project
-
- 17 Jan, 2018 1 commit
Tristan Webb authored
* Initial GPU_ExternalFunction implementation
  Other changes:
  Add GPU runtime to same cmake block as GPU, include CUDA headers if GPU enabled
  Initial passing (a+b)*c test
  Properly link cuda libraries
  Simple GPUTensorView implementation
  Initial GPU emitter
  GPU codegen initial function gen, no kernels yet
  Rename GPU emitter and tensor_view_wrapper to match naming convention
* GPU external function based on BASE
* Fix stray base -> gpu
* TensorViewWrapper -> GPU_TensorViewWrapper
* Copy over emitter from base transformer
* Fix for naming dense layout
* Copy kernel emitters from base -> gpu and strip out kernel_utils
* Add aliases to GPU_TensorViewWrappers
* More fixes for naming descriptor::TensorViews
* Move in call_frame implementation from base -> gpu
* apply code format
* GPU codegen running A+B*C
  gpu emitters gpu ctx setup cuda_module kernels
  Remove GPU_CF perf counters
  Use gpu kernels in external function
  Add GPU 1d dot test
  Review Changes:
* Remove CPU specific kernel emitting method bodies
* Use copy_data from test/util.cpp, uncomment compileTest
* Use test_utils copy_data function
* Grab function name from pass manager for def, clean up indentation
-
- 05 Jan, 2018 1 commit
Tristan Webb authored
* Simple boilerplate for GPU runtime files
  - GPUBackend
  - GPU ExternalFunction
  - GPUManager
  - GPUCallFrame
* Test for construction of all GPU runtime classes
* Comment out calls, constructors haven't been defined
* Clang CUDA source example to later test compiling
  Clang cuda example from: https://gist.github.com/anonymous/855e277884eb6b388cd2f00d956c2fd4
* Initial nvptx compiler copied from CPU compiler sources
* Define FunctionMap and Instruction for gpu external function
* Rename Compiler -> NVPTXCompiler for gpu compile. Add call to compile for test
* Rename StaticCompiler -> NVPTXStaticCompiler for GPU code gen
* Add nvptx_compiler and nvptx_execution_engine to gpu sources
* Compiling source unit test using hardcoded PTX
* (a+b)*c test for GPU
* WIP Fix compile
* Remove accidentally included file
* Fix compile and LLVM link errors from nvptx_compiler.cpp
* Stub out parts needed for GPU manager
* Test GPU runtime method stubs
* Cleanup
* Add GPU runtime to same cmake block as GPU, include CUDA headers if GPU enabled
* Kill reflexive assertion
* change GPU naming convention to match CPU
* Snake case functions and identifiers in test case
* Change element type to match changes in master
* Make CUDA headers accessible for codegen with GPU transformer
* clang-format
* apply-code-format
-
- 21 Nov, 2017 4 commits
Tristan Webb authored
Clang cuda example from: https://gist.github.com/anonymous/855e277884eb6b388cd2f00d956c2fd4
-
Tristan Webb authored
-
Tristan Webb authored
-
Tristan Webb authored
-