    nvgpu backend without clang (#2115) · 757621be
    Chris Sullivan authored
    * Separate out external function base class.
    
    * pt1 first step to removing m_writer from GPU_Emitter.
    
    * pt2 add gpu_internal function skeleton
    
    * pt3 temporarily add to gpu_backend for prototyping.
    
    * pt4 add call frame (partial) and runtime constructor
    
    * pt 5 implement resolution for function memory reservations. build new tensor wrapper for use with call frame.
    
    * pt 6 resolve compilation errors.
    
    * pt 7 Add host emitter for emitting host primitives and implement in gpu emitter.
    
    * pt 8 add compile time manifest.
    
    * pt 9 add simple runtime tracer.
    
    * pt 10 separate runtimes for different functions. currently indexed by function name; should switch to using function instance_id for lookup performance.
    
    * pt 11 add function call interface and support nested call frames
    
    * pt 12 Reshape elimination check in emitter needs to include offset.
    
    * pt 13 Add default indentation to all op emissions in gpu external functions.
    
    * pt 14 fix constant mem reservation (should not depend on the temporary buffers existence check).
    
    * pt 15 backward pooling for avg pool requires only one param. rather than passing this param
    three times, this commit changes the runtime to detect if it's avgpooling and pass the appropriate pointers.
    This is a holdover until max and avgpool are refactored into separate cudnn emitters.
    
    * pt 16 update cmake compatibility. gpu backend can now be built without clang via NGRAPH_DEX_ONLY.
    if this cmake variable is not defined, then both clang codegen (via gpu external function) and interpreter (via gpu internal function) modes will be built.
    for now codegen is the default backend but can be explicitly disabled by setting the environment variable NGRAPH_CODEGEN=0/FALSE/NO/etc.
    
    additional note: made codegen::CodeWriter header-only so that it can be used independently of whether the clang codegen library is compiled.
    
    * pt 17 fix issues with merge from master
    
    * pt 18 factor compile function into a few virtual calls so that common passes can be added in a single location for both backends.
    
    * pt 19 formatting
    
    * Remove code_writer.cpp from cmake and disable (temporarily) some reduce tests that require changes to gpu_emitter.cpp
    
    * Move call frame and runtime constructor implementations to source files.
    
    * Use member m_common_function_string.
    
    * Apply a bug fix analogous to the one in #2145
    
    * Remove underscore from GPU_CompiledFunction, GPU_ExternalFunction, and GPU_InternalFunction.
    
    * Made static members of GPUCompiledFunction static methods.
    
    * Remove 'No' codegen options, use std::toupper and applied format
    
    * review comments
    
    * Remove vector overload for resolve inputs/outputs in GPUCallFrame.
    
    * Remove diagnostic pragmas
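
The NGRAPH_CODEGEN switch from pt 16, combined with the later note about dropping the 'No' options and using std::toupper, could look roughly like the sketch below. This is an illustration only and is not taken from the nGraph sources: the helper names `codegen_enabled_from_value` and `codegen_enabled` are hypothetical, and the accepted "off" values (`0` and `FALSE`, compared case-insensitively) are an assumption based on the wording of this commit message.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the nGraph sources).
// Assumption: codegen is on by default and is turned off only when the
// value, upper-cased, equals "0" or "FALSE".
bool codegen_enabled_from_value(const char* raw)
{
    if (raw == nullptr)
    {
        return true; // variable unset: codegen remains the default
    }
    std::string value(raw);
    // Upper-case the value so that "false", "False" and "FALSE" match alike.
    std::transform(value.begin(), value.end(), value.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return !(value == "0" || value == "FALSE");
}

// Hypothetical wrapper that reads the environment variable named in pt 16.
bool codegen_enabled()
{
    return codegen_enabled_from_value(std::getenv("NGRAPH_CODEGEN"));
}
```

Splitting the string check from the `std::getenv` call keeps the parsing logic testable without touching the process environment.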
CMakeLists.txt 12.9 KB