Refactored transpose_gpu and made it a non-template function.