* move add, mult, min, max, sqrt to elementwise_op; increase ops per thread