* add FP16 support on GPU (cudev)
* add test
* leave the template function unimplemented so that it raises an error