Tomoaki Teshima authored
* use v_float16x4 (universal intrinsic) instead of the raw SSE/NEON implementation
* define v_load_f16/v_store_f16, since a v_load overload cannot be distinguished when a short pointer is passed
* brush up the implementation for old compilers (guard correctly)
* add tests for v_load_f16 and for the round-trip conversion of v_float16x4
* fix conversion error
903789f7
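
A minimal sketch of why separate v_load_f16/v_store_f16 names are needed, assuming half-precision values are stored as raw 16-bit integers. The types and functions below (vec_int16x8, vec_float16x4, load, load_f16, store_f16) are hypothetical stand-ins for illustration only, not the library's actual definitions.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for vector register types (not the real definitions).
struct vec_int16x8   { std::int16_t lanes[8]; };  // eight 16-bit integers
struct vec_float16x4 { std::int16_t bits[4]; };   // four raw half-precision bit patterns

// Existing-style overload: a short pointer already means "load eight int16 lanes".
inline vec_int16x8 load(const std::int16_t* p) {
    vec_int16x8 v;
    std::memcpy(v.lanes, p, sizeof(v.lanes));
    return v;
}

// A second load(const std::int16_t*) returning vec_float16x4 would not compile,
// because C++ cannot overload on the return type alone. A distinct name avoids the clash.
inline vec_float16x4 load_f16(const std::int16_t* p) {
    vec_float16x4 v;
    std::memcpy(v.bits, p, sizeof(v.bits));
    return v;
}

// Matching store, so a load/store round trip preserves the bit patterns.
inline void store_f16(std::int16_t* p, const vec_float16x4& v) {
    std::memcpy(p, v.bits, sizeof(v.bits));
}
```

With this shape, store_f16(dst, load_f16(src)) copies four half-precision bit patterns unchanged, which is the kind of round trip the added test is described as covering.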