Commit 5771fd69, authored by Sayed Adel:

- Redefine the 16-bit multiply operator to perform saturating multiplication instead of non-saturating multiplication
- Implement the 8-bit multiply operator to perform saturating multiplication
- Implement v_mul_wrap() for 8-bit and 16-bit non-saturating multiplication
- Improve the performance of v_mul_hi() for VSX
- Update the intrin tests with the new changes
- Replace the universal-intrinsics 16-bit multiplication operator with v_mul_wrap() due to the behavior change
- Several improvements based on vpisarev's review:
  * Initial forward declarations for universal intrinsics
  * Move the emulated SSE intrinsics into a separate file
  * Implement v_mul_expand for 8-bit
  * Reimplement saturating multiply using v_mul_expand + v_pack
  * Map v_expand, v_load_expand, and v_load_expand_q to SSE4.1
  * Fix an overflow in avx2::v_pack(uint32)
  * Implement two universal intrinsics: v_expand_low and v_expand_high
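The difference between the saturating operator and v_mul_wrap(), and the idea of building saturating multiply from a widening multiply followed by a saturating pack, can be illustrated with a scalar model. This is a minimal sketch assuming 16 unsigned 8-bit lanes in a 128-bit register; the types `u8x16`/`u16x8` and the functions `mul_wrap`, `mul_expand`, `pack`, and `mul_sat` are hypothetical stand-ins that only loosely mirror v_mul_wrap(), v_mul_expand(), and v_pack(). It is not the OpenCV implementation, which uses platform SIMD instructions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a 128-bit register: 16 x uint8 or 8 x uint16 lanes.
using u8x16 = std::array<uint8_t, 16>;
using u16x8 = std::array<uint16_t, 8>;

// Wrapping (non-saturating) multiply: keep only the low 8 bits of each product.
u8x16 mul_wrap(const u8x16& a, const u8x16& b) {
    u8x16 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = static_cast<uint8_t>(a[i] * b[i]);  // promoted to int, then truncated mod 256
    return r;
}

// Widening multiply: full 16-bit products; lanes 0..7 go to lo, lanes 8..15 to hi.
void mul_expand(const u8x16& a, const u8x16& b, u16x8& lo, u16x8& hi) {
    for (std::size_t i = 0; i < 8; ++i) {
        lo[i] = static_cast<uint16_t>(a[i] * b[i]);          // max 255*255 = 65025 fits
        hi[i] = static_cast<uint16_t>(a[i + 8] * b[i + 8]);
    }
}

// Saturating narrowing pack: clamp each uint16 lane to [0, 255].
u8x16 pack(const u16x8& lo, const u16x8& hi) {
    u8x16 r{};
    for (std::size_t i = 0; i < 8; ++i) {
        r[i] = static_cast<uint8_t>(std::min<uint16_t>(lo[i], 255));
        r[i + 8] = static_cast<uint8_t>(std::min<uint16_t>(hi[i], 255));
    }
    return r;
}

// Saturating multiply = widening multiply followed by a saturating pack.
u8x16 mul_sat(const u8x16& a, const u8x16& b) {
    u16x8 lo{}, hi{};
    mul_expand(a, b, lo, hi);
    return pack(lo, hi);
}

// Usage: 200 * 2 = 400 wraps to 144 but saturates to 255.
void example() {
    u8x16 a{}, b{};
    a[0] = 200;
    b[0] = 2;
    assert(mul_wrap(a, b)[0] == 144);
    assert(mul_sat(a, b)[0] == 255);
}
```

Products that fit in 8 bits (for example 10 * 12 = 120) give the same result either way; the two only diverge on overflow, which is why code that relied on the old wrapping behavior of the 16-bit operator has to switch to v_mul_wrap().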
| Name |
|---|
| .github |
| 3rdparty |
| apps |
| cmake |
| data |
| doc |
| include |
| modules |
| platforms |
| samples |
| .gitattributes |
| .gitignore |
| CMakeLists.txt |
| CONTRIBUTING.md |
| LICENSE |
| README.md |