-
Sayed Adel authored
- improve CPU dispatching calls to allow more SIMD extensions (SSE4.1, AVX2, VSX)
- wide universal intrinsics
- replace dummy v_expand with v_expand_low
- replace v_expand + v_mul_wrap with v_mul_expand for product accumulate operations
- use FMA for accumulate operations
- add mask and more types to accumulate's performance tests
8965f3ae