1. 20 Aug, 2019 1 commit
    • core: vectorize dotProd_32s · 33fb253a
      Paul E. Murphy authored
      Use 4x FMA chains to sum on SIMD 128 FP64 targets. On
      x86 this showed about a 1.4x improvement.
      
      For PPC, do a full multiply (32x32->64b), convert to DP,
      then accumulate. This may be slightly less precise for
      some inputs, but it is about 1.5x faster than the FMA
      approach above, which is itself about 1.5x faster, for
      a ~2.5x speedup overall.
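The two strategies named in the commit message can be sketched in portable scalar C++. This is only an illustration under stated assumptions, not the code from the commit: the function names `dot_fma4` and `dot_mul64` are hypothetical, and the four scalar accumulators stand in for independent SIMD accumulation chains. The second variant rounds each 64-bit product when converting it to double (products of two 32-bit values can exceed 2^53), which is one plausible source of the precision difference the message mentions.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: four independent FMA chains. Each chain depends only
// on its own accumulator, so the chains can overlap in the pipeline.
double dot_fma4(const std::int32_t* a, const std::int32_t* b, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 = std::fma(static_cast<double>(a[i]),     static_cast<double>(b[i]),     s0);
        s1 = std::fma(static_cast<double>(a[i + 1]), static_cast<double>(b[i + 1]), s1);
        s2 = std::fma(static_cast<double>(a[i + 2]), static_cast<double>(b[i + 2]), s2);
        s3 = std::fma(static_cast<double>(a[i + 3]), static_cast<double>(b[i + 3]), s3);
    }
    double sum = (s0 + s1) + (s2 + s3);
    // Handle the remaining 0..3 elements.
    for (; i < n; ++i) {
        sum = std::fma(static_cast<double>(a[i]), static_cast<double>(b[i]), sum);
    }
    return sum;
}

// Hypothetical sketch: full 32x32->64-bit integer multiply, convert the
// product to double, then accumulate. The conversion may round large products.
double dot_mul64(const std::int32_t* a, const std::int32_t* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::int64_t p = static_cast<std::int64_t>(a[i]) * b[i];
        sum += static_cast<double>(p);
    }
    return sum;
}
```

For small inputs both variants agree exactly; they can differ only when intermediate values are too large to be represented exactly in a double.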
  2. 25 Jul, 2019 8 commits
  3. 24 Jul, 2019 2 commits
  4. 21 Jul, 2019 2 commits
  5. 20 Jul, 2019 2 commits
  6. 19 Jul, 2019 5 commits
  7. 18 Jul, 2019 5 commits
  8. 17 Jul, 2019 1 commit
  9. 16 Jul, 2019 5 commits
  10. 15 Jul, 2019 1 commit
  11. 12 Jul, 2019 5 commits
  12. 11 Jul, 2019 2 commits
  13. 09 Jul, 2019 1 commit