More efficient sum for some cases (#1251)
* hacking to support dot of 3 by 2 inputs with gemm_batch.
* clean up.
* testing inplace reshape.
* fixed a compile error.
* added comments on todo.
* check for output.
* check for annotation.
* more optimizations WIP.
* sum simd.
* moved parallel for.
* testing sum vectorization.
* fixed merge errors.
* sum wip.
* more logic.
* sum refactor and clean up.
* clean up.
* removed unrelated changes.
* removed related changes from merge.
* fixed clang compile errors.