Hello,
I'm working with MKL 11.2.0.090 on Gentoo. I have an "Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz" processor.
I'm trying to speed up my in-place matrix transpositions, and I thought mkl_?imatcopy would be the solution. I get a very good speedup on square matrices, but on rectangular matrices it is much slower than my naive "follow the cycles" implementation.
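For reference, here is a minimal sketch of what I mean by "follow the cycles". It is not my exact code: the names transpose_follow_cycles and dest_index are only illustrative, and it assumes a dense row-major matrix of doubles with no padding between rows. Each element is moved directly to its transposed position, one permutation cycle at a time, and a cycle is only processed from its smallest index so that no extra bookkeeping memory is needed.

```c
#include <stddef.h>

/* Index that element i of a row-major rows x cols matrix occupies
   after transposition (row-major cols x rows). Illustrative helper. */
static size_t dest_index(size_t i, size_t rows, size_t cols)
{
    return (i % cols) * rows + i / cols;
}

/* In-place transpose by following permutation cycles.
   Assumes a dense layout (leading dimension equal to cols). */
void transpose_follow_cycles(double *a, size_t rows, size_t cols)
{
    size_t n = rows * cols;

    for (size_t start = 0; start < n; ++start) {
        /* Walk the cycle until we return to start or meet a smaller
           index; a smaller index means the cycle was already rotated. */
        size_t j = dest_index(start, rows, cols);
        while (j > start)
            j = dest_index(j, rows, cols);
        if (j < start)
            continue;

        /* start is the smallest index of its cycle: rotate the cycle. */
        double carried = a[start];
        j = dest_index(start, rows, cols);
        while (j != start) {
            double tmp = a[j];
            a[j] = carried;
            carried = tmp;
            j = dest_index(j, rows, cols);
        }
        a[start] = carried;
    }
}
```

The check for the smallest index walks each cycle again, so this trades extra index arithmetic for zero additional memory.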
Here is the call:
mkl_dimatcopy('R', 'T', rows, cols, 1.0, matrix_ptr, rows, cols);
When I profiled the executable, most of the cycles were spent in
libmkl_avx.so [.] mkl_trans_avx_mkl_dimatcopy_mipt_t
Am I doing something wrong, or is the algorithm simply not good on rectangular matrices (I'd be surprised)? Should I just write an out-of-place algorithm that uses O(MN) extra space?
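By an O(MN) space algorithm I mean something like the sketch below, which transposes into a scratch buffer and copies the result back. The name transpose_with_scratch is hypothetical, and it again assumes a dense row-major matrix of doubles.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Transpose a dense row-major rows x cols matrix through a temporary
   buffer of the same size. Returns 0 on success, -1 if the scratch
   allocation fails (the matrix is left untouched in that case). */
int transpose_with_scratch(double *a, size_t rows, size_t cols)
{
    size_t n = rows * cols;
    if (n == 0)
        return 0;

    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    /* Element (r, c) of the input becomes element (c, r) of the output. */
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            tmp[c * rows + r] = a[r * cols + c];

    memcpy(a, tmp, n * sizeof *tmp);
    free(tmp);
    return 0;
}
```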
Thanks