I am evaluating the performance of Intel MKL on Xeon Gold 6130 processors, which have two AVX512 FMA units. AVX512 improves performance for matrix multiplication and FFT. However, for matrix inversion, where I call the LAPACK SVD routines, AVX512 is slower than AVX2. I tested single-precision complex (CGESDD) and single-precision real (SGESDD).
My question is: what causes the AVX512 slowdown for CGESDD/SGESDD? Is it because these functions are not optimized for AVX512, or did I do something wrong?
Below is the output with MKL_VERBOSE enabled:
MKL_VERBOSE Intel(R) MKL 2020.0 Product build 20191122 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.10GHz lp64 sequential
I set MKL_ENABLE_INSTRUCTIONS to either AVX2 or AVX512 to compare the two instruction sets, and used the sequential version of the library.
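For reference, here is a minimal Python sketch of how the two configurations can be launched from a driver script. It only builds the environment for each run. The helper names `mkl_env` and `run_benchmark` are hypothetical, and the benchmark command passed to `run_benchmark` is a placeholder for whatever executable is being timed. The variable has to be in the environment before the benchmark process starts.

```python
import os
import subprocess


def mkl_env(isa, base=None):
    """Return a copy of the environment with the MKL settings for one run.

    `isa` is the value given to MKL_ENABLE_INSTRUCTIONS; only the two
    settings compared in this post are accepted.
    """
    if isa not in ("AVX2", "AVX512"):
        raise ValueError(f"unsupported setting for this comparison: {isa}")
    env = dict(os.environ if base is None else base)
    env["MKL_ENABLE_INSTRUCTIONS"] = isa  # cap the instruction set MKL dispatches to
    env["MKL_VERBOSE"] = "1"              # print one line per MKL call
    return env


def run_benchmark(cmd, isa):
    """Run a benchmark command (hypothetical, e.g. ["./bench_gesdd"]) under one setting."""
    result = subprocess.run(
        cmd, env=mkl_env(isa), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Show the setting that each run would receive, without launching anything.
    for isa in ("AVX2", "AVX512"):
        print(isa, mkl_env(isa, base={})["MKL_ENABLE_INSTRUCTIONS"])
```

Starting a fresh process for each setting keeps the two measurements independent of each other.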
-----------------------------------------------------------------
For SGESDD/CGESDD, AVX2 outperforms AVX512 in most cases:
64x64 matrix:
- SGESDD: AVX2: 536.91us AVX512: 703.39us
- CGESDD: AVX2: 766.52us AVX512: 861.09us
1000x1000 matrix:
- SGESDD: AVX2: 305.60ms AVX512: 360.65ms
- CGESDD: AVX2: 744.38ms AVX512: 696.96ms (AVX512 is slightly better)
-----------------------------------------------------------------
For SGEMM/CGEMM, AVX512 outperforms AVX2:
64x64 matrix:
- SGEMM: AVX2: 8.58us AVX512: 7.08us
- CGEMM: AVX2: 43.55us AVX512: 23.06us
1000x1000 matrix:
- SGEMM: AVX2: 27.98ms AVX512: 18.40ms
- CGEMM: AVX2: 109.17ms AVX512: 69.49ms
-----------------------------------------------------------------
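To make the comparison easier to read, the timings above can be turned into AVX512-over-AVX2 speedups (AVX2 time divided by AVX512 time, so values below 1 mean AVX512 is slower). This is only a small sketch over the numbers reported in this post. The names `TIMINGS`, `speedup`, and `SPEEDUPS` are illustrative.

```python
# (routine, matrix size) -> (AVX2 time, AVX512 time).
# 64x64 timings are in microseconds and 1000x1000 timings in milliseconds;
# the ratio within each pair is unitless, so the units never need converting.
TIMINGS = {
    ("SGESDD", 64): (536.91, 703.39),
    ("CGESDD", 64): (766.52, 861.09),
    ("SGESDD", 1000): (305.60, 360.65),
    ("CGESDD", 1000): (744.38, 696.96),
    ("SGEMM", 64): (8.58, 7.08),
    ("CGEMM", 64): (43.55, 23.06),
    ("SGEMM", 1000): (27.98, 18.40),
    ("CGEMM", 1000): (109.17, 69.49),
}


def speedup(avx2_time, avx512_time):
    """Speedup of AVX512 relative to AVX2; below 1.0 means AVX512 is slower."""
    return avx2_time / avx512_time


SPEEDUPS = {key: speedup(*pair) for key, pair in TIMINGS.items()}

if __name__ == "__main__":
    for (routine, size), ratio in SPEEDUPS.items():
        verdict = "AVX512 faster" if ratio > 1.0 else "AVX512 slower"
        print(f"{routine:7s} {size:4d}x{size:<4d} speedup {ratio:.2f} ({verdict})")
```

On these numbers, every GEMM case comes out above 1, while three of the four GESDD cases come out below 1.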