Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

AVX512 is slower than AVX2 when running CGESDD/SGESDD on Xeon Gold 6130


I am evaluating the performance of Intel MKL on Xeon Gold 6130 processors, which have two AVX512 FMA units per core. AVX512 improves performance for matrix multiplication and FFT. However, for the singular value decomposition routines, AVX512 is slower than AVX2. I tested both complex float (CGESDD) and float (SGESDD).

My question is: what causes the AVX512 slowdown for CGESDD/SGESDD? Is it because these functions are not optimized for AVX512, or did I do something wrong?

Below is the output with MKL_VERBOSE enabled:

MKL_VERBOSE Intel(R) MKL 2020.0 Product build 20191122 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.10GHz lp64 sequential

To compare the two instruction sets, I set MKL_ENABLE_INSTRUCTIONS to either AVX2 or AVX512 and used the sequential (single-threaded) MKL library.
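One way to run the same benchmark under both settings is to launch it as a child process with a modified environment. The sketch below is only an illustration: the helper names `isa_env` and `run_under_isa` are made up here, and the "benchmark" is a stand-in child Python process that echoes the variable, since the original benchmark program is not shown in the post.

```python
import os
import subprocess
import sys


def isa_env(isa, base=None):
    """Return a copy of the environment with the MKL instruction-set cap set to `isa`."""
    env = dict(os.environ if base is None else base)
    env["MKL_ENABLE_INSTRUCTIONS"] = isa  # "AVX2" or "AVX512", as used in the post
    env["MKL_VERBOSE"] = "1"              # makes MKL print the dispatched code path
    return env


def run_under_isa(cmd, isa):
    """Run `cmd` (a list of arguments) under the given ISA cap and return its stdout."""
    result = subprocess.run(
        cmd, env=isa_env(isa), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for the real benchmark binary (hypothetical): a child process
    # that only reports which setting it was started with.
    probe = [sys.executable, "-c",
             "import os; print(os.environ['MKL_ENABLE_INSTRUCTIONS'])"]
    for isa in ("AVX2", "AVX512"):
        print(isa, "->", run_under_isa(probe, isa).strip())
```

Replacing `probe` with the actual benchmark command would give two otherwise identical runs, which keeps the comparison limited to the instruction-set setting.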

-----------------------------------------------------------------

For SGESDD/CGESDD, AVX2 outperforms AVX512 in three of the four cases:

64x64 matrix:

  • SGESDD: AVX2: 536.91us AVX512: 703.39us       
  • CGESDD: AVX2: 766.52us AVX512: 861.09us

1000x1000 matrix:

  • SGESDD: AVX2: 305.60ms AVX512: 360.65ms  
  • CGESDD: AVX2: 744.38ms AVX512: 696.96ms (AVX512 is slightly better)

-----------------------------------------------------------------
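To make the SVD numbers above easier to compare, they can be reduced to a single ratio per case: AVX2 time divided by AVX512 time, so a value below 1 means AVX512 is slower. This is just arithmetic on the timings quoted in the post; the function name `avx512_speedup` is chosen here for illustration.

```python
# Timings copied from the post: (routine, n) -> (AVX2 time, AVX512 time).
# Each row uses one unit for both values, so the ratio is unitless.
svd_timings = {
    ("SGESDD", 64): (536.91, 703.39),    # microseconds
    ("CGESDD", 64): (766.52, 861.09),    # microseconds
    ("SGESDD", 1000): (305.60, 360.65),  # milliseconds
    ("CGESDD", 1000): (744.38, 696.96),  # milliseconds
}


def avx512_speedup(t_avx2, t_avx512):
    """Ratio of AVX2 time to AVX512 time; > 1 means AVX512 is faster."""
    return t_avx2 / t_avx512


for (routine, n), (t2, t512) in svd_timings.items():
    s = avx512_speedup(t2, t512)
    verdict = "AVX512 faster" if s > 1.0 else "AVX2 faster"
    print(f"{routine} {n}x{n}: speedup {s:.2f} ({verdict})")
```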

For SGEMM/CGEMM, AVX512 outperforms AVX2 in all four cases:

64x64 matrix:

  • SGEMM: AVX2: 8.58us AVX512: 7.08us

  • CGEMM: AVX2: 43.55us AVX512: 23.06us

1000x1000 matrix:

  • SGEMM: AVX2: 27.98ms AVX512: 18.40ms

  • CGEMM: AVX2: 109.17ms AVX512: 69.49ms
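As a sanity check on the GEMM timings, they can be converted to an approximate throughput. The sketch below assumes the usual operation counts for a square n x n matrix product (2n^3 real floating-point operations for SGEMM, 8n^3 for CGEMM); the function name `gemm_gflops` is made up for this example.

```python
# Timings copied from the post, converted to seconds:
# (routine, n) -> (AVX2 time, AVX512 time).
gemm_timings = {
    ("SGEMM", 64): (8.58e-6, 7.08e-6),
    ("CGEMM", 64): (43.55e-6, 23.06e-6),
    ("SGEMM", 1000): (27.98e-3, 18.40e-3),
    ("CGEMM", 1000): (109.17e-3, 69.49e-3),
}


def gemm_gflops(n, seconds, complex_data=False):
    """Approximate GFLOP/s for an n x n by n x n matrix product.

    Assumes 2*n^3 real operations for real data and 8*n^3 for complex data
    (one complex multiply-add counted as 8 real operations).
    """
    flops = (8 if complex_data else 2) * n ** 3
    return flops / seconds / 1e9


for (routine, n), (t2, t512) in gemm_timings.items():
    is_complex = routine.startswith("C")
    print(
        f"{routine} {n}x{n}: "
        f"AVX2 {gemm_gflops(n, t2, is_complex):.1f} GFLOP/s, "
        f"AVX512 {gemm_gflops(n, t512, is_complex):.1f} GFLOP/s, "
        f"speedup {t2 / t512:.2f}"
    )
```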

-----------------------------------------------------------------


TCE Open Date: Friday, February 21, 2020 - 14:39
