Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Performance characteristics of cblas_gemm_s16s16s32

Hi,

I'd like more details on the performance characteristics of the function cblas_gemm_s16s16s32. In my application, its performance gain over cblas_sgemm is lower than I had hoped.

Here is my test configuration, which is larger than what would typically be used in my application (a seq2seq model):

CblasColMajor

M = 1024
K = 512
N = 2048

TRANS_A = FALSE
TRANS_B = TRUE
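To make the configuration concrete, here is a rough Python sketch of the matrix shapes and memory footprints it implies. It assumes densely packed matrices (leading dimensions equal to the row counts), and the helper name `footprint_bytes` is purely illustrative; nothing here calls MKL.

```python
# Problem sizes taken from the configuration above.
M, K, N = 1024, 512, 2048
MIB = 1024 * 1024

def footprint_bytes(a_elem, b_elem, c_elem):
    """Bytes used by A, B and C for the given element sizes (hypothetical helper).

    A is stored as M x K (TRANS_A = FALSE).
    B is stored as N x K, because TRANS_B = TRUE means op(B) = B^T is K x N.
    C is M x N.
    """
    return (M * K * a_elem, N * K * b_elem, M * N * c_elem)

# cblas_sgemm: float32 inputs and output (4 bytes each).
sgemm_bytes = footprint_bytes(4, 4, 4)
# cblas_gemm_s16s16s32: int16 inputs (2 bytes), int32 output (4 bytes).
igemm_bytes = footprint_bytes(2, 2, 4)

# Minimum column-major leading dimensions for this layout.
lda, ldb, ldc = M, N, M

for name, sizes in (("sgemm", sgemm_bytes), ("gemm_s16s16s32", igemm_bytes)):
    a, b, c = (s / MIB for s in sizes)
    print(f"{name}: A={a:.0f} MiB, B={b:.0f} MiB, C={c:.0f} MiB, total={a + b + c:.0f} MiB")
```

The inputs halve in size with 16-bit integers, but the 32-bit output matrix C is the same size in both cases.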

And here are some single-threaded results on an Intel(R) Core(TM) i7-6700K (AVX2), averaged over 1000 samples:

* cblas_sgemm: 17.7135 ms
* cblas_gemm_s16s16s32: 15.5617 ms
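For reference, these timings can be converted to throughput using the usual 2*M*N*K operation count for a GEMM (one multiply and one add per inner-product term). This is a small Python sketch of that arithmetic only; the variable names are illustrative, and the count ignores any alpha/beta scaling or offset work.

```python
# Problem sizes and timings as reported above.
M, K, N = 1024, 512, 2048
sgemm_ms = 17.7135
igemm_ms = 15.5617

# Standard GEMM operation count: one multiply and one add per (m, n, k) triple.
ops = 2 * M * N * K

def giga_ops_per_second(milliseconds):
    """Convert a time per call in milliseconds to 1e9 operations per second."""
    return ops / (milliseconds * 1e-3) / 1e9

sgemm_gops = giga_ops_per_second(sgemm_ms)
igemm_gops = giga_ops_per_second(igemm_ms)
speedup = sgemm_ms / igemm_ms

print(f"operations per call: {ops}")
print(f"cblas_sgemm:          {sgemm_gops:.1f} Gop/s")
print(f"cblas_gemm_s16s16s32: {igemm_gops:.1f} Gop/s")
print(f"speedup:              {speedup:.3f}x")
```

That works out to roughly 121 and 138 billion operations per second respectively, so the integer routine is about 1.14x faster in this test.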

Are these values expected? Do I need to do something specific to get more performance out of cblas_gemm_s16s16s32?

Thanks,

Guillaume

