Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

OpenMP MKL DGEMM Performance Issue

Hello,

I am developing on a 24-core machine (E5-2697 v2). When I launch a single DGEMM with large matrices (m = n = k = 15,000), performance improves as I increase the number of threads, as expected. For reference, I get about 467 GFLOP/s using 24 cores.
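As a rough sanity check on that figure, the arithmetic can be sketched as follows. This assumes the conventional DGEMM operation count of 2·m·n·k (one multiply and one add per inner-product term); the function names are hypothetical helpers for illustration, not part of MKL.

```python
def dgemm_flops(m, n, k):
    """Conventional FLOP count for C = A*B: one multiply and one add per term (assumed)."""
    return 2 * m * n * k


def dgemm_seconds(m, n, k, gflops_per_sec):
    """Wall time implied by a sustained rate given in GFLOP/s."""
    return dgemm_flops(m, n, k) / (gflops_per_sec * 1e9)


flops = dgemm_flops(15000, 15000, 15000)
print(f"{flops:.3e} FLOP per call")
print(f"{dgemm_seconds(15000, 15000, 15000, 467.0):.1f} s per call at 467 GFLOP/s")
```

Under that assumption, one call is about 6.75e12 floating-point operations, so 467 GFLOP/s corresponds to roughly 14.5 seconds per DGEMM.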

Next, inside an OpenMP parallel region, I have each thread launch an independent DGEMM call on large matrices (m = n = k = 15,000). Each thread has its own matrices for its DGEMM. In this case, overall performance improves as I increase the number of threads only up to a point, peaking at 18 threads in the results below; beyond that, it decreases. What hardware limitation could be causing this? For reference, here are the performance results I got:

Threads    Overall compute speed (GFLOP/s)
1          26.3
2          52.6741
3          76.6518
4          102.413
5          124.401
6          148.394
7          168.022
8          190.557
9          210.165
10         232.156
11         249.77
12         271.149
13         291.211
14         313.747
15         327.467
16         349.917
17         361.444
18         377.498
19         346.558
20         368.453
21         356.597
22         319.446
23         301.81
24         277.273
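To make the drop-off easier to see, here is a small sketch that derives per-thread throughput, parallel efficiency relative to the 1-thread run, and the combined size of the matrices from the numbers above. It assumes three square double-precision matrices (A, B, C) per thread at 8 bytes per element and nothing shared between threads; the helper names are hypothetical. It only restates the measurements and does not by itself identify the limiting hardware resource.

```python
# Overall GFLOP/s by thread count, copied from the table above.
overall = {
    1: 26.3, 2: 52.6741, 3: 76.6518, 4: 102.413, 5: 124.401, 6: 148.394,
    7: 168.022, 8: 190.557, 9: 210.165, 10: 232.156, 11: 249.77, 12: 271.149,
    13: 291.211, 14: 313.747, 15: 327.467, 16: 349.917, 17: 361.444, 18: 377.498,
    19: 346.558, 20: 368.453, 21: 356.597, 22: 319.446, 23: 301.81, 24: 277.273,
}


def per_thread(threads):
    """Average GFLOP/s delivered by each concurrent DGEMM."""
    return overall[threads] / threads


def efficiency(threads):
    """Measured rate divided by perfect scaling of the 1-thread rate."""
    return overall[threads] / (threads * overall[1])


def working_set_bytes(threads, n=15000, bytes_per_double=8):
    """Total bytes for A, B and C across all threads (assumes no sharing)."""
    return threads * 3 * n * n * bytes_per_double


peak_threads = max(overall, key=overall.get)
print("peak overall rate at", peak_threads, "threads")
for t in (1, 12, 18, 24):
    print(f"{t:2d} threads: {per_thread(t):5.2f} GFLOP/s per thread, "
          f"efficiency {efficiency(t):.2f}, "
          f"matrices total {working_set_bytes(t) / 1e9:.1f} GB")
```

With these numbers, efficiency is about 0.80 at 18 threads and about 0.44 at 24 threads, where each DGEMM averages roughly 11.6 GFLOP/s versus 26.3 GFLOP/s when run alone. Under the stated assumptions, the matrices occupy 5.4 GB per thread, or about 129.6 GB for 24 threads.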

 

