Quantcast
Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
Viewing all articles
Browse latest Browse all 3005

Nested parallelisation problem OMP + MKL

$
0
0

I am attempting to parallelise calls to mkl within a parallel omp region to test whether or not the code executes faster. Simply parallelising part of the code does not yield linear increase in performance, hence a mixed approach makes sense. An outline of the code is as follows:

#pragma omp parallel for
for (int i = 0; i < N; i+=2) {
     some_function(i);
}

where some_function will make calls to zgesvd. For starters I would like the omp region to run on 2 threads and the calls to zgesvd inside to also run on 2 threads (for a total of 4 active threads). To achieve this I make the following calls in the begining of the program 

omp_set_num_threads(2);
mkl_set_num_threads(2);
mkl_set_dynamic(false);
omp_set_nested(true);
omp_set_max_active_levels(2);

I have also tried setting omp threads to 4 and then adding threads(2) to the pragma with no success. Currently, the program creates >>3<< (??) threads on both Windows and Linux using the latest MKL & Intel compilers. Changing the value of omp_set_max_active_levels to 3 produces 4 threads on Windows and 3 threads on Linux. However, I don't exactly know what these threads are doing, I can just see their number. 

Best regards

P.S. I noticed that by default the MKL will only try to use 4 threads on a quad-core CPU with hyperthreading enabled but according to top (which should be reliable? I don't really know.) the 4 threads are not always run 1/core (though that might be up to the OS), so why the limit? 


Viewing all articles
Browse latest Browse all 3005

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>