Hi all,
I'm using mkl_?csrcsc() to transpose a sparse matrix, and I assumed it was multithreaded internally, like the BLAS function cblas_dgemm(), where calling omp_set_num_threads() beforehand sets the number of threads. Unfortunately, no matter how many threads I set with omp_set_num_threads(), the run time of mkl_?csrcsc() stays about the same. How can I enable multithreading for the mkl_?csrcsc() functions?
I'm using the Intel compiler 13.0.1 on CentOS release 6.3, and my CPU is an Intel(R) Xeon(R) CPU E5-2670.
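For reference, here is a minimal serial sketch in plain C of the CSR to CSC conversion I am talking about. It only illustrates the operation and could serve as a single-threaded baseline to time against. It is not MKL's implementation, and the name csr_to_csc, its argument order, and the zero-based indexing are my own assumptions, not part of the MKL interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Serial CSR -> CSC conversion with zero-based indices.
 * Hypothetical helper for illustration; not the MKL routine.
 * Inputs:  val/col_idx/row_ptr describe a rows x cols matrix in CSR.
 * Outputs: csc_val and csc_row_idx need room for nnz entries,
 *          csc_col_ptr needs room for cols + 1 entries.
 * Returns 0 on success, -1 if the scratch allocation fails. */
int csr_to_csc(int rows, int cols,
               const double *val, const int *col_idx, const int *row_ptr,
               double *csc_val, int *csc_row_idx, int *csc_col_ptr)
{
    assert(rows >= 0 && cols >= 0);
    int nnz = row_ptr[rows];

    /* Count the entries in each column, stored one slot to the right. */
    for (int j = 0; j <= cols; ++j)
        csc_col_ptr[j] = 0;
    for (int k = 0; k < nnz; ++k)
        csc_col_ptr[col_idx[k] + 1]++;

    /* Prefix sum turns the counts into column start offsets. */
    for (int j = 0; j < cols; ++j)
        csc_col_ptr[j + 1] += csc_col_ptr[j];

    /* Scratch array: next free output slot for each column. */
    int *next = malloc((size_t)(cols > 0 ? cols : 1) * sizeof *next);
    if (next == NULL)
        return -1;
    for (int j = 0; j < cols; ++j)
        next[j] = csc_col_ptr[j];

    /* Scatter every CSR entry into its column; walking the rows in
     * order keeps the row indices sorted within each column. */
    for (int i = 0; i < rows; ++i) {
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            int dst = next[col_idx[k]]++;
            csc_val[dst] = val[k];
            csc_row_idx[dst] = i;
        }
    }

    free(next);
    return 0;
}
```

Timing this on the same matrix next to mkl_?csrcsc() at different thread counts would at least show whether the MKL call behaves like a single-threaded routine.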
Any suggestions are welcome.
- Hao