Hi,
I have 64 threads running on an Intel Xeon Phi 7230. Each thread can run the following MKL routine:
    import numpy as np

    @constraint(ComputingUnits="${ComputingUnits}")
    @task(returns=list)
    def createBlock(BSIZE, MKLProc, diag):
        import os
        os.environ["KMP_AFFINITY"] = "verbose"
        os.environ["MKL_NUM_THREADS"] = str(MKLProc)
        block = np.array(np.random.random((BSIZE, BSIZE)), dtype=np.double, copy=False)
        mb = np.matrix(block, dtype=np.double, copy=False)
        mb = mb + np.transpose(mb)
        if diag:
            mb = mb + 2*BSIZE*np.eye(BSIZE)
        return mb
MKL_NUM_THREADS is set to 64 in order to take advantage of all the cores. When routine number 32 executes, I get the following error:
    OMP: Error #34: System unable to allocate necessary resources for OMP thread:
    OMP: System error #11: Resource temporarily unavailable
    OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.
I read here https://software.intel.com/en-us/forums/intel-open-source-openmp-runtime... that OpenMP threads are not destroyed, so I may be reaching the thread limit of the machine. The thing is that only one of my threads is running at any given time, so only 64 OpenMP threads should be awake.

My problem is that I'm running this code on a shared cluster, so I would rather not recompile the library with custom settings if possible. Is there a way to avoid this problem without decreasing the number of threads running on the machine? I think that using fewer threads would avoid the problem, but this is part of a bigger program and I am really interested in keeping the 64 threads.
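To check whether the thread limit is really what I am hitting, something like the sketch below could be run inside the process before and after each call. It is only a diagnostic under my own assumptions: it assumes Linux (it reads /proc), the helper names max_user_threads and current_thread_count are made up for illustration, and RLIMIT_NPROC is just one of the limits that can produce system error #11 (kernel-wide or cgroup limits may also apply).

```python
import resource


def max_user_threads():
    """Return the soft RLIMIT_NPROC value: the per-user cap on processes/threads.

    The value may be resource.RLIM_INFINITY when no limit is configured.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft


def current_thread_count(pid="self"):
    """Count the OS-level threads of a process via /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError("no 'Threads:' field found in /proc status file")


if __name__ == "__main__":
    # If the thread count keeps growing by about MKL_NUM_THREADS per call,
    # the OpenMP worker threads are accumulating instead of being reused.
    print("per-user limit (soft):", max_user_threads())
    print("threads in this process:", current_thread_count())
```

If the count grows by roughly 64 after every call, failing around call 32 would correspond to about 32 * 64 = 2048 accumulated threads, which would be worth comparing against the limit printed above.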
Regards,
Ramon