Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

How does l_mklb work when running across a cluster?


Hi all,

I am currently trying to run l_mklb across a 110-node cluster, but I don't understand the best syntax to run it with.

Relevant HPL parameters:

P = 20, Q = 22, NB = 192, N = 1237056...
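These values have to agree with the launch settings. HPL arranges its ranks in a P × Q grid, so P × Q should equal the number of MPI ranks started (ranks beyond P × Q do no work), and N is usually chosen as a multiple of NB. A minimal bash sketch that checks the values above, assuming the 110 nodes mentioned earlier; the function name `check_grid` is only illustrative:

```bash
#!/usr/bin/env bash
# Sanity-check the HPL grid against the launch settings.
# Values are copied from the post; NODES=110 is the stated cluster size.
P=20
Q=22
NB=192
N=1237056
NODES=110

check_grid() {
  local ranks=$(( P * Q ))
  echo "P x Q = ${ranks} MPI ranks"
  if [ $(( ranks % NODES )) -eq 0 ]; then
    echo "ranks per node = $(( ranks / NODES ))"
  else
    echo "warning: ${ranks} ranks do not divide evenly over ${NODES} nodes"
  fi
  if [ $(( N % NB )) -eq 0 ]; then
    echo "N / NB = $(( N / NB )) blocks (N is a multiple of NB)"
  else
    echo "warning: N is not a multiple of NB"
  fi
}

check_grid
```

With the posted values it reports 440 ranks, 4 ranks per node and 6443 blocks, which is consistent with MPI_PROC_NUM=440 and MPI_PER_NODE=4 in the script.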

Inside runme_intel64_static I set:

export MPI_PROC_NUM=440
export MPI_PER_NODE=4

# This was the original command:
#mpirun -perhost ${MPI_PER_NODE} -np ${MPI_PROC_NUM} ./runme_intel64_prv "$@" | tee -a $OUT
mpirun -np ${MPI_PROC_NUM} -machinefile hostlist /mnt/shared/benchmarks/runme_intel64_prv "$@" | tee -a $OUT
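The `hostlist` file passed to `-machinefile` decides where the ranks land. Intel MPI's machine-file format accepts `host:N` lines, which pins N consecutive ranks to each host. A hypothetical helper (`make_machinefile` and the node names are made up for illustration) that writes one such line per node, reading node names from stdin:

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a list of node names (one per line on stdin)
# into "host:count" machine-file lines, count = ranks per node.
make_machinefile() {
  local per_node="$1"
  local host
  # "|| [ -n ... ]" also handles a final line without a trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue   # skip blank lines
    printf '%s:%s\n' "$host" "$per_node"
  done
}

# Example with three made-up node names and 4 ranks per node:
printf 'node001\nnode002\nnode003\n' | make_machinefile 4
```

Something like `make_machinefile 4 < nodes.txt > hostlist` (where `nodes.txt` is a hypothetical list of the 110 node names) would produce a file whose per-host counts add up to MPI_PROC_NUM=440.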

Right now, on a 110-node cluster with 128 GB of RAM per node and Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz processors, I'm seeing initial results of around 150 TFlops.
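One way to judge whether ~150 TFlops is low is to compare it with the theoretical peak and to check how much memory the chosen N occupies. A back-of-envelope sketch in bash; several inputs are assumptions not given in the post: 2 sockets per node, 22 cores per E5-2699 v4, 16 double-precision FLOPs per cycle per core (AVX2 with two FMA units), the 2.2 GHz base clock, and 128 GB read as 128 GiB:

```bash
#!/usr/bin/env bash
# Back-of-envelope HPL targets. ASSUMED: 2 sockets/node, 22 cores/socket,
# 16 DP FLOPs/cycle/core, 2200 MHz clock, 128 GiB RAM per node.
NODES=110
SOCKETS=2
CORES_PER_SOCKET=22
FLOPS_PER_CYCLE=16
MHZ=2200
RAM_GIB=128
N=1237056
MEASURED_GFLOPS=150000   # ~150 TFlops as reported in the post

peak_gflops() {
  # MHz * FLOPs/cycle = MFLOPS per core; divide by 1000 for GFLOPS.
  echo $(( NODES * SOCKETS * CORES_PER_SOCKET * FLOPS_PER_CYCLE * MHZ / 1000 ))
}

memory_percent() {
  # An N x N double-precision matrix needs 8 * N^2 bytes.
  local matrix_bytes=$(( 8 * N * N ))
  local total_bytes=$(( NODES * RAM_GIB * 1024 * 1024 * 1024 ))
  echo $(( matrix_bytes * 100 / total_bytes ))   # integer, truncated
}

peak=$(peak_gflops)
echo "theoretical peak: ${peak} GFLOPS"
echo "measured / peak: $(( MEASURED_GFLOPS * 100 / peak ))%"
echo "matrix uses ~$(memory_percent)% of total RAM"
```

Under those assumptions it reports a nominal peak of 170368 GFLOPS, a measured-to-peak ratio of 88% and a matrix footprint of about 80% of RAM. The real clock under AVX load can differ from the 2.2 GHz base figure, so the nominal peak is only a rough yardstick.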

I would expect to see more, so my question is:

What are the best settings for runme_intel64_static?

On a normal HPL run I'd set the number of processes to the actual number of cores in the system, but if I do that with runme_intel64_static, I badly oversubscribe the nodes and performance goes through the floor.
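Intel's MP LINPACK binaries are hybrid MPI plus threads: each rank can start several compute threads, so "one rank per core" can multiply into far more threads than cores. The arithmetic as a small bash illustration; the 44 cores per node (two 22-core sockets) and the per-rank thread counts are assumptions, and `threads_on_node`/`report` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Illustration only. ASSUMED: 44 physical cores per node and a fixed number
# of compute threads started by every MPI rank.
CORES_PER_NODE=44

threads_on_node() {
  local ranks_per_node="$1" threads_per_rank="$2"
  echo $(( ranks_per_node * threads_per_rank ))
}

report() {
  local ranks="$1" threads="$2" total
  total=$(threads_on_node "$ranks" "$threads")
  if [ "$total" -gt "$CORES_PER_NODE" ]; then
    echo "${ranks} ranks x ${threads} threads = ${total} threads on ${CORES_PER_NODE} cores: oversubscribed"
  else
    echo "${ranks} ranks x ${threads} threads = ${total} threads on ${CORES_PER_NODE} cores: ok"
  fi
}

report 4 11    # 4 ranks per node, 11 threads each
report 44 11   # one rank per core, each rank still multithreaded
```

With 4 ranks per node, 11 threads per rank would exactly fill an assumed 44-core node, whereas 44 ranks with the same per-rank threading would put 484 threads on 44 cores.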

It would be great if someone could explain what each variable in the script does, so I can work out how to use the whole cluster efficiently.

