Hi all,
I'm currently trying to run l_mklb across a 110-node cluster, but I don't understand the best way to launch it.
Relevant items:
P=20, Q=22, NB=192, N=1237056...
Inside the runme_intel64_static I set:
export MPI_PROC_NUM=440
export MPI_PER_NODE=4
#mpirun -perhost ${MPI_PER_NODE} -np ${MPI_PROC_NUM} ./runme_intel64_prv "$@" | tee -a $OUT <-- This was the original command
mpirun -np ${MPI_PROC_NUM} -machinefile hostlist /mnt/shared/benchmarks/runme_intel64_prv "$@" | tee -a $OUT
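As a sanity check on the numbers above, here is a minimal sketch that verifies the process grid matches the rank count and estimates how much of the cluster's memory the matrix occupies. It only uses values from this post. The variable names NODES and MEM_PER_NODE_GB are mine, not from the runme script, and I'm assuming 128GB means 128 GiB per node and that the matrix is stored as 8-byte doubles.

```shell
#!/usr/bin/env bash
# Values copied from the post; NODES and MEM_PER_NODE_GB are illustrative names.
P=20
Q=22
NB=192
N=1237056
NODES=110
MPI_PER_NODE=4
MPI_PROC_NUM=440
MEM_PER_NODE_GB=128   # assumption: interpreted as GiB (2^30 bytes)

# The HPL process grid must contain exactly as many ranks as mpirun starts.
grid=$(( P * Q ))
# Ranks that fit if every node hosts MPI_PER_NODE of them.
slots=$(( NODES * MPI_PER_NODE ))
# N should be a whole number of NB-sized blocks.
remainder=$(( N % NB ))
# Matrix footprint, assuming 8 bytes per element (double precision).
matrix_bytes=$(( N * N * 8 ))
total_bytes=$(( NODES * MEM_PER_NODE_GB * 1024 * 1024 * 1024 ))
mem_pct=$(( matrix_bytes * 100 / total_bytes ))

echo "P*Q = ${grid}, MPI_PROC_NUM = ${MPI_PROC_NUM}, nodes*per-node = ${slots}"
echo "N mod NB = ${remainder}"
echo "Matrix uses about ${mem_pct}% of total cluster memory"
```

If the three rank counts disagree, the grid and the launch line are out of step. Note that this only checks the arithmetic; with the -machinefile launch line the actual ranks per node depend on the contents of hostlist, which this sketch does not inspect.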
Right now, on a 110-node cluster with 128GB of RAM per node and Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz processors, I'm seeing initial results of around 150 TFlops.
I would expect to see more... So I guess my question is:
What are the best settings for runme_intel64_static?
On a normal HPL run I'd set the number of processes to the actual number of cores in the system, but if I do that with runme_intel64_static I completely oversubscribe the nodes and performance goes through the floor.
If someone can explain what each variable does inside the script so I can work out how to saturate the cluster efficiently, that would be great.