I want to diagonalize a large matrix of about 40000 x 40000.
Our supercomputer has 80 nodes, and each node has two eight-core CPUs (16 cores per node).
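To put the size in perspective, here is a rough memory estimate. It assumes a real double-precision matrix (8 bytes per entry; a complex Hermitian matrix would need 16) and assumes that computing eigenvectors needs at least two full copies, the input A and the eigenvector matrix Z, before any workspace. Both assumptions are mine, not something I have confirmed for a specific routine.

```python
# Rough memory estimate for a dense N x N matrix.
# Assumption: real double precision, 8 bytes per entry.
N = 40000
BYTES_PER_ENTRY = 8
NODES = 80
CORES_PER_NODE = 2 * 8  # two eight-core CPUs per node


def dense_bytes(n: int, bytes_per_entry: int = BYTES_PER_ENTRY) -> int:
    """Bytes needed to store one dense n x n matrix."""
    return n * n * bytes_per_entry


matrix_bytes = dense_bytes(N)
# Assumption: input matrix A plus eigenvector matrix Z, workspace not counted.
two_copies = 2 * matrix_bytes

print(f"one matrix: {matrix_bytes / 2**30:.1f} GiB")
print(f"A plus Z:   {two_copies / 2**30:.1f} GiB")
print(f"A plus Z per node, spread evenly: {two_copies / NODES / 2**20:.0f} MiB")
print(f"total cores: {NODES * CORES_PER_NODE}")
```

One matrix alone is 12.8 GB (about 11.9 GiB), and the machine has 1280 cores in total.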
I think it would be very hard to diagonalize such a large matrix with only the multithreaded LAPACK routines in MKL, so I plan to use ScaLAPACK instead.
As I understand it, the ScaLAPACK in MKL can use both multithreading and multiple processes to speed up the diagonalization. Is that correct?
Could you give me some advice on how many nodes, and how many cores per node, I should use?
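To make the question concrete, this sketch lists the hybrid layouts that use every core once, assuming the setup described above (MPI ranks, each running multithreaded MKL) and 16 cores per node. For each total rank count it picks a process grid nprow x npcol that is as close to square as possible. The near-square choice is a common heuristic I am assuming here, not a measured optimum, and the function names are hypothetical.

```python
import math

NODES = 80
CORES_PER_NODE = 16


def near_square_grid(nprocs):
    """Factor nprocs as nprow x npcol with nprow <= npcol, as square as possible."""
    nprow = math.isqrt(nprocs)
    while nprocs % nprow != 0:
        nprow -= 1
    return nprow, nprocs // nprow


def hybrid_layouts(nodes, cores_per_node):
    """(ranks_per_node, threads_per_rank, total_ranks, grid) options
    that use every core exactly once."""
    layouts = []
    for ranks_per_node in range(1, cores_per_node + 1):
        if cores_per_node % ranks_per_node:
            continue  # skip rank counts that would leave cores idle
        threads = cores_per_node // ranks_per_node
        total = nodes * ranks_per_node
        layouts.append((ranks_per_node, threads, total, near_square_grid(total)))
    return layouts


for rpn, thr, total, (pr, pc) in hybrid_layouts(NODES, CORES_PER_NODE):
    print(f"{rpn:2d} ranks/node x {thr:2d} threads -> {total:4d} ranks, grid {pr} x {pc}")
```

For example, one rank per CPU socket (2 ranks per node, 8 threads each) gives 160 ranks on a 10 x 16 grid, while one rank per core gives 1280 ranks on a 32 x 40 grid. Which of these runs fastest is exactly what I am asking about.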
What block sizes MB and NB are appropriate for this problem?
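ScaLAPACK distributes the matrix block-cyclically, with MB and NB setting the block size along the rows and columns of the process grid. The sketch below reimplements in Python the counting logic of ScaLAPACK's NUMROC routine. It shows how much of the 40000 x 40000 matrix each process would hold for a few block sizes on a hypothetical 32 x 40 grid. The block sizes are illustrative, not recommendations, and this only measures how evenly storage is spread. Actual runtime also depends on BLAS efficiency and communication, so it would have to be measured.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of an n-long dimension, split block-cyclically in
    blocks of nb over nprocs processes, owned by process iproc.
    Mirrors the logic of ScaLAPACK's NUMROC tool routine."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                # number of full blocks
    num = (nblocks // nprocs) * nb   # every process gets this many
    extrablks = nblocks % nprocs     # leftover full blocks
    if mydist < extrablks:
        num += nb                    # one extra full block
    elif mydist == extrablks:
        num += n % nb                # the trailing partial block
    return num


def local_shape(n, mb, nb, nprow, npcol, myrow, mycol):
    """Local array shape of an n x n matrix on process (myrow, mycol),
    assuming the distribution starts on process (0, 0)."""
    return numroc(n, mb, myrow, 0, nprow), numroc(n, nb, mycol, 0, npcol)


N = 40000
NPROW, NPCOL = 32, 40  # hypothetical 1280-process grid
for blk in (32, 64, 128, 256):  # illustrative block sizes only
    sizes = [rows * cols
             for rows, cols in (local_shape(N, blk, blk, NPROW, NPCOL, r, c)
                                for r in range(NPROW) for c in range(NPCOL))]
    print(f"MB=NB={blk:3d}: largest local array {max(sizes)}, "
          f"smallest {min(sizes)}, ratio {max(sizes) / min(sizes):.3f}")
```

Larger blocks tend to give the underlying BLAS bigger pieces of work, but they also make the leftover blocks at the edge of the matrix more uneven across processes. This script only shows the second effect.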