Hi all
We use the PARDISO solver (Parallel Studio 2018) in our FEM code for a symmetric indefinite system and have found that the time spent in the factorization step can vary significantly. We use the following solver parameters; all other values are set to zero:
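iparm[0] = 1;   /* use the values supplied here instead of the solver defaults */
iparm[1] = 2;   /* fill-in reducing ordering: nested dissection (METIS) */
iparm[2] = 4;
iparm[9] = 8;   /* pivot perturbation of 1e-8 */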
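For reference, the setup described above amounts to something like the following minimal sketch. The helper name `init_iparm` and the `IPARM_LEN` macro are made up for illustration, and plain `int` is used here instead of MKL's own integer type so that the snippet compiles without the MKL headers.

```c
#include <string.h>

#define IPARM_LEN 64  /* PARDISO's iparm array has 64 entries */

/* Hypothetical helper (not part of MKL): set every entry to zero first,
 * then fill in the four values listed in the post. */
void init_iparm(int iparm[IPARM_LEN])
{
    memset(iparm, 0, IPARM_LEN * sizeof iparm[0]);
    iparm[0] = 1;
    iparm[1] = 2;
    iparm[2] = 4;
    iparm[9] = 8;
}
```

Zeroing the whole array first guarantees that no entry is left uninitialized before the array is handed to the solver.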
The solver output for a slow and a fast example is given below. The total time differs by almost a factor of 2 (156.0 s vs. 85.8 s) even though the two equation systems are of similar size (about 2.96 million equations each). In such cases PARDISO is clearly less attractive than our iterative GPU solver. Is this a performance issue or a bug? Are our settings bad? Or is this just normal behavior? Any help is highly appreciated.
Thank you very much and best regards
David
Slow example:
Summary: ( starting phase is reordering, ending phase is solution )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.703717 s
Time spent in reordering of the initial matrix (reorder)         : 11.242095 s
Time spent in symbolic factorization (symbfct)                   : 3.675149 s
Time spent in data preparations for factorization (parlist)      : 0.107626 s
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 133.596461 s
Time spent in direct solver at solve step (solve)                : 1.954850 s
Time spent in allocation of internal data structures (malloc)    : 0.250255 s
Time spent in additional calculations                            : 4.518999 s
Total time spent                                                 : 156.049152 s

Statistics:
===========
Parallel Direct Factorization is running on 6 OpenMP

< Linear system Ax = b >
    number of equations:           2956396
    number of non-zeros in A:      79437854
    number of non-zeros in A (%):  0.000909
    number of right-hand sides:    1

< Factors L and U >
    number of columns for each panel: 96
    number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
    number of supernodes:                    338571
    size of largest supernode:               17752
    number of non-zeros in L:                3101919901
    number of non-zeros in U:                1
    number of non-zeros in L+U:              3101919902
    gflop for the numerical factorization:   20373.427694
    gflop/s for the numerical factorization: 152.499756
Fast example:
Summary: ( starting phase is reordering, ending phase is solution )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.462895 s
Time spent in reordering of the initial matrix (reorder)         : 10.686642 s
Time spent in symbolic factorization (symbfct)                   : 3.656114 s
Time spent in data preparations for factorization (parlist)      : 0.108843 s
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 64.537849 s
Time spent in direct solver at solve step (solve)                : 1.397830 s
Time spent in allocation of internal data structures (malloc)    : 0.240506 s
Time spent in additional calculations                            : 4.692960 s
Total time spent                                                 : 85.783639 s

Statistics:
===========
Parallel Direct Factorization is running on 6 OpenMP

< Linear system Ax = b >
    number of equations:           2961080
    number of non-zeros in A:      83916044
    number of non-zeros in A (%):  0.000957
    number of right-hand sides:    1

< Factors L and U >
    number of columns for each panel: 96
    number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
    number of supernodes:                    308944
    size of largest supernode:               6004
    number of non-zeros in L:                2958633165
    number of non-zeros in U:                1
    number of non-zeros in L+U:              2958633166
    gflop for the numerical factorization:   8909.331438
    gflop/s for the numerical factorization: 138.048162
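
To put the two logs side by side, the relevant figures can be reduced to a few ratios. The following is only a small arithmetic sketch: the struct and function names are made up for illustration, and the numbers are copied from the two outputs above.

```c
/* Figures copied from one PARDISO statistics printout. */
struct run_stats {
    double numfct_s;  /* time spent in factorization step (numfct), seconds */
    double total_s;   /* total time spent, seconds */
    double gflop;     /* gflop for the numerical factorization */
    double nnz_l;     /* number of non-zeros in L */
};

/* Values from the slow and the fast example. */
const struct run_stats slow_run = { 133.596461, 156.049152, 20373.427694, 3101919901.0 };
const struct run_stats fast_run = {  64.537849,  85.783639,  8909.331438, 2958633165.0 };

/* How many times longer the slow factorization takes. */
double numfct_time_ratio(const struct run_stats *a, const struct run_stats *b)
{
    return a->numfct_s / b->numfct_s;
}

/* How many times more floating-point work the slow factorization does. */
double gflop_ratio(const struct run_stats *a, const struct run_stats *b)
{
    return a->gflop / b->gflop;
}

/* How many times more non-zeros the slow factor L holds. */
double nnz_l_ratio(const struct run_stats *a, const struct run_stats *b)
{
    return a->nnz_l / b->nnz_l;
}

/* Achieved factorization rate in gflop/s, recomputed from the two raw figures. */
double factorization_rate(const struct run_stats *r)
{
    return r->gflop / r->numfct_s;
}
```

Calling `numfct_time_ratio(&slow_run, &fast_run)` gives roughly 2.07 and `gflop_ratio(&slow_run, &fast_run)` roughly 2.29, while `nnz_l_ratio(&slow_run, &fast_run)` is only about 1.05. So the slow case does more than twice the floating-point work at a comparable gflop/s rate, although the factor L is only about 5% larger.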