Hello,
I am running my first PARDISO (cluster) program and benchmarking it against MUMPS. I expected PARDISO to be faster, or at least close, but the results are not very encouraging.
My environment
- Ubuntu 14.04, Intel i3-3240 @ 3.4 GHz, 1 CPU (2 cores), 4 GB RAM
- Latest MUMPS 5.0.2, latest MKL/Pardiso 2017.0.098
- GCC 4.8.4
- MPICH2
- both programs are written in C
My data
- double-precision complex symmetric matrix, 9612 x 9612, 206442 non-zeros in total
Test result
I set OMP_NUM_THREADS=2 and MKL_NUM_THREADS=2, and ran the program with 1 MPI process:
mpirun -np 1 Program
- MUMPS took 25 sec to solve all 9612 columns
- PARDISO took about 33 sec for only 3000 columns (nrhs = 3000) and 53 sec for 4806 columns (nrhs = 4806), so it will likely need more than 100 sec for the whole matrix. That is about 4 times what MUMPS needs.
I am not sure what slows PARDISO down. I notice that the direct solver took 12 sec, but "additional calculations" took 24 sec. I do not know what those are, or whether they can be improved.
Here is the message output. I would appreciate any suggestions on what I can tune (the timings above were from a run without message output):
-------------------------------------------------------------------------------------------------------------
=== PARDISO: solving a complex symmetric system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000960 s
Time spent in reordering of the initial matrix (reorder) : 0.000016 s
Time spent in symbolic factorization (symbfct) : 0.006562 s
Time spent in data preparations for factorization (parlist) : 0.000174 s
Time spent in allocation of internal data structures (malloc) : 0.043606 s
Time spent in additional calculations : 0.009487 s
Total time spent : 0.060805 s
Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP
< Linear system Ax = b >
number of equations: 9612
number of non-zeros in A: 206442
number of non-zeros in A (%): 0.223445
number of right-hand sides: 3000
< Factors L and U >
number of columns for each panel: 80
number of independent subgraphs: 0
number of supernodes: 929
size of largest supernode: 570
number of non-zeros in L: 1642020
number of non-zeros in U: 1
number of non-zeros in L+U: 1642021
Reordering/Analysis is completed, the number of iterative steps in solve : 0, peak memory for factorization : 9329 (KB), permanent memory for factorization : 8758 (KB), memory for factorization and solve : 30858 (KB), time used 0...
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 37 % 38 % 39 % 42 % 43 % 44 % 45 % 46 % 50 % 51 % 52 % 56 % 58 % 59 % 60 % 64 % 68 % 75 % 76 % 84 % 87 % 97 % 99 % 100 %
100 %
=== PARDISO: solving a complex symmetric system ===
Single-level factorization algorithm is turned ON
Summary: ( factorization phase )
================
Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct) : 0.101974 s
Time spent in allocation of internal data structures (malloc) : 0.000089 s
Time spent in additional calculations : 0.000001 s
Total time spent : 0.102064 s
Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP
< Linear system Ax = b >
number of equations: 9612
number of non-zeros in A: 206442
number of non-zeros in A (%): 0.223445
number of right-hand sides: 3000
< Factors L and U >
number of columns for each panel: 80
number of independent subgraphs: 0
number of supernodes: 929
size of largest supernode: 570
number of non-zeros in L: 1642020
number of non-zeros in U: 1
number of non-zeros in L+U: 1642021
gflop for the numerical factorization: 1.889010
gflop/s for the numerical factorization: 18.524426
Factorization completed ... time used 0, start solve for 3000 columns
=== PARDISO: solving a complex symmetric system ===
Summary: ( solution phase )
================
Times:
======
Time spent in direct solver at solve step (solve) : 12.161831 s
Time spent in additional calculations : 24.014941 s
Total time spent : 36.176772 s
Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP
< Linear system Ax = b >
number of equations: 9612
number of non-zeros in A: 206442
number of non-zeros in A (%): 0.223445
number of right-hand sides: 3000
< Factors L and U >
number of columns for each panel: 80
number of independent subgraphs: 0
number of supernodes: 929
size of largest supernode: 570
number of non-zeros in L: 1642020
number of non-zeros in U: 1
number of non-zeros in L+U: 1642021
gflop for the numerical factorization: 1.889010
gflop/s for the numerical factorization: 18.524426
-------------------------------------------------------------------------------------------------------------
Thanks,
canal