PARDISO - different time in factorization step

Hi all

We use the PARDISO solver (Parallel Studio 2018) in our FEM code for a symmetric indefinite system and have found that the time spent in the factorization step can vary significantly between systems of similar size. We set the following solver parameters; all other values are zero:

    iparm[0] = 1;
    iparm[1] = 2;
    iparm[2] = 4;
    iparm[9] = 8;
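
For readers who want to reproduce the setup, the settings above can be sketched as a small helper. This is only an illustration: the function name `set_iparm_from_post` and the use of plain `int` (instead of `MKL_INT`) are assumptions, and the comments reflect my reading of the MKL PARDISO documentation for zero-based indexing, so please check them against the version in use.

```c
#include <string.h>

#define PARDISO_IPARM_LEN 64  /* PARDISO's iparm array has 64 entries */

/* Hypothetical helper: zero every entry, then set the four values
 * quoted in the post. Indices are zero-based, as in the C interface.
 * Plain int is an assumption; MKL code would normally use MKL_INT. */
static void set_iparm_from_post(int iparm[PARDISO_IPARM_LEN])
{
    memset(iparm, 0, PARDISO_IPARM_LEN * sizeof iparm[0]);

    iparm[0] = 1;  /* do not use solver defaults, read values from iparm */
    iparm[1] = 2;  /* fill-in reducing ordering: METIS nested dissection */
    iparm[2] = 4;  /* value as posted; the MKL documentation describes this
                      entry as reserved, so its effect should be verified */
    iparm[9] = 8;  /* pivot perturbation exponent, i.e. about 1e-8 */
}
```

The array would then be passed to the `pardiso` call in place of a hand-filled one; nothing else about the call changes.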

The solver output for a slow and a fast example can be found below. The total time differs by almost a factor of 2 (156 s vs. 86 s) even though the two systems are of similar size (about 2.96 million equations each). This makes PARDISO clearly less attractive than our iterative GPU solver in certain cases. Is this a performance issue or a bug? Are we using bad settings? Or is it just normal behavior? Any help is highly appreciated.

Thank you very much and best regards

David

Slow example:

Summary: ( starting phase is reordering, ending phase is solution )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.703717 s
Time spent in reordering of the initial matrix (reorder)         : 11.242095 s
Time spent in symbolic factorization (symbfct)                   : 3.675149 s
Time spent in data preparations for factorization (parlist)      : 0.107626 s
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 133.596461 s
Time spent in direct solver at solve step (solve)                : 1.954850 s
Time spent in allocation of internal data structures (malloc)    : 0.250255 s
Time spent in additional calculations                            : 4.518999 s
Total time spent                                                 : 156.049152 s

Statistics:
===========
Parallel Direct Factorization is running on 6 OpenMP

< Linear system Ax = b >
             number of equations:           2956396
             number of non-zeros in A:      79437854
             number of non-zeros in A (%): 0.000909

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 96
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    338571
             size of largest supernode:               17752
             number of non-zeros in L:                3101919901
             number of non-zeros in U:                1
             number of non-zeros in L+U:              3101919902
             gflop   for the numerical factorization: 20373.427694

             gflop/s for the numerical factorization: 152.499756

Fast example:

Summary: ( starting phase is reordering, ending phase is solution )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.462895 s
Time spent in reordering of the initial matrix (reorder)         : 10.686642 s
Time spent in symbolic factorization (symbfct)                   : 3.656114 s
Time spent in data preparations for factorization (parlist)      : 0.108843 s
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 64.537849 s
Time spent in direct solver at solve step (solve)                : 1.397830 s
Time spent in allocation of internal data structures (malloc)    : 0.240506 s
Time spent in additional calculations                            : 4.692960 s
Total time spent                                                 : 85.783639 s

Statistics:
===========
Parallel Direct Factorization is running on 6 OpenMP

< Linear system Ax = b >
             number of equations:           2961080
             number of non-zeros in A:      83916044
             number of non-zeros in A (%): 0.000957

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 96
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    308944
             size of largest supernode:               6004
             number of non-zeros in L:                2958633165
             number of non-zeros in U:                1
             number of non-zeros in L+U:              2958633166
             gflop   for the numerical factorization: 8909.331438

             gflop/s for the numerical factorization: 138.048162
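
To make the two reports easier to compare, the figures can be put side by side with a few ratios. The sketch below only does arithmetic on numbers copied from the logs above; the struct and function names are made up for illustration, and it does not call PARDISO.

```c
/* Figures copied from one PARDISO statistics report (hypothetical struct). */
struct numfct_report {
    double seconds;            /* time spent in factorization step (numfct) */
    double gflop;              /* gflop for the numerical factorization     */
    double gflop_per_s;        /* gflop/s for the numerical factorization   */
    double nnz_L;              /* number of non-zeros in L                  */
    double largest_supernode;  /* size of largest supernode                 */
};

static const struct numfct_report slow_run = {
    133.596461, 20373.427694, 152.499756, 3101919901.0, 17752.0
};
static const struct numfct_report fast_run = {
    64.537849, 8909.331438, 138.048162, 2958633165.0, 6004.0
};

/* Ratio of a slow-run figure to the matching fast-run figure. */
static double ratio(double slow, double fast)
{
    return slow / fast;
}

/* Factorization time implied by the reported work and throughput. */
static double implied_seconds(const struct numfct_report *r)
{
    return r->gflop / r->gflop_per_s;
}
```

With the values above, the slow run takes about 2.07 times as long in numfct and performs about 2.29 times as many floating-point operations, while its gflop/s rate is about 10% higher. The number of non-zeros in L is only about 5% larger, but the largest supernode is almost 3 times bigger (17752 vs. 6004). So, going by these logs alone, the extra time appears to come from additional factorization work rather than from lower throughput.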
