I have two matrices with the same sparsity structure but different values that take significantly different amounts of time (7s vs 37s) during the reordering phase. The diagnostics report that essentially all of the time difference is inside "Time spent in allocation of internal data structures (malloc)". I find the size of the difference surprising, and I find it even more surprising that it would show up in 'malloc'.
Is this all expected? Is there something I can do to increase performance here?
If I turn off weighted matching, the difference goes away and reordering is faster in both cases. I would simply turn weighted matching off, but I have found it's necessary to get accurate solutions on many of the other problems our users construct.
The matrices should be quite similar. They are nearby timesteps of a dynamic non-linear FEM simulation of an elastic cloth hanging under gravity.
If I run the same physical scenario on a low resolution cloth (20'000 triangles instead of 180'000 triangles), the slowdown doesn't appear.
My iparm settings are all zero, except for these ones:
iparm[0] = 1; // use my settings
iparm[1] = 2; // METIS
iparm[9] = 8; // pivot perturbation of 1.0E-8
iparm[10] = 1; // scaling
iparm[12] = 1; // weighted matching
iparm[17] = -1; // enable reporting
iparm[20] = 1; // 1x1 and 2x2 Bunch-Kaufman pivoting (symmetric indefinite matrices)
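For anyone who wants to reproduce this, the settings above can be collected in one place. This is only a minimal sketch: `make_iparm` is a hypothetical helper name of mine, not an MKL function, and it assumes the usual 64-entry `iparm` array with every unlisted entry left at zero.

```cpp
#include <array>

// Hypothetical helper (not part of MKL): returns a 64-entry iparm array
// holding the non-zero settings listed above; all other entries are zero.
std::array<int, 64> make_iparm() {
    std::array<int, 64> iparm{};  // value-initialised, so every entry starts at 0
    iparm[0]  = 1;   // use my settings instead of the solver defaults
    iparm[1]  = 2;   // METIS
    iparm[9]  = 8;   // pivot perturbation of 1.0E-8
    iparm[10] = 1;   // scaling
    iparm[12] = 1;   // weighted matching
    iparm[17] = -1;  // value as listed above
    iparm[20] = 1;   // value as listed above
    return iparm;
}
```

The solver takes the settings as a plain `int` array, so `make_iparm().data()` on a named copy gives the pointer to pass. Setting `iparm[12] = 0` on that copy is the only change between my runs with and without weighted matching.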
Reordering phase for the "fast" matrix:
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.051754 s
Time spent in reordering of the initial matrix (reorder) : 0.002212 s
Time spent in symbolic factorization (symbfct) : 0.527827 s
Time spent in data preparations for factorization (parlist) : 0.034113 s
Time spent in allocation of internal data structures (malloc) : 6.139631 s
Time spent in additional calculations : 0.451844 s
Total time spent : 7.207380 s
Reordering phase for the "slow" matrix:
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.051663 s
Time spent in reordering of the initial matrix (reorder) : 0.002180 s
Time spent in symbolic factorization (symbfct) : 0.529333 s
Time spent in data preparations for factorization (parlist) : 0.033952 s
Time spent in allocation of internal data structures (malloc) : 35.833652 s
Time spent in additional calculations : 0.451910 s
Total time spent : 36.902690 s
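To back up the claim that the slowdown sits in the 'malloc' step, here is a small sketch that compares the two summaries. The struct and function names are mine and purely illustrative; the numbers are copied from the logs above.

```cpp
// Per-step reordering timings in seconds, copied from the two summaries above.
struct ReorderTimes {
    double fulladj;
    double reorder;
    double symbfct;
    double parlist;
    double malloc_time;
    double additional;
    double total;
};

const ReorderTimes fast_run = {0.051754, 0.002212, 0.527827, 0.034113,
                               6.139631, 0.451844, 7.207380};
const ReorderTimes slow_run = {0.051663, 0.002180, 0.529333, 0.033952,
                               35.833652, 0.451910, 36.902690};

// Fraction of the total slowdown that is accounted for by the "malloc" step.
double malloc_share_of_slowdown(const ReorderTimes& fast, const ReorderTimes& slow) {
    const double total_diff  = slow.total - fast.total;              // about 29.70 s
    const double malloc_diff = slow.malloc_time - fast.malloc_time;  // about 29.69 s
    return malloc_diff / total_diff;
}
```

With these numbers the share comes out within a fraction of a percent of 1, so every other step is effectively identical between the two runs.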
The matrix statistics after the reordering phase are exactly the same for both matrices. The factorization and solution phases have almost exactly the same runtimes for both matrices.
Statistics:
===========
Parallel Direct Factorization is running on 8 OpenMP
< Linear system Ax = b >
number of equations: 1351803
number of non-zeros in A: 9191733
number of non-zeros in A (%): 0.000503
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 1115919
size of largest supernode: 2555
number of non-zeros in L: 93923643
number of non-zeros in U: 1
number of non-zeros in L+U: 93923644
Summary: ( factorization phase )
================
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct) : 1.511711 s
Time spent in allocation of internal data structures (malloc) : 0.000519 s
Time spent in additional calculations : 0.000004 s
Total time spent : 1.512234 s
Summary: ( solution phase )
================
Time spent in direct solver at solve step (solve) : 0.229972 s
Time spent in additional calculations : 0.499647 s
Total time spent : 0.729619 s
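As a sanity check that I am reading the statistics correctly, the reported density and a rough fill-in ratio can be recomputed from the counts above. This is just arithmetic; the function names are hypothetical, and the fill-in ratio assumes the solver counts non-zeros in A and in L+U using the same convention.

```cpp
// Density of A in percent: nnz(A) / n^2 * 100.
double density_percent(double n, double nnz_a) {
    return nnz_a / (n * n) * 100.0;
}

// Rough fill-in ratio: non-zeros in the factors divided by non-zeros in A.
double fill_ratio(double nnz_factors, double nnz_a) {
    return nnz_factors / nnz_a;
}

// Counts copied from the statistics above.
const double n_equations = 1351803.0;
const double nnz_A       = 9191733.0;
const double nnz_LU      = 93923644.0;
```

`density_percent(n_equations, nnz_A)` reproduces the reported 0.000503 %, and `fill_ratio(nnz_LU, nnz_A)` is a little above 10.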
I'm running Windows 10, compiling and linking with MSVC using toolset Visual Studio 2015 (v140). I'm not sure how to double-check the MKL version I'm using. MKL is in a folder named "compilers_and_libraries_2016.4.246". I have "Intel Parallel Studio 2016 Update 4 Professional Edition for Windows" installed. I have "Intel VTune Amplifier 2017 for Windows Update 4" installed. The only download that Intel Software Manager is recommending to me is an update for Parallel Studio 2015, which I ignore because I think it's old.
My compile line looks like:
/Yu"stdafx.h" /MP /GS /W3 /Gy /Zc:wchar_t /I"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\tbb\include" /I"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl\include" /Zi /Gm- /O2 /sdl /Fd"x64\Develop\vc140.pdb" /Zc:inline /fp:precise ... more defines ... /D "NDEBUG" /D "_CONSOLE" /D "WIN32_LEAN_AND_MEAN" /D "NOMINMAX" /D "_USE_MATH_DEFINES" /D "_SCL_SECURE_NO_WARNINGS" /D "_CRT_SECURE_NO_WARNINGS" /errorReport:prompt /WX- /Zc:forScope /Gd /Oi /MD /openmp- /Fa"x64\Develop\" /EHsc /nologo /Fo"x64\Develop\" /Fp"x64\Develop\solveLinearSystem.pch"
My link line looks like:
/OUT:"x64\Develop\solveLinearSystem.exe" /MANIFEST /NXCOMPAT /PDB:"x64\Develop\solveLinearSystem.pdb" /DYNAMICBASE ... some libs ... "tbb.lib" ... some libs ... "mkl_core.lib" "mkl_tbb_thread.lib" "mkl_intel_lp64.lib" ... some libs ... "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /DEBUG /MACHINE:X64 /OPT:NOREF /INCREMENTAL /PGD:"x64\Develop\solveLinearSystem.pgd" /SUBSYSTEM:CONSOLE /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /ManifestFile:"x64\Develop\solveLinearSystem.exe.intermediate.manifest" /OPT:NOICF /ERRORREPORT:PROMPT /NOLOGO /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\tbb\lib\intel64\vc14" /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl\lib\intel64_win" /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\compiler\lib\intel64_win" /TLBID:1
I'm happy to share the matrices if that's helpful. I have them in plain text, zipped to ~60MB each.