Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

pardiso Memory leak?


I am very puzzled by the Win32 PARDISO version.

My integrated environment is Intel(R) Visual Fortran Compiler XE 12.1.7.371.

My matrix has 5 million non-zeros, n = 200,000, real and symmetric indefinite.

I first test how much memory I can use (1022 MB is available), then call PARDISO for phase 23, then call PARDISO for phase -1.

When I test the available memory again, I find that only about 700 MB can be used.

What should I do? In theory the full 1022 MB should be available again.

This does not happen under Win64.
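For reference, the call sequence I mean, as a stand-alone sketch (MKL PARDISO C interface, a made-up 3x3 symmetric indefinite matrix, default iparm values; not the actual program):

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Upper triangle of a 3x3 real symmetric indefinite matrix, one-based CSR. */
    MKL_INT n = 3;
    MKL_INT ia[4] = { 1, 3, 4, 5 };
    MKL_INT ja[4] = { 1, 3, 2, 3 };
    double  a[4]  = { 1.0, 2.0, -1.0, 3.0 };
    double  b[3]  = { 1.0, 1.0, 1.0 }, x[3];

    void   *pt[64]    = { 0 };   /* PARDISO handle, must start zeroed          */
    MKL_INT iparm[64] = { 0 };   /* iparm[0] = 0 -> use PARDISO default values */
    MKL_INT maxfct = 1, mnum = 1, mtype = -2, nrhs = 1, msglvl = 0, error = 0;
    MKL_INT phase, idum;
    double  ddum;

    phase = 11;  /* reordering and symbolic factorization */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);

    phase = 23;  /* numerical factorization + solve */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, b, x, &error);

    phase = -1;  /* release PARDISO's internal memory */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, &ddum, ia, ja,
            &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);

    mkl_free_buffers();  /* ask MKL to return its own work buffers as well */

    printf("error = %d, x(1) = %g\n", (int)error, x[0]);
    return 0;
}

The mkl_free_buffers() call after phase -1 is my assumption of how to ask MKL to hand back its remaining internal work buffers.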

Hoping for a reply.

Thanks.

 


pardiso intermediate result output


Hi, can PARDISO save intermediate results to the hard disk? For example, after phase 22 I would save the intermediate results to disk, and a few days later call PARDISO to finish the phase 33 calculation. Thanks!
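To make the scenario concrete, here is a minimal sketch of splitting factorization (phase 22) and solve (phase 33) into separate PARDISO calls while keeping the same handle pt (C interface, made-up 3x3 matrix). Note this only shows reuse of the in-memory handle within a single run; it does not by itself store anything on disk, which is the part I am asking about:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Tiny 3x3 symmetric indefinite example, upper triangle, one-based CSR. */
    MKL_INT n = 3;
    MKL_INT ia[4] = { 1, 3, 4, 5 };
    MKL_INT ja[4] = { 1, 3, 2, 3 };
    double  a[4]  = { 1.0, 2.0, -1.0, 3.0 };
    double  b[3]  = { 1.0, 1.0, 1.0 }, x[3];

    void   *pt[64]    = { 0 };
    MKL_INT iparm[64] = { 0 };   /* iparm[0] = 0 -> PARDISO defaults */
    MKL_INT maxfct = 1, mnum = 1, mtype = -2, nrhs = 1, msglvl = 0, error = 0;
    MKL_INT phase, idum;
    double  ddum;

    phase = 11;  /* analysis */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);

    phase = 22;  /* numerical factorization; the factors live inside pt */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);

    /* ... any amount of other work, as long as pt is left untouched ... */

    phase = 33;  /* solve, reusing the factorization held in pt */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, b, x, &error);

    phase = -1;  /* release internal memory */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, &ddum, ia, ja,
            &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);

    printf("x = (%g, %g, %g), error = %d\n", x[0], x[1], x[2], (int)error);
    return 0;
}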

Memory Leak


Hi,

I have an issue with dgemm() and other MKL functions not returning the memory they use internally. How can I release that memory? If this example runs long enough, the application will crash.

Environment:

Microsoft Visual C++ 2010, Microsoft Windows 7, Intel Core i7-2600 CPU @ 3.40 GHz
Intel(R) C++ Composer XE 2011 Update 9, with Intel(R) C++ Compiler XE 12.1
MKL 10.3 Update 10

Sample code:

#include <stdio.h>
#include "mkl.h"

/* printf format for MKL_INT64; assumed here, the original FORMAT macro
   came from the surrounding project */
#define FORMAT "%lld"

int main(int argc, char* argv[])
{
  double *a, *b, *c;
  MKL_INT n;
  int i, y;
  double alpha, beta;
  MKL_INT64 AllocatedBytes;
  int N_AllocatedBuffers;

  alpha = 1.1; beta = -1.2;
  n = 1000;

  for (y = 0; y < 15; y++)
  {
      a = (double*)mkl_malloc(n*n*sizeof(double), 64);
      b = (double*)mkl_malloc(n*n*sizeof(double), 64);
      c = (double*)mkl_malloc(n*n*sizeof(double), 64);

      for (i = 0; i < (n*n); i++)
      {
          a[i] = (double)(i+1);
          b[i] = (double)(-i-1);
      }

      dgemm("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);
      mkl_free_buffers();

      AllocatedBytes = mkl_mem_stat(&N_AllocatedBuffers);
      printf("\nDGEMM uses "FORMAT" bytes in %d buffers", AllocatedBytes, N_AllocatedBuffers);

      mkl_free(a);
      mkl_free(b);
      mkl_free(c);
      mkl_free_buffers();

      AllocatedBytes = mkl_mem_stat(&N_AllocatedBuffers);
      if (AllocatedBytes > 0) {
          printf("\nMKL memory leak!");
          printf("\nAfter mkl_free_buffers there are "FORMAT" bytes in %d buffers",
                 AllocatedBytes, N_AllocatedBuffers);
      }
  }

  mkl_free_buffers();
  mkl_thread_free_buffers();

  AllocatedBytes = mkl_mem_stat(&N_AllocatedBuffers);
  if (AllocatedBytes > 0) {
      printf("\nMKL memory leak!");
      printf("\nAfter mkl_free_buffers there are "FORMAT" bytes in %d buffers",
             AllocatedBytes, N_AllocatedBuffers);
  }
  return 0;
}

Code output:

Version MKL 10.3 Update 10
DGEMM uses 24001112 bytes in 8 buffers
MKL memory leak!
After mkl_free_buffers there are 896 bytes in 5 buffe
DGEMM uses 29278936 bytes in 10 buffers
MKL memory leak!
After mkl_free_buffers there are 5278720 bytes in 7 b
DGEMM uses 29278936 bytes in 11 buffers
MKL memory leak!
After mkl_free_buffers there are 5278720 bytes in 8 b
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
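One variation that might be worth trying (a sketch only, not the run above): setting the MKL_DISABLE_FAST_MM environment variable before the first MKL call, which as I understand it tells MKL's memory manager not to keep internal buffers between calls (at some performance cost), so mkl_mem_stat() should then report little or nothing:

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

int main(void)
{
    double *a, *b, *c;
    double alpha = 1.0, beta = 0.0;
    MKL_INT n = 1000, i;
    MKL_INT64 AllocatedBytes;
    int N_AllocatedBuffers;

    /* Must happen before the first MKL routine runs.
       On Linux this would be setenv("MKL_DISABLE_FAST_MM", "1", 1). */
    _putenv("MKL_DISABLE_FAST_MM=1");

    a = (double*)mkl_malloc(n*n*sizeof(double), 64);
    b = (double*)mkl_malloc(n*n*sizeof(double), 64);
    c = (double*)mkl_malloc(n*n*sizeof(double), 64);
    for (i = 0; i < n*n; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.0; }

    dgemm("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    AllocatedBytes = mkl_mem_stat(&N_AllocatedBuffers);
    printf("\nWith MKL_DISABLE_FAST_MM: %lld bytes in %d buffers\n",
           (long long)AllocatedBytes, N_AllocatedBuffers);

    mkl_free(a); mkl_free(b); mkl_free(c);
    return 0;
}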

Thanks,

Vince

                     

non-linear optimization: jacobi_solve question


I have tried to use the same _JACOBIMATRIX_HANDLE_t to evaluate the Jacobian at successive iteration points of the optimizer, but the output matrix (fjac) does not get updated.

I created an example that isolates the issue, and this does not seem to be possible. Am I correct?

If so, is there something I could do to "reset" the handle? That would seem to be the desirable behavior, so that repeated initializations of the handle and buffers are avoided.
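For reference, the pattern I am using is essentially the standard djacobi RCI loop below (a stripped-down sketch, not the attached code; extended_powell here is just a stub standing in for my real evaluator). The only "reset" I have found so far is to delete and re-initialize the handle at every new iteration point, which is exactly what I was hoping to avoid:

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"
#include "mkl_rci.h"

/* Stub objective standing in for the real evaluator: m residuals of n variables. */
static void extended_powell(MKL_INT *m, MKL_INT *n, double *x, double *f)
{
    MKL_INT i;
    for (i = 0; i < *m; i++)
        f[i] = x[i % *n] * x[i % *n] - 1.0;   /* placeholder residual */
}

int main(void)
{
    MKL_INT n = 4, m = 4, rci_request = 0, done = 0;
    double  eps = 1.0e-6;
    double  x[4] = { 3.0, -1.0, 0.0, 1.0 };
    double *fjac = (double *)calloc((size_t)(m * n), sizeof(double));
    double *f1   = (double *)calloc((size_t)m, sizeof(double));
    double *f2   = (double *)calloc((size_t)m, sizeof(double));
    _JACOBIMATRIX_HANDLE_t handle;

    if (djacobi_init(&handle, &n, &m, x, fjac, &eps) != TR_SUCCESS) return 1;

    while (!done) {
        if (djacobi_solve(&handle, f1, f2, &rci_request) != TR_SUCCESS) return 1;
        if (rci_request == 1)      extended_powell(&m, &n, x, f1); /* f at perturbed x (+) */
        else if (rci_request == 2) extended_powell(&m, &n, x, f2); /* f at perturbed x (-) */
        else if (rci_request == 0) done = 1;                       /* fjac is ready        */
    }

    djacobi_delete(&handle);
    mkl_free_buffers();

    printf("fjac[0] = %g\n", fjac[0]);
    free(fjac); free(f1); free(f2);
    return 0;
}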

TIA for your help,

Petros

ps: I attach the file. It is a watered-down version of a much bigger project.

Some "tips" to make reading easier :

function evaluator : a wrapper of a member class that delivers the function call. wraps the class object and the method name.

extended_powell : the usual example in class dress.

NumericalJacobian: the class that wraps the mkl functionality

ublas::unbounded_array: similar to std::vector but with guaranteed contiguous memory layout.

ublas::matrix: the obvious.

Attachment: main.cpp (5.46 KB)

Missing FFTW libraries


I'm trying to compile a code written in Fortran 90/95 and I'm getting this error:

user:~> make
ifort -O3 -lmpi -L/opt/local/intel/fftw/lib -I/opt/local/intel/fftw/include -I/opt/sgi/mpt/mpt-2.03/include -L/opt/sgi/mpt/mpt-2.03/lib -o /test module_param.o incompact3d.o mesure.o schemas.o derive.o spectral.o tools.o filtre.o parametre.o forcage.o navier.o convdiff.o viv.o slfft3d_shift.o poisson.o
slfft3d_shift.o: In function 'slfft3d_shift_':
slfft3d_shift.f90:(.text+0x587e): undefined reference to 'rfftw3d_f77_mpi_create_plan_'
slfft3d_shift.f90:(.text+0x58a3): undefined reference to 'rfftwnd_f77_mpi_local_sizes_'
slfft3d_shift.f90:(.text+0x58cd): undefined reference to 'rfftwnd_f77_mpi_'
slfft3d_shift.f90:(.text+0x58d9): undefined reference to 'rfftwnd_f77_mpi_destroy_plan_'
slfft3d_shift.f90:(.text+0x91ca): undefined reference to 'rfftw3d_f77_mpi_create_plan_'
slfft3d_shift.f90:(.text+0x91ef): undefined reference to 'rfftwnd_f77_mpi_local_sizes_'
slfft3d_shift.f90:(.text+0x9219): undefined reference to 'rfftwnd_f77_mpi_'
slfft3d_shift.f90:(.text+0x9225): undefined reference to 'rfftwnd_f77_mpi_destroy_plan_'
slfft3d_shift.f90:(.text+0x111e7): undefined reference to 'rfftwnd_f77_one_real_to_complex_'
make: *** [/test] Error 1

Here's the Makefile:

FC = ifort
OPTFC = -O3 -lmpi -L/opt/local/intel/fftw/lib -I/opt/local/intel/fftw/include -I/opt/sgi/mpt/mpt-2.03/include -L/opt/sgi/mpt/mpt-2.03/lib

/test : module_param.o incompact3d.o mesure.o schemas.o derive.o spectral.o tools.o poisson.o filtre.o parametre.o slfft3d_shift.o forcage.o navier.o convdiff.o viv.o
	$(FC) $(OPTFC) -o /test module_param.o incompact3d.o mesure.o schemas.o derive.o spectral.o tools.o filtre.o parametre.o forcage.o navier.o convdiff.o viv.o slfft3d_shift.o poisson.o

module_param.o : module_param.f90
	$(FC) $(OPTFC) -c module_param.f90
incompact3d.o : incompact3d.f90
	$(FC) $(OPTFC) -c incompact3d.f90
mesure.o : mesure.f90
	$(FC) $(OPTFC) -c mesure.f90
spectral.o : spectral.f90
	$(FC) $(OPTFC) -c spectral.f90
schemas.o : schemas.f90
	$(FC) $(OPTFC) -c schemas.f90
derive.o : derive.f90
	$(FC) $(OPTFC) -c derive.f90
tools.o : tools.f90
	$(FC) $(OPTFC) -c tools.f90
forcage.o : forcage.f90
	$(FC) $(OPTFC) -c forcage.f90
navier.o : navier.f90
	$(FC) $(OPTFC) -c navier.f90
filtre.o : filtre.f90
	$(FC) $(OPTFC) -c filtre.f90
parametre.o : parametre.f90
	$(FC) $(OPTFC) -c parametre.f90
convdiff.o : convdiff.f90
	$(FC) $(OPTFC) -c convdiff.f90
poisson.o : poisson.f90
	$(FC) $(OPTFC) -c poisson.f90
slfft3d_shift.o : slfft3d_shift.f90
	$(FC) $(OPTFC) -c slfft3d_shift.f90
viv.o : viv.f90
	$(FC) $(OPTFC) -c viv.f90

When I include the libraries required in the Makefile I get the following message:

user:~/test> make
ifort -O3 -lmpi -I/opt/local/intel/fftw/include -L/opt/local/gnu/fftw -I/opt/sgi/mpt/mpt-2.03/include -L/opt/sgi/mpt/mpt-2.03/lib -I/opt/fftw/2.1.5.1/cnos/include -L/opt/intel/composerxe-2011.0.084/mkl/include/fftw/fftw_f77.i -L/opt/fftw/2.1.5.1/cnos/lib -I/opt/local/intel/fftw -I/opt/fftw/3.1.1/cnos/include -L/opt/fftw/3.1.1/cnos/lib -I/usr/local/packages/nag/p3dfft-single/2.3/include -L/usr/local/packages/nag/p3dfft-single/2.3/lib -o /home/u/guitar88/bin/teste module_param.o incompact3d.o mesure.o schemas.o derive.o spectral.o tools.o filtre.o \
parametre.o forcage.o navier.o convdiff.o viv.o slfft3d_shift.o poisson.o -lm -L/opt/local/intel/fftw/lib -lsrfftw_mpi \
-lsrfftw -lsfftw_mpi -lsfftw
ld: cannot find -lsrfftw_mpi
make: *** [/test] Error 1

I'm using the ifort compiler on a supercomputer environment, working with MPI. Any clue as to what is going on? Cheers.

Data race in Pardiso Solver?


Hi All,

I ran into a tricky problem when testing my code for data races. Intel Inspector XE 2013 shows that there is a data race in the following call:

!C.. Factorization.
      phase = 22 ! only factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja,     &
       idum, nrhs, iparm, msglvl, ddum, ddum, error)

But it still solves the problem and the result seems correct. Where does this data race come from?

I attached the testing codes and matrix data.

Thanks and regards,

Daniel

Error in feast eigenvalue solver for sparse matrices


Hi,

We are trying to use the MKL FEAST 11.0.3.1 solver for symmetric sparse CSR matrices of doubles. The call we are using is something like:

--------------------------------------------

#include "mkl.h"

......

//declaring & preparing data...

....

feastinit(&feastparam[0]);
dfeast_scsrev(&UPLO, &N, sa, ia, ja, feastparam, &epsout, &loop, &Emin, &Emax, &M0, E, X, &M, res, &info);

--------------------------------------------

We have no warnings at compile time, but at runtime the FEAST call throws an exception like this:

First-chance exception at 0x000007fee1b8249c in feast.exe: 0xC0000005: invalid read at 0xffffffffffffffff.

What could our mistake be? The input data? A bad linkage or compiler/MKL version combination? We are not getting any compile-time warnings.

Intel Composer XE 2011 Update 7 (package 258)
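To narrow things down, here is a fully self-contained variant of the same calling sequence on a tiny made-up matrix (3x3 diagonal, one-based CSR, a search interval that brackets two of its eigenvalues). If this minimal version also crashes, the problem is more likely in the linkage than in our input data:

#include <stdio.h>
#include "mkl.h"
#include "mkl_solvers_ee.h"

int main(void)
{
    /* diag(1, 2, 3) stored in CSR with one-based indices (as FEAST expects). */
    const char UPLO = 'F';
    MKL_INT N = 3;
    double  sa[3] = { 1.0, 2.0, 3.0 };
    MKL_INT ia[4] = { 1, 2, 3, 4 };
    MKL_INT ja[3] = { 1, 2, 3 };

    MKL_INT feastparam[128];
    double  Emin = 0.5, Emax = 2.5;   /* should catch the eigenvalues 1 and 2 */
    MKL_INT M0 = 3, M = 0, loop = 0, info = 0;
    double  epsout = 0.0;
    double  E[3], X[9], res[3];

    feastinit(&feastparam[0]);
    dfeast_scsrev(&UPLO, &N, sa, ia, ja, feastparam, &epsout, &loop,
                  &Emin, &Emax, &M0, E, X, &M, res, &info);

    printf("info = %d, eigenvalues found = %d\n", (int)info, (int)M);
    return 0;
}

One more thing worth double-checking on our side: that the headers and the libraries being linked come from the same MKL 11.0.x installation, since, as far as I know, the Composer XE 2011 package ships an older MKL without the FEAST routines.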

Thanks in advance!

Aurora

Minimum working example for mkl_ddnscsr


Hi. Can anyone provide me with a minimum working example for mkl_ddnscsr? I have tried this so far

#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>
int main(int argc, char *argv[])
{
  MKL_INT info;
  MKL_INT m = 3; //Number of rows of A
  MKL_INT n = 4; //Number of columns of A
  MKL_INT nnz = 6; //Number of non-zero elements
  MKL_INT job[6] = {0,0,1,2,nnz,1};
  double  *Acsr = (double *)  calloc(nnz, sizeof(double)  );
  MKL_INT *Aj   = (MKL_INT *) calloc(nnz, sizeof(MKL_INT) );
  MKL_INT *Ai   = (MKL_INT *) calloc(m+1, sizeof(MKL_INT) );
  double A[3][4] = {{1.,3.,0.,0.},{0.,0.,4.,0.},{2.,5.,0.,6.}};
  mkl_ddnscsr ( job, &m, &n, A[0], &m, Acsr, Aj, Ai, &info);
  for (int i=0; i< nnz; i++) {
    if (Acsr[i] != 0) {
      printf( "column = %i, A = %f\n", Aj[i], Acsr[i] );
    }
  }
  for (int i=0; i< m+1; i++) {
    printf("Ai[%i] = %i\n", i, Ai[i]);
  }
  free(Acsr); free(Aj); free(Ai);
  return 0;
}

But it returns these results

column = 1, A = 1.000000
column = 2, A = 3.000000
column = 4, A = 4.000000
column = 1, A = 4.000000
column = 3, A = 2.000000
column = 4, A = 5.000000
Ai[0] = 1
Ai[1] = 3
Ai[2] = 4
Ai[3] = 7

If I play with the value of lda I can almost get the correct result, but I believe I am calling it as the manual suggests. I am using Ubuntu 12.04 and Composer 2013.3.163, if that makes a difference.
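One variant I have been meaning to try (based on my assumption that mkl_ddnscsr reads the dense array in Fortran column-major order with leading dimension lda, since it follows the Fortran calling convention): store the same matrix column by column and keep lda = m:

#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>

int main(void)
{
    MKL_INT info;
    MKL_INT m = 3, n = 4, nnz = 6, lda = 3;
    MKL_INT job[6] = { 0, 0, 1, 2, 6, 1 };
    double  *Acsr = (double *)  calloc(nnz, sizeof(double));
    MKL_INT *Aj   = (MKL_INT *) calloc(nnz, sizeof(MKL_INT));
    MKL_INT *Ai   = (MKL_INT *) calloc(m + 1, sizeof(MKL_INT));
    /* columns of {{1,3,0,0},{0,0,4,0},{2,5,0,6}} stored one after another */
    double A[12] = { 1., 0., 2.,   3., 0., 5.,   0., 4., 0.,   0., 0., 6. };
    MKL_INT i;

    mkl_ddnscsr(job, &m, &n, A, &lda, Acsr, Aj, Ai, &info);

    for (i = 0; i < nnz; i++)
        printf("column = %d, A = %f\n", (int)Aj[i], Acsr[i]);
    for (i = 0; i < m + 1; i++)
        printf("Ai[%d] = %d\n", (int)i, (int)Ai[i]);

    free(Acsr); free(Aj); free(Ai);
    return 0;
}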

Thanks

Chris


Issues with declarations of MKL functions: 'remark #424: extra ";" ignored' messages are displayed


When the Intel C++ compiler option /W5 is turned on, the compiler shows many 'remark #424: extra ";" ignored' messages related to the declarations of some functions in the MKL headers. Here is a small example:

...
..\mkl\include\mkl_solvers_ee.h(51): remark #424: extra ";" ignored
  _Mkl_Api(void,feastinit,(MKL_INT* fpm));
                                         ^

..\mkl\include\mkl_solvers_ee.h(52): remark #424: extra ";" ignored
  _Mkl_Api(void,FEASTINIT,(MKL_INT* fpm));

...

 

How to solve this symmetric indefinite matrix with pardiso?


Hello,

Greetings! I would like to know how to effectively solve the attached symmetric indefinite system using PARDISO. This particular matrix is strongly diagonally dominant, with near-zero and negative off-diagonal terms. I tried different PARDISO parameters but could not get the solution promised by other software. Please suggest and list PARDISO parameters/options for solving this matrix. The expected (promised) solution is ~0.7 for all components.
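For concreteness, here is the kind of parameter setup I mean (C-style, zero-based iparm indexing; these values are assumptions/typical starting points for a real symmetric indefinite matrix, mtype = -2, not a verified recipe for the attached system):

#include <stdio.h>
#include <string.h>
#include "mkl.h"

/* Starting-point settings for a real symmetric indefinite matrix (mtype = -2).
   iparm[k] here is iparm(k+1) in the Fortran documentation. */
static void set_indefinite_iparm(MKL_INT iparm[64])
{
    memset(iparm, 0, 64 * sizeof(MKL_INT));
    iparm[0]  = 1;   /* do not rely on all defaults                      */
    iparm[1]  = 2;   /* METIS fill-in reducing reordering                */
    iparm[7]  = 2;   /* a couple of iterative refinement steps           */
    iparm[9]  = 8;   /* pivot perturbation 1e-8                          */
    iparm[10] = 1;   /* scaling (often suggested for hard indefinite)    */
    iparm[12] = 1;   /* weighted matching (often suggested as well)      */
    iparm[20] = 1;   /* 1x1 and 2x2 Bunch-Kaufman pivoting               */
}

int main(void)
{
    MKL_INT iparm[64];
    set_indefinite_iparm(iparm);
    printf("iparm(10) = %d, iparm(11) = %d, iparm(13) = %d\n",
           (int)iparm[9], (int)iparm[10], (int)iparm[12]);
    return 0;
}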

Thank you

Stoka

MKL vs Microsoft exp() function


I have a client that is migrating a large C++ software base from 32- to 64-bit code in MS Visual Studio. One of the problems they are having is that the 32- and 64-bit versions of the C library exp() function produce results that differ by 1ulp for some operands, and this is causing regression tests to fail. One potential solution I am considering is to use Intel MKL instead of the Microsoft library. So I have a few questions:

1. Do the 32-bit and 64-bit builds of MKL produce identical results for exp() and other transcendental functions, for all operands, assuming that SSE2 is enabled for our 32-bit code? (A small sketch of the call I have in mind follows these questions.)

2. Although the client has mostly Intel hardware, I believe they have a few AMD Opteron-based server farms. Does MKL work on Opterons? If so, are there any performance penalties if MKL is used in place of the Microsoft library?

3. Is there any way of getting the Microsoft .NET framework to use MKL? I assume it may have the same 32/64-bit differences, although I haven't tested that yet.

4. What other benefits might my client gain by switching to MKL?
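Regarding question 1, the sketch below shows the kind of call I have in mind: going through MKL's vector math function vdExp with the accuracy mode pinned to high accuracy (VML_HA). Whether the 32- and 64-bit builds then agree to the last bit is exactly what we would need to verify:

#include <stdio.h>
#include "mkl_vml.h"

int main(void)
{
    double a[4] = { 0.1, 1.0, 2.5, -3.7 };
    double r[4];
    int i;

    vmlSetMode(VML_HA);   /* high-accuracy mode for the vector math functions */
    vdExp(4, a, r);       /* r[i] = exp(a[i]) */

    for (i = 0; i < 4; i++)
        printf("exp(%g) = %.17g\n", a[i], r[i]);
    return 0;
}

On MKL 11.0 and later there is also the conditional numerical reproducibility control (mkl_cbwr_set), although as I understand it that targets run-to-run and cross-CPU consistency rather than 32-bit vs 64-bit consistency specifically.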

Thanks in advance - dc42

efficiently solving least squares problems iteratively


I'm performing an iterative routine where at each iteration I solve a least squares problem using ?gels. At each iteration I update one column of the matrix A, which will ultimately converge, as will the solution vector b.

My question is this: Because at each iteration the solution vector b is changing only very little, is it possible to solve this more efficiently than repeatedly calling ?gels (perhaps by calling some of the routines that ?gels itself calls), since I know that I am often very close to the solution?
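To make "the routines that ?gels itself calls" concrete, the split I have in mind looks like the sketch below (made-up data, LAPACKE row-major interface): dgeqrf for the QR factorization, dormqr to apply Q^T to b, and a triangular solve. I am not claiming this is faster by itself; it just exposes the individual steps that might be reusable:

#include <stdio.h>
#include "mkl_lapacke.h"

int main(void)
{
    /* Solve min ||A x - b||_2 for a 4x2 full-rank A, row-major storage. */
    const lapack_int m = 4, n = 2;
    double A[8] = { 1.0, 1.0,
                    1.0, 2.0,
                    1.0, 3.0,
                    1.0, 4.0 };          /* row-major, lda = n */
    double b[4] = { 6.0, 5.0, 7.0, 10.0 };
    double tau[2];

    /* A = Q R */
    LAPACKE_dgeqrf(LAPACK_ROW_MAJOR, m, n, A, n, tau);
    /* b <- Q^T b */
    LAPACKE_dormqr(LAPACK_ROW_MAJOR, 'L', 'T', m, 1, n, A, n, tau, b, 1);
    /* solve R x = (Q^T b)(1:n); x overwrites the first n entries of b */
    LAPACKE_dtrtrs(LAPACK_ROW_MAJOR, 'U', 'N', 'N', n, 1, A, n, b, 1);

    printf("x = (%g, %g)\n", b[0], b[1]);
    return 0;
}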

Thank you,

Tracy

Program crash in mkl_avx.dll on Windows with Intel MKL 11.0


One of our customers has encountered a problem with our library (NumPy) when linked against MKL version 11.0.3 on the Windows platform. The program dies during an eigenvalue decomposition. The problem does not show up on a build of NumPy against an older version of the MKL.

Attached are some screenshots which provide information about: 

1) Where the crash occurs (mkl_avx)

2) Which instruction it seems not to like (vandpd)

3) call stack

4&5) Configuration information (Windows Version and Hardware).   

The software is Anaconda 1.5, available here: www.continuum.io, but the same problem was replicated with other versions of NumPy downloaded elsewhere. Several machines with this kind of hardware seem to have the same problem.

Pardiso result problem


Hi All,

I have implemented the PARDISO solver in our flow simulation model. PARDISO solves most of our problems well and produces correct results. But recently I have a case where PARDISO generates quite different results from our original solver (ws209), which has been used with our model for over ten years. I suspect something is wrong with the settings.

Please find the test code (pardiso_unsym_f.f90), the sparse matrices exported from our model (a_i.txt, b_i.txt, ia_i.txt, ja_i.txt), the result generated by PARDISO (x_out_i.txt) and the result generated by the ws209 solver (x_ws209_i.txt). The first value in each file is the number of values.

Can anybody help to check this?

Thanks and regards,

Daniel


Pardiso memory leak


 

Hi,

I have the following subroutine, which works perfectly. However, I am using it in a nonlinear solution and call it many times in a program. After each call the memory usage increases, even though I use the 'release memory' phase at the end of the subroutine. Does anyone have an idea what the reason for the memory increase could be? Is there a way of tracking the variables at the beginning and end of the subroutine, so that I can see which of them stay unreleased?

Many thanks.

 

      SUBROUTINE SPARSE_SOL

      USE XX  ! this is a module of already allocated variables

      INTEGER omp_get_max_threads
      EXTERNAL omp_get_max_threads

      ! This is OK in both cases
      INTEGER*8 pt(64)

      ! All other variables
      INTEGER maxfct, mnum, mtype, phase, n, nrhs, error, msglvl
      INTEGER i, idum
      INTEGER iparm(64)
      REAL*8 waltime1, waltime2, ddum
      COMPLEX*16 cdum

      ! Fill all arrays containing matrix data.
      DATA nrhs /1/, maxfct /1/, mnum /1/
      n = nodes

      ! Set up the PARDISO control parameters
      do i = 1, 64
         iparm(i) = 0
      end do
      iparm(1) = 1 ! no solver default
      iparm(2) = 2 ! fill-in reordering from METIS
      iparm(3) = mkl_get_max_threads() ! number of processors, value of MKL_NUM_THREADS
      iparm(4) = 0 ! no iterative-direct algorithm
      iparm(5) = 0 ! no user fill-in reducing permutation
      iparm(6) = 0 ! =0: solution on the first n components of x
      iparm(7) = 0 ! not in use
      iparm(8) = 9 ! number of iterative refinement steps
      iparm(9) = 0 ! not in use
      iparm(10) = 13 ! perturb the pivot elements with 1E-13
      iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS
      iparm(12) = 0 ! not in use
      iparm(13) = 0 ! not in use
      iparm(14) = 0 ! Output: number of perturbed pivots
      iparm(15) = 0 ! not in use
      iparm(16) = 0 ! not in use
      iparm(17) = 0 ! not in use
      iparm(18) = -1 ! Output: number of nonzeros in the factor LU
      iparm(19) = -1 ! Output: Mflops for LU factorization
      iparm(20) = 0 ! Output: number of CG iterations
      iparm(60) = 1 ! OOC core selection
      error = 0 ! initialize error flag
      msglvl = 1 ! print statistical information
      mtype = 13 ! COMPLEX unsymmetric

      ! Initialize the internal solver memory pointer. This is only
      ! necessary for the FIRST call of the PARDISO solver.
      do i = 1, 64
         pt(i) = 0
      end do

      ! Reordering and symbolic factorization. This step also allocates
      ! all memory that is necessary for the factorization.
      phase = 11 ! only reordering and symbolic factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase, nodes, valuesCoo,  &
        rowIndex, colsCoo, idum, nrhs, iparm, msglvl, cdum, cdum, error)
      WRITE(*,*) 'Reordering completed ... '
      IF (error .NE. 0) THEN
         WRITE(*,*) 'The following ERROR was detected: ', error
         pause
      END IF
      WRITE(*,*) 'Number of nonzeros in factors = ', iparm(18)
      WRITE(*,*) 'Number of factorization MFLOPS = ', iparm(19)

      ! Factorization.
      phase = 22 ! only factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase, nodes, valuesCoo,  &
        rowIndex, colsCoo, idum, nrhs, iparm, msglvl, cdum, cdum, error)
      WRITE(*,*) 'Factorization completed ... '
      IF (error .NE. 0) THEN
         WRITE(*,*) 'The following ERROR was detected: ', error
         pause
      END IF

      ! Back substitution and iterative refinement
      iparm(8) = 2 ! max number of iterative refinement steps
      phase = 33 ! solve and iterative refinement
!      do i = 1, n
!         b(i) = 1.d0
!      end do
      CALL pardiso (pt, maxfct, mnum, mtype, phase, nodes, valuesCoo,  &
        rowIndex, colsCoo, idum, nrhs, iparm, msglvl, bglb, sln, error)
      WRITE(*,*) 'Solve completed ... '

      ! Termination and release of memory
      phase = -1 ! release internal memory
      CALL pardiso (pt, maxfct, mnum, mtype, phase, nodes, ddum, idum, idum,  &
        idum, nrhs, iparm, msglvl, ddum, ddum, error)

      RETURN
      END

Attachment: subroutine-sparse.docx (15.77 KB)

pardiso iparm(2) parameter


Dear All

From the table at

http://software.intel.com/en-us/articles/pardiso-parameter-table#table2

It states that

iparm(2)=0 is the MD algorithm and iparm(2)=2 is metis package.

But from the former thread 

http://software.intel.com/en-us/forums/topic/299748

"There exists another built-in reordering scheme so called MMD reordering available through iparm(2)=1."

Sergey suggested that iparm(2)=1 is the MD algorithm and that iparm(2)=0 and 2 are METIS.

My experience matches what the former thread suggests.

Is the iparm table on your website wrong?

http://software.intel.com/en-us/forums/topic/299748

Hailong

Performance gets worse over time for the same instructions


First, I'm not sure if this is the right forum for this question; the cause could be hardware, MKL, .NET, or some other hidden factor.

I have a neural network code in C# which heavily uses MKL via PInvoke. I set a fixed number of threads and disabled dynamic threading of MKL. The C# code is used mainly before and after training. However, during training (i.e. between iterations), MKL carries most of the computational body. No memory is allocated and there's no I/O during training.
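For reference, the MKL-side setup I described amounts to the following service calls (a native C sketch of what the C# wrapper does via PInvoke; the thread count 4 is only an example):

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    mkl_set_num_threads(4);   /* fixed number of MKL threads     */
    mkl_set_dynamic(0);       /* do not let MKL reduce the count */

    printf("max threads = %d, dynamic = %d\n",
           mkl_get_max_threads(), mkl_get_dynamic());
    return 0;
}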

I have observed unpredictable performance across iterations (example below) and would like to understand why. In some other runs, the number of connections processed per second dropped to ~600M for a few iterations (very strange). For the run below, training took 6 h to finish (i.e. each iteration takes about 12 minutes on average). The performance quite consistently degrades towards the end. The performance accounting is more consistent when I run a smaller job (e.g. one that finishes in 20 minutes).

The code is large and not sharable. If you can't pinpoint why, a hint to help me investigate further would also be appreciated.

Iterations:1/30, 1504.65M connections processed per second
Iterations:2/30, 1505.16M connections processed per second
Iterations:3/30, 1505.16M connections processed per second
Iterations:4/30, 1504.96M connections processed per second
Iterations:5/30, 1503.38M connections processed per second
Iterations:6/30, 1504.68M connections processed per second
Iterations:7/30, 1502.40M connections processed per second
Iterations:8/30, 1506.11M connections processed per second
Iterations:9/30, 1503.20M connections processed per second
Iterations:10/30, 1504.95M connections processed per second
Iterations:11/30, 1502.34M connections processed per second
Iterations:12/30, 1498.91M connections processed per second
Iterations:13/30, 1490.70M connections processed per second
Iterations:14/30, 1477.59M connections processed per second
Iterations:15/30, 1459.92M connections processed per second
Iterations:16/30, 1433.61M connections processed per second
Iterations:17/30, 1402.28M connections processed per second
Iterations:18/30, 1356.30M connections processed per second
Iterations:19/30, 1342.68M connections processed per second
Iterations:20/30, 1306.84M connections processed per second
Iterations:21/30, 1263.10M connections processed per second
Iterations:22/30, 1236.72M connections processed per second
Iterations:23/30, 1209.60M connections processed per second
Iterations:24/30, 1183.91M connections processed per second
Iterations:25/30, 1157.60M connections processed per second
Iterations:26/30, 1140.60M connections processed per second
Iterations:27/30, 1112.54M connections processed per second
Iterations:28/30, 1086.06M connections processed per second
Iterations:29/30, 1071.61M connections processed per second
Iterations:30/30, 1055.94M connections processed per second

sparse right hand side reordering problem


Dear All

I am trying to use the sparse right-hand-side feature of PARDISO (iparm(31)=1).

I tested with an identity matrix A.

When I set perm[i] = 1 for the last several entries, except the very last one, I get the following error:

*** Error in PARDISO ( reordering_phase) error_num= -180

*** error PARDISO: reordering, symbolic factorization

perm before reordering
0 0 0 0 0 0 0 0 1 1 1 1 1 0 
perm after reordering
8 7 4 3 6 2 10 1 5 9 10 11 12 13

Notice: 10 appears twice.

But when the last entry of perm is also 1, i.e. perm[last] = 1, there is no problem.

perm before reordering
0 0 0 0 0 0 0 0 1 1 1 1 1 1
perm after reordering
8 7 6 5 4 3 2 1 9 10 11 12 13 14

The attachment is my C++ test code.
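For clarity, the relevant part of the test code's setup is roughly the following sketch (C-style, zero-based iparm indexing, so iparm[30] is iparm(31) of the documentation; n = 14 to match the vectors above, and the matrix itself is omitted since the real test just uses the identity):

#include <stdio.h>
#include <string.h>
#include "mkl.h"

int main(void)
{
    MKL_INT n = 14;
    MKL_INT iparm[64];
    MKL_INT perm[14];
    MKL_INT i;

    memset(iparm, 0, sizeof(iparm));
    iparm[0]  = 1;   /* do not use all default values                     */
    iparm[1]  = 2;   /* METIS reordering                                  */
    iparm[30] = 1;   /* iparm(31): sparse right-hand side                 */

    /* 1 marks the right-hand-side components that are nonzero. */
    for (i = 0; i < n; i++) perm[i] = 0;
    for (i = n - 6; i < n - 1; i++) perm[i] = 1;   /* last several, except the very last */

    /* perm is then passed as the perm argument of every pardiso() call. */
    for (i = 0; i < n; i++) printf("%d ", (int)perm[i]);
    printf("\n");
    return 0;
}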

Could you please give me some suggestions?

Hailong

Attachment: main.c (3.98 KB)

ifort: error #10104: unable to open '--start-group' (MPI with FGMRES)


Hello!

Can someone help me get past this error please...

I get the following error while generating my executable:

ifort: error #10104: unable to open '--start-group'

I'm trying to run FGMRES sequentially on multiple nodes of a cluster using MPI. If I compile using ifort I get no problems, but if I use mpif90 I get the above error. The command I'm using is:

mpif90 -xHost -g -traceback -debug all -check all -implicitnone -fp-stack-check -heap-arrays -ftrapuv -check pointers -check bounds -I/INTEL/mkl/include -fpp source1.f90 source2.f -L"/INTEL/mkl/lib/em64t""/INTEL/mkl/lib/em64t"/libmkl_lapack95_lp64.a "/INTEL/mkl/lib/em64t"/libmkl_solver_lp64_sequential.a "/INTEL/mkl/lib/em64t"/libmkl_intel_lp64.a -Wl,--start-group "/INTEL/mkl/lib/em64t"/libmkl_sequential.a "/INTEL/mkl/lib/em64t"/libmkl_core.a -Wl,--end-group -lpthread -lm -o executable  

Is it not possible to invoke FGMRES or any other MKL routine in an MPI environment? 

Many Thanks!
