Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

pardiso Memory leak?


I am very puzzled by the Win32 PARDISO version.

My integrated environment is Intel(R) Visual Fortran Compiler XE 12.1.7.371.

My matrix has 5 million non-zeros, n = 200,000, real and symmetric indefinite.

I first test how much memory I can use (1022 MB is available), then call PARDISO for phase 23, then call PARDISO for phase -1.

When I test the available memory again, I find that only about 700 MB can be used.

What should I do? In theory the full 1022 MB should be available again.

This does not happen under Win64.
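For reference, the call sequence I mean, as a stand-alone sketch (MKL PARDISO C interface, a made-up 3x3 symmetric indefinite matrix, default iparm values; not the actual program):

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Upper triangle of a 3x3 real symmetric indefinite matrix, one-based CSR. */
    MKL_INT n = 3;
    MKL_INT ia[4] = { 1, 3, 4, 5 };
    MKL_INT ja[4] = { 1, 3, 2, 3 };
    double  a[4]  = { 1.0, 2.0, -1.0, 3.0 };
    double  b[3]  = { 1.0, 1.0, 1.0 }, x[3];

    void   *pt[64]    = { 0 };   /* PARDISO handle, must start zeroed          */
    MKL_INT iparm[64] = { 0 };   /* iparm[0] = 0 -> use PARDISO default values */
    MKL_INT maxfct = 1, mnum = 1, mtype = -2, nrhs = 1, msglvl = 0, error = 0;
    MKL_INT phase, idum;
    double  ddum;

    phase = 11;  /* reordering and symbolic factorization */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);

    phase = 23;  /* numerical factorization + solve */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, b, x, &error);

    phase = -1;  /* release PARDISO's internal memory */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, &ddum, ia, ja,
            &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);

    mkl_free_buffers();  /* ask MKL to return its own work buffers as well */

    printf("error = %d, x(1) = %g\n", (int)error, x[0]);
    return 0;
}

The mkl_free_buffers() call after phase -1 is my assumption of how to ask MKL to hand back its remaining internal work buffers.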

Hoping for a reply.

Thanks.

 


pardiso intermediate result output


Hi, can PARDISO save intermediate results to the hard disk? For example, after phase 22 I would save the intermediate results to disk, and a few days later call PARDISO to finish the phase 33 calculation. Thanks!
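To make the scenario concrete, here is a minimal sketch of splitting factorization (phase 22) and solve (phase 33) into separate PARDISO calls while keeping the same handle pt (C interface, made-up 3x3 matrix). Note this only shows reuse of the in-memory handle within a single run; it does not by itself store anything on disk, which is the part I am asking about:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Tiny 3x3 symmetric indefinite example, upper triangle, one-based CSR. */
    MKL_INT n = 3;
    MKL_INT ia[4] = { 1, 3, 4, 5 };
    MKL_INT ja[4] = { 1, 3, 2, 3 };
    double  a[4]  = { 1.0, 2.0, -1.0, 3.0 };
    double  b[3]  = { 1.0, 1.0, 1.0 }, x[3];

    void   *pt[64]    = { 0 };
    MKL_INT iparm[64] = { 0 };   /* iparm[0] = 0 -> PARDISO defaults */
    MKL_INT maxfct = 1, mnum = 1, mtype = -2, nrhs = 1, msglvl = 0, error = 0;
    MKL_INT phase, idum;
    double  ddum;

    phase = 11;  /* analysis */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);

    phase = 22;  /* numerical factorization; the factors live inside pt */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);

    /* ... any amount of other work, as long as pt is left untouched ... */

    phase = 33;  /* solve, reusing the factorization held in pt */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, b, x, &error);

    phase = -1;  /* release internal memory */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, &ddum, ia, ja,
            &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);

    printf("x = (%g, %g, %g), error = %d\n", x[0], x[1], x[2], (int)error);
    return 0;
}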

Memory Leak


Hi,

I have an issue with dgemm() and other MKL functions not returning the memory they use internally. How can I release that memory? If this example runs long enough, the application will crash.

Environment:

Microsoft Visual C++ 2010, Microsoft Windows 7, Intel Core i7-2600 CPU @ 3.40 GHz
Intel(R) C++ Composer XE 2011 Update 9, with Intel(R) C++ Compiler XE 12.1
MKL 10.3 Update 10

Sample code:

#include <stdio.h>
#include "mkl.h"

/* printf format for MKL_INT64; assumed here, the original FORMAT macro
   came from the surrounding project */
#define FORMAT "%lld"

int main(int argc, char* argv[])
{
  double *a, *b, *c;
  MKL_INT n;
  int i, y;
  double alpha, beta;
  MKL_INT64 AllocatedBytes;
  int N_AllocatedBuffers;

  alpha = 1.1; beta = -1.2;
  n = 1000;

  for (y = 0; y < 15; y++)
  {
      a = (double*)mkl_malloc(n*n*sizeof(double), 64);
      b = (double*)mkl_malloc(n*n*sizeof(double), 64);
      c = (double*)mkl_malloc(n*n*sizeof(double), 64);

      for (i = 0; i < (n*n); i++)
      {
          a[i] = (double)(i+1);
          b[i] = (double)(-i-1);
      }

      dgemm("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);
      mkl_free_buffers();

      AllocatedBytes = mkl_mem_stat(&N_AllocatedBuffers);
      printf("\nDGEMM uses "FORMAT" bytes in %d buffers", AllocatedBytes, N_AllocatedBuffers);

      mkl_free(a);
      mkl_free(b);
      mkl_free(c);
      mkl_free_buffers();

      AllocatedBytes = mkl_mem_stat(&N_AllocatedBuffers);
      if (AllocatedBytes > 0) {
          printf("\nMKL memory leak!");
          printf("\nAfter mkl_free_buffers there are "FORMAT" bytes in %d buffers",
                 AllocatedBytes, N_AllocatedBuffers);
      }
  }

  mkl_free_buffers();
  mkl_thread_free_buffers();

  AllocatedBytes = mkl_mem_stat(&N_AllocatedBuffers);
  if (AllocatedBytes > 0) {
      printf("\nMKL memory leak!");
      printf("\nAfter mkl_free_buffers there are "FORMAT" bytes in %d buffers",
             AllocatedBytes, N_AllocatedBuffers);
  }
  return 0;
}

Code output:

Version MKL 10.3 Update 10
DGEMM uses 24001112 bytes in 8 buffers
MKL memory leak!
After mkl_free_buffers there are 896 bytes in 5 buffe
DGEMM uses 29278936 bytes in 10 buffers
MKL memory leak!
After mkl_free_buffers there are 5278720 bytes in 7 b
DGEMM uses 29278936 bytes in 11 buffers
MKL memory leak!
After mkl_free_buffers there are 5278720 bytes in 8 b
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
DGEMM uses 31344728 bytes in 13 buffers
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
MKL memory leak!
After mkl_free_buffers there are 7344512 bytes in 10
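One variation that might be worth trying (a sketch only, not the run above): setting the MKL_DISABLE_FAST_MM environment variable before the first MKL call, which as I understand it tells MKL's memory manager not to keep internal buffers between calls (at some performance cost), so mkl_mem_stat() should then report little or nothing:

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

int main(void)
{
    double *a, *b, *c;
    double alpha = 1.0, beta = 0.0;
    MKL_INT n = 1000, i;
    MKL_INT64 AllocatedBytes;
    int N_AllocatedBuffers;

    /* Must happen before the first MKL routine runs.
       On Linux this would be setenv("MKL_DISABLE_FAST_MM", "1", 1). */
    _putenv("MKL_DISABLE_FAST_MM=1");

    a = (double*)mkl_malloc(n*n*sizeof(double), 64);
    b = (double*)mkl_malloc(n*n*sizeof(double), 64);
    c = (double*)mkl_malloc(n*n*sizeof(double), 64);
    for (i = 0; i < n*n; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.0; }

    dgemm("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    AllocatedBytes = mkl_mem_stat(&N_AllocatedBuffers);
    printf("\nWith MKL_DISABLE_FAST_MM: %lld bytes in %d buffers\n",
           (long long)AllocatedBytes, N_AllocatedBuffers);

    mkl_free(a); mkl_free(b); mkl_free(c);
    return 0;
}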

Thanks,

Vince

                     

non-linear optimization: jacobi_solve question


I have tried to use the same _JACOBIMATRIX_HANDLE_t to evaluate the Jacobian at successive iteration points of the optimizer, but the output matrix (fjac) does not get updated.

I created an example that isolates the issue, and this does not seem to be possible. Am I correct?

If so, is there something I could do to "reset" the handle? That would seem to be the desirable behavior, so that repeated initializations of the handle and buffers are avoided.
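For reference, the pattern I am using is essentially the standard djacobi RCI loop below (a stripped-down sketch, not the attached code; extended_powell here is just a stub standing in for my real evaluator). The only "reset" I have found so far is to delete and re-initialize the handle at every new iteration point, which is exactly what I was hoping to avoid:

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"
#include "mkl_rci.h"

/* Stub objective standing in for the real evaluator: m residuals of n variables. */
static void extended_powell(MKL_INT *m, MKL_INT *n, double *x, double *f)
{
    MKL_INT i;
    for (i = 0; i < *m; i++)
        f[i] = x[i % *n] * x[i % *n] - 1.0;   /* placeholder residual */
}

int main(void)
{
    MKL_INT n = 4, m = 4, rci_request = 0, done = 0;
    double  eps = 1.0e-6;
    double  x[4] = { 3.0, -1.0, 0.0, 1.0 };
    double *fjac = (double *)calloc((size_t)(m * n), sizeof(double));
    double *f1   = (double *)calloc((size_t)m, sizeof(double));
    double *f2   = (double *)calloc((size_t)m, sizeof(double));
    _JACOBIMATRIX_HANDLE_t handle;

    if (djacobi_init(&handle, &n, &m, x, fjac, &eps) != TR_SUCCESS) return 1;

    while (!done) {
        if (djacobi_solve(&handle, f1, f2, &rci_request) != TR_SUCCESS) return 1;
        if (rci_request == 1)      extended_powell(&m, &n, x, f1); /* f at perturbed x (+) */
        else if (rci_request == 2) extended_powell(&m, &n, x, f2); /* f at perturbed x (-) */
        else if (rci_request == 0) done = 1;                       /* fjac is ready        */
    }

    djacobi_delete(&handle);
    mkl_free_buffers();

    printf("fjac[0] = %g\n", fjac[0]);
    free(fjac); free(f1); free(f2);
    return 0;
}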

TIA for your help,

Petros

ps: I attach the file. It is a watered-down version of a much bigger project.

Some "tips" to make reading easier :

function evaluator : a wrapper of a member class that delivers the function call. wraps the class object and the method name.

extended_powell : the usual example in class dress.

NumericalJacobian: the class that wraps the mkl functionality

ublas::unbounded_array: similar to std::vector but with guaranteed contiguous memory layout.

ublas::matrix: the obvious.

Attachment: main.cpp (5.46 KB)

Missing FFTW libraries


I'm trying to compile a code written in Fortran 90/95 and I'm getting this error:

user:~> make
ifort -O3 -lmpi -L/opt/local/intel/fftw/lib -I/opt/local/intel/fftw/include -I/opt/sgi/mpt/mpt-2.03/include -L/opt/sgi/mpt/mpt-2.03/lib -o /test module_param.o incompact3d.o mesure.o schemas.o derive.o spectral.o tools.o filtre.o parametre.o forcage.o navier.o convdiff.o viv.o slfft3d_shift.o poisson.o
slfft3d_shift.o: In function 'slfft3d_shift_':
slfft3d_shift.f90:(.text+0x587e): undefined reference to 'rfftw3d_f77_mpi_create_plan_'
slfft3d_shift.f90:(.text+0x58a3): undefined reference to 'rfftwnd_f77_mpi_local_sizes_'
slfft3d_shift.f90:(.text+0x58cd): undefined reference to 'rfftwnd_f77_mpi_'
slfft3d_shift.f90:(.text+0x58d9): undefined reference to 'rfftwnd_f77_mpi_destroy_plan_'
slfft3d_shift.f90:(.text+0x91ca): undefined reference to 'rfftw3d_f77_mpi_create_plan_'
slfft3d_shift.f90:(.text+0x91ef): undefined reference to 'rfftwnd_f77_mpi_local_sizes_'
slfft3d_shift.f90:(.text+0x9219): undefined reference to 'rfftwnd_f77_mpi_'
slfft3d_shift.f90:(.text+0x9225): undefined reference to 'rfftwnd_f77_mpi_destroy_plan_'
slfft3d_shift.f90:(.text+0x111e7): undefined reference to 'rfftwnd_f77_one_real_to_complex_'
make: *** [/test] Error 1

Here's the Makefile:

FC = ifort
OPTFC = -O3 -lmpi -L/opt/local/intel/fftw/lib -I/opt/local/intel/fftw/include -I/opt/sgi/mpt/mpt-2.03/include -L/opt/sgi/mpt/mpt-2.03/lib

/test : module_param.o incompact3d.o mesure.o schemas.o derive.o spectral.o tools.o poisson.o filtre.o parametre.o slfft3d_shift.o forcage.o navier.o convdiff.o viv.o
	$(FC) $(OPTFC) -o /test module_param.o incompact3d.o mesure.o schemas.o derive.o spectral.o tools.o filtre.o parametre.o forcage.o navier.o convdiff.o viv.o slfft3d_shift.o poisson.o

module_param.o : module_param.f90
	$(FC) $(OPTFC) -c module_param.f90
incompact3d.o : incompact3d.f90
	$(FC) $(OPTFC) -c incompact3d.f90
mesure.o : mesure.f90
	$(FC) $(OPTFC) -c mesure.f90
spectral.o : spectral.f90
	$(FC) $(OPTFC) -c spectral.f90
schemas.o : schemas.f90
	$(FC) $(OPTFC) -c schemas.f90
derive.o : derive.f90
	$(FC) $(OPTFC) -c derive.f90
tools.o : tools.f90
	$(FC) $(OPTFC) -c tools.f90
forcage.o : forcage.f90
	$(FC) $(OPTFC) -c forcage.f90
navier.o : navier.f90
	$(FC) $(OPTFC) -c navier.f90
filtre.o : filtre.f90
	$(FC) $(OPTFC) -c filtre.f90
parametre.o : parametre.f90
	$(FC) $(OPTFC) -c parametre.f90
convdiff.o : convdiff.f90
	$(FC) $(OPTFC) -c convdiff.f90
poisson.o : poisson.f90
	$(FC) $(OPTFC) -c poisson.f90
slfft3d_shift.o : slfft3d_shift.f90
	$(FC) $(OPTFC) -c slfft3d_shift.f90
viv.o : viv.f90
	$(FC) $(OPTFC) -c viv.f90

When I include the libraries required in the Makefile I get the following message:

user:~/test> make
ifort -O3 -lmpi -I/opt/local/intel/fftw/include -L/opt/local/gnu/fftw -I/opt/sgi/mpt/mpt-2.03/include -L/opt/sgi/mpt/mpt-2.03/lib -I/opt/fftw/2.1.5.1/cnos/include -L/opt/intel/composerxe-2011.0.084/mkl/include/fftw/fftw_f77.i -L/opt/fftw/2.1.5.1/cnos/lib -I/opt/local/intel/fftw -I/opt/fftw/3.1.1/cnos/include -L/opt/fftw/3.1.1/cnos/lib -I/usr/local/packages/nag/p3dfft-single/2.3/include -L/usr/local/packages/nag/p3dfft-single/2.3/lib -o /home/u/guitar88/bin/teste module_param.o incompact3d.o mesure.o schemas.o derive.o spectral.o tools.o filtre.o \
parametre.o forcage.o navier.o convdiff.o viv.o slfft3d_shift.o poisson.o -lm -L/opt/local/intel/fftw/lib -lsrfftw_mpi \
-lsrfftw -lsfftw_mpi -lsfftw
ld: cannot find -lsrfftw_mpi
make: *** [/test] Error 1

I'm using the ifort compiler on a supercomputer environment, working with MPI. Any clue as to what is going on? Cheers.

Data race in Pardiso Solver?


Hi All,

I ran into a tricky problem when testing my code for data races. Intel Inspector XE 2013 shows that there is a data race in the following call:

!C.. Factorization.
      phase = 22 ! only factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja,     &
       idum, nrhs, iparm, msglvl, ddum, ddum, error)

But it still solves the problem and the result seems correct. Where does this data race come from?

I attached the testing codes and matrix data.

Thanks and regards,

Daniel

Error in feast eigenvalue solver for sparse matrices


Hi,

We are trying to use the MKL FEAST 11.0.3.1 solver for symmetric sparse CSR matrices of doubles. The call we are using is something like:

--------------------------------------------

#include "mkl.h"

......

//declaring & preparing data...

....

feastinit(&feastparam[0]);
dfeast_scsrev(&UPLO, &N, sa, ia, ja, feastparam, &epsout, &loop, &Emin, &Emax, &M0, E, X, &M, res, &info);

--------------------------------------------

We have no warnings at compile time, but at runtime the FEAST call throws an exception like this:

First-chance exception at 0x000007fee1b8249c in feast.exe: 0xC0000005: invalid read at 0xffffffffffffffff.

What could our mistake be? The input data? A bad linkage or compiler/MKL version combination? We are not getting any compile-time warnings.

Intel Composer XE 2011 Update 7 (package 258)
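To narrow things down, here is a fully self-contained variant of the same calling sequence on a tiny made-up matrix (3x3 diagonal, one-based CSR, a search interval that brackets two of its eigenvalues). If this minimal version also crashes, the problem is more likely in the linkage than in our input data:

#include <stdio.h>
#include "mkl.h"
#include "mkl_solvers_ee.h"

int main(void)
{
    /* diag(1, 2, 3) stored in CSR with one-based indices (as FEAST expects). */
    const char UPLO = 'F';
    MKL_INT N = 3;
    double  sa[3] = { 1.0, 2.0, 3.0 };
    MKL_INT ia[4] = { 1, 2, 3, 4 };
    MKL_INT ja[3] = { 1, 2, 3 };

    MKL_INT feastparam[128];
    double  Emin = 0.5, Emax = 2.5;   /* should catch the eigenvalues 1 and 2 */
    MKL_INT M0 = 3, M = 0, loop = 0, info = 0;
    double  epsout = 0.0;
    double  E[3], X[9], res[3];

    feastinit(&feastparam[0]);
    dfeast_scsrev(&UPLO, &N, sa, ia, ja, feastparam, &epsout, &loop,
                  &Emin, &Emax, &M0, E, X, &M, res, &info);

    printf("info = %d, eigenvalues found = %d\n", (int)info, (int)M);
    return 0;
}

One more thing worth double-checking on our side: that the headers and the libraries being linked come from the same MKL 11.0.x installation, since, as far as I know, the Composer XE 2011 package ships an older MKL without the FEAST routines.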

Thanks in advance!

Aurora

Minimum working example for mkl_ddnscsr


Hi. Can anyone provide me with a minimum working example for mkl_ddnscsr? I have tried this so far

#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>
int main(int argc, char *argv[])
{
  MKL_INT info;
  MKL_INT m = 3; //Number of rows of A
  MKL_INT n = 4; //Number of columns of A
  MKL_INT nnz = 6; //Number of non-zero elements
  MKL_INT job[6] = {0,0,1,2,nnz,1};
  double  *Acsr = (double *)  calloc(nnz, sizeof(double)  );
  MKL_INT *Aj   = (MKL_INT *) calloc(nnz, sizeof(MKL_INT) );
  MKL_INT *Ai   = (MKL_INT *) calloc(m+1, sizeof(MKL_INT) );
  double A[3][4] = {{1.,3.,0.,0.},{0.,0.,4.,0.},{2.,5.,0.,6.}};
  mkl_ddnscsr ( job, &m, &n, A[0], &m, Acsr, Aj, Ai, &info);
  for (int i=0; i< nnz; i++) {
    if (Acsr[i] != 0) {
      printf( "column = %i, A = %f\n", Aj[i], Acsr[i] );
    }
  }
  for (int i=0; i< m+1; i++) {
    printf("Ai[%i] = %i\n", i, Ai[i]);
  }
  free(Acsr); free(Aj); free(Ai);
  return 0;
}

But it returns these results

column = 1, A = 1.000000
column = 2, A = 3.000000
column = 4, A = 4.000000
column = 1, A = 4.000000
column = 3, A = 2.000000
column = 4, A = 5.000000
Ai[0] = 1
Ai[1] = 3
Ai[2] = 4
Ai[3] = 7

If I play with the value of lda I can almost get the correct result, but I believe I am calling it as the manual suggests. I am using Ubuntu 12.04 and Composer 2013.3.163, if that makes a difference.
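One variant I have been meaning to try (based on my assumption that mkl_ddnscsr reads the dense array in Fortran column-major order with leading dimension lda, since it follows the Fortran calling convention): store the same matrix column by column and keep lda = m:

#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>

int main(void)
{
    MKL_INT info;
    MKL_INT m = 3, n = 4, nnz = 6, lda = 3;
    MKL_INT job[6] = { 0, 0, 1, 2, 6, 1 };
    double  *Acsr = (double *)  calloc(nnz, sizeof(double));
    MKL_INT *Aj   = (MKL_INT *) calloc(nnz, sizeof(MKL_INT));
    MKL_INT *Ai   = (MKL_INT *) calloc(m + 1, sizeof(MKL_INT));
    /* columns of {{1,3,0,0},{0,0,4,0},{2,5,0,6}} stored one after another */
    double A[12] = { 1., 0., 2.,   3., 0., 5.,   0., 4., 0.,   0., 0., 6. };
    MKL_INT i;

    mkl_ddnscsr(job, &m, &n, A, &lda, Acsr, Aj, Ai, &info);

    for (i = 0; i < nnz; i++)
        printf("column = %d, A = %f\n", (int)Aj[i], Acsr[i]);
    for (i = 0; i < m + 1; i++)
        printf("Ai[%d] = %d\n", (int)i, (int)Ai[i]);

    free(Acsr); free(Aj); free(Ai);
    return 0;
}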

Thanks

Chris


Issues with declarations of MKL functions: 'remark #424: extra ";" ignored' messages are displayed


When the Intel C++ compiler option /W5 is turned on, the compiler shows many 'remark #424: extra ";" ignored' messages related to the declarations of some functions in the MKL headers. Here is a small example:

...
..\mkl\include\mkl_solvers_ee.h(51): remark #424: extra ";" ignored
  _Mkl_Api(void,feastinit,(MKL_INT* fpm));
                                         ^

..\mkl\include\mkl_solvers_ee.h(52): remark #424: extra ";" ignored
  _Mkl_Api(void,FEASTINIT,(MKL_INT* fpm));

...

 

How to solve this symmetric indefinite matrix with pardiso?


Hello,

Greetings! I would like to know how to effectively solve the attached symmetric indefinite system using PARDISO. This particular matrix is strongly diagonally dominant, with near-zero and negative off-diagonal terms. I tried different PARDISO parameters but could not get the solution promised by other software. Please suggest and list PARDISO parameters/options for solving this matrix. The expected (promised) solution is ~0.7 for all components.
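For concreteness, here is the kind of parameter setup I mean (C-style, zero-based iparm indexing; these values are assumptions/typical starting points for a real symmetric indefinite matrix, mtype = -2, not a verified recipe for the attached system):

#include <stdio.h>
#include <string.h>
#include "mkl.h"

/* Starting-point settings for a real symmetric indefinite matrix (mtype = -2).
   iparm[k] here is iparm(k+1) in the Fortran documentation. */
static void set_indefinite_iparm(MKL_INT iparm[64])
{
    memset(iparm, 0, 64 * sizeof(MKL_INT));
    iparm[0]  = 1;   /* do not rely on all defaults                      */
    iparm[1]  = 2;   /* METIS fill-in reducing reordering                */
    iparm[7]  = 2;   /* a couple of iterative refinement steps           */
    iparm[9]  = 8;   /* pivot perturbation 1e-8                          */
    iparm[10] = 1;   /* scaling (often suggested for hard indefinite)    */
    iparm[12] = 1;   /* weighted matching (often suggested as well)      */
    iparm[20] = 1;   /* 1x1 and 2x2 Bunch-Kaufman pivoting               */
}

int main(void)
{
    MKL_INT iparm[64];
    set_indefinite_iparm(iparm);
    printf("iparm(10) = %d, iparm(11) = %d, iparm(13) = %d\n",
           (int)iparm[9], (int)iparm[10], (int)iparm[12]);
    return 0;
}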

Thank you

Stoka

MKL vs Microsoft exp() function


I have a client that is migrating a large C++ software base from 32- to 64-bit code in MS Visual Studio. One of the problems they are having is that the 32- and 64-bit versions of the C library exp() function produce results that differ by 1ulp for some operands, and this is causing regression tests to fail. One potential solution I am considering is to use Intel MKL instead of the Microsoft library. So I have a few questions:

1. Do the 32-bit and 64-bit builds of MKL produce identical results for exp() and other transcendental functions, for all operands, assuming that SSE2 is enabled for our 32-bit code? (A small sketch of the call I have in mind follows these questions.)

2. Although the client has mostly Intel hardware, I believe they have a few AMD Opteron-based server farms. Does MKL work on Opterons? If so, are there any performance penalties if MKL is used in place of the Microsoft library?

3. Is there any way of getting the Microsoft .NET framework to use MKL? I assume it may have the same 32/64-bit differences, although I haven't tested that yet.

4. What other benefits might my client gain by switching to MKL?
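Regarding question 1, the sketch below shows the kind of call I have in mind: going through MKL's vector math function vdExp with the accuracy mode pinned to high accuracy (VML_HA). Whether the 32- and 64-bit builds then agree to the last bit is exactly what we would need to verify:

#include <stdio.h>
#include "mkl_vml.h"

int main(void)
{
    double a[4] = { 0.1, 1.0, 2.5, -3.7 };
    double r[4];
    int i;

    vmlSetMode(VML_HA);   /* high-accuracy mode for the vector math functions */
    vdExp(4, a, r);       /* r[i] = exp(a[i]) */

    for (i = 0; i < 4; i++)
        printf("exp(%g) = %.17g\n", a[i], r[i]);
    return 0;
}

On MKL 11.0 and later there is also the conditional numerical reproducibility control (mkl_cbwr_set), although as I understand it that targets run-to-run and cross-CPU consistency rather than 32-bit vs 64-bit consistency specifically.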

Thanks in advance - dc42

efficiently solving least squares problems iteratively


I'm performing an iterative routine where at each iteration I solve a least squares problem using ?gels. At each iteration I update one column of the matrix A, which will ultimately converge, as will the solution vector b.

My question is this: Because at each iteration the solution vector b is changing only very little, is it possible to solve this more efficiently than repeatedly calling ?gels (perhaps by calling some of the routines that ?gels itself calls), since I know that I am often very close to the solution?
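To make "the routines that ?gels itself calls" concrete, the split I have in mind looks like the sketch below (made-up data, LAPACKE row-major interface): dgeqrf for the QR factorization, dormqr to apply Q^T to b, and a triangular solve. I am not claiming this is faster by itself; it just exposes the individual steps that might be reusable:

#include <stdio.h>
#include "mkl_lapacke.h"

int main(void)
{
    /* Solve min ||A x - b||_2 for a 4x2 full-rank A, row-major storage. */
    const lapack_int m = 4, n = 2;
    double A[8] = { 1.0, 1.0,
                    1.0, 2.0,
                    1.0, 3.0,
                    1.0, 4.0 };          /* row-major, lda = n */
    double b[4] = { 6.0, 5.0, 7.0, 10.0 };
    double tau[2];

    /* A = Q R */
    LAPACKE_dgeqrf(LAPACK_ROW_MAJOR, m, n, A, n, tau);
    /* b <- Q^T b */
    LAPACKE_dormqr(LAPACK_ROW_MAJOR, 'L', 'T', m, 1, n, A, n, tau, b, 1);
    /* solve R x = (Q^T b)(1:n); x overwrites the first n entries of b */
    LAPACKE_dtrtrs(LAPACK_ROW_MAJOR, 'U', 'N', 'N', n, 1, A, n, b, 1);

    printf("x = (%g, %g)\n", b[0], b[1]);
    return 0;
}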

Thank you,

Tracy

Program crash in mkl_avx.dll on Windows with Intel MKL 11.0


One of our customers has encountered a problem with our library (NumPy) when linked against MKL version 11.0.3 on the Windows platform. The program dies during an eigenvalue decomposition. The problem does not show up on a build of NumPy against an older version of the MKL.

Attached are some screenshots which provide information about: 

1) Where the crash occurs (mkl_avx)

2) Which instruction it seems not to like (vandpd)

3) call stack

4&5) Configuration information (Windows Version and Hardware).   

The software is Anaconda 1.5, available here: www.continuum.io, but the same problem was replicated with other versions of NumPy downloaded elsewhere. Several machines with this kind of hardware seem to have the same problem.

Pardiso result problem


Hi All,

I have implemented the PARDISO solver in our flow simulation model. PARDISO solves most of our problems well and produces correct results. But recently I have a case where PARDISO generates quite different results from our original solver (ws209), which has been used with our model for over ten years. I suspect something is wrong with the settings.

Please find the test code (pardiso_unsym_f.f90), the sparse matrices exported from our model (a_i.txt, b_i.txt, ia_i.txt, ja_i.txt), the result generated by PARDISO (x_out_i.txt) and the result generated by the ws209 solver (x_ws209_i.txt). The first value in each file is the number of values.

Can anybody help to check this?

Thanks and regards,

Daniel


Pardiso memory leak


 

Hi,

I have the following subroutine, which works perfectly. However, I am using it in a nonlinear solution and call it many times in a program. After each call the memory usage increases, even though I use the 'release memory' phase at the end of the subroutine. Does anyone have an idea what the reason for the memory increase could be? Is there a way of tracking the variables at the beginning and end of the subroutine, so that I can see which of them stay unreleased?

Many thanks.

 

      SUBROUTINE SPARSE_SOL

      USE XX  ! this is a module of already allocated variables

      INTEGER omp_get_max_threads
      EXTERNAL omp_get_max_threads

      ! This is OK in both cases
      INTEGER*8 pt(64)

      ! All other variables
      INTEGER maxfct, mnum, mtype, phase, n, nrhs, error, msglvl
      INTEGER i, idum
      INTEGER iparm(64)
      REAL*8 waltime1, waltime2, ddum
      COMPLEX*16 cdum

      ! Fill all arrays containing matrix data.
      DATA nrhs /1/, maxfct /1/, mnum /1/
      n = nodes

      ! Set up the PARDISO control parameters
      do i = 1, 64
         iparm(i) = 0
      end do
      iparm(1) = 1 ! no solver default
      iparm(2) = 2 ! fill-in reordering from METIS
      iparm(3) = mkl_get_max_threads() ! number of processors, value of MKL_NUM_THREADS
      iparm(4) = 0 ! no iterative-direct algorithm
      iparm(5) = 0 ! no user fill-in reducing permutation
      iparm(6) = 0 ! =0: solution on the first n components of x
      iparm(7) = 0 ! not in use
      iparm(8) = 9 ! number of iterative refinement steps
      iparm(9) = 0 ! not in use
      iparm(10) = 13 ! perturb the pivot elements with 1E-13
      iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS
      iparm(12) = 0 ! not in use
      iparm(13) = 0 ! not in use
      iparm(14) = 0 ! Output: number of perturbed pivots
      iparm(15) = 0 ! not in use
      iparm(16) = 0 ! not in use
      iparm(17) = 0 ! not in use
      iparm(18) = -1 ! Output: number of nonzeros in the factor LU
      iparm(19) = -1 ! Output: Mflops for LU factorization
      iparm(20) = 0 ! Output: number of CG iterations
      iparm(60) = 1 ! OOC core selection
      error = 0 ! initialize error flag
      msglvl = 1 ! print statistical information
      mtype = 13 ! COMPLEX unsymmetric

      ! Initialize the internal solver memory pointer. This is only
      ! necessary for the FIRST call of the PARDISO solver.
      do i = 1, 64
         pt(i) = 0
      end do

      ! Reordering and symbolic factorization. This step also allocates
      ! all memory that is necessary for the factorization.
      phase = 11 ! only reordering and symbolic factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase, nodes, valuesCoo,  &
        rowIndex, colsCoo, idum, nrhs, iparm, msglvl, cdum, cdum, error)
      WRITE(*,*) 'Reordering completed ... '
      IF (error .NE. 0) THEN
         WRITE(*,*) 'The following ERROR was detected: ', error
         pause
      END IF
      WRITE(*,*) 'Number of nonzeros in factors = ', iparm(18)
      WRITE(*,*) 'Number of factorization MFLOPS = ', iparm(19)

      ! Factorization.
      phase = 22 ! only factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase, nodes, valuesCoo,  &
        rowIndex, colsCoo, idum, nrhs, iparm, msglvl, cdum, cdum, error)
      WRITE(*,*) 'Factorization completed ... '
      IF (error .NE. 0) THEN
         WRITE(*,*) 'The following ERROR was detected: ', error
         pause
      END IF

      ! Back substitution and iterative refinement
      iparm(8) = 2 ! max number of iterative refinement steps
      phase = 33 ! solve and iterative refinement
!      do i = 1, n
!         b(i) = 1.d0
!      end do
      CALL pardiso (pt, maxfct, mnum, mtype, phase, nodes, valuesCoo,  &
        rowIndex, colsCoo, idum, nrhs, iparm, msglvl, bglb, sln, error)
      WRITE(*,*) 'Solve completed ... '

      ! Termination and release of memory
      phase = -1 ! release internal memory
      CALL pardiso (pt, maxfct, mnum, mtype, phase, nodes, ddum, idum, idum,  &
        idum, nrhs, iparm, msglvl, ddum, ddum, error)

      RETURN
      END

Attachment: subroutine-sparse.docx (15.77 KB)

pardiso iparm(2) parameter


Dear All

From the table at

http://software.intel.com/en-us/articles/pardiso-parameter-table#table2

It states that

iparm(2)=0 is the MD algorithm and iparm(2)=2 is metis package.

But from the former thread 

http://software.intel.com/en-us/forums/topic/299748

"There exists another built-in reordering scheme so called MMD reordering available through iparm(2)=1."

Sergey suggested that iparm(2)=1 is the MD algorithm and that iparm(2)=0 and 2 are METIS.

My experience matches what the former thread suggests.

Is the iparm table on your website wrong?

http://software.intel.com/en-us/forums/topic/299748

Hailong

Performance gets worse over time for the same instructions


First, I'm not sure if this is the right forum for this question; the cause could be hardware, MKL, .NET, or some other hidden factor.

I have a neural network code in C# which heavily uses MKL via PInvoke. I set a fixed number of threads and disabled dynamic threading of MKL. The C# code is used mainly before and after training. However, during training (i.e. between iterations), MKL carries most of the computational body. No memory is allocated and there's no I/O during training.
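For reference, the MKL-side setup I described amounts to the following service calls (a native C sketch of what the C# wrapper does via PInvoke; the thread count 4 is only an example):

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    mkl_set_num_threads(4);   /* fixed number of MKL threads     */
    mkl_set_dynamic(0);       /* do not let MKL reduce the count */

    printf("max threads = %d, dynamic = %d\n",
           mkl_get_max_threads(), mkl_get_dynamic());
    return 0;
}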

I have observed unpredictable performance across iterations (example below) and would like to understand why. In some other runs, the number of connections processed per second dropped to ~600M for a few iterations (very strange). For the run below, training took 6 h to finish (i.e. each iteration takes about 12 minutes on average). The performance quite consistently degrades towards the end. The performance accounting is more consistent when I run a smaller job (e.g. one that finishes in 20 minutes).

The code is large and not sharable. If you can't pinpoint why, a hint to help me investigate further would also be appreciated.

Iterations:1/30, 1504.65M connections processed per second
Iterations:2/30, 1505.16M connections processed per second
Iterations:3/30, 1505.16M connections processed per second
Iterations:4/30, 1504.96M connections processed per second
Iterations:5/30, 1503.38M connections processed per second
Iterations:6/30, 1504.68M connections processed per second
Iterations:7/30, 1502.40M connections processed per second
Iterations:8/30, 1506.11M connections processed per second
Iterations:9/30, 1503.20M connections processed per second
Iterations:10/30, 1504.95M connections processed per second
Iterations:11/30, 1502.34M connections processed per second
Iterations:12/30, 1498.91M connections processed per second
Iterations:13/30, 1490.70M connections processed per second
Iterations:14/30, 1477.59M connections processed per second
Iterations:15/30, 1459.92M connections processed per second
Iterations:16/30, 1433.61M connections processed per second
Iterations:17/30, 1402.28M connections processed per second
Iterations:18/30, 1356.30M connections processed per second
Iterations:19/30, 1342.68M connections processed per second
Iterations:20/30, 1306.84M connections processed per second
Iterations:21/30, 1263.10M connections processed per second
Iterations:22/30, 1236.72M connections processed per second
Iterations:23/30, 1209.60M connections processed per second
Iterations:24/30, 1183.91M connections processed per second
Iterations:25/30, 1157.60M connections processed per second
Iterations:26/30, 1140.60M connections processed per second
Iterations:27/30, 1112.54M connections processed per second
Iterations:28/30, 1086.06M connections processed per second
Iterations:29/30, 1071.61M connections processed per second
Iterations:30/30, 1055.94M connections processed per second

sparse right hand side reordering problem


Dear All

I am trying to use the sparse right-hand-side feature of PARDISO (iparm(31)=1).

I tested with an identity matrix A.

When I set perm[i] = 1 for the last several entries, except the very last one, I get the following error:

*** Error in PARDISO ( reordering_phase) error_num= -180

*** error PARDISO: reordering, symbolic factorization

perm before reordering
0 0 0 0 0 0 0 0 1 1 1 1 1 0 
perm after reordering
8 7 4 3 6 2 10 1 5 9 10 11 12 13

Notice: 10 appears twice.

But when the last entry of perm is also 1, i.e. perm[last] = 1, there is no problem.

perm before reordering
0 0 0 0 0 0 0 0 1 1 1 1 1 1
perm after reordering
8 7 6 5 4 3 2 1 9 10 11 12 13 14

The attachment is my C++ test code.
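For clarity, the relevant part of the test code's setup is roughly the following sketch (C-style, zero-based iparm indexing, so iparm[30] is iparm(31) of the documentation; n = 14 to match the vectors above, and the matrix itself is omitted since the real test just uses the identity):

#include <stdio.h>
#include <string.h>
#include "mkl.h"

int main(void)
{
    MKL_INT n = 14;
    MKL_INT iparm[64];
    MKL_INT perm[14];
    MKL_INT i;

    memset(iparm, 0, sizeof(iparm));
    iparm[0]  = 1;   /* do not use all default values                     */
    iparm[1]  = 2;   /* METIS reordering                                  */
    iparm[30] = 1;   /* iparm(31): sparse right-hand side                 */

    /* 1 marks the right-hand-side components that are nonzero. */
    for (i = 0; i < n; i++) perm[i] = 0;
    for (i = n - 6; i < n - 1; i++) perm[i] = 1;   /* last several, except the very last */

    /* perm is then passed as the perm argument of every pardiso() call. */
    for (i = 0; i < n; i++) printf("%d ", (int)perm[i]);
    printf("\n");
    return 0;
}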

Could you please give me some suggestions?

Hailong

Attachment: main.c (3.98 KB)

ifort: error #10104: unable to open '--start-group' (MPI with FGMRES)


Hello!

Can someone help me get past this error please...

I get the following error while generating my executable:

ifort: error #10104: unable to open '--start-group'

I'm trying to run FGMRES sequentially on multiple nodes of a cluster using MPI. If I compile using ifort I get no problems, but if I use mpif90 I get the above error. The command I'm using is:

mpif90 -xHost -g -traceback -debug all -check all -implicitnone -fp-stack-check -heap-arrays -ftrapuv -check pointers -check bounds -I/INTEL/mkl/include -fpp source1.f90 source2.f -L"/INTEL/mkl/lib/em64t""/INTEL/mkl/lib/em64t"/libmkl_lapack95_lp64.a "/INTEL/mkl/lib/em64t"/libmkl_solver_lp64_sequential.a "/INTEL/mkl/lib/em64t"/libmkl_intel_lp64.a -Wl,--start-group "/INTEL/mkl/lib/em64t"/libmkl_sequential.a "/INTEL/mkl/lib/em64t"/libmkl_core.a -Wl,--end-group -lpthread -lm -o executable  

Is it not possible to invoke FGMRES or any other MKL routine in an MPI environment? 

Many Thanks!
