Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Error in symmetric sparse matrix analysis step


I am a new user of MKL, so this may be a silly question.

I am trying to use the pardiso_64 function to solve a system with a symmetric sparse matrix.

The IDE is MS VS 2013 and the language is C++. The matrix in the test case is relatively small (about 4000 rows).

Here is the error I am getting: the analysis step fails with error -8.

I read the documentation on the PARDISO solvers, and as far as I can see, error -8 indicates a 32-bit integer overflow, but that doesn't make sense in my case.

What are the possible reasons, and where should I start looking to locate the error?



Source code attached.



Thanks in advance.

Attachment: solver.cpp (3.09 KB)
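
For context on where to look, here is a minimal, hedged sketch of a pardiso_64 call (this is not the attached solver.cpp): with the _64 entry point every integer argument, including ia, ja, perm and iparm, is expected to be 64-bit (long long int), and mixing 32-bit index arrays into that call is one plausible way to end up with error -8.

#include <mkl_pardiso.h>
#include <vector>

// Sketch only: solve A*x = b once, A symmetric (upper triangle stored in CSR, 1-based indices).
void solve_sym(long long n, double* a, long long* ia, long long* ja, double* b, double* x)
{
    void*     pt[64] = {0};                       // internal solver memory, must start zeroed
    long long iparm[64] = {0};                    // iparm[0] = 0: let PARDISO fill in defaults
    long long maxfct = 1, mnum = 1, mtype = -2;   // -2: real, symmetric indefinite
    long long nrhs = 1, msglvl = 1, error = 0;
    long long phase = 13;                         // analysis + factorization + solve
    std::vector<long long> perm(n, 0);

    pardiso_64(pt, &maxfct, &mnum, &mtype, &phase, &n,
               a, ia, ja, perm.data(), &nrhs, iparm, &msglvl, b, x, &error);
    // error == -8 here would indicate a 32-bit integer overflow inside the solver

    phase = -1;                                   // release internal memory
    pardiso_64(pt, &maxfct, &mnum, &mtype, &phase, &n,
               a, ia, ja, perm.data(), &nrhs, iparm, &msglvl, b, x, &error);
}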

Stack Overflow Error in Eigenvalue Solver: dfeast_scsrgv


Hi,

I'm trying to solve an eigenvalue problem for large sparse matrices using the dfeast_scsrgv function. The function works fine for small problems (e.g. an 8*8 sparse matrix) but it gives a System.StackOverflowException error for larger problems (e.g. a 200*200 sparse matrix). I'm using Visual Studio 2008 and MKL version 11 with the most recent updates installed. My system is 64-bit Windows and the programming language is C++. The eigenvalue solver code I'm using is given below. In debug mode, when execution reaches the dfeast_scsrgv line, I get the stack overflow error. I do not think I am using any infinite loop or unnecessarily large arrays. I would appreciate it if someone could help me fix the problem. Thanks!

   
//Convert stiffness and mass matrix to CSR format - Seldon library
int NumStiff = M_GStiff.GetDataSize();   
Vector<double> V_GStiffVal   (NumStiff);   
Vector<int>    V_GStiffColInd(NumStiff);
Vector<int>    V_GStiffRowPtr(PrbDim+1);
ConvertToCSR(M_GStiff, prop, V_GStiffRowPtr, V_GStiffColInd, V_GStiffVal);  
 
int NumMass = M_GMass.GetDataSize();
Vector<double> V_GMassVal   (NumMass);
Vector<int>    V_GMassColInd(NumMass);
Vector<int>    V_GMassRowPtr(PrbDim+1);
ConvertToCSR(M_GMass, prop, V_GMassRowPtr, V_GMassColInd, V_GMassVal);   
 
//Release memory
M_GStiff.Clear();
M_GMass.Clear();
 
//Convert Seldon format to typical C array
double* a = V_GStiffVal.GetData();
int*   ia = V_GStiffRowPtr.GetData();
int*   ja = V_GStiffColInd.GetData();   
 
double* b = V_GMassVal.GetData();
int*   ib = V_GMassRowPtr.GetData();
int*   jb = V_GMassColInd.GetData();
 
// Convert matrix from 0-based C-notation to Fortran 1-based
int nnz = ia[PrbDim]; 
for (int i = 0; i < PrbDim+1; i++)        ia[i] += 1;    
for (int i = 0; i < nnz; i++)                 ja[i] += 1;  
 
for (int i = 0; i < PrbDim+1; i++)      ib[i] += 1;
for (int i = 0; i < nnz; i++)              jb[i] += 1;
 
// Initialize variables for the solver
double Error    = 0;
int    Loop     = 0;
int    NumMode  = 10;
double Emin     = 0;
double Emax     = pow(10.0,10.0);    
int    Flag     = 0;
char   MTyp     = 'U';
int    NumEigen = NumMode;
 
vector<int>    V_FPM (128,0);
vector<double> V_Eigen(NumMode,0);
vector<double> V_Res (NumMode,0);    
 
V_FPM[0]  = 1;
V_FPM[1]  = 8;
V_FPM[2]  = 12;
V_FPM[3]  = 20;
V_FPM[4]  = 0;
V_FPM[5]  = 0;
V_FPM[6]  = 5;
V_FPM[13] = 0;
V_FPM[63] = 0;
 
int*    P_FPM          = &V_FPM[0];
double* P_Eigen        = &V_Eigen[0];  
double* P_Res          = &V_Res[0];    
double dDum;
 
// Call Eigenvalue Solver
dfeast_scsrgv (&MTyp, &PrbDim, a, ia, ja, b, ib, jb,
                P_FPM, &Error, &Loop, &Emin, &Emax, &NumMode, P_Eigen, &dDum, &NumEigen, P_Res, &Flag);
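
Not an answer to the crash itself, but for comparison here is a hedged sketch of how MKL's own FEAST examples prepare the inputs: feastinit() fills fpm with defaults before individual entries are overridden, and each matrix is shifted to 1-based indexing with its own non-zero count (the code above reuses the stiffness nnz when shifting jb).

#include <mkl.h>

void prepare_feast_inputs(MKL_INT* fpm, int* ia, int* ja, int* ib, int* jb, int PrbDim)
{
    feastinit(fpm);              // fill all 128 entries with the documented defaults
    fpm[0] = 1;                  // print runtime status
    fpm[2] = 12;                 // stopping criterion 1e-12

    int nnzA = ia[PrbDim];       // stiffness non-zeros (still 0-based here)
    int nnzB = ib[PrbDim];       // mass non-zeros -- may differ from nnzA

    for (int i = 0; i < PrbDim + 1; ++i) { ia[i] += 1; ib[i] += 1; }
    for (int i = 0; i < nnzA; ++i) ja[i] += 1;
    for (int i = 0; i < nnzB; ++i) jb[i] += 1;
}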

Preconditioners for banded matrix (diagonal storage format)?


All,

Is there a built-in routine for computing a preconditioner for a sparse matrix in diagonal storage format, or what alternatives exist? I plan on using it with the fgmres routine.

Thanks

M.
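
As an aside, as far as I know there is no canned preconditioner constructor for the diagonal storage format (dcsrilu0/dcsrilut work on CSR), so one alternative is a hand-rolled preconditioner applied inside the dfgmres RCI loop. Below is a hedged sketch under the assumption of a user-written dia_matvec routine for the banded matrix and a simple Jacobi (diagonal) preconditioner; the ipar settings and tmp sizing follow the documented defaults (restart length 150).

#include <mkl_rci.h>
#include <vector>

// user-supplied: y = A*x with A kept in diagonal (banded) storage -- details omitted
void dia_matvec(const double* x, double* y);

void fgmres_with_jacobi(MKL_INT n, const double* diagA, double* x, double* b)
{
    MKL_INT ipar[128], RCI_request, itercount;
    double  dpar[128];
    std::vector<double> tmp((2 * 150 + 1) * n + (150 * (150 + 9)) / 2 + 1);

    dfgmres_init(&n, x, b, &RCI_request, ipar, dpar, tmp.data());
    ipar[8]  = 1;   // do the residual stopping test ...
    ipar[9]  = 0;   // ... instead of requesting a user test (RCI_request == 2)
    ipar[10] = 1;   // preconditioned variant: enables RCI_request == 3
    ipar[11] = 1;   // automatic check for zero norm of the generated vector
    dfgmres_check(&n, x, b, &RCI_request, ipar, dpar, tmp.data());

    for (;;) {
        dfgmres(&n, x, b, &RCI_request, ipar, dpar, tmp.data());
        if (RCI_request == 0) break;                    // converged
        double* in  = &tmp[ipar[21] - 1];               // 1-based offsets into tmp
        double* out = &tmp[ipar[22] - 1];
        if (RCI_request == 1)                           // out = A * in
            dia_matvec(in, out);
        else if (RCI_request == 3)                      // out = M^{-1} * in (Jacobi)
            for (MKL_INT i = 0; i < n; ++i) out[i] = in[i] / diagA[i];
        else
            break;                                      // treat anything else as failure
    }
    dfgmres_get(&n, x, b, &RCI_request, ipar, dpar, tmp.data(), &itercount);
}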

Webinar announcement: A Tour of the Sparse Linear Algebra Functionality in Intel MKL


Sparse matrix algorithms are encountered in a broad range of important scientific computing applications. Intel MKL offers a powerful set of functions that can be used to build a complete solution to many sparse linear systems. This webinar gives a tour of MKL’s sparse linear algebra components. Highlights include Sparse BLAS functions, Direct solvers for sparse linear systems, Iterative solvers, and Eigensolvers for sparse matrices based on the FEAST algorithm.

Register for the webinar now: https://www1.gotomeeting.com/register/389177449

MKL FFT and FFTW


Does the MKL Library use FFTW internally?

If there are other facilities as well, are there any limitations on the radixes?

Thanks
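
On the facilities question, a hedged note: as far as I know MKL's native FFT is the DFTI interface (not FFTW internally), it handles arbitrary mixed-radix lengths, and MKL also ships FFTW3 wrapper interfaces so existing fftw_* calls can simply be relinked against MKL. A minimal C++ sketch of DFTI, a 1-D in-place complex forward/backward transform of a non-power-of-two length:

#include <mkl_dfti.h>
#include <vector>
#include <complex>

int main()
{
    const MKL_LONG n = 1000;                            // mixed-radix length, not a power of two
    std::vector<std::complex<float>> x(n, {1.0f, 0.0f});

    DFTI_DESCRIPTOR_HANDLE h = 0;
    MKL_LONG status = DftiCreateDescriptor(&h, DFTI_SINGLE, DFTI_COMPLEX, 1, n);
    if (status == 0) status = DftiCommitDescriptor(h);
    if (status == 0) status = DftiComputeForward(h, x.data());   // in-place forward transform
    if (status == 0) status = DftiComputeBackward(h, x.data());  // unscaled inverse transform
    DftiFreeDescriptor(&h);
    return (int)status;
}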

 

Speed of solvers with CSR format


 

I have been using MKL in Composer 2013, in particular PARDISO and the preconditioned conjugate gradient solver with the CSR format to solve symmetric matrices.

I am wondering whether using the full (general) CSR format is much faster than using the half (upper-triangle) CSR format.

Also, there is an example with Jacobi-preconditioned CG in the MKL examples folder.

Are there other preconditioned CG examples?
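
As an aside, a hedged illustration (not a benchmark) of the two storage conventions the question contrasts: the full CSR matrix goes with the general matrix-vector routine, while the half (upper-triangle) CSR matrix goes with the symmetric one. Both of these simplified Sparse BLAS routines expect 1-based ia/ja.

#include <mkl_spblas.h>

// y = A*x with the full matrix stored: every non-zero kept in a/ia/ja (1-based).
void matvec_full_csr(MKL_INT n, double* a, MKL_INT* ia, MKL_INT* ja, double* x, double* y)
{
    char transa = 'N';
    mkl_dcsrgemv(&transa, &n, a, ia, ja, x, y);
}

// y = A*x with the half matrix stored: only the upper triangle kept in a/ia/ja (1-based).
void matvec_half_csr(MKL_INT n, double* a, MKL_INT* ia, MKL_INT* ja, double* x, double* y)
{
    char uplo = 'U';
    mkl_dcsrsymv(&uplo, &n, a, ia, ja, x, y);
}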

Linking problem


Hello, 

I am trying to solve a linear system of equations using LAPACK routines. I have tried to link, but some of the calls seem to work while others do not. I get the error below when I use CALL GESV.

design.f90(477): error #6285: There is no matching specific subroutine for this generic subroutine call.   [GESV]
                CALL GESV(xtemp, ztemp)
---------------------^
compilation aborted for design.f90 (code 1)

This does not happen when I change CALL GESV to CALL GETRF, but then CALL GETRS will also not work. xtemp is a 12 x 12 matrix while ztemp is a vector of 12 elements.

This is the code:

		PROGRAM DESIGN

		USE Prop_mod
		USE blas95
		USE f95_precision
		USE lapack95

		IMPLICIT NONE

		!DECLARATIONS OF VARIABLES

......

		DO i = 1, num
			WRITE(30, '(*(G0.4, ",", :))') xtemp(i, :)
		ENDDO
		PAUSE

		ALLOCATE(ipiv(num))
		!CALL getrf(xtemp, ipiv)
		!CALL getrs(xtemp, ipiv, ztemp)

		CALL gesv(xtemp, ztemp)
		WRITE(*, *)
		WRITE(*, *) xtemp
		WRITE(30, *)
		WRITE(30, *)
		DO i = 1, num
			WRITE(31, '(*(G0.4, ",", :))') xtemp(i, :)
		ENDDO
		PAUSE

		WRITE(*,*) ztemp
		PAUSE

		WRITE(*, *)

 

When I combine CALL getrf and getrs, only getrf works and the solution subroutine "getrs" fails.

CALL gesv does not work at all. I really need someone's help on this.

This is what I type to compile and link the code:

C:\Program Files (x86)\Intel\Composer XE 2013 SP1\Projects>ifort Props_mod.f90 design.f90 /traceback mkl_rt.lib mkl_blas95_ilp64.lib mkl_lapack95_ilp64.lib libiomp5md.lib /exe:ex1.exe

Am I not linking properly? I have tried everything I know of, but it is still not working.

Pardiso L and U factors


I have been trying to figure out a way to extract the L and U factors of a matrix using pardiso. I read somewhere that this is not possible, but I just wanted to check with people here to see if there is a way to do it.


Thanks for the help!


Memory doubled when calling zgetrf in C#


Hi,

I'm solving a large complex dense matrix (10k*10k) in C# by calling zgetrf first to LU-decompose the matrix. If I understand correctly, zgetrf does an in-place LU decomposition, so there should be no extra copies produced. But the memory usage doubles (from 1.6 GB to more than 3 GB) while zgetrf is running. Can someone kindly tell me the reason? It would be great if I could figure out a way to avoid the extra memory usage.

Here is the code:

class Program
    {


        static void Main(string[] args)
        {
            int n1 = 10000;
            int n2 = n1 * n1;
            complex[] A = new complex[n2];
            complex[] b = new complex[n1];

            for (int i = 0; i < n1; i++)
            {
                for (int j = 0; j < n1; j++)
                {
                    int ij = j * n1 + i;
                    A[ij].r = i + 1;
                    A[ij].i = j + 1;
                }
                b[i].r = i + 1;
                b[i].i = i + 2;
            }

            int info = 0;
            int[] ipiv = new int[n1];
            MKL.LU_decomposition(A, ref n1, ref n2, ipiv, ref info);
            MKL.LU_solve(A, b, ref n1, ref n2, ipiv, ref info);
        }


        [SuppressUnmanagedCodeSecurity]
        public sealed class MKL
        {
            private MKL() { }

            [DllImport("customMKL_Sequential.dll", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true, SetLastError = false)]
            public static extern void LU_decomposition([In, Out] complex[] A, ref int n1, ref int n2, [In, Out] int[] ipiv, ref int info);

            [DllImport("customMKL_Sequential.dll", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true, SetLastError = false)]
            public static extern void LU_solve([In] complex[] A, [In, Out] complex[] b, ref int n1, ref int n2, [In] int[] ipiv, ref int info);
        }


        [StructLayout(LayoutKind.Sequential)]/* Force sequential mapping */
        public struct complex
        {
            [MarshalAs(UnmanagedType.R8, SizeConst = 64)]
            public double r;
            [MarshalAs(UnmanagedType.R8, SizeConst = 64)]
            public double i;
        };
    }

The Fortran side of customMKL_Sequential.dll:

subroutine LU_decomposition (A, n1, n2, ipiv, info)
    implicit none
    !dec$ attributes dllexport :: LU_decomposition
    !dec$ attributes alias:'LU_decomposition' :: LU_decomposition

    integer, intent(in) :: n1, n2
    integer, intent(inout) :: info
    integer, dimension(n1), intent(inout) :: ipiv
    complex(8), dimension(n2), intent(inout) :: A

    call zgetrf(n1, n1, A, n1, ipiv, info)
end subroutine


subroutine LU_solve (A, b, n1, n2, ipiv, info)
    implicit none
    !dec$ attributes dllexport :: LU_solve
    !dec$ attributes alias:'LU_solve' :: LU_solve

    integer, intent(in) :: n1, n2
    integer, intent(inout) :: info
    integer, dimension(n1), intent(in) :: ipiv
    complex(8), dimension(n2), intent(inout) :: A
    complex(8), dimension(n1), intent(inout) :: b

    call zgetrs('N', n1, 1, A, n1, ipiv, b, n1, info)
end subroutine





 





 

Thanks a lot in advance.





 

VSL Summary Statistics VSL_SS_SUM error


Please let me know why the code below shows invalid results (sum) when NR is greater than 1999.

#define NR  2000   // MEAN=1 but invalid SUM!
//#define NR  1999 // MEAN=1 and SUM==1999, of course
#define DIM 4

void Test()
{
    VSLSSTaskPtr task;
    int errcode, dim = DIM, n = NR, x_storage = VSL_SS_MATRIX_STORAGE_COLS;
    double x[NR*DIM], mean[DIM], sum[DIM], W[2];

    W[0] = 0; W[1] = 0;
    for (int j = 0; j < DIM; ++j) { mean[j] = 0; sum[j] = 0; }
    for (int i = 0; i < NR*DIM; ++i) x[i] = 1;

    errcode = vsldSSNewTask( &task, &dim, &n, &x_storage, (double*)x, 0, 0 );
    errcode = vsldSSEditTask( task, VSL_SS_ED_ACCUM_WEIGHT, W );
    errcode = vsldSSEditTask( task, VSL_SS_ED_MEAN, mean );
    errcode = vsldSSEditTask( task, VSL_SS_ED_SUM, sum );
    errcode = vsldSSCompute( task, VSL_SS_MEAN | VSL_SS_SUM, VSL_SS_METHOD_FAST );

    for (int i = 0; i < dim; ++i) printf("M[%d] %g  S[%d] %g\n", i, mean[i], i, sum[i]);

    errcode = vslSSDeleteTask( &task );
}

The result when NR=2000 is:

M[0] 1  S[0] -2.65698e+303
M[1] 1  S[1] -2.65698e+303
M[2] 1  S[2] -2.65698e+303
M[3] 1  S[3] -2.65698e+303

I got this result under:

  Windows 8.1
  Intel(R) Math Kernel Library Version 11.1.3 Product Build 20140416 for 32-bit applications

 

Segfault in the dtpmqrt routine


Hello,

While testing out MKL LAPACK's dtpqrt and dtpmqrt routines, I've stumbled across a weird segfault. I replicated the error in the example below (I should mention that I use Eigen just to make my life easier and that populateEigenMat just creates a random matrix).

The problem is that for different values of the parameter m (the number of rows of the matrix B in LAPACK's reference for dtpqrt and dtpmqrt), the code either works (small values of m, and the result is correct) or it segfaults (larger values of m).

#include <iostream>
#include <Eigen/Dense>
#include <mkl.h>

using Eigen::MatrixXd;
using std::cout;

void populateEigenMat(MatrixXd& m);   // declaration assumed; defined elsewhere, fills the matrix with random values

int main()
{

  int m =150;
  int n = 5;
  int nb = 1;
  int l = 0;
  int info;

  MatrixXd a(n,n), b(m,n), t(nb, n);
  populateEigenMat(a), populateEigenMat(b);
  a = a.eval().triangularView<Eigen::Upper>();

  info = LAPACKE_dtpqrt(LAPACK_COL_MAJOR, m, n, l, nb, a.data(), a.rows(),
  		       b.data(), b.rows(), t.data(), t.rows());

  int k = n;

  MatrixXd cA(k,n), cB(m,n), c(k+m,n);
  populateEigenMat(cA), populateEigenMat(cB);
  c<<cA,cB;

  cout<<"Still ok!\n";
  // cout<<endl<<cB<<endl<<endl<<c.block(k,0,m,n)<<endl<<endl;
  info = LAPACKE_dtpmqrt(LAPACK_COL_MAJOR, 'L', 'N', m, n, k, l, nb, b.data(), b.rows(), t.data(), t.rows(),
  			 c.data(), c.rows(), c.data()+k, c.rows());

}

I'm using Intel Composer version composer_xe_2013_sp1.2.144 with the following compile/link flags: -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -std=c++11 -xAVX -DMKL_ILP64.

Could someone please tell me where I've made a mistake?

Kind regards

How to compile the sample code with the MinGW compiler?


Hi all,

I am using MinGW as the compiler, and our code has a lot of FFTs. In the past we used FFTW, and we want to try Intel's FFT to improve performance.

I downloaded the trial version of MKL and installed it on my computer (64-bit Windows 8). I also downloaded the first sample code from the website (https://software.intel.com/sites/products/documentation/hpc/mkl/mklman/G...).

However, I don't know how to compile it with MinGW. In the past, we just typed in the Windows command window: gfortran -o test.exe -O2 testfortran.f -LC:\\Windows\system -lfftw3-3. This links our code against FFTW's FFT library. But I don't know how to link the code against the new library with the MinGW compiler.

Could someone please tell me where I've made a mistake? We appreciate your help.

We just want to do forward and backward FFTs. If there is an easy Fortran example that shows how to use Intel's forward and backward FFT, it would be very helpful.

Best regards,

PARDISO gives wrong answers when given too many rhs's


When I call PARDISO with too many RHS's I get wrong answers.

I first call PARDISO for phase 11 and phase 22, keep the factored matrix in memory, and then leave the subroutine that calls PARDISO. I later call PARDISO to perform only phase 33, passing in a fully populated, "stacked" RHS vector (i.e. the total number of entries is neq*nrhs).

Here's what I've found:

Case 1: When I call PARDISO for phase 33 only with all 9,850 RHS's at once, PARDISO returns completely wrong answers: many NaN's, and all the other values completely wrong. Also, after the call to PARDISO I cannot successfully deallocate some arrays (these arrays are totally unrelated to the matrices used by PARDISO); the message is "Invalid pointers". I suspect that PARDISO has overwritten some of the pointers and/or values in the arrays that I cannot deallocate. Perhaps a memory leak?

Case 2: When I call PARDISO for phase 33 only, sending in 3,000 RHS's at a time (i.e. I call PARDISO 4 times: three calls with 3,000 RHS's each and then one with the remaining 850), I get wrong answers, BUT the answers are almost reasonable. There are no NaN's, and without knowing better I might guess the answers are correct. In Case 2 the deallocation problem mentioned in Case 1 does NOT occur.

Case 3: When I call PARDISO for phase 33 only, sending in 100 RHS's at a time (i.e. I call PARDISO 99 times), I get the correct answers for everything, and there are no problems.

What is causing this problem? I am concerned that other models (say, one with 500,000 equations) may give wrong answers without my knowing it, because I have no way of knowing the largest number of RHS's that I can send into PARDISO in a single chunk for a general model.

I know that I'm not running out of physical memory, and I know that I'm not running out of swap space. Do I have a number-of-threads problem?

MKL: Version 10.3 Update 4
Operating System: Red Hat Enterprise Linux AS release 4 (Nahant Update 8)
Environment variables set: MKL_NUM_THREADS = 32
Computer has 32 processors
Computer has 198 GB of memory
Computer has 68 GB of available swap space

I am using PARDISO with mtype=6 (i.e. double precision complex, symmetric, in-core only).
Number of equations: 183,180
Needed number of RHS's: 9,850

Any help would be GREATLY appreciated.


Thanks, Bob

Compilation Interface issue with dfeast_scsrev


Hi,

I recently upgraded to XE 2013 to get access to the extended eigensolvers in the MKL 11 libraries.

I am trying to use dfeast_scsrev but am getting some compilation errors.

Here is my code:

    SUBROUTINE TTTT(IVECT,STIFFNESS_MATRIX)

    USE SPARSE_MATRIX_CLASS

 !   INCLUDE 'mkl_solvers_ee.fi'

    INTEGER::fpm(128)
    REAL(8)::EMIN,EMAX,EPSOUT
    REAL(8),ALLOCATABLE::E(:),X(:,:),RES(:)
    INTEGER::M0,LOOP,INFO,M

    INTEGER::IVECT(*)
    TYPE(SPARSE_MATRIX)::STIFFNESS_MATRIX

    CALL SPARSE_MATRIX_STORAGE('CSR',STIFFNESS_MATRIX)

    call feastinit (fpm)

    EMIN=0D0 ; EMAX=1000D0 ; M0=ivect(12)
    ALLOCATE(E(M0),X(IVECT(12),M0),RES(M0))

    call dfeast_scsrev('L',ivect(12),STIFFNESS_MATRIX%MATRIX,STIFFNESS_MATRIX%ROWS(1:ivect(12)+1),STIFFNESS_MATRIX%COLUMNS,fpm, epsout, loop, emin,emax, m0, e, x, m, res, info)

    RETURN
    END SUBROUTINE

When I compile this code I get three errors:

C:\RMA\Programs\EFE_V1.0\ansys\SSSS.f90(24): error #8055: The procedure has a dummy argument that has the ALLOCATABLE, ASYNCHRONOUS, OPTIONAL, POINTER, TARGET, VALUE or VOLATILE attribute. Required explicit interface is missing from original source.   [MATRIX]

C:\RMA\Programs\EFE_V1.0\ansys\SSSS.f90(24): error #8055: The procedure has a dummy argument that has the ALLOCATABLE, ASYNCHRONOUS, OPTIONAL, POINTER, TARGET, VALUE or VOLATILE attribute. Required explicit interface is missing from original source.   [ROWS]

C:\RMA\Programs\EFE_V1.0\ansys\SSSS.f90(24): error #8055: The procedure has a dummy argument that has the ALLOCATABLE, ASYNCHRONOUS, OPTIONAL, POINTER, TARGET, VALUE or VOLATILE attribute. Required explicit interface is missing from original source.   [COLUMNS]

 

If I uncomment the include statement I get the following error:

SSSS.f90

C:\Program Files\Intel\Composer XE 2013 SP1\mkl\include\mkl_solvers_ee.fi(459): error #8000:  There is a conflict between local interface block and external interface block.   [SA]

The type Sparse_Matrix is:

    TYPE SPARSE_MATRIX

        INTEGER::NUMBER_OF_ROWS=0
        INTEGER::NUMBER_OF_COLUMNS=0
        INTEGER::NUMBER_OF_NON_ZEROS=0

        INTEGER,ALLOCATABLE::ROWS(:)
        INTEGER,ALLOCATABLE::COLUMNS(:)

        REAL(8),ALLOCATABLE::MATRIX(:)

    END TYPE SPARSE_MATRIX

Any help in solving this issue would be gratefully received.

Thanks, ACAR

 

 

Advice for getting rid of temporary array creation, and Fortran 95 interface for dsecnd


I am using the Intel Fortran compiler and Intel MKL for a performance check. I am passing some array sections to the Fortran 77 interface with calls like:

call dgemm( transa,transb,sz_s,P,P,&
            a, Ts_tilde,&
            sz_s,R_alpha,P,b,tr(:sz_s,:),sz_s)

As is evident, tr(:sz_s,:) is not contiguous in memory, and the Fortran 77 interface expects a contiguous block, so a temporary is created for it.

What I was wondering is: will there be a difference if I create the temporary array for tr explicitly in my code and copy the data back and forth before and after the operation, or will that be the same, from a performance point of view, as the compiler creating the temporary itself? I guess the compiler will always be more efficient.

And of course any more suggestions for eliminating these temporaries are welcome.

One more point: if I use the Fortran 95 interface of the library with a similar call on a simpler test problem, apparently no warning is issued about the creation of a temporary. I then read in the MKL manual that the Fortran 95 interface uses assumed-shape arrays, which explains why temporaries are not created. Is this the logical way to continue if I cannot reshape the above code to work with contiguous blocks?

However, at that point, while testing the Fortran 95 interfaces, I ran into a problem with the support function dsecnd. With the code below, which uses the mkl_service module, I am getting:

dgemm95_test.f90(30): error #6404: This name does not have a type, and must have an explicit type.   [DSECND]

t1 = dsecnd()

-----^

Any idea about this problem is also welcome. The simple code for the dsecnd problem is:

program dgemm95_test
! some modules for Fortran 95 interface
use mkl_service
use mkl95_precision
use mkl95_blas
!
implicit none
!
double precision, dimension(4,3) :: a
double precision, dimension(6,4) :: b
double precision, dimension(5,5) :: r ! result array
double precision, dimension(3,2) :: dummy_b
!
character(len=1) :: transa
character(len=1) :: transb
!
double precision :: alpha, beta, t1, t2, t
integer :: sz1, sz2

! initialize some variables
alpha = 1.0
beta = 0.0
a = 2.3
b = 4.5
r = 0.0
transa = 'n'
transb = 'n'
dummy_b = 0.0
! Fortran 95 interface
t1 = dsecnd()
call gemm( a, b(4:6,1:3:2), r(2:5,3:4),&
 transa, transb, alpha, beta )
t2 = dsecnd()
!
write(*,*) r
dummy_b  = r(2:4,4:5)
!
end program dgemm95_test

Any help and advice on these points are highly appreciated.

Best regards,

Umut


Mixing Fortran 77 and Fortran 95 interfaces


Dear all,

I wrote my code with the Fortran 77 interfaces. However, I was wondering whether it is possible to mix Fortran 77 and Fortran 95 routines. For instance, looking at the Sparse BLAS level 2 routine mkl_dcsrmv, it is not explicitly stated that it has both a Fortran 77 and a Fortran 95 interface; it is only mentioned that it has a Fortran interface.

If this is possible, how can I do it? Which modules should be used so that they do not conflict?

Best regards,

Umut

Fortran 95 interface syevr returns -1001 as info


Dear all,

As the subject line suggests, the Fortran 95 interface of dsyevr, namely syevr, returns -1001 in the info parameter. I cannot work out what this means; could you please help me with this problem?

Best regards,

Umut

2D FFT


Hello,

I have been assigned the task of converting a MATLAB script to C++, and am currently working on the FFT part.

Given this input:

octave:67> r57
r57 =

   0.00000   0.20000   0.30000   0.30000   0.40000
   0.30000   0.30000   0.40000   0.40000   0.50000
   0.10000   0.30000   0.20000   0.40000   0.10000
   0.50000   0.50000   0.40000   0.30000   0.20000
   0.20000   0.20000   0.20000   0.20000   0.20000
   0.30000   0.20000   0.30000   0.20000   0.30000
   0.50000   0.50000   0.50000   0.50000   0.50000

I get this output:

octave:68> fft2(r57)
ans =

   10.90000 +  0.00000i   -0.46180 -  0.00000i   -0.23820 -  0.00000i   -0.23820 +  0.00000i   -0.46180 +  0.00000i
    0.79650 +  0.27359i   -0.55720 +  0.94400i   -0.89353 +  0.39737i   -0.10671 -  0.19919i   -0.34353 -  0.30982i
   -0.13330 +  1.20183i    0.50836 +  0.04556i    0.18356 +  0.35590i   -0.57328 -  0.00290i   -0.49515 +  0.11341i
   -1.91320 -  0.77347i   -0.54307 -  0.27387i    0.09683 -  0.23803i   -0.56867 -  0.10558i   -0.20760 -  0.41938i
   -1.91320 +  0.77347i   -0.20760 +  0.41938i   -0.56867 +  0.10558i    0.09683 +  0.23803i   -0.54307 +  0.27387i
   -0.13330 -  1.20183i   -0.49515 -  0.11341i   -0.57328 +  0.00290i    0.18356 -  0.35590i    0.50836 -  0.04556i
    0.79650 -  0.27359i   -0.34353 +  0.30982i   -0.10671 +  0.19919i   -0.89353 -  0.39737i   -0.55720 -  0.94400i

And now I am confused. The first column is as expected, with its second half computable from its first half. However, this is not the case for the remaining columns. The C++ code follows, but it doesn't work; among other things, it assumes that the second half of each column can be computed from its first half.

I am not sure how to proceed. Is the Octave (MATLAB clone) output correct/expected/common? If not, i.e. if there is a problem with the test data, what would be more reasonable to test with?

Any advice on how to handle this situation would be much appreciated, as would any comments on the code; I am not really sure that it is correct.

Kind Regards,

 

       

MKL_LONG               mklNumSamples   [2] = { 5 /* columns */, 7 /* rows */ };
MKL_LONG               mklInputStrides [3] = { 0, mklNumSamples[1] + 2, 1 };
MKL_LONG               mklOutputStrides[3] = { 0, mklNumSamples[1] + 2, 1 };
float                  mklBuffer       [(5+2)*(7+2)] = { 0 };

DFTI_DESCRIPTOR_HANDLE dftiHandle = DFTI_DESCRIPTOR_HANDLE();

DftiCreateDescriptor(&dftiHandle, DFTI_SINGLE, DFTI_REAL, 2, mklNumSamples);
DftiSetValue(dftiHandle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
DftiSetValue(dftiHandle, DFTI_INPUT_STRIDES, mklInputStrides);
DftiSetValue(dftiHandle, DFTI_OUTPUT_STRIDES, mklOutputStrides);
DftiCommitDescriptor(dftiHandle);

{
    /* Input is f, accessed as: f.size(), f[index] */
    size_t index[2] = { 0, 0 };
    for (index[0] = 0; index[0] < f.size()[0]; ++index[0]) /* column */
    {
        for (index[1] = 0; index[1] < f.size()[1]; ++index[1]) /* row */
        {
            /* fetches f with column = index[0] and row = index[1] */
            mklBuffer[ index[0] * (f.size()[1]+2) + index[1] ] = f[index];
        }
    }
}

DftiComputeForward(dftiHandle, &mklBuffer[0]);

{
    size_t index[2] = { 0, 0 };
    for (index[0] = 0; index[0] < f.size()[0]; ++index[0])
    {
        for (index[1] = 0; index[1] < (f.size()[1] + 1)/2; ++index[1])
        {
            const PUVec2st index0(index[0], index[1]);
            const PUVec2st index1(index[0], f.size()[1] - index[1]);
            const float real = mklBuffer[ index[0] * (f.size()[1] + 2) + index[1]*2+0 ];
            const float imag = mklBuffer[ index[0] * (f.size()[1] + 2) + index[1]*2+1 ];
            if (index[1] == 0)
            {
                result[index0] = std::complex<float>(real,  imag);
            }
            else
            {
                result[index0] = std::complex<float>(real,  imag);
                result[index1] = std::complex<float>(real, -imag);
            }
        }
    }
}
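
A hedged aside on the symmetry itself, independent of DFTI and of the code above: for real input the 2-D DFT satisfies X(k1,k2) = conj(X((N1-k1) mod N1, (N2-k2) mod N2)), so the conjugate partner of an entry in column k2 sits in column N2-k2 rather than in the same column, which is why only column 0 of the Octave output looks self-symmetric. The small brute-force check below (illustration only) verifies that relation for a 7x5 real matrix:

#include <algorithm>
#include <cmath>
#include <complex>
#include <cstdio>
#include <vector>

int main()
{
    const int N1 = 7, N2 = 5;
    double f[N1][N2];
    for (int i = 0; i < N1; ++i)
        for (int j = 0; j < N2; ++j)
            f[i][j] = 0.1 * ((i * N2 + j) % 7);          // arbitrary real test data

    const double pi = 3.14159265358979323846;
    std::vector<std::complex<double>> X(N1 * N2);
    for (int k1 = 0; k1 < N1; ++k1)                      // naive full 2-D DFT
        for (int k2 = 0; k2 < N2; ++k2) {
            std::complex<double> s(0.0, 0.0);
            for (int i = 0; i < N1; ++i)
                for (int j = 0; j < N2; ++j) {
                    double ph = -2.0 * pi * (double(k1) * i / N1 + double(k2) * j / N2);
                    s += f[i][j] * std::complex<double>(std::cos(ph), std::sin(ph));
                }
            X[k1 * N2 + k2] = s;
        }

    double maxdiff = 0.0;                                // check X(k1,k2) == conj(X(-k1,-k2))
    for (int k1 = 0; k1 < N1; ++k1)
        for (int k2 = 0; k2 < N2; ++k2) {
            std::complex<double> partner =
                std::conj(X[((N1 - k1) % N1) * N2 + (N2 - k2) % N2]);
            maxdiff = std::max(maxdiff, std::abs(X[k1 * N2 + k2] - partner));
        }
    std::printf("max deviation from conjugate symmetry: %g\n", maxdiff);  // round-off level
}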

 

 

 

 

inconsistent results from mkl_dcscmv and mkl_dcsrmv


Hello,

I tried to calculate the (sparse) matrix-vector product using mkl_dcscmv and mkl_dcsrmv. However, sometimes they gave me different results. Here is the example I was using:

Let

M := [ 1  1  0  0 ]    x := [ 1 ]    sols := [ 1 ]
     [ 1  0  1  0 ]         [ 1 ]            [ 1 ]
     [ 1  0 -1  0 ]         [ 1 ]            [ 1 ]
     [ 0  1  0  0 ]         [ 1 ]            [ 1 ]
     [ 0  0  1  0 ]                          [ 1 ].

I tried to calculate

  1. test 1: sols := M x - sols,
  2. test 2: sols := Mx, and
  3. test 3: sols := Mx + sols

 by using either mkl_dcscmv or mkl_dcsrmv. Here are the generated results:

=== Test 1: sols := M x - sols ===

(Mx - sols) using mkl_dcscmv (expected: [1 1 -1 0 0]^T)
sols[0] = 1.000000E+00.
sols[1] = 1.000000E+00.
sols[2] = -1.000000E+00.
sols[3] = 0.000000E+00.
sols[4] = 2.000000E+00.

(Mx - sols) using mkl_dcsrmv (expected: [1 1 -1 0 0]^T)
sols[0] = 1.000000E+00.
sols[1] = 1.000000E+00.
sols[2] = -1.000000E+00.
sols[3] = 0.000000E+00.
sols[4] = 0.000000E+00.

=== Test 2: sols := M x + 0 * sols ===

result 2.1: (Mx) using mkl_dcscmv (expected: [2 2 0 1 1]^T)
sols[0] = 2.000000E+00.
sols[1] = 2.000000E+00.
sols[2] = 0.000000E+00.
sols[3] = 1.000000E+00.
sols[4] = 2.000000E+00.

result 2.2: (Mx) using mkl_dcsrmv (expected: [2 2 0 1 1]^T)
sols[0] = 2.000000E+00.
sols[1] = 2.000000E+00.
sols[2] = 0.000000E+00.
sols[3] = 1.000000E+00.
sols[4] = 1.000000E+00.

=== Test 3: sols := M x + 1 * sols ===

result 3.1: (Mx + sol) using mkl_dcscmv (expected: [3 3 1 2 2]^T)
sols[0] = 3.000000E+00.
sols[1] = 3.000000E+00.
sols[2] = 1.000000E+00.
sols[3] = 2.000000E+00.
sols[4] = 2.000000E+00.

result 3.2: (Mx + sol) using mkl_dcsrmv (expected: [3 3 1 2 2]^T)
sols[0] = 3.000000E+00.
sols[1] = 3.000000E+00.
sols[2] = 1.000000E+00.
sols[3] = 2.000000E+00.
sols[4] = 2.000000E+00.

===============================
My computational environment:

OS: Red Hat Enterprise Linux Server release 6.4 (Santiago)
Compiler and linker: Intel Composer-XE version 2013.2.146
GNU libc version: 2.12
===============================

Following is the complete code I was using for the test:

//===================

#include <stdlib.h>

#include <stdio.h>
#include <mkl.h>

int main (void)
{
#define NUM_ROWS 5
#define NUM_COLS 4
#define NUM_ENTS 8

    MKL_INT num_rows = NUM_ROWS;
    MKL_INT num_cols = NUM_COLS;
    MKL_INT num_ents = NUM_ENTS;
    MKL_INT i;

    double minus_one = -1.0;
    double one = +1.0;
    double zero = 0.0;
    char   notran = 'N';
    char   matdescra[4] = {'G', 'L', 'N', 'C'};

    /* M : = [ 1  1  0  0 ] x := [ 1 ] sols := [ 1 ]
     *       [ 1  0  1  0 ]      [ 1 ]         [ 1 ]
     *       [ 1  0 -1  0 ]      [ 1 ]         [ 1 ]
     *       [ 0  1  0  0 ]      [ 1 ]         [ 1 ]
     *       [ 0  0  1  0 ]                    [ 1 ]
     *
     */
    MKL_INT    M_bgn[NUM_COLS + 1] = {0, 3, 5, 8, 8};
    MKL_INT    M_idx[NUM_ENTS]     = {0, 1, 2, 0, 3, 1,  2, 4};
    double     M_val[NUM_ENTS]     = {1, 1, 1, 1, 1, 1, -1, 1};

    MKL_INT    MT_bgn[NUM_ROWS + 1] = {0, 2, 4, 6, 7, 8};
    MKL_INT    MT_idx[NUM_ENTS]     = {0, 1, 0, 0, 0, 2,  1, 2};
    double     MT_val[NUM_ENTS]     = {1, 1, 1, 1, 1, -1, 1, 1};

    double     sols[NUM_ROWS];
    double     x[NUM_COLS];

    /* initialize the solution and x */
#define INIT_SOL                                \
    for (i = 0; i < num_rows; ++i)              \
    {                                           \
        sols[i] = 1.0;                          \
    }                                           \
    for (i = 0; i < num_cols; ++i)              \
    {                                           \
        x[i] = 1.0;                             \
    }

#define PRINT_SOL(MSG)                          \
    printf("%s\n", MSG);                        \
    for (i = 0; i < num_rows; ++i)              \
    {                                           \
        printf("sols[%d] = %E.\n", i, sols[i]); \
    }

    /* test 1: compute sols := M x - sols
     *                       = [2 2 0 1 1]^T - [1 1 1 1 1]^T
     *                       = [1 1 -1 0 0]^T.
     */
    printf("\n=== Test 1: sols := M x - sols ===\n");
    INIT_SOL;
    /* test 1.1: using mkl_dcscmv */
    mkl_dcscmv(&notran, &num_rows, &num_cols, &one, matdescra,
        M_val, M_idx, M_bgn, M_bgn + 1, x, &minus_one, sols);
    PRINT_SOL("(Mx - sols) using mkl_dcscmv (expected: [1 1 -1 0 0]^T)");

    INIT_SOL;
    /* test 1.2: using mkl_dcsrmv */
    mkl_dcsrmv(&notran, &num_rows, &num_cols, &one, matdescra,
        MT_val, MT_idx, MT_bgn, MT_bgn + 1, x, &minus_one, sols);
    PRINT_SOL("(Mx - sols) using mkl_dcsrmv (expected: [1 1 -1 0 0]^T)");

    /* test 2: compute sols := M x + 0 * sols
     *                       = [2 2 0 1 1]^T - 0 * [1 1 1 1 1]^T
     *                       = [2 2 0 1 1]^T.
     */
    printf("\n=== Test 2: sols := M x + 0 * sols ===\n");
    INIT_SOL;
    /* test 2.1: using mkl_dcscmv */
    mkl_dcscmv(&notran, &num_rows, &num_cols, &one, matdescra,
        M_val, M_idx, M_bgn, M_bgn + 1, x, &zero, sols);
    PRINT_SOL("result 2.1: (Mx) using mkl_dcscmv (expected: [2 2 0 1 1]^T)");

    INIT_SOL;
    /* test 2.2: using mkl_dcsrmv */
    mkl_dcsrmv(&notran, &num_rows, &num_cols, &one, matdescra,
        MT_val, MT_idx, MT_bgn, MT_bgn + 1, x, &zero, sols);
    PRINT_SOL("result 2.2: (Mx) using mkl_dcsrmv (expected: [2 2 0 1 1]^T)");

    /* test 3: compute sols := M x + 1 * sols
     *                       = [2 2 0 1 1]^T + 1 * [1 1 1 1 1]^T
     *                       = [3 3 1 2 2]^T.
     */
    printf("\n=== Test 3: sols := M x + 1 * sols ===\n");
    INIT_SOL;
    /* test 3.1: using mkl_dcscmv */
    mkl_dcscmv(&notran, &num_rows, &num_cols, &one, matdescra,
        M_val, M_idx, M_bgn, M_bgn + 1, x, &one, sols);
    PRINT_SOL("result 3.1: (Mx + sol) using mkl_dcscmv (expected: [3 3 1 2 2]^T)");

    INIT_SOL;
    /* test 3.2: using mkl_dcsrmv */
    mkl_dcsrmv(&notran, &num_rows, &num_cols, &one, matdescra,
        MT_val, MT_idx, MT_bgn, MT_bgn + 1, x, &one, sols);
    PRINT_SOL("result: 3.2: (Mx + sol) using mkl_dcsrmv (expected: [3 3 1 2 2]^T)");

    return 0;
}

//===================

How to tell MKL the underlying microarchitecture


Hi:

    I read the gcc documentation as well as the Intel compiler documentation; they both say that by default the underlying microarchitecture is not detected on Linux x86-64 (the default is -xsse2). As a result, I need to put -march=nehalem in CXXFLAGS for g++ and -xsse4.2 in CXXFLAGS for icpc. However, when linking with MKL, there is no flag to tell MKL that my microarchitecture is Nehalem:

g++:

g++ -std=c++11 -O2 -march=nehalem -c main.cpp

g++ main.o -lmkl_rt

icpc:

icpc -std=c++11 -xsse4.2 -c main.cpp

icpc -mkl main.o

So, how do I guarantee that MKL takes full advantage of Nehalem and gets the highest performance?

Thank you very much

Chaowen GUO
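
A hedged note rather than a definitive answer: MKL selects CPU-specific kernels at run time through its own dispatcher, independent of the -march/-x flags used for your own translation units. A minimal sketch that prints which code path MKL detected, via the Processor field of mkl_get_version():

#include <mkl.h>
#include <cstdio>

int main()
{
    MKLVersion ver;
    mkl_get_version(&ver);       // fills in version info and the detected processor description
    std::printf("MKL %d.%d update %d, processor: %s\n",
                ver.MajorVersion, ver.MinorVersion, ver.UpdateVersion, ver.Processor);
    return 0;
}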
