Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Error in symmetric sparse matrix analysis step


I am a new user of MKL, so this may be a silly question.

I am trying to use the pardiso_64 function to solve a system with a symmetric sparse matrix.

The IDE is MS VS 2013 and the language is C++. The matrix in the test case is relatively small (about 4000 rows).

Here is the error I am getting: the analysis step fails with error -8.

I read the documentation on the PARDISO solvers, and as far as I can see, error -8 indicates a 32-bit integer overflow, but that doesn't make sense in my case.

What are the possible reasons, and where should I start looking to locate the error?



Source code attached.



Thanks in advance.

Attachment: solver.cpp (3.09 KB)
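
For context on where to look, here is a minimal, hedged sketch of a pardiso_64 call (this is not the attached solver.cpp): with the _64 entry point every integer argument, including ia, ja, perm and iparm, is expected to be 64-bit (long long int), and mixing 32-bit index arrays into that call is one plausible way to end up with error -8.

#include <mkl_pardiso.h>
#include <vector>

// Sketch only: solve A*x = b once, A symmetric (upper triangle stored in CSR, 1-based indices).
void solve_sym(long long n, double* a, long long* ia, long long* ja, double* b, double* x)
{
    void*     pt[64] = {0};                       // internal solver memory, must start zeroed
    long long iparm[64] = {0};                    // iparm[0] = 0: let PARDISO fill in defaults
    long long maxfct = 1, mnum = 1, mtype = -2;   // -2: real, symmetric indefinite
    long long nrhs = 1, msglvl = 1, error = 0;
    long long phase = 13;                         // analysis + factorization + solve
    std::vector<long long> perm(n, 0);

    pardiso_64(pt, &maxfct, &mnum, &mtype, &phase, &n,
               a, ia, ja, perm.data(), &nrhs, iparm, &msglvl, b, x, &error);
    // error == -8 here would indicate a 32-bit integer overflow inside the solver

    phase = -1;                                   // release internal memory
    pardiso_64(pt, &maxfct, &mnum, &mtype, &phase, &n,
               a, ia, ja, perm.data(), &nrhs, iparm, &msglvl, b, x, &error);
}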

Stack Overflow Error in Eigenvalue Solver: dfeast_scsrgv


Hi,

I'm trying to solve an eigenvalue problem for large sparse matrices using the dfeast_scsrgv function. The function works fine for small problems (e.g. an 8*8 sparse matrix) but it gives a System.StackOverflowException error for larger problems (e.g. a 200*200 sparse matrix). I'm using Visual Studio 2008 and MKL version 11 with the most recent updates installed. My system is 64-bit Windows and the programming language is C++. The eigenvalue solver code I'm using is given below. In debug mode, when execution reaches the dfeast_scsrgv line, I get the stack overflow error. I do not think I am using any infinite loop or unnecessarily large arrays. I would appreciate it if someone could help me fix the problem. Thanks!

   
//Convert stiffness and mass matrix to CSR format - Seldon library
int NumStiff = M_GStiff.GetDataSize();   
Vector<double> V_GStiffVal   (NumStiff);   
Vector<int>    V_GStiffColInd(NumStiff);
Vector<int>    V_GStiffRowPtr(PrbDim+1);
ConvertToCSR(M_GStiff, prop, V_GStiffRowPtr, V_GStiffColInd, V_GStiffVal);  
 
int NumMass = M_GMass.GetDataSize();
Vector<double> V_GMassVal   (NumMass);
Vector<int>    V_GMassColInd(NumMass);
Vector<int>    V_GMassRowPtr(PrbDim+1);
ConvertToCSR(M_GMass, prop, V_GMassRowPtr, V_GMassColInd, V_GMassVal);   
 
//Release memory
M_GStiff.Clear();
M_GMass.Clear();
 
//Convert Seldon format to typical C array
double* a = V_GStiffVal.GetData();
int*   ia = V_GStiffRowPtr.GetData();
int*   ja = V_GStiffColInd.GetData();   
 
double* b = V_GMassVal.GetData();
int*   ib = V_GMassRowPtr.GetData();
int*   jb = V_GMassColInd.GetData();
 
// Convert matrix from 0-based C-notation to Fortran 1-based
int nnz = ia[PrbDim]; 
for (int i = 0; i < PrbDim+1; i++)        ia[i] += 1;    
for (int i = 0; i < nnz; i++)                 ja[i] += 1;  
 
for (int i = 0; i < PrbDim+1; i++)      ib[i] += 1;
for (int i = 0; i < nnz; i++)              jb[i] += 1;
 
// Initialize variables for the solver
double Error    = 0;
int    Loop     = 0;
int    NumMode  = 10;
double Emin     = 0;
double Emax     = pow(10.0,10.0);    
int    Flag     = 0;
char   MTyp     = 'U';
int    NumEigen = NumMode;
 
vector<int>    V_FPM (128,0);
vector<double> V_Eigen(NumMode,0);
vector<double> V_Res (NumMode,0);    
 
V_FPM[0]  = 1;
V_FPM[1]  = 8;
V_FPM[2]  = 12;
V_FPM[3]  = 20;
V_FPM[4]  = 0;
V_FPM[5]  = 0;
V_FPM[6]  = 5;
V_FPM[13] = 0;
V_FPM[63] = 0;
 
int*    P_FPM          = &V_FPM[0];
double* P_Eigen        = &V_Eigen[0];  
double* P_Res          = &V_Res[0];    
double dDum;
 
// Call Eigenvalue Solver
dfeast_scsrgv (&MTyp, &PrbDim, a, ia, ja, b, ib, jb,
                P_FPM, &Error, &Loop, &Emin, &Emax, &NumMode, P_Eigen, &dDum, &NumEigen, P_Res, &Flag);
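
Not an answer to the crash itself, but for comparison here is a hedged sketch of how MKL's own FEAST examples prepare the inputs: feastinit() fills fpm with defaults before individual entries are overridden, and each matrix is shifted to 1-based indexing with its own non-zero count (the code above reuses the stiffness nnz when shifting jb).

#include <mkl.h>

void prepare_feast_inputs(MKL_INT* fpm, int* ia, int* ja, int* ib, int* jb, int PrbDim)
{
    feastinit(fpm);              // fill all 128 entries with the documented defaults
    fpm[0] = 1;                  // print runtime status
    fpm[2] = 12;                 // stopping criterion 1e-12

    int nnzA = ia[PrbDim];       // stiffness non-zeros (still 0-based here)
    int nnzB = ib[PrbDim];       // mass non-zeros -- may differ from nnzA

    for (int i = 0; i < PrbDim + 1; ++i) { ia[i] += 1; ib[i] += 1; }
    for (int i = 0; i < nnzA; ++i) ja[i] += 1;
    for (int i = 0; i < nnzB; ++i) jb[i] += 1;
}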

Preconditioners for banded matrix (diagonal storage format)?


All,

Is there a built-in routine for computing a preconditioner for a sparse matrix in diagonal storage format, or what alternatives exist? I plan on using it with the fgmres routine.

Thanks

M.
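
As an aside, as far as I know there is no canned preconditioner constructor for the diagonal storage format (dcsrilu0/dcsrilut work on CSR), so one alternative is a hand-rolled preconditioner applied inside the dfgmres RCI loop. Below is a hedged sketch under the assumption of a user-written dia_matvec routine for the banded matrix and a simple Jacobi (diagonal) preconditioner; the ipar settings and tmp sizing follow the documented defaults (restart length 150).

#include <mkl_rci.h>
#include <vector>

// user-supplied: y = A*x with A kept in diagonal (banded) storage -- details omitted
void dia_matvec(const double* x, double* y);

void fgmres_with_jacobi(MKL_INT n, const double* diagA, double* x, double* b)
{
    MKL_INT ipar[128], RCI_request, itercount;
    double  dpar[128];
    std::vector<double> tmp((2 * 150 + 1) * n + (150 * (150 + 9)) / 2 + 1);

    dfgmres_init(&n, x, b, &RCI_request, ipar, dpar, tmp.data());
    ipar[8]  = 1;   // do the residual stopping test ...
    ipar[9]  = 0;   // ... instead of requesting a user test (RCI_request == 2)
    ipar[10] = 1;   // preconditioned variant: enables RCI_request == 3
    ipar[11] = 1;   // automatic check for zero norm of the generated vector
    dfgmres_check(&n, x, b, &RCI_request, ipar, dpar, tmp.data());

    for (;;) {
        dfgmres(&n, x, b, &RCI_request, ipar, dpar, tmp.data());
        if (RCI_request == 0) break;                    // converged
        double* in  = &tmp[ipar[21] - 1];               // 1-based offsets into tmp
        double* out = &tmp[ipar[22] - 1];
        if (RCI_request == 1)                           // out = A * in
            dia_matvec(in, out);
        else if (RCI_request == 3)                      // out = M^{-1} * in (Jacobi)
            for (MKL_INT i = 0; i < n; ++i) out[i] = in[i] / diagA[i];
        else
            break;                                      // treat anything else as failure
    }
    dfgmres_get(&n, x, b, &RCI_request, ipar, dpar, tmp.data(), &itercount);
}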

Webinar announcement: A Tour of the Sparse Linear Algebra Functionality in Intel MKL


Sparse matrix algorithms are encountered in a broad range of important scientific computing applications. Intel MKL offers a powerful set of functions that can be used to build a complete solution to many sparse linear systems. This webinar gives a tour of MKL’s sparse linear algebra components. Highlights include Sparse BLAS functions, Direct solvers for sparse linear systems, Iterative solvers, and Eigensolvers for sparse matrices based on the FEAST algorithm.

Register for the webinar now: https://www1.gotomeeting.com/register/389177449

MKL FFT and FFTW


Does the MKL Library use FFTW internally?

If there are other facilities as well, are there any limitations on the radixes?

Thanks
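
On the facilities question, a hedged note: as far as I know MKL's native FFT is the DFTI interface (not FFTW internally), it handles arbitrary mixed-radix lengths, and MKL also ships FFTW3 wrapper interfaces so existing fftw_* calls can simply be relinked against MKL. A minimal C++ sketch of DFTI, a 1-D in-place complex forward/backward transform of a non-power-of-two length:

#include <mkl_dfti.h>
#include <vector>
#include <complex>

int main()
{
    const MKL_LONG n = 1000;                            // mixed-radix length, not a power of two
    std::vector<std::complex<float>> x(n, {1.0f, 0.0f});

    DFTI_DESCRIPTOR_HANDLE h = 0;
    MKL_LONG status = DftiCreateDescriptor(&h, DFTI_SINGLE, DFTI_COMPLEX, 1, n);
    if (status == 0) status = DftiCommitDescriptor(h);
    if (status == 0) status = DftiComputeForward(h, x.data());   // in-place forward transform
    if (status == 0) status = DftiComputeBackward(h, x.data());  // unscaled inverse transform
    DftiFreeDescriptor(&h);
    return (int)status;
}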

 

Speed of solvers with CSR format


 

I have been using MKL in Composer 2013, in particular PARDISO and the preconditioned conjugate gradient solver with the CSR format to solve symmetric matrices.

I am wondering whether using the full (general) CSR format is much faster than using the half (upper-triangle) CSR format.

Also, there is an example with Jacobi-preconditioned CG in the MKL examples folder.

Are there other preconditioned CG examples?
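
As an aside, a hedged illustration (not a benchmark) of the two storage conventions the question contrasts: the full CSR matrix goes with the general matrix-vector routine, while the half (upper-triangle) CSR matrix goes with the symmetric one. Both of these simplified Sparse BLAS routines expect 1-based ia/ja.

#include <mkl_spblas.h>

// y = A*x with the full matrix stored: every non-zero kept in a/ia/ja (1-based).
void matvec_full_csr(MKL_INT n, double* a, MKL_INT* ia, MKL_INT* ja, double* x, double* y)
{
    char transa = 'N';
    mkl_dcsrgemv(&transa, &n, a, ia, ja, x, y);
}

// y = A*x with the half matrix stored: only the upper triangle kept in a/ia/ja (1-based).
void matvec_half_csr(MKL_INT n, double* a, MKL_INT* ia, MKL_INT* ja, double* x, double* y)
{
    char uplo = 'U';
    mkl_dcsrsymv(&uplo, &n, a, ia, ja, x, y);
}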

Linking problem


Hello, 

I am trying to solve a linear system of equations using LAPACK routines. I have tried to link, but some of the calls seem to work while others do not. I get the error below when I use CALL GESV.

design.f90(477): error #6285: There is no matching specific subroutine for this generic subroutine call.   [GESV]
                CALL GESV(xtemp, ztemp)
---------------------^
compilation aborted for design.f90 (code 1)

This does not happen when I change CALL GESV to CALL GETRF, but then CALL GETRS will also not work. xtemp is a 12 x 12 matrix while ztemp is a vector of 12 elements.

This is the code:

		PROGRAM DESIGN

		USE Prop_mod
		USE blas95
		USE f95_precision
		USE lapack95

		IMPLICIT NONE

		!DECLARATIONS OF VARIABLES

......

		DO i = 1, num
			WRITE(30, '(*(G0.4, ",", :))') xtemp(i, :)
		ENDDO
		PAUSE

		ALLOCATE(ipiv(num))
		!CALL getrf(xtemp, ipiv)
		!CALL getrs(xtemp, ipiv, ztemp)

		CALL gesv(xtemp, ztemp)
		WRITE(*, *)
		WRITE(*, *) xtemp
		WRITE(30, *)
		WRITE(30, *)
		DO i = 1, num
			WRITE(31, '(*(G0.4, ",", :))') xtemp(i, :)
		ENDDO
		PAUSE

		WRITE(*,*) ztemp
		PAUSE

		WRITE(*, *)

 

When I combine CALL getrf and getrs, only getrf works and the solution subroutine "getrs" fails.

CALL gesv does not work at all. I really need someone's help on this.

This is what I type to compile and link the code:

C:\Program Files (x86)\Intel\Composer XE 2013 SP1\Projects>ifort Props_mod.f90 design.f90 /traceback mkl_rt.lib mkl_blas95_ilp64.lib mkl_lapack95_ilp64.lib libiomp5md.lib /exe:ex1.exe

Am I not linking properly? I have tried everything I know of, but it is still not working.

Pardiso L and U factors


I have been trying to figure out a way to extract the L and U factors of a matrix using pardiso. I read somewhere that this is not possible, but I just wanted to check with people here to see if there is a way to do it.


Thanks for the help!


Memory doubled when calling zgetrf in C#


Hi,

I'm solving a large complex dense matrix (10k*10k) in C# by calling zgetrf first to LU-decompose the matrix. If I understand correctly, zgetrf does an in-place LU decomposition, so there should be no extra copies produced. But the memory usage doubles (from 1.6 GB to more than 3 GB) while zgetrf is running. Can someone kindly tell me the reason? It would be great if I could figure out a way to avoid the extra memory usage.

Here is the code:

class Program
    {


        static void Main(string[] args)
        {
            int n1 = 10000;
            int n2 = n1 * n1;
            complex[] A = new complex[n2];
            complex[] b = new complex[n1];

            for (int i = 0; i < n1; i++)
            {
                for (int j = 0; j < n1; j++)
                {
                    int ij = j * n1 + i;
                    A[ij].r = i + 1;
                    A[ij].i = j + 1;
                }
                b[i].r = i + 1;
                b[i].i = i + 2;
            }

            int info = 0;
            int[] ipiv = new int[n1];
            MKL.LU_decomposition(A, ref n1, ref n2, ipiv, ref info);
            MKL.LU_solve(A, b, ref n1, ref n2, ipiv, ref info);
        }


        [SuppressUnmanagedCodeSecurity]
        public sealed class MKL
        {
            private MKL() { }

            [DllImport("customMKL_Sequential.dll", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true, SetLastError = false)]
            public static extern void LU_decomposition([In, Out] complex[] A, ref int n1, ref int n2, [In, Out] int[] ipiv, ref int info);

            [DllImport("customMKL_Sequential.dll", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true, SetLastError = false)]
            public static extern void LU_solve([In] complex[] A, [In, Out] complex[] b, ref int n1, ref int n2, [In] int[] ipiv, ref int info);
        }


        [StructLayout(LayoutKind.Sequential)]/* Force sequential mapping */
        public struct complex
        {
            [MarshalAs(UnmanagedType.R8, SizeConst = 64)]
            public double r;
            [MarshalAs(UnmanagedType.R8, SizeConst = 64)]
            public double i;
        };
    }

The Fortran side of customMKL_Sequential.dll:

subroutine LU_decomposition (A, n1, n2, ipiv, info)
    implicit none
    !dec$ attributes dllexport :: LU_decomposition
    !dec$ attributes alias:'LU_decomposition' :: LU_decomposition

    integer, intent(in) :: n1, n2
    integer, intent(inout) :: info
    integer, dimension(n1), intent(inout) :: ipiv
    complex(8), dimension(n2), intent(inout) :: A

    call zgetrf(n1, n1, A, n1, ipiv, info)
end subroutine


subroutine LU_solve (A, b, n1, n2, ipiv, info)
    implicit none
    !dec$ attributes dllexport :: LU_solve
    !dec$ attributes alias:'LU_solve' :: LU_solve

    integer, intent(in) :: n1, n2
    integer, intent(inout) :: info
    integer, dimension(n1), intent(in) :: ipiv
    complex(8), dimension(n2), intent(inout) :: A
    complex(8), dimension(n1), intent(inout) :: b

    call zgetrs('N', n1, 1, A, n1, ipiv, b, n1, info)
end subroutine





 





 

Thanks a lot in advance.





 

VSL Summary Statistics VSL_SS_SUM error


Please let me know why the code below shows invalid results (sum) when NR is greater than 1999.

#define NR  2000   // MEAN=1 but invalid SUM!
//#define NR  1999 // MEAN=1 and SUM==1999, of course
#define DIM 4

void Test()
{
    VSLSSTaskPtr task;
    int errcode, dim = DIM, n = NR, x_storage = VSL_SS_MATRIX_STORAGE_COLS;
    double x[NR*DIM], mean[DIM], sum[DIM], W[2];

    W[0] = 0; W[1] = 0;
    for (int j = 0; j < DIM; ++j) { mean[j] = 0; sum[j] = 0; }
    for (int i = 0; i < NR*DIM; ++i) x[i] = 1;

    errcode = vsldSSNewTask( &task, &dim, &n, &x_storage, (double*)x, 0, 0 );
    errcode = vsldSSEditTask( task, VSL_SS_ED_ACCUM_WEIGHT, W );
    errcode = vsldSSEditTask( task, VSL_SS_ED_MEAN, mean );
    errcode = vsldSSEditTask( task, VSL_SS_ED_SUM, sum );
    errcode = vsldSSCompute( task, VSL_SS_MEAN | VSL_SS_SUM, VSL_SS_METHOD_FAST );

    for (int i = 0; i < dim; ++i) printf("M[%d] %g  S[%d] %g\n", i, mean[i], i, sum[i]);

    errcode = vslSSDeleteTask( &task );
}

The result when NR=2000 is:

M[0] 1  S[0] -2.65698e+303
M[1] 1  S[1] -2.65698e+303
M[2] 1  S[2] -2.65698e+303
M[3] 1  S[3] -2.65698e+303

I got this result under:

  Windows 8.1
  Intel(R) Math Kernel Library Version 11.1.3 Product Build 20140416 for 32-bit applications

 

Segfault in the dtpmqrt routine


Hello,

While testing out MKL LAPACK's dtpqrt and dtpmqrt routines, I've stumbled across a weird segfault. I replicated the error in the example below (I should mention that I use Eigen just to make my life easier and that populateEigenMat just creates a random matrix).

The problem is that for different values of the parameter m (the number of rows of the matrix B in LAPACK's reference for dtpqrt and dtpmqrt), the code either works (small values of m, and the result is correct) or it segfaults (larger values of m).

#include <iostream>
#include <Eigen/Dense>
#include <mkl.h>

using Eigen::MatrixXd;
using std::cout;

void populateEigenMat(MatrixXd& m);   // declaration assumed; defined elsewhere, fills the matrix with random values

int main()
{

  int m =150;
  int n = 5;
  int nb = 1;
  int l = 0;
  int info;

  MatrixXd a(n,n), b(m,n), t(nb, n);
  populateEigenMat(a), populateEigenMat(b);
  a = a.eval().triangularView<Eigen::Upper>();

  info = LAPACKE_dtpqrt(LAPACK_COL_MAJOR, m, n, l, nb, a.data(), a.rows(),
  		       b.data(), b.rows(), t.data(), t.rows());

  int k = n;

  MatrixXd cA(k,n), cB(m,n), c(k+m,n);
  populateEigenMat(cA), populateEigenMat(cB);
  c<<cA,cB;

  cout<<"Still ok!\n";
  // cout<<endl<<cB<<endl<<endl<<c.block(k,0,m,n)<<endl<<endl;
  info = LAPACKE_dtpmqrt(LAPACK_COL_MAJOR, 'L', 'N', m, n, k, l, nb, b.data(), b.rows(), t.data(), t.rows(),
  			 c.data(), c.rows(), c.data()+k, c.rows());

}

I'm using Intel Composer version composer_xe_2013_sp1.2.144 with the following compile/link flags: -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -std=c++11 -xAVX -DMKL_ILP64.

Could someone please tell me where I've made a mistake?

Kind regards

How to compile the sample code with the MinGW compiler?


Hi all,

I am using MinGW as the compiler, and our code has a lot of FFTs. In the past we used FFTW, and we want to try Intel's FFT to improve performance.

I downloaded the trial version of MKL and installed it on my computer (64-bit Windows 8). I also downloaded the first sample code from the website (https://software.intel.com/sites/products/documentation/hpc/mkl/mklman/G...).

However, I don't know how to compile it with MinGW. In the past, we just typed in the Windows command window: gfortran -o test.exe -O2 testfortran.f -LC:\\Windows\system -lfftw3-3. This links our code against FFTW's FFT library. But I don't know how to link the code against the new library with the MinGW compiler.

Could someone please tell me where I've made a mistake? We appreciate your help.

We just want to do forward and backward FFTs. If there is an easy Fortran example that shows how to use Intel's forward and backward FFT, it would be very helpful.

Best regards,

PARDISO gives wrong answers when given too many rhs's


When I call PARDISO with too many RHS's I get wrong answers.

I first call PARDISO for phase 11 and phase 22, keep the factored matrix in memory, and then leave the subroutine that calls PARDISO. I later call PARDISO to perform only phase 33, passing in a fully populated, "stacked" RHS vector (i.e. the total number of entries is neq*nrhs).

Here's what I've found:

Case 1: When I call PARDISO for phase 33 only with all 9,850 RHS's at once, PARDISO returns completely wrong answers: many NaN's, and all the other values completely wrong. Also, after the call to PARDISO I cannot successfully deallocate some arrays (these arrays are totally unrelated to the matrices used by PARDISO); the message is "Invalid pointers". I suspect that PARDISO has overwritten some of the pointers and/or values in the arrays that I cannot deallocate. Perhaps a memory leak?

Case 2: When I call PARDISO for phase 33 only, sending in 3,000 RHS's at a time (i.e. I call PARDISO 4 times: three calls with 3,000 RHS's each and then one with the remaining 850), I get wrong answers, BUT the answers are almost reasonable. There are no NaN's, and without knowing better I might guess the answers are correct. In Case 2 the deallocation problem mentioned in Case 1 does NOT occur.

Case 3: When I call PARDISO for phase 33 only, sending in 100 RHS's at a time (i.e. I call PARDISO 99 times), I get the correct answers for everything, and there are no problems.

What is causing this problem? I am concerned that other models (say, one with 500,000 equations) may give wrong answers without my knowing it, because I have no way of knowing the largest number of RHS's that I can send into PARDISO in a single chunk for a general model.

I know that I'm not running out of physical memory, and I know that I'm not running out of swap space. Do I have a number-of-threads problem?

MKL: Version 10.3 Update 4
Operating System: Red Hat Enterprise Linux AS release 4 (Nahant Update 8)
Environment variables set: MKL_NUM_THREADS = 32
Computer has 32 processors
Computer has 198 GB of memory
Computer has 68 GB of available swap space

I am using PARDISO with mtype=6 (i.e. double precision complex, symmetric, in-core only).
Number of equations: 183,180
Needed number of RHS's: 9,850

Any help would be GREATLY appreciated.


Thanks, Bob

Compilation Interface issue with dfeast_scsrev


Hi,

I recently upgraded to XE 2013 to get access to the extended eigensolvers in the MKL 11 libraries.

I am trying to use dfeast_scsrev but am getting some compilation errors.

Here is my code:

    SUBROUTINE TTTT(IVECT,STIFFNESS_MATRIX)

    USE SPARSE_MATRIX_CLASS

 !   INCLUDE 'mkl_solvers_ee.fi'

    INTEGER::fpm(128)
    REAL(8)::EMIN,EMAX,EPSOUT
    REAL(8),ALLOCATABLE::E(:),X(:,:),RES(:)
    INTEGER::M0,LOOP,INFO,M

    INTEGER::IVECT(*)
    TYPE(SPARSE_MATRIX)::STIFFNESS_MATRIX

    CALL SPARSE_MATRIX_STORAGE('CSR',STIFFNESS_MATRIX)

    call feastinit (fpm)

    EMIN=0D0 ; EMAX=1000D0 ; M0=ivect(12)
    ALLOCATE(E(M0),X(IVECT(12),M0),RES(M0))

    call dfeast_scsrev('L',ivect(12),STIFFNESS_MATRIX%MATRIX,STIFFNESS_MATRIX%ROWS(1:ivect(12)+1),STIFFNESS_MATRIX%COLUMNS,fpm, epsout, loop, emin,emax, m0, e, x, m, res, info)

    RETURN
    END SUBROUTINE

When I compile this code I get three errors:

C:\RMA\Programs\EFE_V1.0\ansys\SSSS.f90(24): error #8055: The procedure has a dummy argument that has the ALLOCATABLE, ASYNCHRONOUS, OPTIONAL, POINTER, TARGET, VALUE or VOLATILE attribute. Required explicit interface is missing from original source.   [MATRIX]

C:\RMA\Programs\EFE_V1.0\ansys\SSSS.f90(24): error #8055: The procedure has a dummy argument that has the ALLOCATABLE, ASYNCHRONOUS, OPTIONAL, POINTER, TARGET, VALUE or VOLATILE attribute. Required explicit interface is missing from original source.   [ROWS]

C:\RMA\Programs\EFE_V1.0\ansys\SSSS.f90(24): error #8055: The procedure has a dummy argument that has the ALLOCATABLE, ASYNCHRONOUS, OPTIONAL, POINTER, TARGET, VALUE or VOLATILE attribute. Required explicit interface is missing from original source.   [COLUMNS]

 

If I uncomment the include statement I get the following error:

SSSS.f90

C:\Program Files\Intel\Composer XE 2013 SP1\mkl\include\mkl_solvers_ee.fi(459): error #8000:  There is a conflict between local interface block and external interface block.   [SA]

The type Sparse_Matrix is:

    TYPE SPARSE_MATRIX

        INTEGER::NUMBER_OF_ROWS=0
        INTEGER::NUMBER_OF_COLUMNS=0
        INTEGER::NUMBER_OF_NON_ZEROS=0

        INTEGER,ALLOCATABLE::ROWS(:)
        INTEGER,ALLOCATABLE::COLUMNS(:)

        REAL(8),ALLOCATABLE::MATRIX(:)

    END TYPE SPARSE_MATRIX

Any help in solving this issue would be gratefully received.

Thanks, ACAR

 

 

Advice for getting rid of temporary array creation, and Fortran 95 interface for dsecnd


I am using the Intel Fortran compiler and Intel MKL for a performance check. I am passing some array sections to the Fortran 77 interface with calls like:

call dgemm( transa,transb,sz_s,P,P,&
            a, Ts_tilde,&
            sz_s,R_alpha,P,b,tr(:sz_s,:),sz_s)

As is evident, tr(:sz_s,:) is not contiguous in memory, and the Fortran 77 interface expects a contiguous block, so a temporary is created for it.

What I was wondering is: will there be a difference if I create the temporary array for tr explicitly in my code and copy the data back and forth before and after the operation, or will that be the same, from a performance point of view, as the compiler creating the temporary itself? I guess the compiler will always be more efficient.

And of course any more suggestions for eliminating these temporaries are welcome.

One more point: if I use the Fortran 95 interface of the library with a similar call on a simpler test problem, apparently no warning is issued about the creation of a temporary. I then read in the MKL manual that the Fortran 95 interface uses assumed-shape arrays, which explains why temporaries are not created. Is this the logical way to continue if I cannot reshape the above code to work with contiguous blocks?

However, at that point, while testing the Fortran 95 interfaces, I ran into a problem with the support function dsecnd. With the code below, which uses the mkl_service module, I am getting:

dgemm95_test.f90(30): error #6404: This name does not have a type, and must have an explicit type.   [DSECND]

t1 = dsecnd()

-----^

Any idea about this problem is also welcome. The simple code for the dsecnd problem is:

program dgemm95_test
! some modules for Fortran 95 interface
use mkl_service
use mkl95_precision
use mkl95_blas
!
implicit none
!
double precision, dimension(4,3) :: a
double precision, dimension(6,4) :: b
double precision, dimension(5,5) :: r ! result array
double precision, dimension(3,2) :: dummy_b
!
character(len=1) :: transa
character(len=1) :: transb
!
double precision :: alpha, beta, t1, t2, t
integer :: sz1, sz2

! initialize some variables
alpha = 1.0
beta = 0.0
a = 2.3
b = 4.5
r = 0.0
transa = 'n'
transb = 'n'
dummy_b = 0.0
! Fortran 95 interface
t1 = dsecnd()
call gemm( a, b(4:6,1:3:2), r(2:5,3:4),&
 transa, transb, alpha, beta )
t2 = dsecnd()
!
write(*,*) r
dummy_b  = r(2:4,4:5)
!
end program dgemm95_test

Any help and advice on these points are highly appreciated.

Best regards,

Umut


Mixing Fortran 77 and Fortran 95 interfaces


Dear all,

I wrote my code with the Fortran 77 interfaces. However, I was wondering whether it is possible to mix Fortran 77 and Fortran 95 routines. For instance, looking at the Sparse BLAS level 2 routine mkl_dcsrmv, it is not explicitly stated that it has both a Fortran 77 and a Fortran 95 interface; it is only mentioned that it has a Fortran interface.

If this is possible, how can I do it? Which modules should be used so that they do not conflict?

Best regards,

Umut

Fortran 95 interface syevr returns -1001 as info


Dear all,

As the subject line suggests, the Fortran 95 interface of dsyevr, namely syevr, returns -1001 in the info parameter. I cannot work out what this means; could you please help me with this problem?

Best regards,

Umut

2D FFT


Hello,

I have been assigned the task of converting a MATLAB script to C++, and am currently working on the FFT part.

Given this input:

octave:67> r57
r57 =

   0.00000   0.20000   0.30000   0.30000   0.40000
   0.30000   0.30000   0.40000   0.40000   0.50000
   0.10000   0.30000   0.20000   0.40000   0.10000
   0.50000   0.50000   0.40000   0.30000   0.20000
   0.20000   0.20000   0.20000   0.20000   0.20000
   0.30000   0.20000   0.30000   0.20000   0.30000
   0.50000   0.50000   0.50000   0.50000   0.50000

I get this output:

octave:68> fft2(r57)
ans =

   10.90000 +  0.00000i   -0.46180 -  0.00000i   -0.23820 -  0.00000i   -0.23820 +  0.00000i   -0.46180 +  0.00000i
    0.79650 +  0.27359i   -0.55720 +  0.94400i   -0.89353 +  0.39737i   -0.10671 -  0.19919i   -0.34353 -  0.30982i
   -0.13330 +  1.20183i    0.50836 +  0.04556i    0.18356 +  0.35590i   -0.57328 -  0.00290i   -0.49515 +  0.11341i
   -1.91320 -  0.77347i   -0.54307 -  0.27387i    0.09683 -  0.23803i   -0.56867 -  0.10558i   -0.20760 -  0.41938i
   -1.91320 +  0.77347i   -0.20760 +  0.41938i   -0.56867 +  0.10558i    0.09683 +  0.23803i   -0.54307 +  0.27387i
   -0.13330 -  1.20183i   -0.49515 -  0.11341i   -0.57328 +  0.00290i    0.18356 -  0.35590i    0.50836 -  0.04556i
    0.79650 -  0.27359i   -0.34353 +  0.30982i   -0.10671 +  0.19919i   -0.89353 -  0.39737i   -0.55720 -  0.94400i

And now I am confused. The first column is as expected, with its second half computable from its first half. However, this is not the case for the remaining columns. The C++ code follows, but it doesn't work; among other things, it assumes that the second half of each column can be computed from its first half.

I am not sure how to proceed. Is the Octave (MATLAB clone) output correct/expected/common? If not, i.e. if there is a problem with the test data, what would be more reasonable to test with?

Any advice on how to handle this situation would be much appreciated, as would any comments on the code; I am not really sure that it is correct.

Kind Regards,

 

       

MKL_LONG               mklNumSamples   [2] = { 5 /* columns */, 7 /* rows */ };
MKL_LONG               mklInputStrides [3] = { 0, mklNumSamples[1] + 2, 1 };
MKL_LONG               mklOutputStrides[3] = { 0, mklNumSamples[1] + 2, 1 };
float                  mklBuffer       [(5+2)*(7+2)] = { 0 };

DFTI_DESCRIPTOR_HANDLE dftiHandle = DFTI_DESCRIPTOR_HANDLE();

DftiCreateDescriptor(&dftiHandle, DFTI_SINGLE, DFTI_REAL, 2, mklNumSamples);
DftiSetValue(dftiHandle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
DftiSetValue(dftiHandle, DFTI_INPUT_STRIDES, mklInputStrides);
DftiSetValue(dftiHandle, DFTI_OUTPUT_STRIDES, mklOutputStrides);
DftiCommitDescriptor(dftiHandle);

{
    /* Input is f, accessed as: f.size(), f[index] */
    size_t index[2] = { 0, 0 };
    for (index[0] = 0; index[0] < f.size()[0]; ++index[0]) /* column */
    {
        for (index[1] = 0; index[1] < f.size()[1]; ++index[1]) /* row */
        {
            /* fetches f with column = index[0] and row = index[1] */
            mklBuffer[ index[0] * (f.size()[1]+2) + index[1] ] = f[index];
        }
    }
}

DftiComputeForward(dftiHandle, &mklBuffer[0]);

{
    size_t index[2] = { 0, 0 };
    for (index[0] = 0; index[0] < f.size()[0]; ++index[0])
    {
        for (index[1] = 0; index[1] < (f.size()[1] + 1)/2; ++index[1])
        {
            const PUVec2st index0(index[0], index[1]);
            const PUVec2st index1(index[0], f.size()[1] - index[1]);
            const float real = mklBuffer[ index[0] * (f.size()[1] + 2) + index[1]*2+0 ];
            const float imag = mklBuffer[ index[0] * (f.size()[1] + 2) + index[1]*2+1 ];
            if (index[1] == 0)
            {
                result[index0] = std::complex<float>(real,  imag);
            }
            else
            {
                result[index0] = std::complex<float>(real,  imag);
                result[index1] = std::complex<float>(real, -imag);
            }
        }
    }
}
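
A hedged aside on the symmetry itself, independent of DFTI and of the code above: for real input the 2-D DFT satisfies X(k1,k2) = conj(X((N1-k1) mod N1, (N2-k2) mod N2)), so the conjugate partner of an entry in column k2 sits in column N2-k2 rather than in the same column, which is why only column 0 of the Octave output looks self-symmetric. The small brute-force check below (illustration only) verifies that relation for a 7x5 real matrix:

#include <algorithm>
#include <cmath>
#include <complex>
#include <cstdio>
#include <vector>

int main()
{
    const int N1 = 7, N2 = 5;
    double f[N1][N2];
    for (int i = 0; i < N1; ++i)
        for (int j = 0; j < N2; ++j)
            f[i][j] = 0.1 * ((i * N2 + j) % 7);          // arbitrary real test data

    const double pi = 3.14159265358979323846;
    std::vector<std::complex<double>> X(N1 * N2);
    for (int k1 = 0; k1 < N1; ++k1)                      // naive full 2-D DFT
        for (int k2 = 0; k2 < N2; ++k2) {
            std::complex<double> s(0.0, 0.0);
            for (int i = 0; i < N1; ++i)
                for (int j = 0; j < N2; ++j) {
                    double ph = -2.0 * pi * (double(k1) * i / N1 + double(k2) * j / N2);
                    s += f[i][j] * std::complex<double>(std::cos(ph), std::sin(ph));
                }
            X[k1 * N2 + k2] = s;
        }

    double maxdiff = 0.0;                                // check X(k1,k2) == conj(X(-k1,-k2))
    for (int k1 = 0; k1 < N1; ++k1)
        for (int k2 = 0; k2 < N2; ++k2) {
            std::complex<double> partner =
                std::conj(X[((N1 - k1) % N1) * N2 + (N2 - k2) % N2]);
            maxdiff = std::max(maxdiff, std::abs(X[k1 * N2 + k2] - partner));
        }
    std::printf("max deviation from conjugate symmetry: %g\n", maxdiff);  // round-off level
}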

 

 

 

 

inconsistent results from mkl_dcscmv and mkl_dcsrmv


Hello,

I tried to calculate the (sparse) matrix-vector product using mkl_dcscmv and mkl_dcsrmv. However, sometimes they gave me different results. Here is the example I was using:

Let

M := [ 1  1  0  0 ]    x := [ 1 ]    sols := [ 1 ]
     [ 1  0  1  0 ]         [ 1 ]            [ 1 ]
     [ 1  0 -1  0 ]         [ 1 ]            [ 1 ]
     [ 0  1  0  0 ]         [ 1 ]            [ 1 ]
     [ 0  0  1  0 ]                          [ 1 ].

I tried to calculate

  1. test 1: sols := M x - sols,
  2. test 2: sols := Mx, and
  3. test 3: sols := Mx + sols

 by using either mkl_dcscmv or mkl_dcsrmv. Here are the generated results:

=== Test 1: sols := M x - sols ===

(Mx - sols) using mkl_dcscmv (expected: [1 1 -1 0 0]^T)
sols[0] = 1.000000E+00.
sols[1] = 1.000000E+00.
sols[2] = -1.000000E+00.
sols[3] = 0.000000E+00.
sols[4] = 2.000000E+00.

(Mx - sols) using mkl_dcsrmv (expected: [1 1 -1 0 0]^T)
sols[0] = 1.000000E+00.
sols[1] = 1.000000E+00.
sols[2] = -1.000000E+00.
sols[3] = 0.000000E+00.
sols[4] = 0.000000E+00.

=== Test 2: sols := M x + 0 * sols ===

result 2.1: (Mx) using mkl_dcscmv (expected: [2 2 0 1 1]^T)
sols[0] = 2.000000E+00.
sols[1] = 2.000000E+00.
sols[2] = 0.000000E+00.
sols[3] = 1.000000E+00.
sols[4] = 2.000000E+00.

result 2.2: (Mx) using mkl_dcsrmv (expected: [2 2 0 1 1]^T)
sols[0] = 2.000000E+00.
sols[1] = 2.000000E+00.
sols[2] = 0.000000E+00.
sols[3] = 1.000000E+00.
sols[4] = 1.000000E+00.

=== Test 3: sols := M x + 1 * sols ===

result 3.1: (Mx + sol) using mkl_dcscmv (expected: [3 3 1 2 2]^T)
sols[0] = 3.000000E+00.
sols[1] = 3.000000E+00.
sols[2] = 1.000000E+00.
sols[3] = 2.000000E+00.
sols[4] = 2.000000E+00.

result 3.2: (Mx + sol) using mkl_dcsrmv (expected: [3 3 1 2 2]^T)
sols[0] = 3.000000E+00.
sols[1] = 3.000000E+00.
sols[2] = 1.000000E+00.
sols[3] = 2.000000E+00.
sols[4] = 2.000000E+00.

===============================
My computational environment:

OS: Red Hat Enterprise Linux Server release 6.4 (Santiago)
Compiler and linker: Intel Composer-XE version 2013.2.146
GNU libc version: 2.12
===============================

Following is the complete code I was using for the test:

//===================

#include <stdlib.h>

#include <stdio.h>
#include <mkl.h>

int main (void)
{
#define NUM_ROWS 5
#define NUM_COLS 4
#define NUM_ENTS 8

    MKL_INT num_rows = NUM_ROWS;
    MKL_INT num_cols = NUM_COLS;
    MKL_INT num_ents = NUM_ENTS;
    MKL_INT i;

    double minus_one = -1.0;
    double one = +1.0;
    double zero = 0.0;
    char   notran = 'N';
    char   matdescra[4] = {'G', 'L', 'N', 'C'};

    /* M : = [ 1  1  0  0 ] x := [ 1 ] sols := [ 1 ]
     *       [ 1  0  1  0 ]      [ 1 ]         [ 1 ]
     *       [ 1  0 -1  0 ]      [ 1 ]         [ 1 ]
     *       [ 0  1  0  0 ]      [ 1 ]         [ 1 ]
     *       [ 0  0  1  0 ]                    [ 1 ]
     *
     */
    MKL_INT    M_bgn[NUM_COLS + 1] = {0, 3, 5, 8, 8};
    MKL_INT    M_idx[NUM_ENTS]     = {0, 1, 2, 0, 3, 1,  2, 4};
    double     M_val[NUM_ENTS]     = {1, 1, 1, 1, 1, 1, -1, 1};

    MKL_INT    MT_bgn[NUM_ROWS + 1] = {0, 2, 4, 6, 7, 8};
    MKL_INT    MT_idx[NUM_ENTS]     = {0, 1, 0, 0, 0, 2,  1, 2};
    double     MT_val[NUM_ENTS]     = {1, 1, 1, 1, 1, -1, 1, 1};

    double     sols[NUM_ROWS];
    double     x[NUM_COLS];

    /* initialize the solution and x */
#define INIT_SOL                                \
    for (i = 0; i < num_rows; ++i)              \
    {                                           \
        sols[i] = 1.0;                          \
    }                                           \
    for (i = 0; i < num_cols; ++i)              \
    {                                           \
        x[i] = 1.0;                             \
    }

#define PRINT_SOL(MSG)                          \
    printf("%s\n", MSG);                        \
    for (i = 0; i < num_rows; ++i)              \
    {                                           \
        printf("sols[%d] = %E.\n", i, sols[i]); \
    }

    /* test 1: compute sols := M x - sols
     *                       = [2 2 0 1 1]^T - [1 1 1 1 1]^T
     *                       = [1 1 -1 0 0]^T.
     */
    printf("\n=== Test 1: sols := M x - sols ===\n");
    INIT_SOL;
    /* test 1.1: using mkl_dcscmv */
    mkl_dcscmv(&notran, &num_rows, &num_cols, &one, matdescra,
        M_val, M_idx, M_bgn, M_bgn + 1, x, &minus_one, sols);
    PRINT_SOL("(Mx - sols) using mkl_dcscmv (expected: [1 1 -1 0 0]^T)");

    INIT_SOL;
    /* test 1.2: using mkl_dcsrmv */
    mkl_dcsrmv(&notran, &num_rows, &num_cols, &one, matdescra,
        MT_val, MT_idx, MT_bgn, MT_bgn + 1, x, &minus_one, sols);
    PRINT_SOL("(Mx - sols) using mkl_dcsrmv (expected: [1 1 -1 0 0]^T)");

    /* test 2: compute sols := M x + 0 * sols
     *                       = [2 2 0 1 1]^T - 0 * [1 1 1 1 1]^T
     *                       = [2 2 0 1 1]^T.
     */
    printf("\n=== Test 2: sols := M x + 0 * sols ===\n");
    INIT_SOL;
    /* test 2.1: using mkl_dcscmv */
    mkl_dcscmv(&notran, &num_rows, &num_cols, &one, matdescra,
        M_val, M_idx, M_bgn, M_bgn + 1, x, &zero, sols);
    PRINT_SOL("result 2.1: (Mx) using mkl_dcscmv (expected: [2 2 0 1 1]^T)");

    INIT_SOL;
    /* test 2.2: using mkl_dcsrmv */
    mkl_dcsrmv(&notran, &num_rows, &num_cols, &one, matdescra,
        MT_val, MT_idx, MT_bgn, MT_bgn + 1, x, &zero, sols);
    PRINT_SOL("result 2.2: (Mx) using mkl_dcsrmv (expected: [2 2 0 1 1]^T)");

    /* test 3: compute sols := M x + 1 * sols
     *                       = [2 2 0 1 1]^T + 1 * [1 1 1 1 1]^T
     *                       = [3 3 1 2 2]^T.
     */
    printf("\n=== Test 3: sols := M x + 1 * sols ===\n");
    INIT_SOL;
    /* test 3.1: using mkl_dcscmv */
    mkl_dcscmv(&notran, &num_rows, &num_cols, &one, matdescra,
        M_val, M_idx, M_bgn, M_bgn + 1, x, &one, sols);
    PRINT_SOL("result 3.1: (Mx + sol) using mkl_dcscmv (expected: [3 3 1 2 2]^T)");

    INIT_SOL;
    /* test 3.2: using mkl_dcsrmv */
    mkl_dcsrmv(&notran, &num_rows, &num_cols, &one, matdescra,
        MT_val, MT_idx, MT_bgn, MT_bgn + 1, x, &one, sols);
    PRINT_SOL("result: 3.2: (Mx + sol) using mkl_dcsrmv (expected: [3 3 1 2 2]^T)");

    return 0;
}

//===================

How to tell MKL the underlying microarchitecture


Hi:

    I read the gcc documentation as well as the Intel compiler documentation; they both say that by default the underlying microarchitecture is not detected on Linux x86-64 (the default is -xsse2). As a result, I need to put -march=nehalem in CXXFLAGS for g++ and -xsse4.2 in CXXFLAGS for icpc. However, when linking with MKL, there is no flag to tell MKL that my microarchitecture is Nehalem:

g++:

g++ -std=c++11 -O2 -march=nehalem -c main.cpp

g++ main.o -lmkl_rt

icpc:

icpc -std=c++11 -xsse4.2 -c main.cpp

icpc -mkl main.o

So, how do I guarantee that MKL takes full advantage of Nehalem and gets the highest performance?

Thank you very much

Chaowen GUO
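
A hedged note rather than a definitive answer: MKL selects CPU-specific kernels at run time through its own dispatcher, independent of the -march/-x flags used for your own translation units. A minimal sketch that prints which code path MKL detected, via the Processor field of mkl_get_version():

#include <mkl.h>
#include <cstdio>

int main()
{
    MKLVersion ver;
    mkl_get_version(&ver);       // fills in version info and the detected processor description
    std::printf("MKL %d.%d update %d, processor: %s\n",
                ver.MajorVersion, ver.MinorVersion, ver.UpdateVersion, ver.Processor);
    return 0;
}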
