Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Random Number Generator


Hello,

 

I am trying to generate a vector of random numbers using the following code. However, every time I execute the code, the output vector is the same as in the previous run! Can you please help? Thank you in advance. - Afshin

 

    const int N = 30;
    double r[N];
    VSLStreamStatePtr stream;
    int i, errcode;
    double a = 0, sigma = 0.5;

    /***** Initialize *****/
    errcode = vslNewStream( &stream, VSL_BRNG_MT2203, 10000 );
    printf("err = %i\n", errcode);
    /***** Call RNG *****/
    errcode = vdRngGaussian( VSL_RNG_METHOD_GAUSSIAN_BOXMULLER, stream, N, r, a, sigma );
    printf("err = %i\n", errcode);
    vslDeleteStream(&stream);

    for (i = 0; i < N; i++) {
        printf("r = %f\n", r[i]);
    }

 


spffrt2 issue


I am trying to use mkl_dspffrt2, but it does not give correct results. I used packed storage as recommended. The routine does not work for n = 1. Let's say the matrix is A = [10]. Has anyone used this routine?

Thanks,

How to configure libs when trying to use Inspector-executor Sparse BLAS Routines and Pardiso


Here is my environment: Visual Studio Community 2017 + Parallel Studio XE 2019 Update 1

I've tried the Link Line Advisor, but it failed with 'No symbolic file loaded for mkl_avx2.dll'

And the current libs linked in are:

mkl_core.lib

mkl_intel_thread.lib

mkl_intel_ilp64.lib

mkl_blas95_ilp64.lib

impi.lib

Besides, I've tried adding libiomp5md.lib, but it didn't solve the problem.

Finding inverse of a binary matrix by using LAPACKE_dgetrf and LAPACKE_dgetri


I'm trying to find the inverse of the following binary matrix using the following functions:

lapack_int LAPACKE_dgetrf (int matrix_layout , lapack_int m , lapack_int n , double * a , lapack_int lda , lapack_int * ipiv );

lapack_int LAPACKE_dgetri (int matrix_layout , lapack_int n , double * a , lapack_int lda , const lapack_int * ipiv );

But the expected output is not achieved (the expected output is given below; it was found with MATLAB's inv(D)).

My_code.c

#define N 84

int main()
{

int i,j,m=N,n=N,lda=84,ipiv[N];

double D[N*N]={            //84 x 84 (Input matrix)
                    0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,
                    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                    1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,
                    0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,
                    0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,
                    0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,
                    0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,
                    0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,
                    0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,
                    1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,
                    0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,
                    0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1};

printf("LU info:%d\n",LAPACKE_dgetrf(LAPACK_ROW_MAJOR,m,n,D,lda,ipiv));

printf("Inverse Info:%d\n",LAPACKE_dgetri (LAPACK_ROW_MAJOR,N,D,lda,ipiv));

return 0;
}

 

Output: I'm getting the expected inverse of matrix D, except for column 16 (the entire column is 0).

I've verified the result of LAPACKE_dgetrf; up to that point everything is correct.

Could anyone explain what is happening here and how to solve this problem?

The expected 16th column is:

0
1.85037170770859e-17
-1.85037170770859e-17
0
1.85037170770859e-17
0
0
0
0
1.85037170770859e-17
0
1.85037170770859e-17
0
1.85037170770859e-17
7.40148683083438e-17
1
-1.85037170770859e-17
0
0
0
0
0
0
0
0
1.85037170770859e-17
0
0
3.70074341541719e-17
0
1.85037170770859e-17
-1.85037170770859e-17
1.85037170770859e-17
-1.85037170770859e-17
0
0
0
0
0
-1.85037170770859e-17
0
1.85037170770859e-17
0
0
0
0
0
0
0
0
0
0
1.85037170770859e-17
-7.40148683083438e-17
0
0
0
0
0
0
0
0
0
0
3.70074341541719e-17
0
3.70074341541719e-17
0
0
0
0
-3.70074341541719e-17
0
0
-3.70074341541719e-17
-7.40148683083438e-17
1.85037170770859e-17
0
-1.85037170770859e-17
0
0
0
0
1.85037170770859e-17

Different computation results on different processors using mkl_cbwr_set(int)

  1. There are two desktop PCs with the following processors: Intel Core i5 4570 (Haswell) and Intel Core i5 3330 (Ivy Bridge).
  2. MKL 2019.0.1, build 20180928, is used.
  3. The app uses PARDISO with the following settings: MKL_INT _mtype = 11; MKL_INT _nrhs = 1; MKL_INT _iparm1 = 2; MKL_INT _iparm3 = 0; MKL_INT _iparm4 = 2; MKL_INT _iparm7 = -1; MKL_INT _iparm9 = 13; MKL_INT _iparm10 = 1; MKL_INT _iparm12 = 1; MKL_INT _iparm20 = 2; MKL_INT _iparm23 = 0; MKL_INT _iparm33 = 1; MKL_INT _iparm34 = 1; mkl_domain_set_num_threads(1, MKL_DOMAIN_PARDISO);
  4. mkl_cbwr_set(MKL_CBWR_SSE2) is called, followed by PARDISO.
  5. The native library was built using Microsoft Visual Studio 2017 (v141).
  6. mkl_intel_lp64.lib, mkl_intel_thread.lib, and mkl_core.lib are used as additional dependencies.
  7. The app runs on both PCs.
  8. The calculation results are different. See attached files.

Please suggest a solution to my problem.

 

Attachment: SSE2_Results.zip (application/zip, 454.63 KB)

Pardiso Optimization


Hello,

I am currently solving a relatively small system of equations (3,000-10,000 equations), but since I have to solve it millions of times, each time with slightly different system matrices and right-hand sides, it still takes a lot of time overall. So now I am trying to shave a few more ms off. My system matrix is real, non-symmetric, and contains about five nonzero entries per row. So now I am wondering how to optimize the problem further. I already did the following: call phase=13 only for the first solve, then switch to phase=23 and reuse pt. This saved about 30% of the calculation time.

Iparm(24)=10 also saved some time, but unfortunately only in Debug mode.
Iparm(2)=0 shaved a few ms off as well.

I also have a few questions: is structurally symmetric the same as symmetric? I thought structurally symmetric meant that the pattern of nonzero entries is symmetric, which should be the case for my matrix, but using mtype=1 resulted in an access violation in PARDISO.

My system matrix does not change very much from iteration to iteration. About half of the equations stay the same, and I can predict quite well which ones. Is there any way to exploit that to speed up the solution?

Also, are there other solvers that might be better suited to my problem?

The function is below; an example matrix in CSR format can be found here: http://s000.tinyupload.com/?file_id=00837828014833851584
Any help is much appreciated! Even the smallest gain in execution speed will help me a great deal.

 

  

 SUBROUTINE QOED_SOLVE_PARDISO_S(A,ia,ja,pt,schleife, ZUST, GIT, SYS)
    USE MODULE_QOED_SUB
    
    IMPLICIT NONE
    TYPE(GITTER(GIT%NX,GIT%NY))                                     ,INTENT(IN)             :: GIT
    INTEGER             , DIMENSION(GIT%NX*GIT%NY+1)                ,INTENT(IN)             :: ia
    INTEGER             , DIMENSION(GIT%NX*(GIT%NY-2)*5+2*GIT%NX)   ,INTENT(IN)             :: ja
    REAL(KIND=8)        , DIMENSION(GIT%NX*(GIT%NY-2)*5+2*GIT%NX)   ,INTENT(IN)             :: A    
    TYPE(ZUSTAND(GIT%NX,GIT%NY))                                    ,INTENT(INOUT)          :: ZUST
    TYPE(SYSTEMMATRIX(GIT%NX,GIT%NY))                               ,INTENT(IN)             :: SYS
    INTEGER                                                         ,INTENT(IN)             :: schleife
    INTEGER                                                                                 :: maxfct,mnum
    INTEGER                                                                                 :: mtype
    INTEGER                                                                                 :: phase
    INTEGER                                                                                 :: n
    INTEGER                                                                                 :: nrhs
    INTEGER                                                                                 :: error
    INTEGER(KIND=8)  , DIMENSION(64)                            , INTENT(INOUT)             :: pt  
    INTEGER(KIND=8)  , DIMENSION(64)                                                        :: pt_temp!Internal pardiso data storage, initialized to zero, needs to be saved for following pardiso calls or memory leaks can occur
    INTEGER                                                                                 :: msglvl !PARDISO output, 1 -> generate statistical output
    INTEGER          , DIMENSION(64)                                                        :: IPARM
    INTEGER          , DIMENSION(GIT%NX*GIT%NY)                                             :: PERM
 

    MSGLVL=2
    PERM=0
    phase=23
    mtype=11
    error=0
    nrhs=1
    maxfct=1
    mnum=1

    n=GIT%NX*GIT%NY
    iparm=0_8
    
    CALL PARDISOINIT(pt_temp,mtype,iparm)
    if (schleife == 1) then
        pt = pt_temp
        phase = 13
    endif
    
    CALL PARDISO(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, SYS%RHS, ZUST%P0, error)

    END SUBROUTINE

The picture shows the first call; subsequent calls need significantly less time for malloc.


LAPACKE_dsyevr doesn't finish but LAPACKE_dsyev works well


I have a symmetric matrix with some parameters that change. For some parameter values, LAPACKE_dsyevr does not return for a very long time, so I have to kill the program.

If I switch to LAPACKE_dsyev, the speed is slower than dsyevr in most cases, but dsyev returns correctly.

What is the reason?

Thanks for your help. 

Signature of v?Fdim, v?Fmax, ...


Hello,

why is the second input array of the functions v?Fdim, v?Fmax, v?Fmin, etc. not const?

The signature in the file mkl_vml_functions.h is:

_Mkl_Api(void, vdFdim, (const MKL_INT n, const double a[], double r1[], double r2[]))
_Mkl_Api(void, vdFmax, (const MKL_INT n, const double a[], double r1[], double r2[]))
_Mkl_Api(void, vdFmin, (const MKL_INT n, const double a[], double r1[], double r2[]))

I think it would be correct as:

_Mkl_Api(void, vdFdim, (const MKL_INT n, const double a[], const double b[], double r[]))
_Mkl_Api(void, vdFmax, (const MKL_INT n, const double a[], const double b[], double r[]))
_Mkl_Api(void, vdFmin, (const MKL_INT n, const double a[], const double b[], double r[]))

Kind regards


Question about the allocate procedure


Hi!

I'm wondering how it can be that an allocate statement like

if(.not. allocated(a)) then

allocate(a(size),stat=stat,errmsg=errmsg)

else if (size(a) /= size) then

deallocate(a,stat=stat,errmsg=errmsg)

allocate(a(size),stat=stat,errmsg=errmsg)

end if

gives a stat of 0 while errmsg is 'Allocatable array or pointer is not allocated'?

CPardiso phase 33 scaling


Hi,

We want to use Cluster Pardiso for our finite element application. To get an estimate of performance, we used a simple code (attached) to read a matrix from the SuiteSparse Matrix Collection (Matrix Market format) and then measured execution time for each phase (11, 22 and 33).

The factorisation phase (22) shows good scale-up with MPI and OpenMP parallelization, but the solve phase (33) performance is not nearly as good.

For example, the table below shows running times (in seconds) for different combinations of MPI processes and OMP threads (per process).

Serena.mtx:

MPI / OMP    Phase=22    Phase=33

2 / 2        408.70      2.4668
2 / 4        249.52      1.3382
2 / 8        234.87      3.7524
2 / 16       93.879      1.3181
4 / 2        327.69      1.8661
4 / 4        162.16      1.9664
4 / 8        96.526      4.4899
4 / 16       58.619      1.3763
8 / 2        175.61      1.1638
8 / 4        90.975      1.1006
8 / 8        67.704      2.4264
8 / 16       39.654      0.9049
16 / 2       127.61      1.4321
16 / 4       62.155      0.9136
16 / 8       53.761      2.0407
16 / 16      26.957      0.7122
32 / 8       36.447      2.1856
32 / 16      24.977      0.3729

 

We can observe that the solve time does not always decrease with more MPI processes or OpenMP threads. We tested other matrices (RM07R), but the same behaviour was observed. Is this normal, or is it an issue? Is there a way to get better scaling?

 

Thanks a lot for any advice

Guillaume

Attachment: CPMM.tar.gz (application/x-gzip, 7.73 KB)

Block sparse matrix with sparse matrix


I have a sparse matrix M and a sparse matrix K; both have the same non-zero locations. How do I form a sparse matrix with the following format:

A = [ M,  0.2K;  0.1M, 0.1K ]

This is used to solve the linear system Ax = b.

Thanks a lot.

 

Threading Problem in Pardiso


Hello,

We have run into a problem using PARDISO with multiple threads. Our goal is to solve AX = B, where A is a non-symmetric sparse matrix and X & B are vectors. What we do is initialize PARDISO, do the LU decomposition with phase 22, then call "pardiso()" with phase 33 to solve.

The phenomenon is: if we set up MKL with "mkl_set_dynamic( true )" and do nothing else, the solver always uses 1 thread only, no matter how big the matrix is, from 1K unknowns to 1M unknowns. If we set "mkl_set_dynamic( true )" and "mkl_set_num_threads(4)", we can see 4 threads being used (CPU: i7-3770, i7-5820K, and dual Xeon X5460), but the solve slows down, taking about twice as long.

We also use MKL BLAS functions in other parts of our code; there "mkl_set_dynamic( true )" works well, and we can observe multiple threads in use and a linear speedup.

So, our question is: how do we enable PARDISO with multiple threads and get a reasonable speedup?

I'd appreciate any input and ideas,

mq

 

 

 

undefined symbol: mkl_serv_check_ptr_and_warn when calling function from shared library using Python


I'm calling a C function from within my Python code using the ctypes Python library. The C function calls an MKL function. However, this gives me an error:

symbol lookup error: /cm/shared/apps/intel/composer_xe/2015.5.223/mkl/lib/intel64/libmkl_avx2.so: undefined symbol: mkl_serv_check_ptr_and_warn

Below is my setup, the code and the compiler command and flags used.

My setup:
intel/compiler/64/15.0/2015.5.223
intel/mkl/64/11.2/2015.5.223

LD_PRELOAD=/cm/shared/apps/intel/composer_xe/2015.5.223/mkl/lib/intel64/libmkl_core.so

The LD_LIBRARY_PATH contains, among other things, the following: LD_LIBRARY_PATH=/cm/shared/apps/intel/composer_xe/2015.5.223/mkl/lib/intel64:/cm/shared/apps/intel/composer_xe/2015.5.223/mpirt/lib/intel64:/cm/shared/apps/intel/composer_xe/2015.5.223/compiler/lib/intel64:

My MM.c file:

#include "mkl.h"
void matrix_mult(double *A, double *B, double *C, int N, int M, int P) {
   mkl_set_num_threads(64);
   cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                   N, P, M, 1.0, A, M, B, P, 0.0, C, P);
}

Then I compile this file to a shared library using the following command:

icc -shared -fPIC -mkl MM.c -o MM.so

I had to change the LD_PRELOAD env variable so it contains the following: LD_PRELOAD=/cm/shared/apps/intel/composer_xe/2015.5.223/mkl/lib/intel64/libmkl_core.so

My Python file, test.py:
from ctypes import *
mkl = cdll.LoadLibrary("./MM.so")
dgemm = mkl.matrix_mult
N = 22

double_array = c_double*(N*N)
A = double_array(*[1]*N*N)
B = double_array(*[1]*N*N)
C = double_array(*[0]*N*N)
dgemm(byref(A), byref(B), byref(C), c_int(N), c_int(N), c_int(N))
 

Up to and including N=21 this works fine. However, when N is 22 or bigger I get the following error:

I run my code: python test.py

python: symbol lookup error: /cm/shared/apps/intel/composer_xe/2015.5.223/mkl/lib/intel64/libmkl_avx2.so: undefined symbol: mkl_serv_check_ptr_and_warn

The following thread [0] suggests that I might have multiple MKL versions on my machine. AFAIK this is not the case.

Has any of you had this problem too? Or does somebody know how to fix it?

[0] https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/...

Thanks a lot!


Help on running GMRES with MKL in C++


Hi,

I am new to the MKL library and tried to use it to solve a sparse linear system with its GMRES functions.

I tried the steps given in the examples that come with the MKL library (“fgmres_st_criterion_c.c”). I was able to compile and run the example with ICC v19.0 (I am using Windows 10 64-bit with Intel Parallel Studio 2019; I have a student license). The problem is that when I ported the code into my C++ program, nothing seems to work.

 

My code is written in C++ and I am using template classes; I am not sure whether this is possible or not. I have included my code up to the point where the GMRES solver is initialized. The initialization fails, and the error code is not listed in the documentation (RCI_request = -3689348814741910324). The “ipar” variable has wrong values.

 

Additionally, I notice something strange: if I set RCI_request = 0 (which should not be needed) before calling dfgmres_init, then dfgmres_init simply returns RCI_request = 0. But the ipar values are still wrong. I notice the same behavior with the Conjugate Gradient method.

I have 16k elements in my problem (n = 16000), but it will be much larger when the mesh is refined.

 

Thank you all in advance

Dilshan

 

template<class T>
void SPMV<T>::call_gmres_solver_mkl(const vector<T>& coefNeigbor, const vector<T>& RHS,
        vector<T>& phi, vector<T>& phiOld) {
    if (sizeof(T) == sizeof(float)) {
        cout << "currently this program is written for double floating point numbers only.\nplease update the source code and recompile" << endl;
        cout << "Program will terminate now" << endl;
        cin.get();
        exit(0);
    }

    MKL_INT n = nFl; // number of unknowns
    MKL_INT nEle = (nFl + (nFl*nface) - (nTtl - nFl));
    if (nEle <= 0) {
        cout << "The number of interfaces must be greater than zero. current value = " << n << endl;
        cout << "Program will exit now" << endl;
        cin.get();
        exit(0);
    }

    MKL_INT RCI_request, itercount, expected_itercount = 8;
    double *A = (double*)malloc(nEle * sizeof(double));
    double *residual = (double*)malloc(n * sizeof(double));

    int id = 0;
    ia[0] = 0;

    int n0 = n * (2 * n + 1) + (n * (n + 9)) / 2 + 1;
    double *tmp = (double *)malloc(n0 * sizeof(double));
    double *rhs = (double *)malloc(n * sizeof(double));
    double *solution = (double *)malloc(n * sizeof(double));
    MKL_INT ipar[128];
    double dpar[128];

    // At first run phiOld = 0
    for (int i = 0; i < n; i++) {
        solution[i] = phiOld[i];
        rhs[i] = RHS[i];
    }

    dfgmres_init(&n, solution, rhs, &RCI_request, ipar, dpar, tmp);
    if (RCI_request != 0) {
        cout << "Some error occurred during the gmres initialisation phase. Error code = " << RCI_request << endl;
        goto FAILED;
    }

    // rest of the code
}

Batch normalization example


Hello,

This is my first program using the MKL library, and I want to include a simple batch normalization call after convolution. This is my code for BN.
Does anyone have an idea what I did wrong and why the execution of my code is failing?

 

/*** BN1 section ***/

CHECK_ERR(dnnBatchNormalizationCreateForward_F64(&bn1, attributes, lt_conv1_output, 0), err);

resBn1[dnnResourceSrc] = resConv1[dnnResourceDst];

CHECK_ERR(dnnLayoutCreateFromPrimitive_F64(&lt_bn1_output, bn1, dnnResourceDst), err);

CHECK_ERR(dnnAllocateBuffer_F64((void **)&resBn1[dnnResourceDst], lt_bn1_output), err);

CHECK_ERR(init_conversion(&cv_relu1_to_user_output, &user_o, lt_user_output, lt_bn1_output, resBn1[dnnResourceDst]), err);

...............

CHECK_ERR(dnnExecute_F64(bn1, (void *)resBn1), err);

 


Problem with random number generator?


This isn't a big problem, but it might trip someone up if they are not aware of it.

When I call RANDOM_NUMBER(XX), where XX is real(8), the first value is always a very small number, typically under 1.D-4. So the first value obtained is not really a random number.

Of course one can use RANDOM_SEED to get around this, but I thought it was supposed to use the time and date by default.

At any rate, the first number should be as random as all the following ones, right?

Has this been addressed before?

MKL and CMAKE


Does a universal CMake script for finding and linking MKL exist?

For example, I want to create a solution which uses CMake and MKL:

On the first machine I want to compile my application with Intel C++ Compiler 19.0 on Windows; on the second machine I want to compile it with gcc and link MKL manually.

I found some examples of a FindMKL.cmake file on the internet, but they didn't work.

Please, help!

Thank you
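Until a project-specific FindMKL.cmake is available, one portable option (a sketch, not a complete solution) is to let CMake's stock FindBLAS module locate MKL through the BLA_VENDOR hint, which works with both the Intel compiler and gcc; the exact vendor strings depend on the CMake version:

```cmake
cmake_minimum_required(VERSION 3.10)
project(mkl_demo C)

# Hint FindBLAS at MKL's LP64, threaded layer; Intel10_64lp_seq selects
# the sequential layer instead.  (Vendor names vary by CMake version.)
set(BLA_VENDOR Intel10_64lp)
find_package(BLAS REQUIRED)

add_executable(demo main.c)
target_link_libraries(demo ${BLAS_LIBRARIES})
```

With this approach the same CMakeLists works on both machines, as long as MKL's environment scripts (or MKLROOT) are set so that CMake can find the libraries.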

Why is the VM library so slow on new models of CPUs?


Hi, I have run into a strange problem: a calculation using a plain for loop is more efficient than using the VM library of MKL. I tested several examples, and they all show the same kind of result.

For example, working with an Intel(R) Xeon(R) Gold 6148 CPU and Intel Parallel Studio 2018u5, running 'test' gives:

./test
Time for normal distribution
    serial:    5s
    vector (HA):    12s
    vector (LA):    11s
    vector (EP):    12s

The compile and link flags are:

-O3  -xHost -DMKL_ILP64 -I${MKLROOT}/include

And for link:

-Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl

I also used VTune to check the computation. It shows that the time consumed by vdExp is almost equal to that of the serial for loop. The source code and Makefile are in the attached tar package. The OS is CentOS 7.3.

I don't understand why the VM library runs so slowly. Is there anything wrong with the flags or the code?

Attachment: forintelforum.tar (application/x-tar), 7.5 KB

How to implement numpy broadcast mechanism with mkl?


How to implement numpy broadcast mechanism with mkl?

I have been confused about how to use MKL to efficiently implement the broadcast mechanism in numpy (element-wise operators "+", "-", "*").
such as
2-D array sub 1-D array
[[1,2,3],
[4,5,6],
[7,8,9]]
-
[1, 2, 3]
=
[[0, 0, 0],
[3, 3, 3],
[6, 6, 6]]

And the second operation (which can be understood as a matrix multiplied by a diagonal matrix):
2-D array multiplied element-wise by a 1-D array
[[1,2,3],
[4,5,6],
[7,8,9]]
*
[1, 2, 3]
=
[[1, 4, 9],
[4, 10, 18],
[7, 16, 27]]

I tried to implement it with a for loop plus cblas_dscal/vdSub, but I think this is not efficient. I don't know if there is a better implementation.

Viewing all 3005 articles
Browse latest View live