FGMRES preconditioner applied to?

May 7, 2017, 11:59 pm

Latest and popular articles on Intel Technologies

I am using MKL's preconditioned FGMRES solver, and I am trying to understand exactly which vector FGMRES asks me to apply the preconditioner to. According to the reference for the solver, Saad's Iterative Methods for Sparse Linear Systems, the left-preconditioned GMRES iteration (I'm assuming that FGMRES does left preconditioning; please correct me if I'm wrong) computes M^-1 A v_j at each step. That is, FGMRES first asks me to compute the matrix-vector product A v_j, and I would then expect it to ask me to apply the preconditioner to that result, i.e., to compute M^-1 A v_j. However, when I compute the squared 2-norms of the vectors involved, I find that the vector FGMRES asks me to precondition (which I assumed to be A v_j) always has unit norm, regardless of the norm of A v_j. What is this unit-norm vector that FGMRES asks the user to apply the preconditioner to?

Thread Topic:

Question

Poor scaling for real-to-real FFT with OpenMP

May 13, 2017, 5:08 am

In the attached file I use MKL to compute a real-to-real FFT using OpenMP for multithreading.

The code is compiled with

icpc -o bench-fft -Wall -O3 -g -march=native -fopenmp bench-fft.cxx -mkl

The machine has 4 cores.

It seems that the code does not scale well with the number of threads.

When run with

OMP_NUM_THREADS=1 ./bench-fft 4194304

the total time taken is 0.1640 user, 0.0440 sys while with

OMP_NUM_THREADS=2 ./bench-fft 4194304

the total time taken is 0.3000 user, 0.0560 sys. So there seems to be a large synchronization overhead since the total CPU time almost doubles.

Is this to be expected, or am I doing something wrong in my code?
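One thing worth checking is what is being measured: the `user` and `sys` figures reported by `time` add up CPU time over all threads, so with two busy threads the CPU total can roughly double even when the elapsed (wall-clock) time goes down. A minimal C11 sketch that reports both for one piece of work follows. `time_work` and `busy_work` are hypothetical names, `busy_work` merely stands in for the FFT call, and it assumes Linux/glibc, where `clock()` counts CPU time of the whole process (OpenMP's `omp_get_wtime()` would serve equally well for the wall-clock part).

```c
#include <time.h>

/* Hypothetical unit of work to be timed. */
typedef void (*work_fn)(void *arg);

/* Runs work(arg) once and reports wall-clock seconds (C11 timespec_get) and
   process CPU seconds (clock(), which on Linux/glibc sums over all threads). */
static void time_work(work_fn work, void *arg, double *wall, double *cpu) {
    struct timespec w0, w1;
    timespec_get(&w0, TIME_UTC);
    clock_t c0 = clock();
    work(arg);
    clock_t c1 = clock();
    timespec_get(&w1, TIME_UTC);
    *wall = (double)(w1.tv_sec - w0.tv_sec) + 1e-9 * (double)(w1.tv_nsec - w0.tv_nsec);
    *cpu = (double)(c1 - c0) / CLOCKS_PER_SEC;
}

/* Stand-in workload for the example; the FFT call being benchmarked would go here. */
static void busy_work(void *arg) {
    volatile double s = 0.0;
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) s += 1.0 / (double)(i + 1);
}
```

Comparing `wall` between OMP_NUM_THREADS=1 and OMP_NUM_THREADS=2 shows whether the transform itself got faster; a larger `cpu` total does not, by itself, indicate a slowdown.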

Attachment	Size
Download bench-fft.cxx	1.93 KB

Thread Topic:

Question

numroc returns incorrect value in scalapack

May 15, 2017, 4:31 pm

Thread Topic:

Help Me

Performing split step fourier using MKL dfti

May 15, 2017, 11:11 pm

Hi,

I am trying to implement the split-step Fourier method with the following code. There seems to be a problem with the Fourier part: when I compare my data with the values obtained from another program, they match perfectly up until the Fourier transform begins. Here is my code:

program ss

Use MKL_DFTI

implicit none

real*8 , parameter :: distance = 100.0d0
real*8 , parameter ::beta2 = 1.0d0
real*8, parameter :: N = 1.0d0
real*8, parameter :: nt = 1024.0d0
real*8, parameter :: Tmax = 32.0d0
real*8 :: step_num
real*8 :: deltaz
real*8 :: dtau
real*8, dimension (1024) :: tau, omega, uu
real*8, parameter :: pi = 3.141592653590d0
type(DFTI_DESCRIPTOR), POINTER :: handle
complex*16 :: hhz, C= (0,1.0d0)
complex*16, dimension (1024):: dispersion, xy,temp,temp1,temp2
integer :: i, j, status

step_num = 100000
deltaz = distance/step_num
dtau = (2.0d0*Tmax)/nt

do i = 1,1024
tau (i) = dble(i-513)*dtau

if (i<=512) then
omega (i) = dble(i-1)*pi/Tmax
else
omega (i) = dble(i-1025)*pi/Tmax
end if

end do

uu = 1/dcosh(tau)

dispersion = exp(0.50d0*C*beta2*(omega**2)*deltaz)

hhz = 1*(0,1.0d0)*N**2*deltaz

temp = uu*exp(abs(uu)**2*hhz/2)

status = DftiCreateDescriptor(handle, DFTI_DOUBLE,DFTI_COMPLEX,1,1024)
status = DftiSetValue(handle,DFTI_FORWARD_SCALE,1.0d0)
status = DftiSetValue(handle,DFTI_BACKWARD_SCALE,1/dble(1024))
status = DftiCommitDescriptor(handle)

do j=1,step_num

status = DftiComputeBackward(handle,temp)

do i= 1,1024

temp1(i) = temp(i)*dispersion(i)

end do

Status = DftiComputeForward(handle,temp1)

do i= 1,1024

temp2(i) = temp1(i)*exp(abs(temp1(i))**2*hhz)

end do

end do

status = DftiFreeDescriptor(handle)

xy = temp2*exp(-abs(temp1)**2*hhz/2)

open (10, file = 'spec.dat')

do i=1,1024

write (10,*) tau(i), abs(xy(i)**2)

end do

close (10)

end program ss

Thread Topic:

How-To

DFTI_REAL_REAL Speed?

May 16, 2017, 7:36 am

I have some code that makes heavy use of 1-D DFTs using MKL (real-to-complex and complex-to-complex). I just realized that the library supports the DFTI_REAL_REAL layout, where the real and imaginary parts of complex numbers are stored in separate arrays. I know that this can result in more efficient implementations of some algorithms due to a reduced need for SIMD shuffles. I thought that I would ask here before rearchitecting my application to use split complex layout: could I expect any speedup in the DFT implementation by using split complex versus my current interleaved layout? I run this software on AVX, AVX2, and AVX512 platforms currently.
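For reference, the two layouts differ only in how the same N complex values are stored: interleaved (re0, im0, re1, im1, ...) versus split into one array of real parts and one of imaginary parts, which is what the DFTI_REAL_REAL value of the DFTI_COMPLEX_STORAGE setting selects. A minimal sketch of converting at the boundary follows, which could be used to benchmark the split layout on a few transforms before rearchitecting; the function names are hypothetical and no particular speedup is implied.

```c
#include <stddef.h>

/* Interleaved {re0, im0, re1, im1, ...} -> split re[], im[]. */
static void interleaved_to_split(size_t n, const double *z, double *re, double *im) {
    for (size_t i = 0; i < n; i++) {
        re[i] = z[2 * i];
        im[i] = z[2 * i + 1];
    }
}

/* Split re[], im[] -> interleaved {re0, im0, re1, im1, ...}. */
static void split_to_interleaved(size_t n, const double *re, const double *im, double *z) {
    for (size_t i = 0; i < n; i++) {
        z[2 * i] = re[i];
        z[2 * i + 1] = im[i];
    }
}
```

The conversion itself costs a pass over the data, so a fair comparison would time the transform alone on data already stored in each layout.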

Trust Region Random Results

May 17, 2017, 3:45 am

We implemented the MKL trust-region (TR) method for our purposes.

It doesn't converge every time, and it also produces different (seemingly random) solutions from the same initial conditions and constraints.

For the trust-region size parameter we use a four-stage approach:
the first run uses a value of 100, the second 10, the third 1, and the last 0.1.
Each time we run the solver, we use the previous results as input, if they improved.
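The radius schedule described above can be written as a small driver loop. This is a sketch only: `solve_fn` is a hypothetical stand-in for one complete run of the MKL trust-region solver (create, iterate via RCI, extract the solution), `objective_fn` for the user's cost function, and `toy_solve`/`toy_objective` are illustrative 1-D stand-ins; none of them is an MKL API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: one full solver run from x0 with initial trust-region size rs,
   writing its result to x_out. Returns 0 on success. */
typedef int (*solve_fn)(size_t n, const double *x0, double rs, double *x_out, void *ctx);
typedef double (*objective_fn)(size_t n, const double *x, void *ctx);

/* Runs the solver once per radius, keeping a new result only if it improves the
   objective. x holds the starting point on entry and the best point on return. */
static double radius_schedule(size_t n, double *x, double *scratch,
                              const double *radii, size_t nradii,
                              solve_fn solve, objective_fn f, void *ctx) {
    double best = f(n, x, ctx);
    for (size_t k = 0; k < nradii; k++) {
        if (solve(n, x, radii[k], scratch, ctx) != 0) continue;  /* skip failed runs */
        double val = f(n, scratch, ctx);
        if (val < best) {                                        /* accept only improvements */
            best = val;
            memcpy(x, scratch, n * sizeof *x);
        }
    }
    return best;
}

/* Toy stand-ins for illustration: minimise (x-3)^2 in 1-D with a "solver" that
   takes a single step of length at most rs toward the minimum. */
static double toy_objective(size_t n, const double *x, void *ctx) {
    (void)n; (void)ctx;
    return (x[0] - 3.0) * (x[0] - 3.0);
}

static int toy_solve(size_t n, const double *x0, double rs, double *x_out, void *ctx) {
    (void)n; (void)ctx;
    double step = 3.0 - x0[0];
    if (step > rs) step = rs;
    if (step < -rs) step = -rs;
    x_out[0] = x0[0] + step;
    return 0;
}
```

Because only improving results are kept, the returned objective never increases from one radius to the next.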

Does anyone know what the problem is?
Are there any suggestions on using the trust-region size parameter, i.e., the best values to use and when?

Thank you very much

Gianluca

Eigenvalues not in ascending order

May 17, 2017, 6:06 am

Hey there,

is there a diagonalisation routine that does not sort the eigenvalues in ascending order?
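If what is needed is simply a different order (for example descending), one option is to permute the eigenpairs after the call. A minimal sketch, assuming the eigenvalues come back in ascending order in w[0..n-1] and the eigenvectors are stored column-major in z with leading dimension ldz, as with the column-major LAPACK interface; `reverse_eigenpairs` is a hypothetical name.

```c
#include <stddef.h>

/* Reverses the eigenvalue order in place and swaps the matching eigenvector
   columns, turning ascending output into descending order.
   z is column-major, n x n, with leading dimension ldz >= n. */
static void reverse_eigenpairs(int n, double *w, double *z, int ldz) {
    for (int i = 0, j = n - 1; i < j; i++, j--) {
        double t = w[i];
        w[i] = w[j];
        w[j] = t;
        for (int r = 0; r < n; r++) {   /* swap columns i and j */
            double u = z[(size_t)i * ldz + r];
            z[(size_t)i * ldz + r] = z[(size_t)j * ldz + r];
            z[(size_t)j * ldz + r] = u;
        }
    }
}
```

For an arbitrary order, the same idea applies: sort an index array by the desired criterion, then apply that permutation to both w and the columns of z.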

Thanks in advance,

sommerfeld

MKL PARDISO: floating-point error with zero pivots ...

May 18, 2017, 6:39 am

Hi,

we have detected problems with the current version of the PARDISO solver.

(MKL vers. 2017.4.210, Windows x64 architecture, static linking, Microsoft VS 2015).

PARDISO produces a floating-point error in phase 22, if the coefficient matrix of a FEM analysis has zero pivots.

Another problem is that PARDISO doesn't return the correct row number of the first zero pivot element (iparm[29]).

In former versions PARDISO worked well in this regard.

I have attached some screen shots and a code sample to demonstrate the above problems.

Thank you for your answer.

Regards

Dr. Guenter Kaufels, InfoGraph GmbH, Aachen, Germany

Attachment	Size
Download Attachment.pdf	2.17 MB

Strange bad memory of MKL spline function

May 18, 2017, 6:26 am

Dear all,

I am using MKL compilers_and_libraries_2017.4.210 (Update 3) and following the developer guide to build a cubic spline interpolation workflow. However, no matter how I configure my inputs, the scoeff values after dfdEditPPSpline1D() look like bad memory. The function returns a good status, but the program then crashes in the next function, dfdConstruct1D(). I wonder whether I am missing something in my configuration or in the included files, or whether there is a bug. My source code is copied below with dummy input data.

Regards,

Tianhua

--------------------------------------------

For compilation, I included 34 header files, which are enough for my project.

For linking, I included mkl_core.lib, mkl_core_dll.lib, mkl_intel_ilp64.lib, mkl_intel_ilp64_dll.lib, mkl_sequential.lib and mkl_sequential_dll.lib.

For execution, I provided mkl_core.dll, mkl_sequential.dll and mkl_vml_def.dll.

These appear to be sufficient.

Dummy source code, in which scoeff looks like bad memory and the program crashes in the next function:

------{

// Assumes mkl.h (Data Fitting API) and <cmath> (pow) are included.
// Set up MKL data structures
#define SPLINE_ORDER DF_PP_CUBIC /* A cubic spline to construct */
int status; /* Status of a Data Fitting operation */
DFTaskPtr task; /* Data Fitting operations are task based */
/* Parameters describing the partition */
MKL_INT nx; /* The size of partition x */
MKL_INT xhint; /* Additional information about the structure of breakpoints */
/* Parameters describing the function */
MKL_INT ny; /* Function dimension */
MKL_INT yhint; /* Additional information about the function */
/* Parameters describing the spline */
MKL_INT s_order; /* Spline order */
MKL_INT s_type; /* Spline type */
MKL_INT ic_type; /* Type of internal conditions */
double* ic; /* Array of internal conditions */
MKL_INT bc_type; /* Type of boundary conditions */
double* bc; /* Array of boundary conditions */
double scoeff[(20 - 1) * SPLINE_ORDER]; /* Array of spline coefficients */
MKL_INT scoeffhint; /* Additional information about the coefficients */
/* Parameters describing the interpolation sites (declarations were missing) */
MKL_INT nsite; /* Number of interpolation sites */
double site[20]; /* Interpolation sites */
MKL_INT sitehint; /* Additional information about the structure of interpolation sites */
MKL_INT ndorder, dorder; /* Parameters defining the type of interpolation */
double* datahint; /* Additional information on partition and interpolation sites */
double* r = new double[20]; /* Array of interpolation results */
MKL_INT rhint; /* Additional information on the structure of the results */
MKL_INT* cell; /* Array of cell indices */

/* Initialize the partition */
nx = 10;
nsite = 20;
double x[10], y[10];
for (int i = 0; i < 10; i++)
{
    x[i] = i;
    y[i] = pow(i, 3);
}
for (int i = 0; i < 20; i++)
{
    site[i] = i;
}
for (int i = 0; i < 19 * SPLINE_ORDER; i++)
    scoeff[i] = -9999.0;
xhint = DF_NON_UNIFORM_PARTITION; /* The partition is non-uniform. */

/* Initialize the function */
ny = 1; /* The function is scalar. */
yhint = DF_NO_HINT; /* No additional information about the function is provided. */
status = dfdNewTask1D(&task, nx, x, xhint, ny, y, yhint);
if (status == 0)
{
    /* Initialize spline parameters */
    s_order = DF_PP_CUBIC; /* Spline is of the fourth order (cubic spline). */
    s_type = DF_PP_NATURAL; /* Spline is of the natural cubic spline type. */
    /* Define internal conditions for cubic spline construction (none in this example) */
    ic_type = DF_NO_IC;
    ic = NULL;
    /* Use free-end boundary conditions (zero second derivative at both ends, i.e. a
       natural cubic spline); no additional values are provided. */
    bc_type = DF_BC_FREE_END;
    bc = NULL;
    scoeffhint = DF_NO_HINT; /* No additional information about the spline. */
    status = dfdEditPPSpline1D(task, s_order, s_type, bc_type, bc, ic_type,
                               ic, &scoeff[0], scoeffhint);
    // Continue only if the spline parameters were set successfully
    if (status == 0)
    {
        /* Use the standard method to construct the natural cubic spline: */
        /* P_i(x) = c_i,0 + c_i,1(x - x_i) + c_i,2(x - x_i)^2 + c_i,3(x - x_i)^3, */
        /* The library packs spline coefficients into array scoeff: */
        /* scoeff[4*i+0] = c_i,0, scoeff[4*i+1] = c_i,1, */
        /* scoeff[4*i+2] = c_i,2, scoeff[4*i+3] = c_i,3, */
        /* i = 0,...,nx-2 */
        status = dfdConstruct1D(task, DF_PP_SPLINE, DF_METHOD_STD);
        // skip the data construction check, TZ May 2017
        sitehint = DF_NON_UNIFORM_PARTITION; /* Partition of sites is non-uniform */
        ndorder = 1;
        dorder = 1;
        datahint = DF_NO_APRIORI_INFO; /* No additional information about breakpoints or
                                          sites is provided. */
        rhint = DF_MATRIX_STORAGE_ROWS; /* The library packs interpolation results
                                           in row-major format. */
        cell = NULL; /* Cell indices are not required. */
        /* Solve the interpolation problem using the default method: compute the spline
           values at the points site[i], i = 0,..., nsite-1, and place the results in array r */
        status = dfdInterpolate1D(task, DF_INTERP, DF_METHOD_PP, nsite, site,
                                  sitehint, ndorder, &dorder, datahint, r, rhint, cell);
    }
    /* De-allocate Data Fitting task resources */
    status = dfDeleteTask(&task);
}
delete[] r;

--------------------
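To sanity-check whatever ends up in scoeff after dfdConstruct1D, an independent natural cubic spline can be computed with the classic tridiagonal (Thomas algorithm) solve and packed in the same per-cell layout the code comments describe (scoeff[4*i+k] = c_i,k, i = 0..nx-2). This is a comparison sketch only, not MKL's algorithm; `natural_cubic_spline` is a hypothetical name, and it assumes strictly increasing x and zero second derivatives at both ends.

```c
#include <stdlib.h>

/* Natural cubic spline through (x[i], y[i]), i = 0..n-1, x strictly increasing.
   On cell i: P_i(t) = c[4i] + c[4i+1](t-x_i) + c[4i+2](t-x_i)^2 + c[4i+3](t-x_i)^3.
   coeff must hold 4*(n-1) doubles. Returns 0 on success, -1 on bad input or no memory. */
static int natural_cubic_spline(int n, const double *x, const double *y, double *coeff) {
    if (n < 2) return -1;
    double *M = calloc((size_t)n, sizeof *M);   /* second derivatives; M[0] = M[n-1] = 0 */
    double *cp = calloc((size_t)n, sizeof *cp); /* Thomas algorithm: modified super-diagonal */
    double *dp = calloc((size_t)n, sizeof *dp); /* Thomas algorithm: modified right-hand side */
    if (!M || !cp || !dp) { free(M); free(cp); free(dp); return -1; }
    for (int i = 1; i < n - 1; i++) {           /* forward sweep over interior knots */
        double h0 = x[i] - x[i - 1], h1 = x[i + 1] - x[i];
        double rhs = 6.0 * ((y[i + 1] - y[i]) / h1 - (y[i] - y[i - 1]) / h0);
        double denom = 2.0 * (h0 + h1) - h0 * cp[i - 1];
        cp[i] = h1 / denom;
        dp[i] = (rhs - h0 * dp[i - 1]) / denom;
    }
    for (int i = n - 2; i >= 1; i--)            /* back substitution */
        M[i] = dp[i] - cp[i] * M[i + 1];
    for (int i = 0; i < n - 1; i++) {           /* pack per-cell coefficients */
        double h = x[i + 1] - x[i];
        coeff[4 * i + 0] = y[i];
        coeff[4 * i + 1] = (y[i + 1] - y[i]) / h - h * (2.0 * M[i] + M[i + 1]) / 6.0;
        coeff[4 * i + 2] = M[i] / 2.0;
        coeff[4 * i + 3] = (M[i + 1] - M[i]) / (6.0 * h);
    }
    free(M); free(cp); free(dp);
    return 0;
}
```

With the dummy data above (x = 0..9, y = x^3), each cell's polynomial should reproduce y at both of its endpoints, which gives a quick check to run against the MKL coefficients.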

Thread Topic:

Question

libmkl_blacs_openmpi compatibility

May 18, 2017, 12:49 pm

Hello everybody,

I have a question regarding the libmkl_blacs_openmpi* libraries. Which Open MPI version are these libraries supposed to be compatible with?

I could not find this information in the usual MKL or compiler release notes. By testing, I determined that the libmkl_blacs_openmpi_lp64.so from the MKL bundled with Intel 2016 Update 4 is compatible with Open MPI 2.0, i.e., programs using libmkl_scalapack_lp64.so work and apparently give correct results. However, using the libraries from the Intel 2017 Update 2 distribution together with Open MPI 2.0 and 2.1 produces programs that fail with a runtime error as soon as BLACS routines are called. I have not had time to test Intel 2017 Update 4 yet, but an authoritative answer on compatibility would be helpful even if Update 4 turns out to work.

Of course if this is documented in detail somewhere a pointer to the documentation is appreciated, too.

Best Regards

Christof

"Trust Region Algorithm" Questions

May 19, 2017, 1:41 am

Over this period, we have developed a calculation method that uses the MKL Trust Region API (with constraints).

We found many difficulties, but after a lot of effort we have obtained some quite good results.

Along the way, we have also found some strange behavior in your functions (e.g., dtrnlspbc_solve …).

Here are some questions whose answers could help us, and other users, understand how to use this algorithm better:

1) If we widen the constraints, the calculation seems more stable. Are the constraints also enforced during the optimization process? In that case, the search procedure might fail to find some results inside the constraint range. Can you confirm this?

2) It seems that the algorithm fails when it finds minima near the constraints. Is this problem related to the Jacobian?

3) With the same input values (initial conditions and constraints), it produces different results, sometimes very close to each other. Is a random number generator used inside the algorithm? Is this the reason?

4) Are there any suggested criteria for setting the trust region size parameter?

5) Are there any suggested criteria for setting the constraints?

Thank you very much

Gianluca

Are LAPACKE_cgesdd and LAPACKE_cgesvd SVD calculations reliable?

May 20, 2017, 11:12 pm

I'm using LAPACKE_cgesdd and LAPACKE_cgesvd to compute the singular values of a matrix. Both routines have an option to compute only the singular values. The problem is that, in the following four test cases:

Full SVD with LAPACKE_cgesdd;
Full SVD with LAPACKE_cgesvd;
Singular values only with LAPACKE_cgesdd;
Singular values only with LAPACKE_cgesvd,

I get different singular values. In particular:

Test matrix, 3 x 4:

    a[0].real(5.91); a[0].imag(-5.96);
    a[1].real(7.09); a[1].imag(2.72);
    a[2].real(7.78); a[2].imag(-4.06);
    a[3].real(-0.79); a[3].imag(-7.21);
    a[4].real(-3.15); a[4].imag(-4.08);
    a[5].real(-1.89); a[5].imag(3.27);
    a[6].real(4.57); a[6].imag(-2.07);
    a[7].real(-3.88); a[7].imag(-3.30);
    a[8].real(-4.89); a[8].imag(4.20);
    a[9].real(4.10); a[9].imag(-6.70);
    a[10].real(3.28); a[10].imag(-3.84);
    a[11].real(3.84); a[11].imag(1.19);

Full SVD with LAPACKE_cgesdd

   17.8592720031738
   11.4463796615601
   6.74482488632202

Full SVD with LAPACKE_cgesvd

   17.8651084899902
   11.3695945739746
   6.83876800537109

Singular values only with LAPACKE_cgesdd

   17.8592758178711
   11.4463806152344
   6.74482440948486

Singular values only with LAPACKE_cgesvd

   17.8705902099609
   11.5145053863525
   6.82878828048706

As can be seen, even for the same routine, the results change from the third significant digit onward when switching from full SVD to singular values only.

My questions are:

Is this reasonable?

Am I doing something wrong?

Thank you in advance for any help.

This is the code I'm using:

    #include <stdlib.h>
    #include <stdio.h>
    #include <algorithm>    // std::min
    #include <time.h>
    #include <complex>
    #include <mkl.h>
    #include "mkl_lapacke.h"

    #include "TimingCPU.h"
    #include "InputOutput.h"

    //#define FULL_SVD
    #define PRINT_MATRIX
    #define PRINT_SINGULAR_VALUES
    //#define PRINT_LEFT_SINGULAR_VECTORS
    //#define PRINT_RIGHT_SINGULAR_VECTORS
    #define SAVE_MATRIX
    #define SAVE_SINGULAR_VALUES
    //#define SAVE_LEFT_SINGULAR_VECTORS
    //#define SAVE_RIGHT_SINGULAR_VECTORS

    #define GESDD
    //#define GESVD

    /*************************************************************/
    /* PRINT A SINGLE PRECISION COMPLEX MATRIX STORED COLUMNWISE */
    /*************************************************************/
    void print_matrix_col(char const *desc, MKL_INT Ncols, MKL_INT Nrows, std::complex<float>* a, MKL_INT LDA) {
        printf( "\n %s\n", desc);
        for(int i = 0; i < Ncols; i++) {
            for(int j = 0; j < Nrows; j++)
                printf("(%6.2f,%6.2f)", a[i * LDA + j].real(), a[i * LDA + j].imag());
            printf( "\n" );
        }
    }

    /**********************************************************/
    /* PRINT A SINGLE PRECISION COMPLEX MATRIX STORED ROWWISE */
    /**********************************************************/
    void print_matrix_row(char const *desc, int Nrows, int Ncols, std::complex<float>* a, int LDA) {
        printf( "\n %s\n", desc);
        for (int i = 0; i < Nrows; i++) {
            for (int j = 0; j < Ncols; j++)
                printf("%2.10f + 1i * %2.10f ", a[i * LDA + j].real(), a[i * LDA + j].imag());
            printf( ";\n" );
        }
    }

    /****************************************/
    /* PRINT A SINGLE PRECISION REAL MATRIX */
    /****************************************/
    void print_rmatrix(char const *desc, MKL_INT m, MKL_INT n, float* a, MKL_INT lda ) {
        MKL_INT i, j;
        printf( "\n %s\n", desc );
        for( i = 0; i < m; i++ ) {
            for( j = 0; j < n; j++ ) printf( " %6.2f", a[i*lda+j] );
            printf( "\n" );
        }
    }

    /********/
    /* MAIN */
    /********/
    int main() {

        const int Nrows = 3;            // --- Number of rows
        const int Ncols = 4;            // --- Number of columns
        const int LDA   = Ncols;
        const int LDU   = Nrows;
        const int LDVT  = Ncols;

        const int numRuns   = 20;        // --- Number of runs for timing

        TimingCPU timerCPU;

        // --- Allocating space and initializing the input matrix
        std::complex<float> *a = (std::complex<float> *)malloc(Nrows * Ncols * sizeof(std::complex<float>));
        srand(time(NULL));
    //    for (int k = 0; k < Nrows * Ncols; k++) {
    //        a[k].real((float)rand() / (float)(RAND_MAX));
    //        a[k].imag((float)rand() / (float)(RAND_MAX));
    //    }
        a[0].real(5.91); a[0].imag(-5.96);
        a[1].real(7.09); a[1].imag(2.72);
        a[2].real(7.78); a[2].imag(-4.06);
        a[3].real(-0.79); a[3].imag(-7.21);
        a[4].real(-3.15); a[4].imag(-4.08);
        a[5].real(-1.89); a[5].imag(3.27);
        a[6].real(4.57); a[6].imag(-2.07);
        a[7].real(-3.88); a[7].imag(-3.30);
        a[8].real(-4.89); a[8].imag(4.20);
        a[9].real(4.10); a[9].imag(-6.70);
        a[10].real(3.28); a[10].imag(-3.84);
        a[11].real(3.84); a[11].imag(1.19);

        // --- Allocating space for the singular vector matrices
        #ifdef FULL_SVD
        std::complex<float> *u  = (std::complex<float> *)malloc(Nrows * LDU  * sizeof(std::complex<float>));
        std::complex<float> *vt = (std::complex<float> *)malloc(Ncols * LDVT * sizeof(std::complex<float>));
        #endif

        // --- Allocating space for the singular values
        float *s = (float *)malloc(std::min(Nrows, Ncols) * sizeof(float));

        #ifdef GESVD
        float *superb = (float *)malloc((std::min(Nrows, Ncols) - 1) * sizeof(float));
        #endif

        // --- Print and/or save input matrix
        #ifdef PRINT_MATRIX
        print_matrix_row("Matrix (stored rowwise)", Nrows, Ncols, a, LDA);
        #endif
        #ifdef SAVE_MATRIX
        saveCPUcomplextxt(a, "/home/angelo/Project/SVD/MKL/a.txt", Ncols * Nrows);
        #endif

        // --- Compute singular values
        MKL_INT info;
        float   timing = 0.f;
        for (int k = 0; k < numRuns; k++) {
            timerCPU.StartCounter();
            // --- The content of the input matrix a is destroyed on output
            #if defined(FULL_SVD) && defined(GESDD)
            printf("Running Full SVD - GESDD\n");
            MKL_INT info = LAPACKE_cgesdd(LAPACK_ROW_MAJOR, 'A', Nrows, Ncols, (MKL_Complex8 *)a, LDA, s, (MKL_Complex8 *)u, LDU, (MKL_Complex8 *)vt, LDVT);
            #endif
            #if !defined(FULL_SVD) && defined(GESDD)
            printf("Running singular values only - GESDD\n");
            MKL_INT info = LAPACKE_cgesdd(LAPACK_ROW_MAJOR, 'N', Nrows, Ncols, (MKL_Complex8 *)a, LDA, s, NULL, LDU, NULL, LDVT);
            #endif
            #if defined(FULL_SVD) && defined(GESVD)
            printf("Running Full SVD - GESVD\n");
            MKL_INT info = LAPACKE_cgesvd(LAPACK_ROW_MAJOR, 'A', 'A', Nrows, Ncols, (MKL_Complex8 *)a, LDA, s,
             (MKL_Complex8 *)u, LDU, (MKL_Complex8 *)vt, LDVT, superb);
            #endif
            #if !defined(FULL_SVD) && defined(GESVD)
            printf("Running singular values only - GESVD\n");
            MKL_INT info = LAPACKE_cgesvd(LAPACK_ROW_MAJOR, 'N', 'N', Nrows, Ncols, (MKL_Complex8 *)a, LDA, s,
             NULL, LDU, NULL, LDVT, superb);
            #endif
            if(info > 0) { // --- Check for convergence
                printf( "The algorithm computing SVD failed to converge.\n" );
                exit(1);
            }
            timing = timing + timerCPU.GetCounter();
        }
        printf("Timing = %f\n", timing / numRuns);

        // --- Print and/or save singular values
        #ifdef PRINT_SINGULAR_VALUES
        print_rmatrix("Singular values", 1, std::min(Nrows, Ncols), s, 1);
        #endif
        #ifdef SAVE_SINGULAR_VALUES
        saveCPUrealtxt(s, "/home/angelo/Project/SVD/MKL/s.txt", std::min(Nrows, Ncols));
        #endif

        // --- Print left singular vectors
        #if defined(FULL_SVD) && defined(PRINT_LEFT_SINGULAR_VECTORS)
        print_matrix_col("Left singular vectors (stored columnwise)", Nrows, Nrows, u, LDU);
        #endif
        #if defined(FULL_SVD) && defined(SAVE_LEFT_SINGULAR_VECTORS)
        saveCPUcomplextxt(u, "/home/angelo/Project/SVD/MKL/u.txt", Nrows * LDU);
        #endif

        // --- Print right singular vectors
        #if defined(FULL_SVD) && defined(PRINT_RIGHT_SINGULAR_VECTORS)
        print_matrix_col("Right singular vectors (stored rowwise)", Ncols, Ncols, vt, LDVT);
        #endif
        #if defined(FULL_SVD) && defined(SAVE_RIGHT_SINGULAR_VECTORS)
        saveCPUcomplextxt(vt, "/home/angelo/Project/SVD/MKL/vt.txt", Ncols * LDVT);
        #endif

        exit(0);
    }

compiled as

    g++ -DMKL_ILP64 -fopenmp -m64 -I$MKLROOT/include svdMKLComplexSingle.cpp TimingCPU.cpp InputOutput.cpp -L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm -fno-exceptions
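One detail in the posted timing loop: as the code's own comment notes, cgesdd/cgesvd overwrite `a`, so each of the 20 runs after the first operates on whatever the previous call left in `a` rather than on the original matrix, and the printed singular values come from the last run. It may be worth repeating the comparison with the input restored before every call. A minimal C sketch of that pattern follows; `run_on_fresh_copies`, `destructive_fn`, `sum_ctx` and `sum_and_clobber` are hypothetical names, and in the real program the callback would wrap the LAPACKE call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback that may overwrite its input, as LAPACK SVD drivers do with a. */
typedef int (*destructive_fn)(void *work, void *ctx);

/* Calls fn `runs` times, each time on a fresh copy of the pristine input.
   Returns the first nonzero status from fn, -1 if allocation fails, else 0. */
static int run_on_fresh_copies(const void *pristine, size_t bytes, int runs,
                               destructive_fn fn, void *ctx) {
    void *work = malloc(bytes);
    if (!work) return -1;
    int status = 0;
    for (int k = 0; k < runs && status == 0; k++) {
        memcpy(work, pristine, bytes);  /* restore the input before every call */
        status = fn(work, ctx);
    }
    free(work);
    return status;
}

/* Toy stand-in for illustration: sums the floats, then overwrites them. */
typedef struct { size_t n; double last_sum; } sum_ctx;

static int sum_and_clobber(void *work, void *ctx) {
    sum_ctx *c = ctx;
    float *v = work;
    double s = 0.0;
    for (size_t i = 0; i < c->n; i++) { s += v[i]; v[i] = -1.0f; }
    c->last_sum = s;
    return 0;
}
```

In the posted program this corresponds to reloading the 12 input values into `a` at the top of each iteration (outside the timed region), after which the four configurations can be compared on the same input.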

Zone:

Windows*

Thread Topic:

Question

Segmentation faults with sparse FEAST

May 22, 2017, 5:46 am

Dear all,

I am using the dfeast_scsrev interface to compute eigenvalues and eigenvectors of a sparse matrix stored in CSR format (3-array variation).

It works fine for small sparse matrices of size ~10,000. However, I get a segmentation fault for sparse matrices of size ~130,000 or larger.

Below is the error message I got:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
snpblup 000000000438CD21 tbk_trace_stack_i Unknown Unknown
snpblup 000000000438AE5B tbk_string_stack_ Unknown Unknown
snpblup 00000000043418B4 Unknown Unknown Unknown
snpblup 00000000043416C6 Unknown Unknown Unknown
snpblup 00000000042EACD7 Unknown Unknown Unknown
snpblup 00000000042EF0B0 Unknown Unknown Unknown
libpthread-2.17.s 00002AAAAACDE100 Unknown Unknown Unknown
snpblup 000000000047D059 Unknown Unknown Unknown
snpblup 0000000000451F4A Unknown Unknown Unknown
snpblup 000000000041DBFA Unknown Unknown Unknown
snpblup 0000000000407A10 Unknown Unknown Unknown
snpblup 000000000040741E Unknown Unknown Unknown
libc-2.17.so 00002AAAABD50B15 __libc_start_main Unknown Unknown
snpblup 0000000000407329 Unknown Unknown Unknown

To run it I used 10 (OMP) threads. I set ulimit -s unlimited and exported KMP_STACKSIZE equal to 20G.

I am not sure where the issue is. Any help would be appreciated.
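A cheap first check for the larger matrix is that the CSR arrays are internally consistent. A minimal sketch, assuming the 3-array CSR variation with one-based indexing (to be verified against the dfeast_scsrev documentation); it also reports unsorted or duplicate column indices within a row, which are worth ruling out. `csr_check` is a hypothetical helper, and `long long` stands in for MKL_INT.

```c
/* Checks a one-based, 3-array CSR structure for an n x n matrix:
   ia has n+1 entries, ia[0] == 1, ia is non-decreasing, and every column index
   lies in 1..n and is strictly increasing within its row.
   Returns 0 if consistent, -1 for a bad size or ia[0], otherwise the one-based
   row in which the first problem was found. */
static long long csr_check(long long n, const long long *ia, const long long *ja) {
    if (n <= 0 || ia[0] != 1) return -1;
    for (long long row = 0; row < n; row++) {
        long long begin = ia[row] - 1, end = ia[row + 1] - 1;  /* zero-based range */
        if (end < begin) return row + 1;
        for (long long k = begin; k < end; k++) {
            if (ja[k] < 1 || ja[k] > n) return row + 1;             /* out of range */
            if (k > begin && ja[k] <= ja[k - 1]) return row + 1;    /* unsorted/duplicate */
        }
    }
    return 0;
}
```

If the arrays pass, the crash is more likely related to the solver run itself than to the input structure.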

Thank you.

Jeremie

Thread Topic:

Bug Report

MKL library

May 22, 2017, 7:13 am

Sir

I have to install a code that requires linking against LAPACK and BLAS.
The code was written in 2009 for MKL version 8. According to it, the link settings are:

LROOT = /opt/intel/mkl/lib/intel64/
LAPACK = -lmkl_lapack -lmkl
BLAS = -L$(LROOT) -lmkl_intel64 -lguide -lpthread

LFLAGS = $(LIBSCE) $(BLAS) $(LAPACK)

Now I have the 2016 version of MKL, which does not have guide, mkl, pthread, etc.
I know that
-lmkl_lapack is replaced by -lmkl_lapack95_ilp64

How should I modify these commands to compile and link with the 2016 version?

Thanks

Thread Topic:

How-To

Difference between cgesvd and LAPACKE_cgesvd or cgesdd and LAPACKE_cgesdd

May 23, 2017, 5:56 am

I would like to know the difference between LAPACKE_cgesvd (see https://software.intel.com/en-us/node/521150) and cgesvd (see https://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_lapack_examples/cgesvd_ex.c.htm).

Is LAPACKE_cgesvd just a wrapper around cgesvd automatically selecting the resources? Is there any performance improvement/penalty using the former or the latter?

Thank you very much for any help.

Zone:

Server

Thread Topic:

Question

MKL FFT: fftw_mpi_plan_many_transpose

May 24, 2017, 2:59 pm

Hello,

I have been trying to use the MKL FFT through the FFTW3 interface and now am stuck with plan creation. The following line of code produces a NULL value, which is subsequently caught and kills the program:

plan_t_XY = fftw_mpi_plan_many_transpose(g.nxrtot, g.nyrtot, (g.nzrtot+2), g.nxrtot/g.nProc, g.nyrtot/g.nProc, wr1, wr1, MPI_COMM_WORLD, FFT_PLANNING);

Each of the *tot variables is an integer power of 2, as is g.nProc. wr1 is a pointer to a float work array, and FFT_PLANNING is set to FFTW_MEASURE.

Any ideas why plan_t_XY is NULL after this call?

Thread Topic:

Help Me

Problems about how to use multithreaded intel MKL

May 24, 2017, 7:29 pm

Hello,

I have some problems with multithreaded Intel MKL:

1. I use the function MKL_SET_NUM_THREADS(2) to change the number of threads. Then I check the value of MKL_NUM_THREADS with getenv(mklname, value), where mklname = MKL_NUM_THREADS, but the value is always 1.

I wonder whether the value of MKL_NUM_THREADS only changes once the parallel computation starts.

If so, is there a way to check the value of MKL_NUM_THREADS? How do I know whether the value of MKL_NUM_THREADS has really changed after I set it?

2. I use the function MKL_SET_NUM_THREADS(2) to change the number of threads. Judging by the run time, it works for DGEMM, and the CPU utilization is about 200%. But when I test DGEQP3 or DGEQPF, the run time does not change and the CPU utilization is always 100%.

I think something is wrong, but I can't figure out what, and I would like to know why.

Are there any examples showing how to use multithreaded Intel MKL?

Thank you!

OS: Linux, Red Hat 5.5

MKL: compilerpro-12.0.0.084

Ifort: composerxe-2011.0.084

Thread Topic:

Help Me

Basic linking problem with MKL library

May 29, 2017, 12:06 am

Hi everybody,

Long ago I used MKL 9.1.027, and used it with great joy to run programs that involved a lot of matrix operations. It was easy to add to a project: right-click the project -> Properties and set:

c/c++-->general-->Additional include directories C:\Program Files\Intel\MKL\9.1.027\include

linker-->general-->Additional library directories C:\Program Files\Intel\MKL\9.1.027\ia32\lib

linker-->input-->additional dependencies libguide40.lib mkl_c.lib

and in the code use:

extern "C" {//
#include"mkl_cblas.h"
#include"mkl_lapack.h"
#include "mkl_service.h"
#include "mkl.h"
}//

The programs worked and the library performed really well. Now I would like to see whether there are improvements that could benefit my programs, so I downloaded the free MKL package w_mkl_2017.3.210.exe and installed it (this created something like IntelSWTools\compilers_and_libraries_2017.4.210). In the project properties, I removed the settings used for MKL 9.1.027.

Then, following the instructions on the website, I went to Configuration Properties->Intel Performance Libraries and set the Use Intel MKL option to Parallel. The program compiled in Visual Studio, but when I run it, it says "The application was unable to start correctly (0xc000007b). Click OK to close the application".

Next, I used the command-line tool to get the dependencies to enter manually, since I need static linking for Windows, IA-32 architecture, OpenMP, in Visual Studio. I placed the tool's suggestions (hopefully in the right places), but the result was the same: I got exactly the same message.

Following the same pattern as before, I added:

c/c++-->general-->Additional include directories C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.4.210\windows\mkl\include

linker-->general-->Additional library directories C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.4.210\windows\mkl\lib\ia32

C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.4.210\windows\mkl\..\compiler\lib\ia32

linker-->input-->additional dependencies mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib

I have Windows 7 Ultimate and Visual Studio 2013 Ultimate.

Any help on how to make the MKL library run is more than welcome. It can be automatic linking through the Intel Performance Libraries option or setting the needed fields manually (I would prefer the latter, but anything is OK).

Thanks a lot in advance.

Access violation in dss_solve_real

May 30, 2017, 6:43 am

Hello,

We are calling MKL from C# code by using general mechanism described in article Using Intel® MKL in your C# program.

The code uses DSS (Direct Sparse Solver) for a large system of equations with numerous different right-hand sides. That is, it makes one call to dss_factor_real and many subsequent calls to dss_solve_real. Generally it works fine, but sometimes, at a late iteration, an error occurs in dss_solve_real:
"Attempted to read or write protected memory. This is often an indication that other memory is corrupt."

So far, no particular trigger for this error has been identified. In the past I noticed that if a memory allocation occurs after the internal DSS structures are created (i.e., after dss_factor_real, which must take up a lot of space for the LU decomposition), then a subsequent call to a DSS computational function is likely to cause the error above. Therefore I am careful not to make any explicit allocations while the DSS handle is in use, but the error still occurs once in a while.
Just in case, threading is disabled by setting MKL_NUM_THREADS=1.

Any help will be greatly appreciated.

Thread Topic:

Help Me

Fortran95 Interface causes segfault

May 31, 2017, 12:57 am

I am trying to use zheevd or zheev through their Fortran 95 interface:

program main
    use lapack95
    implicit none
    complex(8) :: H(2,2), w(2)

    H = 1d0
    call zheevd(H,w, 'V')
    write (*,*) H
    write (*,*) "######"
    write (*,*) w
end program main

which I compile with:

ifort test.f90 -o bla.x  ${MKLROOT}/lib/intel64/libmkl_blas95_lp64.a ${MKLROOT}/lib/intel64/libmkl_lapack95_lp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl  -I${MKLROOT}/include/intel64/lp64 -I${MKLROOT}/include

as suggested by the Link Line Advisor. When I run this simple example I get:

> $ ./bla.x
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
bla.x              00000000016D7984  Unknown               Unknown  Unknown
libpthread-2.23.s  00007F083A8F6390  Unknown               Unknown  Unknown
bla.x              0000000000404257  Unknown               Unknown  Unknown
bla.x              0000000000403B41  Unknown               Unknown  Unknown
bla.x              00000000004039B2  Unknown               Unknown  Unknown
bla.x              000000000040392E  Unknown               Unknown  Unknown
libc-2.23.so       00007F083A02F830  __libc_start_main     Unknown  Unknown
bla.x              0000000000403829  Unknown               Unknown  Unknow

How do I use the Fortran 95 interface correctly? The Fortran 77 interface is just too verbose for my liking.

Thread Topic:

How-To
