FGMRES preconditioner applied to?
I am using MKL's preconditioned FGMRES solver, and I am trying to understand exactly which vector FGMRES asks me to apply the preconditioner to. According to the solver's reference, Saad's Iterative Methods for Sparse Linear Systems, the left-preconditioned GMRES iteration (I'm assuming that FGMRES does left preconditioning; please correct me if I'm wrong) computes M^-1 A v_j at each step. That is, FGMRES first asks for the matrix-vector product A v_j, and I would then expect it to ask for the preconditioner to be applied to that result, i.e. to compute M^-1 A v_j. However, when I compute the squared 2-norm of the vectors involved, I find that the vector to which FGMRES asks me to apply the preconditioner (which I would assume to be A v_j) always has unit norm, regardless of the norm of A v_j. What is this unit-norm vector that FGMRES is asking the user to apply the preconditioner to?
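For what it's worth, the flexible GMRES variant in Saad's book preconditions on the right: each step computes z_j = M_j^-1 v_j and then w = A z_j, where v_j is a normalized Arnoldi basis vector. That would be consistent with the unit norm observed here. The sketch below is not MKL's implementation; it is a small dense illustration with a Jacobi preconditioner (my assumption, as are all function names) that records the norm of every vector handed to the preconditioner.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // small dense matrix, for illustration only

static double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

static Vec matvec(const Mat& A, const Vec& x) {
    Vec y(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) y[i] = dot(A[i], x);
    return y;
}

// Hypothetical preconditioner: Jacobi, z = D^{-1} v.
static Vec apply_preconditioner(const Mat& A, const Vec& v) {
    Vec z(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) z[i] = v[i] / A[i][i];
    return z;
}

// Runs up to m flexible Arnoldi steps from the initial residual r0 and
// returns the 2-norm of every vector handed to the preconditioner.
std::vector<double> preconditioner_input_norms(const Mat& A, const Vec& r0, int m) {
    std::vector<Vec> V;
    std::vector<double> norms;
    const double beta = std::sqrt(dot(r0, r0));
    assert(beta > 0.0);
    Vec v0(r0);
    for (double& x : v0) x /= beta;  // v_1 = r0 / ||r0||
    V.push_back(v0);
    for (int j = 0; j < m; ++j) {
        const Vec vj = V[static_cast<std::size_t>(j)];
        norms.push_back(std::sqrt(dot(vj, vj)));    // input to the preconditioner
        const Vec z = apply_preconditioner(A, vj);  // z_j = M^{-1} v_j
        Vec w = matvec(A, z);                       // w = A z_j
        for (int i = 0; i <= j; ++i) {              // modified Gram-Schmidt
            const Vec& vi = V[static_cast<std::size_t>(i)];
            const double h = dot(w, vi);
            for (std::size_t k = 0; k < w.size(); ++k) w[k] -= h * vi[k];
        }
        const double h_next = std::sqrt(dot(w, w));
        if (h_next == 0.0) break;  // breakdown: Krylov space exhausted
        for (double& x : w) x /= h_next;
        V.push_back(w);
    }
    return norms;
}
```

In this sketch the matrix-vector product is applied to the preconditioned vector z_j, not the other way round, so the preconditioner only ever sees orthonormal basis vectors.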
Poor scaling for real-to-real FFT with OpenMP
In the attached file I use MKL to compute a real-to-real FFT using OpenMP for multithreading.
The code is compiled with
icpc -o bench-fft -Wall -O3 -g -march=native -fopenmp bench-fft.cxx -mkl
The machine has 4 cores.
It seems that the code does not scale well with the number of threads.
When run with
OMP_NUM_THREADS=1 ./bench-fft 4194304
the total time taken is 0.1640 user, 0.0440 sys while with
OMP_NUM_THREADS=2 ./bench-fft 4194304
the total time taken is 0.3000 user, 0.0560 sys. So there seems to be a large synchronization overhead since the total CPU time almost doubles.
Is this to be expected, or am I doing something wrong in my code?
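One thing to keep in mind when reading these numbers: user time is summed over all threads, so scaling is usually judged by elapsed (wall-clock) time, not by total CPU time. The sketch below is a generic illustration, not the attached benchmark. It times a dummy workload with both clocks, and `spin` and `time_with_threads` are names made up for the example. On POSIX systems `std::clock` reports the CPU time of the whole process, while on Windows it behaves like a wall clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <ctime>
#include <thread>
#include <vector>

struct Timing {
    double wall_seconds;  // elapsed real time
    double cpu_seconds;   // CPU time of the whole process (all threads, on POSIX)
};

// Dummy workload standing in for the real computation.
static void spin(unsigned long iterations, double* out) {
    double acc = 0.0;
    for (unsigned long i = 0; i < iterations; ++i) acc += 1e-9 * static_cast<double>(i % 7);
    *out = acc;
}

// Splits total_iterations evenly over num_threads and measures both clocks.
Timing time_with_threads(int num_threads, unsigned long total_iterations) {
    assert(num_threads > 0);
    std::vector<double> sink(static_cast<std::size_t>(num_threads), 0.0);
    std::vector<std::thread> pool;
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    for (int t = 0; t < num_threads; ++t) {
        pool.emplace_back(spin, total_iterations / static_cast<unsigned long>(num_threads),
                          &sink[static_cast<std::size_t>(t)]);
    }
    for (std::thread& th : pool) th.join();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();
    Timing result;
    result.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    result.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return result;
}
```

If the wall time drops while the CPU time grows, the extra CPU time may simply be threads spinning while they wait, which is a different problem from the transform itself not scaling.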
numroc returns incorrect value in ScaLAPACK
Performing split-step Fourier using MKL DFTI
Hi,
I am trying to implement the split-step Fourier method with the following code. There seems to be a problem with the Fourier portion: when I compare my data with the values obtained with another program, they match perfectly right up until the Fourier transform begins. Here is my code:
program ss
  Use MKL_DFTI
  implicit none
  real*8, parameter :: distance = 100.0d0
  real*8, parameter :: beta2 = 1.0d0
  real*8, parameter :: N = 1.0d0
  real*8, parameter :: nt = 1024.0d0
  real*8, parameter :: Tmax = 32.0d0
  real*8 :: step_num
  real*8 :: deltaz
  real*8 :: dtau
  real*8, dimension (1024) :: tau, omega, uu
  real*8, parameter :: pi = 3.141592653590d0
  type(DFTI_DESCRIPTOR), POINTER :: handle
  complex*16 :: hhz, C = (0,1.0d0)
  complex*16, dimension (1024) :: dispersion, xy, temp, temp1, temp2
  integer :: i, j, status
  step_num = 100000
  deltaz = distance/step_num
  dtau = (2.0d0*Tmax)/nt
  do i = 1,1024
    tau (i) = dble(i-513)*dtau
    if (i<=512) then
      omega (i) = dble(i-1)*pi/Tmax
    else
      omega (i) = dble(i-1025)*pi/Tmax
    end if
  end do
  uu = 1/dcosh(tau)
  dispersion = exp(0.50d0*C*beta2*(omega**2)*deltaz)
  hhz = 1*(0,1.0d0)*N**2*deltaz
  temp = uu*exp(abs(uu)**2*hhz/2)
  status = DftiCreateDescriptor(handle, DFTI_DOUBLE,DFTI_COMPLEX,1,1024)
  status = DftiSetValue(handle,DFTI_FORWARD_SCALE,1.0d0)
  status = DftiSetValue(handle,DFTI_BACKWARD_SCALE,1/dble(1024))
  status = DftiCommitDescriptor(handle)
  do j=1,step_num
    status = DftiComputeBackward(handle,temp)
    do i = 1,1024
      temp1(i) = temp(i)*dispersion(i)
    end do
    status = DftiComputeForward(handle,temp1)
    do i = 1,1024
      temp2(i) = temp1(i)*exp(abs(temp1(i))**2*hhz)
    end do
  end do
  status = DftiFreeDescriptor(handle)
  xy = temp2*exp(-abs(temp1)**2*hhz/2)
  do i=1,1024
    open (10, file = 'spec.dat')
    write (10,*) tau(i), abs(xy(i)**2)
  end do
end program ss
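One detail in the posted loop may be worth checking: temp2 is computed in every iteration but never copied back into temp, so each pass transforms the in-place result of the previous backward transform instead of the updated field. Assuming the intent is the usual symmetric split-step scheme, the loop would feed the nonlinear result back in, as in the sketch below. It uses a naive O(n^2) DFT instead of MKL DFTI so that it is self-contained, with the same scaling as in the post (forward unscaled, backward scaled by 1/n). All function names are made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;
using CVec = std::vector<cd>;

// Naive DFT: sign = -1 is the unscaled forward transform,
// sign = +1 is the backward transform scaled by 1/n.
static CVec dft(const CVec& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    CVec out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd s(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi * static_cast<double>((j * k) % n) / static_cast<double>(n);
            s += in[j] * std::polar(1.0, angle);
        }
        out[k] = (sign > 0) ? s / static_cast<double>(n) : s;
    }
    return out;
}

// One dispersion step in the frequency domain followed by one nonlinear
// step in the time domain, repeated `steps` times.
CVec split_step(CVec temp, const CVec& dispersion, cd hhz, int steps) {
    assert(temp.size() == dispersion.size());
    for (int s = 0; s < steps; ++s) {
        CVec spec = dft(temp, +1);  // backward transform, as in the post
        for (std::size_t i = 0; i < spec.size(); ++i) spec[i] *= dispersion[i];
        const CVec u = dft(spec, -1);  // forward transform
        for (std::size_t i = 0; i < u.size(); ++i) {
            // Nonlinear phase; the result is written back into temp so the
            // next iteration starts from the updated field.
            temp[i] = u[i] * std::exp(std::norm(u[i]) * hhz);
        }
    }
    return temp;
}
```

With a purely imaginary hhz and a unit-modulus dispersion factor, the total energy sum |u|^2 should stay constant from step to step, which makes a convenient sanity check when comparing against another program.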
DFTI_REAL_REAL Speed?
I have some code that makes heavy use of 1-D DFTs using MKL (real-to-complex and complex-to-complex). I just realized that the library supports the DFTI_REAL_REAL layout, where the real and imaginary parts of complex numbers are stored in separate arrays. I know that this can result in more efficient implementations of some algorithms due to a reduced need for SIMD shuffles. I thought that I would ask here before rearchitecting my application to use split complex layout: could I expect any speedup in the DFT implementation by using split complex versus my current interleaved layout? I run this software on AVX, AVX2, and AVX512 platforms currently.
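To make the two layouts concrete, here is a small sketch of the conversion an application would need at its boundaries if only part of it moved to split storage. It says nothing about MKL's internal performance; `SplitComplex`, `to_split` and `to_interleaved` are names invented for the example.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Split layout: real and imaginary parts live in separate arrays.
struct SplitComplex {
    std::vector<double> re;
    std::vector<double> im;
};

// Interleaved (re, im, re, im, ...) to split.
SplitComplex to_split(const std::vector<std::complex<double>>& z) {
    SplitComplex s;
    s.re.reserve(z.size());
    s.im.reserve(z.size());
    for (const std::complex<double>& v : z) {
        s.re.push_back(v.real());
        s.im.push_back(v.imag());
    }
    return s;
}

// Split back to interleaved.
std::vector<std::complex<double>> to_interleaved(const SplitComplex& s) {
    assert(s.re.size() == s.im.size());
    std::vector<std::complex<double>> z(s.re.size());
    for (std::size_t i = 0; i < z.size(); ++i) z[i] = std::complex<double>(s.re[i], s.im[i]);
    return z;
}
```

Each conversion is an extra pass over the data, so any gain inside the transform has to outweigh that cost unless the whole pipeline is moved to the split layout.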
Trust Region Random Results
We implemented the MKL Trust Region method for our purposes.
It does not converge every time, and it also produces different (seemingly random) solutions from the same initial conditions and constraints.
For the trust-region size parameter we use a four-stage approach:
the value is 100 on the first run, 10 on the second, 1 on the third, and 0.1 on the last.
Each time we call the solver, we use the previous result as input if it improved.
Does anyone know what the problem is?
Are there any suggestions about trust-region size usage: which values are best, and when?
Thank you very much
Gianluca
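The staged procedure described in this post (sizes 100, 10, 1, 0.1, keeping a result only if it improved) can be written down as a small driver loop. This is only a sketch under stated assumptions: `Solver` is a hypothetical callback standing in for whatever wraps the MKL trust-region calls, and the residual it returns is an assumed quality measure, not something taken from the MKL API.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct SolveResult {
    std::vector<double> x;  // solution estimate
    double residual;        // smaller is better (assumed quality measure)
};

// Hypothetical solver callback: start point and trust-region size in, result out.
using Solver = std::function<SolveResult(const std::vector<double>&, double)>;

// Runs the solver once per trust-region size, restarting each stage from the
// best point found so far and keeping a trial only if it improved the residual.
SolveResult staged_solve(const Solver& solve, const std::vector<double>& x0,
                         double initial_residual, const std::vector<double>& radii) {
    assert(!x0.empty());
    SolveResult best;
    best.x = x0;
    best.residual = initial_residual;
    for (double radius : radii) {
        const SolveResult trial = solve(best.x, radius);
        if (trial.residual < best.residual) best = trial;
    }
    return best;
}
```

Written this way, every stage is a deterministic function of the previous one, so if two runs still diverge, the difference has to come from inside the solver call itself.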
Eigenvalues not in ascending order
Hey there,
Is there a diagonalisation routine that does not sort the eigenvalues in ascending order?
Thanks in advance,
sommerfeld
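The LAPACK symmetric and Hermitian eigensolvers return eigenvalues in ascending order, so one workaround is to reorder the result afterwards. The sketch below assumes real eigenvalues in `w` and the eigenvectors stored column-major in `z`, with column k belonging to `w[k]`. The function name and the choice of descending order are just for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Reorders eigenvalues (and the matching eigenvector columns) into descending order.
// z holds an n x n matrix in column-major storage.
void sort_eigenpairs_descending(std::vector<double>& w, std::vector<double>& z) {
    const std::size_t n = w.size();
    assert(z.size() == n * n);
    std::vector<std::size_t> perm(n);
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::sort(perm.begin(), perm.end(),
              [&w](std::size_t a, std::size_t b) { return w[a] > w[b]; });
    std::vector<double> w_sorted(n), z_sorted(n * n);
    for (std::size_t k = 0; k < n; ++k) {
        w_sorted[k] = w[perm[k]];
        for (std::size_t r = 0; r < n; ++r) z_sorted[k * n + r] = z[perm[k] * n + r];
    }
    w.swap(w_sorted);
    z.swap(z_sorted);
}
```

Any other ordering (by magnitude, by distance from a target value) only needs a different comparison in the `std::sort` call.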
MKL PARDISO: floating-point error with zero pivots ...
Hi,
we have detected problems with the current version of the PARDISO solver.
(MKL vers. 2017.4.210, Windows x64 architecture, static linking, Microsoft VS 2015).
PARDISO produces a floating-point error in phase 22 if the coefficient matrix of an FEM analysis has zero pivots.
Another problem is that PARDISO does not return the correct line number of the first pivot element (iparm[29]).
In former versions PARDISO worked well in this regard.
I have attached some screenshots and a code sample to demonstrate the above problems.
Thank you for your answer.
Regards
Dr. Guenter Kaufels, InfoGraph GmbH, Aachen, Germany
Strange bad memory of MKL spline function
Dear all,
I am using MKL compilers_and_libraries_2017.4.210 (update 3) and following the developer guide to build a cubic spline interpolation workflow. However, no matter how I configure my inputs, the scoeff values output by dfdEditPPSpline1D() remain bad memory. The function returns a good status, but the program then crashes in the next function, dfdConstruct1D(). I wonder whether I am missing something in my configuration or file inclusion, or whether there is a bug. My source code is copied below with dummy input data.
Regards,
Tianhua
--------------------------------------------
For compilation, I included 34 .h files, which are enough for my project.
For linking, I included mkl_core.lib, mkl_core_dll.lib, mkl_intel_ilp64.lib, mkl_intel_ilp64_dll.lib, mkl_sequential.lib and mkl_sequential_dll.lib.
For execution, I included mkl_core.dll, mkl_sequential.dll and mkl_vml_def.dll.
Those appear to be enough.
Dummy source code, in which the scoeff pointer has bad memory and which crashes in the next function:
------{
//setup MKL data structures
#define SPLINE_ORDER DF_PP_CUBIC /* A cubic spline to construct */
int status; /* Status of a Data Fitting operation */
DFTaskPtr task; /* Data Fitting operations are task based */
/* Parameters describing the partition */
MKL_INT nx; /* The size of partition x */
MKL_INT xhint; /* Additional information about the structure of breakpoints */
/* Parameters describing the function */
MKL_INT ny; /* Function dimension */
MKL_INT yhint; /* Additional information about the function */
/* Parameters describing the spline */
MKL_INT s_order; /* Spline order */
MKL_INT s_type; /* Spline type */
MKL_INT ic_type; /* Type of internal conditions */
double* ic; /* Array of internal conditions */
MKL_INT bc_type; /* Type of boundary conditions */
double* bc; /* Array of boundary conditions */
double scoeff[(20 - 1)* SPLINE_ORDER]; /* Array of spline coefficients */
MKL_INT scoeffhint; /* Additional information about the coefficients */
MKL_INT nsite; /* Number of interpolation sites */
double site[20]; /* Array of interpolation sites */
MKL_INT sitehint; /* Additional information about the structure of
                     interpolation sites */
MKL_INT ndorder, dorder; /* Parameters defining the type of interpolation */
double* datahint; /* Additional information on partition and interpolation sites */
double *r; /* Array of interpolation results */
r = new double[20];
MKL_INT rhint; /* Additional information on the structure of the results */
MKL_INT* cell; /* Array of cell indices */
/* Initialize the partition */
nx = 10;
nsite = 20;
double x[10], y[10];
for (int i = 0; i < 10; i++)
{
    x[i] = i;
    y[i] = pow(i, 3);
}
for (int i = 0; i < 20; i++)
{
    site[i] = i;
}
for (int i = 0; i < 19 * SPLINE_ORDER; i++)
    scoeff[i] = -9999.0;
xhint = DF_NON_UNIFORM_PARTITION; /* The partition is non-uniform. */
/* Initialize the function */
ny = 1; /* The function is scalar. */
yhint = DF_NO_HINT; /* No additional information about the function is provided. */
status = dfdNewTask1D(&task, nx, x, xhint, ny, y, yhint);
if (status == 0)
{
    /* Initialize spline parameters */
    s_order = DF_PP_CUBIC; /* Spline is of the fourth order (cubic spline). */
    s_type = DF_PP_NATURAL; /* Spline is of the natural cubic spline type. */
    /* Define internal conditions for cubic spline construction (none in this example) */
    ic_type = DF_NO_IC;
    ic = NULL;
    /* Use free-end boundary conditions; no additional values are provided. */
    bc_type = DF_BC_FREE_END; //natural cubic spline
    bc = NULL;
    scoeffhint = DF_NO_HINT; /* No additional information about the spline. */
    status = dfdEditPPSpline1D(task, s_order, s_type, bc_type, 0, ic_type,
                               0, &scoeff[0], scoeffhint);
    //continue only when the internal boundary condition task is successful
    if (status == 0)
    {
        /* Use a standard method to construct a natural cubic spline: */
        /* Pi(x) = ci,0 + ci,1(x - xi) + ci,2(x - xi)^2 + ci,3(x - xi)^3, */
        /* The library packs spline coefficients to array scoeff: */
        /* scoeff[4*i+0] = ci,0, scoeff[4*i+1] = ci,1, */
        /* scoeff[4*i+2] = ci,2, scoeff[4*i+3] = ci,3, */
        /* i=0,...,N-2 */
        status = dfdConstruct1D(task, DF_PP_SPLINE, DF_METHOD_STD);
        // skip the data construction check, TZ May 2017
        sitehint = DF_NON_UNIFORM_PARTITION; /* Partition of sites is non-uniform */
        ndorder = 1;
        dorder = 1;
        datahint = DF_NO_APRIORI_INFO; /* No additional information about breakpoints or
                                          sites is provided. */
        rhint = DF_MATRIX_STORAGE_ROWS; /* The library packs interpolation results
                                           in row-major format. */
        cell = NULL; /* Cell indices are not required. */
        /* Solve interpolation problem using the default method: compute the spline values
           at the points site(i), i=0,..., nsite-1 and place the results to array r */
        status = dfdInterpolate1D(task, DF_INTERP, DF_METHOD_PP, nsite, site,
                                  sitehint, ndorder, &dorder, datahint, r, rhint, cell);
        /* De-allocate Data Fitting task resources */
        status = dfDeleteTask(&task);
    }
}
}
--------------------
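As a way to check the coefficients independently of dfdInterpolate1D, the packed layout described in the comments (four coefficients per cell, scoeff[4*i+k] = ci,k) can be evaluated by hand. The sketch below assumes exactly that layout and nothing else about MKL. `eval_pp_cubic` is a made-up helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluates P_i(x) = c_{i,0} + c_{i,1} t + c_{i,2} t^2 + c_{i,3} t^3 with
// t = x - breaks[i], where i is the cell containing x and the coefficients
// are packed as scoeff[4*i + k] = c_{i,k}.
double eval_pp_cubic(const std::vector<double>& breaks,
                     const std::vector<double>& scoeff, double x) {
    const std::size_t n = breaks.size();
    assert(n >= 2 && scoeff.size() == 4 * (n - 1));
    // Index of the last breakpoint <= x, clamped to a valid cell in [0, n-2].
    const auto it = std::upper_bound(breaks.begin(), breaks.end(), x);
    std::size_t i = 0;
    if (it != breaks.begin()) i = static_cast<std::size_t>(it - breaks.begin()) - 1;
    if (i > n - 2) i = n - 2;
    const double t = x - breaks[i];
    const double* c = &scoeff[4 * i];
    return c[0] + t * (c[1] + t * (c[2] + t * c[3]));  // Horner's rule
}
```

If the values in scoeff still read -9999.0 after dfdConstruct1D returns, the construction never wrote into the array, which narrows the problem down to the task setup or the link configuration.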
libmkl_blacs_openmpi compatibility
Hello everybody,
I have a question regarding the libmkl_blacs_openmpi* libraries: which Open MPI version are these libraries supposed to be compatible with?
I could not find this information in the usual MKL or compiler release notes. By testing, I determined that the libmkl_blacs_openmpi_lp64.so from the MKL bundled with Intel 2016 update 4 is compatible with Open MPI 2.0, i.e. programs using libmkl_scalapack_lp64.so work and apparently give correct results. However, using the libraries from the Intel 2017 update 2 distribution together with Open MPI 2.0 and 2.1 gives programs that produce a runtime error as soon as BLACS routines are called. I have not had time to test Intel 2017 update 4 yet, but an authoritative answer on compatibility would be helpful even if it turns out to work with update 4.
Of course, if this is documented in detail somewhere, a pointer to the documentation would be appreciated too.
Best Regards
Christof
"Trust Region Algorithm" Questions
Over this period, we have developed a calculation method that uses the MKL Trust Region API (with constraints).
We encountered many difficulties, but after a lot of effort we have obtained some quite good results.
However, we have also found some strange behavior in your functions (e.g. dtrnlspbc_solve …).
Here are some questions whose answers could help us, and other users, understand how to use this algorithm better:
1) If we widen the constraints, the calculation seems more stable. Are the constraints perhaps also enforced during the optimization process? If so, the search procedure might fail to find some solutions inside the constraint range. Can you confirm this?
2) The algorithm seems to fail when it finds minima near the constraints. Is this problem related to the Jacobian?
3) With the same input values (initial conditions and constraints) it produces different results, sometimes very close to each other. Is a random number generator used inside the algorithm? Is this the reason?
4) Are there any suggested criteria for setting the trust-region size parameter?
5) Are there any suggested criteria for setting the constraints?
Thank you very much
Gianluca
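On question 3: one common source of run-to-run differences in numerical code, with no random numbers involved, is that floating-point addition is not associative, so a sum accumulated in a different order (for example by a different number of threads) can give a different result. Whether this is what happens inside the trust-region solver is an assumption on my part. The sketch below only demonstrates the effect itself.

```cpp
#include <vector>

// Sums the values from first to last.
double sum_left_to_right(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s;
}

// Sums the same values from last to first.
double sum_right_to_left(const std::vector<double>& v) {
    double s = 0.0;
    for (auto it = v.rbegin(); it != v.rend(); ++it) s += *it;
    return s;
}
```

For the input {1e16, -1e16, 0.5}, the first function returns 0.5 and the second returns 0.0, because 0.5 is lost when it is added to -1e16 first. Differences of this kind are tiny per operation but can be amplified by an iterative method that is sensitive to its starting point.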
Are LAPACKE_cgesdd and LAPACKE_cgesvd SVD calculations reliable?
I'm using LAPACKE_cgesdd and LAPACKE_cgesvd to compute the singular values of a matrix. Both routines have the option to compute the singular values only. The problem I have is that, in the following four test cases:
- Full SVD with LAPACKE_cgesdd;
- Full SVD with LAPACKE_cgesvd;
- Singular values only with LAPACKE_cgesdd;
- Singular values only with LAPACKE_cgesvd.
I receive different singular values. In particular:
Test, 3 x 4 matrix
a[0].real(5.91); a[0].imag(-5.96); a[1].real(7.09); a[1].imag(2.72); a[2].real(7.78); a[2].imag(-4.06); a[3].real(-0.79); a[3].imag(-7.21);
a[4].real(-3.15); a[4].imag(-4.08); a[5].real(-1.89); a[5].imag(3.27); a[6].real(4.57); a[6].imag(-2.07); a[7].real(-3.88); a[7].imag(-3.30);
a[8].real(-4.89); a[8].imag(4.20); a[9].real(4.10); a[9].imag(-6.70); a[10].real(3.28); a[10].imag(-3.84); a[11].real(3.84); a[11].imag(1.19);
Full SVD with LAPACKE_cgesdd
17.8592720031738 11.4463796615601 6.74482488632202
Full SVD with LAPACKE_cgesvd
17.8651084899902 11.3695945739746 6.83876800537109
Singular values only with LAPACKE_cgesdd
17.8592758178711 11.4463806152344 6.74482440948486
Singular values only with LAPACKE_cgesvd
17.8705902099609 11.5145053863525 6.82878828048706
As can be seen, even for the same routine, the results can differ from the third significant digit onward when switching from full SVD to singular values only.
My questions are:
Is this reasonable?
Am I doing something wrong?
Thank you in advance for any help.
This is the code I'm using:
#include <stdlib.h>
#include <stdio.h>
#include <algorithm> // std::min
#include <time.h>
#include <complex>
#include <mkl.h>
#include "mkl_lapacke.h"
#include "TimingCPU.h"
#include "InputOutput.h"

//#define FULL_SVD
#define PRINT_MATRIX
#define PRINT_SINGULAR_VALUES
//#define PRINT_LEFT_SINGULAR_VECTORS
//#define PRINT_RIGHT_SINGULAR_VECTORS
#define SAVE_MATRIX
#define SAVE_SINGULAR_VALUES
//#define SAVE_LEFT_SINGULAR_VECTORS
//#define SAVE_RIGHT_SINGULAR_VECTORS
#define GESDD
//#define GESVD

/*************************************************************/
/* PRINT A SINGLE PRECISION COMPLEX MATRIX STORED COLUMNWISE */
/*************************************************************/
void print_matrix_col(char const *desc, MKL_INT Ncols, MKL_INT Nrows, std::complex<float>* a, MKL_INT LDA) {
    printf( "\n %s\n[", desc);
    for(int i = 0; i < Ncols; i++) {
        for(int j = 0; j < Nrows; j++)
            printf("(%6.2f,%6.2f)", a[i * LDA + j].real(), a[i * LDA + j].imag());
        printf( "\n" );
    }
}

/**********************************************************/
/* PRINT A SINGLE PRECISION COMPLEX MATRIX STORED ROWWISE */
/**********************************************************/
void print_matrix_row(char const *desc, int Nrows, int Ncols, std::complex<float>* a, int LDA) {
    printf( "\n %s\n", desc);
    for (int i = 0; i < Ncols; i++) {
        for (int j = 0; j < Ncols; j++)
            printf("%2.10f + 1i * %2.10f ", a[i * LDA + j].real(), a[i * LDA + j].imag());
        printf( ";\n" );
    }
}

/****************************************/
/* PRINT A SINGLE PRECISION REAL MATRIX */
/****************************************/
void print_rmatrix(char const *desc, MKL_INT m, MKL_INT n, float* a, MKL_INT lda ) {
    MKL_INT i, j;
    printf( "\n %s\n", desc );
    for( i = 0; i < m; i++ ) {
        for( j = 0; j < n; j++ )
            printf( " %6.2f", a[i*lda+j] );
        printf( "\n" );
    }
}

/********/
/* MAIN */
/********/
int main() {
    const int Nrows = 3; // --- Number of rows
    const int Ncols = 4; // --- Number of columns
    const int LDA = Ncols;
    const int LDU = Nrows;
    const int LDVT = Ncols;
    const int numRuns = 20; // --- Number of runs for timing
    TimingCPU timerCPU;

    // --- Allocating space and initializing the input matrix
    std::complex<float> *a = (std::complex<float> *)malloc(Nrows * Ncols * sizeof(std::complex<float>));
    srand(time(NULL));
    // for (int k = 0; k < Nrows * Ncols; k++) {
    //     a[k].real((float)rand() / (float)(RAND_MAX));
    //     a[k].imag((float)rand() / (float)(RAND_MAX));
    // }
    a[0].real(5.91); a[0].imag(-5.96); a[1].real(7.09); a[1].imag(2.72); a[2].real(7.78); a[2].imag(-4.06); a[3].real(-0.79); a[3].imag(-7.21);
    a[4].real(-3.15); a[4].imag(-4.08); a[5].real(-1.89); a[5].imag(3.27); a[6].real(4.57); a[6].imag(-2.07); a[7].real(-3.88); a[7].imag(-3.30);
    a[8].real(-4.89); a[8].imag(4.20); a[9].real(4.10); a[9].imag(-6.70); a[10].real(3.28); a[10].imag(-3.84); a[11].real(3.84); a[11].imag(1.19);

    // --- Allocating space for the singular vector matrices
#ifdef FULL_SVD
    std::complex<float> *u = (std::complex<float> *)malloc(Nrows * LDU * sizeof(std::complex<float>));
    std::complex<float> *vt = (std::complex<float> *)malloc(Ncols * LDVT * sizeof(std::complex<float>));
#endif

    // --- Allocating space for the singular values
    float *s = (float *)malloc(std::min(Nrows, Ncols) * sizeof(float));
#ifdef GESVD
    float *superb = (float *)malloc((std::min(Nrows, Ncols) - 1) * sizeof(float));
#endif

    // --- Print and/or save input matrix
#ifdef PRINT_MATRIX
    print_matrix_row("Matrix (stored rowwise)", Ncols, Nrows, a, LDA);
#endif
#ifdef SAVE_MATRIX
    saveCPUcomplextxt(a, "/home/angelo/Project/SVD/MKL/a.txt", Ncols * Nrows);
#endif

    // --- Compute singular values
    MKL_INT info;
    float timing = 0.f;
    for (int k = 0; k < numRuns; k++) {
        timerCPU.StartCounter();
        // --- The content of the input matrix a is destroyed on output
#if defined(FULL_SVD) && defined(GESDD)
        printf("Running Full SVD - GESDD\n");
        MKL_INT info = LAPACKE_cgesdd(LAPACK_ROW_MAJOR, 'A', Nrows, Ncols, (MKL_Complex8 *)a, LDA, s, (MKL_Complex8 *)u, LDU, (MKL_Complex8 *)vt, LDVT);
#endif
#if !defined(FULL_SVD) && defined(GESDD)
        printf("Running singular values only - GESDD\n");
        MKL_INT info = LAPACKE_cgesdd(LAPACK_ROW_MAJOR, 'N', Nrows, Ncols, (MKL_Complex8 *)a, LDA, s, NULL, LDU, NULL, LDVT);
#endif
#if defined(FULL_SVD) && defined(GESVD)
        printf("Running Full SVD - GESVD\n");
        MKL_INT info = LAPACKE_cgesvd(LAPACK_ROW_MAJOR, 'A', 'A', Nrows, Ncols, (MKL_Complex8 *)a, LDA, s, (MKL_Complex8 *)u, LDU, (MKL_Complex8 *)vt, LDVT, superb);
#endif
#if !defined(FULL_SVD) && defined(GESVD)
        printf("Running singular values only - GESVD\n");
        MKL_INT info = LAPACKE_cgesvd(LAPACK_ROW_MAJOR, 'N', 'N', Nrows, Ncols, (MKL_Complex8 *)a, LDA, s, NULL, LDU, NULL, LDVT, superb);
#endif
        if(info > 0) { // --- Check for convergence
            printf( "The algorithm computing SVD failed to converge.\n" );
            exit(1);
        }
        timing = timing + timerCPU.GetCounter();
    }
    printf("Timing = %f\n", timing / numRuns);

    // --- Print and/or save singular values
#ifdef PRINT_SINGULAR_VALUES
    print_rmatrix("Singular values", 1, Ncols, s, 1);
#endif
#ifdef SAVE_SINGULAR_VALUES
    saveCPUrealtxt(s, "/home/angelo/Project/SVD/MKL/s.txt", std::min(Nrows, Ncols));
#endif

    // --- Print left singular vectors
#ifdef PRINT_LEFT_SINGULAR_VECTORS
    print_matrix_col("Left singular vectors (stored columnwise)", Ncols, Ncols, u, LDU);
#endif
#if defined(FULL_SVD) && defined(SAVE_LEFT_SINGULAR_VECTORS)
    saveCPUcomplextxt(u, "/home/angelo/Project/SVD/MKL/u.txt", Nrows * LDU);
#endif

    // --- Print right singular vectors
#ifdef PRINT_RIGHT_SINGULAR_VECTORS
    print_matrix_col("Right singular vectors (stored rowwise)", Ncols, Nrows, vt, LDVT);
#endif
#if defined(FULL_SVD) && defined(SAVE_RIGHT_SINGULAR_VECTORS)
    saveCPUcomplextxt(vt, "/home/angelo/Project/SVD/MKL/vt.txt", Ncols * LDVT);
#endif

    exit(0);
}
compiled as
g++ -DMKL_ILP64 -fopenmp -m64 -I$MKLROOT/include svdMKLComplexSingle.cpp TimingCPU.cpp InputOutput.cpp -L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm -fno-exceptions
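One thing that might be worth ruling out: the code's own comment notes that the input matrix a is destroyed on output, yet the call sits inside a 20-iteration timing loop and a is filled only once. If that reading is right, every run after the first works on overwritten data, and the singular values printed at the end come from the last run. The sketch below shows the usual remedy of restoring a pristine copy before each run. `run_repeated` and the callback are hypothetical names, with the callback standing in for the LAPACKE call.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Runs `op` num_runs times, each time on a fresh copy of `input`, and returns
// the value produced by the last run. `op` is allowed to overwrite its argument.
double run_repeated(const std::vector<float>& input, int num_runs,
                    const std::function<double(std::vector<float>&)>& op) {
    assert(num_runs > 0);
    double last = 0.0;
    std::vector<float> work;
    for (int k = 0; k < num_runs; ++k) {
        work = input;     // restore the original data before every run
        last = op(work);  // op may destroy `work`
    }
    return last;
}
```

With the copy in place, all runs see the same matrix, so the timing loop no longer affects the reported result. The copy itself would ideally be kept outside the timed region.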
Segmentation faults with sparse FEAST
Dear all,
I am using the dfeast_scsrev interface to compute eigenvalues and eigenvectors of a sparse matrix stored in CSR format (3-vector).
It works fine for small sparse matrices of size ~10,000. However, I get a segmentation fault for sparse matrices of size ~130,000 or larger.
Below is the error message I got:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
snpblup 000000000438CD21 tbk_trace_stack_i Unknown Unknown
snpblup 000000000438AE5B tbk_string_stack_ Unknown Unknown
snpblup 00000000043418B4 Unknown Unknown Unknown
snpblup 00000000043416C6 Unknown Unknown Unknown
snpblup 00000000042EACD7 Unknown Unknown Unknown
snpblup 00000000042EF0B0 Unknown Unknown Unknown
libpthread-2.17.s 00002AAAAACDE100 Unknown Unknown Unknown
snpblup 000000000047D059 Unknown Unknown Unknown
snpblup 0000000000451F4A Unknown Unknown Unknown
snpblup 000000000041DBFA Unknown Unknown Unknown
snpblup 0000000000407A10 Unknown Unknown Unknown
snpblup 000000000040741E Unknown Unknown Unknown
libc-2.17.so 00002AAAABD50B15 __libc_start_main Unknown Unknown
snpblup 0000000000407329 Unknown Unknown Unknown
To run it I used 10 (OMP) threads. I set ulimit -s unlimited and exported KMP_STACKSIZE equal to 20G.
I am not sure where the issue is. Any help would be appreciated.
Thank you.
Jeremie
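When a sparse routine crashes only for large inputs, it can help to validate the three CSR arrays on a small case first. The sketch below builds a one-based, upper-triangular 3-array CSR representation from a small dense symmetric matrix. It is an illustration of the format only, under the assumption that the upper triangle is what gets passed in. `CsrUpper` and `dense_to_csr_upper` are made-up names, and the `int` index type corresponds to an LP64 build.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct CsrUpper {
    std::vector<double> values;  // nonzeros of the upper triangle, row by row
    std::vector<int> columns;    // one-based column index of each value
    std::vector<int> row_index;  // one-based start of each row, n + 1 entries
};

// Builds one-based 3-array CSR storage for the upper triangle of a dense matrix.
CsrUpper dense_to_csr_upper(const std::vector<std::vector<double>>& a) {
    const std::size_t n = a.size();
    // With 32-bit index arrays, every index has to fit in an int.
    assert(n * n < static_cast<std::size_t>(std::numeric_limits<int>::max()));
    CsrUpper csr;
    csr.row_index.reserve(n + 1);
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n);
        csr.row_index.push_back(static_cast<int>(csr.values.size()) + 1);
        for (std::size_t j = i; j < n; ++j) {
            if (a[i][j] != 0.0) {
                csr.values.push_back(a[i][j]);
                csr.columns.push_back(static_cast<int>(j) + 1);
            }
        }
    }
    csr.row_index.push_back(static_cast<int>(csr.values.size()) + 1);
    return csr;
}
```

The last entry of row_index is always the number of stored values plus one, which gives a quick consistency check for the arrays passed to the solver.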
MKL library
Sir,
I have to install a code that requires linking against the LAPACK and BLAS libraries.
The code was written in 2009 using MKL version 8. According to it, the paths for linking are
LROOT = /opt/intel/mkl/lib/intel64/
LAPACK = -lmkl_lapack -lmkl
BLAS = -L$(LROOT) -lmkl_intel64 -lguide -lpthread
LFLAGS = $(LIBSCE) $(BLAS) $(LAPACK)
Now I have the 2016 version of MKL. It does not have guide, mkl, pthread, etc.
I know that
-lmkl_lapack is replaced by -lmkl_lapack95_ilp64
How should I modify the commands to link and compile with the 2016 version?
Thanks,
ab
Difference between cgesvd and LAPACKE_cgesvd or cgesdd and LAPACKE_cgesdd
I would like to know the difference between
LAPACKE_cgesvd
(see https://software.intel.com/en-us/node/521150) and
cgesvd
Is LAPACKE_cgesvd just a wrapper around cgesvd that automatically selects the resources? Is there any performance improvement or penalty in using one rather than the other?
Thank you very much for any help.
MKL FFT: fftw_mpi_plan_many_transpose
Hello,
I have been trying to use the MKL FFT through the FFTW3 interface and now am stuck with plan creation. The following line of code produces a NULL value, which is subsequently caught and kills the program:
plan_t_XY = fftw_mpi_plan_many_transpose(g.nxrtot, g.nyrtot, (g.nzrtot+2), g.nxrtot/g.nProc, g.nyrtot/g.nProc, wr1, wr1, MPI_COMM_WORLD, FFT_PLANNING);
Each of the *tot variables is an integer power of 2, as is g.nProc. wr1 is a pointer to a float work array, and FFT_PLANNING is set to FFTW_MEASURE.
Any ideas why plan_t_XY = NULL after this?
Problems about how to use multithreaded intel MKL
Hello,
I have some problems with multithreaded Intel MKL.
1. I use the function MKL_SET_NUM_THREADS(2) to change the number of threads. Then I want to check the value of MKL_NUM_THREADS by calling getnv (mklname, value) with mklname = MKL_NUM_THREADS. The value is always 1.
I wonder whether the value of MKL_NUM_THREADS changes only when the parallelization starts.
If so, is there a way to check the value of MKL_NUM_THREADS? How do I know whether the value of MKL_NUM_THREADS has really changed after I set it?
2. I use the function MKL_SET_NUM_THREADS(2) to change the number of threads. Judging by the runtime, it works, and the CPU utilization is about 200%. The function I tested is DGEMM. But when I test DGEQP3 or DGEQPF, the runtime does not change and the CPU utilization is always 100%.
I think there is something wrong, but I can't figure it out.
I want to know why.
Are there any examples showing how to use multithreaded Intel MKL?
Thank you !
OS: Linux, Red Hat 5.5
MKL: compilerpro-12.0.0.084
Ifort: composerxe-2011.0.084
Basic linking problem with MKL library
Hi everybody,
Long ago I used MKL 9.1.027, and used it with great joy to run some programs that included a lot of matrix operations. It was easy to include in a program: just right-click the project -> Properties and set:
c/c++-->general-->Additional include directories C:\Program Files\Intel\MKL\9.1.027\include
linker-->general-->Additional library directories C:\Program Files\Intel\MKL\9.1.027\ia32\lib
linker-->input-->additional dependencies libguide40.lib mkl_c.lib
and in the code to use
extern "C" {//
#include"mkl_cblas.h"
#include"mkl_lapack.h"
#include "mkl_service.h"
#include "mkl.h"
}//
and the programs worked and the library performed really well. Now I would like to see whether there are any improvements that could benefit my programs, so I downloaded the free MKL library w_mkl_2017.3.210.exe, installed it (got something like IntelSWTools\compilers_and_libraries_2017.4.210), went to right-click project -> Properties, and removed the settings that were used for MKL 9.1.027. Then, per the instructions on the website, I first went to Configuration Properties -> Intel Performance Libraries and set the Use Intel MKL option to Parallel. I tried to compile the program in Visual Studio, and it compiled, but when I run the program it says "The application was unable to start correctly (0xc000007b). Click OK to close the application". Then I used the command-line tool to give me the dependencies to enter manually: I need static linking, for Windows, ia32 architecture, OpenMP, in Visual Studio. The tool told me what to use and I placed the suggestions (hopefully in the right place), but the result was the same: I got the exact same message.
As above I added:
c/c++-->general-->Additional include directories C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.4.210\windows\mkl\include
linker-->general-->Additional library directories C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.4.210\windows\mkl\lib\ia32
C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.4.210\windows\mkl\..\compiler\lib\ia32
linker-->input-->additional dependencies mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib
I have Windows 7 Ultimate and Visual Studio 2013 Ultimate.
Any and all help with getting the MKL library to run is more than welcome. It can be either automatic linking via Intel Performance Libraries or manually setting the needed fields (I would prefer the latter, but anything is OK).
Thanks a lot in advance.
Access violation in dss_solve_real
Hello,
We are calling MKL from C# code using the general mechanism described in the article Using Intel® MKL in your C# program.
The code uses DSS (Direct Sparse Solver) for a large system of equations with numerous different right-hand sides. That is, it makes one call to dss_factor_real and a lot of subsequent calls to dss_solve_real. Generally it works fine. But sometimes, at an advanced iteration, an error occurs in dss_solve_real:
"Attempted to read or write protected memory. This is often an indication that other memory is corrupt."
So far no particular trigger for this error has been identified. In the past I noticed that if a memory allocation occurs after the internal DSS structures are created (i.e. after dss_factor_real, which obviously must take up a lot of space due to the LU decomposition), then a subsequent call to a DSS computing function is likely to cause the error above. I am therefore careful not to make any explicit allocations while the DSS handle is in use. But the error still occurs once in a while.
Just in case, threading is disabled by setting MKL_NUM_THREADS=1.
Any help will be greatly appreciated.
Fortran95 Interface causes segfault
I am trying to use zheevd or zheev through the Fortran95 interface:
program main
  use lapack95
  implicit none
  complex(8) :: H(2,2), w(2)
  H = 1d0
  call zheevd(H,w, 'V')
  write (*,*) H
  write (*,*) "######"
  write (*,*) w
end program main
which I compile with:
ifort test.f90 -o bla.x ${MKLROOT}/lib/intel64/libmkl_blas95_lp64.a ${MKLROOT}/lib/intel64/libmkl_lapack95_lp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl -I${MKLROOT}/include/intel64/lp64 -I${MKLROOT}/include
as suggested by the Link Line Advisor. When I run this simple example I get:
> $ ./bla.x
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
bla.x 00000000016D7984 Unknown Unknown Unknown
libpthread-2.23.s 00007F083A8F6390 Unknown Unknown Unknown
bla.x 0000000000404257 Unknown Unknown Unknown
bla.x 0000000000403B41 Unknown Unknown Unknown
bla.x 00000000004039B2 Unknown Unknown Unknown
bla.x 000000000040392E Unknown Unknown Unknown
libc-2.23.so 00007F083A02F830 __libc_start_main Unknown Unknown
bla.x 0000000000403829 Unknown Unknown Unknow
How do I use the Fortran95 interface correctly? The 77 interface is just too verbose for my liking.