Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

How to compile and run Quantum Espresso with intel MKL and Open MPI on Mac OS X


I want to run Quantum Espresso faster on a Mac Pro (multi-core). I think using Intel MKL (+ FFTW3) and Open MPI with the Intel compilers is the best solution. It was a process of trial and error, but I finally managed to compile pw.x. However, when I run pw.x, for example

~/q-e/bin/pw.x < graphene.scf.in > graphene.scf.out

I got an error:

forrtl: severe (174): SIGSEGV, segmentation fault occurred 
Image PC Routine Line Source
pw.x 000000011021EB14 Unknown Unknown Unknown
libsystem_platfor 00007FFF560EFF5A Unknown Unknown Unknown
pw.x 000000010FE3D7CB Unknown Unknown Unknown
....

I assume that something in the compilation is wrong. My machine environment and build steps are shown below.

  • Mac Pro : 2.7 GHz 12-Core Intel Xeon E5 : 64 GB 1866 MHz DDR3 : High Sierra 10.13.6
  • Xcode : 9.4.1
  • Intel compiler(fortran) : compilers_and_libraries_2019.1.144
  • Intel compiler(C++,MKL) : compilers_and_libraries_2019.2.184
  • Open MPI : 3.0.3

 I also ran a "modify install" to add "Cluster support" and "Intel MKL core libraries for Fortran" to compilers_and_libraries_2019.2.184.

  1. path settings
    1. source /opt/intel/compilers_and_libraries_2019.1.144/mac/bin/compilervars.sh intel64
      source /opt/intel/compilers_and_libraries_2019.2.184/mac/bin/compilervars.sh intel64
    2. source /opt/intel/compilers_and_libraries_2019/mac/mkl/bin/mklvars.sh intel64
      
  2. compile FFTW3 with intel compiler

    1. cd ${MKLROOT}/interfaces/fftw3xf
      sudo make libintel64 compiler=intel
      
    2. cp libfftw3xf_intel.a ${MKLROOT}/lib/
      
  3. compile Open MPI with intel compiler
    1. ./configure --prefix=/opt/openmpi CC=icc CXX=icpc F77=ifort FC=ifort
      sudo make
      sudo make install
      
    2. echo 'export PATH=/opt/openmpi/bin:$PATH'>> .bashrc
      
  4. make settings for using mkl on openmpi
    1. cd ${MKLROOT}/interfaces/mklmpi
      sudo make libintel64 interface=ilp64 MPICC='mpicc' INSTALL_LIBNAME='libmkl_blacs_openmpi_ilp64'
    2. cp obj_ilp64/libmkl_blacs_openmpi_ilp64.a ${MKLROOT}/lib/
      
  5. make QE (only PW)
    1. ./configure --enable-parallel --with-scalapack=openmpi --disable-openmp F90=ifort
        MPIF90=mpif90 CC=mpicc CXX=icc F77=ifort
        LAPACK_LIBS="${MKLROOT}/lib/libmkl_intel_ilp64.a ${MKLROOT}/lib/libmkl_sequential.a
        ${MKLROOT}/lib/libmkl_core.a -lpthread -lm -ldl"
        BLAS_LIBS="${MKLROOT}/lib/libmkl_intel_ilp64.a ${MKLROOT}/lib/libmkl_sequential.a
        ${MKLROOT}/lib/libmkl_core.a -lpthread -lm -ldl"
        FFT_LIBS="${MKLROOT}/lib/libfftw3xf_intel.a -lmkl_intel_ilp64 -lmkl_sequential
        -lmkl_core -lpthread"
        MPI_LIBS="-L/opt/openmpi/lib"
        DFLAGS="-D__INTEL -D__FFTW3 -D__MPI -D__SCALAPACK -D__PARA"
        IFLAGS="-I/Users/username/q-e/include -I/Users/username/q-e/FoX/finclude
        -I/Users/username/q-e/S3DE/iotk/include/
        -I${MKLROOT}/include -I${MKLROOT}/include/fftw"
        SCALAPACK_LIBS="${MKLROOT}/lib/libmkl_scalapack_ilp64.a
        ${MKLROOT}/lib/libmkl_cdft_core.a ${MKLROOT}/lib/libmkl_intel_ilp64.a
        ${MKLROOT}/lib/libmkl_sequential.a ${MKLROOT}/lib/libmkl_core.a
        ${MKLROOT}/lib/libmkl_blacs_openmpi_ilp64.a -lpthread -lm -ldl"
        LDFLAGS="-static-intel"
    2. make -j8 pw

* At step 4.1, the following warnings appeared:

mpicc -c -Wall -fPIC    -DMKL_ILP64 -I../../include mklmpi-impl.c -o obj_ilp64/mklmpi-impl.o
mklmpi-impl.c(87): warning #1786: variable "ompi_mpi_ub" (declared at line 928 of "/opt/openmpi/include/mpi.h") was declared deprecated ("MPI_UB is deprecated in MPI-2.0")
      RETURN_IF(xdatatype,MPI_UB);
      ^

mklmpi-impl.c(338): warning #1786: function "MPI_Address" (declared at line 1201 of "/opt/openmpi/include/mpi.h") was declared deprecated ("MPI_Address is superseded by MPI_Get_address in MPI-2.0")
  	int err = MPI_Address(location, address);
  	          ^

mklmpi-impl.c(397): warning #1786: function "MPI_Attr_get" (declared at line 1241 of "/opt/openmpi/include/mpi.h") was declared deprecated ("MPI_Attr_get is superseded by MPI_Comm_get_attr in MPI-2.0")
      int res = MPI_Attr_get(X2COMM(comm),X2KEYVAL(keyval),attribute_val,flag);
                ^

Sorry for writing such a long message. How should I modify the above steps to get Quantum Espresso running?

FYI: I read the topic "How to correctly compile Quantum Espresso with intel MKL especially intel FFTW" before posting my question in this forum.


MKL 5 times slower when called from a MEX function in Octave


Hi,

I'm trying to use the Intel MKL library in an Octave MEX function, but the performance I achieve with some MKL functions such as cblas_cgemm is 5 times slower when called from Octave than from a compiled C executable. I'm using the same compilation flags for the C code and the MEX function in my testing, where I basically compare the speed of a very simple C matrix multiplication script and the same script wrapped in a MEX function (find this short example attached).

This is how I compile the C code:

gcc -I/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/include -Wall -L/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64 -o cgemm_test_c matmult_c.c -lmkl_gnu_thread -lmkl_rt -lmkl_core -lmkl_intel_ilp64 -lgomp -lpthread -lm -ldl

This is how I compile the MEX function:

mex -v -I/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/include -Wall -L/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64 -o cgemm_test_mex matmult_c.c matmult_mex.c -lmkl_gnu_thread -lmkl_rt -lmkl_core -lmkl_intel_ilp64 -lgomp -lpthread -lm -ldl

And this is what the mex command is really doing:

gcc -c -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/octave-4.2.2/octave/.. -I/usr/include/octave-4.2.2/octave -I/usr/include/hdf5/serial  -pthread -fopenmp -g -O2 -fdebug-prefix-map=/build/octave-DtqyIg/octave-4.2.2=. -fstack-protector-strong -Wformat -Werror=format-security  -Wall  -I. -I/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/include  -DMEX_DEBUG matmult_c.c -o matmult_c.o

gcc -c -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/octave-4.2.2/octave/.. -I/usr/include/octave-4.2.2/octave -I/usr/include/hdf5/serial  -pthread -fopenmp -g -O2 -fdebug-prefix-map=/build/octave-DtqyIg/octave-4.2.2=. -fstack-protector-strong -Wformat -Werror=format-security  -Wall  -I. -I/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/include  -DMEX_DEBUG matmult_mex.c -o matmult_mex.o

g++ -I/usr/include/octave-4.2.2/octave/.. -I/usr/include/octave-4.2.2/octave -I/usr/include/hdf5/serial -I/usr/include/mpi  -pthread -fopenmp -g -O2 -fdebug-prefix-map=/build/octave-DtqyIg/octave-4.2.2=. -fstack-protector-strong -Wformat -Werror=format-security -shared -Wl,-Bsymbolic  -Wall -o cgemm_test_mex.mex  matmult_c.o matmult_mex.o   -L/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64 -lmkl_gnu_thread -lmkl_rt -lmkl_core -lmkl_intel_ilp64 -lgomp -lpthread -lm -ldl -L/usr/lib/x86_64-linux-gnu/octave/4.2.2 -L/usr/lib/x86_64-linux-gnu -loctinterp -loctave -Wl,-Bsymbolic-functions -Wl,-z,relro

Test results:

C code: Elapsed time per multiplication: ~1.86 ms

MEX code: Elapsed time per multiplication: ~8.55 ms

 

I have tested different optimisation flags, but the results are virtually the same. This has been tested on two Intel machines with Ubuntu 18.04 and Ubuntu 14.04, yielding very similar results in all cases. MKL environment variables are set via "source /opt/intel/mkl/bin/mklvars.sh intel64".
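
For reference, this is roughly what the timed kernel looks like; a minimal sketch written for this post (the matrix size, repetition count and data are made up, and it is not the attached matmult_c.c):

/* Hypothetical reconstruction of the benchmark kernel (not the attached file). */
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    const MKL_INT n = 512;          /* assumed matrix size */
    const int reps = 100;           /* assumed repetition count */
    MKL_Complex8 *a = mkl_malloc(sizeof(MKL_Complex8) * n * n, 64);
    MKL_Complex8 *b = mkl_malloc(sizeof(MKL_Complex8) * n * n, 64);
    MKL_Complex8 *c = mkl_malloc(sizeof(MKL_Complex8) * n * n, 64);
    MKL_Complex8 alpha = {1.0f, 0.0f}, beta = {0.0f, 0.0f};

    for (MKL_INT i = 0; i < n * n; ++i) {
        a[i].real = 1.0f; a[i].imag =  0.5f;
        b[i].real = 2.0f; b[i].imag = -0.5f;
        c[i].real = 0.0f; c[i].imag =  0.0f;
    }

    double t0 = dsecnd();           /* MKL wall-clock timer */
    for (int r = 0; r < reps; ++r)
        cblas_cgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, &alpha, a, n, b, n, &beta, c, n);
    double t1 = dsecnd();

    printf("Elapsed time per multiplication: %.3f ms\n", (t1 - t0) / reps * 1e3);

    mkl_free(a); mkl_free(b); mkl_free(c);
    return 0;
}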

 

Many thanks in advance,

Juan.

Attachments: matmult_c.c (1.45 KB), matmult_mex.c (136 bytes), matmult.h (134 bytes)

Memory Leak in MKL


Our program analyzes several images every few seconds. Each image is processed in its own thread. The threads are ordinary Windows threads. MKL functions are used in the process. We used a memory leak detection tool to find leaks in the application (Memory Validator by Software Verification). The tool revealed a leak in calls to MKL vsMul function (32 bits for each image). The call stack is as follows:

vsMul->mkl_vml_kernel_GetTTableIndex->_vmlGetThreadLocalData

I found a note in the Intel documentation (https://software.intel.com/en-us/mkl-windows-developer-guide-avoiding-me...) and accordingly tried both the mkl_disable_fast_mm function and the MKL_DISABLE_FAST_MM environment variable. Neither of them helped to eliminate the memory leak.
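
For context, here is a minimal sketch of how those two mitigations were applied, plus a per-thread buffer release; the worker routine and the idea that mkl_thread_free_buffers() might help here are my own assumptions, not something the documentation promises:

#include "mkl.h"

/* Sketch only: disable MKL's fast memory manager before any MKL call
   (equivalent to setting MKL_DISABLE_FAST_MM=1 in the environment),
   and release the calling thread's MKL buffers when a worker finishes. */
static void worker_process_image(const float *a, const float *b, float *r, MKL_INT n)
{
    vsMul(n, a, b, r);            /* element-wise product, as in the leak report */

    /* Assumption: freeing this thread's MKL buffers on exit may release
       memory that the leak detector attributes to vsMul. */
    mkl_thread_free_buffers();
}

int main(void)
{
    /* Must be called before the first MKL routine to take effect. */
    mkl_disable_fast_mm();

    /* In the real application this runs in each Windows worker thread. */
    float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8}, r[4];
    worker_process_image(a, b, r, 4);
    return 0;
}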

How can I overcome this memory leak problem?

Typos in the mkl_spblas.h?


Some of the function declarations in mkl_spblas.h do not agree with the documentation. In particular, in the create and export routines, the parameter names for the CSC format coincide with those for CSR format, which I don't believe accurately describes the nature of the parameters (e.g. `rows_start` and `col_indx` should be `cols_start` and `row_indx`). 

Below are a few of the declarations I find in the mkl_spblas.h file:

sparse_status_t mkl_sparse_d_create_csc(
                                             sparse_matrix_t           *A,
                                             const sparse_index_base_t indexing, /* indexing: C-style or Fortran-style */
                                             const MKL_INT             rows,
                                             const MKL_INT             cols,
                                             MKL_INT             *rows_start,
                                             MKL_INT             *rows_end,
                                             MKL_INT             *col_indx,
                                             double              *values );
/* cf. https://software.intel.com/en-us/mkl-developer-reference-c-mkl-sparse-cr... */

sparse_status_t mkl_sparse_d_export_csc(
                                             const sparse_matrix_t  source,
                                             sparse_index_base_t    *indexing,      /* indexing: C-style or Fortran-style */
                                             MKL_INT                *rows,
                                             MKL_INT                *cols,
                                             MKL_INT                **rows_start,
                                             MKL_INT                **rows_end,
                                             MKL_INT                **col_indx,
                                             double                 **values );
/* cf. https://software.intel.com/en-us/mkl-developer-reference-c-mkl-sparse-ex... */

Bad interaction between mkl_dynamic and potri in MKL 19.02?


Hi

The program below implements the inversion of an autoregressive matrix.

Program Test
  use blas95
  use lapack95
  USE IFPORT
  use mkl_service
  implicit none
  integer(kind=8) :: istat, n, c1, c2, ise
  integer(kind=4) :: dy
  character(len=200) :: msg
  Real(kind=8), allocatable :: A(:,:)
  real(kind=8) :: r1=0.0D0, r2=0.0D0
  outer:block
    dy=1
    write(*,*) "dynamic: ", dy
    call mkl_set_dynamic(dy)
    call mkl_set_num_threads(mkl_get_max_threads())
    n=10000
    write(*,"(*(g0"",""))") n
    r1=dclock()
    !!start building the matrix
    allocate(&
      &A(n,n),&
      &stat=istat,errmsg=msg)
    if(istat/=0) Then
      write(*,*) msg;exit outer
    end if
    !$OMP PARALLEL DO PRIVATE(c1)
    Do c1=1,size(A,2)
      Do c2=c1,size(A,1)
        A(c2,c1)=0.5**(c2-c1)
      end Do
    end Do
    !$OMP END PARALLEL DO
    ise=size(A,1)
    !$OMP PARALLEL DO PRIVATE(c1) FIRSTPRIVATE(ise)
    Do c1=1,ise-1
      A(c1,(c1+1):ise)=A((c1+1):ise,c1)
    End Do
    !$OMP END PARALLEL DO
    r2=Dclock()
    write(*,*) "alloc: ", r2-r1
    !!end building matrix
    r2=Dclock()
    call potrf(A=A,UPLO="U",INFO=istat)
    r1=dclock()
    write(*,*) "potrf: ",r1-r2
    call POTRI(A=A,Info=istat)
    r2=dclock()
    write(*,*) "potri: ",r2-r1
  End block outer
End Program Test

 

With mkl_dynamic set to 0 or 1, I noticed hardly any difference in processing time when using MKL 17.08.

mkl_dynamic=0:

potrf: 0.88 seconds, potri: 2.12 seconds

mkl_dynamic=1

potrf: 0.58 seconds, potri: 2.09 seconds

However, with mkl 19.02 the differences are such that mkl_dynamic=0 makes the program unusable.

mkl_dynamic=0:

potrf: 0.37 seconds, potri: 110.69 seconds

mkl_dynamic=1

potrf: 0.37 seconds, potri: 1.11 seconds

Times were obtained on Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz.

Environment variables were:

MKL_NUM_THREADS=36

KMP_AFFINITY=granularity=core,scatter

 

I noticed that potri in MKL 19.02 uses all 72 threads (including hyperthreads) for much of the time.
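
For reference, a reduced C sketch of the same measurement (my own reconstruction with LAPACKE and dsecnd, not the Fortran test above; the threading calls and layout are assumed):

/* Reduced sketch of the timing comparison: toggle mkl_set_dynamic(0/1)
   and compare dpotrf/dpotri times on the same autoregressive matrix. */
#include <stdio.h>
#include <math.h>
#include "mkl.h"

int main(void)
{
    const MKL_INT n = 10000;
    double *a = mkl_malloc(sizeof(double) * n * n, 64);

    mkl_set_dynamic(1);                        /* change to 0 to reproduce the slow case */
    mkl_set_num_threads(mkl_get_max_threads());

    /* Symmetric matrix a(i,j) = 0.5**|i-j|, column-major */
    for (MKL_INT j = 0; j < n; ++j)
        for (MKL_INT i = 0; i < n; ++i) {
            MKL_INT d = (i > j) ? i - j : j - i;
            a[i + j * n] = pow(0.5, (double)d);
        }

    double t0 = dsecnd();
    MKL_INT info = LAPACKE_dpotrf(LAPACK_COL_MAJOR, 'U', n, a, n);
    double t1 = dsecnd();
    printf("potrf: %.2f s (info=%lld)\n", t1 - t0, (long long)info);

    info = LAPACKE_dpotri(LAPACK_COL_MAJOR, 'U', n, a, n);
    double t2 = dsecnd();
    printf("potri: %.2f s (info=%lld)\n", t2 - t1, (long long)info);

    mkl_free(a);
    return 0;
}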

 

Is this a newly introduced bug, or am I doing something wrong?

 

Thanks

Keep some values of a vector


Hello, I want to keep some elements of a vector without using a for-loop. For example,

x = [ 10 20 30 40 .... 100] and index = [1 5 10];

In MATLAB you use x(1,index). 

Could you please tell me if there is a routine for this in MKL?
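
To make the question concrete, here is a small sketch of that selection using cblas_dgthr (MKL's sparse BLAS level 1 gather), assuming that routine is the right fit and that the C interface takes zero-based indices:

/* Sketch only: picked[i] = x[index[i]], i.e. MATLAB's x(index).
   Assumes cblas_dgthr is available and zero-based indices for the C interface. */
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    double  x[10]    = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
    MKL_INT index[3] = {0, 4, 9};   /* MATLAB's [1 5 10], shifted to base 0 */
    double  picked[3];

    cblas_dgthr(3, x, picked, index);

    printf("%g %g %g\n", picked[0], picked[1], picked[2]);  /* 10 50 100 */
    return 0;
}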

Thank you very much.

F.

pardiso out of memory (-2) in phase 33


Hi,

PARDISO stops with error message -2 in phase 33 when increasing the number of right-hand sides from 600 to 700. With 600 RHS it runs fine, with a memory use of 146.2 GB ("RES") reported by "top". The system has 256 GB of RAM, so more than 40% is still free. What is the issue when increasing the number of RHS by 100?

Thanks

Some questions about the FFT's half-length result


Hi Intel experts,

I am writing to ask about the FFT's half-length result. In MATLAB I use fft to transform a real array and then ifft to recover the original array. However, in MKL I only get N/2 + 1 results. Is there a configuration that returns the full-length result? If not, maybe I could conjugate the first half of the result and mirror it to fill the second half, since that is how MATLAB's FFT output of a real input is arranged.
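
For what it's worth, here is a minimal sketch of that mirroring written out explicitly (my own example: the array length, the DFTI configuration and the conjugate-even unpacking are assumed, and it is untested):

/* Sketch: forward real FFT with DFTI, then rebuilding the full-length
   spectrum from the N/2+1 conjugate-even results via X[N-k] = conj(X[k]). */
#include <stdio.h>
#include "mkl.h"

#define N 8

int main(void)
{
    double x[N] = {1, 2, 3, 4, 5, 6, 7, 8};    /* real input */
    MKL_Complex16 half[N / 2 + 1];              /* what MKL returns */
    MKL_Complex16 full[N];                      /* MATLAB-style full spectrum */

    DFTI_DESCRIPTOR_HANDLE h = NULL;
    MKL_LONG status = DftiCreateDescriptor(&h, DFTI_DOUBLE, DFTI_REAL, 1, (MKL_LONG)N);
    status = DftiSetValue(h, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    status = DftiSetValue(h, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
    status = DftiCommitDescriptor(h);
    status = DftiComputeForward(h, x, half);
    DftiFreeDescriptor(&h);
    if (status != 0) { printf("DFTI error %ld\n", (long)status); return 1; }

    /* First half is returned directly, second half is its mirrored conjugate. */
    for (int k = 0; k <= N / 2; ++k)
        full[k] = half[k];
    for (int k = N / 2 + 1; k < N; ++k) {
        full[k].real =  half[N - k].real;
        full[k].imag = -half[N - k].imag;
    }

    for (int k = 0; k < N; ++k)
        printf("X[%d] = %g %+gi\n", k, full[k].real, full[k].imag);
    return 0;
}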

I would be grateful to receive a prompt reply and I am sorry for my broken English.

Thanks, xD.


Question about matrix inverse


Recently, I was trying to replace our matrix-inversion function with MKL routines. However, after doing so, I found that the computation time was not improved and was actually slower. I don't know the real reason. I would appreciate it if anyone could help me.

The original subroutine was syminvg.

SUBROUTINE syminvg(n,q,norm,nsing,parflg,sbrName,bigajj_limit,utest)

! -------------------------------------------------------------------------
! Purpose:    Inversion symm. matrix with pivot search on the diagonal.
!             Algorithm acm 150, H. Rutishauser 1963
!             Change: upper triangle of matrix in vektor q(n*(n+1)/2)
!
! Author:     H. Rutishauser, C. Urschl
!
! Created:    24-Jan-2003
!
! Changes:    09-Aug-2010 RD: Generic use, only one version for all purposes
!             27-Oct-2010 SL: use m_bern with ONLY
!             23-May-2011 RD: Special handling for diagonal elements with NaN
!             30-Jul-2012 RD: Evaluate the inversion (only for GRP_AIUB)
!             22-Sep-2012 RD: Change variable name "unit" to "utest"
!
! Copyright:  Astronomical Institute
!             University of Bern
!             Switzerland
! -------------------------------------------------------------------------

 

I used two Intel MKL subroutines:

CALL DPPTRF('U',n,q0,INFO)
CALL DPPTRI('U',n,q0,INFO)
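
For reference, a minimal C sketch of the same two-call sequence through LAPACKE with a toy 3x3 matrix (my own example, not the original routine or data):

/* Minimal sketch of the DPPTRF/DPPTRI sequence via LAPACKE.
   Toy SPD matrix [4 1 1; 1 3 0; 1 0 2], upper triangle in
   column-major packed storage. */
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    double ap[6] = {4.0, 1.0, 3.0, 1.0, 0.0, 2.0};  /* packed upper triangle */
    const MKL_INT n = 3;

    MKL_INT info = LAPACKE_dpptrf(LAPACK_COL_MAJOR, 'U', n, ap);  /* Cholesky */
    if (info == 0)
        info = LAPACKE_dpptri(LAPACK_COL_MAJOR, 'U', n, ap);      /* inverse  */
    printf("info = %lld\n", (long long)info);

    /* ap now holds the upper triangle of the inverse in packed storage. */
    return 0;
}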

mkl_sparse_d_mv returns different values than mkl_dcsrsymv


Hello,

 

We are updating the code in my organisation to get rid of the deprecated Sparse BLAS functions. Our software performs structural calculations with sparse matrices stored as CSR (symmetric, upper, one-based indexing, non-unit diagonal) and force/displacement vectors. Until now we were using mkl_dcsrsymv to multiply matrix by vector (we obtained exactly the same values by encoding the vector as a one-row matrix and using mkl_dcsrmm, a workaround for some segmentation faults we experienced with an older MKL version; btw, we are working with MS Visual Studio C++ on an Intel Core i7).

The fact is, all our operations with the Inspector-Executor functions (building the matrix from a COO, adding, even the results from PARDISO) are fine, but the multiplication with mkl_sparse_d_mv returns a few significantly different values compared with the deprecated function, and that makes our structural analysis fail. For instance, with rank = 1792 and a vector with 496 non-zero values, we find 144 differing values in the result, 36 of them significantly different (at least 1% in absolute value).

This is the code for the operation V_out = M x V_in:

    matrix_descr matrix_descr;
    matrix_descr.type = sparse_matrix_type_t::SPARSE_MATRIX_TYPE_SYMMETRIC;
    matrix_descr.mode = sparse_fill_mode_t::SPARSE_FILL_MODE_UPPER;
    matrix_descr.diag = sparse_diag_type_t::SPARSE_DIAG_NON_UNIT;

    sparse_status_t res = mkl_sparse_d_mv (sparse_operation_t::SPARSE_OPERATION_NON_TRANSPOSE, 1.0, M, matrix_descr, V_in, 0.0, V_out);

The return value is always sparse_status_t::SPARSE_STATUS_SUCCESS, but the values returned into V_out do not match the ones expected.
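
For reference, a reduced sketch that runs both call paths on a tiny symmetric-upper, one-based CSR matrix (toy data made up for this post, not our production matrices):

/* Sketch comparing the deprecated mkl_dcsrsymv with mkl_sparse_d_mv
   on the same symmetric-upper, one-based CSR data.
   A = [1 2 0; 2 3 4; 0 4 5], x = ones -> y should be [3 9 9]. */
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    MKL_INT ia[4] = {1, 3, 5, 6};          /* one-based row pointers (upper part) */
    MKL_INT ja[5] = {1, 2, 2, 3, 3};
    double  a[5]  = {1.0, 2.0, 3.0, 4.0, 5.0};
    MKL_INT n = 3;
    double x[3] = {1.0, 1.0, 1.0};
    double y_old[3], y_new[3];

    /* Deprecated path */
    mkl_dcsrsymv("U", &n, a, ia, ja, x, y_old);

    /* Inspector-Executor path */
    sparse_matrix_t A = NULL;
    sparse_status_t st = mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ONE, n, n,
                                                 ia, ia + 1, ja, a);
    if (st != SPARSE_STATUS_SUCCESS) return 1;

    struct matrix_descr descr;
    descr.type = SPARSE_MATRIX_TYPE_SYMMETRIC;
    descr.mode = SPARSE_FILL_MODE_UPPER;
    descr.diag = SPARSE_DIAG_NON_UNIT;
    mkl_sparse_d_mv(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, A, descr, x, 0.0, y_new);
    mkl_sparse_destroy(A);

    for (int i = 0; i < 3; ++i)
        printf("row %d: old=%g new=%g\n", i, y_old[i], y_new[i]);
    return 0;
}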

Is this a problem you were aware of, or have I found some bug?

 

Thanks,

David B.

Slight Discrepancies in AXPY results vs explicit loops


Hi,

In some of our codes, we have noticed some slight discrepancies in the results when using MKL AXPY calls and our own explicit loops to calculate Y=Y+ALPHA*X.  When we are performing iterative matrix solutions, the final solutions using AXPY can be significantly worse than the solutions using explicit loops instead of AXPY. 

I've attached a test case under 64-bit MS Windows using Intel MKL 2019 update 2.  For some vectors and scale factors, the results are identical down to machine precision. For other vector and scale factor combinations, you can barely maintain the specified floating point precision.  In the attached screen shot, the left column of numbers shows the RMS error between the AXPY result and the explicit loop result for two cases.  In the first case, the two results are different and in the second case no difference is discernible between the two results.  Since AXPY loops have no dependencies between vector entries, it does not seem like it could be a threading issue.  While we are aware that floating point operations are always tricky, we don't understand why independent operations of y(i) = y(i) + alpha*x(i) are returning such different results.

We've tried the various suggestions from the documentation about getting repeatable floating point solutions to no avail.  Any advice would be much appreciated.  Is this just to be expected or is there some other MKL issue going on?
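
For reference, the comparison we are doing reduces to something like the following sketch (made-up data, not the attached test case):

/* Sketch of the comparison: cblas_daxpy versus an explicit loop for
   y = y + alpha*x, reporting the RMS difference between the two results. */
#include <math.h>
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    const MKL_INT n = 1000000;
    double alpha = 0.3330000000000001;          /* arbitrary scale factor */
    double *x  = mkl_malloc(sizeof(double) * n, 64);
    double *y1 = mkl_malloc(sizeof(double) * n, 64);
    double *y2 = mkl_malloc(sizeof(double) * n, 64);

    for (MKL_INT i = 0; i < n; ++i) {
        x[i]  = sin((double)i);
        y1[i] = y2[i] = cos((double)i);
    }

    cblas_daxpy(n, alpha, x, 1, y1, 1);         /* MKL version   */
    for (MKL_INT i = 0; i < n; ++i)             /* explicit loop */
        y2[i] = y2[i] + alpha * x[i];

    double rms = 0.0;
    for (MKL_INT i = 0; i < n; ++i)
        rms += (y1[i] - y2[i]) * (y1[i] - y2[i]);
    printf("RMS difference: %g\n", sqrt(rms / n));

    mkl_free(x); mkl_free(y1); mkl_free(y2);
    return 0;
}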

Thanks,

John

Attachments: xaxpy.zip (828.48 KB), screen_output.png (71.95 KB)

Error building application with mkl


I am having issues building an application. When running make, it runs successfully until linking the executable, where I encounter the following error: "ld: library not found for -liomp5". I tried building the examples in the MKL library and there were no problems. MKLROOT has been set to Intel's default install location. Any idea how I could solve this issue?

I am using MacOS 10.13.6.

Intel® Parallel Studio XE 2019 On linux


Hi,

I've been evaluating Parallel Studio XE 2019 on Linux. I had a problem when compiling a very simple Fortran test code using

"ifort source.f90 -mkl -qopenmp". It worked on the development machine, but when the executable was run on a different machine where Parallel Studio is not installed, I got the error "error while loading shared libraries". What I did was copy libiomp5.so to /usr/lib on that clean machine, so the simple code that prints the number of threads now works. However, as soon as I added the PARDISO solver from MKL to the code, it still works on the development machine, but on the clean machine I get: "kmp_aligned_malloc version VERSION not defined in file libiomp5.so with link time reference".

Any help is appreciated.

Thanks !

Trust Region optimization dtrnlsp_solve aborts intermittently returning 1502


I'm calling dtrnlsp_solve from C++. Depending on changes to the surrounding code, compile options (debug vs. optimized), etc., sometimes it works and sometimes it fails on the first call, returning code 1502. What does that code mean? Any suggestions on how to fix it? I can't find anything wrong in my code.

CENTRAL_MKL_LIB = /depot/intel/math_kernel_library_2018.1.163/mkl

MATH_LIBS_linux64      = -Wl,--start-group $(CENTRAL_MKL_LIB)/intel64/libmkl_intel_lp64.a $(CENTRAL_MKL_LIB)/intel64/libmkl_sequential.a $(CENTRAL_MKL_LIB)/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm

For a while it was working when compiled optimized but aborting when compiled debug. Then I included the call to dtrnlsp_check (see below), and then it would work compiled debug but not optimized. Then I changed some code in another file and now it always fails.

double *xx = (double*) malloc (sizeof (double)*nn);
  double *fjac = (double*) malloc (sizeof (double)*nn*sz2);
  double *fvec= (double*) malloc (sizeof (double)*sz2);
  for(int i=0; i<nn; i++) xx[i]=0.0;
  for(int i=0; i<sz2*nn; i++) fjac[i]=0.0;
  for(int i=0; i<sz2; i++) fvec[i]=0.0;
  double eps[6]={.00001, 0.00001, 0.00001, 0.00001, 0.00001, 0.00001};
  MKL_INT iter1=100;
  MKL_INT iter2=10;
  double rs=0.0;
  int res = dtrnlsp_init(&handle, &nn, &sz2, xx, eps, &iter1, &iter2, &rs);
  if(res != TR_SUCCESS) {
    printf("ERROR SCE dtrnlsp_init failed\n");
    exit(1);
  }
#if 0 
  MKL_INT info[6];
  res = dtrnlsp_check(&handle, &nn, &sz2, fjac, fvec, eps, info);
  if(res != TR_SUCCESS) {
    printf("ERROR SCE dtrnlsp_check failed  code=%d\n", res);
    exit(1);
  }
#endif
  MKL_INT  RCI_Request;
  int success=0;
  while(success == 0){
    res = dtrnlsp_solve(&handle, fvec, fjac, &RCI_Request);
    if(res != TR_SUCCESS) {
      printf("ERROR SCE dtrnlsp_solve failed code=%d\n", res);
      exit(1);
    }

......

Thanks

Greg

Batch normalization implementation


mkl_sparse_d_create_csc different definitions


Hi,

I'm having a problem with CSC-type sparse matrix generation. The function mkl_sparse_d_create_csc is defined in the Intel MKL documentation in the following manner:

sparse_status_t mkl_sparse_d_create_csc (sparse_matrix_t       *A,
                                                                sparse_index_base_t indexing,
                                                                MKL_INT rows,
                                                                MKL_INT cols,
                                                                MKL_INT *cols_start,
                                                                MKL_INT *cols_end,
                                                                MKL_INT *row_indx,
                                                                double *values);

However in the header mkl_spblas.h it is defined differently:

sparse_status_t mkl_sparse_d_create_csc(sparse_matrix_t        *A,
                                                               sparse_index_base_t    indexing, /* indexing: C-style or Fortran-style */
                                                               MKL_INT    rows,
                                                               MKL_INT    cols,
                                                               MKL_INT    *rows_start,
                                                               MKL_INT    *rows_end,
                                                               MKL_INT    *col_indx,
                                                               double        *values );

The definition in the documentation looks like a true definition of a CSC-type matrix, whereas the declaration in the header file looks like a CSR-type sparse matrix definition. When I try generating a CSC matrix I get a memory access violation, even though I manually checked the input vectors and they are all correctly specified as in the documentation. So I wonder: since the function appears to be declared in the header like a CSR-format sparse matrix, should I also pass the arrays in that structure?
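
To make the difference concrete, here is a small sketch that passes the arrays following the documented (CSC-style) parameter meaning, with a toy 3x3 matrix and zero-based indexing (my own example, not my actual code):

/* Toy example following the documented CSC parameter meaning
   (cols_start / cols_end / row_indx), zero-based indexing.
   A = [1 0 2]
       [0 3 0]
       [4 0 5]   stored column by column. */
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    double  values[5]     = {1.0, 4.0, 3.0, 2.0, 5.0};
    MKL_INT row_indx[5]   = {0, 2, 1, 0, 2};
    MKL_INT cols_start[3] = {0, 2, 3};
    MKL_INT cols_end[3]   = {2, 3, 5};

    sparse_matrix_t A = NULL;
    sparse_status_t st = mkl_sparse_d_create_csc(&A, SPARSE_INDEX_BASE_ZERO,
                                                 3, 3,          /* rows, cols */
                                                 cols_start, cols_end,
                                                 row_indx, values);
    printf("status = %d\n", (int)st);   /* SPARSE_STATUS_SUCCESS == 0 */
    if (st == SPARSE_STATUS_SUCCESS)
        mkl_sparse_destroy(A);
    return 0;
}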

I'm using MKL version 2019.0.2 for Windows.

Thanks for the help,

Adriaan

MKL FEAST (internal memory problem: info=-2)


Hi,

I'm trying to solve (sparse) symmetric generalized eigenvalue problems using dfeast_scsrgv().

It always worked fine with relatively small problems (up to a 2000x2000 sparse matrix), but it turned out I can't solve the bigger ones (around a 60000x60000 sparse matrix), as the routine always returns info = -2, which I understand refers to an internal memory problem, but I don't know how to fix it.

The problem is solvable using other algorithms (not  included in MKL).

I'm using the latest Intel MKL 2019 library and Intel C++ Compiler 19.0 on Visual Studio 2013 Platform Toolset.

May I ask you for some help?

Thanks in advance,

Daniele

 

 

 

Linking MKL in the makefile


Hi:

     I went to the intel-mkl-link-line-advisor and got the following:

Use this link line: 

  ${MKLROOT}/lib/intel64/libmkl_scalapack_lp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_cdft_core.a ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_openmpi_lp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl

Compiler options: 

 -I${MKLROOT}/include

  Please tell me how to include them in the makefile.

 

Thanks,

 

cblas_ddot access violation


Hi,

I was just trying to learn how to use the 64-bit platform in Parallel Studio XE. When running a very simple example (using cblas_ddot(), as shown below), I get the following exception:

Exception thrown at 0x00007FF7AA60EB36 in mkl64bit.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.

Then I tried the Intel Inspector tool and it says there was an invalid memory access: 0x1eb36    vmovsd xmm4, qword ptr [r9+rax*8]

The code is:

#include <stddef.h>
#include "mkl.h"   /* for cblas_ddot */

int main() {
    double v1[3], v2[3];
    for (size_t i = 0; i != 3; ++i)
        v1[i] = v2[i] = 1.5;

    double result;
    result = cblas_ddot(3, v1, 1, v2, 1);

    return 0;
}

I'm using Intel Parallel Studio XE Cluster Edition 2019 for Windows on Visual Studio 2017.

May you please help me to understand what's wrong and how could I fix the problem?

Thanks in advance,

Daniele

MKL_SPARSE_D_TRSM: no result when called from Fortran


Hi

The program below implements a "csr" class which holds a sparse CSR matrix copied from the MKL manual. This matrix is used with the Inspector-Executor routines MKL_SPARSE_D_CREATE_CSR, MKL_SPARSE_COPY and MKL_SPARSE_D_TRSM. All routines finish without error. However, MKL_SPARSE_D_TRSM does not seem to solve for anything. Am I doing something wrong here?

include "mkl_spblas.f90"
Module Data_Kind
  Implicit None
  Integer, Parameter :: Large=Selected_Int_Kind(12)
  Integer, Parameter :: Medium=Selected_Int_Kind(8)
  Integer(Medium), Parameter :: Double=Selected_Real_Kind(15,100)
End Module Data_Kind
Module Mod_csr
  use data_kind
  implicit None
  Type :: csr
    integer(large), allocatable :: nrows, ncols
    integer(large), allocatable :: rowpos(:), colpos(:),&
      & pointerB(:), pointerE(:)
    real(double), allocatable :: values(:)
  contains
    Procedure :: iii => Subiii
  end type csr
contains
  Subroutine Subiii(this)
    Implicit None
    Class(csr), intent(inout) :: this
    !!2017 mkl manual page 3233
    allocate(this%nrows,this%ncols,source=5)
    allocate(this%rowpos(6),source=(/1,4,6,9,12,14/))
    allocate(this%colpos(13),source=(/1,2,4,1,2,3,4,5,1,3,4,2,5/))
    allocate(this%pointerB(5),source=(/1,4,6,9,12/))
    allocate(this%pointerE(5),source=(/4,6,9,12,14/))
    allocate(this%values(13),&
      &source=(/1.0D0,-1.0D0,-3.0D0,-2.0D0,5.0D0,&
      &4.0D0,6.0D0,4.0D0,-4.0D0,2.0D0,7.0D0,8.0D0,-5.0D0/))
  end Subroutine Subiii
End Module Mod_Csr
Program Test
  use data_kind
  USE IFPORT
  use Mod_csr, only: csr
  use MKL_SPBLAS
  USE, INTRINSIC :: ISO_C_BINDING
  Implicit none
  Type(csr) :: tscsr
  integer(c_int) :: isstat=0
  Type(Sparse_Matrix_T) :: handle, handle1
  Type(MATRIX_DESCR) :: descr
  real(double), allocatable :: in(:,:), out(:,:)
  outer:block
    descr%TYPE=SPARSE_MATRIX_TYPE_GENERAL
    descr%DIAG=SPARSE_DIAG_NON_UNIT
    call tscsr%iii()
    isstat=MKL_SPARSE_D_CREATE_CSR(&
      &handle,&
      &SPARSE_INDEX_BASE_ONE,&
      &tscsr%nrows,&
      &tscsr%ncols,&
      &tscsr%pointerB,&
      &tscsr%pointerE,&
      &tscsr%colpos,&
      &tscsr%values&
      &)
    if(isstat/=0) Then
      write(*,*) "error 1 ",isstat;exit outer
    End if
    isstat=mkl_sparse_copy(handle,descr,handle1)
    if(isstat/=0) Then
      write(*,*) "error 2 ",isstat;exit outer
    End if
    allocate(in(tscsr%nrows,2),out(tscsr%nrows,2))
    in=1.0;out=0.0
    isstat=MKL_SPARSE_D_TRSM(&
      &SPARSE_OPERATION_NON_TRANSPOSE,&
      &1.0_double,&
      &handle1,&
      &descr,&
      &SPARSE_LAYOUT_COLUMN_MAJOR,&
      &in,&
      &2,&
      &size(in,1),&
      &out,&
      &size(out,1)&
      &)
    if(isstat/=0) Then
      write(*,*) "error 3 ",isstat;exit outer
    end if
    write(*,*) maxval(in), maxval(out), minval(out)
  end block outer
End Program Test

Matrix "out" is supposed to contain the sum over the rows of the inverse of the sparse matrix. But "out" contains only zeros. Also isstat from MKL_SPARSE_D_TRSM does not indicate any error.

The compiling and linking was:

ifort --version
ifort (IFORT) 19.0.2.187 20190117

ifort -i8 -warn nounused -warn declarations -O3 -static -align array64byte -mkl=parallel -qopenmp -parallel -c -o OMP_MKLPARA_ifort_4.20.12-arch1-1-ARCH/Test.o Test.f90 -I /opt/intel/compilers_and_libraries_2019.2.187/linux/mkl/include/

ifort -i8 -warn nounused -warn declarations -O3 -static -align array64byte -mkl=parallel -qopenmp -parallel -o Test_OMP_MKLPARA_4.20.12-arch1-1-ARCH OMP_MKLPARA_ifort_4.20.12-arch1-1-ARCH/Test.o /opt/intel/compilers_and_libraries_2019.2.187/linux/mkl/lib/intel64/libmkl_blas95_ilp64.a /opt/intel/compilers_and_libraries_2019.2.187/linux/mkl/lib/intel64/libmkl_lapack95_ilp64.a -Wl,--start-group /opt/intel/compilers_and_libraries_2019.2.187/linux/mkl/lib/intel64/libmkl_intel_ilp64.a /opt/intel/compilers_and_libraries_2019.2.187/linux/mkl/lib/intel64/libmkl_core.a /opt/intel/compilers_and_libraries_2019.2.187/linux/mkl/lib/intel64/libmkl_intel_thread.a -Wl,--end-group -lpthread -lm -ldl

 

Any suggestion?
