Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
Viewing all 3005 articles
Browse latest View live

Using multiple DFTI DESCRIPTOR (FFT in MKL)


Is it possible to create and commit several different DFTI descriptors and re-use them later? The FFTs of different sizes will be called many times, and creating and freeing a descriptor for every call seems inefficient. In other words, can the descriptors be created/committed once and then saved in some array?
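For what it's worth, the DFTI API is designed so that a descriptor is committed once and then reused across many compute calls, so caching one descriptor per transform size is a reasonable pattern. Below is a minimal sketch of such a cache; the `create` callback is a stub standing in for the DftiCreateDescriptor/DftiCommitDescriptor pair (the actual MKL calls are not shown, and all names here are hypothetical):

```c
#include <stddef.h>

/* Hypothetical slot for a committed plan; with real MKL, `handle` would be a
   DFTI_DESCRIPTOR_HANDLE produced by DftiCreateDescriptor + DftiCommitDescriptor. */
typedef struct { long fft_size; void *handle; } PlanSlot;

#define MAX_PLANS 16
static PlanSlot plans[MAX_PLANS];
static int plan_count = 0;

/* Look up a committed plan for this size, creating it on first use.
   `create` stands in for the create/commit pair. */
void *get_plan(long fft_size, void *(*create)(long)) {
    for (int i = 0; i < plan_count; i++)
        if (plans[i].fft_size == fft_size)
            return plans[i].handle;
    if (plan_count == MAX_PLANS) return NULL;  /* cache full */
    plans[plan_count].fft_size = fft_size;
    plans[plan_count].handle = create(fft_size);
    return plans[plan_count++].handle;
}
```

With real MKL, a shutdown pass over the cache would call DftiFreeDescriptor on each slot.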

 


Scalapack raise error under certain circumstance


Dear All,

      I am using Intel MPI + ifort + MKL to compile Quantum-Espresso 6.1. Everything works fine except invoking ScaLAPACK routines: calls to PDPOTRF may exit with a non-zero error code under certain circumstances. For example, with 2 nodes * 8 processors per node the program works, but with 4 nodes * 4 processors per node it fails. With I_MPI_DEBUG enabled, the failed case prints the following messages just before the call exits with code 970, while the working case prints no such messages:

[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676900, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675640, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x26742b8, operation 2, size 12272, lkey 1879682311
(the same message repeats for ~17 more vbuf addresses, all with size 12272)
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675708, operation 2, size 2300, lkey 1879682311

         Could you suggest what the possible cause might be here? Thanks very much.

Feng

SVD hangs periodically


We are noticing that SVD, both dgesvd and dgesdd, hangs periodically. The call stack terminates in a call to one of those functions. Killing the process and rerunning alleviates the problem.

We suspect there's some form of locking going on in SVD somewhere. Something we've tried is the following:

  • start a new thread to compute SVD
  • wait until finish or timeout
  • if timeout, try again
  • if timeout, throw

What was interesting is that the first thread would block (just like before we put the threading in), but the second thread wouldn't even start. There are a limited number of resources and actions that can stop a new thread from starting. One of them is that every new thread must call into the DllMain of every DLL in the process; if one of those DLLs is doing something non-standard there (such as holding a lock), you are dead.

We suspect there's a problem somewhere between MKL and, possibly, another library in our stack. We generally do not mess with DllMain or thread registration.

How to add MKL with TBB threading to application that uses TBB under Visual Studio?


I have installed IntelSWTools 2017.4:
c:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows

My application uses TBB:
Release(/MD): tbb\lib\intel64\vc12\tbb.lib       -> redist\intel64\tbb\vc12\tbb.dll
Debug(/MDd):  tbb\lib\intel64\vc12\tbb_debug.lib -> redist\intel64\tbb\vc12\tbb_debug.dll

How can I add MKL with TBB threading (so that there is only one TBB instance) to an application that already uses TBB?

The problem is that mkl_tbb_thread_dll.dll is (as I understand it) linked against:
tbb\lib\intel64\vc_mt\tbb.lib -> redist\intel64\tbb\vc_mt\tbb.dll
which is different from what my application uses.

Can the redist\intel64\tbb\vc_mt\tbb.dll used by mkl_tbb_thread_dll.dll be replaced with redist\intel64\tbb\vc12\tbb.dll?
If yes, how should the Debug build be handled (somehow rename redist\intel64\tbb\vc12\tbb_debug.dll to tbb.dll)?

 

Are DNN functions thread safe?


It seems that dnnExecute_F32 cannot be called from multiple threads. Is that correct?

How to store A to get the fastest performance of A^T*x using cblas_dgemv?


Hello,

I am using cblas_dgemv to compute A^T*x. The size of the matrix A is about 10000 rows x 20000 columns. I am storing A in row-major format, i.e. A(i,j+1) is stored next to A(i,j).

My questions are as follows (in order to get fastest execution time):

  1. What is the better way to store A: row-major or column-major format? Does it matter?
  2. Is it better to store A and set TransA=CblasTrans, or to store A^T directly and use TransA=CblasNoTrans?
  3. If the answer to #2 is to use A^T directly, is it better to store A^T in row-major or column-major format?

Another related question has to do with byte alignment. Say we store A in row-major format, with m rows and n columns. I have read that, when multithreading with OpenMP, it helps avoid false sharing if each row of A starts at an aligned boundary. A common way to do that is to pad the number of columns so it is divisible by 8 (64 bytes for 8 doubles), i.e. LDA = n + (8 - n%8). Does doing this help dgemv run faster?
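A small caveat on that padding formula: LDA = n + (8 - n%8) over-pads by a full 8 columns when n is already a multiple of 8. A form that handles that case:

```c
/* Round a column count up to the next multiple of 8 doubles (64 bytes), so
   each row of a row-major matrix can start on a cache-line boundary --
   provided the base pointer itself is 64-byte aligned (e.g. via mkl_malloc). */
int padded_lda(int n) {
    return n + (8 - n % 8) % 8;
}
```

Whether this measurably speeds up dgemv is workload-dependent; gemv is typically memory-bandwidth bound, so alignment effects tend to be modest.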

Finally, for my calculation I need alpha=1 and beta=0. Does cblas_dgemv optimize for this trivial case, or does it perform the extra, unnecessary computations anyway?

Thanks in advance for any help.

Warnings from libiomp5.a when linking on Macintosh with Xcode 8


We are using MKL and linking with libiomp5.a. Since starting to use Xcode 8.3.3 to build, we have been getting a large number of warnings like this:

:-1: warning: pointer not aligned at address 0x11132A942 (anon + 112 from ...libs/libiomp5.a(iomp.o))

Does anyone know how to silence this?

Thanks, John Weeks

Regarding cluster_sparse_solver


I am Mehdi and this is my first time using this forum.

I need to use cluster_sparse_solver in my Fortran finite element program. Because the number of degrees of freedom in my system is very high (about 10^6), the number of nonzero entries in the stiffness matrix (A in Ax=B) is so large that I cannot store it in a default INTEGER(4) and must use INTEGER(8). Therefore the parameter ia (the row index array of the sparse matrix) must be INTEGER(8).

In this situation, how should I compile my program? I have tried both the 4-byte and 8-byte integer libraries when compiling, and neither works. Should I make all of the integers in my program INTEGER(8)? If ia is INTEGER(8) and ja is INTEGER(4), is it still possible to compile the program?

Please help me. I can provide any more information you may need.

Bests

Mehdi
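A note that may help frame the question: MKL provides two interface layers, LP64 (32-bit integers) and ILP64 (64-bit integers), and to my knowledge a single call cannot mix kinds, so an INTEGER(8) ia alongside an INTEGER(4) ja is not expected to work. The threshold that forces ILP64 is simply whether any count or index exceeds the 32-bit range:

```c
#include <limits.h>

/* Returns 1 if a count/index value fits in MKL's LP64 (32-bit) integer
   interface, 0 if the ILP64 (64-bit) interface is required. */
int fits_lp64(long long count) {
    return count >= INT_MIN && count <= INT_MAX;
}
```

Linking against ILP64 also generally requires compiling the Fortran sources with 64-bit default integers (e.g. -i8 with ifort) so that every integer argument matches.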


Pardiso and pardiso_64


Hello,

I have a Fortran program for solving a flow-field (Stokes flow) problem using FEM. At first I used the PARDISO solver for the coupled problem, and I was quite happy with it, because it is faster than MATLAB (I used to work with MATLAB and I am new to ifort). Below I have summarized the code related to PARDISO:

!----------Matrix solution PARDISO--------------------------------------------
!          [M ]*{X} = {RHS}
INTEGER                            :: PT(64), MTYPE, IPARM(64)
INTEGER                            :: MAXFCT, MNUM, PHASE, N, NRHS, MSGLVL, ERROR
INTEGER,ALLOCATABLE                :: PERM(:)
REAL(8),ALLOCATABLE                :: X(:)
INTEGER,ALLOCATABLE                :: ja(:), ia(:)
REAL(8),ALLOCATABLE,DIMENSION(:)   :: M
! ABOVE FOR INTRODUCING THE PARAMETERS
MTYPE = 11
CALL PARDISOINIT (PT, MTYPE, IPARM)
ALLOCATE( PERM(dof) , STAT = ISTAT )  ! dof : degrees of freedom
ALLOCATE( X(dof)    , STAT = ISTAT )
MAXFCT = 1
MNUM   = 1
PHASE  = 13
N      = dof
PERM   = 1
NRHS   = 1
MSGLVL = 0
CALL CPU_TIME(start)
CALL PARDISO (PT, MAXFCT, MNUM, MTYPE, PHASE, N, M_SPARSE, ia, ja, PERM, NRHS, IPARM, MSGLVL, RHS, X, ERROR)
CALL CPU_TIME(finish)
write(*,*) (finish-start)*1000 , 'msec' , ERROR

This code works perfectly, without any problem.

My problem arises when I want to use PARDISO_64. According to the documentation, all of the integer inputs and outputs should be INTEGER(8), therefore:

!----------Matrix solution PARDISO--------------------------------------------
!  MX = RHS
INTEGER(8)                         :: PT(64), MTYPE, IPARM(64)
INTEGER(8)                         :: MAXFCT, MNUM, PHASE, N, NRHS, MSGLVL, ERROR
INTEGER(8),ALLOCATABLE             :: PERM(:)
REAL(8),ALLOCATABLE                :: X(:)
INTEGER(8),ALLOCATABLE             :: ja(:), ia(:)
REAL(8),ALLOCATABLE,DIMENSION(:)   :: M
! ABOVE FOR INTRODUCING THE PARAMETERS
MTYPE = 11
CALL PARDISOINIT (PT, MTYPE, IPARM)
ALLOCATE( PERM(dof) , STAT = ISTAT )  ! dof : degrees of freedom
ALLOCATE( X(dof)    , STAT = ISTAT )
MAXFCT = 1
MNUM   = 1
PHASE  = 13
N      = dof
PERM   = 1
NRHS   = 1
MSGLVL = 0
CALL CPU_TIME(start)
CALL PARDISO_64 (PT, MAXFCT, MNUM, MTYPE, PHASE, N, M_SPARSE, ia, ja, PERM, NRHS, IPARM, MSGLVL, RHS, X, ERROR)
CALL CPU_TIME(finish)
write(*,*) (finish-start)*1000 , 'msec' , ERROR

but this code does not work; it gives the following error:

forrtl: severe (157): Program Exception - access violation

I use the following to compile my code:

ifort USEFULLS.f90 CONSTANTS.f90 PRE_PROCESSOR_3D.f90 DATATYPES.f90 VEL_SUBS.f90 SPARSE_SUB.f90 main00.f90 -o t1 -Qmkl -heap-arrays

 

USEFULLS.f90, CONSTANTS.f90, PRE_PROCESSOR_3D.f90, DATATYPES.f90, VEL_SUBS.f90, and SPARSE_SUB.f90 are modules I developed, and main00.f90 is the main program. I think (though I am not sure) that I need some additional options on the compile command line.

I have a similar problem with cluster_sparse_solver, which works well, while cluster_sparse_solver_64 does not!

Best regards

Mehdi

 

cluster_sparse_solver library and path setting in Ubuntu



Hello,

I have a question regarding compiling a program containing cluster_sparse_solver.

I have developed a program for 3D flow-field calculation using the finite element method, using Intel Parallel Studio on my laptop, which runs Windows. I compiled the program with the following steps:

 

step1 :

set path=C:\MyIntel\IntelSWTools\compilers_and_libraries_2017.4.210\windows\redist\intel64\mkl; C:\MyIntel\IntelSWTools\compilers_and_libraries_2017.4.210\windows\redist\intel64\compiler;C:\MyIntel\IntelSWTools\compilers_and_libraries_2017.4.210\windows\redist\intel64\tbb\vc_mt;%path%

 

step 2 :

set lib=C:\MyIntel\IntelSWTools\compilers_and_libraries_2017.4.210\windows\mkl\lib\intel64;%lib%

 

For the lines above, I used: mkl_link_tool mpiifort C:\FORTRAN\Programmes\MPI\main01.f90

 

Then, after setting the path and libraries, I used the following:

 

step 3:

mpiifort USEFULLS.f90 CONSTANTS.f90 PRE_PROCESSOR_3D.f90 DATATYPES.f90 VEL_SUBS.f90 SPARSE_SUB.f90 -I"C:\MyIntel\IntelSWTools\compilers_and_libraries_2017.4.210\windows\mkl\include" "parallel01.f90" mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib mkl_blacs_intelmpi_lp64.lib impi.lib libiomp5md.lib -o Pstatic -heap-arrays

 

It works perfectly, without any problem. I should add that I have also compiled the program using dynamic libraries. (I used the online link advisor for the line in step 3: https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/)

 

But now I want to use the program on a better computer, which has Intel Parallel Studio only under Linux. I used the online link advisor again and then used the following to compile my code:

 

mpiifort USEFULLS.f90 CONSTANTS.f90 PRE_PROCESSOR_3D.f90 DATATYPES.f90 VEL_SUBS.f90 SPARSE_SUB.f90 -I/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/include "parallel01.f90" -Wl,--start-group /opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64/libmkl_intel_thread.a /opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64/libmkl_core.a /opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl -o Pstatic -heap-arrays

 

But I do not know:

a – How do I set the path and libraries on a Linux system (the Linux equivalent of steps 1 and 2)?

b – How do I find the suitable link lines and libraries on Linux?

c – Is there anything like mkl_link_tool for Linux, and if yes, where is it?

 

At the moment, it gives the following errors:

 

parallel01.f90(17): error #7002: Error in opening the compiled module file. Check INCLUDE paths. [MKL_CLUSTER_SPARSE_SOLVER]
USE MKL_CLUSTER_SPARSE_SOLVER
------------^
parallel01.f90(61): error #6457: This derived type name has not been declared. [MKL_CLUSTER_SPARSE_SOLVER_HANDLE]
TYPE(MKL_CLUSTER_SPARSE_SOLVER_HANDLE) :: CPT(64)
-------------^
parallel01.f90(268): error #6404: This name does not have a type, and must have an explicit type. [CPT]
CPT(:)%dummy = 0
--------^
parallel01.f90(268): error #6514: Substring or array slice notation requires CHARACTER type or array. [CPT]
CPT(:)%dummy = 0
--------^
parallel01.f90(268): error #6460: This is not a field name that is defined in the encompassing structure. [DUMMY]
CPT(:)%dummy = 0
---------------^
parallel01.f90(268): error #6158: The structure-name is invalid or is missing. [CPT]
CPT(:)%dummy = 0
--------^
compilation aborted for parallel01.f90 (code 1)

 

Best regards

Mehdi

Pardiso iparm(30) not returning equation number correctly


I am using PARDISO with the 2017 Update 2 Intel Fortran compiler in VS 2015, and I find that with mtype=2 (real symmetric positive definite matrix), if my matrix has a singularity, iparm(30) always returns 1 rather than the location of the equation where the singularity occurs. In the 2016 version of the compiler this worked correctly. Has something changed, or is this a bug?

Refer to the Pardiso documentation:
If Intel MKL PARDISO detects zero or negative pivot for mtype=2 or mtype=4 matrix types, the factorization is stopped, Intel MKL PARDISO returns immediately with an error = -4, and iparm(30) reports the number of the equation where the first zero or negative pivot is detected.
 

MKL Memory Allocator


Hello,

I am looking for information regarding the Memory Allocator embedded in MKL.

We use MKL_malloc/MKL_free intensively in a project and plan to add a memory manager on top of them. Our goal is to reuse aligned memory without freeing it, to have per-thread memory pools, and to be able to fine-tune those pools. (We are indeed challenged by memory consumption.)

The mkl_disable_fast_mm page mentions a per-thread memory pool but gives no further detail. Does anyone have more information (lock-free malloc with per-thread heaps, monitoring the available memory in the pool, etc.)?

We are also considering deactivating the MKL memory manager and relying on the Intel TBB malloc implementation instead (either by redefining the memory functions with i_malloc, or with an std::vector-plus-custom-allocator style implementation). Does anyone have feedback on such an implementation?
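As one data point for the "reuse aligned memory without freeing it" idea: the core of such a manager can be as small as a size-class free list over aligned allocations. This is a hypothetical sketch (the names and bucketing scheme are mine, not MKL's or TBB's), with no thread safety; making `freelist` thread-local would give the per-thread pools mentioned above:

```c
#include <stdlib.h>

/* Freed blocks are kept on a free list bucketed by size class instead of
   being returned to the OS, so later same-size requests reuse them. */
#define NBUCKETS 8
typedef struct Block { struct Block *next; } Block;
static Block *freelist[NBUCKETS];

static size_t bucket_size(int b) { return (size_t)4096 << b; }

static int bucket_for(size_t n) {
    for (int b = 0; b < NBUCKETS; b++)
        if (n <= bucket_size(b)) return b;
    return -1;  /* too large for the pool */
}

void *pool_alloc(size_t n) {
    int b = bucket_for(n);
    if (b < 0)  /* fall back to a plain 64-byte-aligned allocation */
        return aligned_alloc(64, (n + 63) / 64 * 64);
    if (freelist[b]) {  /* reuse without touching the OS */
        Block *blk = freelist[b];
        freelist[b] = blk->next;
        return blk;
    }
    return aligned_alloc(64, bucket_size(b));
}

void pool_free(void *p, size_t n) {
    int b = bucket_for(n);
    if (b < 0) { free(p); return; }
    Block *blk = (Block *)p;
    blk->next = freelist[b];
    freelist[b] = blk;
}
```

The obvious trade-off is the one the question raises: pooled blocks are never returned to the OS, so peak memory consumption grows monotonically unless a trim pass is added.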

Thank you
Arnaud

cluster_sparse_solver library and path setting in LINUX



Hello,

I have a question regarding compiling a program containing cluster_sparse_solver.

I have developed a program for 3D flow-field calculation using the finite element method, using Intel Parallel Studio on my laptop, which runs Windows. I compiled the program with the following steps:

 

step1 :

set path=C:\MyIntel\IntelSWTools\compilers_and_libraries_2017.4.210\windows\redist\intel64\mkl; C:\MyIntel\IntelSWTools\compilers_and_libraries_2017.4.210\windows\redist\intel64\compiler;C:\MyIntel\IntelSWTools\compilers_and_libraries_2017.4.210\windows\redist\intel64\tbb\vc_mt;%path%

 

step 2 :

set lib=C:\MyIntel\IntelSWTools\compilers_and_libraries_2017.4.210\windows\mkl\lib\intel64;%lib%

 

For the lines above, I used: mkl_link_tool mpiifort C:\FORTRAN\Programmes\MPI\main01.f90

 

Then, after setting the path and libraries, I used the following:

 

step 3:

mpiifort USEFULLS.f90 CONSTANTS.f90 PRE_PROCESSOR_3D.f90 DATATYPES.f90 VEL_SUBS.f90 SPARSE_SUB.f90 -I"C:\MyIntel\IntelSWTools\compilers_and_libraries_2017.4.210\windows\mkl\include" "parallel01.f90" mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib mkl_blacs_intelmpi_lp64.lib impi.lib libiomp5md.lib -o Pstatic -heap-arrays

 

It works perfectly, without any problem. I should add that I have also compiled the program using dynamic libraries. (I used the online link advisor for the line in step 3: https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/)

 

But now I want to use the program on a better computer, which has Intel Parallel Studio only under Linux. I used the online link advisor again and then used the following to compile my code:

 

mpiifort USEFULLS.f90 CONSTANTS.f90 PRE_PROCESSOR_3D.f90 DATATYPES.f90 VEL_SUBS.f90 SPARSE_SUB.f90 -I/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/include "parallel01.f90" -Wl,--start-group /opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64/libmkl_intel_thread.a /opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64/libmkl_core.a /opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl -o Pstatic -heap-arrays

 

But I do not know:

a – How do I set the path and libraries on a Linux system (the Linux equivalent of steps 1 and 2)?

b – How do I find the suitable link lines and libraries on Linux?

c – I have found mkl_link_tool in the Linux installation directory, but it does not work. Why?

 

At the moment, it gives the following errors:

 

parallel01.f90(17): error #7002: Error in opening the compiled module file. Check INCLUDE paths. [MKL_CLUSTER_SPARSE_SOLVER]
USE MKL_CLUSTER_SPARSE_SOLVER
------------^
parallel01.f90(61): error #6457: This derived type name has not been declared. [MKL_CLUSTER_SPARSE_SOLVER_HANDLE]
TYPE(MKL_CLUSTER_SPARSE_SOLVER_HANDLE) :: CPT(64)
-------------^
parallel01.f90(268): error #6404: This name does not have a type, and must have an explicit type. [CPT]
CPT(:)%dummy = 0
--------^
parallel01.f90(268): error #6514: Substring or array slice notation requires CHARACTER type or array. [CPT]
CPT(:)%dummy = 0
--------^
parallel01.f90(268): error #6460: This is not a field name that is defined in the encompassing structure. [DUMMY]
CPT(:)%dummy = 0
---------------^
parallel01.f90(268): error #6158: The structure-name is invalid or is missing. [CPT]
CPT(:)%dummy = 0
--------^
compilation aborted for parallel01.f90 (code 1)

 

Best regards

Mehdi

MKL_SINGLE_PATH_ENABLE


Hello,

We have legacy code that calls:

mkl_enable_instructions(MKL_SINGLE_PATH_ENABLE);

MKL_SINGLE_PATH_ENABLE is not in the documentation (anymore ?).

In mkl_service.h, the defines are:

#define  MKL_ENABLE_SSE4_2          0
#define  MKL_ENABLE_AVX             1
#define  MKL_ENABLE_AVX2            2
#define  MKL_ENABLE_AVX512_MIC      3
#define  MKL_ENABLE_AVX512          4
#define  MKL_ENABLE_AVX512_MIC_E1   5
#define  MKL_SINGLE_PATH_ENABLE     0x0600

Does anyone know whether MKL_SINGLE_PATH_ENABLE is still honored in MKL 2017 (it is still present in mkl_service.h)? Can we consider it the most restrictive mode?

Thank you.

Arnaud

Cannot use cluster_sparse_solver in Linux!



Hi everyone,

 

I have developed a code which uses cluster_sparse_solver. I can compile it (statically) on my PC, which runs Windows 10.

But I cannot compile it on Linux!

 

I use the following commands for compilation:

 

$MKLPATH=$MKLROOT/lib/intel64

$MKLINCLUDE=$MKLROOT/include

 

and then

 

$mpiifort USEFULLS.f90 CONSTANTS.f90 PRE_PROCESSOR_3D.f90 DATATYPES.f90 VEL_SUBS.f90 SPARSE_SUB.f90 -L$MKLPATH -I$MKLINCLUDE/ -I$MKLINCLUDE/intel64/lp64 parallel00.f90 -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a $MKLROOT/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl -o t1

 

But, I see the following error message:

 

parallel00.f90(17): error #7002: Error in opening the compiled module file. Check INCLUDE paths. [MKL_CLUSTER_SPARSE_SOLVER]
USE MKL_CLUSTER_SPARSE_SOLVER
------------^
parallel00.f90(61): error #6457: This derived type name has not been declared. [MKL_CLUSTER_SPARSE_SOLVER_HANDLE]
TYPE(MKL_CLUSTER_SPARSE_SOLVER_HANDLE) :: PT(64)
-------------^
parallel00.f90(268): error #6404: This name does not have a type, and must have an explicit type. [PT]
PT(:)%dummy = 0
--------^
parallel00.f90(268): error #6514: Substring or array slice notation requires CHARACTER type or array. [PT]
PT(:)%dummy = 0
--------^
parallel00.f90(268): error #6460: This is not a field name that is defined in the encompassing structure. [DUMMY]
PT(:)%dummy = 0
---------------^
parallel00.f90(268): error #6158: The structure-name is invalid or is missing. [PT]
PT(:)%dummy = 0
--------^
compilation aborted for parallel00.f90 (code 1)

 

Does anyone have a suggestion to solve this problem? Should I compile MKL_CLUSTER_SPARSE_SOLVER.f90 in the include directory, or is that not necessary? (I should say that even that compilation is not possible on my Linux system, although I can do it on Windows.)

 

Best regards

Mehdi


Access violation error while using dgesvd from C on Visual Studio


Hello, while running the following code on Visual Studio 2015:

 

#include <thread>
#include <mkl.h>
#include <random>
#include <ctime>

const int MatrixLayout = LAPACK_COL_MAJOR;

int GetIndex(int i, int j, int m, int n);
void Initialize2DArray( double** matrix, int rows, int columns );
void PopulateRandMatrix( double **& matrix, int m, int n );
/*Wraps inputs intended for svdcmp to those accepted by the MKL library.*/
int wrapperForSVD( double **u, int m, int n, double *w, double **v );
#define M 500
#define N 400

int main() {
   int m = M;
   int n = N;
   double** A = new double*[m];
   double* w = new double[(m>n)?n:m];
   double** V = new double*[n];

   Initialize2DArray( A, m, n );
   Initialize2DArray( V, n, n );

   PopulateRandMatrix( A, m, n );

   int getReturn = wrapperForSVD( A, m, n, w, V );

}

/*Wraps inputs intended for svdcmp to those accepted by the MKL library.*/
int wrapperForSVD( double **u, int m, int n, double *w, double **v )
{
   mkl_verbose( 1 );
   const unsigned int NumberOfThreads = std::thread::hardware_concurrency() > 0 ? (int)std::thread::hardware_concurrency() : 4;
   //Return only variable
   const int ReturnError = 1;
   const char JobUVt = 'A';
   lapack_int lda = (MatrixLayout == LAPACK_COL_MAJOR) ? m : n;
   lapack_int ldu = m;
   lapack_int ldvt = n;

   //save old thread value and set threads locally in case of future omp threading of application
   int oldThreadNumber = mkl_set_num_threads_local( NumberOfThreads );


   double* aOneDArray = (double*)malloc( m*n * sizeof( double ) );//new double[m*n];
   if (!aOneDArray)
      return ReturnError;

   //convert 2d matrix to 1d array
   for (int i = 0; i < m; i++)
      for (int j = 0; j < n; j++)
         aOneDArray[GetIndex( i, j, m, n )] = u[i][j];

   double * uOneDArray = (double*)malloc( ldu*m * sizeof( double ) );//new double[ldu*m];
   double * vOneDArray = (double*)malloc( ldvt*n * sizeof( double ) );//new double[ldvt*n];
   double * superb = (double*)malloc( sizeof( double )*(m > n) ? n-2 : m-2 );//new double[(m>n)?n:m];
   if (!uOneDArray || !vOneDArray || !superb)
      return ReturnError;

   int testFailConvergence = LAPACKE_dgesvd( MatrixLayout, JobUVt, JobUVt, m, n, aOneDArray, lda, w, uOneDArray, ldu, vOneDArray, ldvt, superb );


   //if matrix converged
   if (testFailConvergence == 0) {
      //convert 1d arrays to 2d arrays
      for (int i = 0; i < m; i++)
         for (int j = 0; j < n; j++)
            u[i][j] = uOneDArray[GetIndex( i, j, m, n )];
      int smallerOfMN = (m < n) ? m : n;
      for (int i = 0; i < smallerOfMN; i++)
         for (int j = 0; j < smallerOfMN; j++)
            v[j][i] = vOneDArray[GetIndex( i, j, smallerOfMN, smallerOfMN )];
   }
   else
      testFailConvergence = ReturnError;

   free( aOneDArray ); //delete[] oneDArray;
   free( uOneDArray ); //delete[] uOneDArray;
   free( vOneDArray ); //delete[] vOneDArray;
   free( superb ); //delete[] superb;

   //reset thread count
   mkl_set_num_threads_local( oldThreadNumber );

   return testFailConvergence;

}


//Maps the correct index from a 2d array to a 1d array
//NOTE: Function is dependent upon MatrixLayout and will return the correct
//layout for both row major and column major re-mapping
int GetIndex(int i, int j, int m, int n)
{
   if (MatrixLayout == LAPACK_ROW_MAJOR)
      return (i*n) + j;
   else
      return (j*m) + i;
}

void Initialize2DArray(double ** matrix, int rows, int columns)
{
   for (int i = 0; i < rows; i++)
      matrix[i] = new double[columns];
   return;
}

void PopulateRandMatrix(double **& matrix, int m, int n)
{

   double** old = nullptr;
   std::srand(std::time(NULL));
   if (matrix != nullptr)
      old = matrix;
   matrix = new double*[m];

   for (int i = 0; i < m; i++) {
      if (old != nullptr)
         delete[] old[i];
      matrix[i] = new double[n];
      for (int j = 0; j < n; j++) {
         double randOut = std::rand();
         matrix[i][j] = randOut;
      }
   }
}

I get the following error:

Exception thrown at 0x013A1C0B in DebugSVD.exe: 0xC0000005: Access violation writing location 0x00AC1000.

While the error message tells us this is an access violation, it is virtually useless for determining where the violation stems from. Visual Studio can only show me disassembly, so debugging that way is impossible as well.

The program outputs the following (because mkl_verbose is set to one) before throwing the error:

MKL_VERBOSE Intel(R) MKL 2017.0 Update 3 Product build 20170413 for 32-bit Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.30GHz intel_thread
MKL_VERBOSE DGESVD(A,A,500,400,00B60040,500,0055E008,00CF0040,500,00EE0040,400,003FF524,-1,0) 32.05ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
MKL_VERBOSE DGESVD(A,A,500,400,00B60040,500,0055E008,00CF0040,500,00EE0040,400,01231080,46000,0) 3.98s CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8

 

My settings are as follows: MKL is enabled in the Visual Studio project and set to Parallel. The only other change is that I copied libiomp5md.dll into the project folder.

I appreciate any help debugging my problem! Thank you so much.
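One hedged observation on the code above (a possibility, not a confirmed diagnosis): the `superb` allocation contains an operator-precedence trap. Because `*` binds tighter than `?:`, `malloc( sizeof( double )*(m > n) ? n-2 : m-2 )` requests only n-2 or m-2 bytes, while LAPACKE_dgesvd's `superb` array needs min(m,n)-1 doubles. The two sizes can be compared directly:

```c
#include <stddef.h>

/* What malloc actually receives from
       sizeof(double)*(m > n) ? n-2 : m-2
   -- the condition is sizeof(double)*(m>n), not (m>n) alone. */
size_t superb_bytes_as_written(int m, int n) {
    return sizeof(double) * (m > n) ? n - 2 : m - 2;
}

/* What the call needs: dgesvd's superb array holds min(m,n)-1 doubles. */
size_t superb_bytes_needed(int m, int n) {
    int mn = m < n ? m : n;
    return (size_t)(mn - 1) * sizeof(double);
}
```

For m=500, n=400 that is 398 bytes requested versus 3192 needed, which would let DGESVD write past the buffer and could produce exactly this kind of access violation.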

Pardiso generates wrong result for a small test matrix


Hi, 

I've written a test program to get familiar with PARDISO. However, I find that, compared with the result from Eigen, PARDISO spits out a totally incorrect answer. I have no idea what went wrong; please help. The code is attached. The compile command I used is:

icpc -I ~/Lib/Eigen -std=c++11 -mkl=parallel -qopenmp -O3 -xCORE-AVX2 PTEST.cpp -o PardisoTest

 

Best,

Izzy 

Attachment: PTEST.cpp (2.8 KB)

Is DFTI_NUMBER_OF_TRANSFORMS data-parallel?


If I set DFTI_NUMBER_OF_TRANSFORMS to 4 on an AVX computer, or 8 on an AVX-512 KNL, will MKL's DftiComputeForward/Backward compute the FFTs of similar but independent, non-overlapping arrays simultaneously in SIMD, or sequentially one after the other?

Thanks

cluster_sparse_solver and cluster_sparse_solver_64


Hello all,

I have developed a code for 3D fluid flow using a coupled FEM method. At first I used cluster_sparse_solver. To compile it, I used the following, from the Intel link advisor:

mpiifort USEFULLS.f90 CONSTANTS.f90 PRE_PROCESSOR_3D.f90 DATATYPES.f90 VEL_SUBS.f90 SPARSE_SUB.f90  parallel00.f90 -I"%MKLROOT%"\include  -heap-arrays mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib mkl_blacs_lp64_dll.lib impi.lib libiomp5md.lib -o t1

When I increase the mesh count, the number of nonzero components of the A matrix (Ax = RHS) exceeds 500000000. So, following the advice of Intel's online documentation, I use cluster_sparse_solver_64. This time all of the integer input parameters are INTEGER(8). To compile, I do the following (again based on the link advisor):

mpiifort USEFULLS.f90 CONSTANTS.f90 PRE_PROCESSOR_3D.f90 DATATYPES.f90 VEL_SUBS.f90 SPARSE_SUB.f90  parallel01.f90  /4I8 -I"%MKLROOT%"\include  mkl_intel_ilp64.lib mkl_intel_thread.lib mkl_core.lib mkl_blacs_intelmpi_ilp64.lib impi.lib libiomp5md.lib  -o t1

But, I see the following error:

parallel01.f90(281): error #6285: There is no matching specific subroutine for this generic subroutine call.   [CLUSTER_SPARSE_SOLVER_64]

Can someone please help me? I had a similar problem with pardiso and pardiso_64, but in that case adding -i8 to the compile command solved it. For cluster_sparse_solver it seems to be more complicated.

Best regards

Mehdi

 

 

 

softmax


Does MKL support the popular DNN operation called softmax?

I cannot find any suitable function.
