Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
Viewing all 3005 articles

Dynamically switching OpenMP on/off in MKL

Hello! I am trying to mix MKL and TBB in my project, so I want to switch MKL's multi-threading on and off dynamically. I use mkl_set_num_threads() to set the number of threads: when I want multi-threading off, I set it to one before calling the working function; when I want multi-threading on, I set it to the maximum number of threads before the working function is called. The code (taking BLAS2 as an example) looks like this:

   // set omp to run
   if (withOMP) {
      omp_init(); // set it to maximum number of threads we can find with mkl_set_num_threads()
   }else{
      omp_turnoff(); // set the number of threads to 1
   }   

#ifdef WITH_SINGLE_PRECISION
   sgemv(&symA, &row_A, &col_A, &alpha, A, &ld_A, x, &inc_x, &beta, y, &inc_y);
#else
   dgemv(&symA, &row_A, &col_A, &alpha, A, &ld_A, x, &inc_x, &beta, y, &inc_y);
#endif

However, in my BLAS and LAPACK testing I find that once OpenMP is turned off, it can never be switched on again. MKL then always runs in serial mode, no matter how often omp_init() is called. Multi-threading is observed only if omp_turnoff() is never called.

I am using the multi-threaded MKL 11.1 library (intel64) together with g++ (version 4.7.2, also 64-bit). My link line is:

-L$(MKLROOT)/lib/intel64 -lmkl_intel_lp64 -lmkl_core  -lmkl_intel_thread -liomp5 -ldl -lpthread -lm

Thanks very much!

fenglai 


-mkl=parallel appears to link with both ilp64 and lp64 MKL libraries

As the number of data elements in the application data gets bigger we're moving to use the ilp64 MKL routines.

I used the Intel Link Line Advisor to specify the required link libraries. Based on this advice, if I build and link an application thus:

icc -DMKL_ILP64 -w testlapack.cpp -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5  -lpthread -lm -o testlapackp

and use ldd -v to examine what libraries are linked with the executable I find as expected:

        linux-vdso.so.1 =>  (0x00007fffb8bff000)
        libmkl_intel_ilp64.so => /opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_intel_ilp64.so (0x00007f111c837000)
        libmkl_intel_thread.so => /opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_intel_thread.so (0x00007f111b878000)
        libmkl_core.so => /opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_core.so (0x00007f111a1ba000)
        libiomp5.so => /opt/intel/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libiomp5.so (0x00007f1119e9f000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1119c76000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f11199f2000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f11196ec000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f11194d5000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1119142000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f1118f3e000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f111cf53000)

i.e. the ilp64 library is properly linked. With more than 2**31 array elements the application builds, runs and parallelises as expected.

If I try to simplify the link line a bit using:

icc -DMKL_ILP64 -w testlapack.cpp -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -mkl=parallel -lm -o testlapackp

and examine the executable with ldd -v I get:

        linux-vdso.so.1 =>  (0x00007fff1edff000)
        libmkl_intel_ilp64.so => /opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_intel_ilp64.so (0x00007ff92cda4000)
        libmkl_intel_lp64.so => /opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_intel_lp64.so (0x00007ff92c65f000)

        libmkl_intel_thread.so => /opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_intel_thread.so (0x00007ff92b6a1000)
        libmkl_core.so => /opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_core.so (0x00007ff929fe3000)
        libiomp5.so => /opt/intel/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libiomp5.so (0x00007ff929cc7000)
        libm.so.6 => /lib64/libm.so.6 (0x00007ff929a38000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007ff929732000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff92951b000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ff9292fe000)
        libc.so.6 => /lib64/libc.so.6 (0x00007ff928f6b000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007ff928d66000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ff92d4c0000)

i.e. both the ilp64 and lp64 libraries are linked.

The application built with this linkline executes, parallelises etc OK with an array containing 2.5G elements (50k x 50k) but my question is why is the lp64 library being linked?
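As a side note on the sizes involved, the sketch below checks whether an element count still fits in a 32-bit signed integer, which is the integer width the lp64 interface uses for MKL_INT. The helper names `element_count` and `fits_in_32bit_int` are made up for this illustration and are not part of MKL; the sketch says nothing about why -mkl=parallel pulls in the lp64 library.

```c
#include <stdint.h>

/* Hypothetical helper (not an MKL routine): total number of elements
   in a rows x cols matrix, computed in 64-bit arithmetic. */
int64_t element_count(int64_t rows, int64_t cols)
{
    return rows * cols;
}

/* Returns 1 when the count can be represented by a 32-bit signed
   integer (the MKL_INT width under lp64), 0 otherwise. */
int fits_in_32bit_int(int64_t count)
{
    return count >= 0 && count <= INT32_MAX;
}
```

For the 50k x 50k case from the post, `element_count(50000, 50000)` is 2,500,000,000, which is above INT32_MAX (2,147,483,647), so a 64-bit integer type is needed to hold the count.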

Thanks

David

 

 

 

cblas_cdotu_sub and cblas_cdotc_sub

Hi,

I have noticed that the CBLAS functions cblas_cdotu_sub and cblas_cdotc_sub return 0 for the scalar product instead of the correct value. In earlier versions of MKL, the bug was also present for the double precision functions (cblas_zdotu_sub and cblas_zdotc_sub). In the evaluation version for Linux that I downloaded in the last few days, the bug is present only for single precision.

Best regards.

 

SVD via the "divide and conquer" method (LAPACKE_sgesdd)

Hi,

I am searching for the most efficient SVD approach in MKL and am about to conclude that it is "LAPACKE_sgesdd". Could you please help me with two questions:

1. Is "LAPACKE_sgesdd" really the fastest SVD routine in MKL, or have I missed something?

2. Why when I make a call

LAPACKE_sgesdd(CblasRowMajor, 'A', dim, dim, X, dim, e, U, dim, V, dim);

everything works fine (the U, V, and e arrays are filled as described in the MKL manual), but when I make a call with jobz = 'O'

LAPACKE_sgesdd(CblasRowMajor, 'O', dim, dim, X, dim, e, U, dim, V, dim);

the error "MKL ERROR: Parameter 10 was incorrect on entry to SGESDD." is printed?

 

Thanks,

Victor.

Attachment: mkl_test_0.cpp (1.73 KB)

DSBGVX documentation: ifail dimension is n?

dsbgvx kept crashing on me with an access violation for large enough problems (n=3636). I am compiling for an x64 target.

Meticulously studying the manual, http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/index.htm#GUID-659297EE-D35B-428C-ABBE-1A7EBE2B0F6E, I noticed that the dimension of ifail is m for dsbgvx, and n for sbgvx.

When I allocate n integers for ifail, dsbgvx stops crashing.

 

Another peculiarity is that dsbgvx does not crash when compiled for a Win32 target.

Questions:

I am selecting only m out of n eigenvalues. The information I aim for should fit in w(1:m) and z(1:n,1:m), as discussed in the Fortran 95 interface of e.g. ?stein. So, is w(n) and z(n,n) really required in dsbgvx?

Now that ifail(n) works and ifail(m) fails, should the manual read that the dimension of ifail must be n? And if so, why do w(m), z(n,m) and ifail(m) work for Win32 targets?

 

MKL PARDISO on OS X

Hello, I am trying to solve a sparse linear system in Fortran 90 using the MKL PARDISO solver, as in the following code:



        do i = 1, 64
            iparm(i) = 0
        end do

        error = 0 ! initialize error flag
        msglvl = 1 ! print statistical information
        mtype = 13 ! complex unsymmetric
        !C.. Initialize the internal solver memory pointer. This is only
        !C necessary for the FIRST call of the PARDISO solver.
        do i = 1, 64
           pt( i )%DUMMY =  0 
        end do
        
        maxfct = 1
        mnum = 1
        nrhs = 1
        !C.. Reordering and Symbolic Factorization, This step also allocates
        !C all memory that is necessary for the factorization
        phase = 11 ! only reordering and symbolic factorization
        print*, ' calling sparse solver'

        CALL PARDISO (pt, maxfct, mnum, mtype, phase, nPardiso, values, rowIndex, columns, &
             perm, nrhs, iparm, msglvl, b, Ex, error)
        WRITE(*,*) 'Reordering completed ... '

My code compiles without problems (Intel compiler ifort (IFORT) 12.1.3 20120130), but once it reaches the solver call the program hangs and I have to kill it.

The only output on my screen before the program hangs is

calling sparse solver

I have also tried different parameters in iparm, following the examples folder, but the result is the same.

I also printed out the matrix to make sure it is in the correct form.

double complex values:(-1.000000,0.0000000E+00) (1.000000,0.0000000E+00) (0.5000000,0.0000000E+00)
 (-1.500000,0.0000000E+00) (1.000000,0.0000000E+00) (-0.2222222,0.0000000E+00)
 (-0.1111111,0.0000000E+00) (0.3333333,0.0000000E+00) (1.000000,0.0000000E+00)

integer rowIndex:  1           3           6           9          10

integer columns: 1           2           1           2           3           2            3           4           4

b is imaginary and the system has a solution. 
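The printed arrays can be sanity-checked against the one-based, three-array CSR layout that PARDISO reads by default (row pointers starting at 1, last row pointer equal to nnz + 1, column indices increasing within each row). The following is only a minimal sketch of such a check; `check_csr_one_based` and its return codes are hypothetical and not part of MKL, and passing the check does not explain the hang.

```c
/* Hypothetical helper (not an MKL routine): structural check of a
   one-based CSR matrix with n rows/columns and nnz stored entries.
   Returns 0 when the structure is consistent, a nonzero code otherwise. */
int check_csr_one_based(int n, const int *row_index,
                        const int *columns, int nnz)
{
    if (n <= 0 || row_index[0] != 1) return 1;   /* must start at 1 */
    if (row_index[n] != nnz + 1) return 2;       /* must end at nnz + 1 */
    for (int i = 0; i < n; ++i) {
        int begin = row_index[i] - 1;            /* zero-based range of row i */
        int end = row_index[i + 1] - 1;
        if (end < begin || end > nnz) return 3;  /* row pointers out of order */
        for (int k = begin; k < end; ++k) {
            if (columns[k] < 1 || columns[k] > n) return 4;          /* bad column */
            if (k > begin && columns[k] <= columns[k - 1]) return 5; /* not increasing */
        }
    }
    return 0;
}
```

With the arrays from the post (n = 4, nnz = 9, rowIndex = 1 3 6 9 10, columns = 1 2 1 2 3 2 3 4 4) this check returns 0, so the index arrays themselves look structurally consistent.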

Any suggestion? 

Error linking with Lapack

Hi

I am trying to run an example that solves a set of linear equations using SGESV, but during linking I get the following error messages:

 

1>mkl_intel_c.lib(_sgesv.obj) : error LNK2019: unresolved external symbol _mkl_serv_set_progress referenced in function _sgesv
1>mkl_intel_c.lib(_sgesv.obj) : error LNK2019: unresolved external symbol _mkl_serv_setxer referenced in function _sgesv
1>mkl_intel_c.lib(_sgesv.obj) : error LNK2019: unresolved external symbol _mkl_lapack_sgesv referenced in function _sgesv
1>mkl_intel_c.lib(_misc_mkl_progress_iface_u.obj) : error LNK2019: unresolved external symbol _mkl_serv_default_progress referenced in function _MKL_PROGRESS
1>mkl_intel_c.lib(_misc_mkl_xerbla_iface_u.obj) : error LNK2019: unresolved external symbol _mkl_serv_default_xerbla referenced in function _XERBLA

 

What am I doing wrong?

PARDISO: Large number of subsequent identical calls result in very different runtimes

Hi,

I am working with Pardiso in C++ on sparse symmetric positive definite matrices. Unexpectedly slow performance led me to do the following experiment:

I define a matrix and run phase 11 (symbolic factorization) once.

Then I repeatedly run phase 22 followed by phase 33, using default iparm parameters, let's say 1000 times, on the exact same matrix.

For most iterations the speed is as expected, but for a (usually small) number of iterations, phase 22 takes much longer to complete. For example, on a run of 1000 iterations, the minimum duration for phase 22 might be 0.002 seconds while the longest might be 10.5 seconds.

In my experience, the more iterations are performed, the larger the speed discrepancies get - for 10000 iterations, the longest phase 22 took almost 30 seconds. It doesn't seem to be a memory leak, and it doesn't follow an obvious pattern - the iterations don't get slower and slower. The slow iterations are spread out seemingly randomly amongst the fast ones.

By default I perform my computations using OpenMP with 4 threads, but I have tried to disable multithreading without achieving any better results. Since I work on a type of problem where I need to solve large numbers of subsequent problems with identical sparsity structure but changed numerical values, this problem is currently preventing me from working efficiently.

Please let me know if I can provide additional data or if there's anything you'd like me to try to track down the problem.

Best,

Mathias

 

 

 


VML accuracy environment variable

Would it be a stupid idea to have an environment variable  MKL_VML_MODE that - if set - would call vmlSetMode()  ?

MKL_VML_MODE - Comma-separated list of parameter names as defined in mkl_vml_defines.h. The corresponding values from mkl_vml_defines.h are or-ed together and used as the parameter to vmlSetMode(). It is the user's responsibility to give a set of names that are not mutually exclusive. (E.g. MKL_VML_MODE=VML_FLOAT_CONSISTENT,VML_DOUBLE_CONSISTENT is incorrect since these two values are not allowed to be set simultaneously.) Any word in the list that is not recognized by the VML library results in a warning message being printed, and the word is ignored in the value sent to vmlSetMode().
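The parsing described above could look roughly like the sketch below. It is not an existing MKL feature: `parse_vml_mode` is a hypothetical function, and the numeric values in the table are placeholders chosen for the illustration. A real implementation would take the values from mkl_vml_defines.h, read the string with getenv("MKL_VML_MODE"), and pass the result to vmlSetMode().

```c
#include <stdio.h>
#include <string.h>

struct mode_name {
    const char *name;
    unsigned int value;
};

/* Placeholder values for illustration only; the real constants live
   in mkl_vml_defines.h and are NOT reproduced here. */
static const struct mode_name k_modes[] = {
    { "VML_LA",                0x01u },
    { "VML_HA",                0x02u },
    { "VML_FLOAT_CONSISTENT",  0x10u },
    { "VML_DOUBLE_CONSISTENT", 0x20u },
};

/* Hypothetical parser: or-s together the values of all recognized
   comma-separated words in spec. Unrecognized words produce a warning
   on stderr, are counted in *unknown_count, and are otherwise ignored. */
unsigned int parse_vml_mode(const char *spec, int *unknown_count)
{
    unsigned int mode = 0;
    const char *p = spec;
    *unknown_count = 0;
    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current word */
        int found = 0;
        for (size_t i = 0; i < sizeof k_modes / sizeof k_modes[0]; ++i) {
            if (strlen(k_modes[i].name) == len &&
                strncmp(p, k_modes[i].name, len) == 0) {
                mode |= k_modes[i].value;
                found = 1;
                break;
            }
        }
        if (!found && len > 0) {
            fprintf(stderr, "warning: unknown VML mode word '%.*s' ignored\n",
                    (int)len, p);
            ++*unknown_count;
        }
        p += len;
        if (*p == ',') ++p;             /* skip the separator */
    }
    return mode;
}
```

The sketch deliberately does not detect mutually exclusive names, matching the proposal that this stays the user's responsibility.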

And, yes I could write it myself, but maybe it could be useful to others too?

/Nils

Distribution with VS2012+MKL

I am trying to distribute my VS2012/MKL code to a separate computer. On that computer I get a "vcomp.dll not found" error. I understand that, by using vcomp.dll, the application is using the MS OpenMP runtime instead of libiomp.

I am trying to get around this dependency on MS OpenMP and searching through forums I made the following changes:

- Added the <intel directory>\compiler\lib\intel64 directory to "VC++ Directories"->Reference Directories and Library Directories

- Added libiomp5md.lib to Linker->Additional Dependencies

- Added vcomp.lib to Linker->Ignore Specific Default Libraries

- Added the <intel directory>\compiler\lib\intel64 to my Linker->Additional Library Directories for good measure.

After each change the application still loads vcomp110.dll. Are there additional steps I am missing to force VS to use libiomp instead of vcomp?

I cannot provide reproducible code to attach.

Thanks

 

MKL FFTW interface performance

   Hi,

   I am trying to reproduce the results of the MKL FFTW interface in this report:

http://download-software.intel.com/sites/default/files/article/165868/in...

   Does anybody know where I can get the source code used in that report?

   So far, I have been running the source code in this webpage:

http://numbercrunch.de/blog/2010/03/parallel-fft-performance/comment-pag...

but the MKL FFTW interface shows really poor performance. I wonder if there is something I am missing.

   Thanks.

PARDISO linking in a c++ project

I have been using MKL PARDISO in C language projects for many years. It works nicely. Recently, I tried to include PARDISO in a C++ project, but it does not work. The linker reports that PARDISO and omp_get_max_threads are undefined symbols. I included the following libraries: MKL_C, MKL_IA32, MKL_lapack, MKL_solver and libguide40. What am I missing?

Does MKL plan to implement parallel sparse BLAS?

I know that MKL provides sparse BLAS, which is very useful for finite element codes. For parallel computing, however, a parallel sparse BLAS is needed. So, does MKL plan to implement a parallel sparse BLAS? It would be very useful for parallel finite element simulation.

Furthermore, I found in the forum that MKL will implement an MPI-based PARDISO in version 11.2. When will it be released?

Cannot run SVD on Armadillo linked with MKL

Hi, I am trying to run an SVD on a 19016x19016 matrix on my Mac (OS X Mavericks) with Armadillo linked to Intel MKL, but I get the following error:

./example
SVD Start: 19016 19016 0.000000
** On entry to DGESDD, parameter number 12 had an illegal value
error: svd(): failed to converge

Here is my code:

#define ARMA_DONT_USE_WRAPPER
#include <armadillo>
mat U, V, A;
vec s;
svd(U,s,V,A);

I make the program with:

g++-4.2 -O3 -framework Accelerate example.cpp mmio.cpp -o example

mkl_serv_set_xerbla_interface

Hi,

I just bumped into this error message while attempting to run a newly built code: "Entry Point Not Found - The procedure entry point mkl_serv_set_xerbla_interface could not be located in the dynamic link library mkl_intel_thread.dll".

Any suggestions to fix this would be appreciated! The code is built with XE 2013 SP1 and the MKL libraries that come with it (Fortran/Libraries/Use Intel Math Kernel Library = "Parallel (/Qmkl:parallel)"). The runtime library is "Multithread DLL". It also USEs the LAPACK_95 and BLAS_95 modules (Linker/Input/Additional Dependencies = "mkl_blas95_lp64.lib mkl_lapack95_lp64.lib"); I am not sure if this is relevant. The platform is a Windows 7 64-bit workstation.

Thanks in advance for your help,

Olivier

 


Problem with MKL and MEX

I am trying to write a MEX program, but it gives errors. When I build the code with VS2012 it runs, but when I converted it to MEX it did not work. The errors occur when the code tries to use the MKL library (functions dcopy, dgemm, etc.).

I have installed the Composer XE 2013 SP1 and I use this command to compile the code

mex -largeArrayDims ads.c -I"C:\Program Files (x86)\Intel\Composer XE 2013 SP1\mkl\include" -L"C:\Program Files (x86)\Intel\Composer XE 2013 SP1\mkl\lib\intel64" -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blas95_lp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_blas95_lp64

 

For example, I tried to use the function dcopy. The code and the errors are:

#include <math.h>               
#include <stdio.h>             
#include <stdlib.h>           
#include <mkl.h>
#include "mkl_vml.h"
#include "mex.h"
#include "matrix.h"
#include "mkl_vsl.h"


void mexFunction(int nlhs,  mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double a[4]={1,2,3,4};
    double b[4];
    cblas_dcopy(4, a, 1, b, 1);

    return;
}

Errors:

>> mex -largeArrayDims ads.c -I"C:\Program Files (x86)\Intel\Composer XE 2013 SP1\mkl\include" -L"C:\Program Files (x86)\Intel\Composer XE 2013 SP1\mkl\lib\intel64" -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blas95_lp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_blas95_lp64

   Creating library C:\Users\F9E9~1\AppData\Local\Temp\mex_VlIT6Y\templib.x and object C:\Users\F9E9~1\AppData\Local\Temp\mex_VlIT6Y\templib.exp

mkl_intel_thread.lib(mkl_threading.obj) : error LNK2019: unresolved external symbol omp_in_parallel referenced in function mkl_serv_get_max_threads

mkl_intel_thread.lib(mkl_threading.obj) : error LNK2019: unresolved external symbol omp_get_max_threads referenced in function mkl_serv_get_max_threads

mkl_intel_thread.lib(mkl_threading.obj) : error LNK2019: unresolved external symbol omp_get_num_procs referenced in function mkl_serv_get_N_Cores

mkl_intel_thread.lib(dcopy_omp.obj) : error LNK2019: unresolved external symbol __kmpc_global_thread_num referenced in function mkl_blas_dcopy_omp

mkl_intel_thread.lib(dcopy_omp.obj) : error LNK2019: unresolved external symbol __kmpc_ok_to_fork referenced in function mkl_blas_dcopy_omp

mkl_intel_thread.lib(dcopy_omp.obj) : error LNK2019: unresolved external symbol __kmpc_push_num_threads referenced in function mkl_blas_dcopy_omp

mkl_intel_thread.lib(dcopy_omp.obj) : error LNK2019: unresolved external symbol __kmpc_fork_call referenced in function mkl_blas_dcopy_omp

mkl_intel_thread.lib(dcopy_omp.obj) : error LNK2019: unresolved external symbol __kmpc_serialized_parallel referenced in function mkl_blas_dcopy_omp

mkl_intel_thread.lib(dcopy_omp.obj) : error LNK2019: unresolved external symbol __kmpc_end_serialized_parallel referenced in function mkl_blas_dcopy_omp

mkl_intel_thread.lib(dcopy_omp.obj) : error LNK2019: unresolved external symbol omp_get_thread_num referenced in function mkl_blas_dcopy_omp

mkl_intel_thread.lib(dcopy_omp.obj) : error LNK2019: unresolved external symbol omp_get_num_threads referenced in function mkl_blas_dcopy_omp

ads.mexw64 : fatal error LNK1120: 11 unresolved externals

 

  C:\USERS\F9E9~1\DOCUME~1\MATLAB~1\BIN\MEX.PL: Error: Link of 'ads.mexw64' failed.

 

 

I would like to ask if you could help me to fix the errors.

Thank you very much.

Scalapack PXLAWRITE and PXLAREAD

Hi,

Does MKL Scalapack include the PXLAWRITE and PXLAREAD (X=Z,C,D,S) subroutines?  When I try to use them in my code I get 'undefined reference' link errors.  I do not see them in the MKL documentation, but I've come across a number of scalapack functions not included in the documentation that still exist in the library.

Thanks,

John

feast threading

hi guys,

 

only getting 50% throughput CPU-wise when using FEAST's eigensolver. i am aware that hyperthreading is merely emulation, but isn't it possible to get the solver to still exploit all available cores instead of just the "true" number?

 

i tried OMP_threads and KMP_threads to no avail.

hoping you guys can fix this in a future revision?

thx

 

fortran+fftw=nan (sometimes)

Hello,

I run a multithreaded Fortran code that uses FFTs. I am now trying to switch to the FFTW interface available through MKL. When I insert the code below, the results are more often than not NaNs or infinities, even though only the plan call is executed and no actual FFTs are carried out using FFTW. The usual suspect in such cases would be out-of-bounds operations. I cannot at the moment see any out-of-bounds problems when creating the plan, but there is still the question of the interface.

 

I wonder if I have set things up correctly. The compile and link calls are

ifort  -O3 -r8  -openmp -fpp -parallel -mcmodel=medium -i-dynamic -shared-intel -mkl

This is run on a six core Linux machine.

Has anyone any suggestions as to what may be going on?

Thank you.

--

module FFTW3
    use, intrinsic :: iso_c_binding
    include 'fftw3.f03'
    type(C_PTR), save :: plan_r2c
    real*8, allocatable, SAVE :: WWW1( :, : ), WWW2( :, : )
end module



....

Use FFTW3
Integer :: Irank, ndim(1), InEmbeded(1), OutEmbeded(1)

...

!   Note       NX32 > N1_32 + 4

   allocate( WWW1( NX32*M2, M3  ),  STAT = IERR )
   allocate( WWW2( NX32*M2, M3  ),  STAT = IERR )

  Irank = 1
  ndim(1) = N1_32
  InEmbeded(1)  = NX32
  OutEmbeded(1) = NX32 / 2


 call dfftw_plan_many_dft_r2c( plan_r2c, Irank, ndim(1), M2*M3, WKX_1, InEmbeded(1), 1,    &
                               NX32, WKX_2, OutEmbeded(1), 1, NX32/2, FFTW_MEASURE )
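One size relation that can be checked independently of the interface question: in the standard FFTW real-to-complex convention, a transform of n real points produces n/2 + 1 complex outputs, so the padded row lengths passed as inembed and onembed have to be at least n and n/2 + 1 respectively. The sketch below only illustrates that arithmetic. The helper names are hypothetical, the example sizes in the usage note are made up, and it is an assumption that the same relation is what matters for the MKL wrapper.

```c
/* Number of complex outputs of a real-to-complex transform of n real
   points in the standard FFTW r2c layout. */
int r2c_complex_len(int n)
{
    return n / 2 + 1;
}

/* Hypothetical check (not an FFTW or MKL routine): returns 1 when a
   padded input row of inembed reals and a padded output row of onembed
   complex values are both large enough for a length-n r2c transform. */
int r2c_embed_ok(int n, int inembed, int onembed)
{
    return inembed >= n && onembed >= r2c_complex_len(n);
}
```

For instance, with made-up sizes n = 32 and a padded row of 37 reals (which satisfies the "NX32 > N1_32 + 4" note), the output row of 37 / 2 = 18 complex values is enough for the required 17.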

 

 

MKL from C

I am having trouble understanding the C interface for MKL, in particular the const modifier. I need to use the tridiagonal solver ?dtsvb, which should have the C interface:

void ddtsvb (const MKL_INT * n, const MKL_INT * nrhs, double * dl, double * d, const double * du, double * b, const MKL_INT * ldb, MKL_INT * info );

However, I have no idea what const MKL_INT * n stands for. Can someone provide a clear example of how the function ddtsvb is called?
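For context, a parameter declared as `const MKL_INT * n` is a pointer to an integer that the routine reads but does not modify; this Fortran-style interface passes even scalars by address. The sketch below illustrates the calling pattern with a stand-in function that has the same parameter shape. It is not ddtsvb: `demo_tridiag_solve` and `demo_int` are hypothetical names, and the body is a plain Thomas-algorithm solve without pivoting, written only so the example can run without MKL.

```c
#include <stddef.h>

typedef int demo_int;  /* stand-in for MKL_INT (an assumption for this sketch) */

/* Stand-in with the same parameter shape as the prototype above.
   Solves a tridiagonal system in place: dl = sub-diagonal (n-1),
   d = diagonal (n), du = super-diagonal (n-1), b = right-hand sides
   stored column by column with leading dimension *ldb.
   On return b holds the solution; *info is 0 on success or the
   1-based index of a zero pivot. */
void demo_tridiag_solve(const demo_int *n, const demo_int *nrhs,
                        double *dl, double *d, const double *du,
                        double *b, const demo_int *ldb, demo_int *info)
{
    *info = 0;
    if (*n <= 0) return;
    /* Forward elimination; dl and d are overwritten. */
    for (demo_int i = 1; i < *n; ++i) {
        if (d[i - 1] == 0.0) { *info = i; return; }
        double w = dl[i - 1] / d[i - 1];
        dl[i - 1] = w;
        d[i] -= w * du[i - 1];
        for (demo_int j = 0; j < *nrhs; ++j)
            b[i + j * (*ldb)] -= w * b[(i - 1) + j * (*ldb)];
    }
    if (d[*n - 1] == 0.0) { *info = *n; return; }
    /* Back substitution, one right-hand side at a time. */
    for (demo_int j = 0; j < *nrhs; ++j) {
        double *x = b + (size_t)j * (size_t)(*ldb);
        x[*n - 1] /= d[*n - 1];
        for (demo_int i = *n - 2; i >= 0; --i)
            x[i] = (x[i] - du[i] * x[i + 1]) / d[i];
    }
}

/* Calling pattern: scalars are ordinary variables passed by address.
     demo_int n = 3, nrhs = 1, ldb = 3, info;
     demo_tridiag_solve(&n, &nrhs, dl, d, du, b, &ldb, &info);      */
```

The caller therefore declares n, nrhs and ldb as normal integer variables and passes `&n`, `&nrhs` and `&ldb`; the const only promises that the routine leaves them unchanged, while info is written by the routine.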

Thank you.
