Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

lapack95 errors


I am trying to use an eigenvalue routine through the LAPACK95 interface of MKL, but I end up with this:

$ ifort Eigen.f90 -mkl

/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include/lapack.f90(28): error #6218: This statement is positioned incorrectly and/or has syntax errors.
MODULE F95_PRECISION
^
/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include/lapack.f90(31): error #6790: This is an invalid statement; an END [PROGRAM]  statement is required.
END MODULE F95_PRECISION
^
/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include/lapack.f90(31): error #6785: This name does not match the unit name.   [F95_PRECISION]
END MODULE F95_PRECISION
-----------^
Eigen.f90(86): error #6785: This name does not match the unit name.   [EIGEN]
END PROGRAM Eigen
------------^
Eigen.f90(23): warning #5427: Program may contain only one main entry routine
IMPLICIT NONE
^
compilation aborted for Eigen.f90 (code 1)

Also, in the program, I have:

PROGRAM Eigen

INCLUDE 'mkl.fi'
INCLUDE 'lapack.f90'

IMPLICIT NONE
.
.
.

Any idea what the trouble might be?
 


vdExp 32 bit vs 64 bit different results


Hi,

I'm new to MKL so please bear with me. We're moving a project from 32-bit to 64-bit and we've encountered some inconsistencies between the 32-bit and 64-bit vdExp.

I have the following code

double value = 0.027771918023383080;

double result;

vdExp( 1, &value, &result );

In 32 bit mode, the value in result is 1.0281611526482297 whilst when I run it in 64 bit mode I get 1.0281611526482299.

Any ideas why this is? I've double checked and I'm definitely compiling with the correct libraries. I used the Intel MKL link advisor.
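
For what it's worth, the 32-bit and 64-bit builds can take different instruction paths inside MKL, so a last-bit difference like this is plausible. If bitwise agreement matters more than speed, MKL's conditional numerical reproducibility (CNR) controls may help. A minimal sketch, assuming an MKL version of 11.0 or later and that CNR covers vdExp in that version:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Ask MKL for a consistent ("compatible") code path on both builds,
       trading some speed for reproducibility; must be called before any
       other MKL function. */
    if (mkl_cbwr_set(MKL_CBWR_COMPATIBLE) != MKL_CBWR_SUCCESS)
        printf("CNR mode not supported by this CPU/library\n");

    double value = 0.027771918023383080, result;
    vdExp(1, &value, &result);
    printf("%.17g\n", result);
    return 0;
}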

Regards

Mark.

 

LAPACKE_dgeev result differences on different CPU architectures


Hi, I am using MKL 11.1 to find eigenvectors and am having issues with the results changing sign depending on which machine I run on. This gives completely different results in the end for an ellipse-fitting algorithm and, thus, failures in automated tests.

Given the following matrix in row major order:

-8747596.5053710938      -316030.00427246094       1615084.5202636719
-5749756.8850097656      -667037.37084960938       632059.98022460938
 52358865.204467773       2874878.4188232422      -8747596.5129394531

The following call is made:

auto squareSize = 3;
auto info = LAPACKE_dgeev( LAPACK_ROW_MAJOR, 'N', 'V', squareSize,
  inputArrayOutput.Data(), squareSize,
  realEigenValues.Data(), imaginaryEigenValues.Data(),
  nullptr, squareSize, rightEigenVectors.Data(), squareSize );

Running this on machines that give the "correct" or expected result will give:

inputAndOutput
-18162659.389763702       -4492507.6042798907      -5290349.3859531470
 0.00000000000000000       214.50030177488844       915701.07938791020
 0.00000000000000000      -0.0070577840503023292    214.50030177488844
real eigenvalues
-18162659.389763702        214.50030177488844       214.50030177488844
imag eigenvalues
0.00000000000000000        80.391669176281070      -80.391669176281070
right eigenvectors
 0.17135077952698607       0.15755967091010539     -6.3641462760426620e-06
 0.091750438224901587     -0.67430368000587571      0.00013608158502213653
-0.98092852310503897       0.72144956765842805      0.00000000000000000

But on a "failing" machine this will give something like:

inputAndOutput
-18162659.389763702        4492507.6042798860      -5290349.3859531470
 0.00000000000000000       214.50030177486741      -915701.07938790950
 0.00000000000000000       0.0070577836735623567    214.50030177486741
real eigenvalues
-18162659.389763702        214.50030177486741       214.50030177486741
imag eigenvalues
 0.00000000000000000       80.391667030653210      -80.391667030653210
right eigenvectors
 0.17135077952698605      -0.15755967091010534      6.3641461061856458e-06
 0.091750438224901559      0.67430368000587682     -0.00013608158139016290
-0.98092852310503875      -0.72144956765842760      0.00000000000000000

As can be seen, the sign of the last two right eigenvectors changes. The question then is: why? And how can one correct this sign change so the result always has the same sign? It seems to be correlated with the output in "inputAndOutput", but how?

Or is this in fact a bug and would a later MKL version fix this?
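
For reference, an eigenvector is only determined up to a scalar factor, so both outputs above are mathematically valid; different instruction paths on different CPUs can legitimately flip the sign. A common workaround is to impose one's own sign convention after the call. A minimal sketch (a hypothetical helper, not an MKL routine): flip each eigenvector so that its largest-magnitude entry is positive, treating the two columns of a complex-conjugate pair (real and imaginary parts, as dgeev stores them) as one unit:

#include <math.h>

/* vr: n x n right eigenvectors from LAPACKE_dgeev, row-major;
   wi: imaginary parts of the eigenvalues. */
static void fix_eigenvector_signs(double *vr, const double *wi, int n)
{
    for (int j = 0; j < n; ) {
        int cols = (wi[j] != 0.0) ? 2 : 1;  /* complex pair spans 2 columns */
        /* Pivot: largest-magnitude entry of the first column of the block.
           (The pivot choice is just a convention.) */
        int piv = 0;
        for (int i = 1; i < n; ++i)
            if (fabs(vr[i * n + j]) > fabs(vr[piv * n + j]))
                piv = i;
        if (vr[piv * n + j] < 0.0)          /* flip the whole block */
            for (int c = 0; c < cols; ++c)
                for (int i = 0; i < n; ++i)
                    vr[i * n + j + c] = -vr[i * n + j + c];
        j += cols;
    }
}

Applied to the two outputs above, the failing machine's conjugate pair gets flipped once, and the two results then agree up to the small differences in the trailing digits.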

 

Poor (non-threaded) performance of /Qmkl:cluster compared to /Qmkl:parallel


The code below behaves differently when built with /Qmkl:parallel and with /Qmkl:cluster. In both cases the code is built for Windows 7 64-bit, using the latest Intel compiler and libraries. It is launched as an MPI process with mpiexec.exe -n 2 (that is, using only two ranks) on a dual 6-core workstation.

When /Qmkl:parallel is used, the calls to the MKL functions on rank 0 take advantage of the 6 OpenMP threads there.
When /Qmkl:cluster is used, only one thread on rank 0 is used, and the code is therefore six times slower.

Any idea how to get threaded behavior with /Qmkl:cluster?

Also, why is LWORK more than twice as large in the /Qmkl:parallel case?

 

PROGRAM MAIN
USE OMP_LIB
USE MPI
IMPLICIT NONE

INTEGER(KIND=4)             :: N,ALLOC_ERROR,INFO,LWORK,SEED_SIZE,I,IERR
INTEGER(KIND=8)             :: CLOCK_START,CLOCK_STOP,CLOCK_RATE,CLOCK_MAX
INTEGER(KIND=4),ALLOCATABLE :: SEEDS(:)
LOGICAL                     :: MPI_IS_INITIALIZED
REAL(KIND=8)                :: W(1)
REAL(KIND=8),ALLOCATABLE    :: A(:,:),TAU(:),WORK(:)

CALL MPI_INITIALIZED(MPI_IS_INITIALIZED,IERR)
IF (.NOT.MPI_IS_INITIALIZED) THEN
    CALL MPI_INIT(IERR)
END IF

WRITE(*,*) 'I am image ',THIS_IMAGE(),' and I can span ',OMP_GET_MAX_THREADS(),' OpenMP threads.'

IF (THIS_IMAGE()==1) THEN

    N = 3000
    WRITE(*,*) 'N     = ',N
    ALLOCATE(A(N,N),STAT=ALLOC_ERROR)
    IF (ALLOC_ERROR/=0) THEN
        ERROR STOP
    END IF
    CALL RANDOM_SEED(SIZE=SEED_SIZE)
    ALLOCATE(SEEDS(SEED_SIZE))
    SEEDS=123456
    CALL RANDOM_SEED(PUT=SEEDS)
    CALL RANDOM_NUMBER(A)

    ALLOCATE(TAU(N),STAT=ALLOC_ERROR)
    LWORK = -1
    CALL DGEQRF(N,N,A,N,TAU,W,LWORK,INFO)
    WRITE(*,*) 'LWORK = ',W(1)
    LWORK = INT(W(1))
    ALLOCATE(WORK(LWORK),STAT=ALLOC_ERROR)

    CALL SYSTEM_CLOCK(CLOCK_START,CLOCK_RATE,CLOCK_MAX)
    CALL DGEQRF(N,N,A,N,TAU,WORK,LWORK,INFO)
    CALL DORGQR(N,N,N,A,N,TAU,WORK,LWORK,INFO)
    CALL SYSTEM_CLOCK(CLOCK_STOP,CLOCK_RATE,CLOCK_MAX)
    WRITE(*,*) 'INFO  = ',INFO
    WRITE(*,*) 'TIME  = ',(REAL(CLOCK_STOP-CLOCK_START,KIND=8))/REAL(CLOCK_RATE,KIND=8)

END IF

END PROGRAM MAIN

Here is the output when using /Qmkl:cluster

I am image  2  and I can span  6  OpenMP threads.
I am image  1  and I can span  6  OpenMP threads.
N      =  3000
LWORK  =  288096
INFO   =  0
TIME   =  6.38000000000000
A(N,N) =  -2.110006751937421E-002

Here is the output when using /Qmkl:parallel

I am image  2  and I can span  6  OpenMP threads.
I am image  1  and I can span  6  OpenMP threads.
N      =  3000
LWORK  =  742977
INFO   =  0
TIME   =  0.920000000000000
A(N,N) =  -2.110006751937324E-002

Here is the build log (when using /Qmkl:cluster)

Compiling with Intel(R) Visual Fortran Compiler 17.0 [Intel(R) 64]...
ifort /nologo /O2 /I"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.048\windows\mpi\intel64\include" /Qopenmp /standard-semantics /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc120.pdb" /libs:dll /threads /Qmkl:cluster /c /Qcoarray:single /Qlocation,link,"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\\bin\amd64" /Qm64 "D:\TEMP\QR_PERFORMANCE\MAIN.F90"
Linking...
Link /OUT:"x64\Release\QR_PERFORMANCE.exe" /INCREMENTAL:NO /NOLOGO /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.048\windows\mpi\intel64\lib\release_mt" /MANIFEST /MANIFESTFILE:"x64\Release\QR_PERFORMANCE.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /IMPLIB:"D:\TEMP\QR_PERFORMANCE\x64\Release\QR_PERFORMANCE.lib" impi.lib -qm64 /qoffload-ldopts="-mkl=cluster""x64\Release\MAIN.obj"
Embedding manifest...
mt.exe /nologo /outputresource:"D:\TEMP\QR_PERFORMANCE\x64\Release\QR_PERFORMANCE.exe;#1" /manifest "x64\Release\QR_PERFORMANCE.exe.intermediate.manifest"

QR_PERFORMANCE - 0 error(s), 0 warning(s)

 

const errors in mkl_lapack.h header


1) In all variants of zlacrm, zlarcm, clacrm, clarcm, the output C should not be const:

void zlacrm_( const MKL_INT* m, const MKL_INT* n, const MKL_Complex16* a,
              const MKL_INT* lda, const double* b, const MKL_INT* ldb,
              const MKL_Complex16* c, const MKL_INT* ldc, double* rwork );
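
i.e., the fixed declaration should presumably read (only the qualifier on c changes):

void zlacrm_( const MKL_INT* m, const MKL_INT* n, const MKL_Complex16* a,
              const MKL_INT* lda, const double* b, const MKL_INT* ldb,
              MKL_Complex16* c, const MKL_INT* ldc, double* rwork );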

The LAPACK documentation was also wrong. Bug report sent to them, too.

lapack/SRC> grep 'param.* C' *larcm.f *lacrm.f
clarcm.f:*> \param[in] C
zlarcm.f:*> \param[in] C
clacrm.f:*> \param[in] C
zlacrm.f:*> \param[in] C

should be:
lapack/SRC> grep 'param.* C' *larcm.f *lacrm.f
clarcm.f:*> \param[out] C
zlarcm.f:*> \param[out] C
clacrm.f:*> \param[out] C
zlacrm.f:*> \param[out] C

2) In all variants of [sdcz]larft, the input V should be const:

void zlarft_( const char* direct, const char* storev, const MKL_INT* n,
              const MKL_INT* k, const MKL_Complex16* v, const MKL_INT* ldv,
              const MKL_Complex16* tau, MKL_Complex16* t, const MKL_INT* ldt );

lapack/SRC> grep 'param.* V' *larft.f
clarft.f:*> \param[in] V
dlarft.f:*> \param[in] V
slarft.f:*> \param[in] V
zlarft.f:*> \param[in] V

- Mark Gates, Innovative Computing Laboratory, UTK

?syrdb arguments issue


There seems to be an inconsistency in the documentation of ?syrdb (https://software.intel.com/en-us/node/469030).

I've attached my program, written in Fortran 90, which you are free to change. It generates a random symmetric matrix A and tries to reduce it to a banded matrix B of specified bandwidth. Make sure to compile it with the -mkl flag.

On setting the flag jobz to 'U', I expect the following:

  • A is supposed to be overwritten by the banded matrix B. This doesn't seem to be the case, as I still get a full matrix.
  • The documentation says A will be overwritten by Q as well, which doesn't make sense, as there is only one matrix.
  • Z is overwritten by Q, which is correct, as Q^T A Q gives me a tridiagonal matrix.

I would like access to Q_B such that Q_B^T A Q_B = B. Could you look into the algorithm (and my program if necessary) to find out the issue?

 

Attachment: SymBanRed.f90 (2.26 KB)

Problem with mkl_ddnscsr


Hello

I've been having some problems using the mkl_ddnscsr function. I've followed the example that comes with the library, but it's not working properly: I can retrieve the non-zero elements of the dense matrix, but the row and column vectors are returned empty (all elements are zeros). Below you can find my code; it's a simplified version of the example in dconverters.c. I also have another question: what is the most efficient way to use this function when working with large matrices where the number of non-zero elements is unknown? One way could be to set a really high number for the maximum number of non-zero elements, but that would mean pre-allocating large vectors. Any help would be much appreciated.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "mkl_types.h"
#include "mkl_spblas.h"

int main (void)
{

#define M      4
#define N      4
#define LDA    4
#define NZMAX  8
#define NNZ    8
#define MBLK   2
#define NN     2
#define INFO   0
#define MN     16
#define IBASE1 1
#define IBASE2 1
#define LOCAT  2
#define IDIAG 3
#define NDIAG 4
#define INDIA 12
int    m = M, n = N, lda = LDA, nzmax = NZMAX, nnz = NNZ, mblk = MBLK, nn = NN, info = INFO, mn = MN;
int    ibase1 = IBASE1, ibase2 = IBASE2, locat = LOCAT, idiag = IDIAG, ndiag = NDIAG;
double Adns[MN];
double Acsr[NZMAX];
int    AI[M+1];
int    AJ[NZMAX];
int    i, j;
int    job[8];

job[0] = 0;      /* convert dense to CSR */
job[1] = 0;      /* zero-based indexing of the dense matrix */
job[2] = 1;      /* one-based indexing of the CSR matrix */
job[3] = 2;      /* Adns is a whole matrix */
job[4] = NZMAX;  /* maximum number of non-zeros allowed */
job[5] = 3;      /* any value > 0: generate acsr, ja and ia */

for (j = 0; j < n; j++)
    for (i = 0; i < m; i++)
        Adns[i + lda*j] = 0.0;

Adns[0]  = 5.0;
Adns[1]  = 9.0;
Adns[4]  = 8.0;
Adns[5]  = 2.0;
Adns[10] = 3.0;
Adns[11] = 1.0;
Adns[14] = 6.0;
Adns[15] = 4.0;

mkl_ddnscsr(job,&m,&n,Adns,&lda,Acsr,AJ,AI,&info);

return 0;
}
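
On the second question (unknown number of non-zeros): if I read the job array documentation right, setting job[5] = 0 makes mkl_ddnscsr generate only the row-pointer array ia, from which the exact non-zero count can be read before allocating the value and column arrays. A hedged two-pass sketch, reusing m, n, Adns, lda and info from the code above (assumption: acsr and ja are not touched when job[5] = 0, so small placeholders are passed, and job[4] is ignored on the first pass):

/* Pass 1: dense -> CSR, zero-based dense, one-based CSR, whole matrix;
   job2[5] = 0 requests that only ia be filled. */
int job2[8] = { 0, 0, 1, 2, 0, 0 };
int ia2[M+1];
double a_unused[1];
int j_unused[1];
mkl_ddnscsr(job2, &m, &n, Adns, &lda, a_unused, j_unused, ia2, &info);

/* Exact non-zero count (independent of the index base). */
int nnz2 = ia2[m] - ia2[0];

/* Pass 2: allocate exactly and convert for real. */
double *acsr2 = malloc(nnz2 * sizeof(double));
int    *ja2   = malloc(nnz2 * sizeof(int));
job2[4] = nnz2;   /* maximum number of non-zeros now known */
job2[5] = 1;      /* generate acsr, ja and ia */
mkl_ddnscsr(job2, &m, &n, Adns, &lda, acsr2, ja2, ia2, &info);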

 

Regards

lggs

sytrd is not recognized by lapack90


Hi, 

I am trying to use LAPACK's "sytrd" subroutine in my code, but it is not recognized. I am trying the following simple code:

program comp
USE mkl95_lapack
USE mkl95_PRECISION
USE mkl95_BLAS
implicit none
real, dimension(2,2) :: A
integer, dimension(1) :: t
A = reshape((/-5., 2., 2., -2./),(/2,2/))
call sytrd(A, t)
end program comp

This is the error I get:

error #6285: There is no matching specific subroutine for this generic subroutine call.   [SYTRD]      

But when I use another of LAPACK's subroutines like "getrf", everything's fine:

program comp
USE mkl95_lapack
USE mkl95_PRECISION
USE mkl95_BLAS
implicit none
real, dimension(2,2) :: A
integer, dimension(2) :: t
A = reshape((/-5., 2., 2., -2./),(/2,2/))
call getrf(A, t)
end program comp

What might cause this problem?

solve system when matrix is banded, symmetric, positive definite matrix.


Hello.

Let Q be a banded, symmetric, positive definite matrix. (The number of rows of Q is 2e+5) . I want to do the following:

  1. Compute the Cholesky factorization Q = LU, where U = L^T
  2. Solve Lw = b
  3. Solve Um = w
  4. Sample z ~ N(0,1)
  5. Solve Uv = z
  6. Compute x = m + v
  7. Return x

Steps 2 and 3 give the solution of Qm=b. 

I have asked this question here before, but then Q was a tridiagonal, symmetric matrix. I used the functions LAPACKE_dpbtrf and cblas_dtbsv and solved the problem:

   /* Cholesky factorization */
    info = LAPACKE_dpbtrf(LAPACK_COL_MAJOR, 'U', dim+1, 1, Sigmab, 2 );
    if(info!= 0){mexPrintf( "C++ error: Cholesky failed");  }
  
    /* step 2*/
    cblas_dtbsv(CblasColMajor, CblasUpper, CblasTrans, CblasNonUnit, dim, 1, Sigmab, 2, y1, 1); 
  
    /* step 3*/
    cblas_dtbsv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit, dim, 1, Sigmab, 2, y1, 1); 
  
    /* step 5 */
    cblas_dtbsv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit, dim, 1, Sigmab, 2, y2, 1);

I found the functions LAPACKE_dpbtrf and LAPACKE_dpbtrs, which give the solution of steps 2 and 3:

   info = LAPACKE_dpbtrf(LAPACK_COL_MAJOR, 'L', dim, p, Sigma, p+1);
   if(info!= 0){mexPrintf( "C++ error: Cholesky failed");  }
     
   info = LAPACKE_dpbtrs(LAPACK_COL_MAJOR, 'L', dim, p, NRHS, Sigma, p+1, y1, dim);
   if(info!= 0){mexPrintf( "C++ error: the execution is not successful");  }

Firstly, I would like to ask if there is a better way to solve this; secondly, I don't know how to find the solution of step 5.
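
On step 5: LAPACKE_dpbtrs applies both triangular solves at once, so it cannot do U v = z alone, but cblas_dtbsv can, exactly as in the tridiagonal version above, only with bandwidth p instead of 1. A sketch, assuming the lower banded factor from the dpbtrf('L') call above and z holding the N(0,1) draws:

/* Step 5 alone: solve U v = z with U = L^T, where L is stored in lower
   banded form; a transposed triangular banded solve, z overwritten by v. */
cblas_dtbsv(CblasColMajor, CblasLower, CblasTrans, CblasNonUnit,
            dim, p, Sigma, p+1, z, 1);

(With the 'U' storage used in the tridiagonal code, the same solve would instead be CblasUpper with CblasNoTrans, as in the step 3 line above.)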

Thank you very much.

 

Newmark Method


Is there a routine in MKL to apply Newmark's method to Newton's second law (the undamped matrices from the static structures problem)?

I.e., can I extend the structures program into the time domain?

MKL Format Prototype Package


Hi everybody,

How can I get or ask for an MKL FPP evaluation copy? I just need an email or link.

Thanks,

CentOS 6.2


Hello,

What version of MKL can be installed on CentOS 6.2?

Thanks!

XM

MKL Sparse BLAS segfaults due to integer overflow


Hello

I am a very recent user of Intel MKL. I was trying to use the Sparse BLAS library (Inspector/Executor routines) when I ran into an obscure segfault. The segfault appeared in CSR matrix transpose operations, and only when the number of rows was large enough (roughly 100M). Here is a more detailed description that I posted on stackoverflow.com: http://stackoverflow.com/questions/37395541/mkl-sparse-blas-segfault-whe... .

Eventually I was able to track it down to an obvious integer overflow while computing a memory size for malloc (the number of rows gets multiplied by the number of threads, which was 32 in my case).

It took me a couple of days, so I keep wondering:

  1. Did I miss some relevant part of the MKL documentation? An explanation of the maximum feasible number of rows in a sparse matrix seems essential; spending time figuring it out from gdb disassembly somehow doesn't feel right.
  2. Regardless, it sure feels like a bug. I would expect mkl_sparse_convert_csr to return an error status instead of crashing (see the sketch below).

Is it a known issue? Are there any other known limits here that I should be aware of?
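
For reference, the conversion routine does return a status that is worth checking; a sketch of the intended pattern, assuming A is a sparse_matrix_t handle created earlier (e.g. with mkl_sparse_d_create_csr), though in the overflow case above the crash happens before the call can return:

#include <stdio.h>
#include "mkl_spblas.h"

/* ... A created earlier ... */
sparse_matrix_t At;
sparse_status_t st = mkl_sparse_convert_csr(A, SPARSE_OPERATION_TRANSPOSE, &At);
if (st != SPARSE_STATUS_SUCCESS) {
    /* e.g. SPARSE_STATUS_ALLOC_FAILED or SPARSE_STATUS_INTERNAL_ERROR */
    fprintf(stderr, "mkl_sparse_convert_csr failed: %d\n", (int)st);
}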

Problem with MKL pardiso update 3


I have the following installed on my computer:
Intel Parallel Studio XE 2016 Update 2
Intel Parallel Studio XE 2016 Update 3

I think I have come across a serious bug in the latest MKL Pardiso 11.3 update 3.

Attached is a simple program that reads in a test matrix and right-hand side. I have included all the test input files in the attached zip file.
MKL Pardiso is called to solve the system.
When compiled with Update 2, the program runs fine. However, when compiled with Update 3, the system is NOT solved and the right-hand side is returned in the solution vector.

 

Roman

 

Attachment: pardiso_prob.zip (1.51 MB)

Gaussian Random Routines

Some of the routines called in this sample, CheckVslError for example, are not in any documentation I can find. Any ideas where I can find details?

John
!===============================================================================
! Copyright 2003-2016 Intel Corporation All Rights Reserved.
!
! The source code,  information  and material  ("Material") contained  herein is
! owned by Intel Corporation or its  suppliers or licensors,  and  title to such
! Material remains with Intel  Corporation or its  suppliers or  licensors.  The
! Material  contains  proprietary  information  of  Intel or  its suppliers  and
! licensors.  The Material is protected by  worldwide copyright  laws and treaty
! provisions.  No part  of  the  Material   may  be  used,  copied,  reproduced,
! modified, published,  uploaded, posted, transmitted,  distributed or disclosed
! in any way without Intel's prior express written permission.  No license under
! any patent,  copyright or other  intellectual property rights  in the Material
! is granted to  or  conferred  upon  you,  either   expressly,  by implication,
! inducement,  estoppel  or  otherwise.  Any  license   under such  intellectual
! property rights must be express and approved by Intel in writing.
!
! Unless otherwise agreed by Intel in writing,  you may not remove or alter this
! notice or  any  other  notice   embedded  in  Materials  by  Intel  or Intel's
! suppliers or licensors in any way.
!===============================================================================

!  Content:
!    vdRngGaussianMV  Example Program Text
!*******************************************************************************

      include 'mkl_vsl.f90'
      include "errcheck.inc"
      include "statcheck.inc"

      program MKL_VSL_TEST

      USE MKL_VSL_TYPE
      USE MKL_VSL

      integer(kind=4) i
      integer(kind=4) errcode

      integer(kind=4) nn
      integer ndim,info
      integer n

      parameter(n=1000,nn=1000,ndim=3)

      integer brng,method,seed
      integer me

      real(kind=8) c(ndim,ndim),t(ndim,ndim),a(ndim)
      real(kind=8) r(ndim,n)
      real(kind=8) dbS(ndim),dbS2(ndim),dbMean(ndim),dbVar(ndim)
      real(kind=8) dbCovXY,dbCovXZ,dbCovYZ

      real(kind=8) S(ndim),D2(ndim),Q(ndim)
      real(kind=8) DeltaM(ndim),DeltaD(ndim)

      TYPE (VSL_STREAM_STATE) :: stream

      brng=VSL_BRNG_MCG31
      seed=7777777
      method=VSL_RNG_METHOD_GAUSSIANMV_BOXMULLER2
      me=VSL_MATRIX_STORAGE_FULL

!     Variance-covariance matrix for test
!     (should be symmetric,positive-definite)

!     This is full storage for dpotrf subroutine
      c(1,1)=16.0D0
      c(1,2)=8.0D0
      c(1,3)=4.0D0

      c(2,1)=8.0D0
      c(2,2)=13.0D0
      c(2,3)=17.0D0

      c(3,1)=4.0D0
      c(3,2)=17.0D0
      c(3,3)=62.0D0

      a(1)=3.0D0
      a(2)=5.0D0
      a(3)=2.0D0

      t  = c

      print *,'Variance-covariance matrix C'
      write (*,'(3F7.3)') c
      print *,''

      print *,'Mean vector a:'
      write (*,'(3F7.3)') a
      print *,''

      print *,'VSL_MATRIX_STORAGE_FULL'
      print *,'-----------------------'
      print *,''

      call dpotrf('U',ndim,t,ndim,info)

!     Stream initialization
      errcode=vslnewstream(stream,brng,seed)
      call CheckVslError(errcode)

!     Generating random numbers
!     from multivariate normal distribution
      errcode=vdrnggaussianmv(method,stream,n,r,ndim,me,a,t)
      call CheckVslError(errcode)

!     Printing random numbers
      print 11,' Results (first ',nn,' of ',n,')'
      print *,'--------------------------'
11    format(A,I5,A,I5,A)

      do i=1,nn
        print 12,' r(',i,')=(',r(:,i),')'
      end do
12    format(A,I5,A,3F8.3,A)
      print *,''

      call dCalculateGaussianMVSampleCharacteristics(ndim, n, r,        &
     &      dbS, dbS2, dbMean, dbVar, dbCovXY, dbCovXZ, dbCovYZ)

!     Printing
      print *,'Sample characteristics:'
      print *,'-----------------------'
      print *,'      Sample             Theory'
      print 13,' Mean :(',dbMean(1),dbMean(2),dbMean(3),                &
     &         ')  (',a(1),a(2),a(3),')'
      print 13,' Var. :(',dbVar(1),dbVar(2),dbVar(3),                   &
     &         ')  (',c(1,1),c(2,2),c(3,3),')'
      print 14,' CovXY: ',dbCovXY,'          ',c(1,2)
      print 14,' CovXZ: ',dbCovXZ,'          ',c(1,3)
      print 14,' CovYZ: ',dbCovYZ,'          ',c(2,3)
13    format(A,F5.1,F5.1,F5.1,A,F5.1,F5.1,F5.1,A)
14    format(A,F6.1,A,F6.1)
      print *,''

      errcode=dGaussianMVCheckResults(ndim, n, a, c, dbMean, dbVar, S,  &
     &      D2, Q, DeltaM, DeltaD)

      if (errcode /= 0) then
        print *,"Error: sample moments"
        print *,"disagree with theory"
        print 15, "    DeltaM: ", DeltaM(1), DeltaM(2), DeltaM(3)
        print 15, "    DeltaD: ", DeltaD(1), DeltaD(2), DeltaD(3)
        print *,  "   ( at least one of the Deltas > 3.0) "
        stop 1
      else
        print *,"Sample moments"
        print *,"agree with theory"
        print 15, "    DeltaM: ", DeltaM(1), DeltaM(2), DeltaM(3)
        print 15, "    DeltaD: ", DeltaD(1), DeltaD(2), DeltaD(3)
        print *,  "   ( all Deltas < 3.0) "
      end if
15    format(A,F7.3,F7.3,F7.3)
      print *,''

!     Stream finalization
      errcode=vslDeleteStream(stream)
      call CheckVslError(errcode)

      end

 


Any known issues with dcopy ( MKL 11.2.4) in multithread environment?


My software seems to be 'randomly' deadlocking inside dcopy. I have lots of threads running. Is this a known issue? Running Intel Inspector XE always shows a litany of data-race issues inside MKL, but I have always been told "don't worry, they are OK".

     ntdll.dll!000000007750d3fa()     
     [Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]    
     KernelBase.dll!000007fefd4110ac()     
     libiomp5md.dll!000007fed0c747a0()     
     libiomp5md.dll!000007fed0c08977()     
     libiomp5md.dll!000007fed0c0ce40()     
     libiomp5md.dll!000007fed0c3d7d1()     
     libiomp5md.dll!000007fed0c0e736()     
     mkl_intel_thread.dll!000007fea7a9baa5()     
     mkl_intel_thread.dll!000007fea7a0b611()     
     xxxxx.dll!dcopy()  + 0x7a bytes    

 

ECCN for MKL and TBB


Hi,

Could you please tell me the ECCN for MKL 10.3.12 and TBB 4.3 update 6?

Thank you for your help.

solving eigenvalue problem


I am trying to solve the following eigenvalue problem using lapack95:


program comp
USE mkl95_lapack
USE mkl95_PRECISION
USE mkl95_BLAS
implicit none
real, dimension(2,2) :: A
real, dimension(1) :: t
real, dimension(2) :: c
A = reshape((/-5., 2., 2., -2./),(/2,2/))
call sytrd(A, t)
call orgtr(A, t)
call rsteqr(c, t, A)
write(*,*) c
end program comp

 

But the results aren't correct. I can only deduce this implementation from the program's help for lapack95. What would be the correct way to do this?
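
For comparison, the sytrd/orgtr/steqr chain is what the driver ?syev performs internally, so a single call yields the eigenvalues (and optionally the vectors) directly. A minimal sketch via the C interface; lapack95 presumably exposes the same driver through a generic named syev:

#include <stdio.h>
#include "mkl_lapacke.h"

int main(void)
{
    double a[4] = { -5., 2., 2., -2. };  /* the symmetric 2x2 above, column-major */
    double w[2];                         /* eigenvalues in ascending order */

    /* 'V': also form the eigenvectors (returned in a); 'U': use upper triangle */
    int info = LAPACKE_dsyev(LAPACK_COL_MAJOR, 'V', 'U', 2, a, 2, w);
    if (info == 0)
        printf("eigenvalues: %g %g\n", w[0], w[1]);  /* expect -6 and -1 */
    return info;
}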

MKL with TBB on OSX


Dear all,

I am new to the forum and, of course, to MKL (though I've used TBB before). I am using the MKL Link Line Advisor to compile and link the first C example, dgemm_threading_effect_example.c, but I cannot figure out how to use TBB.

I know it is possible to use TBB without OpenMP (which I don't have, being on a Mac), but it seems that I need to link the mkl_sequential library, and then no threads are used.

Below you can find the example with my few added lines of code, and here are my linker switches:

-L/usr/local/lib -ltbb -ltbbmalloc -L/opt/intel/compilers_and_libraries_2016/mac/mkl/lib -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential
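
For what it's worth, MKL has shipped a TBB threading layer (libmkl_tbb_thread) since version 11.3, i.e. the 2016 release, so presumably the sequential library is the culprit here. Linking the TBB layer in its place should look something like the following (hedged: exact paths and library names vary by version, and with plain int arguments as in the code below, the LP64 rather than the ILP64 interface is presumably the one to use):

-L/usr/local/lib -ltbb -ltbbmalloc -L/opt/intel/compilers_and_libraries_2016/mac/mkl/lib -lmkl_intel_lp64 -lmkl_tbb_thread -lmkl_core -ltbb

Note also that, if I remember the docs right, mkl_set_num_threads controls the OpenMP layer; under the TBB layer, concurrency is governed by the TBB scheduler (e.g. the task_scheduler_init already in the code).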

Thanks for any help you can give me!
     Franco

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

#include <tbb/task_scheduler_init.h>

/* Consider adjusting LOOP_COUNT based on the performance of your computer */
/* to make sure that total run time is at least 1 second */
#define LOOP_COUNT 10

int main()
{
    double *A, *B, *C;
    int m, n, p, i, j, r, max_threads;
    double alpha, beta;
    double s_initial, s_elapsed;

    printf ("\n This example demonstrates threading impact on computing real matrix product \n"" C=alpha*A*B+beta*C using Intel(R) MKL function dgemm, where A, B, and C are \n"" matrices and alpha and beta are double precision scalars \n\n");

    m = 2000, p = 200, n = 1000;
    printf (" Initializing data for matrix multiplication C=A*B for matrix \n"" A(%ix%i) and matrix B(%ix%i)\n\n", m, p, p, n);
    alpha = 1.0; beta = 0.0;

    printf (" Allocating memory for matrices aligned on 64-byte boundary for better \n"" performance \n\n");
    A = (double *)mkl_malloc( m*p*sizeof( double ), 64 );
    B = (double *)mkl_malloc( p*n*sizeof( double ), 64 );
    C = (double *)mkl_malloc( m*n*sizeof( double ), 64 );
    if (A == NULL || B == NULL || C == NULL) {
        printf( "\n ERROR: Can't allocate memory for matrices. Aborting... \n\n");
        mkl_free(A);
        mkl_free(B);
        mkl_free(C);
        return 1;
    }

    printf (" Intializing matrix data \n\n");
    for (i = 0; i < (m*p); i++) {
        A[i] = (double)(i+1);
    }

    for (i = 0; i < (p*n); i++) {
        B[i] = (double)(-i-1);
    }

    for (i = 0; i < (m*n); i++) {
        C[i] = 0.0;
    }

    // HERE I TRY BUT IT'S ALWAYS ONE SINGLE THREAD
    tbb::task_scheduler_init scheduler(4);
    mkl_set_num_threads(4);
    mkl_set_num_threads_local(4);

    printf (" Finding max number of threads Intel(R) MKL can use for parallel runs \n\n");

    // HERE I ALWAYS GET ONE
    max_threads = mkl_get_max_threads();

    printf (" Running Intel(R) MKL from 1 to %i threads \n\n", max_threads);
    for (i = 1; i <= max_threads; i++) {
        for (j = 0; j < (m*n); j++)
            C[j] = 0.0;

        printf (" Requesting Intel(R) MKL to use %i thread(s) \n\n", i);
        mkl_set_num_threads(i);

        printf (" Making the first run of matrix product using Intel(R) MKL dgemm function \n"" via CBLAS interface to get stable run time measurements \n\n");
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, p, alpha, A, p, B, n, beta, C, n);

        printf (" Measuring performance of matrix product using Intel(R) MKL dgemm function \n"" via CBLAS interface on %i thread(s) \n\n", i);
        s_initial = dsecnd();
        for (r = 0; r < LOOP_COUNT; r++) {
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        m, n, p, alpha, A, p, B, n, beta, C, n);
        }
        s_elapsed = (dsecnd() - s_initial) / LOOP_COUNT;

        printf (" == Matrix multiplication using Intel(R) MKL dgemm completed ==\n"" == at %.5f milliseconds using %d thread(s) ==\n\n", (s_elapsed * 1000), i);
    }

    printf (" Deallocating memory \n\n");
    mkl_free(A);
    mkl_free(B);
    mkl_free(C);

    if (s_elapsed < 0.9/LOOP_COUNT) {
        s_elapsed=1.0/LOOP_COUNT/s_elapsed;
        i=(int)(s_elapsed*LOOP_COUNT)+1;
        printf(" It is highly recommended to define LOOP_COUNT for this example on your \n"" computer as %i to have total execution time about 1 second for reliability \n"" of measurements\n\n", i);
    }

    printf (" Example completed. \n\n");
    return 0;
}

 

Help with vdrnggaussian: floating divide by zero


Hi all,

I have been using the vdrnggaussian routine to generate normally distributed random numbers. Once in a while, however, I get the following error:

 forrtl: error (73): floating divide by zero

I have verified this occurs within the call to vdrnggaussian using write statements before and after. Here is my code:

 

SUBROUTINE reac_diff

! MKL_VSL module included in other routine.
        USE MKL_VSL
        USE MKL_VSL_TYPE

...

TYPE(VSL_STREAM_STATE) :: fstream

INTEGER, PARAMETER :: Ne=20480

INTEGER :: ferror, rdSeed !rdSeed randomly generated from 1 to 10,000

REAL :: rdmean=0., rdstd=1.

REAL, DIMENSION(Ne) :: frand

        ferror=vslnewstream( fstream,VSL_BRNG_MT19937, rdSeed )

        WRITE(*,*) 'in vdr'
        ferror = vdrnggaussian(VSL_RNG_METHOD_GAUSSIAN_BOXMULLER,fstream,Ne,frand,rdmean,rdstd )    !error occurs in here
        WRITE(*,*) 'out vdr'

...

END SUBROUTINE reac_diff

 

I compile with the -r8 flag so the reals are double precision. Compiling with -logo gives:

Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.0.080 Build 20130728
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.
FOR NON-COMMERCIAL USE ONLY

 Intel(R) Fortran 14.0-1565
GNU ld (GNU Binutils for Ubuntu) 2.22

 

and using MKL_GET_VERSION_STRING gives

 Intel(R) Math Kernel Library Version 11.1.0 Product Build 20130711 for Intel(R)
  64 architecture applications

 

Any ideas how to prevent the error?