Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

lapack95 errors


I am trying to use an eigenvalue routine through the LAPACK95 interface of MKL, but I end up with this:

$ ifort Eigen.f90 -mkl

/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include/lapack.f90(28): error #6218: This statement is positioned incorrectly and/or has syntax errors.
MODULE F95_PRECISION
^
/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include/lapack.f90(31): error #6790: This is an invalid statement; an END [PROGRAM]  statement is required.
END MODULE F95_PRECISION
^
/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include/lapack.f90(31): error #6785: This name does not match the unit name.   [F95_PRECISION]
END MODULE F95_PRECISION
-----------^
Eigen.f90(86): error #6785: This name does not match the unit name.   [EIGEN]
END PROGRAM Eigen
------------^
Eigen.f90(23): warning #5427: Program may contain only one main entry routine
IMPLICIT NONE
^
compilation aborted for Eigen.f90 (code 1)

Also, in the program, I have:

PROGRAM Eigen

INCLUDE 'mkl.fi'
INCLUDE 'lapack.f90'

IMPLICIT NONE
.
.
.

Any idea what the trouble might be?
 


vdExp 32 bit vs 64 bit different results


Hi,

I'm new to MKL so please bear with me. We're moving a project from 32-bit to 64-bit and we've encountered some inconsistencies between the 32-bit and 64-bit vdExp.

I have the following code

double value = 0.027771918023383080;

double result;

vdExp( 1, &value, &result );

In 32 bit mode, the value in result is 1.0281611526482297 whilst when I run it in 64 bit mode I get 1.0281611526482299.

Any ideas why this is? I've double checked and I'm definitely compiling with the correct libraries. I used the Intel MKL link advisor.
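
For what it's worth, the 32-bit and 64-bit builds can take different instruction paths inside MKL, so a last-bit difference like this is plausible. If bitwise agreement matters more than speed, MKL's conditional numerical reproducibility (CNR) controls may help. A minimal sketch, assuming an MKL version of 11.0 or later and that CNR covers vdExp in that version:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Ask MKL for a consistent ("compatible") code path on both builds,
       trading some speed for reproducibility; must be called before any
       other MKL function. */
    if (mkl_cbwr_set(MKL_CBWR_COMPATIBLE) != MKL_CBWR_SUCCESS)
        printf("CNR mode not supported by this CPU/library\n");

    double value = 0.027771918023383080, result;
    vdExp(1, &value, &result);
    printf("%.17g\n", result);
    return 0;
}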

Regards

Mark.

 

LAPACKE_dgeev result differences on different CPU architectures


Hi, I am using MKL 11.1 to find eigenvectors and am having issues with the results changing sign depending on which machine I run on. This gives completely different results in the end for an ellipse-fitting algorithm and, thus, failures in automated tests.

Given the following matrix in row major order:

-8747596.5053710938      -316030.00427246094       1615084.5202636719
-5749756.8850097656      -667037.37084960938       632059.98022460938
 52358865.204467773       2874878.4188232422      -8747596.5129394531

The following call is made:

auto squareSize = 3;
auto info = LAPACKE_dgeev( LAPACK_ROW_MAJOR, 'N', 'V', squareSize,
  inputArrayOutput.Data(), squareSize,
  realEigenValues.Data(), imaginaryEigenValues.Data(),
  nullptr, squareSize, rightEigenVectors.Data(), squareSize );

Running this on machines that give the "correct" or expected result will give:

inputAndOutput
-18162659.389763702       -4492507.6042798907      -5290349.3859531470
 0.00000000000000000       214.50030177488844       915701.07938791020
 0.00000000000000000      -0.0070577840503023292    214.50030177488844
real eigenvalues
-18162659.389763702        214.50030177488844       214.50030177488844
imag eigenvalues
0.00000000000000000        80.391669176281070      -80.391669176281070
right eigenvectors
 0.17135077952698607       0.15755967091010539     -6.3641462760426620e-06
 0.091750438224901587     -0.67430368000587571      0.00013608158502213653
-0.98092852310503897       0.72144956765842805      0.00000000000000000

But on a "failing" machine this will give something like:

inputAndOutput
-18162659.389763702        4492507.6042798860      -5290349.3859531470
 0.00000000000000000       214.50030177486741      -915701.07938790950
 0.00000000000000000       0.0070577836735623567    214.50030177486741
real eigenvalues
-18162659.389763702        214.50030177486741       214.50030177486741
imag eigenvalues
 0.00000000000000000       80.391667030653210      -80.391667030653210
right eigenvectors
 0.17135077952698605      -0.15755967091010534      6.3641461061856458e-06
 0.091750438224901559      0.67430368000587682     -0.00013608158139016290
-0.98092852310503875      -0.72144956765842760      0.00000000000000000

As can be seen, the sign of the last two right eigenvectors changes. The question then is: why? And how can one correct this sign change so the result always has the same sign? It seems to be correlated with the output in "inputAndOutput", but how?

Or is this in fact a bug and would a later MKL version fix this?
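
For reference, an eigenvector is only determined up to a scalar factor, so both outputs above are mathematically valid; different instruction paths on different CPUs can legitimately flip the sign. A common workaround is to impose one's own sign convention after the call. A minimal sketch (a hypothetical helper, not an MKL routine): flip each eigenvector so that its largest-magnitude entry is positive, treating the two columns of a complex-conjugate pair (real and imaginary parts, as dgeev stores them) as one unit:

#include <math.h>

/* vr: n x n right eigenvectors from LAPACKE_dgeev, row-major;
   wi: imaginary parts of the eigenvalues. */
static void fix_eigenvector_signs(double *vr, const double *wi, int n)
{
    for (int j = 0; j < n; ) {
        int cols = (wi[j] != 0.0) ? 2 : 1;  /* complex pair spans 2 columns */
        /* Pivot: largest-magnitude entry of the first column of the block.
           (The pivot choice is just a convention.) */
        int piv = 0;
        for (int i = 1; i < n; ++i)
            if (fabs(vr[i * n + j]) > fabs(vr[piv * n + j]))
                piv = i;
        if (vr[piv * n + j] < 0.0)          /* flip the whole block */
            for (int c = 0; c < cols; ++c)
                for (int i = 0; i < n; ++i)
                    vr[i * n + j + c] = -vr[i * n + j + c];
        j += cols;
    }
}

Applied to the two outputs above, the failing machine's conjugate pair gets flipped once, and the two results then agree up to the small differences in the trailing digits.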

 

Poor (non-threaded) performance of /Qmkl:cluster compared to /Qmkl:parallel


The code below behaves differently when built with /Qmkl:parallel and with /Qmkl:cluster. In both cases the code is built for Windows 7 64-bit, using the latest Intel compiler and libraries. It is launched as an MPI process with mpiexec.exe -n 2 (that is, using only two ranks) on a dual 6-core workstation.

When /Qmkl:parallel is used, the calls to the MKL functions on rank 0 take advantage of the 6 OpenMP threads there.
When /Qmkl:cluster is used, only one thread on rank 0 is used, and the code is therefore six times slower.

Any idea how to get threaded behavior with /Qmkl:cluster?

Also, why is LWORK more than twice as large in the /Qmkl:parallel case?

 

PROGRAM MAIN
USE OMP_LIB
USE MPI
IMPLICIT NONE

INTEGER(KIND=4)             :: N,ALLOC_ERROR,INFO,LWORK,SEED_SIZE,I,IERR
INTEGER(KIND=8)             :: CLOCK_START,CLOCK_STOP,CLOCK_RATE,CLOCK_MAX
INTEGER(KIND=4),ALLOCATABLE :: SEEDS(:)
LOGICAL                     :: MPI_IS_INITIALIZED
REAL(KIND=8)                :: W(1)
REAL(KIND=8),ALLOCATABLE    :: A(:,:),TAU(:),WORK(:)

CALL MPI_INITIALIZED(MPI_IS_INITIALIZED,IERR)
IF (.NOT.MPI_IS_INITIALIZED) THEN
    CALL MPI_INIT(IERR)
END IF

WRITE(*,*) 'I am image ',THIS_IMAGE(),' and I can span ',OMP_GET_MAX_THREADS(),' OpenMP threads.'

IF (THIS_IMAGE()==1) THEN

    N = 3000
    WRITE(*,*) 'N     = ',N
    ALLOCATE(A(N,N),STAT=ALLOC_ERROR)
    IF (ALLOC_ERROR/=0) THEN
        ERROR STOP
    END IF
    CALL RANDOM_SEED(SIZE=SEED_SIZE)
    ALLOCATE(SEEDS(SEED_SIZE))
    SEEDS=123456
    CALL RANDOM_SEED(PUT=SEEDS)
    CALL RANDOM_NUMBER(A)

    ALLOCATE(TAU(N),STAT=ALLOC_ERROR)
    LWORK = -1
    CALL DGEQRF(N,N,A,N,TAU,W,LWORK,INFO)
    WRITE(*,*) 'LWORK = ',W(1)
    LWORK = INT(W(1))
    ALLOCATE(WORK(LWORK),STAT=ALLOC_ERROR)

    CALL SYSTEM_CLOCK(CLOCK_START,CLOCK_RATE,CLOCK_MAX)
    CALL DGEQRF(N,N,A,N,TAU,WORK,LWORK,INFO)
    CALL DORGQR(N,N,N,A,N,TAU,WORK,LWORK,INFO)
    CALL SYSTEM_CLOCK(CLOCK_STOP,CLOCK_RATE,CLOCK_MAX)
    WRITE(*,*) 'INFO  = ',INFO
    WRITE(*,*) 'TIME  = ',(REAL(CLOCK_STOP-CLOCK_START,KIND=8))/REAL(CLOCK_RATE,KIND=8)

END IF

END PROGRAM MAIN

Here is the output when using /Qmkl:cluster

I am image  2  and I can span  6  OpenMP threads.
I am image  1  and I can span  6  OpenMP threads.
N      =  3000
LWORK  =  288096
INFO   =  0
TIME   =  6.38000000000000
A(N,N) =  -2.110006751937421E-002

Here is the output when using /Qmkl:parallel

I am image  2  and I can span  6  OpenMP threads.
I am image  1  and I can span  6  OpenMP threads.
N      =  3000
LWORK  =  742977
INFO   =  0
TIME   =  0.920000000000000
A(N,N) =  -2.110006751937324E-002

Here is the build log (when using /Qmkl:cluster)

Compiling with Intel(R) Visual Fortran Compiler 17.0 [Intel(R) 64]...
ifort /nologo /O2 /I"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.048\windows\mpi\intel64\include" /Qopenmp /standard-semantics /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc120.pdb" /libs:dll /threads /Qmkl:cluster /c /Qcoarray:single /Qlocation,link,"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\\bin\amd64" /Qm64 "D:\TEMP\QR_PERFORMANCE\MAIN.F90"
Linking...
Link /OUT:"x64\Release\QR_PERFORMANCE.exe" /INCREMENTAL:NO /NOLOGO /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.048\windows\mpi\intel64\lib\release_mt" /MANIFEST /MANIFESTFILE:"x64\Release\QR_PERFORMANCE.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /IMPLIB:"D:\TEMP\QR_PERFORMANCE\x64\Release\QR_PERFORMANCE.lib" impi.lib -qm64 /qoffload-ldopts="-mkl=cluster""x64\Release\MAIN.obj"
Embedding manifest...
mt.exe /nologo /outputresource:"D:\TEMP\QR_PERFORMANCE\x64\Release\QR_PERFORMANCE.exe;#1" /manifest "x64\Release\QR_PERFORMANCE.exe.intermediate.manifest"

QR_PERFORMANCE - 0 error(s), 0 warning(s)

 

const errors in mkl_lapack.h header


1) In all variants of zlacrm, zlarcm, clacrm, clarcm, the output C should not be const:

void zlacrm_( const MKL_INT* m, const MKL_INT* n, const MKL_Complex16* a,
              const MKL_INT* lda, const double* b, const MKL_INT* ldb,
              const MKL_Complex16* c, const MKL_INT* ldc, double* rwork );
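
i.e., the fixed declaration should presumably read (only the qualifier on c changes):

void zlacrm_( const MKL_INT* m, const MKL_INT* n, const MKL_Complex16* a,
              const MKL_INT* lda, const double* b, const MKL_INT* ldb,
              MKL_Complex16* c, const MKL_INT* ldc, double* rwork );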

The LAPACK documentation was also wrong. Bug report sent to them, too.

lapack/SRC> grep 'param.* C' *larcm.f *lacrm.f
clarcm.f:*> \param[in] C
zlarcm.f:*> \param[in] C
clacrm.f:*> \param[in] C
zlacrm.f:*> \param[in] C

should be:
lapack/SRC> grep 'param.* C' *larcm.f *lacrm.f
clarcm.f:*> \param[out] C
zlarcm.f:*> \param[out] C
clacrm.f:*> \param[out] C
zlacrm.f:*> \param[out] C

2) In all variants of [sdcz]larft, the input V should be const:

void zlarft_( const char* direct, const char* storev, const MKL_INT* n,
              const MKL_INT* k, const MKL_Complex16* v, const MKL_INT* ldv,
              const MKL_Complex16* tau, MKL_Complex16* t, const MKL_INT* ldt );

lapack/SRC> grep 'param.* V' *larft.f
clarft.f:*> \param[in] V
dlarft.f:*> \param[in] V
slarft.f:*> \param[in] V
zlarft.f:*> \param[in] V

- Mark Gates, Innovative Computing Laboratory, UTK

?syrdb arguments issue


There seems to be an inconsistency in the documentation of ?syrdb (https://software.intel.com/en-us/node/469030).

I've attached my program, written in Fortran 90, which you are free to change. It generates a random symmetric matrix A and tries to reduce it to a banded matrix B of specified bandwidth. Make sure to compile it with the -mkl flag.

On setting the flag jobz to 'U', I expect the following:

  • A is supposed to be overwritten by the banded matrix B. This doesn't seem to be the case, as I still get a full matrix.
  • The documentation says A will be overwritten by Q as well, which doesn't make sense, as there is only one matrix.
  • Z is overwritten by Q, which is correct, as Q^T A Q gives me a tridiagonal matrix.

I would like access to Q_B such that Q_B^T A Q_B = B. Could you look into the algorithm (and my program if necessary) to find out the issue?

 

Attachment: SymBanRed.f90 (2.26 KB)

Problem with mkl_ddnscsr


Hello

I've been having some problems using the mkl_ddnscsr function. I've followed the example that comes with the library, but it's not working properly: I can retrieve the non-zero elements of the dense matrix, but the row and column vectors are returned empty (all elements are zeros). Below you can find my code; it's a simplified version of the example in dconverters.c. I also have another question: what is the most efficient way to use this function when working with large matrices where the number of non-zero elements is unknown? One way could be to set a really high number for the maximum number of non-zero elements, but that would mean pre-allocating large vectors. Any help would be much appreciated.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "mkl_types.h"
#include "mkl_spblas.h"

int main (void)
{

#define M      4
#define N      4
#define LDA    4
#define NZMAX  8
#define NNZ    8
#define MBLK   2
#define NN     2
#define INFO   0
#define MN     16
#define IBASE1 1
#define IBASE2 1
#define LOCAT  2
#define IDIAG 3
#define NDIAG 4
#define INDIA 12
int    m = M, n = N, lda = LDA, nzmax = NZMAX, nnz = NNZ, mblk = MBLK, nn = NN, info = INFO, mn = MN;
int    ibase1 = IBASE1, ibase2 = IBASE2, locat = LOCAT, idiag = IDIAG, ndiag = NDIAG;
double Adns[MN];
double Acsr[NZMAX];
int    AI[M+1];
int    AJ[NZMAX];
int    i, j;
int    job[8];

job[0] = 0;      /* convert dense to CSR */
job[1] = 0;      /* zero-based indexing of the dense matrix */
job[2] = 1;      /* one-based indexing of the CSR matrix */
job[3] = 2;      /* Adns is a whole matrix */
job[4] = NZMAX;  /* maximum number of non-zeros allowed */
job[5] = 3;      /* any value > 0: generate acsr, ja and ia */

for (j = 0; j < n; j++)
    for (i = 0; i < m; i++)
        Adns[i + lda*j] = 0.0;

Adns[0]  = 5.0;
Adns[1]  = 9.0;
Adns[4]  = 8.0;
Adns[5]  = 2.0;
Adns[10] = 3.0;
Adns[11] = 1.0;
Adns[14] = 6.0;
Adns[15] = 4.0;

mkl_ddnscsr(job,&m,&n,Adns,&lda,Acsr,AJ,AI,&info);

return 0;
}
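
On the second question (unknown number of non-zeros): if I read the job array documentation right, setting job[5] = 0 makes mkl_ddnscsr generate only the row-pointer array ia, from which the exact non-zero count can be read before allocating the value and column arrays. A hedged two-pass sketch, reusing m, n, Adns, lda and info from the code above (assumption: acsr and ja are not touched when job[5] = 0, so small placeholders are passed, and job[4] is ignored on the first pass):

/* Pass 1: dense -> CSR, zero-based dense, one-based CSR, whole matrix;
   job2[5] = 0 requests that only ia be filled. */
int job2[8] = { 0, 0, 1, 2, 0, 0 };
int ia2[M+1];
double a_unused[1];
int j_unused[1];
mkl_ddnscsr(job2, &m, &n, Adns, &lda, a_unused, j_unused, ia2, &info);

/* Exact non-zero count (independent of the index base). */
int nnz2 = ia2[m] - ia2[0];

/* Pass 2: allocate exactly and convert for real. */
double *acsr2 = malloc(nnz2 * sizeof(double));
int    *ja2   = malloc(nnz2 * sizeof(int));
job2[4] = nnz2;   /* maximum number of non-zeros now known */
job2[5] = 1;      /* generate acsr, ja and ia */
mkl_ddnscsr(job2, &m, &n, Adns, &lda, acsr2, ja2, ia2, &info);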

 

Regards

lggs

sytrd is not recognized by lapack90


Hi, 

I am trying to use LAPACK's "sytrd" subroutine in my code, but it is not recognized. I am trying the following simple code:

program comp
USE mkl95_lapack
USE mkl95_PRECISION
USE mkl95_BLAS
implicit none
real, dimension(2,2) :: A
integer, dimension(1) :: t
A = reshape((/-5., 2., 2., -2./),(/2,2/))
call sytrd(A, t)
end program comp

This is the error I get:

error #6285: There is no matching specific subroutine for this generic subroutine call.   [SYTRD]      

But when I use another of LAPACK's subroutines like "getrf", everything's fine:

program comp
USE mkl95_lapack
USE mkl95_PRECISION
USE mkl95_BLAS
implicit none
real, dimension(2,2) :: A
integer, dimension(2) :: t
A = reshape((/-5., 2., 2., -2./),(/2,2/))
call getrf(A, t)
end program comp

What might cause this problem?

solve system when matrix is banded, symmetric, positive definite matrix.


Hello.

Let Q be a banded, symmetric, positive definite matrix. (The number of rows of Q is 2e+5) . I want to do the following:

  1. Compute the Cholesky factorization Q = LU, where U = L^T
  2. Solve Lw = b
  3. Solve Um = w
  4. Sample z ~ N(0,1)
  5. Solve Uv = z
  6. Compute x = m + v
  7. Return x

Steps 2 and 3 give the solution of Qm=b. 

I have asked this question here before, but then Q was a tridiagonal, symmetric matrix. I used the functions LAPACKE_dpbtrf and cblas_dtbsv and solved the problem:

   /* Cholesky factorization */
    info = LAPACKE_dpbtrf(LAPACK_COL_MAJOR, 'U', dim+1, 1, Sigmab, 2 );
    if(info!= 0){mexPrintf( "C++ error: Cholesky failed");  }
  
    /* step 2*/
    cblas_dtbsv(CblasColMajor, CblasUpper, CblasTrans, CblasNonUnit, dim, 1, Sigmab, 2, y1, 1); 
  
    /* step 3*/
    cblas_dtbsv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit, dim, 1, Sigmab, 2, y1, 1); 
  
    /* step 5 */
    cblas_dtbsv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit, dim, 1, Sigmab, 2, y2, 1);

I found the functions LAPACKE_dpbtrf and LAPACKE_dpbtrs, which give the solution of steps 2 and 3:

   info = LAPACKE_dpbtrf(LAPACK_COL_MAJOR, 'L', dim, p, Sigma, p+1);
   if(info!= 0){mexPrintf( "C++ error: Cholesky failed");  }
     
   info = LAPACKE_dpbtrs(LAPACK_COL_MAJOR, 'L', dim, p, NRHS, Sigma, p+1, y1, dim);
   if(info!= 0){mexPrintf( "C++ error: the execution is not successful");  }

Firstly, I would like to ask if there is a better way to solve this; secondly, I don't know how to find the solution of step 5.
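
On step 5: LAPACKE_dpbtrs applies both triangular solves at once, so it cannot do U v = z alone, but cblas_dtbsv can, exactly as in the tridiagonal version above, only with bandwidth p instead of 1. A sketch, assuming the lower banded factor from the dpbtrf('L') call above and z holding the N(0,1) draws:

/* Step 5 alone: solve U v = z with U = L^T, where L is stored in lower
   banded form; a transposed triangular banded solve, z overwritten by v. */
cblas_dtbsv(CblasColMajor, CblasLower, CblasTrans, CblasNonUnit,
            dim, p, Sigma, p+1, z, 1);

(With the 'U' storage used in the tridiagonal code, the same solve would instead be CblasUpper with CblasNoTrans, as in the step 3 line above.)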

Thank you very much.

 

Newmark Method


Is there a routine in MKL to apply Newmark's method to Newton's second law (the undamped matrices from the static structures problem)?

I.e., can I extend the structures program into the time domain?

MKL Format Prototype Package


Hi everybody,

How can I get or ask for an MKL FPP evaluation copy? I just need an email or link.

Thanks,

CentOS 6.2


Hello,

What version of MKL can be installed on CentOS 6.2?

Thanks!

XM

MKL Sparse BLAS segfaults due to integer overflow


Hello

I am a very recent user of Intel MKL. I was trying to use the Sparse BLAS library (Inspector/Executor routines) when I ran into an obscure segfault. The segfault appeared in CSR matrix transpose operations, and only when the number of rows was large enough (roughly 100M). Here is a more detailed description that I posted on stackoverflow.com: http://stackoverflow.com/questions/37395541/mkl-sparse-blas-segfault-whe... .

Eventually I was able to track it down to an obvious integer overflow while computing a memory size for malloc (the number of rows gets multiplied by the number of threads, which was 32 in my case).

It took me a couple of days, so I keep wondering:

  1. Did I miss some relevant part of the MKL documentation? An explanation of the maximum feasible number of rows in a sparse matrix seems essential; spending time figuring it out from gdb disassembly somehow doesn't feel right.
  2. Regardless, it sure feels like a bug. I would expect mkl_sparse_convert_csr to return an error status instead of crashing (see the sketch below).

Is it a known issue? Are there any other known limits here that I should be aware of?
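
For reference, the conversion routine does return a status that is worth checking; a sketch of the intended pattern, assuming A is a sparse_matrix_t handle created earlier (e.g. with mkl_sparse_d_create_csr), though in the overflow case above the crash happens before the call can return:

#include <stdio.h>
#include "mkl_spblas.h"

/* ... A created earlier ... */
sparse_matrix_t At;
sparse_status_t st = mkl_sparse_convert_csr(A, SPARSE_OPERATION_TRANSPOSE, &At);
if (st != SPARSE_STATUS_SUCCESS) {
    /* e.g. SPARSE_STATUS_ALLOC_FAILED or SPARSE_STATUS_INTERNAL_ERROR */
    fprintf(stderr, "mkl_sparse_convert_csr failed: %d\n", (int)st);
}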

Problem with MKL pardiso update 3


I have the following installed on my computer:
Intel Parallel Studio XE 2016 Update 2
Intel Parallel Studio XE 2016 Update 3

I think I have come across a serious bug in the latest MKL Pardiso 11.3 update 3.

Attached is a simple program that reads in a test matrix and right-hand side. I have included all the test input files in the attached zip file.
MKL Pardiso is called to solve the system.
When compiled with Update 2, the program runs fine. However, when compiled with Update 3, the system is NOT solved and the right-hand side is returned in the solution vector.

 

Roman

 

Attachment: pardiso_prob.zip (1.51 MB)

Gaussian Random Routines

Some of the routines called in this sample, CheckVslError for example, are not in any documentation I can find. Any ideas where I can find details?

John
!===============================================================================
! Copyright 2003-2016 Intel Corporation All Rights Reserved.
!
! The source code,  information  and material  ("Material") contained  herein is
! owned by Intel Corporation or its  suppliers or licensors,  and  title to such
! Material remains with Intel  Corporation or its  suppliers or  licensors.  The
! Material  contains  proprietary  information  of  Intel or  its suppliers  and
! licensors.  The Material is protected by  worldwide copyright  laws and treaty
! provisions.  No part  of  the  Material   may  be  used,  copied,  reproduced,
! modified, published,  uploaded, posted, transmitted,  distributed or disclosed
! in any way without Intel's prior express written permission.  No license under
! any patent,  copyright or other  intellectual property rights  in the Material
! is granted to  or  conferred  upon  you,  either   expressly,  by implication,
! inducement,  estoppel  or  otherwise.  Any  license   under such  intellectual
! property rights must be express and approved by Intel in writing.
!
! Unless otherwise agreed by Intel in writing,  you may not remove or alter this
! notice or  any  other  notice   embedded  in  Materials  by  Intel  or Intel's
! suppliers or licensors in any way.
!===============================================================================

!  Content:
!    vdRngGaussianMV  Example Program Text
!*******************************************************************************

      include 'mkl_vsl.f90'
      include "errcheck.inc"
      include "statcheck.inc"

      program MKL_VSL_TEST

      USE MKL_VSL_TYPE
      USE MKL_VSL

      integer(kind=4) i
      integer(kind=4) errcode

      integer(kind=4) nn
      integer ndim,info
      integer n

      parameter(n=1000,nn=1000,ndim=3)

      integer brng,method,seed
      integer me

      real(kind=8) c(ndim,ndim),t(ndim,ndim),a(ndim)
      real(kind=8) r(ndim,n)
      real(kind=8) dbS(ndim),dbS2(ndim),dbMean(ndim),dbVar(ndim)
      real(kind=8) dbCovXY,dbCovXZ,dbCovYZ

      real(kind=8) S(ndim),D2(ndim),Q(ndim)
      real(kind=8) DeltaM(ndim),DeltaD(ndim)

      TYPE (VSL_STREAM_STATE) :: stream

      brng=VSL_BRNG_MCG31
      seed=7777777
      method=VSL_RNG_METHOD_GAUSSIANMV_BOXMULLER2
      me=VSL_MATRIX_STORAGE_FULL

!     Variance-covariance matrix for test
!     (should be symmetric,positive-definite)

!     This is full storage for dpotrf subroutine
      c(1,1)=16.0D0
      c(1,2)=8.0D0
      c(1,3)=4.0D0

      c(2,1)=8.0D0
      c(2,2)=13.0D0
      c(2,3)=17.0D0

      c(3,1)=4.0D0
      c(3,2)=17.0D0
      c(3,3)=62.0D0

      a(1)=3.0D0
      a(2)=5.0D0
      a(3)=2.0D0

      t  = c

      print *,'Variance-covariance matrix C'
      write (*,'(3F7.3)') c
      print *,''

      print *,'Mean vector a:'
      write (*,'(3F7.3)') a
      print *,''

      print *,'VSL_MATRIX_STORAGE_FULL'
      print *,'-----------------------'
      print *,''

      call dpotrf('U',ndim,t,ndim,info)

!     Stream initialization
      errcode=vslnewstream(stream,brng,seed)
      call CheckVslError(errcode)

!     Generating random numbers
!     from multivariate normal distribution
      errcode=vdrnggaussianmv(method,stream,n,r,ndim,me,a,t)
      call CheckVslError(errcode)

!     Printing random numbers
      print 11,' Results (first ',nn,' of ',n,')'
      print *,'--------------------------'
11    format(A,I5,A,I5,A)

      do i=1,nn
        print 12,' r(',i,')=(',r(:,i),')'
      end do
12    format(A,I5,A,3F8.3,A)
      print *,''

      call dCalculateGaussianMVSampleCharacteristics(ndim, n, r,        &
     &      dbS, dbS2, dbMean, dbVar, dbCovXY, dbCovXZ, dbCovYZ)

!     Printing
      print *,'Sample characteristics:'
      print *,'-----------------------'
      print *,'      Sample             Theory'
      print 13,' Mean :(',dbMean(1),dbMean(2),dbMean(3),                &
     &         ')  (',a(1),a(2),a(3),')'
      print 13,' Var. :(',dbVar(1),dbVar(2),dbVar(3),                   &
     &         ')  (',c(1,1),c(2,2),c(3,3),')'
      print 14,' CovXY: ',dbCovXY,'          ',c(1,2)
      print 14,' CovXZ: ',dbCovXZ,'          ',c(1,3)
      print 14,' CovYZ: ',dbCovYZ,'          ',c(2,3)
13    format(A,F5.1,F5.1,F5.1,A,F5.1,F5.1,F5.1,A)
14    format(A,F6.1,A,F6.1)
      print *,''

      errcode=dGaussianMVCheckResults(ndim, n, a, c, dbMean, dbVar, S,  &
     &      D2, Q, DeltaM, DeltaD)

      if (errcode /= 0) then
        print *,"Error: sample moments"
        print *,"disagree with theory"
        print 15, "    DeltaM: ", DeltaM(1), DeltaM(2), DeltaM(3)
        print 15, "    DeltaD: ", DeltaD(1), DeltaD(2), DeltaD(3)
        print *,  "   ( at least one of the Deltas > 3.0) "
        stop 1
      else
        print *,"Sample moments"
        print *,"agree with theory"
        print 15, "    DeltaM: ", DeltaM(1), DeltaM(2), DeltaM(3)
        print 15, "    DeltaD: ", DeltaD(1), DeltaD(2), DeltaD(3)
        print *,  "   ( all Deltas < 3.0) "
      end if
15    format(A,F7.3,F7.3,F7.3)
      print *,''

!     Stream finalization
      errcode=vslDeleteStream(stream)
      call CheckVslError(errcode)

      end

 


Any known issues with dcopy ( MKL 11.2.4) in multithread environment?


My software seems to be 'randomly' deadlocking inside dcopy. I have lots of threads running. Is this a known issue? Running Intel Inspector XE always shows a litany of data-race issues inside MKL, but I have always been told "don't worry, they are OK".

     ntdll.dll!000000007750d3fa()     
     [Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]    
     KernelBase.dll!000007fefd4110ac()     
     libiomp5md.dll!000007fed0c747a0()     
     libiomp5md.dll!000007fed0c08977()     
     libiomp5md.dll!000007fed0c0ce40()     
     libiomp5md.dll!000007fed0c3d7d1()     
     libiomp5md.dll!000007fed0c0e736()     
     mkl_intel_thread.dll!000007fea7a9baa5()     
     mkl_intel_thread.dll!000007fea7a0b611()     
     xxxxx.dll!dcopy()  + 0x7a bytes    

 

ECCN for MKL and TBB


Hi,

Could you please tell me the ECCN for MKL 10.3.12 and TBB 4.3 update 6?

Thank you for your help.

solving eigenvalue problem


I am trying to solve the following eigenvalue problem using lapack95:


program comp
USE mkl95_lapack
USE mkl95_PRECISION
USE mkl95_BLAS
implicit none
real, dimension(2,2) :: A
real, dimension(1) :: t
real, dimension(2) :: c
A = reshape((/-5., 2., 2., -2./),(/2,2/))
call sytrd(A, t)
call orgtr(A, t)
call rsteqr(c, t, A)
write(*,*) c
end program comp

 

But the results aren't correct. I can only deduce this implementation from the program's help for lapack95. What would be the correct way to do this?
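
For comparison, the sytrd/orgtr/steqr chain is what the driver ?syev performs internally, so a single call yields the eigenvalues (and optionally the vectors) directly. A minimal sketch via the C interface; lapack95 presumably exposes the same driver through a generic named syev:

#include <stdio.h>
#include "mkl_lapacke.h"

int main(void)
{
    double a[4] = { -5., 2., 2., -2. };  /* the symmetric 2x2 above, column-major */
    double w[2];                         /* eigenvalues in ascending order */

    /* 'V': also form the eigenvectors (returned in a); 'U': use upper triangle */
    int info = LAPACKE_dsyev(LAPACK_COL_MAJOR, 'V', 'U', 2, a, 2, w);
    if (info == 0)
        printf("eigenvalues: %g %g\n", w[0], w[1]);  /* expect -6 and -1 */
    return info;
}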

MKL with TBB on OSX


Dear all,

I am new to the forum and, of course, to MKL (though I've used TBB before). I am using the MKL Link Line Advisor to compile and link the first C example, dgemm_threading_effect_example.c, but I cannot figure out how to use TBB.

I know it is possible to use TBB without OpenMP (which I don't have, being on a Mac), but it seems that I need to link the mkl_sequential library, and then no threads are used.

Below you can find the example with my few added lines of code, and here are my linker switches:

-L/usr/local/lib -ltbb -ltbbmalloc -L/opt/intel/compilers_and_libraries_2016/mac/mkl/lib -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential
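
For what it's worth, MKL has shipped a TBB threading layer (libmkl_tbb_thread) since version 11.3, i.e. the 2016 release, so presumably the sequential library is the culprit here. Linking the TBB layer in its place should look something like the following (hedged: exact paths and library names vary by version, and with plain int arguments as in the code below, the LP64 rather than the ILP64 interface is presumably the one to use):

-L/usr/local/lib -ltbb -ltbbmalloc -L/opt/intel/compilers_and_libraries_2016/mac/mkl/lib -lmkl_intel_lp64 -lmkl_tbb_thread -lmkl_core -ltbb

Note also that, if I remember the docs right, mkl_set_num_threads controls the OpenMP layer; under the TBB layer, concurrency is governed by the TBB scheduler (e.g. the task_scheduler_init already in the code).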

Thanks for any help you can give me!
     Franco

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

#include <tbb/task_scheduler_init.h>

/* Consider adjusting LOOP_COUNT based on the performance of your computer */
/* to make sure that total run time is at least 1 second */
#define LOOP_COUNT 10

int main()
{
    double *A, *B, *C;
    int m, n, p, i, j, r, max_threads;
    double alpha, beta;
    double s_initial, s_elapsed;

    printf ("\n This example demonstrates threading impact on computing real matrix product \n"" C=alpha*A*B+beta*C using Intel(R) MKL function dgemm, where A, B, and C are \n"" matrices and alpha and beta are double precision scalars \n\n");

    m = 2000, p = 200, n = 1000;
    printf (" Initializing data for matrix multiplication C=A*B for matrix \n"" A(%ix%i) and matrix B(%ix%i)\n\n", m, p, p, n);
    alpha = 1.0; beta = 0.0;

    printf (" Allocating memory for matrices aligned on 64-byte boundary for better \n"" performance \n\n");
    A = (double *)mkl_malloc( m*p*sizeof( double ), 64 );
    B = (double *)mkl_malloc( p*n*sizeof( double ), 64 );
    C = (double *)mkl_malloc( m*n*sizeof( double ), 64 );
    if (A == NULL || B == NULL || C == NULL) {
        printf( "\n ERROR: Can't allocate memory for matrices. Aborting... \n\n");
        mkl_free(A);
        mkl_free(B);
        mkl_free(C);
        return 1;
    }

    printf (" Intializing matrix data \n\n");
    for (i = 0; i < (m*p); i++) {
        A[i] = (double)(i+1);
    }

    for (i = 0; i < (p*n); i++) {
        B[i] = (double)(-i-1);
    }

    for (i = 0; i < (m*n); i++) {
        C[i] = 0.0;
    }

    // HERE I TRY BUT IT'S ALWAYS ONE SINGLE THREAD
    tbb::task_scheduler_init scheduler(4);
    mkl_set_num_threads(4);
    mkl_set_num_threads_local(4);

    printf (" Finding max number of threads Intel(R) MKL can use for parallel runs \n\n");

    // HERE I ALWAYS GET ONE
    max_threads = mkl_get_max_threads();

    printf (" Running Intel(R) MKL from 1 to %i threads \n\n", max_threads);
    for (i = 1; i <= max_threads; i++) {
        for (j = 0; j < (m*n); j++)
            C[j] = 0.0;

        printf (" Requesting Intel(R) MKL to use %i thread(s) \n\n", i);
        mkl_set_num_threads(i);

        printf (" Making the first run of matrix product using Intel(R) MKL dgemm function \n"" via CBLAS interface to get stable run time measurements \n\n");
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, p, alpha, A, p, B, n, beta, C, n);

        printf (" Measuring performance of matrix product using Intel(R) MKL dgemm function \n"" via CBLAS interface on %i thread(s) \n\n", i);
        s_initial = dsecnd();
        for (r = 0; r < LOOP_COUNT; r++) {
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        m, n, p, alpha, A, p, B, n, beta, C, n);
        }
        s_elapsed = (dsecnd() - s_initial) / LOOP_COUNT;

        printf (" == Matrix multiplication using Intel(R) MKL dgemm completed ==\n"" == at %.5f milliseconds using %d thread(s) ==\n\n", (s_elapsed * 1000), i);
    }

    printf (" Deallocating memory \n\n");
    mkl_free(A);
    mkl_free(B);
    mkl_free(C);

    if (s_elapsed < 0.9/LOOP_COUNT) {
        s_elapsed=1.0/LOOP_COUNT/s_elapsed;
        i=(int)(s_elapsed*LOOP_COUNT)+1;
        printf(" It is highly recommended to define LOOP_COUNT for this example on your \n"" computer as %i to have total execution time about 1 second for reliability \n"" of measurements\n\n", i);
    }

    printf (" Example completed. \n\n");
    return 0;
}

 

Help with vdrnggaussian: floating divide by zero


Hi all,

I have been using the vdrnggaussian routine to generate normally distributed random numbers. Once in a while, however, I get the following error:

 forrtl: error (73): floating divide by zero

I have verified this occurs within the call to vdrnggaussian using write statements before and after. Here is my code:

 

SUBROUTINE reac_diff

! MKL_VSL module included in other routine.
        USE MKL_VSL
        USE MKL_VSL_TYPE

...

TYPE(VSL_STREAM_STATE) :: fstream

INTEGER, PARAMETER :: Ne=20480

INTEGER :: ferror, rdSeed !rdSeed randomly generated from 1 to 10,000

REAL :: rdmean=0., rdstd=1.

REAL, DIMENSION(Ne) :: frand

        ferror=vslnewstream( fstream,VSL_BRNG_MT19937, rdSeed )

        WRITE(*,*) 'in vdr'
        ferror = vdrnggaussian(VSL_RNG_METHOD_GAUSSIAN_BOXMULLER,fstream,Ne,frand,rdmean,rdstd )    !error occurs in here
        WRITE(*,*) 'out vdr'

...

END SUBROUTINE reac_diff

 

I compile with the -r8 flag so the reals are double precision. Compiling with -logo gives:

Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.0.080 Build 20130728
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.
FOR NON-COMMERCIAL USE ONLY

 Intel(R) Fortran 14.0-1565
GNU ld (GNU Binutils for Ubuntu) 2.22

 

and using MKL_GET_VERSION_STRING gives

 Intel(R) Math Kernel Library Version 11.1.0 Product Build 20130711 for Intel(R)
  64 architecture applications

 

Any ideas how to prevent the error?