Need 64-bit version of mkl_link_tool in the DevCloud

March 27, 2020, 7:59 am

Latest and popular articles on Intel Technologies


I have a project on the DevCloud that uses MKL and is built with CMake. For convenience, CMake invokes `mkl_link_tool` to construct the link line. However, `mkl_link_tool` happens to be a 32-bit executable:

/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux$ file mkl/tools/mkl_link_tool
mkl/tools/mkl_link_tool: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-, for GNU/Linux 2.6.18, BuildID[sha1]=ec312866645ce227a0fd3b0aeabe20b1c9d7ba42, not stripped

and Ubuntu does not let it run. `bash` gives a very misleading error message, saying that the file could not be found.
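As a side note, the word size of an executable can be read straight from its ELF header when `file` is not at hand. The sketch below is a minimal illustration, not part of any Intel tooling; the helper names `elf_class` and `elf_class_of_file` are made up for this example. The misleading "not found" message most likely refers to the 32-bit dynamic loader named inside the binary rather than to the binary itself.

```python
# Byte 4 of an ELF file (EI_CLASS) encodes the word size: 1 = 32-bit, 2 = 64-bit.
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(header: bytes) -> str:
    """Classify the leading bytes of a file as a 32-bit or 64-bit ELF object."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return ELF_CLASSES.get(header[4], "unknown")


def elf_class_of_file(path: str) -> str:
    """Read just enough of `path` to classify it (hypothetical helper)."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))
```

Calling `elf_class_of_file("mkl/tools/mkl_link_tool")` from the directory shown above should then agree with what `file` reports.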

The cure is to install the 32-bit compatibility layer package:

`sudo apt-get install lib32stdc++6`

but hey, I have no sudo privileges on the DevCloud :-)

Is there a quick workaround at the user level?

BTW, it's high time `mkl_link_tool` grew up and gained an additional set of 32 bits, which I am sure it deserves :-)

Cheers, AA

TCE Level:

Level 1

TCE Open Date:

Friday, March 27, 2020 - 07:53

↧

fatal error LNK1104: cannot open mkl_cdft_core.lib

March 29, 2020, 4:49 am


Dear all,

I am using Parallel Studio 2019.5.068 together with the MKL cluster libraries in Visual Studio 2015 Community. I tried to build a Fortran 90 program that uses module.f90, fftw3.f03, and omp.lib.f90 files, among others, and a makefile. After building in Visual Studio I get the message "fatal error LNK1104: cannot open mkl_cdft_core.lib". When I checked the fft folder in MKL, there was no mkl_cdft_core.lib, only mkl_core.lib, mkl_cdft_core_dll.lib, and mkl_core_dll.lib. Attached is the build log.

Thanks.

Attachment	Size
Download New Microsoft Word Document.pdf	166.81 KB

↧

Linear Regression

March 29, 2020, 3:02 pm


I am running into problems with linear regression. I am trying GESL and it gives a linking error. Any ideas?

Attachment	Size
Download Capture.PNG	53.77 KB
Download CaptureA.PNG	68.68 KB

↧

Routine to compute A=xy'?

March 27, 2020, 10:23 am


The BLAS Level 2 routine cblas_?ger computes A := alpha*x*y'+ A. Is there a simpler routine that just calculates A := alpha*x*y'?

Setting A=0 offers the same results, but does it provide good performance too? i.e. am I wasting computation in doing the additions?
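To make the question concrete, here is a plain-Python reference sketch of the rank-1 update (it is not MKL code, and the function names `ger` and `outer` are only illustrative). It shows that with A zeroed beforehand the update reproduces alpha*x*y' exactly, at the cost of one extra read and one extra addition per element; how much that matters in an optimized BLAS is not something this sketch can answer.

```python
def ger(alpha, x, y, a):
    """Reference rank-1 update, in place: a := alpha * x * y' + a."""
    for i, xi in enumerate(x):
        row = a[i]
        for j, yj in enumerate(y):
            row[j] += alpha * xi * yj  # one multiply-add and one read of a per element
    return a


def outer(alpha, x, y):
    """Pure outer product alpha * x * y', with no accumulation into an existing matrix."""
    return [[alpha * xi * yj for yj in y] for xi in x]


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0]
a = [[0.0] * len(y) for _ in x]  # A must be zeroed first to emulate the pure product
ger(2.0, x, y, a)
```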

↧

Problem with LAPACK subroutine ZHEEVR, input array "isuppz" accessed despite documentation saying otherwise

March 29, 2020, 6:09 am


Hello,

I think I have run into an inconsistency between the documented behaviour of LAPACK subroutine ZHEEVR and the observed behaviour.

According to the documentation here:

https://software.intel.com/en-us/mkl-developer-reference-fortran-heevr

the input array "isuppz" is "[r]eferenced only if eigenvectors are needed (jobz = 'V') and all eigenvalues are needed."

However, it appears that the array is accessed even if

1. the "jobz" parameter is 'N' (that is, only the eigenvalues are requested, while the eigenvectors are not); or
2. the "jobz" parameter is 'V', and only a proper subset of eigenvalue/vectors is requested, as specified by the "il" and "iu" parameters.

The uploaded files "demos1.f" and "demos2.f" demonstrate these problems respectively. In each demo, the subroutine zheevr() is called on the 6x6 identity matrix. Before the call, the input "isuppz" array is filled with an integer pattern. If the implementation matches the documentation, these initial values should not change upon return from zheevr(). But here, the output shows that the isuppz array did get overwritten by zheevr().

That is, the expected output, for both demo programmes, should be:

m: 3
isuppz:
111111 222222 333333 444444 555555 666666 777777 888888 999999 101010 101101 102102

but the actual output was:

m: 3
isuppz:
0 0 0 0 0 0 777777 888888 999999 101010 101101 102102

This means that the array may get overwritten when it is not expected to be. In the case when the caller expects isuppz to be not referenced, and therefore passes a small array (maybe to save memory space), the call to zheevr may cause a memory error or buffer overflow.

I wish to bring this observation to the attention of the Intel staff here. Thanks!

Zoë

- Operating system and version
-- macOS 10.15.3

- Library version
-- MKL 2020.0 (distributed with miniconda)

- Compiler version
-- gfortran 9.3.0

- Steps to reproduce the error
-- The Fortran77 code (attached) was compiled and linked with "-lmkl_rt"

Attachment	Size
Download demos1.f	1.62 KB
Download demos2.f	1.62 KB

↧

MACOS FFTW Interface

March 31, 2020, 12:54 am


Hello,

I installed m_fcompxe_2020.015.dmg on my Mac (the one-month trial version).

I want to build the fftw2 and fftw3 interfaces, but the directory interfaces/fftw2x_cdft contains only the directory wrapper.

Neither there nor inside the fftw3 directory is there any makefile. Is this correct for this version?

How can I build these interfaces?

Best regards,

Axel

↧

[spBLAS] problem on defining MKL_INT as long int

March 20, 2020, 1:05 am


Hello,

I am new to Intel MKL and I face a strange problem when defining MKL_INT as long int.

Problem: I add

#define MKL_INT long int

at the very beginning of the provided example <mklroot>/examples/examples_core_c.tgz/spblasc/source/sparse_trsv.c, and the output data for mkl_sparse_d_trsv is [1.0, 5.0, 3.0, 4.0, -13.0], which is incorrect, although no warning is raised when compiling.

If I don't add that #define sentence, everything is fine and the output data is the correct answer [1.0, 7.0, 1.0, 6.0, -55.0].
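One possible explanation, offered as an assumption rather than a confirmed diagnosis: on 64-bit Linux `long int` is 64 bits wide, while `-lmkl_rt` selects the LP64 interface by default, in which the library reads integer arguments as 32-bit values. The sketch below uses made-up row pointers (not the ones from sparse_trsv.c) to show how an index array written as 64-bit integers looks when read back as 32-bit integers.

```python
import struct

# Hypothetical CSR row pointers, for illustration only.
row_ptr = [0, 3, 5, 8, 11, 13]

# What the caller stores when MKL_INT is redefined as a 64-bit long int
# (little-endian, as on x86-64).
as_written = struct.pack("<6q", *row_ptr)

# What a library expecting 32-bit integers would see in the same memory.
as_read_32bit = struct.unpack("<12i", as_written)

# The library would consume the first six 32-bit slots as its row pointers.
first_six_seen = list(as_read_32bit[:6])
```

Every second slot is the zero upper half of a 64-bit value, so the index arrays are silently misread even though the code compiles cleanly.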

The working environment:

Ubuntu 18.04.1 with x86_64 GNU/Linux kernel 4.18.0-25-generic
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications with Processor optimization: Intel(R) AVX2 enabled processors
gcc-4.9.3 (Ubuntu 4.9.3-13ubuntu2)
compiling command: gcc sparse_trsv.c -lmkl_rt -I<mklroot>/include -L<mklroot>/lib/intel64 -Wall
CPU if helpful: Intel® Core™ i7-8700

Any advice or solution is welcome.

↧

Intel® MKL version 2020 Update 1 is now available

April 1, 2020, 2:13 am


Intel® Math Kernel Library (Intel® MKL) is a highly optimized, extensively threaded, and thread-safe library of mathematical functions for engineering, scientific, and financial applications that require maximum performance.

Intel MKL 2020 Update 1 packages are now ready for download.

Intel MKL is available as part of the Intel® Parallel Studio XE and Intel® System Studio. Please visit the Intel® Math Kernel Library Product Page.

For what's new in Intel MKL 2020 and in MKL 2020 Update 1, please follow this link; here is the link to the MKL_2020 Bug Fix List.

↧

MKL 2020.1, VS2019 linking bug

April 1, 2020, 6:50 am


I updated to version 2020.1 and got the following error recompiling my dll

Error	MSB3073	The command "mkl_link_tool.exe -libs -c ms_c -a intel64 -l static 2> NUL" exited with code 9009.	LNM_Lapack	C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Microsoft\VC\v160\Platforms\x64\PlatformToolsets\v142\ImportBefore\Intel.Libs.MKL.v142.targets	64

line 64 in the Intel.Libs.MKL.v142.targets file corresponds to

<Exec ConsoleToMSBuild="true" EchoOff="true" StandardOutputImportance="low" Command="mkl_link_tool.exe $(MKLArguments) 2&gt; NUL" WorkingDirectory="$(MKLProductDir)\mkl\tools">
    <Output TaskParameter="ConsoleOutput" ItemName="_MKLLibraries" />
</Exec>

So VS is trying to start mkl_link_tool in the directory "$(MKLProductDir)\mkl\tools". The problem is that there is no such file in that directory; mkl_link_tool is located in the bin\ directory. Making a copy of the executable in the tools\ directory solves the problem.

↧

Results of LAPACKE_dgesvd

March 31, 2020, 2:51 pm


Hi MKL gurus,

I have a call to the LAPACKE_dgesvd function in my code. This code is covered by automated tests. As part of a compiler migration we decided to upgrade MKL too, from 11.3.4 to 2019.0.5.

The tests then turned red. After a deep investigation I found that this function no longer returns the same U and V matrices.

I extracted the code and ran it in a separate environment/project, with the same observation: the first column of U and the first row of V have the opposite sign.

Could you please tell me what I'm doing wrong there? Or how should I use the new version to get the old results?

I attached a simple project that makes the issue easy to reproduce. Here is the code, in case you are not using Visual Studio:

// MKL.cpp : This file contains the 'main' function. Program execution begins and ends there.
//

#include <iostream>
#include <algorithm>
#include <mkl.h>

int main()
{
    const int rows(3), cols(3);
    double covarMatrix[rows * cols] = { 0.9992441421012894, -0.6088405718211041, -0.4935146797825398,
                                        -0.6088405718211041, 0.9992441421012869, -0.3357678733652218,
                                        -0.4935146797825398, -0.3357678733652218, 0.9992441421012761 };
    // U and V are pre-filled with -1 so that untouched entries are easy to spot.
    double U[rows * rows] = { -1, -1, -1,
                              -1, -1, -1,
                              -1, -1, -1 };
    double V[cols * cols] = { -1, -1, -1,
                              -1, -1, -1,
                              -1, -1, -1 };
    double superb[std::min(rows, cols) - 1];
    double eigenValues[std::max(rows, cols)];  // receives the singular values

    MKL_INT info = LAPACKE_dgesvd(LAPACK_ROW_MAJOR, 'A', 'A',
        rows, cols, covarMatrix, cols, eigenValues, U, rows, V, cols, superb);

    // info > 0: the algorithm did not converge; info < 0: the call itself failed.
    if (info != 0)
    {
        std::cout << (info > 0 ? "not converged!\n" : "LAPACKE_dgesvd failed!\n");
        return 1;
    }

    std::cout << "U\n";
    for (int row(0); row < rows; ++row)
    {
        for (int col(0); col < rows; ++col)
            std::cout << U[row * rows + col] << " ";
        std::cout << std::endl;
    }

    std::cout << "V\n";
    for (int row(0); row < cols; ++row)
    {
        // V is cols x cols, so its row stride is cols (not rows).
        for (int col(0); col < cols; ++col)
            std::cout << V[row * cols + col] << " ";
        std::cout << std::endl;
    }

    std::cout << "Converged!\n";
}

Here are more numerical details:

A = 0.9992441421012894, -0.6088405718211041, -0.4935146797825398,
-0.6088405718211041, 0.9992441421012869, -0.3357678733652218,
-0.4935146797825398, -0.3357678733652218, 0.9992441421012761

Results:

U
11.3.4                              2019.0.5 & 2020.1.216
-0.765774  -0.13397   0.629          0.765774  -0.13397   0.629
 0.575268  -0.579935  0.576838      -0.575268  -0.579935  0.576838
 0.2875     0.803572  0.521168      -0.2875     0.803572  0.521168

V
11.3.4                              2019.0.5 & 2020.1.216
-0.765774   0.575268  0.2875         0.765774  -0.575268  -0.2875
-0.13397   -0.579935  0.803572      -0.13397   -0.579935   0.803572
 0.629      0.576838  0.521168       0.629      0.576838   0.521168

I tested using scipy and the result is identical to that of the 11.3.4 version.

from scipy import linalg
from numpy import array

A = array([[0.9992441421012894, -0.6088405718211041, -0.4935146797825398], [-0.6088405718211041, 0.9992441421012869, -0.3357678733652218], [-0.4935146797825398, -0.3357678733652218, 0.9992441421012761]])
print(A)
u,s,vt,info = linalg.lapack.dgesvd(A)
print(u)
print(s)
print(vt)
print(info)
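For context, a sign difference of this kind does not by itself make either result wrong: a singular value decomposition is only unique up to the sign of each singular-vector pair, so negating column k of U together with row k of V (as returned) leaves the product unchanged. The sketch below checks this on the values posted above. The singular values were not posted, so `S_DEMO` is a placeholder, and `canonical_signs` is just one hypothetical convention a test suite could apply before comparing results across versions.

```python
U_OLD = [[-0.765774, -0.13397, 0.629],
         [0.575268, -0.579935, 0.576838],
         [0.2875, 0.803572, 0.521168]]
VT_OLD = [[-0.765774, 0.575268, 0.2875],
          [-0.13397, -0.579935, 0.803572],
          [0.629, 0.576838, 0.521168]]
U_NEW = [[-row[0], row[1], row[2]] for row in U_OLD]  # first column negated
VT_NEW = [[-v for v in VT_OLD[0]], VT_OLD[1][:], VT_OLD[2][:]]  # first row negated
S_DEMO = [3.0, 2.0, 1.0]  # placeholder singular values (assumption, not from the post)


def reconstruct(u, s, vt):
    """Return U * diag(s) * V^T as a list of rows."""
    n = len(s)
    return [[sum(u[i][k] * s[k] * vt[k][j] for k in range(n))
             for j in range(len(vt[0]))] for i in range(len(u))]


def canonical_signs(u, vt):
    """Flip each (column of u, row of vt) pair so that the largest-magnitude
    entry of the u column is positive. Returns new lists."""
    u = [row[:] for row in u]
    vt = [row[:] for row in vt]
    for k in range(min(len(u[0]), len(vt))):
        pivot = max((row[k] for row in u), key=abs)
        if pivot < 0:
            for row in u:
                row[k] = -row[k]
            vt[k] = [-v for v in vt[k]]
    return u, vt
```

Comparing `canonical_signs(U, V)` rather than the raw factors makes such a test independent of which sign the library happens to pick.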

Thanks for your help and best regards

Mokhtar

Attachment	Size
Download SimpleProjects.zip	4.98 KB

↧

MKL 2020 update contains upgrade to LAPACK 3.8?

April 1, 2020, 10:55 am


I notice the version of LAPACK now is 3.8.0. MKL 2020 shows 3.7.0. I cannot find a release note about it.

Obviously I may have made a mistake in my code, but I just want to clarify the situation.

↧

Create a random matrix with MKL library

April 1, 2020, 6:26 pm


When I use the vdRngUniform() routine to create a random matrix, the largest matrix I can generate is only 40000 x 40000. Is there any other MKL routine that can create a bigger one? I can fill the matrix with plain C code, but that hurts performance. Please point me to other MKL routines for this.
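A possible cause, stated here as an assumption: the element count passed to the generator is an MKL_INT, which is 32 bits wide in the default LP64 interface. 40000 x 40000 = 1.6e9 still fits in a signed 32-bit integer, but anything from 46341 x 46341 upwards does not. (At 8 bytes per double, 1.6e9 elements also already occupy 12.8 GB.) If the count is indeed the limit, the buffer can be filled with several calls, each covering a slice whose length fits. The sketch below only computes such a slicing plan; `chunk_plan` is a hypothetical helper, and the call to the generator itself is left out.

```python
INT32_MAX = 2**31 - 1


def chunk_plan(total, max_per_call=INT32_MAX):
    """Yield (offset, count) pairs that cover `total` elements,
    with every count small enough for a 32-bit element counter."""
    offset = 0
    while offset < total:
        count = min(max_per_call, total - offset)
        yield offset, count
        offset += count


# Example: a 50000 x 50000 matrix needs two calls under this assumption.
plan = list(chunk_plan(50000 * 50000))
```

Each `(offset, count)` pair would correspond to one generator call writing `count` values starting at element `offset` of the matrix buffer.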

Thanks.

↧

Extract internal data from sparse_matrix_t

April 5, 2020, 4:21 am


Hello

I am currently using Intel MKL 2019. Because some of the routines in Sparse BLAS are now deprecated, I am using the corresponding functions from Inspector-Executor Sparse BLAS.
These routines work on data in the sparse_matrix_t format. I found the routine mkl_sparse_?_create_csr to construct such a structure. But due to the format of my task, I need to receive the result as the three (optionally four) vectors of the CSR matrix separately. Hence, I need a procedure that is the reverse of create_csr, extracting the internal data from the sparse_matrix_t matrix handle.
Does such a procedure exist, and where can I find it?
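On the "three (optionally four) vectors" point: the four-array CSR variant keeps separate row-start and row-end arrays, while the three-array variant uses a single row-pointer array. The sketch below shows, in plain Python and with made-up data, how a four-array description can be packed into the three-array form. The function name `to_three_array` and its `base` parameter are hypothetical and not part of MKL.

```python
def to_three_array(rows_start, rows_end, col_indx, values, base=0):
    """Pack four-array CSR (rows_start, rows_end, col_indx, values) into
    three-array CSR (row_ptr, col_indx, values).

    `base` is the index base of the row arrays (0 for C, 1 for Fortran);
    column indices are copied unchanged.
    """
    row_ptr = [base]
    cols, vals = [], []
    for start, end in zip(rows_start, rows_end):
        # Copy only the entries that belong to this row; gaps are dropped.
        cols.extend(col_indx[start - base:end - base])
        vals.extend(values[start - base:end - base])
        row_ptr.append(len(vals) + base)
    return row_ptr, cols, vals


# Two rows stored with an unused slot (index 2) between them.
row_ptr, cols, vals = to_three_array(
    rows_start=[0, 3], rows_end=[2, 5],
    col_indx=[0, 1, 9, 0, 1], values=[1.0, 2.0, 99.0, 3.0, 4.0])
```

When the rows are stored contiguously, `rows_end[i]` equals `rows_start[i + 1]` and the conversion reduces to appending the last row end to `rows_start`.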

Thanks in advance.

↧

2D Convolution method

April 6, 2020, 7:23 pm


Hi,

Matlab supports three methods when using conv2. As you can see from the reference site below, Full, Same, and Valid methods are supported.

https://johnloomis.org/ece563/notes/filter/conv/convolution.html

Is it possible to add an option like this using vsldConvExec? Or if there is any other way, please advise.
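For reference, the three shapes differ only in which window of the full convolution is kept, so a routine that produces the full result can serve all three by cropping. The sketch below is plain Python, not MKL code; `conv2_full` and `crop` are illustrative names. The centring used for "same" assumes an odd-sized kernel, since tools differ in how they centre even-sized ones.

```python
def conv2_full(a, k):
    """Full 2D convolution of matrix a with kernel k (lists of rows)."""
    ar, ac, kr, kc = len(a), len(a[0]), len(k), len(k[0])
    out = [[0.0] * (ac + kc - 1) for _ in range(ar + kr - 1)]
    for i in range(ar):
        for j in range(ac):
            for p in range(kr):
                for q in range(kc):
                    out[i + p][j + q] += a[i][j] * k[p][q]
    return out


def crop(full, a_shape, k_shape, mode):
    """Select the 'full', 'same' or 'valid' window of a full convolution."""
    (ar, ac), (kr, kc) = a_shape, k_shape
    if mode == "full":
        r0, c0, nr, nc = 0, 0, ar + kr - 1, ac + kc - 1
    elif mode == "same":   # central part, same size as a (odd kernel assumed)
        r0, c0, nr, nc = (kr - 1) // 2, (kc - 1) // 2, ar, ac
    elif mode == "valid":  # only positions where the kernel fits entirely inside a
        r0, c0, nr, nc = kr - 1, kc - 1, max(ar - kr + 1, 0), max(ac - kc + 1, 0)
    else:
        raise ValueError("mode must be 'full', 'same' or 'valid'")
    return [row[c0:c0 + nc] for row in full[r0:r0 + nr]]


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # identity kernel: 'same' must return a
full = conv2_full(a, k)
```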

↧

intel mkl parallel in visual studio

April 8, 2020, 8:08 am


Hi,

I have developed a small app in Visual Studio with Intel MKL set to Parallel and OpenMP language support set to Yes. It works correctly.

Now I want to package the software, but the .exe file doesn't execute; it throws an exception. I copied mkl_core.dll, mkl_intel_thread.dll, and libiomp5md.dll to the folder that contains the .exe, because they were reported as missing. I think it doesn't link to mkl.h or OpenMP.

Can you please help me with the procedure to make the .exe run independently?

Thank you in advance.

↧

dtrnlspbc_init fails with TR_INVALID_OPTION

April 8, 2020, 9:31 am


It appears that the trust region optimizer can only handle cases when number of function arguments is equal to number of function values (square Jacobian).

In the code below (a slightly modified fragment of the optimization example from the MKL library package), the initialization function dtrnlspbc_init fails (returns TR_INVALID_OPTION (1502)) whenever m != n, for example when n = 3 and m = 1 (a scalar function of 3 variables). Note that this initialization function does not know anything about the objective function. It succeeds whenever m = n, such as m = 3, n = 3 or m = 5, n = 5, and it fails if m != n.

Is this expected behavior?

If it is not, what am I doing wrong?

Thanks.

int main()
{
   /* user’s objective function */
   extern void extended_powell(MKL_INT *, MKL_INT *, double *, double *, void *);
   /* n - number of function variables
   m - dimension of function value */
   MKL_INT n = 3, m = 1;
   /* precisions for stop-criteria (see manual for more details) */
   double eps[6];
   /* precision of the Jacobian matrix calculation */
   double jac_eps;
   /* solution vector. contains values x for f(x) */
   double *x = NULL;
   /* iter1 - maximum number of iterations
   iter2 - maximum number of iterations of calculation of trial-step */
   MKL_INT iter1 = 1000, iter2 = 100;
   /* initial step bound */
   double rs = 0.0;
   /* reverse communication interface parameter */
   MKL_INT RCI_Request;
   /* controls of rci cycle */
   MKL_INT successful;
   /* function (f(x)) value vector */
   double *fvec = NULL;
   /* jacobi matrix */
   double *fjac = NULL;
   /* lower and upper bounds */
   double *LW = NULL, *UP = NULL;
   /* number of iterations */
   MKL_INT iter;
   /* number of stop-criterion */
   MKL_INT st_cr;
   /* initial and final residuals */
   double r1, r2;
   /* TR solver handle */
   _TRNSPBC_HANDLE_t handle;
   /* cycle’s counter */
   MKL_INT i;
   /* results of input parameter checking */
   MKL_INT info[6];
   /* memory allocation flags */
   MKL_INT mem_error, error;

   /*Additional users data */
   u_data m_data;
   m_data.a = 1;
   m_data.sum = 0;

   error = 0;
   /* memory allocation */
   mem_error = 1;
   x = (double *)mkl_malloc(sizeof(double) * n, 64);
   if (x == NULL) goto end;
   fvec = (double *)mkl_malloc(sizeof(double) * m, 64);
   if (fvec == NULL) goto end;
   fjac = (double *)mkl_malloc(sizeof(double) * m * n, 64);
   if (fjac == NULL) goto end;
   LW = (double *)mkl_malloc(sizeof(double) * n, 64);
   if (LW == NULL) goto end;
   UP = (double *)mkl_malloc(sizeof(double) * n, 64);
   if (UP == NULL) goto end;
   /* memory allocated correctly */
   mem_error = 0;
   /* set precisions for stop-criteria */
   for (i = 0; i < 6; i++)
   {
       eps[i] = 0.00001;
   }
   /* set precision of the Jacobian matrix calculation */
   jac_eps = 0.00000001;
   /* set the initial guess */
   for (i = 0; i < n; i++)
       x[i] = 0;
   /* set initial values */
   for (i = 0; i < m; i++)
       fvec[i] = 0.0;
   for (i = 0; i < m * n; i++)
       fjac[i] = 0.0;
   /* set bounds */
   for (i = 0; i < n; i++)
   {
       LW[i] = -1.;
       UP[i] = 1.0;
   }

   /* initialize solver (allocate memory, set initial values)
   handle in/out: TR solver handle
   n in: number of function variables
   m in: dimension of function value
   x in: solution vector. contains values x for f(x)
   LW in: lower bound
   UP in: upper bound
   eps in: precisions for stop-criteria
   iter1 in: maximum number of iterations
   iter2 in: maximum number of iterations of calculation of trial-step
   rs in: initial step bound */
   MKL_INT st = dtrnlspbc_init(&handle, &n, &m, x, LW, UP, eps, &iter1, &iter2, &rs);

↧

Access violation error when running 64bit application linking with MKL Pardiso

April 9, 2020, 12:08 am


Hello everyone!

I am a PhD student in Computational Mechanics, and I am using MKL Pardiso to solve large unsymmetric sparse systems in my FEM codes. When I build and run a 64-bit version of the application, a fatal error appears: "forrtl: severe(157) Program Exception-Access Violation".

The compiler and library I use are Intel Fortran Compiler and MKL in Intel Composer XE2013, and the IDE is Microsoft Visual Studio 2012.

I use Configuration Manager to change the platform 32bit or 64bit. I link MKL with my code automatically through selecting "Parallel" in Project properties(Fortran->Libraries->Use Intel Math Kernel Library: Parallel).

I directly use the pardiso_symm_f90.f90 file from the MKL examples. What confuses me is that with the original data (a, ia, ja, b), both the 32-bit and the 64-bit application execute successfully. However, with my own data (I read a, ia, ja, b from disk files), the 32-bit application runs with no error, but the 64-bit application gives me the fatal access violation error.

A screenshot of the error window, the code, and the relevant data are attached.

I have gone through almost all the relevant topics about this problem, but unfortunately I have not found a solution.

Can anyone help me with this problem? Thanks very much in advance.

Regards,

Eric

Thanks again!

Attachment	Size
Download Error_window.jpg	106.34 KB
Download a.txt	30 MB
Download b.txt	792.19 KB
Download ia.txt	426.58 KB
Download Pardiso_test.f90	5.1 KB
Download ja.txt	16.15 MB

↧

About COO with duplicate entries, and MKL_sparse_export_csr

April 13, 2020, 1:42 am


Hello everyone,

I am working on FEM, where the global stiffness matrix is sparse and can easily be assembled in COO format. However, the assembled COO matrix has many unsorted duplicate entries (many values share the same row and column index), which need to be sorted and consolidated, that is, summed. On many other platforms, such as Matlab and Python, the sparse function automatically sorts and sums these duplicates, producing the correct final COO matrix.

In MKL for Fortran users, I use mkl_?csrcoo, and it doesn't consolidate these duplicates even though it produces a sorted CSR matrix. Now I am using the Matrix Manipulation Routines in IE Sparse BLAS to do these things. I am not sure whether these new routines can handle COO with duplicate entries.

In addition, the routine mkl_sparse_?_export_csr always gives wrong results. I also don't know how to allocate the arrays 'col_indx' and 'values', because I don't know the length of these arrays once the duplicate entries have been consolidated.
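To pin down what "consolidated" should mean, here is a small plain-Python reference of the conversion (not MKL code; `coo_to_csr_sum` is an illustrative name, and zero-based indices are assumed). It also shows where the array lengths come from: after summation, 'col_indx' and 'values' have one entry per distinct (row, column) pair.

```python
def coo_to_csr_sum(n_rows, rows, cols, vals):
    """Convert zero-based COO triplets to sorted CSR, summing duplicate entries."""
    merged = {}
    for r, c, v in zip(rows, cols, vals):
        merged[(r, c)] = merged.get((r, c), 0.0) + v

    row_ptr = [0] * (n_rows + 1)
    col_indx, values = [], []
    for r, c in sorted(merged):          # row-major order, columns sorted within a row
        row_ptr[r + 1] += 1              # count the entries of each row
        col_indx.append(c)
        values.append(merged[(r, c)])
    for r in range(n_rows):              # prefix sum turns counts into row pointers
        row_ptr[r + 1] += row_ptr[r]
    return row_ptr, col_indx, values


# Entry (0, 0) appears twice (1.0 and 3.0) and must be summed to 4.0.
row_ptr, col_indx, values = coo_to_csr_sum(
    2, rows=[0, 1, 0, 1, 0], cols=[0, 1, 0, 0, 2], vals=[1.0, 2.0, 3.0, 4.0, 5.0])
```

A reference like this can be used to check the output of a library routine on a small matrix before running the full stiffness matrix through it.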

My computer environment is VS 2015 community and Intel composer XE2018 update1.

Thanks in advance.

Eric

↧

mkl error in p?geevx - fortran

April 10, 2020, 6:52 am


I'm trying the newly introduced pcgeevx (complex non-Hermitian problem, single precision), and I obtain "Intel MKL ERROR: Parameter 14 was incorrect on entry to PCGEEVX." I do not understand why (the 14th argument should be an integer, and I pass an integer).

Here is what I believe to be the relevant part:

complex :: eigenval(n)
complex :: eigenvec(lda,ldb), matrix(lda,ldb)
integer :: desca(9)
integer :: lwork, lrwork, liwork, idum
real :: dum
real, allocatable :: rwork(:)
complex, allocatable :: work(:)
complex :: cdum
integer, allocatable :: iwork(:)

lwork = -1 ! this is just the first call to obtain the minimum work size requirement
allocate(work(1))
allocate(rwork(n))
allocate(iwork(n))
call pcgeevx('B', 'N', 'V', 'N', n, matrix, desca, eigenval, cdum, idum, eigenvec, desca, &
idum, idum, rwork, dum, rwork, dum, work, lwork, info)

I would like to stress that if I use the Hermitian equivalent pcheevx or the non-parallel version cgeev, everything works like a charm.

Is it possible that I didn't get right the arguments in pcgeevx?

thanks in advance

↧

MKL matmul with avx 512 shows bad performance on matrix with certain input size

April 11, 2020, 11:29 pm


Description: For Intel MKL built with AVX-512 support, matmul performance is poor for certain matrix sizes. Let C = np.matmul(A, B), where A.shape = (m, k) and B.shape = (k, n). If m < 192 and n is a multiple of 1024, the performance is worse than expected.

For example, on my machine, which has the CPU "Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz", with export OMP_NUM_THREADS=1:

- A.shape = (191, 20000), B.shape = (20000, 1024): np.matmul(A, B) takes 120 ms
- A.shape = (191, 20000), B.shape = (20000, 1023 or 1025): np.matmul(A, B) takes 80 ms
- A.shape = (192, 20000), B.shape = (20000, 1024): np.matmul(A, B) takes 75 ms

I ran many experiments and found that whenever m < 192 and n is 1024, 2048, 3072, ..., the performance is bad; the value of k does not seem to matter. The tests above were done using numpy with the MKL backend installed by Anaconda; intel-tensorflow shows the same result.

Operating system and version : CentOS Linux release 7.4.1708

Library version: Intel Optimized tensorflow 1.15.0 installed with "pip install intel-tensorflow==1.15.0", and numpy 1.18.1 shipped with Anaconda

Compiler version: gcc 4.8.5

Steps to reproduce the error (include makefiles, command lines, small test cases, and build instructions)

import numpy as np
import time
a = np.random.random((191,20000)).astype(np.float32)
b = np.random.random((20000,1024)).astype(np.float32)
for i in range(20):
    time1 = time.time()
    c = np.matmul(a,b)
    time2 = time.time()
    print(time2 - time1)

Working compiler, tool, or library version, and accelerator driver version (for regressions)

↧