I am new to MPI. I am writing a program with the Intel Math Kernel Library, and I want to compute a matrix-matrix multiplication by blocks: I split the large matrix X into many small matrices along the columns, as shown below. The matrix is large, so each time I only compute an (N, M) x (M, N) product, where M can be set manually.
X X^T y = X_1 X_1^T y + X_2 X_2^T y + ... + X_n X_n^T y
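Conceptually, each block contributes two GEMV operations: t = X_i^T y followed by out += X_i t. A minimal sketch of that idea (the function and variable names here are placeholders, not my real code):

#include <mkl.h>

// Sketch: contribution of one column block Xk (nRows x nCols, column-major)
// to out = X X^T y. Placeholder names, not the real program.
void addBlockContribution(const double *Xk, MKL_INT nRows, MKL_INT nCols,
                          const double *y, double *tmp, double *out) {
  // tmp = Xk^T * y   (nCols entries)
  cblas_dgemv(CblasColMajor, CblasTrans, nRows, nCols,
              1.0, Xk, nRows, y, 1, 0.0, tmp, 1);
  // out += Xk * tmp  (nRows entries)
  cblas_dgemv(CblasColMajor, CblasNoTrans, nRows, nCols,
              1.0, Xk, nRows, tmp, 1, 1.0, out, 1);
}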
I first set the total number of threads to 16 and M to 1024, then run my program directly as follows. Checking the CPU state, I see a CPU usage of 1600%, which is what I expect.
./MMNET_MPI --block 1024 --numThreads 16
However, when I run my program through MPI as follows, the CPU usage drops to only 200-300%. Strangely, if I change the block size to 64, the CPU usage improves a little, to about 1200%.
mpirun -n 1 --bind-to none ./MMNET_MPI --block 1024 --numThreads 16
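To see what thread budget the process actually gets under mpirun, I think I can print the OpenMP and MKL limits at startup; a small diagnostic sketch (not part of my real program):

#include <cstdio>
#include <omp.h>
#include <mkl.h>

// Diagnostic sketch: print the thread limits the process actually sees.
// If mpirun changes the environment or the affinity, these values should show it.
int main() {
  printf("omp_get_max_threads() = %d\n", omp_get_max_threads());
  printf("mkl_get_max_threads() = %d\n", mkl_get_max_threads());
  printf("omp_get_num_procs()   = %d\n", omp_get_num_procs());
  return 0;
}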
I do not know what the problem is. It seems that mpirun applies some default settings that affect my program. Below is part of my matrix multiplication code. The `#pragma omp parallel for` loop extracts the small N by M block from the compressed format in parallel; after that I use `cblas_dgemv` to compute the matrix-matrix multiplication.
void LMMCPU::multXXTTrace(double *out, const double *vec) const {
  // buffers for one block of SNPs and the per-thread work tables
  double *snpBlock = ALIGN_ALLOCATE_DOUBLES(Npad * snpsPerBlock);
  double (*workTable)[4] = (double (*)[4]) ALIGN_ALLOCATE_DOUBLES(omp_get_max_threads() * 256 * sizeof(*workTable));
  // store the temp result
  double *temp1 = ALIGN_ALLOCATE_DOUBLES(snpsPerBlock);

  for (uint64 m0 = 0; m0 < M; m0 += snpsPerBlock) {
    uint64 snpsPerBLockCrop = std::min(M, m0 + snpsPerBlock) - m0;

    // decompress this block into snpBlock, one column per SNP
#pragma omp parallel for
    for (uint64 mPlus = 0; mPlus < snpsPerBLockCrop; mPlus++) {
      uint64 m = m0 + mPlus;
      if (projMaskSnps[m])
        buildMaskedSnpCovCompVec(snpBlock + mPlus * Npad, m,
                                 workTable + (omp_get_thread_num() << 8));
      else
        memset(snpBlock + mPlus * Npad, 0, Npad * sizeof(snpBlock[0]));
    }

    // compute A = X^T V
    MKL_INT row = Npad;
    MKL_INT col = snpsPerBLockCrop;
    double alpha = 1.0;
    MKL_INT lda = Npad;
    MKL_INT incx = 1;
    double beta = 0.0;
    MKL_INT incy = 1;
    cblas_dgemv(CblasColMajor, CblasTrans, row, col, alpha, snpBlock, lda,
                vec, incx, beta, temp1, incy);

    // compute XA (accumulate into out)
    double beta1 = 1.0;
    cblas_dgemv(CblasColMajor, CblasNoTrans, row, col, alpha, snpBlock, lda,
                temp1, incx, beta1, out, incy);
  }

  ALIGN_FREE(snpBlock);
  ALIGN_FREE(workTable);
  ALIGN_FREE(temp1);
}
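One thing I may try is setting the thread counts explicitly before calling dgemv, in case the defaults are lowered when the program runs under mpirun. A hedged sketch (numThreads would be the value I pass via --numThreads):

#include <mkl.h>
#include <omp.h>

// Sketch: force the OpenMP and MKL thread counts instead of relying on
// defaults, in case mpirun lowers them. numThreads is the --numThreads value.
void setThreadCounts(int numThreads) {
  omp_set_num_threads(numThreads);  // threads for the #pragma omp regions
  mkl_set_num_threads(numThreads);  // threads MKL uses inside cblas_dgemv
  mkl_set_dynamic(0);               // keep MKL from reducing the count on its own
}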
I have actually checked that the following part can fully use the CPU resources on its own. The problem therefore seems to be with `cblas_dgemv`.
#pragma omp parallel for
for (uint64 mPlus = 0; mPlus < snpsPerBLockCrop; mPlus++) {
  uint64 m = m0 + mPlus;
  if (projMaskSnps[m])
    buildMaskedSnpCovCompVec(snpBlock + mPlus * Npad, m,
                             workTable + (omp_get_thread_num() << 8));
  else
    memset(snpBlock + mPlus * Npad, 0, Npad * sizeof(snpBlock[0]));
}
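To confirm whether cblas_dgemv itself is the part that stops scaling under mpirun, I suppose I could time one call of the same shape in isolation, with and without mpirun, and compare. A rough sketch of that check (the sizes and names are placeholders):

#include <cstdio>
#include <vector>
#include <omp.h>
#include <mkl.h>

// Sketch: time a single dgemv of the same shape as in multXXTTrace and
// print the MKL thread count it ran with. Npad and snpsPerBlock stand in
// for my real sizes.
void timeDgemv(MKL_INT Npad, MKL_INT snpsPerBlock) {
  std::vector<double> A(Npad * snpsPerBlock, 1.0), x(Npad, 1.0), y(snpsPerBlock, 0.0);
  double t0 = omp_get_wtime();
  cblas_dgemv(CblasColMajor, CblasTrans, Npad, snpsPerBlock,
              1.0, A.data(), Npad, x.data(), 1, 0.0, y.data(), 1);
  printf("dgemv took %.3f s with mkl_get_max_threads() = %d\n",
         omp_get_wtime() - t0, mkl_get_max_threads());
}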
My CPU information is as follows.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 44
On-line CPU(s) list: 0-43
Thread(s) per core: 1
Core(s) per socket: 22
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz
Stepping: 4
CPU MHz: 1252.786
CPU max MHz: 2101.0000
CPU min MHz: 1000.0000
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 30976K
NUMA node0 CPU(s): 0-21
NUMA node1 CPU(s): 22-43
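Since --bind-to none should leave all 44 logical CPUs available, I also plan to check the affinity mask the process actually ends up with. A Linux-only sketch:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <cstdio>
#include <sched.h>

// Linux-only sketch: count how many CPUs this process is allowed to run on.
// If mpirun still binds the process despite --bind-to none, this will be
// smaller than 44.
int main() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) == 0)
    printf("CPUs in affinity mask: %d\n", CPU_COUNT(&mask));
  return 0;
}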