Sparse Matrix Matrix Multiplication

April 12, 2020, 6:44 pm

Latest and popular articles on Intel Technologies

I have a sparse matrix A and want to perform the following operation in MKL:

C = A[l:h,:]*A^T

where A[l:h,:] denotes rows l through h of A, and C is a dense matrix. I am currently computing A^T explicitly and then multiplying with spmmd. This approach wastes memory because A^T has to be stored separately. Is there a specific function in MKL that I can use for this?
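
To make the operation concrete, here is a minimal pure-Python sketch (not an MKL call) of why A^T does not need to be stored: entry C[r][j] is just the dot product of two sparse rows of A. The CSR arrays, the function names, and the half-open reading of l:h are assumptions made for illustration.

```python
def csr_row(indptr, indices, data, i):
    """Return row i of a CSR matrix as a {column: value} dict."""
    start, end = indptr[i], indptr[i + 1]
    return dict(zip(indices[start:end], data[start:end]))


def rows_times_transpose(indptr, indices, data, l, h):
    """Dense C = A[l:h, :] * A^T, built from row-by-row sparse dot products.

    Rows l .. h-1 are used (half-open range, an assumption of this sketch).
    """
    n_rows = len(indptr) - 1
    rows = [csr_row(indptr, indices, data, i) for i in range(n_rows)]
    C = [[0.0] * n_rows for _ in range(h - l)]
    for r, i in enumerate(range(l, h)):
        for j in range(n_rows):
            # Iterate over the shorter row and look entries up in the longer one.
            small, large = sorted((rows[i], rows[j]), key=len)
            C[r][j] = sum(v * large[c] for c, v in small.items() if c in large)
    return C


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
C = rows_times_transpose(indptr, indices, data, 0, 2)
```

Only the original CSR arrays of A are read, so the extra memory is the dense result C itself.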

↧

call to ZHEGV failed

April 14, 2020, 8:29 am

Hello,

I am troubleshooting a warning from the VASP plane-wave electronic structure code. The application version is 5.4.4 built against Scalapack.

The warning is of the form

WARNING in EDDRMM: call to ZHEGV failed, returncode =   6  3     16

With the test case I have, the returncode can also be "8 4 16".

Operating system and version: CentOS 7.4
Library version: MKL 2019.5
Compiler version: Intel 2019.5 mpiifort
GNU Compiler Collection (GCC)* or Microsoft Visual Studio* version (if applicable): GCC 8.2.0 (underneath the Intel installation)
Steps to reproduce the error (include makefiles, command lines, small test cases, and build instructions): I can send our makefile.include and input deck under separate cover if required.
Working compiler, tool, or library version, and accelerator driver version (for regressions): The warning has been seen since Intel Parallel Studio 2018 and perhaps earlier.

MKL ldd links:

	libmkl_intel_lp64.so => /nopt/nrel/apps/compilers/intel/2019.5/mkl/lib/intel64/libmkl_intel_lp64.so (0x00007fd8ee5d6000)
	libmkl_cdft_core.so => /nopt/nrel/apps/compilers/intel/2019.5/mkl/lib/intel64/libmkl_cdft_core.so (0x00007fd8ee3ae000)
	libmkl_scalapack_lp64.so => /nopt/nrel/apps/compilers/intel/2019.5/mkl/lib/intel64/libmkl_scalapack_lp64.so (0x00007fd8edaa5000)
	libmkl_blacs_intelmpi_lp64.so => /nopt/nrel/apps/compilers/intel/2019.5/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so (0x00007fd8ed863000)
	libmkl_sequential.so => /nopt/nrel/apps/compilers/intel/2019.5/mkl/lib/intel64/libmkl_sequential.so (0x00007fd8ec24a000)
	libmkl_core.so => /nopt/nrel/apps/compilers/intel/2019.5/mkl/lib/intel64/libmkl_core.so (0x00007fd8e7f18000)
	libiomp5.so => /nopt/nrel/apps/compilers/intel/2019.5/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64/libiomp5.so (0x00007fd8e7b23000)

↧

mkl_cluster_sparse_solver.f90 missing

April 15, 2020, 5:42 am

I'm running parallel_studio_xe_2020.0.088 on CentOS 7. I'm trying to compile the PARDISO example for complex unsymmetric matrices using the following:

make libintel64 mpi=intelmpi compiler=gnu interface=lp64 ompthreads=8 mpidir=/opt/intel/impi/2019.6.166/intel64/bin examples=cl_solver_complex_unsym

I get the following output

make ext=a  run
make[1]: Entering directory `/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/examples/cluster_sparse_solverf'
make[1]: *** No rule to make target `/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/include/mkl_cluster_sparse_solver.f90', needed by `_results/gnu_intelmpi_lp64_intel64_a/mkl_cluster_sparse_solver.o'.  Stop.
make[1]: Leaving directory `/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/examples/cluster_sparse_solverf'
make: *** [libintel64] Error 2

Indeed, when I look in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/include/, the file mkl_cluster_sparse_solver.f90 is not there. I have an older 2018 version of the suite, and it's not there either.

What am I missing?

↧

about non-zeros distribution used by the mkl_sparse_?_mv function.

April 15, 2020, 6:31 am

Hi all,

I am using the sparse matrix-vector multiplication operation in the MKL library.

I started with a CSR representation (the classical three arrays of the CSR format) and use the mkl_sparse_d_create_csr() function to create a "sparse_matrix_t" handle. Then I ran the mkl_sparse_optimize () function using the handle, and finally the mkl_sparse_d_mv() function for the desired operation.

It works. So far so good. The answers I am getting are correct.

I am able to control the number of threads used in the solution by setting the environment variable "OMP_NUM_THREADS". This also works as expected.

My question is:

How is the sparse matrix distributed among the threads?

Is the distribution based on a similar number of rows per thread?

Is it based on a similar number of non-zeros per thread?

Or something else?

One more question: Can the user manipulate the distribution?
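
To make the two candidate strategies concrete, here is a minimal pure-Python sketch of splitting CSR rows among threads, once by row count and once by non-zero count. It only illustrates the question and makes no claim about what mkl_sparse_d_mv does internally; the function names and the sample matrix are hypothetical.

```python
from bisect import bisect_left


def split_by_rows(indptr, n_threads):
    """Row boundaries giving each thread a similar number of rows."""
    n_rows = len(indptr) - 1
    return [n_rows * t // n_threads for t in range(n_threads + 1)]


def split_by_nonzeros(indptr, n_threads):
    """Row boundaries giving each thread a similar number of non-zeros."""
    n_rows = len(indptr) - 1
    nnz = indptr[-1]
    bounds = [bisect_left(indptr, nnz * t / n_threads) for t in range(n_threads + 1)]
    bounds[-1] = n_rows  # make sure trailing empty rows are still assigned
    return bounds


def nonzeros_per_chunk(indptr, bounds):
    """Number of stored entries each thread would process."""
    return [indptr[b] - indptr[a] for a, b in zip(bounds, bounds[1:])]


# Five rows: one heavy row with 8 entries followed by four rows with 1 entry each.
indptr = [0, 8, 9, 10, 11, 12]
by_rows = split_by_rows(indptr, 2)
by_nnz = split_by_nonzeros(indptr, 2)
work_by_rows = nonzeros_per_chunk(indptr, by_rows)
work_by_nnz = nonzeros_per_chunk(indptr, by_nnz)
```

With this sample matrix, the row-based split gives the first thread three times the work of the second, while the non-zero-based split narrows the gap.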

Thanks

↧

MKL - which dependencies to include in the distribution package?

April 15, 2020, 10:35 pm

Hi everyone,

Is there a systematic way to determine which libraries to include in an application's distribution package?

Up to now, I have been proceeding by trial and error, adding the libs when a user complains that they are missing on his system. For instance for win64, the end result is a package which includes

libiomp5md.dll
mkl_avx2.dll
mkl_core.dll
mkl_def.dll
mkl_intel_thread.dll
mkl_mc3.dll
mkl_sequential.dll

The app itself uses such functions as LAPACKE_dgetrs, _sgetrs, _dgetrf, _sgetrf, _dgesv, _dgels etc.

Sometimes a library seems to be missing for one user but not for another. This happens for instance on macOS.

Couldn't find relevant documentation about this, so any help will be greatly appreciated.

Thanks.

↧

MKL - why beta?

April 15, 2020, 11:06 pm

Hi everyone,

Until recently, I had been packaging my app with libraries dating back to 2018. These were wonderfully stable.

Recently however, due to a change of hardware, I installed mkl 2021.1-beta03 and distributed the app with these updated libraries. As a result, some users are now complaining of program crashes in the calls to the libraries.

So I have upgraded to beta05, repackaged the app, and I'm crossing my fingers...

The question is: why does this latest distribution have the attribute "beta" and does an LTS, i.e., stable version, exist?

Thanks.

↧

cblas_gemm_s8s8s32 not supported?

April 16, 2020, 1:12 am

Dear sir,

I have a question:

I read the oneDNN code, and it seems that cblas_gemm_s8s8s32() is not implemented, only cblas_gemm_s8u8s32. Why?

Is it because the Intel ISA (AVX2?) has no special instructions that can perform the multiply and add operations when the two vectors have the same data type (either s8/s8 or u8/u8)?
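
For context, the AVX2 byte multiply-add instruction (vpmaddubsw) takes one unsigned and one signed 8-bit operand, which matches the u8/s8 pairing. The pure-Python sketch below shows one way an s8/s8 dot product could be expressed through a u8/s8 product plus a correction term. Whether any particular library does it this way is an assumption; the function names are hypothetical.

```python
def dot_u8s8(u, s):
    """Dot product of unsigned 8-bit values with signed 8-bit values."""
    assert all(0 <= x <= 255 for x in u), "first operand must fit in u8"
    assert all(-128 <= y <= 127 for y in s), "second operand must fit in s8"
    return sum(x * y for x, y in zip(u, s))


def dot_s8s8_via_u8s8(a, b):
    """s8 x s8 dot product built on top of a u8 x s8 kernel.

    Shifting a by +128 maps it into the u8 range; the identity
    sum((a + 128) * b) = sum(a * b) + 128 * sum(b) gives the correction.
    """
    shifted = [x + 128 for x in a]
    return dot_u8s8(shifted, b) - 128 * sum(b)


a = [-128, -1, 0, 127]
b = [127, -128, 5, -7]
direct = sum(x * y for x, y in zip(a, b))
emulated = dot_s8s8_via_u8s8(a, b)
```

The correction term depends only on b, so in a matrix product it could be computed once per column and reused.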

Thank you!

↧

random malloc error in d_commit_trig_transform

April 17, 2020, 1:40 pm

Greetings,

I'm experiencing a random malloc error in d_commit_trig_transform. It's persistent, but happens at different times during the code execution as this routine is called repeatedly. I'm currently testing on a Mac using the latest available Intel C++ compiler and MKL versions. The routine that contains the d_commit_trig_transform call is given below. The error occurs regardless of the number of cores used, but usually after a few thousand calls. Does anything look suspicious in the code below? Any advice would be appreciated.

void TwoDCylRZPotSolver::RHSVectorDST()
{
    /*
    This method performs the discrete sine transform of the first kind (DST-I) on
    rhsvector in preparation to solve the linear tridiagonal system. The transform
    is performed in chunk sizes of nz. Due to the manner in which the DST is
    calculated, an input array (a) of size nz+2 must be used with a[0]=a[nz+1]=0,
    and a[1 to nz]=data. A normalization factor of sqrt(2/(nz+1)) must be applied
    when copying the transformed data back into rhsvector.
    */

    double normfac=sqrt(2/double(nz+1));

#pragma omp parallel for
    for(int i=0; i<nrad; i++)
    {
        int error, ipar[128],n=nz+1,tt_type=0;
        double dpar[5*(nz+2)/2+2];
        DFTI_DESCRIPTOR_HANDLE handle = 0; //data structures used in transform

        double datatemp[nz+2];
        datatemp[0]=0;
        datatemp[nz+1]=0;

        d_init_trig_transform(&n,&tt_type,ipar,dpar,&error);
        d_commit_trig_transform(datatemp,&handle,ipar,dpar,&error);

        //copy data from rhsvector
        for(int j=0; j<nz; j++)
            datatemp[j+1]=rhsvector[i*nz+j];

        //perform transformation
        d_backward_trig_transform(datatemp,&handle,ipar,dpar,&error);

        //copy transformed data back to rhsvector
        for(int j=0; j<nz; j++)
            rhsvector[i*nz+j]=normfac*datatemp[j+1];

        free_trig_transform(&handle,ipar,&error);

        if(error != 0)
            cout<<"Error = "<<error<<" in free_trig_transform in method RHSVectorDST."<<endl;
    }
}
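
As a side note on the normalization factor mentioned in the comment, here is a small pure-Python sketch, independent of MKL, of a DST-I scaled by sqrt(2/(n+1)). With that scaling the transform is its own inverse, which is a handy sanity check for the data path. The function name and test data are hypothetical.

```python
import math


def dst1(x):
    """DST-I scaled by sqrt(2/(n+1)), computed directly from the definition.

    X[k] = sqrt(2/(n+1)) * sum_j x[j] * sin(pi * (j+1) * (k+1) / (n+1))
    """
    n = len(x)
    scale = math.sqrt(2.0 / (n + 1))
    return [
        scale * sum(x[j] * math.sin(math.pi * (j + 1) * (k + 1) / (n + 1)) for j in range(n))
        for k in range(n)
    ]


data = [1.0, -2.0, 0.5, 3.0]
transformed = dst1(data)
round_trip = dst1(transformed)  # applying the scaled transform twice recovers the input
```

The direct O(n^2) sum is only meant for checking small cases against a fast implementation.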

↧

Calling 'pbtrf' and 'pbtrs' directly from a C# .Net Core library

April 20, 2020, 9:23 am

I'm currently porting a small Fortran FE solver to C#. The solver uses MKL, and I'm trying to get the best of both worlds by calling the MKL functions in question directly from C#.
According to Intel's documentation, this can be done by using the DllImport statement in C# and calling the relevant function in mkl_rt.dll directly. This Intel tutorial gives a short description of how this can be done, and even provides some C# code examples.

The examples provided compile on my computer. What I want to do is basically the same, only targeting the functions 'pbtrf' and 'pbtrs'. But it seems these functions are not exposed from mkl_rt.dll. Using Dependency Walker, I looked into mkl_rt.dll and found that only the F77 versions are available. So I tried setting up a function call using 'dpbtrf' and 'dpbtrs' instead. These require several more arguments than the F95 versions.

About the case:

My setup is in the attached .cs file (also shown in the image below). Some input is hardcoded for testing purposes. The actual case is not provided, but it is a simple static FE problem with a stiffness matrix and a load vector. Mathematically, the stiffness matrix is a band matrix of size n x n. In the code, it is written in compact form as a matrix of size (nSuperDiagonals + 1) x n, that is, 4 x n. Only one right-hand side is used, i.e. the load vector of length n.

The call does not throw any error messages; I get info=1 in return (not 0) and the stiffness matrix is never factorized.
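
For reference, in LAPACK's pbtrf a return value of info = i > 0 means the leading minor of order i is not positive definite, so info=1 points at the very first diagonal entry as the routine reads it. One thing worth checking, as an assumption rather than a diagnosis, is whether the compact matrix is laid out in memory the way the F77 interface expects: column-major, with the diagonal in the last row of the upper band storage. A minimal pure-Python sketch of that packing, with hypothetical names:

```python
def pack_upper_band(a, kd):
    """Pack the upper triangle of a symmetric band matrix into LAPACK band storage.

    Uses the convention AB(kd+1+i-j, j) = A(i, j) (1-based), flattened
    column by column with leading dimension ldab = kd + 1.
    """
    n = len(a)
    ldab = kd + 1
    flat = [0.0] * (ldab * n)
    for j in range(n):
        for i in range(max(0, j - kd), j + 1):
            # 0-based version of the formula above, column-major flat index
            flat[(kd + i - j) + j * ldab] = a[i][j]
    return flat


# Small symmetric positive definite example with one super-diagonal (kd = 1).
a = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]
kd = 1
flat = pack_upper_band(a, kd)
diagonal = [flat[kd + j * (kd + 1)] for j in range(len(a))]
```

If a row-major 4 x n C# array is passed unchanged, the routine would find different values at the positions where it expects the diagonal.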

I know that this specific case will run when the whole shebang is coded in Fortran on the same computer.

Any thoughts on what the issue could be?

Attachment	Size
Download LaPackCallers.zip	607 bytes

↧

Extracting U from getrf

April 21, 2020, 1:17 pm

I'm trying to use getrf to put an MxN matrix in RREF (get U, divide by leading entry), but either I'm misunderstanding the function description or something strange is going on. My understanding is that on return A contains both U and L, and IPIV contains the pivot indices (i.e. the index of the first non-zero entry in U on each row) to enable you to determine where L stops and U begins within A. But the values being returned in IPIV often don't make any sense by this interpretation, with the returned index often referring to a point before or after the "actual" start of U, and in some cases they aren't even non-decreasing (e.g. [1,3,2,4,4]). I've confirmed through MATLAB that the U and L being returned in A are correct, but I can't make any sense of IPIV, and without it I'm not able to automatically extract those two matrices from the returned A.

Am I missing something in what IPIV is, or does anyone have some advice on what might be going wrong?
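
For comparison, the LAPACK documentation describes IPIV as row interchanges: row i of the matrix was swapped with row IPIV(i) during elimination. L and U are then separated purely by position: strictly below the diagonal is L (with an implied unit diagonal), and the diagonal and above is U. The pure-Python sketch below mimics that convention on a small matrix; it is an illustration with hypothetical names, not MKL code.

```python
def getrf_like(a):
    """LU with partial pivoting on a copy of a; returns (lu, ipiv), ipiv 1-based."""
    lu = [row[:] for row in a]
    m, n = len(lu), len(lu[0])
    ipiv = []
    for k in range(min(m, n)):
        p = max(range(k, m), key=lambda r: abs(lu[r][k]))  # pivot row
        ipiv.append(p + 1)                                  # row k swapped with row p
        lu[k], lu[p] = lu[p], lu[k]
        if lu[k][k] != 0.0:
            for r in range(k + 1, m):
                lu[r][k] /= lu[k][k]                        # multiplier, stored in place of L
                for c in range(k + 1, n):
                    lu[r][c] -= lu[r][k] * lu[k][c]
    return lu, ipiv


def split_lu(lu):
    """Separate the packed result into unit-lower L (m x k) and upper U (k x n)."""
    m, n = len(lu), len(lu[0])
    k = min(m, n)
    L = [[1.0 if i == j else (lu[i][j] if j < i else 0.0) for j in range(k)] for i in range(m)]
    U = [[lu[i][j] if j >= i else 0.0 for j in range(n)] for i in range(k)]
    return L, U


def row_order(ipiv, m):
    """Turn the sequence of interchanges into the final ordering of the original rows."""
    perm = list(range(m))
    for k, p in enumerate(ipiv):
        perm[k], perm[p - 1] = perm[p - 1], perm[k]
    return perm


A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
lu, ipiv = getrf_like(A)
L, U = split_lu(lu)
perm = row_order(ipiv, len(A))  # row i of L*U equals row perm[i] of A
```

In this example ipiv comes out as [3, 3, 3], which is a valid sequence of swaps even though it is not itself a permutation.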

↧

MKL FEAST EIGEN SOLVER

April 23, 2020, 12:50 am

I am trying to use FEAST to solve a sparse generalized eigenvalue problem. Before coding, I would like to see some samples.

But I cannot find any samples in MKL/examples/.

Could you point me to some?

↧

Strangeness with DZGEMV

April 23, 2020, 12:57 pm

The following code produces different outputs for DZGEMV and GEMV. The code is built with default 64-bit integers and reals; the parallel version of the MKL library is used (with the ILP64 interfaces).

The path to the module files is: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\intel64\ilp64

Using the link advisor, the libraries used are: mkl_blas95_ilp64.lib mkl_lapack95_ilp64.lib mkl_intel_ilp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib (library directory: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\lib\intel64_win).

GEMV and MATMUL produce the expected result. I can't explain what DZGEMV does here.

This is on a WIN 10 machine, with Intel Parallel Studio XE 2019 Update 3 Cluster Edition.

PROGRAM P
USE BLAS95
IMPLICIT NONE

COMPLEX :: PHI(2,1),V(2),W(1),ALPHA,BETA
INTEGER :: M,N,ONE

PHI = CMPLX(1.0)
V   = CMPLX(2.0)
ALPHA = CMPLX(1.0)
BETA = CMPLX(0.0)
W = CMPLX(0.0)

M   = 2
N   = 1
ONE = 1

CALL DZGEMV('C',M,N,ALPHA,PHI,M,V,ONE,BETA,W,ONE)
WRITE(*,*) W

CALL GEMV(PHI,V,W,ALPHA=ALPHA,BETA=BETA,TRANS='C')
WRITE(*,*) W

W = MATMUL(CONJG(TRANSPOSE(PHI)),V)
WRITE(*,*) W

END PROGRAM P

Output:

 (2.00000000000000,0.000000000000000E+000)
 (4.00000000000000,0.000000000000000E+000)
 (4.00000000000000,0.000000000000000E+000)
Press any key to continue . . .
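
One possible reading of the first output line, sketched in Python rather than Fortran: assuming DZGEMV is the MKL extension that takes a real double-precision matrix together with complex vectors, and assuming default reals are 8 bytes as described above, the COMPLEX array PHI would be read as a plain stream of doubles. The variable names are illustrative only.

```python
import struct

# PHI(2,1) as stored in memory: two complex numbers, each a (real, imag) pair of doubles.
phi = [complex(1.0, 0.0), complex(1.0, 0.0)]
raw = b"".join(struct.pack("<dd", z.real, z.imag) for z in phi)

# The same bytes read as a real M x 1 matrix with M = 2: only the first two doubles are used.
as_real = struct.unpack("<4d", raw)
m = 2
real_column = as_real[:m]

v = [complex(2.0), complex(2.0)]

# Real-matrix interpretation: for a real matrix, 'C' is the same as a plain transpose.
w_real_matrix = sum(a * x for a, x in zip(real_column, v))

# Complex-matrix interpretation, matching GEMV and MATMUL(CONJG(TRANSPOSE(PHI)), V).
w_complex_matrix = sum(p.conjugate() * x for p, x in zip(phi, v))
```

Under these assumptions the real-matrix reading gives 2 and the complex-matrix reading gives 4, the same two values that appear in the output above.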

↧

Fat and Narrow matrix multiplication with "cblas_cgemm"

April 25, 2020, 11:20 pm

Hi,

I am multiplying a 40 x 40 matrix A by a 40 x 10k matrix B using the MKL function "cblas_cgemm". It takes about 30 milliseconds,

which I believe is too long, even though I have enabled MKL multithreading.

I have read on the internet that "MKL functions are optimized for generic matrix multiplications".

Does anybody agree or disagree with me?
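
To put the 30 ms figure in perspective, here is a small back-of-the-envelope calculation. It assumes the common convention of counting 8 real floating-point operations per complex multiply-add and single-precision complex elements of 8 bytes each; the variable names are illustrative.

```python
# Problem size from the post: C (m x n) = A (m x k) * B (k x n)
m, k, n = 40, 40, 10_000
elapsed_s = 30e-3

# Assumed convention: one complex multiply-add counts as 8 real flops.
real_flops = 8 * m * n * k
gflops = real_flops / elapsed_s / 1e9

# Memory touched at least once: A, B and C in single-precision complex (8 bytes each).
bytes_moved = 8 * (m * k + k * n + m * n)
megabytes = bytes_moved / 1e6

print(f"{real_flops:,} real flops, about {gflops:.2f} GFLOP/s")
print(f"about {megabytes:.1f} MB of matrix data")
```

Comparing the resulting rate with the peak of the machine in use, and timing a second call after a warm-up call, would show how far the measurement is from what the hardware allows.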

Thanks in advance .

↧

Sparse-Dense Matrix Multiplication

April 28, 2020, 8:44 am

Hi,

I am working with the mkl_sparse_s_mm routine to perform:

C = A * B

where A is a sparse matrix and B and C are dense matrices. I would like to have some details about the algorithm for sparse-dense matrix multiplication implemented by mkl_sparse_s_mm, i.e., whether it uses cache-aware strategies, specific micro-kernel implementations to fully leverage CPU registers, etc.

Just to provide an example, high performance dense matrix multiplication (cblas_gemm) is usually implemented following the block pack algorithm.

Thank you

Cosimo Rullli

↧

Intel MKL Library for Linux (`libmkl_rt.so`) Missing `SONAME`

April 29, 2020, 3:14 am

There is an issue with the Intel MKL RT library files on Linux (`libmkl_rt.so`):

their `soname` isn't defined.
Please have a look at https://github.com/JuliaSparse/Pardiso.jl/issues/69#issuecomment-620898554.

Any chance to fix it in the next update?

↧

Matrix Inversion and matrix-vector multiplication or solve linear equation for simulation

April 29, 2020, 12:48 am

Hi,

I have seen multiple times that matrix inversion is not recommended when solving linear equations, and everyone says to just use a solver, but I may reduce execution time significantly by inverting instead (or will I?).

I have a simulation where I can either calculate the inverse of a sparse symmetric matrix at the beginning and, for each time step, multiply it by the new vector to solve the system, or

just use the original sparse matrix and solve the linear system at each time step, even though the matrix doesn't change.

My matrix has n ~= 20,000, and the simulation has approximately 10^7 time steps. So what is the optimum method?

I found mkl_dcsrsm to solve the system unless someone has a better recommendation.

Using MKL 2016.4 on a cluster so I could request more CPUs but my code isn't parallelized.
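
A middle ground between the two options is to factor the matrix once and reuse the factors at every time step. The pure-Python sketch below shows the idea on a tridiagonal matrix, which is an assumption chosen for brevity (the actual matrix structure is not given); the function names are hypothetical. It also works out the storage a dense n x n inverse would need.

```python
def factor_tridiagonal(lower, diag, upper):
    """LU-factor a tridiagonal matrix once, without pivoting.

    Assumes the matrix is well behaved without pivoting (e.g. diagonally dominant).
    lower[i-1] = A[i][i-1], upper[i] = A[i][i+1].
    """
    n = len(diag)
    mult = [0.0] * n   # sub-diagonal multipliers of L
    piv = diag[:]      # diagonal of U
    for i in range(1, n):
        mult[i] = lower[i - 1] / piv[i - 1]
        piv[i] = diag[i] - mult[i] * upper[i - 1]
    return mult, piv


def solve_factored(mult, piv, upper, b):
    """Solve A x = b with the stored factors: forward then backward substitution."""
    n = len(b)
    y = b[:]
    for i in range(1, n):
        y[i] -= mult[i] * y[i - 1]
    x = [0.0] * n
    x[-1] = y[-1] / piv[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - upper[i] * x[i + 1]) / piv[i]
    return x


# Factor once ...
lower, diag, upper = [-1.0] * 3, [2.0] * 4, [-1.0] * 3
mult, piv = factor_tridiagonal(lower, diag, upper)

# ... then reuse the factors for every new right-hand side (one per time step).
solutions = [solve_factored(mult, piv, upper, b)
             for b in ([1.0, 0.0, 0.0, 1.0], [2.0, 0.0, 0.0, 2.0])]

# Storage for a dense inverse at n = 20,000 in double precision.
dense_inverse_bytes = 20_000 ** 2 * 8
```

The inverse of a sparse matrix is generally dense, so at n = 20,000 it would take about 3.2 GB, whereas the factors of a sparse matrix often stay much sparser.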

Thanks for your time

↧

Which subroutine can achieve the same result as matlab's mldivide?

April 30, 2020, 2:30 am

I know that DGETRF and DGETRI can be used to invert large matrices.

However, the result is not the same as that of MATLAB's mldivide.

I would like to know which subroutine achieves the same result as MATLAB's mldivide.

Thanks for the help.

S. Kim

↧

Cloudera CDH 5.16 MKL Parcel offline install

April 30, 2020, 9:13 am

Good afternoon, folks,

Can anyone tell me where I can download the MKL parcel for Cloudera CDH 5.16 for an offline installation?

This link, http://parcels.repos.intel.com/mkl/latest, and its version-specific variants seem to be accessible only from Cloudera Manager.

Our Cloudera cluster is on-prem and has no access to the internet, so an offline install is necessary.

Kind Regards

Mark

↧

Pardiso example tutorial.

May 2, 2020, 2:56 am

Hello!

Can I get an example tutorial file for the pardiso subroutine?

Ultimately, I want to use the pardiso subroutine as a solver for structural analysis.

So I would just like an example file to understand how to use pardiso (for both the symmetric and the unsymmetric case).

Thanks all,

S. Kim

↧

Parallel Direct Sparse Solver for Clusters and iparm[30]

May 2, 2020, 11:55 pm

Hi,

Does Parallel Direct Sparse Solver for Clusters support iparm[30] for partial solves and computing selected components of the solution vectors?

It is mentioned in the reference manual, but the manual also says "iparm[30] - iparm[32] Reserved. Set to zero."

Non-MPI MKL PARDISO works with iparm[30] set to 1, 2, and 3, but the cluster edition doesn't. Is this a bug, or is iparm[30] simply not supposed to work with the cluster edition?

best,

marius

↧