Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

ATTENTION: Read if you cannot install from deb feed: The APT distribution key has expired today


The key listed on the instruction page

https://software.intel.com/en-us/articles/installing-intel-free-libs-and...

specifically, this file:

wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS...

is old and has just expired. There will be a million failures; do not panic. I hope Intel will fix this very quickly.

# curl -Ss https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB | gpg -
gpg: WARNING: no command supplied.  Trying to guess what you mean ...
pub   rsa2048 2016-09-28 [SC] [expired: 2019-09-27]  <<<=== THIS
      BF4385F91CA5FC005AB39E1C1A8497B11911E097
uid           "CN = Intel(R) Software Development Products", O=Intel Corporation

unable to find @rpath/libiomp5.dylib


I am new to MKL on Mac. 

I am running code in Fortran as follows: 

ifort -o name.x -fast -mkl program.f90

and I get the following error 

ipo: warning #11012: unable to find @rpath/libiomp5.dylib

I tried looking online for a solution, but I couldn't find anything straightforward.

The closest I got to a solution is the following, which omits the '-fast' option:

ifort -o name.x -mkl program.f90 -Wl,-rpath,${MKLROOT}/lib -Wl,-rpath,$MKLROOT/../compiler/lib/

This works, but when I do the following I get the same error:

ifort -o name.x -fast -mkl program.f90 -Wl,-rpath,${MKLROOT}/lib -Wl,-rpath,$MKLROOT/../compiler/lib/

ipo: warning #11012: unable to find @rpath/libiomp5.dylib

How can I add the path?

Diagonalization of symmetric matrices - form of matrix?


I would like to compute the eigenvalues of a symmetric matrix and wanted to use the LAPACKE_dsyev function from the MKL Library in C++ for that.

From the documentation https://software.intel.com/en-us/mkl-developer-reference-c-syev, I concluded that I would have to pass only the upper/lower triangular part of the matrix. It says about the argument that it "is an array containing either upper or lower triangular part of the symmetric matrix A".

However, it seems that actually one needs to pass the pointer to the full matrix to the routine. Say I want to diagonalize the following matrix:

[[-2,    0,    0.5,  0  ],
 [ 0,    0.5, -2,    0.5],
 [ 0.5, -2,    0.5,  0  ],
 [ 0,    0.5,  0,   -1  ]],
which has eigenvalues [2.56, -2.22, -1.53, -0.81].

Then in the following code, only the first option gives the correct values.

#include <iostream>
#include"mkl_lapacke.h"
using namespace std;
int main(){
	MKL_INT N = 4;
	double matrix_ex_full[16] = {-2,0,0.5,0,0,0.5,-2,0.5,0.5, -2, 0.5, 0, 0,0.5,0,-1};
	double evals_full[4];
// Pass over the full matrix
	MKL_INT test1 = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'N', 'U', N, matrix_ex_full,N, evals_full);
	cout << "success = "<<test1 << endl;
	for (MKL_INT i = 0;i<4;i++)
		cout << evals_full[i] << endl;
// Pass only the upper triagonal
	double matrix_ex_uppertri[10] = {-2, 0, 0.5, 0, 0.5, -2, 0.5, 0.5, 0, -1};
	double evals_uppertri[4];
	MKL_INT test2 = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'N', 'U', N, matrix_ex_uppertri,N, evals_uppertri);
	cout << "success = "<<test2 << endl;
	for (MKL_INT i = 0;i<4;i++)
		cout << evals_uppertri[i] << endl;

}
//Compiled with g++ test.cpp -o main -m64 -I/share/opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/include -L/share/opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl

I guess I am missing something obvious here, but why, if the full matrix has to be given anyway, is it necessary at all to specify 'U' or 'L'? Or am I doing something wrong elsewhere?
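For what it is worth, one way to see what LAPACKE_dsyev actually reads is to keep the full N-by-N row-major array but overwrite the strictly lower triangle with garbage before calling it with uplo = 'U'. The minimal sketch below reuses the matrix from above and introduces nothing new except the junk value 999; the eigenvalues come out identical, which is consistent with dsyev expecting full (not packed) storage and simply ignoring the triangle that was not selected. As far as I can tell, the packed 10-element layout tried in the second call above is what the packed-storage routines such as LAPACKE_dspev expect instead.

#include <iostream>
#include "mkl_lapacke.h"

int main() {
	const MKL_INT N = 4;
	// Full row-major symmetric matrix from the question.
	double a_full[16] = {-2, 0, 0.5, 0,
	                      0, 0.5, -2, 0.5,
	                      0.5, -2, 0.5, 0,
	                      0, 0.5, 0, -1};
	// Same matrix, but the strictly lower triangle is overwritten with junk.
	double a_junk_lower[16];
	for (int i = 0; i < 16; ++i) a_junk_lower[i] = a_full[i];
	for (int i = 0; i < 4; ++i)
		for (int j = 0; j < i; ++j)
			a_junk_lower[i * 4 + j] = 999.0;   // never read when uplo = 'U'

	double w1[4], w2[4];
	MKL_INT info1 = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'N', 'U', N, a_full, N, w1);
	MKL_INT info2 = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'N', 'U', N, a_junk_lower, N, w2);
	std::cout << "info = " << info1 << ", " << info2 << std::endl;
	for (int i = 0; i < 4; ++i)
		std::cout << w1[i] << "  vs  " << w2[i] << std::endl;  // identical eigenvalues
	return 0;
}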

Thanks for any help, and apologies if this question is trivial (which I have the feeling it must be), I am rather new to using MKL.

Traceback MKL ERROR (Environmental variable?)


Is there an MKL environment variable that will make all errors fatal? I would like to use it to find the location of an error in a large code. (Currently the code completes correctly; the error does not matter.)
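There may not be a single environment variable for this (I am not aware of one), but for the input-consistency errors that MKL reports through xerbla one possible workaround is to install your own handler with mkl_set_xerbla and abort inside it, so the first reported error becomes fatal and shows the call site under a debugger or in a core dump. A rough sketch, assuming the errors in question are xerbla-reported parameter errors; fatal_xerbla is a hypothetical handler name:

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

/* Hypothetical fatal handler: called by MKL when a routine detects an
   invalid argument (the errors normally just printed by xerbla).      */
static void fatal_xerbla(const char *name, const int *num, const int len)
{
    fprintf(stderr, "MKL error: parameter %d of %.*s is invalid\n", *num, len, name);
    abort();   /* the core dump / debugger break points at the offending call */
}

int main(void)
{
    mkl_set_xerbla(fatal_xerbla);
    /* ... run the application; the first xerbla-reported error now aborts ... */
    return 0;
}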

cluster sparse solver returns error = -1


I can't understand why I get the "input inconsistent" error when I run the code that I am attaching. I am using a toy example to check it, and that works for P = 4, where P is the number of MPI processes.
I set: 
iparm[34] = 1;// indexing from 0
iparm[36] = 0; //csr format 
iparm[39] =  2; //matrix distributed

In the csrA_rand.txt file, the first element is n, the number of rows and columns of the sparse matrix A. Process 0 reads n and broadcasts it to all other processes. The first n%P processes have (n / P) + 1 rows each, whereas the remaining P - (n%P) processes have (n / P) rows each. In this case the matrix size n is 48, so all processes have the same number of rows (12). Since indexing starts from 0, all the local ia arrays are such that ia[0] = 0 and ia[myn] = nA. All processes read their portions of a, ja and ia correctly, or at least that is how it seems to me.
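As a sanity check of that partitioning, here is a tiny stand-alone sketch (local_rows is a hypothetical helper, not taken from the attached code) that computes each rank's first row and row count according to the description above:

#include <stdio.h>

/* Hypothetical helper mirroring the partitioning described above:
   the first n%P ranks own n/P + 1 rows, the remaining ranks own n/P rows. */
static void local_rows(int n, int P, int rank, int *first, int *count)
{
    int base = n / P, extra = n % P;
    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);  /* 0-based first row */
}

int main(void)
{
    int n = 48, P = 4;   /* the case from the post: every rank should get 12 rows */
    for (int rank = 0; rank < P; ++rank) {
        int first, count;
        local_rows(n, P, rank, &first, &count);
        /* With iparm[34] = 1 (0-based indexing) each local ia must then satisfy
           ia[0] = 0 and ia[count] = local nnz, as described above. */
        printf("rank %d: rows %d..%d (%d rows)\n", rank, first, first + count - 1, count);
    }
    return 0;
}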

*** Edit, found an error when generating ja

Incorrect band matrix storage description


In the MKL C manual (page 399 of the PDF), the row-major layout formula for band storage reads: "row major layout: k(i, j) = (i - j)*ldab + kl + j - 1; 1 ≤ i ≤ m, max(1, i - kl) ≤ j ≤ min(n, i + ku)". Take row i = 1 and column j = i + ku, with kl = ku = k (an equal number of bands) and ldab = kl + ku + 1 as suggested earlier; then k(i, j) = (1 - (1 + k))*(2k + 1) + k + (1 + k) - 1 = k - 2k^2, which is negative for all k > 0. Please adjust the formula to make it correct. Thank you!

mkl_sparse_sp2m: Conditional jump or move

struct matrix_descr descrA;
struct matrix_descr descrB;

descrA.type = SPARSE_MATRIX_TYPE_GENERAL;
descrB.type = SPARSE_MATRIX_TYPE_GENERAL;

std::cout << mkl_sparse_sp2m(SPARSE_OPERATION_TRANSPOSE, descrA, A, SPARSE_OPERATION_TRANSPOSE, descrB, B, SPARSE_STAGE_FULL_MULT, &C) << std::endl;

Assume there are two csr matrices A and B and one declared csr matrix handle C. The code above returns status 0 (SPARSE_STATUS_SUCCESS). However, Valgrind reports a "Conditional jump or move depends on uninitialised value(s)" error. If I create the C matrix with dummy csr arrays I do not get the error, but then that memory is leaked. Is this a bug? I'm using MKL 2019 Update 5.
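For comparison, here is a minimal self-contained sketch of the same call with tiny made-up 2x2 matrices (all sizes and values are illustrative), where C starts out as a plain NULL handle and is destroyed afterwards. It also initializes all three fields of each matrix_descr rather than only the type field, in case the uninitialised-value report comes from the unused mode/diag members rather than from C itself; that is only a guess.

#include <stdio.h>
#include "mkl.h"
#include "mkl_spblas.h"

int main(void)
{
    /* Two tiny CSR matrices, zero-based indexing: A = I, B = diag(2, 3). */
    MKL_INT ia[3] = {0, 1, 2}, ja[2] = {0, 1};
    double  va[2] = {1.0, 1.0};
    MKL_INT ib[3] = {0, 1, 2}, jb[2] = {0, 1};
    double  vb[2] = {2.0, 3.0};

    sparse_matrix_t A = NULL, B = NULL, C = NULL;   /* C is just a NULL handle */
    mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ZERO, 2, 2, ia, ia + 1, ja, va);
    mkl_sparse_d_create_csr(&B, SPARSE_INDEX_BASE_ZERO, 2, 2, ib, ib + 1, jb, vb);

    /* Fully initialized descriptors (mode and diag are ignored for GENERAL). */
    struct matrix_descr dA = { SPARSE_MATRIX_TYPE_GENERAL, SPARSE_FILL_MODE_LOWER, SPARSE_DIAG_NON_UNIT };
    struct matrix_descr dB = { SPARSE_MATRIX_TYPE_GENERAL, SPARSE_FILL_MODE_LOWER, SPARSE_DIAG_NON_UNIT };

    sparse_status_t st = mkl_sparse_sp2m(SPARSE_OPERATION_TRANSPOSE, dA, A,
                                         SPARSE_OPERATION_TRANSPOSE, dB, B,
                                         SPARSE_STAGE_FULL_MULT, &C);
    printf("sp2m status = %d\n", (int)st);

    /* Export and print C = A^T * B^T so the result memory is actually touched. */
    sparse_index_base_t base;
    MKL_INT rows, cols, *rs, *re, *ci;
    double *cv;
    mkl_sparse_d_export_csr(C, &base, &rows, &cols, &rs, &re, &ci, &cv);
    for (MKL_INT i = 0; i < rows; ++i)
        for (MKL_INT k = rs[i]; k < re[i]; ++k)
            printf("C(%lld,%lld) = %g\n", (long long)i, (long long)ci[k], cv[k]);

    mkl_sparse_destroy(A);
    mkl_sparse_destroy(B);
    mkl_sparse_destroy(C);
    return 0;
}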

error on description page


There is a line of code on the two-stage-algorithm-for-inspector-executor-sparse-blas-routines page that seems to be incorrect:

 

status = mkl_sparse_x_export_csr ( csrC, &indexing, &rows, &cols, &rows_start, &rows_end, &col_indx, &values);

MKL_INT nnz = rows_end[rows] - rows_start[0];

"rows_end[rows]" accesses uninitialized space. "rows_end[rows - 1]" addresses the last element.


Where is the preconditioning of coefficient matrix -A in FGMRES


Hello, everyone.

I want to use the ILUT-preconditioned FGMRES RCI solver for a large Poisson equation in a lab CFD code based on non-uniform Cartesian grids and a standard 7-point discretization scheme. I have read the MKL Developer Reference and the example dcsrilut_exampl2.f90, and I understand the RCI mechanism, the GMRES method and ILU well, but I am still confused about the flow of the preconditioned FGMRES method.

As I understand it, to use ILUT+FGMRES the user first generates a CSR matrix for both A (csrA) and the preconditioner B (csrL). Next, MKL invokes the preconditioned version of FGMRES when ipar(11) = 1 and performs an additional matrix (csrL) - vector step (RCI_request = 3) to precondition the rhs, which probably corresponds to B.inv() * b in GMRES.

What confuses me is: where is the preconditioning step for the coefficient matrix A? Shouldn't there be a B.inv() * A step in the RCI_request = 3 branch (it is not clear to me whether MKL uses right or left preconditioning)? Actually, I was expecting some code like:

                    call mkl_sparse_d_spmm(op, B.inv(), csrA, preconditioned_A_CSR)

following the preconditioning of the vector tmp(ipar(22)).

I also notice that the RCI_request = 1 branch performs a set of A*v_i operations with csrA as the only input, not csrA and csrL. So how the preconditioning of A is handled in FGMRES is not clear to me at all.

My guess is that MKL preconditions A automatically after preconditioning the vector tmp(ipar(22)), and then overwrites the original A matrix under the same variable name, which would explain why the RCI_request = 1 part remains unchanged in the preconditioned version. Can anyone confirm this or explain it to me? Thanks a lot!
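For reference, below is a minimal C sketch of the preconditioned FGMRES RCI loop, patterned on the MKL dfgmres examples. The tiny dense matrix, my_matvec and my_apply_prec (a Jacobi diagonal solve) are made-up placeholders for csrA, the CSR matrix-vector product and the ILUT triangular solves; they are not from the reference code. The point it tries to illustrate: FGMRES is a right-preconditioned method, so (as far as I understand the RCI flow) at RCI_request = 3 the user applies B.inv() to a Krylov basis vector that MKL provides (tmp(ipar(22)) in Fortran terms), writes the result to tmp(ipar(23)), and MKL later pushes that result through the ordinary A*v step at RCI_request = 1. The product B.inv()*A is therefore formed implicitly, one vector at a time; neither A nor the right-hand side is modified, and no explicit B.inv()*A matrix is ever built.

#include <stdio.h>
#include "mkl.h"
#include "mkl_rci.h"

#define NN 4        /* tiny problem size, just to exercise the RCI loop */
#define TMPSZ 1024  /* generous upper bound for the dfgmres work array  */

/* Dense stand-in for the sparse matrix-vector product with csrA (request 1). */
static void my_matvec(const double *A, const double *v, double *w)
{
    for (int i = 0; i < NN; ++i) {
        w[i] = 0.0;
        for (int j = 0; j < NN; ++j) w[i] += A[i * NN + j] * v[j];
    }
}

/* Jacobi (diagonal) solve standing in for the ILUT forward/backward solves
   (request 3): w = B.inv() * v with B = diag(A).                            */
static void my_apply_prec(const double *A, const double *v, double *w)
{
    for (int i = 0; i < NN; ++i) w[i] = v[i] / A[i * NN + i];
}

int main(void)
{
    double A[NN * NN] = { 4, -1,  0,  0,
                         -1,  4, -1,  0,
                          0, -1,  4, -1,
                          0,  0, -1,  4 };
    double b[NN] = {1, 2, 3, 4}, x[NN] = {0, 0, 0, 0};

    MKL_INT n = NN, RCI_request, itercount;
    MKL_INT ipar[128];
    double dpar[128], tmp[TMPSZ];

    dfgmres_init(&n, x, b, &RCI_request, ipar, dpar, tmp);
    if (RCI_request != 0) return 1;

    ipar[8]  = 1;   /* automatic residual stopping test                  */
    ipar[9]  = 0;   /* no user-defined stopping test (no request 2)      */
    ipar[10] = 1;   /* preconditioning ON -- this is ipar(11) in Fortran */
    ipar[11] = 1;   /* automatic test for zero norm of the next vector   */

    dfgmres_check(&n, x, b, &RCI_request, ipar, dpar, tmp);
    if (RCI_request != 0) return 1;

    for (;;) {
        dfgmres(&n, x, b, &RCI_request, ipar, dpar, tmp);
        if (RCI_request == 1) {
            /* w = A * v: MKL supplies the vector; A itself is never changed */
            my_matvec(A, &tmp[ipar[21] - 1], &tmp[ipar[22] - 1]);
        } else if (RCI_request == 3) {
            /* w = B.inv() * v: the preconditioner is applied to the Krylov
               vector MKL hands over, not to A and not to the rhs            */
            my_apply_prec(A, &tmp[ipar[21] - 1], &tmp[ipar[22] - 1]);
        } else {
            break;  /* 0 = converged, negative = failure */
        }
    }

    dfgmres_get(&n, x, b, &RCI_request, ipar, dpar, tmp, &itercount);
    printf("iterations = %lld, x = %g %g %g %g\n",
           (long long)itercount, x[0], x[1], x[2], x[3]);
    MKL_Free_Buffers();
    return 0;
}

In a real code the two stand-ins would be replaced by a sparse matrix-vector product with csrA (for example mkl_sparse_d_mv) and by the two triangular solves with the ILUT factors returned by dcsrilut, as in the dcsrilut example; the structure of the loop itself does not change.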

Lin  Yang

 

 

cannot find libimf.so


Situation:

After installing Intel Composer XE 19 and setting the environment variables, the network on Ubuntu 18.04 and the printing system on OpenSUSE Leap 15.1 stop working. The error messages say that these services (cups, networking, ...) cannot find libimf.so and other Intel Composer libraries. But if I install Intel Composer XE 19 without sudo or root, only within my user environment, the system works smoothly. I asked the same question on the OpenSUSE forum and someone suggested that I shouldn't use LD_LIBRARY_PATH; some articles also say LD_LIBRARY_PATH is evil. For now I just avoid using it.

Question:

If I install Intel Composer XE 19 as administrator, how should I set the environment variables? The installation manual says to set them with "source /opt/intel/.../compilervars.sh intel64", but that causes a lot of trouble for me.

MKL function cblas_sgemv gives different results each time


Hi, I am using cblas_sgemv, but the function gives different results each time even though I have checked that the inputs are always the same. Sometimes the result is correct (about 1e-6 L2-norm error compared to the correct result). Could someone tell me why this happens?
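It might help to narrow things down with a minimal, self-contained call; the sizes, leading dimension and data below are made up purely for illustration. If this sketch is reproducible on your machine but your real call is not, the difference is most likely in how A, x, y, lda or the increments are set up (or overwritten) in the surrounding code.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* y = alpha*A*x + beta*y with a fixed 3x4 row-major A (illustrative data) */
    const MKL_INT m = 3, n = 4, lda = 4;
    float A[12] = { 1,  2,  3,  4,
                    5,  6,  7,  8,
                    9, 10, 11, 12 };
    float x[4] = {1, 1, 1, 1};
    float y[3];

    for (int run = 0; run < 3; ++run) {
        y[0] = y[1] = y[2] = 0.0f;
        cblas_sgemv(CblasRowMajor, CblasNoTrans, m, n,
                    1.0f, A, lda, x, 1, 0.0f, y, 1);
        printf("run %d: %g %g %g\n", run, y[0], y[1], y[2]);  /* expect 10 26 42 */
    }
    return 0;
}

If even this fixed-data version varies from run to run, the next things to look at would be the threading setup and MKL's conditional numerical reproducibility controls (the MKL_CBWR environment variable).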

 

Thanks,

Cindy

MKL's cblas_saxpy outputs incorrect results


Hi,

I need to add two arrays in an efficient way, so I tried MKL's saxpy.

When I use the cblas_saxpy function on two dummy arrays with all values initialized to 1 and 2 respectively, I get completely wrong results, and I can't figure out what's wrong with my code.

(I omitted other includes)

#include "mkl.h"

#define SIZE 10000

int main()
{

        float* buf_x = (float*) malloc(SIZE * sizeof(float));
        float* buf_y = (float*) malloc(SIZE * sizeof(float));

        for (int i = 0; i<SIZE; i++){

            buf_x[i] = 1.f;

            buf_y[i] = 2.f;

        }

        cblas_saxpy ((MKL_INT)SIZE, 1, buf_x, (MKL_INT)1, buf_y, (MKL_INT)1);

        for (int i = 0; i<SIZE; i++)

            printf("%f, ", buf_y[i]);

        return 0;
}

I get 204 as output instead of 3.

Can anybody shed some light on this?

Thank you.

pardiso_handle_store Segmentation fault


Hi,

I have been using `pardiso`, and everything works fine. I can successfully factorize a large sparse matrix and later solve systems using the factorization. Now I wish to save the factorization to a file, so I don't need to do the factorization every time I run the application. I believe `pardiso_handle_store` is the correct function to use, but I keep getting the `Segmentation fault (core dumped)` error.

The relevant code and the console output can be found in this gist: https://gist.github.com/hkalexling/f8e0a22d1a29569a4012f717de8c7798.

Any help would be appreciated!

Alex

Permutation of a large sparse matrix


Hi,

What is the fastest way of permuting a large sparse_matrix_t in csr or csc format?

I could either do manual permutations on the csr arrays or I could create a sparse permutation matrix and use the mkl_sparse_spmm method.

Neither method seems optimal: with the former I don't benefit from parallelism, and with the latter I have to create additional arrays for the permutation matrix plus a new copy of the matrix.

Also, I notice that there might be performance differences between column and row permutations depending on whether the matrix is in csr or csc format.

Is there a better way to do it?
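To make the second option concrete, here is a rough sketch with small made-up data (the 3x3 matrix, the perm array and all values are purely illustrative): it builds a row-permutation matrix P in CSR, with exactly one unit entry per row at column perm[i], and forms P*A with mkl_sparse_spmm. Whether this actually beats a hand-written permutation of the csr arrays would need to be measured.

#include <stdio.h>
#include "mkl.h"
#include "mkl_spblas.h"

int main(void)
{
    /* Small illustrative A (3x3) in CSR, zero-based indexing. */
    MKL_INT a_ia[4] = {0, 2, 3, 5};
    MKL_INT a_ja[5] = {0, 2, 1, 0, 2};
    double  a_va[5] = {1, 2, 3, 4, 5};
    sparse_matrix_t A = NULL, P = NULL, C = NULL;
    mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ZERO, 3, 3, a_ia, a_ia + 1, a_ja, a_va);

    /* Row permutation perm = {2, 0, 1}: row i of the result is row perm[i] of A.
       A permutation matrix in CSR is trivial: one unit entry per row.          */
    MKL_INT perm[3] = {2, 0, 1};
    MKL_INT p_ia[4] = {0, 1, 2, 3};
    double  p_va[3] = {1, 1, 1};
    mkl_sparse_d_create_csr(&P, SPARSE_INDEX_BASE_ZERO, 3, 3, p_ia, p_ia + 1, perm, p_va);

    /* C = P * A : row i of C is row perm[i] of A. */
    sparse_status_t st = mkl_sparse_spmm(SPARSE_OPERATION_NON_TRANSPOSE, P, A, &C);
    printf("spmm status = %d\n", (int)st);

    /* Export and print the permuted matrix. */
    sparse_index_base_t base;
    MKL_INT rows, cols, *rs, *re, *ci;
    double *cv;
    mkl_sparse_d_export_csr(C, &base, &rows, &cols, &rs, &re, &ci, &cv);
    for (MKL_INT i = 0; i < rows; ++i)
        for (MKL_INT k = rs[i]; k < re[i]; ++k)
            printf("C(%lld,%lld) = %g\n", (long long)i, (long long)ci[k], cv[k]);

    mkl_sparse_destroy(A);
    mkl_sparse_destroy(P);
    mkl_sparse_destroy(C);
    return 0;
}

A column permutation could be handled the same way by multiplying on the right with the transposed permutation matrix; mkl_sparse_sp2m accepts a separate operation flag for each factor, which avoids forming the transpose explicitly.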

use of MKL spline functions strange behavior at second run time


Hi,

I have wrapped the code required to do an Akima spline interpolation in the attached source file. When I link my main application statically with MKL, it runs fine. However, with dynamic linking (i.e. when the MKL runtime libraries are needed) I see strange behavior. At first I did not know which runtime libraries were needed: the call to the dfdNewTask1D routine simply terminated my application. After several tries (running in console mode) I found that mkl_vml_avx2.dll and mkl_vml_p4.dll were required. I load these DLLs manually, my code runs fine, and I unload them afterwards. However, without exiting my app, if I run the same calculation again (after loading the same set of DLLs again), the call to dfdNewTask1D raises an exception and my application crashes (under the debugger, the call to dfdNewTask1D never returns because an exception is thrown somewhere). The runtime DLLs I load/unload at/after every run are:

  • libimalloc.dll
  • libmmd.dll
  • libifcoremd.dll
  • libifportmd.dll
  • libiomp5md.dll
  • msvcr100.dll
  • mkl_vml_avx2.dll
  • mkl_vml_p4.dll
  • mkl_core.dll
  • mkl_sequential.dll

I don't know whether I need other DLLs or what is going wrong.

Best regards,

Phil.

Attachment: AkimaSpline.f90 (3.32 KB)

Serious memory leak problem of mkl_sparse_d_add subroutine


Hi,

I'm currently programming with the new sparse interface and I'm seeing a serious memory leak: when the routine mkl_sparse_d_add is called several thousand times, it takes up all of my 64 GB of memory and the program cannot continue. I'm not sure whether other sparse routines have similar problems, but at least mkl_sparse_d_create_coo, mkl_sparse_convert_csr and mkl_sparse_d_mv do not have this problem.

Please take a look at it. Thank you very much!
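In case it helps to reproduce or rule things out, here is a small loop sketch with made-up 2x2 matrices that calls mkl_sparse_d_add repeatedly and destroys the result handle with mkl_sparse_destroy on every iteration (mkl_sparse_d_add computes C := alpha*op(A) + B and allocates C internally, so each returned handle owns memory that only mkl_sparse_destroy releases). If memory still grows with this pattern, that would point at a leak inside the routine rather than at un-released result handles.

#include <stdio.h>
#include "mkl.h"
#include "mkl_spblas.h"

int main(void)
{
    /* Two tiny CSR matrices, zero-based indexing (illustrative data). */
    MKL_INT ia[3] = {0, 1, 2}, ja[2] = {0, 1};
    double  va[2] = {1.0, 2.0};
    MKL_INT ib[3] = {0, 2, 3}, jb[3] = {0, 1, 1};
    double  vb[3] = {3.0, 4.0, 5.0};

    sparse_matrix_t A = NULL, B = NULL;
    mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ZERO, 2, 2, ia, ia + 1, ja, va);
    mkl_sparse_d_create_csr(&B, SPARSE_INDEX_BASE_ZERO, 2, 2, ib, ib + 1, jb, vb);

    for (int iter = 0; iter < 10000; ++iter) {
        sparse_matrix_t C = NULL;
        sparse_status_t st = mkl_sparse_d_add(SPARSE_OPERATION_NON_TRANSPOSE,
                                              A, 1.0, B, &C);
        if (st != SPARSE_STATUS_SUCCESS) { printf("add failed: %d\n", (int)st); break; }
        mkl_sparse_destroy(C);   /* release the result handle every iteration */
    }

    mkl_sparse_destroy(A);
    mkl_sparse_destroy(B);
    MKL_Free_Buffers();          /* release MKL's internal buffers */
    printf("done\n");
    return 0;
}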


dtrnlspbc_solve solution outside the allowed range


We are using the MKL Trust-Region API, dtrnlspbc_solve and related functions.
We use the optimization with bound constraints.
The objective function we wrote has worked properly with your algorithm for years,
until a customer of ours reported a strange behaviour: a solution outside the allowed range!

We observed that if we move the initial conditions a little bit, it works! Why can this happen?
Any suggestions?

 

Thank you

Gianluca

Using MKL Features: MKL Direct Call, MKL JIT, MKL Compact API, MKL Batch API, MKL Packed API on the Single Dynamic Library

Why does a different thread count make no difference in performance?


Hi everyone,

I'm testing MKL with Visual Studio 2019 and MKL v2019.5 on an Intel i7-9750H CPU with 6 cores and 12 threads. I'm interested in the time consumed by the vector mathematics and FFT functions in MKL. As I understand it, for these two categories of functions the time consumed should decrease as the maximum thread count increases, but that doesn't happen for the vector mathematics functions. I have tested the vcMul and vcAdd functions: the time consumed hardly differs between a thread count of 1 and 6. It seems weird to me and I can't figure out a reason for it. Can anyone help? The code is attached below, thanks very much!

 

////////////////////////////////
#include <stdio.h>
#include "mkl.h"

int N = 16384;   /* vector length per call */
int M = 2000;    /* number of calls / FFTs */

//#define FFTTEST
#define CMULTEST

int main(void)
{
    double clkfreq = mkl_get_clocks_frequency();   /* CPU frequency in GHz */

    unsigned MKL_INT64 startclk, endclk;
    double time;
    int kk = 0;

    /* Execution status and descriptor for the (disabled) FFT test */
    MKL_LONG status = 0;
    DFTI_DESCRIPTOR_HANDLE hand = 0;

    //mkl_set_dynamic(0);
    //mkl_set_num_threads(1);
    int threadnum = mkl_get_max_threads();
    printf("Max threads: %d\n", threadnum);
    printf("FFT size: %d  FFT count: %d\n", N, M);

    /* Pointers to input/output data */
    MKL_Complex8* x = 0;
    MKL_Complex8* y = 0;
    x = (MKL_Complex8*)mkl_malloc(N * M * sizeof(MKL_Complex8), 64);
    y = (MKL_Complex8*)mkl_malloc(N * M * sizeof(MKL_Complex8), 64);
    MKL_Complex8* x2 = 0;
    MKL_Complex8* y2 = 0;
    x2 = (MKL_Complex8*)mkl_malloc(N * M * sizeof(MKL_Complex8), 64);
    y2 = (MKL_Complex8*)mkl_malloc(N * M * sizeof(MKL_Complex8), 64);
    if (x == NULL || y == NULL || x2 == NULL || y2 == NULL) goto failed;

    /* Fill the inputs (stands in for the original init2(x, x2) helper) */
    for (kk = 0; kk < N * M; kk++) {
        x[kk].real = 1.0f;  x[kk].imag = 2.0f;
        x2[kk].real = 3.0f; x2[kk].imag = 4.0f;
    }

    vmlSetMode(VML_EP);
    mkl_get_cpu_clocks(&startclk);
    for (kk = 0; kk < M; kk++)
    {
        vcAdd(N, &x[N * kk], &x2[N * kk], &y[N * kk]);
    }
    mkl_get_cpu_clocks(&endclk);
    time = (double)(endclk - startclk) / (clkfreq * 1e9) * 1e6 / M;   /* us per call */
    printf("Complex multiply: %f us\n", time);   /* note: this loop actually times vcAdd */

    mkl_free(x);
    mkl_free(y);
    mkl_free(x2);
    mkl_free(y2);

failed:
    return 0;
}

 
