Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Is MKL speed dependent upon how contiguous memory is?


Does the speed of the mkl blas/lapack library routines change significantly when one has contiguous memory versus not?

I have a strange problem that looks like a "Memory Cache Leak" (not a memory leak) leading to a slow down of a program.

Let me set the stage first. Reproducibly (using ganglia to monitor), on a cluster I have noticed that the cached memory increases, relatively slowly. When it becomes large, something like 2/3 of the total memory (Intel Gold with 32 cores & 192 GB), a program runs slower by about a factor of ~1.5. If I clear the cache and sync the disk (I have not tested which matters) with "sync ; echo 3 > /proc/sys/vm/drop_caches", the program speeds back up (~1.5 times faster).

The issue seems to be associated with I/O -- the relevant code uses MPI, and only the core that does any I/O shows the cache leak. The program does a fair amount of I/O, but not massive amounts (10-40 MB). I compile using ifort with -assume buffered_io. My suspicion is that this may leave some cached files at the end, effectively a "cache leak".

The program makes a large number of BLAS/LAPACK calls. It is plausible that memory is less contiguous when the cached memory is large -- fragmented RAM. Can this lead to a speed change in the BLAS/LAPACK routines?


Can PARDISO routines exploit an initial guess?


Hi,

I'm using PARDISO to find vector x in the equation

Ax = b

where A is a real unsymmetric matrix. The PARDISO manual mentions that the vector x is only accessed in the solution phase.

If I initially have a pretty good estimate of x, can I use that estimate somehow (as I can in, say, the Jacobi or Gauss-Seidel algorithm)?

Thanks

Access conflict when trying to use mkl_sparse_convert_csr


Hi, has anyone encountered the situation I met? All I did was try to convert a sparse matrix from CSC format to CSR format.

First, I use mkl_sparse_z_create_csc to create a matrix in CSC format, and the returned stat is 0, which shows the creation succeeded. Then I use mkl_sparse_convert_csr to get the matrix in CSR format. I've tried this with a small matrix and it works, but when I convert a big matrix, an access violation occurs inside the convert call, and I have no idea why. Does anyone know the cause?

PARDISO - different time in factorization step


Hi all

We use the PARDISO solver (Parallel Studio 2018) in our FEM code for a symmetric indefinite system and discovered that the time spent in the factorization step may vary significantly. We use the following solver parameters, all other values are set to zero:

    iparm [0] =  1;
    iparm [1] =  2;
    iparm [2] =  4;
    iparm [9] =  8;

The solver output of a slow and a fast example can be found below. The total time differs by almost a factor of 2 despite the similar size of the equation systems. This makes PARDISO clearly less attractive than our iterative GPU solver in certain cases. Is this a performance issue/bug? Are we using bad settings? Or is it just normal behavior? Any help is highly appreciated.

Thank you very much and best regards

David

 

Slow example:

Summary: ( starting phase is reordering, ending phase is solution )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.703717 s
Time spent in reordering of the initial matrix (reorder)         : 11.242095 s
Time spent in symbolic factorization (symbfct)                   : 3.675149 s
Time spent in data preparations for factorization (parlist)      : 0.107626 s
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 133.596461 s
Time spent in direct solver at solve step (solve)                : 1.954850 s
Time spent in allocation of internal data structures (malloc)    : 0.250255 s
Time spent in additional calculations                            : 4.518999 s
Total time spent                                                 : 156.049152 s

Statistics:
===========
Parallel Direct Factorization is running on 6 OpenMP

< Linear system Ax = b >
             number of equations:           2956396
             number of non-zeros in A:      79437854
             number of non-zeros in A (%): 0.000909

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 96
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    338571
             size of largest supernode:               17752
             number of non-zeros in L:                3101919901
             number of non-zeros in U:                1
             number of non-zeros in L+U:              3101919902
             gflop   for the numerical factorization: 20373.427694

             gflop/s for the numerical factorization: 152.499756

 

Fast example:

Summary: ( starting phase is reordering, ending phase is solution )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.462895 s
Time spent in reordering of the initial matrix (reorder)         : 10.686642 s
Time spent in symbolic factorization (symbfct)                   : 3.656114 s
Time spent in data preparations for factorization (parlist)      : 0.108843 s
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 64.537849 s
Time spent in direct solver at solve step (solve)                : 1.397830 s
Time spent in allocation of internal data structures (malloc)    : 0.240506 s
Time spent in additional calculations                            : 4.692960 s
Total time spent                                                 : 85.783639 s

Statistics:
===========
Parallel Direct Factorization is running on 6 OpenMP

< Linear system Ax = b >
             number of equations:           2961080
             number of non-zeros in A:      83916044
             number of non-zeros in A (%): 0.000957

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 96
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    308944
             size of largest supernode:               6004
             number of non-zeros in L:                2958633165
             number of non-zeros in U:                1
             number of non-zeros in L+U:              2958633166
             gflop   for the numerical factorization: 8909.331438

             gflop/s for the numerical factorization: 138.048162

 

 

 

Intel MKL FEAST


Hi

I am using the FEAST extremal eigenvalue solver (dexample_extremal_ev_c1.c) to find the lowest eigenvalues. My matrix is big: 1 million by 1 million, with around 31 nonzero values in each row. I can find the lowest 300 eigenvalues easily, but when I try to find the lowest 310 (meaning just 10 more), my code crashes and shows an error:

mkl_sparse_d_ev output info 2
Routine mkl_sparse_d_ev returns code of ERROR: 2
Time taken: 19.06s

Please help me regarding this.

 

Looking for a library or package that can read and write sparse matrices


When Intel Visual Fortran calls PARDISO in MKL to solve a linear system, you need to read the sparse matrix from a file. But there are many ways to store sparse matrices in files (such as COO, CSR, CSC, DIA, et al.), and I want to find a library or package that can read and write sparse matrices. This library or package needs to meet two conditions: (1) it has a Fortran interface, and (2) it is suitable for the various sparse matrix storage methods. Alternatively, what other methods would you suggest for reading and writing sparse matrices? I need your advice. Thank you.

Schur complement for asymmetric matrix


Hi all,

I am using MKL PARDISO to compute a Schur complement. So far I have it working for symmetric matrices, but for an asymmetric matrix I get a crash at phase = 11. The error message is "... Access violation writing location ...."

Can anyone provide any guidance on the possible root cause? I am using MKL 2018 Update 2. The source code is below.

Thanks!

Hainan

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mkl.h>
#include <iostream>

void main()
{
    int n = 5;
    double a[11] = { 2.0, 1.0,
                     1.0, 1.0,
                     6.0, 7.0,
                     2.5,
                     1.0, 1.0, 7.0, 3.5 };
    int ia[6] = { 0, 2, 4, 6, 7, 11 };
    int ja[11] = { 0, 4,
                   1, 4,
                   2, 4,
                   3,
                   0, 1, 2, 4 };
    int matrix_order = LAPACK_ROW_MAJOR;
    char uplo = 'U';
    int mtype = 11;       /* Real asymmetric matrix */
    /* RHS and solution vectors. */
    //double b[8], x[8];
    int nrhs = 1;         /* Number of right hand sides. */
    /* Internal solver memory pointer pt, */
    /* 32-bit: int pt[64]; 64-bit: long int pt[64] */
    /* or void *pt[64] should be OK on both architectures */
    void *pt[64];
    /* Pardiso control parameters. */
    int iparm[64];
    int maxfct, mnum, phase, error, msglvl, info;
    /* Auxiliary variables. */
    int i, j;
    double ddum;          /* Double dummy */
    int idum;             /* Integer dummy. */
    /* Schur data */
    double schur[3] = { 0.0, 0.0, 0.0 };
    int perm[5] = { 0, 0, 1, 1, 1 };
    int ipiv[2];
    int n_schur = 3;      /* Schur complement solution size */

    /* -------------------------------------------------------------- */
    /* .. Setup Pardiso control parameters.                            */
    /* -------------------------------------------------------------- */
    for (i = 0; i < 64; i++)
    {
        iparm[i] = 0;
    }
    iparm[1 - 1] = 1;     /* No solver default */
    iparm[2 - 1] = 2;     /* Fill-in reordering from METIS */
    iparm[10 - 1] = 8;    /* Perturb the pivot elements with 1E-13 */
    iparm[11 - 1] = 0;    /* Use nonsymmetric permutation and scaling MPS */
    iparm[13 - 1] = 0;    /* Maximum weighted matching algorithm is switched off (default for symmetric). Try iparm[12] = 1 in case of inappropriate accuracy */
    iparm[14 - 1] = 0;    /* Output: Number of perturbed pivots */
    iparm[18 - 1] = -1;   /* Output: Number of nonzeros in the factor LU */
    iparm[19 - 1] = -1;   /* Output: Mflops for LU factorization */
    iparm[35 - 1] = 1;    /* 0: one-based; 1: zero-based */
    iparm[36 - 1] = 1;    /* Use Schur complement */

    maxfct = 1;           /* Maximum number of numerical factorizations. */
    mnum = 1;             /* Which factorization to use. */
    msglvl = 1;           /* Print statistical information in file */
    error = 0;            /* Initialize error flag */

    /* -------------------------------------------------------------- */
    /* .. Initialize the internal solver memory pointer. This is only  */
    /* necessary for the FIRST call of the PARDISO solver.             */
    /* -------------------------------------------------------------- */
    for (i = 0; i < 64; i++)
    {
        pt[i] = 0;
    }

    /* -------------------------------------------------------------- */
    /* .. Reordering and Symbolic Factorization. This step also        */
    /* allocates all memory that is necessary for the factorization.   */
    /* -------------------------------------------------------------- */
    phase = 11;
    pardiso(pt, &maxfct, &mnum, &mtype, &phase,
            &n, a, ia, ja, perm, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
    if (error != 0)
    {
        printf("\nERROR during symbolic factorization: %d", error);
        exit(1);
    }
}

Can I use MKL_DIRECT_CALL JIT on a Core i5 CPU?


The MKL Developer Guide says JIT can be used on Xeon processors with AVX2, but can it be used on a Core i5 CPU with AVX2?
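MKL generally dispatches on CPU feature bits rather than brand, so a Core i5 that reports AVX2 is expected to take the same code path as a Xeon with AVX2; the Xeon wording in the Guide reads like a validation statement rather than a hard restriction, but treat that expectation as an assumption rather than a guarantee. Enabling the JIT path is a compile-time define (the file name below is hypothetical; link lines depend on your setup):

```sh
# Define MKL_DIRECT_CALL_JIT (or MKL_DIRECT_CALL_SEQ_JIT for sequential
# MKL) before mkl.h is included, e.g. on the compile line:
icc -DMKL_DIRECT_CALL_JIT -O2 small_gemms.c -mkl
```

If the CPU lacks the required features, the JIT request should simply fall back to the regular dispatched kernels.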


What LAPACK function is available to calculate 2-norm of a matrix?


What LAPACK function is available to calculate the 2-norm (or spectral norm) of a matrix? Thank you.

Questions & Puzzled about mkl_sparse_spmm


Hi, I'm very anxious because mkl_sparse_spmm keeps returning an error beyond my understanding. When I compute the product of two sparse matrices created by mkl_sparse_z_create_csr, the first attempt seems fine, but when I try to compute it again, an access violation appears and I have no idea why.

Here is part of my code:
    do i = 1, 5

        !*steps to compute the values needed are omitted; I've tested them to make sure they're correct*!

        ! create sparse matrix; this step is right and stat = 0
        stat = mkl_sparse_z_create_csr(local(i)%Krc%B, sparse_index_base_one, sub(i)%temp, &
            sub(i)%Noc, local(i)%rows_start, local(i)%rows_end, local(i)%col_indx, local(i)%values)
        write(*,*) stat

        ! create sparse matrix; this step is right and stat = 0
        stat = mkl_sparse_z_create_csr(local(i)%Kcr%A, sparse_index_base_one, &
            local(i)%Kcr%rows, local(i)%Kcr%cols, local(i)%Kcr%rows_start, local(i)%Kcr%rows_end, &
            local(i)%Kcr%col_indx, local(i)%Kcr%values)
        write(*,*) stat

        ! compute the product
        info = mkl_sparse_spmm(sparse_operation_non_transpose, local(i)%Kcr%A, local(i)%Krc%B, local(i)%Kcr%B)
        write(*,*) info

    end do

At the first loop iteration (i == 1) everything seems fine, but at the second (i == 2) an access violation occurs in the spmm call, which confuses me most. I can make sure the sparse matrices are correct, and I did not deallocate the memory or destroy the sparse matrix handles, so I have no idea why this happens.

Can you help me?

Is there any service function that helps us form the needed matrix storage scheme?


Although the various storage schemes are explained in detail in the MKL Reference Manual, I believe human users still prefer to present their matrices in conventional full-storage form. Are there any service functions in MKL that take a full-storage matrix and convert it to another storage format like packed, band, or RFP? That would reduce programming work a lot.

using MKL's fft in fixed point dsp codes


Hi,

We've started exploring MKL's FFT in our DSP programs. We were able to set up the FFT and get correct output. One issue we have now: the data flows before and after the FFT are all fixed point (int32_t, for example). Other than converting vectors between int32_t and float/double before and after the FFT, are there any other options, or provisions in MKL that we missed? Or would Intel IPP be a better option, since IPP does seem to have some vector/image data-type conversion tools. Any advice? Thanks.

Best,

Raymond

Fortran MKL matrix operations in pure CSR format


Hi folks, I'm an enthusiastic Intel MKL user employing parallel Fortran and C code. I need to perform specific CSR matrix-matrix operations like MKL_SPARSE_SYRK without resorting to dense matrices at all. Where can I find some examples showing practical use of these routines in Fortran? Why is there no example even in the most recent version of the manual?

Please consider that this topic is fundamental for solving large sparse parallel algebraic problems in CFD and electromagnetics (e.g. Tikhonov regularization of ill-posed problems), so all suggestions are welcome.

Thanks, Ciro

ANCILLARY ROUTINES: mkl_sparse_d_create_csr, mkl_sparse_convert_csr, etc.

 

Memory leak with vslsConvNewTask1D


vslsConvNewTask1D adds about 1 MB of memory the first time it runs, and vslConvDeleteTask(&t) cannot free that memory. Why?

Intel® MKL version 2019 Update 1 is now available


Intel® Math Kernel Library (Intel® MKL) is a highly optimized, extensively threaded, and thread-safe library of mathematical functions for engineering, scientific, and financial applications that require maximum performance.

Intel MKL 2019 Update 1 packages are now ready for download.

Intel MKL is available as part of the Intel® Parallel Studio XE and Intel® System Studio. Please visit the Intel® Math Kernel Library Product Page.

To see what's new in Intel MKL 2019 and MKL 2019 Update 1, follow this link - https://software.intel.com/en-us/articles/intel-math-kernel-library-rele...

and here is the link to the MKL 2019 Bug Fix list - https://software.intel.com/en-us/articles/intel-math-kernel-library-2019...

 


Problem with Compiling HPL using MKL&MPI


Hello everyone~

I've installed Intel MKL & Intel MPI for Linux to run HPL; however, I have a problem while compiling it.

I use a virtual machine to run CentOS; information about my computer and the virtual machine settings is written below.

Oracle VirtualBox 5.2.20

OS: CentOS 6.10

CPU: Intel(R) Core(TM) i5-4200H *2

Memory(for Centos): 8192MB

gcc version: Red Hat 7.3.1-5

MKL: Intel®MKL 2019 Initial Release

MPI: Intel®MPI Library(Linux*packages) 2019 Initial Release

I referenced the article

https://software.intel.com/en-us/articles/performance-tools-for-software...

but I got an error when I ran the make arch step. I chose Make.Linux_PII_CBLAS as the template and renamed it Make.intel64.

Below are the contents of my Makefile:

#------------------------------------------------

SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch

ARCH         = intel64

TOPdir       = /root/hpl
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
 

MPdir        = /opt/intel/impi/2019.0.117/intel64
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/release/libmpi.a 

 

LAdir        = /opt/intel/compilers_and_libraries/linux/mkl
ifndef  LAinc
LAinc        = -I$(LAdir)/include
endif
ifndef  LAlib
LAlib        = -L $(LAdir)/lib/intel64 \
               -Wl,--start-group \
               $(LAdir)/lib/intel64/libmkl_intel_lp64.a \
               $(LAdir)/lib/intel64/libmkl_intel_thread.a \
               $(LAdir)/lib/intel64/libmkl_core.a \
               -Wl,--end-group -lpthread -ldl -lm
endif

 

F2CDEFS      =
 

HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)

HPL_OPTS     = -DHPL_CALL_CBLAS

HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)

 

CC           = /opt/intel/compilers_and_libraries_2019.0.117/linux/mpi/intel64/bin/mpicc
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops

 

LINKER       = $(CC)
LINKFLAGS    = $(CCFLAGS)
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#
# --------------------------------------------
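A note on the link errors shown further down: the undefined `__kmpc_*` and `omp_*` references are OpenMP runtime symbols. libmkl_intel_thread.a requires Intel's OpenMP runtime (libiomp5), which mpicc over gcc does not link by default. Two common fixes, with the library path hedged since it varies by install (here assuming the usual compilers_and_libraries layout):

```makefile
# Option 1: keep libmkl_intel_thread and add Intel's OpenMP runtime
LAlib        = -L $(LAdir)/lib/intel64 \
               -Wl,--start-group \
               $(LAdir)/lib/intel64/libmkl_intel_lp64.a \
               $(LAdir)/lib/intel64/libmkl_intel_thread.a \
               $(LAdir)/lib/intel64/libmkl_core.a \
               -Wl,--end-group \
               -L $(LAdir)/../lib/intel64 -liomp5 -lpthread -ldl -lm

# Option 2: switch to the GNU-threaded layer and gcc's own OpenMP
# LAlib      = ... libmkl_gnu_thread.a ... -Wl,--end-group -fopenmp -lpthread -ldl -lm
```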

 

and the error I get

---------------------------

 

/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__trsm_drv.o): In function `omp_driver_iterative':
_trsm.c:(.text+0x7f): undefined reference to `__kmpc_global_thread_num'
_trsm.c:(.text+0x4b4): undefined reference to `__kmpc_ok_to_fork'
_trsm.c:(.text+0x4d5): undefined reference to `__kmpc_push_num_threads'
_trsm.c:(.text+0x510): undefined reference to `__kmpc_fork_call'
_trsm.c:(.text+0x52b): undefined reference to `__kmpc_serialized_parallel'
_trsm.c:(.text+0x57e): undefined reference to `__kmpc_end_serialized_parallel'
_trsm.c:(.text+0x123d): undefined reference to `omp_get_thread_num'
_trsm.c:(.text+0x12be): undefined reference to `omp_get_num_threads'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__trsm_drv.o): In function `omp_driver_v2_leaf':
_trsm.c:(.text+0x3eed): undefined reference to `__kmpc_global_thread_num'
_trsm.c:(.text+0x3f02): undefined reference to `__kmpc_ok_to_fork'
_trsm.c:(.text+0x3f22): undefined reference to `__kmpc_push_num_threads'
_trsm.c:(.text+0x3f58): undefined reference to `__kmpc_fork_call'
_trsm.c:(.text+0x3f73): undefined reference to `__kmpc_serialized_parallel'
_trsm.c:(.text+0x3fc3): undefined reference to `__kmpc_end_serialized_parallel'
_trsm.c:(.text+0x4040): undefined reference to `omp_get_thread_num'
_trsm.c:(.text+0x40b7): undefined reference to `omp_get_num_threads'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__trsm_drv.o): In function `omp_driver_recursive':
_trsm.c:(.text+0x7247): undefined reference to `__kmpc_global_thread_num'
_trsm.c:(.text+0x725c): undefined reference to `__kmpc_ok_to_fork'
_trsm.c:(.text+0x727d): undefined reference to `__kmpc_push_num_threads'
_trsm.c:(.text+0x72b9): undefined reference to `__kmpc_fork_call'
_trsm.c:(.text+0x72d4): undefined reference to `__kmpc_serialized_parallel'
_trsm.c:(.text+0x7327): undefined reference to `__kmpc_end_serialized_parallel'
_trsm.c:(.text+0x73a1): undefined reference to `omp_get_thread_num'
_trsm.c:(.text+0x741d): undefined reference to `omp_get_num_threads'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_drv.o): In function `omp_parallel_acopy_lookahead':
_gemm.c:(.text+0x207a): undefined reference to `__kmpc_global_thread_num'
_gemm.c:(.text+0x280c): undefined reference to `__kmpc_critical'
_gemm.c:(.text+0x29db): undefined reference to `__kmpc_end_critical'
_gemm.c:(.text+0x2c7e): undefined reference to `__kmpc_critical'
_gemm.c:(.text+0x2e88): undefined reference to `__kmpc_end_critical'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_drv.o): In function `omp_simple_3d':
_gemm.c:(.text+0x738f): undefined reference to `__kmpc_global_thread_num'
_gemm.c:(.text+0x73a3): undefined reference to `__kmpc_ok_to_fork'
_gemm.c:(.text+0x73c3): undefined reference to `__kmpc_push_num_threads'
_gemm.c:(.text+0x749c): undefined reference to `__kmpc_fork_call'
_gemm.c:(.text+0x74bc): undefined reference to `__kmpc_serialized_parallel'
_gemm.c:(.text+0x755c): undefined reference to `__kmpc_end_serialized_parallel'
_gemm.c:(.text+0x770d): undefined reference to `__kmpc_for_static_init_8'
_gemm.c:(.text+0x816c): undefined reference to `__kmpc_for_static_fini'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_drv.o): In function `mkl_blas_dgemm':
_gemm.c:(.text+0x8aa0): undefined reference to `__kmpc_global_thread_num'
_gemm.c:(.text+0x8ab5): undefined reference to `__kmpc_ok_to_fork'
_gemm.c:(.text+0x8ada): undefined reference to `__kmpc_push_num_threads'
_gemm.c:(.text+0x8bad): undefined reference to `__kmpc_fork_call'
_gemm.c:(.text+0x8bce): undefined reference to `__kmpc_serialized_parallel'
_gemm.c:(.text+0x8c66): undefined reference to `__kmpc_end_serialized_parallel'
_gemm.c:(.text+0x8da8): undefined reference to `omp_get_thread_num'
_gemm.c:(.text+0x8de2): undefined reference to `omp_get_num_threads'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_drv.o): In function `gemm_omp_driver_v2':
_gemm.c:(.text+0xa1de): undefined reference to `__kmpc_global_thread_num'
_gemm.c:(.text+0xa1f2): undefined reference to `__kmpc_ok_to_fork'
_gemm.c:(.text+0xa215): undefined reference to `__kmpc_push_num_threads'
_gemm.c:(.text+0xa277): undefined reference to `__kmpc_fork_call'
_gemm.c:(.text+0xa294): undefined reference to `__kmpc_serialized_parallel'
_gemm.c:(.text+0xa314): undefined reference to `__kmpc_end_serialized_parallel'
_gemm.c:(.text+0xa3b5): undefined reference to `omp_get_thread_num'
_gemm.c:(.text+0xa44b): undefined reference to `omp_get_num_threads'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_drv.o): In function `use_smalln_kernels':
_gemm.c:(.text+0xdb83): undefined reference to `__kmpc_global_thread_num'
_gemm.c:(.text+0xdb98): undefined reference to `__kmpc_ok_to_fork'
_gemm.c:(.text+0xdbbd): undefined reference to `__kmpc_push_num_threads'
_gemm.c:(.text+0xdc90): undefined reference to `__kmpc_fork_call'
_gemm.c:(.text+0xdcb1): undefined reference to `__kmpc_serialized_parallel'
_gemm.c:(.text+0xdd49): undefined reference to `__kmpc_end_serialized_parallel'
_gemm.c:(.text+0xde32): undefined reference to `omp_get_thread_num'
_gemm.c:(.text+0xde6c): undefined reference to `omp_get_num_threads'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__scal_drv.o): In function `mkl_blas_dscal':
_scal.c:(.text+0x1f3): undefined reference to `__kmpc_global_thread_num'
_scal.c:(.text+0x208): undefined reference to `__kmpc_ok_to_fork'
_scal.c:(.text+0x229): undefined reference to `__kmpc_push_num_threads'
_scal.c:(.text+0x25f): undefined reference to `__kmpc_fork_call'
_scal.c:(.text+0x27a): undefined reference to `__kmpc_serialized_parallel'
_scal.c:(.text+0x2b3): undefined reference to `__kmpc_end_serialized_parallel'
_scal.c:(.text+0x375): undefined reference to `omp_get_thread_num'
_scal.c:(.text+0x37d): undefined reference to `omp_get_num_threads'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__iamax_drv.o): In function `mkl_blas_idamax':
_iamax.c:(.text+0x23a): undefined reference to `__kmpc_global_thread_num'
_iamax.c:(.text+0x24f): undefined reference to `__kmpc_ok_to_fork'
_iamax.c:(.text+0x270): undefined reference to `__kmpc_push_num_threads'
_iamax.c:(.text+0x2a6): undefined reference to `__kmpc_fork_call'
_iamax.c:(.text+0x2c1): undefined reference to `__kmpc_serialized_parallel'
_iamax.c:(.text+0x2fa): undefined reference to `__kmpc_end_serialized_parallel'
_iamax.c:(.text+0x4b8): undefined reference to `omp_get_thread_num'
_iamax.c:(.text+0x4bf): undefined reference to `omp_get_num_threads'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__copy_drv.o): In function `mkl_blas_dcopy':
_copy.c:(.text+0x222): undefined reference to `__kmpc_global_thread_num'
_copy.c:(.text+0x237): undefined reference to `__kmpc_ok_to_fork'
_copy.c:(.text+0x258): undefined reference to `__kmpc_push_num_threads'
_copy.c:(.text+0x28e): undefined reference to `__kmpc_fork_call'
_copy.c:(.text+0x2a9): undefined reference to `__kmpc_serialized_parallel'
_copy.c:(.text+0x2e2): undefined reference to `__kmpc_end_serialized_parallel'
_copy.c:(.text+0x3a7): undefined reference to `omp_get_thread_num'
_copy.c:(.text+0x3af): undefined reference to `omp_get_num_threads'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__axpy_drv.o): In function `mkl_blas_daxpy':
_axpy.c:(.text+0x24e): undefined reference to `__kmpc_global_thread_num'
_axpy.c:(.text+0x263): undefined reference to `__kmpc_ok_to_fork'
_axpy.c:(.text+0x284): undefined reference to `__kmpc_push_num_threads'
_axpy.c:(.text+0x2ba): undefined reference to `__kmpc_fork_call'
_axpy.c:(.text+0x2d5): undefined reference to `__kmpc_serialized_parallel'
_axpy.c:(.text+0x30e): undefined reference to `__kmpc_end_serialized_parallel'
_axpy.c:(.text+0x3d3): undefined reference to `omp_get_thread_num'
_axpy.c:(.text+0x3db): undefined reference to `omp_get_num_threads'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(mkl_threading_patched.o): In function `mkl_serv_domain_get_max_threads':
mkl_threading.c:(.text+0x229): undefined reference to `omp_get_num_procs'
mkl_threading.c:(.text+0x297): undefined reference to `omp_in_parallel'
mkl_threading.c:(.text+0x340): undefined reference to `omp_get_max_threads'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(mkl_threading_patched.o): In function `mkl_serv_get_dynamic':
mkl_threading.c:(.text+0x115a): undefined reference to `omp_get_num_procs'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(mkl_threading_patched.o): In function `mkl_serv_omp_in_parallel':
mkl_threading.c:(.text+0x2681): undefined reference to `omp_in_parallel'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(mkl_threading_patched.o): In function `mkl_serv_get_num_stripes':
mkl_threading.c:(.text+0x270a): undefined reference to `omp_get_num_procs'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(mkl_threading_patched.o): In function `mkl_serv_get_ncpus':
mkl_threading.c:(.text+0x33ea): undefined reference to `omp_get_num_procs'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(mkl_threading_patched.o): In function `mkl_serv_get_ncorespercpu':
mkl_threading.c:(.text+0x40aa): undefined reference to `omp_get_num_procs'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(mkl_threading_patched.o): In function `mkl_serv_get_ht':
mkl_threading.c:(.text+0x4d6a): undefined reference to `omp_get_num_procs'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(mkl_threading_patched.o): In function `mkl_serv_get_nlogicalcores':
mkl_threading.c:(.text+0x5a1a): undefined reference to `omp_get_num_procs'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(mkl_print_verbose_patched.o): In function `mkl_serv_print_verbose_info':
mkl_print_verbose.c:(.text+0x1c3): undefined reference to `omp_get_thread_num'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d_ds_ger_omp.o): In function `mkl_blas_dger_omp':
ds_ger_omp.c:(.text+0x58): undefined reference to `__kmpc_global_thread_num'
ds_ger_omp.c:(.text+0x6a): undefined reference to `__kmpc_ok_to_fork'
ds_ger_omp.c:(.text+0x89): undefined reference to `__kmpc_push_num_threads'
ds_ger_omp.c:(.text+0x10c): undefined reference to `__kmpc_fork_call'
ds_ger_omp.c:(.text+0x124): undefined reference to `__kmpc_serialized_parallel'
ds_ger_omp.c:(.text+0x18c): undefined reference to `__kmpc_end_serialized_parallel'
ds_ger_omp.c:(.text+0x2d0): undefined reference to `__kmpc_for_static_init_8'
ds_ger_omp.c:(.text+0x3fd): undefined reference to `__kmpc_for_static_fini'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemv_omp.o): In function `mkl_blas_dgemv_omp':
_gemv_omp.c:(.text+0x11d): undefined reference to `__kmpc_global_thread_num'
_gemv_omp.c:(.text+0x132): undefined reference to `__kmpc_ok_to_fork'
_gemv_omp.c:(.text+0x153): undefined reference to `__kmpc_push_num_threads'
_gemv_omp.c:(.text+0x254): undefined reference to `__kmpc_fork_call'
_gemv_omp.c:(.text+0x275): undefined reference to `__kmpc_serialized_parallel'
_gemv_omp.c:(.text+0x32a): undefined reference to `__kmpc_end_serialized_parallel'
_gemv_omp.c:(.text+0x480): undefined reference to `omp_get_thread_num'
_gemv_omp.c:(.text+0x593): undefined reference to `omp_get_num_threads'
_gemv_omp.c:(.text+0x79e): undefined reference to `__kmpc_master'
_gemv_omp.c:(.text+0x7dc): undefined reference to `__kmpc_end_master'
_gemv_omp.c:(.text+0x7ee): undefined reference to `__kmpc_barrier'
_gemv_omp.c:(.text+0x8dc): undefined reference to `__kmpc_barrier'
_gemv_omp.c:(.text+0xb06): undefined reference to `__kmpc_barrier'
_gemv_omp.c:(.text+0xb1b): undefined reference to `__kmpc_master'
_gemv_omp.c:(.text+0xb3f): undefined reference to `__kmpc_end_master'
_gemv_omp.c:(.text+0xc66): undefined reference to `__kmpc_master'
_gemv_omp.c:(.text+0xccc): undefined reference to `__kmpc_end_master'
_gemv_omp.c:(.text+0xcde): undefined reference to `__kmpc_barrier'
_gemv_omp.c:(.text+0xdba): undefined reference to `__kmpc_barrier'
_gemv_omp.c:(.text+0xdcf): undefined reference to `__kmpc_master'
_gemv_omp.c:(.text+0xf7b): undefined reference to `__kmpc_end_master'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemmger_omp.o): In function `mkl_blas_dgemmger_omp':
_gemmger_omp.c:(.text+0x58): undefined reference to `__kmpc_global_thread_num'
_gemmger_omp.c:(.text+0x6a): undefined reference to `__kmpc_ok_to_fork'
_gemmger_omp.c:(.text+0x89): undefined reference to `__kmpc_push_num_threads'
_gemmger_omp.c:(.text+0xf7): undefined reference to `__kmpc_fork_call'
_gemmger_omp.c:(.text+0x112): undefined reference to `__kmpc_serialized_parallel'
_gemmger_omp.c:(.text+0x1b7): undefined reference to `__kmpc_end_serialized_parallel'
_gemmger_omp.c:(.text+0x335): undefined reference to `__kmpc_for_static_init_8'
_gemmger_omp.c:(.text+0x46d): undefined reference to `__kmpc_for_static_fini'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__trsm_omp_v1_drv.o): In function `mkl_blas_dtrsm_omp_driver_v1':
_trsm_omp_v1.c:(.text+0x108): undefined reference to `__kmpc_global_thread_num'
_trsm_omp_v1.c:(.text+0x233): undefined reference to `__kmpc_ok_to_fork'
_trsm_omp_v1.c:(.text+0x252): undefined reference to `__kmpc_push_num_threads'
_trsm_omp_v1.c:(.text+0x30a): undefined reference to `__kmpc_fork_call'
_trsm_omp_v1.c:(.text+0x328): undefined reference to `__kmpc_serialized_parallel'
_trsm_omp_v1.c:(.text+0x3b1): undefined reference to `__kmpc_end_serialized_parallel'
_trsm_omp_v1.c:(.text+0x56f): undefined reference to `__kmpc_ok_to_fork'
_trsm_omp_v1.c:(.text+0x58e): undefined reference to `__kmpc_push_num_threads'
_trsm_omp_v1.c:(.text+0x64a): undefined reference to `__kmpc_fork_call'
_trsm_omp_v1.c:(.text+0x668): undefined reference to `__kmpc_serialized_parallel'
_trsm_omp_v1.c:(.text+0x6f0): undefined reference to `__kmpc_end_serialized_parallel'
_trsm_omp_v1.c:(.text+0x89d): undefined reference to `omp_get_thread_num'
_trsm_omp_v1.c:(.text+0x95c): undefined reference to `omp_get_num_threads'
_trsm_omp_v1.c:(.text+0xb50): undefined reference to `omp_get_thread_num'
_trsm_omp_v1.c:(.text+0xc17): undefined reference to `omp_get_num_threads'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_omp_v1_drv.o): In function `mkl_blas_dgemm_omp_driver_v1':
_gemm_omp_v1.c:(.text+0x577): undefined reference to `__kmpc_global_thread_num'
_gemm_omp_v1.c:(.text+0x6ac): undefined reference to `__kmpc_ok_to_fork'
_gemm_omp_v1.c:(.text+0x6d1): undefined reference to `__kmpc_push_num_threads'
_gemm_omp_v1.c:(.text+0x7c3): undefined reference to `__kmpc_fork_call'
_gemm_omp_v1.c:(.text+0x7e4): undefined reference to `__kmpc_serialized_parallel'
_gemm_omp_v1.c:(.text+0x8a2): undefined reference to `__kmpc_end_serialized_parallel'
_gemm_omp_v1.c:(.text+0xa6d): undefined reference to `omp_get_num_threads'
_gemm_omp_v1.c:(.text+0xbcd): undefined reference to `omp_get_thread_num'
_gemm_omp_v1.c:(.text+0xc19): undefined reference to `omp_get_num_threads'
_gemm_omp_v1.c:(.text+0xdf2): undefined reference to `omp_get_thread_num'
_gemm_omp_v1.c:(.text+0xe41): undefined reference to `omp_get_num_threads'
_gemm_omp_v1.c:(.text+0x1527): undefined reference to `__kmpc_ok_to_fork'
_gemm_omp_v1.c:(.text+0x154c): undefined reference to `__kmpc_push_num_threads'
_gemm_omp_v1.c:(.text+0x15e1): undefined reference to `__kmpc_fork_call'
_gemm_omp_v1.c:(.text+0x15ff): undefined reference to `__kmpc_serialized_parallel'
_gemm_omp_v1.c:(.text+0x166e): undefined reference to `__kmpc_end_serialized_parallel'
_gemm_omp_v1.c:(.text+0x16d7): undefined reference to `__kmpc_ok_to_fork'
_gemm_omp_v1.c:(.text+0x16fc): undefined reference to `__kmpc_push_num_threads'
_gemm_omp_v1.c:(.text+0x1783): undefined reference to `__kmpc_fork_call'
_gemm_omp_v1.c:(.text+0x17a1): undefined reference to `__kmpc_serialized_parallel'
_gemm_omp_v1.c:(.text+0x1811): undefined reference to `__kmpc_end_serialized_parallel'
_gemm_omp_v1.c:(.text+0x1955): undefined reference to `__kmpc_single'
_gemm_omp_v1.c:(.text+0x1a27): undefined reference to `__kmpc_end_single'
_gemm_omp_v1.c:(.text+0x1a3c): undefined reference to `__kmpc_barrier'
_gemm_omp_v1.c:(.text+0x1b00): undefined reference to `__kmpc_single'
_gemm_omp_v1.c:(.text+0x1b3c): undefined reference to `__kmpc_end_single'
_gemm_omp_v1.c:(.text+0x1b51): undefined reference to `__kmpc_barrier'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_omp_v1_drv.o): In function `mkl_blas_dgemm_1D_row':
_gemm_omp_v1.c:(.text+0x2a87): undefined reference to `omp_get_thread_num'
_gemm_omp_v1.c:(.text+0x2ba1): undefined reference to `__kmpc_global_thread_num'
_gemm_omp_v1.c:(.text+0x2bb1): undefined reference to `__kmpc_barrier'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_omp_v1_drv.o): In function `mkl_blas_dgemm_1D_col':
_gemm_omp_v1.c:(.text+0x2d20): undefined reference to `omp_get_thread_num'
_gemm_omp_v1.c:(.text+0x2e44): undefined reference to `__kmpc_global_thread_num'
_gemm_omp_v1.c:(.text+0x2e54): undefined reference to `__kmpc_barrier'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_omp_v1_drv.o): In function `mkl_blas_dgemm_2D_bsrc':
_gemm_omp_v1.c:(.text+0x2f88): undefined reference to `omp_get_thread_num'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_omp_v1_drv.o): In function `mkl_blas_dgemm_2D_improved_bsrc':
_gemm_omp_v1.c:(.text+0x33ab): undefined reference to `omp_get_thread_num'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_omp_v1_drv.o): In function `mkl_blas_dgemm_2D_bcopy':
_gemm_omp_v1.c:(.text+0x36ba): undefined reference to `__kmpc_global_thread_num'
_gemm_omp_v1.c:(.text+0x37c3): undefined reference to `omp_get_thread_num'
_gemm_omp_v1.c:(.text+0x3a8a): undefined reference to `__kmpc_barrier'
_gemm_omp_v1.c:(.text+0x3c21): undefined reference to `__kmpc_barrier'
_gemm_omp_v1.c:(.text+0x3c36): undefined reference to `__kmpc_barrier'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_omp_v1_drv.o): In function `mkl_blas_dgemm_1D_with_copy_0':
_gemm_omp_v1.c:(.text+0x3cf2): undefined reference to `__kmpc_global_thread_num'
_gemm_omp_v1.c:(.text+0x3dd5): undefined reference to `omp_get_thread_num'
_gemm_omp_v1.c:(.text+0x3ffb): undefined reference to `__kmpc_barrier'
_gemm_omp_v1.c:(.text+0x4199): undefined reference to `__kmpc_barrier'
_gemm_omp_v1.c:(.text+0x41ae): undefined reference to `__kmpc_barrier'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_omp_v1_drv.o): In function `mkl_blas_dgemm_2D_abcopy_abx_m_km_par_p':
_gemm_omp_v1.c:(.text+0x43aa): undefined reference to `omp_get_thread_num'
_gemm_omp_v1.c:(.text+0x43c6): undefined reference to `__kmpc_global_thread_num'
_gemm_omp_v1.c:(.text+0x43d6): undefined reference to `__kmpc_barrier'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_omp_v1_drv.o): In function `mkl_blas_dgemm_2D_xgemm_p':
_gemm_omp_v1.c:(.text+0x51b3): undefined reference to `omp_get_thread_num'
_gemm_omp_v1.c:(.text+0x51d4): undefined reference to `__kmpc_global_thread_num'
_gemm_omp_v1.c:(.text+0x51e4): undefined reference to `__kmpc_barrier'
/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_intel_thread.a(d__gemm_omp_v1_drv.o): In function `mkl_blas_dgemm_2D_acopy_n':
_gemm_omp_v1.c:(.text+0x5652): undefined reference to `__kmpc_global_thread_num'
_gemm_omp_v1.c:(.text+0x56f5): undefined reference to `omp_get_thread_num'
_gemm_omp_v1.c:(.text+0x5a8d): undefined reference to `__kmpc_barrier'
_gemm_omp_v1.c:(.text+0x5cc1): undefined reference to `__kmpc_barrier'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:76: dexe.grd] Error 1
make[2]: Leaving directory '/root/hpl/testing/ptest/intel64'
make[1]: *** [Make.top:68: build_tst] Error 2
make[1]: Leaving directory '/root/hpl'
make: *** [Makefile:73: build] Error 2
 

---------------------------------------------------------

What should I do to fix the problem?

Thanks!

[icpc Warning]: Specifying -lm before files may supersede the Intel(R) math library and affect performance


I just noticed this warning while compiling my C++ code. My Intel version is

compilers_and_libraries_2018.3.222

My compile options come from the MKL Link Line Advisor.

The compiler then emits this warning:

Specifying -lm before files may supersede the Intel(R) math library and affect performance

Although it is just a warning, I am not sure whether it will harm performance. Has anyone run into a similar problem?

Sum along specific matrix axis


I am working on a project where I want to accelerate numpy element-wise multiplication and sum.

What I am doing is transferring the numpy array to a C pointer and using MKL functions to accelerate these operations (through Cython).

For element-wise multiplication I found the vdMul function. However, when I looked for a sum, I could not find a function in MKL that sums a matrix along a specific axis and returns a smaller matrix.

Example:

input: matrix A, shape is [100,200,300]

B = sum(A, axis = 0)

B shape is [200,300]

Could anyone give some advice? Thank you very much!
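For what it's worth, a sum along one axis can be expressed as a matrix-vector product, which is a single BLAS dgemv call: view the [100,200,300] row-major array as a 100 x 60000 matrix M and compute B = Mᵀ·1. A minimal numpy sketch of the mapping follows (shapes shrunk for illustration; the same layout carries over to a `cblas_dgemv` call with `CblasTrans` and `lda = d1*d2` in C/Cython):

```python
import numpy as np

# Toy shape standing in for [100, 200, 300].
d0, d1, d2 = 4, 2, 3
A = np.arange(d0 * d1 * d2, dtype=np.float64).reshape(d0, d1, d2)

# View A as a (d0, d1*d2) row-major matrix M; summing along axis 0 is
# then the matrix-vector product M^T @ ones(d0).  In C/Cython this is
# one cblas_dgemv call with CblasTrans, m=d0, n=d1*d2, lda=d1*d2.
M = A.reshape(d0, d1 * d2)
ones = np.ones(d0)
B = (M.T @ ones).reshape(d1, d2)

assert np.array_equal(B, A.sum(axis=0))
print(B.shape)  # (2, 3)
```

The same trick with a ones vector on the other side (M @ ones) gives sums along the last axes instead.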

Strange bus error in Pardiso


Hi, all !

I just met a strange bus error while using Pardiso. The following is the detailed information.

First, the purpose of my code: as I study the finite element method (FEM), I would like to use Pardiso to solve my equations. Since the right-hand side changes at each time step but the matrix stays the same, I want to factorize the matrix once and reuse it many times. I wrote three subroutines, each of which performs a different stage of Pardiso (1. reordering & factorization; 2. back substitution; 3. release of memory), and linked them through the pointer array pt.

Second, the problem: I put the three subroutines (solver_pardiso_factor/bsubst/termin) in a module (Solver.f90) and call them from the main program (solvercheck.f90). Everything works fine except the last step, the memory release. The second call of solver_pardiso_termin, for matrix 2, triggers the bus error.

Third, what's strange: if I put solver_pardiso_termin at the beginning of the module, followed by solver_pardiso_factor and solver_pardiso_bsubst, the code runs fine.

I'm confused by this problem. If anyone can help me, I would be very grateful.

My platform is CentOS 6.9, the ifort version is 13.1.1, and the MKL version is 11.0.5. My code is attached.

Thanks!

 

 

Attachments:
- Solver.f90 (7.53 KB)
- solvercheck.f90 (2.86 KB)

PARDISO - huge memory requirement, slow calculation


I implemented the PARDISO solver to solve a linear system of equations with a sparse matrix. I tested it on a model that yields 1 million equations, each row typically containing 7 nonzero elements.

All PARDISO settings in iparm are at their defaults, the matrix type is 11 (real and nonsymmetric), one core is used for solving the system, and the phase is 13. With these settings PARDISO allocates 15 GB of memory during phase 13, and it takes about 9 minutes to solve the system.

My colleague tested the very same model in OpenFOAM, and its matrix solver (I do not know exactly which one they use) needed 1 GB of memory and only took 1 minute to solve the system. His computer has the same CPU, and 1 core was used for the solution.

My questions are:

  1. Would the solution be significantly faster and significantly more memory-efficient if the matrix were symmetric?
  2. Is there any way in MKL to find out whether the matrix is symmetric and positive definite or indefinite?
  3. Does it make sense to try sparse solvers other than PARDISO?

Thank you in advance for any suggestions and tips.
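Regarding question 2: as far as I know, MKL has no routine that tests symmetry for you, but the check is cheap to do yourself before choosing mtype — compare the matrix with its transpose (and, for positive definiteness, one common approach is to attempt a symmetric factorization, e.g. PARDISO with mtype=2, which reports a zero-pivot error on a non-positive-definite matrix). A small scipy sketch of the symmetry test (scipy is used here only for illustration; the same comparison is easy to write over CSR arrays in C or Fortran):

```python
import numpy as np
from scipy import sparse

def is_symmetric(A, tol=1e-12):
    """Return True if sparse matrix A equals its transpose within tol."""
    if A.shape[0] != A.shape[1]:
        return False
    diff = A - A.T          # sparse subtraction; cheap at ~7 nnz per row
    return bool(abs(diff).max() <= tol) if diff.nnz else True

# Tiny example: a symmetric and a nonsymmetric 3x3 CSR matrix.
S = sparse.csr_matrix(np.array([[4.0, 1.0, 0.0],
                                [1.0, 3.0, 2.0],
                                [0.0, 2.0, 5.0]]))
N = sparse.csr_matrix(np.array([[4.0, 1.0, 0.0],
                                [0.0, 3.0, 2.0],
                                [0.0, 2.0, 5.0]]))
print(is_symmetric(S), is_symmetric(N))  # True False
```

If the matrix does turn out to be symmetric, storing only the upper triangle with mtype=-2 (symmetric indefinite) or mtype=2 (symmetric positive definite) roughly halves the input and typically reduces PARDISO's factorization memory and time noticeably, though how much depends on the fill-in.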

Viewing all 3005 articles
Browse latest View live