Shared Library for Intel fftw wrapper

July 20, 2017, 7:02 am

Latest and popular articles on Intel Technologies

Dear Engineer,

I am a newbie at compiling open-source software. I found that Intel Parallel Studio XE provides FFTW wrappers with both C and Fortran interfaces, and I successfully compiled the static library (.a) on CentOS 7 Linux. How do I compile a shared library (.so) instead?

Thanks a lot and best wishes!

Weasley

↧

Statically linking to MKL 2017.3 on GCC 5.4.0

July 20, 2017, 11:25 pm

Hi there, I am trying to statically link my library to MKL 2017.3 (using Intel OpenMP for threading), under GCC 5.4.0, on Ubuntu. So according to the link line advisor, I need to link with libmkl_core, libmkl_intel_thread, libmkl_intel_lp64, and libiomp5. I'm finding that when I try to link with libiomp5.a, (and all the others *.a) I get the following linker error:

/usr/bin/ld: libXXX.so: version node not found for symbol omp_set_nest_lock@OMP_3.0
/usr/bin/ld: failed to set dynamic section sizes: Bad value
collect2: error: ld returned 1 exit status

[I replaced my library name with XXX]

When I change the configuration to link with libiomp5.so (and the rest still *.a) my program builds successfully (I have not yet tried to run it, for unrelated reasons). Can anyone explain to me what the above error means? I assumed it was due to the symbol omp_set_nest_lock@OMP_3.0 not being defined, but I checked libiomp5.a with the 'nm' command-line tool and the symbol does appear to be defined there. I'm reasonably confident that the linker is finding all the libraries.

In general, is it valid to link statically with libiomp5? I would have assumed so, since libiomp5.a is provided. I have also successfully linked my library to MKL statically on Windows and Mac, so I assumed it would be possible on Linux also.

Any help would be much appreciated. Let me know if you need more information. I've done very little development for non-Windows platforms, so hopefully I haven't omitted anything important.

↧

Trying to skip pardiso reordering.

July 24, 2017, 3:31 am

I'm trying to skip the PARDISO reordering step, as I have already reordered the matrix manually.

As far as I can understand from the table description I should be able to skip the reordering by setting

iparm(5)=1

and

perm(i)=i for i=1,n

As is also suggested in the following topic:

https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/...
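For concreteness, the two settings described above can be written out as a small sketch. It is plain Python that only builds the arrays and does not call PARDISO; the helper name `user_permutation_settings` is hypothetical. Note the index shift: iparm(5) in the 1-based numbering used here is `iparm[4]` in a 0-based language. The sketch also sets iparm(1) to a nonzero value, on the assumption (from my reading of the MKL documentation) that iparm(1)=0 makes the solver use default values for the other entries.

```python
def user_permutation_settings(n):
    """Build the iparm and perm arrays for an n-by-n system (no solver call)."""
    iparm = [0] * 64  # the PARDISO control array has 64 entries

    # Assumption: iparm(1) must be nonzero, otherwise defaults replace the
    # values set below.
    iparm[0] = 1

    # iparm(5)=1 in 1-based numbering: use the user-supplied permutation.
    iparm[4] = 1

    # perm(i) = i for i = 1..n, i.e. the 1-based identity permutation.
    perm = list(range(1, n + 1))
    return iparm, perm


iparm, perm = user_permutation_settings(5)
print(iparm[:6], perm)
```

Printing the arrays just before the solver call is a cheap way to confirm that the values reaching PARDISO are the ones intended.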

However, when I do this and perform a factorization, it still spends the majority of its time on reordering!

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
 0 1 2 3 4 5 6 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24 25 26 27 28 29 30
 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 58 59
 62 64 65 68 70 73 74 75 76 79 82 83 84 85 87 92 95 98 100


=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Single-level factorization algorithm is turned ON


Summary: ( starting phase is reordering, ending phase is factorization )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.002223 s
Time spent in reordering of the initial matrix (reorder)         : 0.750173 s
Time spent in symbolic factorization (symbfct)                   : 0.023511 s
Time spent in data preparations for factorization (parlist)      : 0.002437 s
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 0.076703 s
Time spent in allocation of internal data structures (malloc)    : 0.009243 s
Time spent in additional calculations                            : 0.023699 s
Total time spent                                                 : 0.887989 s

Statistics:
===========
Parallel Direct Factorization is running on 6 OpenMP

< Linear system Ax = b >
             number of equations:           42332
             number of non-zeros in A:      294080
             number of non-zeros in A (%): 0.016411

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 64
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    30842
             size of largest supernode:               361
             number of non-zeros in L:                3099767
             number of non-zeros in U:                1
             number of non-zeros in L+U:              3099768
             gflop   for the numerical factorization: 0.636571

             gflop/s for the numerical factorization: 8.299116

Does anyone know what is going on?

↧

LAPACK(zsptrf), using Complex<float> has no speed improvement compared with Complex<double>

July 21, 2017, 4:42 pm

When using the LAPACK function zsptrf to factorize a matrix, I expected the Complex<float> data type to be faster than Complex<double>, but I did not see any improvement. I am using version mkl-10.3.9. Would a newer version show this benefit? The newest version is mkl-11.2.2.

Thanks.

↧

Intel(r) MKL-DNN - when should I destroy primitive_desc?

July 23, 2017, 9:55 am

Hi,

Thank you for the open-source MKL-DNN product.

The documentation says little about cleaning up objects created with mkldnn_*_create functions, particularly mkldnn_primitive_desc_t objects. The simple_net.c example makes no calls to mkldnn_primitive_desc_destroy function, and valgrind indeed shows leaked pd objects.

Thanks,
Dima

↧

omp_set_num_threads vs mkl_set_num_threads

July 24, 2017, 10:17 am

According to the MKL documentation, omp_set_num_threads is enough to set the number of threads used by MKL. But that's not the case for me: I have to use mkl_set_num_threads.

↧

Pardiso pointer (pt) type

July 28, 2017, 6:48 am

Hi,

I am using the PARDISO solver in C++ under Visual Studio 2012 (x64). What is the proper type for pt: int or long int? I am currently declaring it as void *. Also, I am linking with the lp64 interface layer. Should I be using ilp64?

When I declared it as long int, my run would crash in Release mode but not in Debug mode. However, it seems to run fine if I use void *.

Thanks

Dinesh
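A quick way to see why the element type of pt matters is to compare how much storage a 64-entry array of each candidate type occupies. The sketch below uses Python's `ctypes` only to query the platform's C type sizes; it does not touch MKL, and the helper name `handle_sizes` is hypothetical. It assumes pt is an array of 64 pointer-sized entries, which is how the MKL C examples I have seen declare it (`void *pt[64]`). On 64-bit Windows a long int is 4 bytes while a pointer is 8, so a `long int pt[64]` would be half the size the solver expects, which would be consistent with the crash described above.

```python
import ctypes

PT_ENTRIES = 64  # assumption: the PARDISO handle is an array of 64 entries


def handle_sizes():
    """Return the size in bytes of a 64-entry array for each candidate type."""
    candidates = {
        "void *": ctypes.c_void_p,
        "long int": ctypes.c_long,
        "int": ctypes.c_int,
    }
    return {name: PT_ENTRIES * ctypes.sizeof(ctype)
            for name, ctype in candidates.items()}


for name, nbytes in handle_sizes().items():
    print(f"{name:8s} pt[{PT_ENTRIES}] occupies {nbytes} bytes")
```

The numbers printed depend on the platform and build the script runs on, so it needs to be run with a 64-bit Python on the target machine to be meaningful.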

↧

SVD hangs periodically

July 28, 2017, 7:10 am

We are noticing that SVD, both dgesvd and dgesdd, will hang periodically. The call stack terminates with a call to either one of those functions. Killing the process and rerunning alleviates the problem.

We suspect there's some form of locking going on in SVD somewhere. Something we've tried is the following:

start a new thread to compute SVD
wait until finish or timeout
if timeout, try again
if timeout, throw

What was interesting about this is that the first thread will block (just like before we put the threading in), but the second thread wouldn't even start. Now, there are a limited number of resources and actions that could stop a new thread. One of them is that every thread must call into DllMain of every assembly in the process. If one of the assemblies is doing something non-standard (holding a lock), then you are dead.

We suspect there's a problem somewhere with Intel, and possibly, another library in our stack. We generally do not mess with DllMain and thread registration.
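The retry scheme described above (worker thread, wait with timeout, one retry, then throw) can be sketched as follows. This is a generic Python illustration, not our production code; `run_with_timeout` and its parameters are hypothetical names, and `func` stands in for the SVD call. Python cannot forcibly stop a thread, so a worker that hangs stays blocked in the background, which mirrors the behaviour of the first thread in the description.

```python
import threading


def run_with_timeout(func, timeout_s, attempts=2):
    """Run func() in a worker thread; retry on timeout, then raise TimeoutError."""

    def worker(out):
        # Each attempt writes into its own dict, so an abandoned worker that
        # finishes late cannot overwrite the result of a later attempt.
        try:
            out["value"] = func()
        except Exception as exc:
            out["error"] = exc

    for _ in range(attempts):
        out = {}
        thread = threading.Thread(target=worker, args=(out,), daemon=True)
        thread.start()
        thread.join(timeout_s)  # wait until finished or timed out
        if thread.is_alive():
            continue  # timed out: abandon this worker and try again
        if "error" in out:
            raise out["error"]
        return out["value"]
    raise TimeoutError(f"no result after {attempts} attempts of {timeout_s} s each")


print(run_with_timeout(lambda: sum(range(10)), timeout_s=1.0))
```

If the hang is caused by a lock held during thread start-up, as suspected, a scheme like this cannot recover, because the retry thread would block on the same lock.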

↧

Licence problem

August 1, 2017, 3:45 am

I have the following error when I try to use icc:

Error: unknown string
License file(s) used were (in this order):
    1.  Trusted Storage
**  2.  /opt/intel/licenses/NCOM____XXXX-XXXXXXXX_1.lic
**  3.  /opt/intel/compilers_and_libraries_2017.4.196/linux/bin/intel64/../../Licenses
**  4.  /home/user/Licenses
**  5.  /Users/Shared/Library/Application Support/Intel/Licenses
**  6.  /opt/intel/compilers_and_libraries_2017.4.196/linux/bin/intel64/*.lic

Please visit https://software.intel.com/en-us/faq/licensing#invalid-license-error if you require technical assistance.

icc: error #10052: could not checkout FLEXlm license

The licence file is at this location

/opt/intel/licenses/

My machine does not have an Ethernet port and I used the MAC of my wifi card for the licence. I am not sure if this is related to my issue.

↧

Cross Power Spectral Density

August 1, 2017, 10:16 am

Has anyone seen code for the Cross Power Spectral Density function? It exists in MATLAB, but I would rather not use MATLAB for obvious reasons.

John

↧

real to split complex FT

August 1, 2017, 3:40 pm

Hello there,

I need to do real to complex FT and really want the complex output to be stored in two different arrays (e.g. rdata and idata for the real and imag, respectively). I am able to configure a complex forward FT to use two separate arrays for both the input and output data by setting DFTI_COMPLEX_STORAGE = DFTI_REAL_REAL, but I don't see how I can do the same thing for the complex output in a DFTI_REAL forward FT. I was hoping that DFTI_CONJUGATE_EVEN_STORAGE provides a way for me to do what I need, but this does not seem to be the case. I am really puzzled why MKL does not allow users to store the real to complex FT output in a split array format? Any suggestions on how I may do what fftwf_plan_guru_split_dft_r2c() in FFTW does are much appreciated!

Thanks!

Jinfa
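To make the requested output layout concrete, here is a minimal sketch of a real-to-complex forward transform that writes the real and imaginary parts into two separate arrays, named rdata and idata as in the post. It is a naive O(N^2) DFT in pure Python, meant only to show the split layout for bins 0..N/2; it does not use MKL or FFTW, and the function name `r2c_split` is hypothetical.

```python
import cmath


def r2c_split(x):
    """Forward DFT of a real sequence, returned as split arrays (rdata, idata).

    Only the non-redundant bins k = 0 .. N//2 are computed, since the rest
    follow from conjugate symmetry for real input.
    """
    n = len(x)
    rdata, idata = [], []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        rdata.append(acc.real)
        idata.append(acc.imag)
    return rdata, idata


rdata, idata = r2c_split([1.0, 0.0, 0.0, 0.0])
print(rdata, idata)
```

A small reference like this can also be used to check whichever workaround is chosen, for example copying an interleaved result into two arrays after the transform.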

↧

Adding write statements to a subroutine breaks the MKL lib

July 18, 2017, 10:35 am

Hi all,

Classic strange behaviour with write statements in a subroutine. Adding them breaks the MKL library.

Without the writes, the dolfyn CFD code (written in f2003) runs:

 Flag   Step    Res U     Res V     Res W     Mass      Res k     Res eps   Res T          U         V         W         P         k        eps        T
     :     1:  1.36E-04  7.41E-05  0.00E+00  5.71E-02  0.00E+00  0.00E+00  0.00E+00     4.58E+00 -1.65E+00  0.00E+00 -1.39E+01  0.00E+00  0.00E+00  0.00E+00
     :     2:  9.50E-04  1.02E-03  0.00E+00  3.35E-02  0.00E+00  0.00E+00  0.00E+00     4.73E+00 -1.57E+00  0.00E+00 -1.47E+01  0.00E+00  0.00E+00  0.00E+00

Adding them:

 *** Error: Failed preconditioner of mkl solver         -103
 *** Error: Failed in mkl solver dfgmres, i =         302 RCI          -1
 *** Error: Failed in mkl solver dfgmres, i =         302 RCI          -1
 *** Error: Failed in mkl solver dfgmres, i =         302 RCI          -1
 *** Error: Failed in mkl solver dfgmres, i =         302 RCI          -1
 Flag   Step    Res U     Res V     Res W     Mass      Res k     Res eps   Res T          U         V         W         P         k        eps        T
     :     1:  1.36E-04  7.41E-05  0.00E+00  3.61E-01  0.00E+00  0.00E+00  0.00E+00     4.68E+00 -2.52E+00  0.00E+00 -2.22E+01  0.00E+00  0.00E+00  0.00E+00
 *** Error: Failed preconditioner of mkl solver         -103
     :     2:  1.24E-03  1.54E-03  0.00E+00  6.94E-02  0.00E+00  0.00E+00  0.00E+00     4.92E+00 -1.41E+00  0.00E+00  1.13E+01  0.00E+00  0.00E+00  0.00E+00
 *** Error: Failed preconditioner of mkl solver         -103

The difference in the two subroutines is:

$ diff solver_mkl_OK.f90 solver_mkl_FOUT.f90
199c199
<        write(IOdef,*)'*** Error: Failed preconditioner of mkl solver'
---
>        write(IOdef,*)'*** Error: Failed preconditioner of mkl solver ',IERR
238a239
>      write(IOdbg,*)'*** start solver dfgmres'
242a244
>        !    dfgmres( n,    x,  b,  RCI_request,    ipar,    dpar,    tmp)
243a246
>        write(IOdbg,*)'*** solver dfgmres returns with ',RCI_request
250a254,256
>          ! multiply the matrix by tmp(ipar(22)), put the result in tmp(ipar(23)),
>          ! and return the control to the dfgmres routine
>
259c265,266
<          write(iodef,*)'*** Error: Failed in mkl solver'
---
>          write(IOdbg,*)'*** Error: Failed in mkl solver dfgmres, i =',i,'RCI',RCI_request
>          write(iodef,*)'*** Error: Failed in mkl solver dfgmres, i =',i,'RCI',RCI_request
270c277
<      write(IOdbg,*)'Leave MKL solve P'
---
>      write(IOdbg,*)'Leave MKL solve P',RCI_request

As you can see, there are no significant differences.

The full code, with two versions of subroutines, is in the tarball along with a small demo data set. Unpack it on a linux box with ifort. Change into test and hit:

$ ./go.sh

To reproduce the bug (or not).

$ ifort --version
ifort (IFORT) 17.0.4 20170411
Copyright (C) 1985-2017 Intel Corporation. All rights reserved.

$ uname -a
Linux tennekes 4.4.0-83-generic #106-Ubuntu SMP Mon Jun 26 17:54:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu 16.04.2 LTS
Intel i7-6700HQ CPU @ 2.60GHz
Lenovo Y700 16GB

Any hints or suggestions welcome, but this might interest the developers more.

Thank you!

Henk

Attachment	Size
Download test.tar.gz	638.6 KB

↧

Pardiso w/ Quasi Newton method?

August 3, 2017, 11:41 am

I have read through the page describing how to use PARDISO to solve a nonlinear set of equations after they have been linearized, but I noticed the algorithm is just Newton's method applied with the PARDISO solving routine.

For faster and more reliable convergence, has anyone created a Quasi Newton solving algorithm using the PARDISO routine during the external iterations?

↧

MKL Poisson Solver Profiling Question

August 4, 2017, 6:00 pm

Hello all,

I am using MKL to solve a 2D Cartesian Poisson Problem with fixed Neumann boundary conditions.

I have two questions:

(i) Based on experiments, I have noticed that the runtime/complexity depends both on (1) the number of intervals and (2) the magnitude of the domain (ax, bx, ay, by). Is this always the case? I expected (1) to be the main parameter affecting performance. I guess this is because the algorithm uses an internal grid. If so, is there a way to make it more granular to increase performance?

(ii) when profiling my code, most of the execution time is spent on one MKL function dgemm():

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 72.06    564.09   564.09                             dgemm_

despite that the number of intervals is small, specifically 196 x 196. Why would that be? Is there a way to speed up my code?

Thanks very much in advance.

↧

solve Linear system

August 6, 2017, 2:27 am

I am struggling with solving dense linear systems. My question is: how can I use LAPACK, or is there an internal package for this in Intel Parallel Studio XE Fortran?

Kind regards

↧

MKL FATAL Error - Error on loading function mkl_vml_serv_threader_c_1i_2o.

August 7, 2017, 8:49 am

Hi, I am trying out the latest Intel Python with TBB. I ran 'python -m TBB test.py', where test.py is a program that uses many pandas DataFrames and random forest regressors, and got the error in the title. Any advice on what I should be looking at? This is on 64-bit Windows 7.

↧

MKL Memory Allocator

August 9, 2017, 6:52 am

Hello,

I am looking for information regarding the Memory Allocator embedded in MKL.

We use MKL_malloc/MKL_free intensively in a project and are planning to add a memory manager on top of them. Our goal is to reuse aligned memory without freeing it, to have per-thread memory pools, and to be able to fine-tune those pools. (Memory consumption is a real challenge for us.)

The mkl_disable_fast_mm page refers to a per thread memory pool but with no more details. Does anyone have more information? (lock-free malloc with per thread heap? Monitoring the available memory in the memory pool, etc.)

We are also considering deactivating the MKL memory manager and relying on the Intel TBB malloc implementation (either by redefining the memory functions with i_malloc, or with a std::vector> style implementation). Does anyone have feedback on such an implementation?

Thank you
Arnaud

↧

MKL with Spark

August 10, 2017, 5:31 am

Hello,

I'm trying to use MKL with Spark using netlib-java. I included the folder containing the dlls in the Path variable and specified the following option to the JVM : -Dcom.github.fommil.netlib.BLAS=mkl_rt.dll

However, it doesn't work and I still get the following warnings :

17/08/10 14:22:09 WARN BLAS: Failed to load implementation from: mkl_rt.dll
17/08/10 14:22:09 WARN BLAS: Using the fallback implementation.

Any help would be greatly appreciated

↧

problem with PARDISO and DSS

August 11, 2017, 2:14 am

Hello my dear friends ...

I wrote a Fortran code that simulates a transient 2D laminar flow using the finite difference method.

I have two problems with my code. One is with PARDISO and the other is with DSS.

For PARDISO:
My code solves three systems of linear equations in sequence in each iteration:

The coefficient matrices of the first and second systems are the same; the systems differ only in the RHS.
The coefficient matrix of the third system is different, but its dimension is the same.
I used two approaches (i.e. two solvers in one code, in each iteration, to compare their results for the same system of equations):

In the first part of each iteration, I build the coefficient matrix of each of the three systems myself (for example A, B, C) and then send the system to DGESV to solve. Fortunately, it gives me the best results.
In the second part of each iteration, I send the coefficient matrices (A, B, C) to DDNSCSR to convert them to CSR form. Then I send them to PARDISO, which gives me almost the same results as DGESV returned.
MY PROBLEM IS: when I build the CSR format myself and send it to PARDISO, the results are not the same!

I should note that the CSR format I built is EXACTLY the same as the CSR format produced by DDNSCSR.
By exactly I mean: the input CSR matrices sent to PARDISO are exactly the same in the two approaches. (I checked this completely with a separate code that I had written before.)
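As an illustration of what building the CSR arrays by hand involves, here is a minimal sketch of a dense-to-CSR conversion with 1-based indices, plus a structural check. It is plain Python, unrelated to the Fortran code in question, and the names `dense_to_csr` and `csr_is_well_formed` are hypothetical. The check encodes properties that, as far as I understand the PARDISO documentation, the solver expects: column indices in increasing order within each row, and an index base that matches the setting in iparm(35). The `upper_only` option reflects the assumption that symmetric matrix types take only the upper triangle.

```python
def dense_to_csr(a, upper_only=False):
    """Convert a dense matrix (list of rows) to 1-based CSR arrays."""
    values, columns, row_ptr = [], [], [1]
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            if upper_only and j < i:
                continue  # skip the strictly lower triangle
            if v != 0:
                values.append(v)
                columns.append(j + 1)  # 1-based column index
        row_ptr.append(len(values) + 1)  # 1-based start of the next row
    return values, columns, row_ptr


def csr_is_well_formed(values, columns, row_ptr, n):
    """Check sizes, index range, and ascending columns within each row."""
    if len(row_ptr) != n + 1 or row_ptr[0] != 1:
        return False
    if row_ptr[-1] != len(values) + 1 or len(values) != len(columns):
        return False
    for i in range(n):
        cols = columns[row_ptr[i] - 1:row_ptr[i + 1] - 1]
        if any(c < 1 or c > n for c in cols):
            return False
        if any(c2 <= c1 for c1, c2 in zip(cols, cols[1:])):
            return False
    return True


a = [[4.0, 0.0, 1.0],
     [0.0, 3.0, 0.0],
     [1.0, 0.0, 2.0]]
values, columns, row_ptr = dense_to_csr(a)
print(values, columns, row_ptr)
print(csr_is_well_formed(values, columns, row_ptr, 3))
```

Running the same check on both the hand-built arrays and the DDNSCSR output is one way to confirm that the two really match in structure and not only in values.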

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

For DSS: I don't know anything about applying the DSS solver. I will send my code (it is the same code, but with PARDISO replaced by DSS). I don't know why it didn't work at all. (I got the solver from the internet.)
Thanks for your help.

↧

Linux setcap and MKL

August 11, 2017, 10:16 am

I have a program using MKL on Linux (CentOS 7.3 1611) that runs fine without any setcap capabilities. I would like to adjust thread priorities, so I added CAP_SYS_NICE using setcap. When I run the program, it starts fine. As soon as it tries to run any MKL functions, it fails with an error saying it failed to load mkl_loader. The program runs fine as root with CAP_SYS_NICE set. I have googled around and have not yet found a solution that works.

↧