Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

A quick question on using PARDISO as an iterative solver


Hello,

I am trying to follow the example described in this thread to use PARDISO as an iterative solver,

http://software.intel.com/en-us/forums/topic/326721

For now my code works fine if I set iparm(4)=0 and phase=13 for all matrices, but when I modify the code to follow the template above, an error message sometimes appears (error=-1 in the numerical factorization phase), so I must have done something wrong. To clarify: is it correct to do the following for A_i x_i = b_i, where the right-hand side b_i is determined by x_i through some other subroutines (it is a self-consistent calculation)?

----------------------------

pt=0
call PARDISO (phase=13, iparm(4)=0, A_1, pt, maxfct, mnum=1, x_1, b_1)

do while ( error > tolerance )

 do i = 1, maxfct
  call PARDISO (phase=23, iparm(4)=61, A_i, pt, maxfct, mnum=i, x_i, b_i)
 end do

 call subroutine to calculate new b_i from the x_i, and calculate the error

end do

 call PARDISO (phase=-1, iparm(4)=61, A_1, pt, maxfct, mnum=1, x_1, b_1)

----------------------------
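For concreteness, below is a minimal, self-contained sketch of the pattern I mean (my own illustration, not taken from the MKL examples): a small real unsymmetric system (mtype=11) instead of my complex structurally symmetric one, with a single matrix (maxfct = mnum = 1), factored once with phase=13 and iparm(4)=0, then re-solved with phase=23 and iparm(4)=61 so the stored factors are reused as a CGS preconditioner, and finally released with phase=-1.

    ! sketch only: tiny real unsymmetric test system (free-form Fortran)
    program pardiso_cgs_sketch
      implicit none
      integer, parameter :: n = 4, nnz = 10
      integer*8 :: pt(64)                       ! PARDISO internal handle, must start at 0
      integer   :: iparm(64), ia(n+1), ja(nnz), idum(1)
      integer   :: maxfct, mnum, mtype, phase, nrhs, msglvl, error, it
      double precision :: a(nnz), b(n), x(n), ddum(1)
      data ia / 1, 3, 6, 9, 11 /
      data ja / 1,2,  1,2,3,  2,3,4,  3,4 /
      data a  / 2d0,-1d0,  -1d0,2d0,-1d0,  -1d0,2d0,-1d0,  -1d0,2d0 /

      pt = 0; iparm = 0
      iparm(1)  = 1        ! supply iparm explicitly
      iparm(2)  = 2        ! nested-dissection (METIS) reordering
      iparm(3)  = 1        ! number of threads (obsolete in newer MKL versions)
      iparm(8)  = 2        ! up to 2 steps of iterative refinement
      iparm(10) = 13       ! pivot perturbation 1.0E-13
      maxfct = 1; mnum = 1; mtype = 11; nrhs = 1; msglvl = 0

      ! first pass: analysis + numerical factorization + solve (direct)
      b = 1d0
      phase = 13; iparm(4) = 0
      call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, idum, &
                   nrhs, iparm, msglvl, b, x, error)

      ! subsequent passes: new right-hand sides, factors reused as a CGS preconditioner
      do it = 1, 3
          b = b + x                        ! stand-in for the self-consistent update of b
          phase = 23; iparm(4) = 61        ! CGS iteration, stopping tolerance 1.0E-6
          call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, idum, &
                       nrhs, iparm, msglvl, b, x, error)
          if (error /= 0) print *, 'error =', error, '  iparm(20) =', iparm(20)
      end do

      ! release all internal memory for this handle
      phase = -1
      call pardiso(pt, maxfct, mnum, mtype, phase, n, ddum, idum, idum, idum, &
                   nrhs, iparm, msglvl, ddum, ddum, error)
    end program

With iparm(4)=61, iparm(20) on output reports the CGS status (a negative value means the iteration failed).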

Just for reference, the other iparm parameters I am using are as follows (these should be the same as in the example file pardiso_unsym_complex_f.f):

  iparm(1) = 1
  iparm(2) = 2
  iparm(3) = 1
  iparm(8) = 2
  iparm(10) = 13
  iparm(11) = 1
  iparm(13) = 1
  iparm(18) = -1
  iparm(19) = -1

All other entries are set to zero. The matrix type is mtype=3 (complex structurally symmetric).

By the way, I cannot find any example code for using PARDISO as an iterative solver in the MKL package; the example codes I found use PARDISO as a direct solver. Did I miss something, or is there no example code for this kind of application? Thank you, I appreciate it.

Best regards,

CC


Problem about using FGMRES


I wrote a code in C using the MKL FGMRES routines to solve a linear system Ax=b. I followed the example code given in the MKL manual here: http://sepwww.stanford.edu/sep/claudio/Research/Prst_ExpRefl/ShtPSPI/int..., which uses ILU0 as the preconditioner. The structure of my code is almost the same as the example, so I do not repeat it here. Now I have run into a problem: for some problems my code finds the solution successfully, but sometimes it does not.

For example, today it happened like this: I put A, b, and an initial guess into the code; after each call to the dfgmres function, the code performs a different operation depending on the value of the RCI_request parameter (for example, if RCI_request = 0 we have the answer; if RCI_request = 1 we should do such-and-such). In this case the code keeps alternating between RCI_request = 1 and RCI_request = 3, the while loop never stops, and I cannot get the answer. My guess is that the matrix A does not have good properties (for example, its condition number is too large) and the solver cannot deal with it, but I am not sure. My questions are: under what conditions can RCI_request keep alternating between 1 and 3? How can I confirm that this is caused by the bad properties of the input matrix A? And if that is true, does it mean FGMRES cannot solve this problem?

Another question is about the preconditioner. The example code uses ILU0. If the solver cannot handle a problem well, would changing to a different preconditioner (like LU) help or not?

Also, which parameter determines the scalability of this FGMRES solver: the number of rows N in the matrix, or the number of non-zeros M in matrix A?
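For reference, here is a minimal sketch of the RCI loop as I understand it. My code is in C, but the shape is the same; the free-form Fortran below uses a tiny hard-wired CSR matrix and, for simplicity, an identity "preconditioner" where the ILU0 triangular solves would normally go (so it illustrates the request codes, not the ILU0 path itself).

    program fgmres_rci_sketch
      implicit none
      include 'mkl_rci.fi'
      integer, parameter :: n = 4, nnz = 7
      double precision :: a(nnz), b(n), x(n)
      integer :: ia(n+1), ja(nnz)
      integer :: ipar(128), rci_request, itercount
      double precision :: dpar(128)
      double precision :: tmp(n*(2*n+1) + (n*(n+9))/2 + 1)
      data a  / 4d0,-1d0,  4d0,-1d0,  4d0,-1d0,  4d0 /
      data ja / 1,2,       2,3,       3,4,       4   /
      data ia / 1, 3, 5, 7, 8 /

      b = 1d0
      x = 0d0
      call dfgmres_init(n, x, b, rci_request, ipar, dpar, tmp)
      ipar(11) = 1                    ! ask for preconditioner applications (RCI_request = 3)
      call dfgmres_check(n, x, b, rci_request, ipar, dpar, tmp)

      do
          call dfgmres(n, x, b, rci_request, ipar, dpar, tmp)
          select case (rci_request)
          case (0)      ! converged: leave the loop and retrieve the solution
              exit
          case (1)      ! compute A * tmp(ipar(22):) and store it in tmp(ipar(23):)
              call mkl_dcsrgemv('N', n, a, ia, ja, tmp(ipar(22)), tmp(ipar(23)))
          case (3)      ! apply the preconditioner to tmp(ipar(22):), result in tmp(ipar(23):)
              call dcopy(n, tmp(ipar(22)), 1, tmp(ipar(23)), 1)   ! identity stand-in
          case default  ! any other request code: treat as failure
              exit
          end select
      end do

      call dfgmres_get(n, x, b, rci_request, ipar, dpar, tmp, itercount)
      print *, 'iterations:', itercount
      print *, 'x =', x
    end program

Requests 1 and 3 alternating is the normal per-iteration pattern (one matrix-vector product and one preconditioner application per step); the loop only runs forever if none of the stopping tests fire, so the settings of ipar(8)-ipar(10) and the tolerances dpar(1)/dpar(2) are worth checking before blaming the matrix.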

Thanks.

C.J.X

question on PARDISO iterative solver


Hello,

I can successfully use PARDISO to solve my problem with iparm(4)=0. However, when I try to use it as an iterative solver by setting iparm(4)=61 and keeping all the other parameters the same, it reports an error. I turned on msglvl=1 to check the details, and it says:

---------------------------------------------------------------------------------------------------------------------------------

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
 37 %  87 %  100 %
*** Error in PARDISO  ( numerical_factorization) error_num= -1
*** Error in PARDISO: cgs error iparam(20) -22

=== PARDISO: solving a complex nonsymetric system ===
The local (internal) PARDISO version is                          : 103911000
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
Minimum degree algorithm at reorder step is turned ON
Single-level factorization algorithm is turned ON

Summary: ( starting phase is reordering, ending phase is solution )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000010 s
Time spent in reordering of the initial matrix (reorder)         : 0.000014 s
Time spent in symbolic factorization (symbfct)                   : 0.000065 s
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 0.000088 s
Time spent in iterative solver at solve step (cgs)               : 0.000259 s cgx iterations -22

Time spent in allocation of internal data structures (malloc)    : 0.002747 s
Time spent in additional calculations                            : 0.000025 s
Total time spent                                                 : 0.003208 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 6
< Hybrid Solver PARDISO with CGS/CG Iteration >

< Linear system Ax = b >
             number of equations:           4
             number of non-zeros in A:      8
             number of non-zeros in A (%): 50.000000

             number of right-hand sides:    4

< Factors L and U >
< Preprocessing with multiple minimum degree, tree height >
< Reduction for efficient parallel factorization >
             number of columns for each panel: 128
             number of independent subgraphs:  0
             number of supernodes:                    2
             size of largest supernode:               2
             number of non-zeros in L:                8
             number of non-zeros in U:                1
             number of non-zeros in L+U:              9
             gflop   for the numerical factorization: 0.000000

             gflop/s for the numerical factorization: 0.000364
---------------------------------------------------------------------------------------------------------------------------------

In this example I use phase=13. I also tried phase=11 with iparm(4)=0 followed by phase=23 with iparm(4)=61; it gives a similar error message.
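For reference, my reading of the iparm(4) documentation is that it is encoded as iparm(4) = 10*L + K, where K selects the iteration (K=1: CGS for nonsymmetric or structurally symmetric matrices, K=2: CG for symmetric positive definite matrices) and 10^(-L) is the stopping tolerance, so iparm(4)=61 requests CGS with a stopping tolerance of 1.0E-6. A negative iparm(20) in the log above means the CGS iteration itself did not converge.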

For the above example, the other non-zero iparm entries I use are:

iparm(1)=1

iparm(3)=1

iparm(10)=13

All others are zero. I also tried the iparm settings from the example file pardiso_unsym_complex_f.f, but a similar error message appears.

How should I modify the code to make the iterative solver work? Thank you.

Best regards,

CC

update MKL 2013.4.190 fails to install documentation in MS Help Viewer


After upgrading to Intel Visual Fortran Composer XE 2013.4 for Windows (released on May 29), including MKL, no MKL documentation has been integrated into Microsoft Help Viewer 2.0, apparently due to non-updated links (see image below).

There are already 9 Sticky Topic threads on MKL forum


A message to MKL forum moderator.

As we can see, there are already 9 Sticky Topic threads on the MKL forum. My question is: why not convert some of these threads into articles?

If you create a couple more Sticky Topic threads, they will cover the main page completely.

Thanks in advance.

 

Upcoming webinar: Accelerating financial services applications using Intel Parallel Studio XE


Accelerating financial services applications using Intel® Parallel Studio XE with the Intel® Xeon Phi™ Coprocessor

Join us for a Webinar on June 4

Space is limited.
Reserve your Webinar seat now at:
https://www1.gotomeeting.com/register/753286969

Improving the performance of software applications is a constant challenge for software developers in the financial services industry. This webinar provides an overview of how to accelerate these computations, especially Monte Carlo and Black-Scholes, using a combination of the new Intel® Xeon Phi™ coprocessor and the Intel® Parallel Studio XE suite of software development tools. Areas explored include performance analysis, threading, vectorization, math library usage, and compiler optimization.

Title:

Accelerating financial services applications using Intel® Parallel Studio XE with the Intel® Xeon Phi™ Coprocessor

Date:

Tuesday, June 4, 2013

Time:

9:00 AM - 10:00 AM PDT

After registering you will receive a confirmation email containing information about joining the Webinar.

System Requirements
PC-based attendees
Required: Windows® 7, Vista, XP or 2003 Server

Mac®-based attendees
Required: Mac OS® X 10.6 or newer

Mobile attendees
Required: iPhone®, iPad®, Android™ phone or Android tablet

Compiling SuiteSparse (UMFPACK) with Intel MKL BLAS/LAPACK (Linux)


Hello,

I am trying to compile the newest version of SuiteSparse linked against the newest Intel MKL. However, I get unresolved-external errors for BLAS functions (all the BLAS functions SuiteSparse requires). How should I modify the makefile SuiteSparse_config/SuiteSparse_config.mk to use the Intel libraries? I have tried (to my knowledge) everything, the problem still occurs, and I am unable to investigate it further. This is where I am stuck:

    BLAS = -L/opt/intel/mkl/lib/intel64 -lmkl_core -lmkl_intel_lp64 -static -lmkl_blas95_lp64
    LAPACK = -L/opt/intel/mkl/lib/intel64 -lmkl_core -lmkl_intel_lp64 -static -lmkl_lapack95_lp64

Still, this brings me no closer to a solution. I suspect the problem might lie in how the static libraries are used. How do I link them correctly?

Without the BLAS library, it compiles just fine.
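For what it is worth, a static, sequential, LP64 link line in the style the MKL link-line advisor produces would look roughly like this (MKLROOT and the threading layer are assumptions on my part; adjust to the actual install):

    MKLROOT = /opt/intel/mkl
    BLAS = -Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a \
           $(MKLROOT)/lib/intel64/libmkl_sequential.a \
           $(MKLROOT)/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm
    LAPACK =

LAPACK can be left empty because the MKL LAPACK symbols live in the same libraries as BLAS, and the -Wl,--start-group/--end-group wrapper matters for static linking because the three MKL libraries reference each other.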

I'll be grateful for any suggestions.

Regards,

Misery

Intel® Math Kernel Library 11.0 update 4 is now available


Intel® Math Kernel Library (Intel® MKL) is a highly optimized, extensively threaded, and thread-safe library of mathematical functions for engineering, scientific, and financial applications that require maximum performance. The Intel MKL 11.0 Update 4 packages are now ready for download. Intel MKL is available as a stand-alone product and as a part of the Intel® Parallel Studio XE 2013, Intel® C++ Studio XE 2013, Intel® Composer XE 2013, Intel® Fortran Composer XE 2013, and Intel® C++ Composer XE 2013. Please visit the Intel® Software Evaluation Center to evaluate this product.

Intel® MKL 11.0 Bug fixes

What's New in Intel® MKL 11.0 Update 4 : Release Notes


getrf Fortran 77 Call Corrupts Integers


I have a helper subroutine that basically calls the MKL getrf/getrs functions like this:

 

  INTERFACE GETRF
          SUBROUTINE SGETRF(M,N,A,NMAX,IR,ISING)
              INTEGER NMAX,M,N,ISING
              INTEGER, DIMENSION(NMAX) :: IR
              REAL(4), DIMENSION(NMAX,NMAX) :: A
          END SUBROUTINE
          SUBROUTINE DGETRF(M,N,A,NMAX,IR,ISING)
              INTEGER NMAX,M,N,ISING
              INTEGER, DIMENSION(NMAX) :: IR
              REAL(8), DIMENSION(NMAX,NMAX) :: A
          END SUBROUTINE
      END INTERFACE GETRF
C
      INTERFACE GETRS
          SUBROUTINE SGETRS(TRANS,M,N,A,NMAX,IR,B,NMAX2,ISING)
              INTEGER NMAX,NMAX2,M,N,ISING
              INTEGER, DIMENSION(NMAX) :: IR
              CHARACTER TRANS
              REAL(4), DIMENSION(NMAX) :: B
              REAL(4), DIMENSION(NMAX,NMAX) :: A
          END SUBROUTINE
          SUBROUTINE DGETRS(TRANS,M,N,A,NMAX,IR,B,NMAX2,ISING)
              INTEGER NMAX,NMAX2,M,N,ISING
              INTEGER, DIMENSION(NMAX) :: IR
              CHARACTER TRANS
              REAL(8), DIMENSION(NMAX) :: B
              REAL(8), DIMENSION(NMAX,NMAX) :: A
          END SUBROUTINE
      END INTERFACE GETRS

I just upgraded from Update 2 to Update 4 of the 2013 compiler (version 13.1.2.190).

My incoming real arguments can be either real(4) or real(8).  My integer arguments are always (project setting /integer_size:64) integer(8).

If I run this code with real(8) and Win32 project configuration, the IR() array gets corrupted.  If I run it with real(8) and x64 project configuration, IR() does not get corrupted.  What am I doing wrong now?
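One thing I would double-check (an assumption on my part, not a confirmed diagnosis): with /integer_size:64 the default INTEGER in these interfaces is 8 bytes, while the standard LP64 MKL libraries expect 4-byte integers, and a width mismatch in the pivot array would show up exactly as corrupted IR() values. A sketch of an explicit-kind interface for the LP64 case would be:

      INTERFACE GETRF
          SUBROUTINE DGETRF(M,N,A,NMAX,IR,ISING)
              INTEGER(4) NMAX,M,N,ISING            ! 4-byte integers to match LP64 MKL
              INTEGER(4), DIMENSION(NMAX) :: IR
              REAL(8), DIMENSION(NMAX,NMAX) :: A
          END SUBROUTINE
      END INTERFACE GETRF

The alternative is to keep the 8-byte integers throughout and link against the ILP64 interface library (mkl_intel_ilp64 instead of mkl_intel_lp64).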

Benchmarking MKL LAPACK on a ccNUMA system


I work with a large ccNUMA SGI Altix system. One of our users is trying to benchmark some LAPACK routines on our system and is getting disappointing scaling: it stops scaling after 4 threads.

The test I am running diagonalizes a 4097x4097 matrix of double-precision floats using the routine DSYEV.

From analysing the hotspots in VTune, I find that almost all the time is spent in overhead and spin time from the functions:

[OpenMP dispatcher] <- pthread_create_child and [OpenMP fork].

The code was compiled using ifort with the options -O3 -openmp -g -traceback -xHost -align -ansi-alias -mkl=parallel, using version 13.1.0.146 of the compiler and version 11 of MKL. The system is made up of 8-core Xeon Sandy Bridge sockets.

The code was run with the environment variables:

OMP_NUM_THREADS=16
MKL_NUM_THREADS=16
KMP_STACKSIZE=2gb
OMP_NESTED=FALSE
MKL_DYNAMIC=FALSE
KMP_LIBRARY=turnaround
KMP_AFFINITY=disabled

It is also run with the SGI NUMA placement command 'dplace -x2', which pins the threads to their cores.

So I suspect that there is something wrong with the MKL options, or that the library is not configured properly for our system. I have attached the code used.

Does anybody have any ideas on this?

Jim

Attachment: code.tar.gz (2.02 KB)

Matrix Multiplication problem: Wrong results in the 1st row of the resulting matrix C ( C[8][8] = A[8][8] x B[8][8] )


I have detected a problem with the SGEMM and DGEMM functions. In essence, there are wrong numbers in the 1st row of the resulting matrix C ( C[8][8] = A[8][8] x B[8][8] ).

I have verified it with 32-bit and 64-bit versions of MKL:
...
#define __INTEL_MKL__ 10
#define __INTEL_MKL_MINOR__ 3
#define __INTEL_MKL_UPDATE__ 12
...
and
...
#define __INTEL_MKL__ 11
#define __INTEL_MKL_MINOR__ 0
#define __INTEL_MKL_UPDATE__ 2
...
using Intel, Microsoft, MinGW and Borland C++ compilers.

 

Intel going over to the dark side?


I cannot install mkl and I own it, viz., I own the version which I tried to install.  My license was perfectly valid for 10.3.11 though it did later expire.  But I am only trying to re-install 10.3.11.  I know Intel has never claimed to be a pure rental model, therefore this must be illegal.  I am very angry at having wasted my whole evening trying to install mkl 10.3.11.  Please help soon, as I am starting to like the idea of giving Intel some very bad advertising.
Sincerely, Paul F Ringseth -- pringseth@gmail.com

Best function for in-place matrix addition (with stride)


I often need to calculate the sum of a set of matrices or submatrices of a dataset. Unfortunately, the two matrices do not always have the same stride when I am selectively using a subset of a large dataset, which means I have to resort to calculating the sum by hand (alternatively, I could call v?Add or similar once per row, but I am not sure how much overhead that implies when calling it 500 or 1000 times for a 500x500 matrix).

I am aware of the mkl_?omatadd function, but the documentation states that the input and output arrays cannot overlap, which means I would need an extra temporary matrix. While I would assume that calculating A = A + m * B works in place when not transposing the matrices, I cannot rely on that approach unless it is guaranteed for all future versions.

Are there any other functions that could be used for this calculation that I have missed?
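For reference, one alternative that updates in place and tolerates different leading dimensions is a column-wise ?axpy loop; a minimal sketch (the routine name submat_add and the argument conventions are my own, assuming column-major m x n blocks):

    subroutine submat_add(m, n, alpha, b, ldb, a, lda)
      ! A(1:m,1:n) = A(1:m,1:n) + alpha * B(1:m,1:n), updated in place,
      ! one column at a time, so A and B may have different leading dimensions.
      implicit none
      integer :: m, n, lda, ldb, j
      double precision :: alpha, a(lda,*), b(ldb,*)
      do j = 1, n
          call daxpy(m, alpha, b(1,j), 1, a(1,j), 1)
      end do
    end subroutine submat_add

That is still n BLAS calls per matrix, but each daxpy covers a full contiguous column, so the per-call overhead is amortized better than with a per-element loop.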

OpenMP very slow when run outside of Visual Studio


Since we are using the Intel MKL library, we have to load Intel's OpenMP library (libiomp5md.dll) at run time and exclude vcomp.lib at link time, but we have to compile and link with VC++. With my 64-bit release build, if I run it directly, part of my code does not fully utilize the cores I specified and runs very slowly; it seems to be using multiple cores, but it might be even slower than one core. If I attach it (the release build) to the Visual Studio debugger without doing anything else, then it fully utilizes the cores I specified. Does anybody have any ideas?

We are using Visual Studio 2010 on Windows 7 Professional. libiomp5md.dll shows a file version of 5.0.2012.803.

Use of ScaLAPACK for solving a general square system of linear equations


Hi!

I have used FGMRES from MKL recently, and since MKL does not support parallelization of FGMRES over multiple processors (only multithreading!), I would like to try my hand at ScaLAPACK, which can solve a linear system of equations directly on multiple processors.

I was studying p?getrs and found mention of a distributed matrix. Physically this makes sense to me, as I can imagine the actual matrix being split into various sub-matrices, each of which is then handled in parallel on multiple processors. But some information seems to be missing, or perhaps I missed it:

1. Should the user provide the sub-matrices to all processors, or will MKL ScaLAPACK do this automatically? If the user should provide them, what is the format, and on what basis must the partitioning be performed? Would it be okay to use a library like ParMETIS to do this? If not, does that mean the code starts sequentially and then broadcasts the sub-matrices to the respective processors?

2. Also, I expect some communication between processors while the sub-matrices are solved in parallel. There is no mention of this either.

3. In the examples folder I was unable to locate code that shows the actual calling sequence of the ScaLAPACK subroutines, as there is for other solvers such as FGMRES. Is that because it is rather trivial, needing just a factorization call followed by the call to the linear solver? (A sketch of what I have pieced together so far follows below.)
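This is the calling sequence as I currently understand it from the documentation. It is a sketch only, with an illustrative 2x2 process grid and block size nb=64; the local-array bookkeeping is shown but the actual filling of the distributed blocks is omitted:

    ! sketch of the ScaLAPACK LU factorization/solve calling sequence
    integer, parameter :: n = 1000, nrhs = 1, nb = 64
    integer :: iam, nprocs, ictxt, nprow, npcol, myrow, mycol
    integer :: desca(9), descb(9), info, lld_a, locr, locc
    integer, external :: numroc
    integer, allocatable :: ipiv(:)
    double precision, allocatable :: a(:,:), b(:,:)

    nprow = 2; npcol = 2                       ! 2 x 2 process grid
    call blacs_pinfo(iam, nprocs)
    call blacs_get(-1, 0, ictxt)
    call blacs_gridinit(ictxt, 'Row', nprow, npcol)
    call blacs_gridinfo(ictxt, nprow, npcol, myrow, mycol)

    ! each process allocates only its local piece of the 2D block-cyclic layout
    locr = numroc(n, nb, myrow, 0, nprow)
    locc = numroc(n, nb, mycol, 0, npcol)
    lld_a = max(1, locr)
    allocate(a(lld_a, max(1, locc)), b(lld_a, nrhs), ipiv(locr + nb))
    call descinit(desca, n, n, nb, nb, 0, 0, ictxt, lld_a, info)
    call descinit(descb, n, nrhs, nb, nb, 0, 0, ictxt, lld_a, info)

    ! ... fill the local parts of a and b here ...

    call pdgetrf(n, n, a, 1, 1, desca, ipiv, info)                    ! distributed LU
    call pdgetrs('N', n, nrhs, a, 1, 1, desca, ipiv, b, 1, 1, descb, info)

    call blacs_gridexit(ictxt)
    call blacs_exit(0)

As far as I can tell, the caller is responsible for the distribution: each MPI process fills only its own blocks of the 2D block-cyclic layout described by descinit (no ParMETIS-style graph partitioning is involved), and the communication during pdgetrf/pdgetrs is handled internally through BLACS on top of MPI, which would answer my questions 1 and 2 — corrections welcome.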

Many Thanks,

Amar   


MKL v8.1 or 9 Download


Hello,

I need the MKL library for CATIA V5R18 Analysis under Windows 7 64-bit.

The only versions that will work are v8.1 and v9, but I cannot download them from the homepage.

And where can I buy them? There are only offers for v10 and v11, but no older versions are offered.

Thanks

 

Issue introduced in MKL 11.0 Update 4 (64-bit Linux only)


After installing MKL 11.0 Update 4 over MKL 11.0 Update 2 on Linux, our QA process gets a SIGSEGV at...

#0  0x00002aaab745874a in mkl_serv_malloc ()
#1  0x00002aaab7f6bbcc in mkl_blas_mc3_dgemm_get_bufs ()
#2  0x00002aaab6ae8a99 in mkl_blas_mc3_xdgemm_par ()
#3  0x00002aaab4c2cf74 in mkl_blas_xdgemm_par ()
#4  0x00002aaab4b81ecb in mkl_blas_dgemm_2d_bsrc ()
#5  0x00002aaab4b7b489 in gemm_host ()
#6  0x00002aaabb92b4f3 in L_kmp_invoke_pass_parms ()
    from /opt/intel/composer_xe_2013.4.183/compiler/lib/intel64/libiomp5.so

100% reproducible in certain cases.

Reverting to MKL Update 2 solves the issue.

It seems to happen after many iterations, with many computation threads created and destroyed.

Note that we are running multiple (Boost) threads that call MKL. We call MKL_Thread_Free_Buffers at the completion of each thread.

Solver for the 3D Poisson equation in cylindrical coordinates


Dear colleagues,
At the Siberian Branch of the Russian Academy of Sciences (SB RAS) in Novosibirsk there is a great need to solve the 3D Poisson equation in cylindrical coordinates (MKL has a 3D solver only in Cartesian coordinates). Could Intel extend MKL with such a solver?

Pardiso output error after symbolic factorization


Hi, 

The attachment contains a test code and an input file. When I link the code with MKL 10.2.3.029, PARDISO produces no error information. When I link with MKL 11.0.3, the following information is output, and at the end there is an error message. I also tried 11.0.4; the output is similar.

The problem originated on Linux. Since I also observe the strange behavior on Windows, I would like to get someone's help to resolve this Windows problem first.

Thanks!

Xin

output:

===================================================

clean

The local (internal) PARDISO version is : 103911000
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON

Summary: ( reordering phase )
Time numfct :

Time malloc :
Time cgs :
Time spent in calculations of symmetric matrix portrait (fulladj): 0.006105 s
Time spent in reordering of the initial matrix (reorder) : 0.567821 s
Time spent in symbolic factorization (symbfct) : 0.060449 s
Time spent in data preparations for factorization (parlist) : 0.005577 s
Time spent in allocation of internal data structures (malloc) : 0.011062 s
Time spent in additional calculations : 0.047440 s
Total time spent : 0.698454 s

#non-zeros in A: 0
Time solve :
#non-zeros in A: 2
#right-hand sides: 0

< Factors L and U >
#columns for each panel: 100283
#independent subgraphs: 458666
#independent subgraphs: -334499459

< with multiple minimum degree on the separator nodes >

< no multiple minimum degree on the separator nodes >
#supernodes: 80
size of largest supernode: 0
number of nonzeros in U 0
gflop for the numerical factorization: 0.000000
||A|| 0.000000
L&U for matrix number 2683178 deleted
Input error is not equal to ZERO, error = 1

==========================================================

Attachment: pardisocrash.7z (9.29 MB)

Scaling with least squares


Hello,

I am trying to fit some data using higher-order polynomials. The data has 15,000 points with the ranges below:

X (independent): Min Value = 100,000, Max Value = 6,000,000

Y (dependent): Min Value = 150,000, Max Value = 560,000

I am using the GELS least squares driver (SVD method). For the coefficient matrix, I scale each value by the respective column average: I still calculate x^20 for all x observations, then calculate the column average, and then scale the column values by it.

For a polynomial of order 20, the values I get from the code differ, starting at the 2nd or 3rd decimal place, from the values obtained with a commercially available statistical analysis package, which gives more accurate predictions.

How can I improve the accuracy of the least squares fit? I see the following issues, but haven't found a solution yet:

1. When I calculate the powers (x^16, x^17, etc.) for the coefficient matrix, there may be precision issues.

2. Is my scaling correct, or should I use something like (x - mean_x) / stddev_x [I just found this via Google]? In that case, how do I get the correct coefficients back? (A sketch of the column-scaling variant I have in mind follows below.)
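To make question 2 concrete, this is a minimal sketch of the column-scaling variant (my own illustration with placeholder data, not my production code): each Vandermonde column is divided by its own scale factor before calling DGELS, and the returned coefficients are divided by the same factors afterwards, which recovers the coefficients in the original variables exactly.

    program scaled_polyfit_sketch
      implicit none
      integer, parameter :: m = 15000, deg = 20, n = deg + 1
      double precision :: a(m,n), b(m), xobs(m), s(n), work(10*n + m)
      integer :: i, j, info

      ! xobs and b would be the observed data; filled with placeholders here
      call random_number(xobs); xobs = 100000d0 + 5900000d0*xobs
      call random_number(b);    b    = 150000d0 + 410000d0*b

      ! build the Vandermonde matrix and scale each column to unit max-abs
      do j = 1, n
          do i = 1, m
              a(i,j) = xobs(i)**(j-1)
          end do
          s(j) = maxval(abs(a(:,j)))
          a(:,j) = a(:,j) / s(j)
      end do

      ! least-squares solve; on exit b(1:n) holds the coefficients of the scaled columns
      call dgels('N', m, n, 1, a, m, b, m, work, size(work), info)

      ! undo the scaling: coefficient of x**(j-1) in the original variables
      do j = 1, n
          print *, 'c(', j-1, ') =', b(j) / s(j)
      end do
    end program

If the standardization z = (x - mean_x)/stddev_x is used instead, the fitted polynomial is in powers of z, and the simplest way to use it is to apply the same transformation to new x values before evaluating, rather than converting the coefficients back (converting them back requires a binomial expansion of each power of z).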

Thank you for your advice.

-V
