Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
Viewing all 3005 articles

Using the FEAST Eigensolver

Hi,

I am trying to use the FEAST eigensolver to solve sparse and dense generalized eigenvalue problems, to be integrated into the AMLS method for research purposes. However, I cannot figure out why the output variable info is equal to -4 when I call the routine dfeast_scsarv(...).

When I searched for the meaning of this value, I found no documentation of this error code.

I would really appreciate any help or ideas.

 

Thanks in advance

 

Nelson


Gibbs sampling solution?

Hi

I'm attempting to write a restricted Boltzmann machine using Gibbs sampling for a deep learning neural net. I had a look in MKL and didn't find a specific routine, so I searched the internet and found a C/Java/Python/R/Scala implementation: http://www.r-bloggers.com/mcmc-and-faster-gibbs-sampling-using-rcpp/

I created my own implementation using ifort and MKL, based on the C code I found there and on referenced pages. I'm not a mathematician, but I did physics at university 30 years ago and have written neural nets before, so I can follow a formula and I get the rough gist of Gibbs sampling. However, I'm treating it as a black-box solution.

Two questions:

1. Is there a ready-made MKL solution?

2. The C code from the web runs in just under 8 seconds on my computer, whereas the Fortran version using the gamma and Gaussian distributions takes 55 seconds, which is slower than Python. I assume this is because the other programs use distributions that return scalars rather than a vector of size 1 as mine does; also, there is no statement about the correctness of the C/Java/Python etc. libraries. Indeed, when I changed the return vector size in Fortran to a large value and proportionally reduced the loop count, the MKL implementation came in under 2 seconds, so I'm obviously not doing a like-for-like comparison. BUT my simplistic understanding of Gibbs sampling is that x and y need to be cross-related across the two distributions, and I can't think how to do this with a vector of size > 1 to take advantage of the MKL implementation. Any ideas? (I'm using a Mersenne Twister for a direct comparison; I can cut the time in half with a simpler method.)

thanks

Steve

include 'mkl_vsl.f90'
PROGRAM Gibbs

    USE IFPORT
    USE MKL_VSL_TYPE
    USE MKL_VSL
    IMPLICIT NONE
REAL(8) START_CLOCK, STOP_CLOCK
INTEGER status,n,i,j, M, thin
REAL(8), DIMENSION(1) :: x,y
TYPE (VSL_STREAM_STATE) :: stream, stream2
REAL(8) alpha, a

!VSL_RNG_METHOD_GAMMA_GNORM_ACCURATE
!VSL_RNG_METHOD_GAMMA_GNORM
!VSL_RNG_METHOD_EXPONENTIAL_ICDF_ACCURATE

START_CLOCK = DCLOCK()

n=1
alpha = 3.0
a=1.0
x(1) = 0.0
y(1) = 0.0
M=50000
thin=1000

status = vslnewstream( stream, VSL_BRNG_SFMT19937, 1777 )
status = vslnewstream( stream2, VSL_BRNG_SFMT19937, 1877 )

! f(x|y) = (x^2)*exp(-x*(4+y*y))                ## a Gamma density kernel
! f(y|x) = exp(-0.5*2*(x+1)*(y^2 - 2*y/(x+1)))  ## a Gaussian kernel

do j=1,M
    do i=1,thin
        ! x | y ~ Gamma(shape alpha, displacement 0, scale 1/(4+y^2))
        status = vdrnggamma( VSL_RNG_METHOD_GAMMA_GNORM, stream, n, x, alpha, 0.0d0, &
                             1.0d0/(4.0d0 + y(1)**2) )
        ! y | x ~ Normal(mean 1/(x+1), sigma 1/sqrt(2x+2))
        status = vdrnggaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream2, n, y, &
                                1.0d0/(x(1) + 1.0d0), 1.0d0/sqrt(2.0d0*x(1) + 2.0d0) )
    enddo
enddo

print*, "X" , x
print*, "Y" , y
STOP_CLOCK = DCLOCK()
print *, 'Gibbs Sampler took:', STOP_CLOCK - START_CLOCK, 'seconds.'

end PROGRAM Gibbs
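
The two conditionals in the code comments above are scale and location-scale families, so one possible way to use long vectors without breaking the x/y dependence is to draw standard variates in bulk and rescale them inside the loop. The sketch below illustrates the idea in plain Python rather than Fortran/MKL. The name gibbs_prebatched and its parameters are hypothetical, and random.gammavariate and random.gauss merely stand in for vdrnggamma and vdrnggaussian. It assumes the conditionals are x|y ~ Gamma(3, scale 1/(4+y^2)) and y|x ~ Normal(1/(x+1), 1/sqrt(2x+2)), as the kernels suggest.

```python
import math
import random


def gibbs_prebatched(n_samples, thin, seed=1777):
    """Gibbs sampler that pre-draws standard variates in batches of `thin`.

    Uses Gamma(3, scale b) = b * Gamma(3, 1) and
    Normal(m, s) = m + s * Normal(0, 1), so the bulk draws do not
    depend on the current state of the chain.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        # Bulk draws: these two lines are the part a vector RNG would replace.
        g = [rng.gammavariate(3.0, 1.0) for _ in range(thin)]
        z = [rng.gauss(0.0, 1.0) for _ in range(thin)]
        for k in range(thin):
            # Rescale the standard variates with the current state.
            x = g[k] / (4.0 + y * y)
            y = 1.0 / (x + 1.0) + z[k] / math.sqrt(2.0 * x + 2.0)
        samples.append((x, y))
    return samples


demo = gibbs_prebatched(n_samples=100, thin=20)
print("last sample:", demo[-1])
```

With MKL, the two list comprehensions would presumably become one vdrnggamma call and one vdrnggaussian call of length thin; that variant has not been timed here.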

 

Compiler doesn't recognize mkl_set_exit_handler

Hi,
I installed Intel MKL v11.1 Update 4 and I am trying to use mkl_set_exit_handler to capture the system errors (#136) we have been seeing lately in our Fortran applications. The compiler doesn't seem to recognize this subroutine, as I keep getting an unresolved external symbol error.

Do I need any USE statements to call this MKL subroutine?

I attached my test project for your reference. It is the same as the one in the manual.

Thanks,
Pramod

Attachment: Console1_0.zip (18.08 KB)

MKL fftw3: choosing PRECISION SINGLE or DOUBLE

Hi,

I have installed Intel Composer XE 14.0.2.144.

I can compile the fftw2 wrappers in 'single' precision and 'double' precision versions by adding a compilation parameter:

[PRECISION={MKL_DOUBLE|MKL_SINGLE}]

I do not know how to do the same for fftw3. The make command does not show me this option.

So I do not know how to compile my apps in single or double precision when linking the MKL fftw3 libraries.

Thanks in advance.

selected inversion with pardiso

Hi, I would like to obtain some elements of the inverse matrix, based on a user selection of the indexes. Is it possible to use the selected inversion process of Pardiso?

Actually, I tried to play with iparm(36) and iparm(37) with "local (internal) PARDISO version is  : 103911000", but I got an error.

Bug in FEAST Eigensolver

I have discovered what appears to be a fairly serious bug in the FEAST MKL eigensolver that occurs when there are coupled eigenvalues (i.e., multiple eigenvalues with the same value). When coupled eigenvalues occur, the FEAST module returns eigenvectors that are a combination of the eigenvectors for each eigenvalue rather than keeping them separate.

For example, if the 1st and 2nd eigenvalues have the same value, FEAST returns a 1st eigenvector that is a combination of what should be separate 1st and 2nd eigenvectors, each factored by a different apparently random value. Similarly, it returns the 2nd eigenvector that is also a combination of the 1st and 2nd eigenvectors.

Has anyone else experienced this and/or found a solution?

I can provide data that illustrates the problem if required.
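
One point that may be worth checking before treating this as a solver bug: when an eigenvalue is repeated, any linear combination of its eigenvectors is itself an eigenvector for that eigenvalue, so a solver may return any basis of that eigenspace. The Python sketch below illustrates this with a hypothetical 3x3 symmetric matrix (not the poster's data) whose eigenvalue 1 is repeated. All helper names are made up for this illustration.

```python
def matvec(a, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def residual(a, v, lam):
    """Largest absolute entry of A v - lam v (zero for an exact eigenpair)."""
    return max(abs(av_i - lam * v_i) for av_i, v_i in zip(matvec(a, v), v))


# Symmetric matrix with eigenvalues 1, 1 and 4.
A = [[2.0, 1.0, 1.0],
     [1.0, 2.0, 1.0],
     [1.0, 1.0, 2.0]]

# Two independent eigenvectors for the repeated eigenvalue 1.
v1 = [1.0, -1.0, 0.0]
v2 = [1.0, 1.0, -2.0]


def combine(c1, c2):
    """Linear combination c1*v1 + c2*v2 with arbitrary weights."""
    return [c1 * a + c2 * b for a, b in zip(v1, v2)]


w = combine(0.3, -0.7)
print("residual of the mixed vector:", residual(A, w, 1.0))
```

If the vectors returned for equal eigenvalues satisfy A x = lambda B x to within tolerance, they are a valid result even though they differ from the vectors another solver returns.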

 

Error -1073741701 while running on remote cluster

I am using a remote cluster (supercomputer) to run an .exe file that was built with Visual Studio 2008 (in C) using the Intel MKL library. It gives me "Task failed during execution with exit code -1073741701". When I run this file on my own computer, everything works. But the cluster runs the file as a different user, and I suppose it cannot find the Intel MKL files.

The file is run from the command prompt. Are there any options I can add on the command line to locate these files? Or might the reason be something different?
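
Negative Windows exit codes of this size are usually NTSTATUS values printed as signed 32-bit integers, so converting to hexadecimal can narrow down the cause. The Python sketch below does the conversion. The two names in the lookup table come from the Windows NTSTATUS list; the helper functions are hypothetical. The code reported here decodes to 0xC000007B (STATUS_INVALID_IMAGE_FORMAT), which commonly points to a 32-bit/64-bit mismatch between the executable and a DLL it loads, whereas a DLL that cannot be found usually shows up as 0xC0000135.

```python
# A small excerpt of the Windows NTSTATUS table.
NTSTATUS_NAMES = {
    0xC000007B: "STATUS_INVALID_IMAGE_FORMAT",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def to_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def to_signed32(status):
    """Reinterpret an unsigned 32-bit status as a signed 32-bit exit code."""
    return status - 0x100000000 if status & 0x80000000 else status


def describe(exit_code):
    """Format an exit code as hex plus its NTSTATUS name, if known."""
    status = to_unsigned32(exit_code)
    return f"0x{status:08X} {NTSTATUS_NAMES.get(status, 'unknown')}"


print(describe(-1073741701))
```

If that reading applies, comparing the bitness of the .exe with that of the MKL and runtime DLLs on the cluster's PATH would be a reasonable first check.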

FEAST sparse with error

Hi all.

Intel MKL 11.2

Calling dfeast_scsrgv, it returns with info=-4

What is this error?

 


Static linking PGI with MKL - missing `mkl_serv_default_xerbla'

Dear experts,

I am trying to link the MKL library statically with the PGI compilers. Which library should I include, or how should I rearrange the link command, to satisfy the missing reference?

ilias@login-sivvp.ui.savba.sk:/shared/home/ilias/Work/software/dirac/trunk_release/build_pgi_mkl_i8_dbg_static/./opt/pgi/linux86-64/13.10/bin/pgf90 -Bstatic -Wl,--no-export-dynamic -Wl,-E -DVAR_PGF90 -Bstatic -i8 -g CMakeFiles/cfread.x.dir/utils/cfread.F90.o -o cfread.x -L/shared/home/ilias/Work/software/dirac/trunk_release/build_pgi_mkl_i8_dbg_static/external/lib -L/shared/home/ilias/Work/software/dirac/trunk_release/build_pgi_mkl_i8_dbg_static/external/pcmsolver-build/external/lib lib/libdirac.a -lpcm -lgetkw /usr/lib64/libz.a lib/libxcfun.a -Wl,--start-group /opt/intel/mkl/lib/intel64/libmkl_lapack95_ilp64.a /opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a -mp -Wl,--end-group -Wl,--no-export-dynamic -lzceh -lstdz -lCz -lstdc++ -Wl,-rpath,/shared/home/ilias/Work/software/dirac/trunk_release/build_pgi_mkl_i8_dbg_static/external/lib:/shared/home/ilias/Work/software/dirac/trunk_release/build_pgi_mkl_i8_dbg_static/external/pcmsolver-build/external/lib: /opt/intel/mkl/lib/intel64/libmkl_core.a 

/opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a(_xerbla.o): In function `XERBLA':

../../../../serv/iface/thunks_ext_to_ker/_xerbla.c:(.text+0x1): undefined reference to `mkl_serv_default_xerbla'

ilias@login-sivvp.ui.savba.sk:/shared/home/ilias/Work/software/dirac/trunk_release/build_pgi_mkl_i8_dbg_static/.

 

Block tridiagonal and block upper Hessenberg eigenvalue solvers

Dear all,

I am looking at some options in order to compare the performance of eigenvalue solvers for

+ symmetric block tridiagonal 

+ block upper hessenberg matrices.

If I iterate in single-vector fashion (not in blocks), I can use stevd and hseqr, respectively (I guess), since the manual and selection tree point to these routines.

But if I convert to block iteration mode, is there a direct replacement for these routines when the matrices become block symmetric tridiagonal or block upper Hessenberg?

What would be the most efficient way to compute the eigenvalues and eigenvectors in the case of block iterations for a symmetric and a Hessenberg matrix?

Best,

Umut

Extended eigensolver (FEAST) segfaults

Hi,

I'm calling the FEAST eigensolver using matrices assembled by PETSc (sparse CSR) and it segfaults when running PARDISO. Here is the stack:

#8  <signal handler called>
#9  0x00002b0221661836 in mkl_pds_metis_pqueueupdateup () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#10 0x00002b0221666d99 in mkl_pds_metis_fm_2waynodebalance () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#11 0x00002b0221666e10 in mkl_pds_metis_refine2waynode () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#12 0x00002b022165ad3a in mkl_pds_metis_mlevelnodebisectionmultiple () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#13 0x00002b02200d7d30 in mkl_pds_metis_mlevelnesteddissection_pardiso () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_thread.so
#14 0x00002b022165bd2b in mkl_pds_metis_nodend_pardiso () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#15 0x00002b02216fdf0b in mkl_pds_reorder1_pardiso () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#16 0x00002b02216df2ef in mkl_pds_do_all_pardiso_fc () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#17 0x00002b0221619f0a in mkl_pds_pardiso_c () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#18 0x00002b02216f860d in mkl_pds_pardiso () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#19 0x00002b02219c0479 in mkl_feast_dfeast_scsrgv () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#20 0x00002b021f51a792 in dfeast_scsrgv_ () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so

Can you please point out what could be the problem in my code?

Thanks,

Harshad

Uninstalling Cluster Studio 2013 for Windows 64-bit

Hello,

I can't uninstall Cluster Studio 2013 for Windows 64-bit; the uninstaller always crashes. The step at which it crashes is shown in the following figure.

Can you help me?

 

 

 

 

Wrong result when using OpenMP with DCT

Hi,

I am trying to use OpenMP with the DCT transform to speed up performance. The program works fine when I set omp_num_threads=1. When I set it to a value greater than 1, I can see that the CPU load doubles, but the result is wrong. Below is the code snippet. Could anyone help me out?

======================================================================

! prepare for DCTs
call d_init_trig_transform(nx-1,MKL_COSINE_TRANSFORM,ipar,dpar,ir)
call d_commit_trig_transform(alpha(:,1),handle,ipar,dpar,ir)

! forward transform
!$OMP PARALLEL DO
do i = 1,ny,1
    call d_forward_trig_transform(alpha(:,i),handle,ipar,dpar,ir)
end do
!$OMP END PARALLEL DO

! SOME PROCESSING IN FREQUENCY DOMAIN

! inverse transform
!$OMP PARALLEL DO
do i = 1,ny,1
    call d_backward_trig_transform(alpha(:,i),handle,ipar,dpar,ir)
end do
!$OMP END PARALLEL DO

! clean up transform
call free_trig_transform(handle,ipar,ir)

 

Compilation problems with the Intel ODE solver

Hi,

I am installing the Intel ODE solver on a 64-bit Linux system, using the Intel Fortran compiler.

Following the installation manual, I tried to run the examples with:

ifort -static iode_example_f.f -I../../include -L../../lib/intel64 -liode_intel64 -lm -o iode.out

However, I've got the following error message: 
ld: cannot find -liode_intel64

I really have no idea why it can't find this library.

Could you give me some advice?

Regards,

Belmiro
 

 

 

 

 

Linear systems with complex-number

Dear all,

I would like to know whether there are any MKL routines for solving a sparse linear system in which some elements are complex numbers.

Thanks a lot


Register Now for the Webinar: New Intel® Math Kernel Library Features Boost Performance for Tiny and Gigantic Computations

Title: New Intel® Math Kernel Library Features Boost Performance for Tiny and Gigantic Computations

 

Date/Time: Tue, Oct 28, 2014 9:00 AM - 10:00 AM PDT

 

Registration Link: https://www1.gotomeeting.com/register/867043545

 

Description of the Webinar: Intel® Math Kernel Library (Intel® MKL) is a computational math library aimed at unleashing performance on Intel® architecture. Designed for scientific, engineering, and financial applications, it efficiently handles both very small and very large computations. Here, we'll introduce two new features in Intel MKL. The first helps programmers boost performance on a single CPU core with minimal effort when dealing with small data sets (for example, matrix multiplication for tiny matrices). The second, at the other end of the spectrum, efficiently solves large-scale sparse linear systems with tens of millions of equations on clusters. We'll focus on usage models and APIs for these new features and share relevant performance data.

 

Thank you,

Intel MKL Team

triangular solver

I am using mkl_?coosv, specifically the d variant, and compiled the program with 'ifort -openmp -mkl'. I have set 'mkl_omp_num_threads' and also 'omp_proc_bind=true'. I tested it with three different thread counts: 4, 8, and 16. I am getting the following timings: 0.36, 0.3, and 0.32. I am running it on a machine with 16 cores. Are these timings reasonable? Or is there anything else I should do before the runs? Thanks.

Dense * Sparse matrix calculations, is there an easier way

I am porting code (C, so row major) that makes use of ?gemm, ?syrk and ?syr2k calls from dense matrices to sparse matrices, and would like to know if there is a simpler way of calculating the various internal matrix products than the following:

 

?syrk: use ?csrmultd. I assume since this method only allows 1 based indexing the resulting dense matrix is column major, but I would like confirmation.

 

?syr2k: use two ?gemm calls here instead.

 

?gemm: This case gets fairly complicated, and it would be extremely nice if someone can tell me if there are methods / options I am overlooking which would simplify this. A and D are the dense result and multiplicand matrices, S['] is a sparse matrix which may be transposed.

A = S[']  * D  + A
 - Use mkl_dcsrmm directly (using zero based indexing)

A = S[']  * D' + A
Either
 - Transpose D -> Dt
 => A = S[']  * Dt + A
 - use mkl_dcsrmm (using zero based indexing)
Or
 - Convert S to one based indexing, forces mkl_dcsrmm to implicitly use col major C arrays (D' -> Dt)
 - Calculate temp. matrix Tt = S['] * Dt via mkl_dcsrmm
 - T' (row major) = Tt (col. major)
 - calculate A = T' + A;

A = D  * S[']  + A
 - Transpose equation:
 -> A' = (D * S['])' + A' = S[!'] * D' + A'
Either:
 - Convert S to one based indexing, forces mkl_dcsrmm to implicitly use col major C arrays (A' -> At, D' -> Dt)
 => At = S[!'] * Dt + At
 -> Use mkl_dcsrmm
Or:
 - Transpose A' => At, D' => Dt
 => At = S[!'] * Dt + At
 - Use mkl_dcsrmm  (using zero based indexing)
 - Transpose At

A = D' * S['] + A
 -> Transpose equation:
 -> A' = (D' * S['])' + A' = S[!'] * D + A'
Either:
 - Transpose A' => At
 => At = S[!'] * D + At
 -> Use mkl_dcsrmm  (using zero based indexing)
 -> Transpose At => A'
Or:
 -> Calculate temp. matrix T = S['] * D via mkl_dcsrmm
 -> Calculate A = T' + A
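
The cases above rest on two identities: (D * S)' = S' * D', and the fact that a row-major buffer read as column-major with swapped dimensions is the transpose. The Python sketch below checks both on small hypothetical matrices using plain lists. It does not call MKL, S is stored densely purely for illustration, and all function names are made up for this example.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def read_col_major(buf, rows, cols):
    """Interpret a flat buffer as a rows x cols column-major matrix."""
    return [[buf[j * rows + i] for j in range(cols)] for i in range(rows)]


D = [[1, 2, 3],
     [4, 5, 6]]      # dense, 2 x 3
S = [[0, 7],
     [8, 0],
     [0, 9]]         # "sparse" operand, 3 x 2

# Identity 1: transposing the product swaps and transposes the factors.
lhs = transpose(matmul(D, S))
rhs = matmul(transpose(S), transpose(D))
print("(D*S)' == S'*D':", lhs == rhs)

# Identity 2: the row-major buffer of D, read as a 3 x 2 column-major
# matrix, is D' without moving any data.
d_row_major = [v for row in D for v in row]
print("reinterpretation == D':", read_col_major(d_row_major, 3, 2) == transpose(D))
```

The second identity is what makes the "convert S to one-based indexing" variants work without an explicit transpose of the dense operands.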

In theory I could always store my sparse matrix as one based, and if I need to treat it as zero based add dummy rows / columns to the dense matrices to catch the additional row/column created when multiplying, which means converting between the indexing won't take any time.

 

One final question: Could someone confirm that it is possible to use mkl_?omatadd to calculate A = A + B without using a temp matrix? The documentation doesn't state whether the memory is allowed to overlap between input and output if no transposition is being done.

consecutive call of pardiso

Hi.

When solving a set of linear equations, a typical example program calls pardiso four times, with parameter phase = 11, 22, 33, and -1.

If I want to solve n linear systems, each with the same structure, I will call pardiso 4n times.

Is there a simpler way, in which I initialize just once, do all the calculations, and release once, so that I make fewer than 4n calls?

I guess calls with phase = 11,  (22, 33, 0),  (22, 33, 0), ... , 22, 33, -1 will work, but I'm not sure.
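
To make the call counts concrete, the Python sketch below builds the phase sequence for n systems that share one sparsity pattern: analysis (11) once, factorization and solve per system, release (-1) once. It only lists phase numbers and does not call pardiso. The function name is hypothetical, and the combined phase 23 (factorize and solve in one call) is assumed to be available as described in the MKL PARDISO documentation.

```python
def phase_schedule(n_systems, combine=False):
    """Return the list of pardiso phase values for n_systems matrices
    that all share the same sparsity pattern."""
    phases = [11]                      # symbolic analysis, done once
    for _ in range(n_systems):
        if combine:
            phases.append(23)          # factorize + solve in one call
        else:
            phases.extend([22, 33])    # factorize, then solve
    phases.append(-1)                  # release all internal memory once
    return phases


n = 3
print(phase_schedule(n))                 # 2n + 2 calls
print(phase_schedule(n, combine=True))   # n + 2 calls
```

For n = 3 this gives 8 or 5 calls instead of 12. Whether the intermediate phase = 0 calls are needed is not settled by this sketch.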

 

Performance of matmul vs dgemm for small size matrices

Hi,

My question is about improving the performance of the following line:

------------------------

MKM = MD*FA1 - MATMUL(MATMUL(MATMUL(ME,MQ),TRANSPOSE(MG)),TRANSPOSE(ME)) + MATMUL(MATMUL(MATMUL(ME,MG),VA),VR) 

------------------------

this line is executed for every element within a finite element implementation and is the bottleneck according to performance wizard.

All the matrices are max 12x12 by size. I have tried using DGEMM in the following way:

------------------------

CALL DGEMM('N', 'N', 12, 3,  12, 1.0D0, ME,      12, MQ, 12, 0.0D0, MDUMMY3, 12)

CALL DGEMM('N', 'T', 12, 12, 3,  1.0D0, MDUMMY3, 12, MG, 12, 0.0D0, MDUMMY4, 12)

CALL DGEMM('N', 'T', 12, 12, 12, 1.0D0, MDUMMY4, 12, ME, 12, 0.0D0, MDUMMY5, 12)

CALL DGEMM('N', 'N', 12, 3,  12, 1.0D0, ME,      12, MG, 12, 0.0D0, MDUMMY6, 12)

CALL DGEMM('N', 'N', 12, 1,  3,  1.0D0, MDUMMY6, 12, VA, 12, 0.0D0, MDUMMY7, 12)

CALL DGEMM('N', 'N', 12, 12, 1,  1.0D0, MDUMMY7, 12, VR, 1,  0.0D0, MDUMMY8, 12)

MKM = MD*FA1 - MDUMMY5 + MDUMMY8 

------------------------

however it did not provide any improvement (I think it was even a little bit slower).

I was wondering if you would know if any MKL function or setting would help to speed up this line.

Thank you very much in advance,

Murat

 

 


