I'm attempting to write a restricted Boltzmann machine using Gibbs sampling for a deep-learning neural net. I had a look in MKL and didn't find a specific routine, so I searched the internet and found a C/Java/Python/R/Scala implementation: http://www.r-bloggers.com/mcmc-and-faster-gibbs-sampling-using-rcpp/

I created my own implementation using ifort and MKL, based on the C code I found there and on referenced pages. I'm not a mathematician, but I did physics at university 30 years ago and have written neural nets before, so I can follow a formula and I get the rough gist of Gibbs sampling. For now, though, I'm treating Gibbs sampling as a black-box solution.

Two questions:

1. Is there a ready-made MKL solution?

2. The C code from the web runs in just under 8 seconds on my computer, whereas the Fortran version using the gamma and Gaussian distributions takes 55 seconds, which is slower than Python. I assume this is because the other programs use distributions that return scalars rather than a vector of size 1 as mine does; also, nothing is said about the correctness of the C/Java/Python etc. library implementations. Indeed, when I changed the return vector size in Fortran to a large value and reduced the loop count proportionally, the MKL implementation came in under 2 seconds, so I'm obviously not doing a like-for-like comparison. BUT my simplistic understanding of Gibbs sampling is that x and y need to be cross-related across the two distributions (each draw of x depends on the latest y, and vice versa), and I can't think how to do this with a vector of size > 1 to take advantage of the MKL implementation. Any ideas? (I'm using a Mersenne Twister for a direct comparison; I can cut the time in half with a simpler method.)

thanks

Steve

include 'mkl_vsl.f90'
PROGRAM Gibbs

USE IFPORT
USE MKL_VSL_TYPE
USE MKL_VSL
IMPLICIT NONE
REAL(8) START_CLOCK, STOP_CLOCK
INTEGER status,n,i,j, M, thin
REAL(8), DIMENSION(1) :: x,y
TYPE (VSL_STREAM_STATE) :: stream, stream2
REAL(8) alpha, a

!VSL_RNG_METHOD_GAMMA_GNORM_ACCURATE
!VSL_RNG_METHOD_GAMMA_GNORM
!VSL_RNG_METHOD_EXPONENTIAL_ICDF_ACCURATE

START_CLOCK = DCLOCK()

n=1
alpha = 3.0
a=0.0   ! gamma displacement and Gaussian mean offset; zero matches the kernels below
x(1) = 0.0
y(1) = 0.0
M=50000
thin=1000

status = vslnewstream( stream, VSL_BRNG_SFMT19937, 1777 )
status = vslnewstream( stream2, VSL_BRNG_SFMT19937, 1877 )

! f(x|y) = (x^2)*exp(-x*(4+y*y)) ## a Gamma density kernel
! f(y|x) = exp(-0.5*2*(x+1)*(y^2 - 2*y/(x+1))) ## a Gaussian kernel

do j=1,M
   do i=1,thin
       status = vdrnggamma(VSL_RNG_METHOD_GAMMA_GNORM, stream, n, x, alpha, a, (1.0/(4.0 + y(1)**2) ) )
       status = vdrnggaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream2, n, y, a, 1.0/sqrt(2*x(1)+2) )
       y(1) = 1.0/(x(1)+1) + y(1)
   enddo
enddo

print*, "X" , x
print*, "Y" , y
STOP_CLOCK = DCLOCK()
print *, 'Gibbs Sampler took:', STOP_CLOCK - START_CLOCK, 'seconds.'

end PROGRAM Gibbs
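
The cross-dependence between x and y described in question 2 can be sketched as follows. This is a minimal illustration in Python (standard library only, chosen so it runs without MKL), assuming the conditionals from the program comments: x|y is Gamma with shape 3 and rate 4+y^2, and y|x is Gaussian with mean 1/(x+1) and standard deviation 1/sqrt(2x+2). A single chain is inherently sequential, but K independent chains can be advanced together, so each sweep only needs one batch of K unit-scale gamma draws and K standard normal draws that are then rescaled per chain. The names `gibbs_sweep` and `run_chains` are hypothetical.

```python
import math
import random

def gibbs_sweep(xs, ys, rng):
    """Advance K independent chains by one Gibbs sweep, in place."""
    k = len(xs)
    # One batch of K unit-scale draws per distribution (the vectorizable part).
    g = [rng.gammavariate(3.0, 1.0) for _ in range(k)]
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    for c in range(k):
        # x | y ~ Gamma(shape=3, rate=4+y^2): rescale the unit-scale draw.
        xs[c] = g[c] / (4.0 + ys[c] ** 2)
        # y | x ~ Normal(1/(x+1), 1/sqrt(2x+2)), using the NEW x of this chain.
        ys[c] = 1.0 / (xs[c] + 1.0) + z[c] / math.sqrt(2.0 * xs[c] + 2.0)

def run_chains(k, sweeps, seed=1777):
    rng = random.Random(seed)
    xs, ys = [0.0] * k, [0.0] * k
    for _ in range(sweeps):
        gibbs_sweep(xs, ys, rng)
    return xs, ys

xs, ys = run_chains(k=1000, sweeps=200)
print("mean x:", sum(xs) / len(xs))
print("mean y:", sum(ys) / len(ys))
```

In MKL terms this would presumably correspond to one vdrnggamma call and one vdrnggaussian call per sweep with n = K and unit scale, followed by an element-wise rescale. That mapping is an assumption and has not been benchmarked here.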

↧

compiler doesn't recognize mkl_set_exit_handler.

October 20, 2014, 3:40 pm

Latest and popular articles on Intel Technologies

Hi,
I installed Intel MKL v11.1 Update 4 and I am trying to use mkl_set_exit_handler to capture the system errors (#136) we have been seeing lately in our Fortran applications. The compiler doesn't seem to recognize this subroutine, as I keep getting an unresolved external symbol error.

Do I need any USE statements to call this MKL subroutine?

I attached my test project for your reference. It is the same as the one in the manual.

Thanks,
Pramod

Attachment	Size
Download Console1_0.zip	18.08 KB

↧

mkl fftw3 choosing PRECISION SINGLE OR DOUBLE

October 21, 2014, 1:56 am

Hi,

I have installed Intel Composer XE 14.0.2.144.

I can compile the fftw2 wrappers in either 'single' or 'double' precision by adding a compilation parameter:

[PRECISION={MKL_DOUBLE|MKL_SINGLE}]

I do not know how to do the same for fftw3. The make command does not show me this option.

So I do not know how to compile my apps in single or double precision when linking against the MKL fftw3 libraries.

Thanks in advance.

↧

selected inversion with pardiso

October 21, 2014, 2:23 am

Hi, I would like to obtain some elements of the inverse matrix, based on a user selection of the indexes. Is it possible to use the selected inversion process of Pardiso?

Actually, I tried to play with iparm(36) and iparm(37) with "local (internal) PARDISO version is : 103911000", but I obtain an error.

↧

Bug in FEAST Eigensolver

October 21, 2014, 6:34 pm

I have discovered what appears to be a fairly serious bug in the FEAST MKL eigensolver that occurs when there are coupled eigenvalues (i.e. multiple eigenvalues with the same value). When coupled eigenvalues occur, the FEAST module returns eigenvectors that are a combination of the eigenvectors for each eigenvalue rather than keeping them separate.

For example, if the 1st and 2nd eigenvalues have the same value, FEAST returns a 1st eigenvector that is a combination of what should be separate 1st and 2nd eigenvectors, each factored by a different apparently random value. Similarly, it returns the 2nd eigenvector that is also a combination of the 1st and 2nd eigenvectors.

Has anyone else experienced this and/or found a solution?

I can provide data that illustrates the problem if required.
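
For reference, eigenvectors belonging to a repeated eigenvalue are not unique: any linear combination of them is again an eigenvector for that eigenvalue, so a solver that returns a mixed basis of the eigenspace is still mathematically correct. The sketch below checks this on a small hand-made symmetric matrix (not the poster's data); the mixing coefficients are arbitrary.

```python
def matvec(a, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

# Symmetric matrix with eigenvalues 1, 1, 4 (the eigenvalue 1 is repeated).
A = [[2.0, 1.0, 1.0],
     [1.0, 2.0, 1.0],
     [1.0, 1.0, 2.0]]

# Two independent eigenvectors for the repeated eigenvalue 1.
v1 = [1.0, -1.0, 0.0]
v2 = [1.0, 0.0, -1.0]

# An arbitrary mixture of the two.
c1, c2 = 0.3, -1.7
mix = [c1 * p + c2 * q for p, q in zip(v1, v2)]

# If mix is an eigenvector for eigenvalue 1, then A*mix - 1*mix vanishes.
residual = max(abs(av - 1.0 * m) for av, m in zip(matvec(A, mix), mix))
print("max |A*mix - mix| =", residual)
```

When comparing against a reference solution in such cases, it is usually more meaningful to check the residual of each returned pair and the subspace spanned by the vectors than to compare individual eigenvectors.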

↧

Error -1073741701 while running on remote cluster

October 22, 2014, 11:11 am

I am using a remote cluster (supercomputer) to run an .exe file that was built with Visual Studio 2008 (in C) using the Intel MKL library. It fails with "Task failed during execution with exit code -1073741701". When I run the file on my own computer, everything is fine. But the cluster runs the file as a different user, and I suppose it cannot find the Intel MKL files.

The file is run from the command prompt; maybe there are some options I can add at the command prompt to locate these files? Or maybe the reason is different?
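
As a small aid to reading the exit code: reinterpreted as an unsigned 32-bit value, it is an NTSTATUS code. The snippet below only does that conversion. The resulting value, 0xC000007B, is the one Windows documents as STATUS_INVALID_IMAGE_FORMAT, which is often seen when a required DLL is missing or has the wrong bitness. Whether that is the cause here is an assumption.

```python
exit_code = -1073741701

# Reinterpret the signed 32-bit exit code as an unsigned 32-bit NTSTATUS value.
ntstatus = exit_code & 0xFFFFFFFF

print(hex(ntstatus))  # prints 0xc000007b
```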

↧

FEAST sparse with error

October 23, 2014, 9:37 am

Hi all.

Intel MKL 11.2

Calling dfeast_scsrgv, it returns with info=-4

What is this error?

↧

Static linking PGI with MKL - missing `mkl_serv_default_xerbla'

October 23, 2014, 9:46 am

Dear experts,

I am trying to link the MKL library statically with the PGI compilers. Which library should I include, or how should I rearrange the link command, to resolve the missing reference?

ilias@login-sivvp.ui.savba.sk:/shared/home/ilias/Work/software/dirac/trunk_release/build_pgi_mkl_i8_dbg_static/./opt/pgi/linux86-64/13.10/bin/pgf90 -Bstatic -Wl,--no-export-dynamic -Wl,-E -DVAR_PGF90 -Bstatic -i8 -g CMakeFiles/cfread.x.dir/utils/cfread.F90.o -o cfread.x -L/shared/home/ilias/Work/software/dirac/trunk_release/build_pgi_mkl_i8_dbg_static/external/lib -L/shared/home/ilias/Work/software/dirac/trunk_release/build_pgi_mkl_i8_dbg_static/external/pcmsolver-build/external/lib lib/libdirac.a -lpcm -lgetkw /usr/lib64/libz.a lib/libxcfun.a -Wl,--start-group /opt/intel/mkl/lib/intel64/libmkl_lapack95_ilp64.a /opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a -mp -Wl,--end-group -Wl,--no-export-dynamic -lzceh -lstdz -lCz -lstdc++ -Wl,-rpath,/shared/home/ilias/Work/software/dirac/trunk_release/build_pgi_mkl_i8_dbg_static/external/lib:/shared/home/ilias/Work/software/dirac/trunk_release/build_pgi_mkl_i8_dbg_static/external/pcmsolver-build/external/lib: /opt/intel/mkl/lib/intel64/libmkl_core.a

/opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a(_xerbla.o): In function `XERBLA':

../../../../serv/iface/thunks_ext_to_ker/_xerbla.c:(.text+0x1): undefined reference to `mkl_serv_default_xerbla'

ilias@login-sivvp.ui.savba.sk:/shared/home/ilias/Work/software/dirac/trunk_release/build_pgi_mkl_i8_dbg_static/.

↧

block tridiagonal and block upper hessenberg eigenvalue solvers

October 24, 2014, 2:31 am

Dear all,

I am looking at some options in order to compare the performance of eigenvalue solvers for

+ symmetric block tridiagonal

+ block upper hessenberg matrices.

If I iterate in single-vector fashion (not in blocks), I can use stevd and hseqr, respectively (I guess), since the manual and the selection tree point to these routines.

But if I convert to block iteration mode, is there a direct replacement for these routines when the matrices become block symmetric tridiagonal or block upper Hessenberg?

What would be the most efficient way to compute the eigenvalues and eigenvectors in the case of block iterations for a symmetric and a Hessenberg matrix?

Best,

Umut

↧

Extended eigensolver (FEAST) segfaults

October 24, 2014, 8:40 pm

Hi,

I'm calling the FEAST eigensolver using matrices assembled by PETSc (sparse CSR) and it segfaults when running PARDISO. Here is the stack:

#8 <signal handler called>
#9 0x00002b0221661836 in mkl_pds_metis_pqueueupdateup () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#10 0x00002b0221666d99 in mkl_pds_metis_fm_2waynodebalance () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#11 0x00002b0221666e10 in mkl_pds_metis_refine2waynode () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#12 0x00002b022165ad3a in mkl_pds_metis_mlevelnodebisectionmultiple () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#13 0x00002b02200d7d30 in mkl_pds_metis_mlevelnesteddissection_pardiso () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_thread.so
#14 0x00002b022165bd2b in mkl_pds_metis_nodend_pardiso () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#15 0x00002b02216fdf0b in mkl_pds_reorder1_pardiso () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#16 0x00002b02216df2ef in mkl_pds_do_all_pardiso_fc () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#17 0x00002b0221619f0a in mkl_pds_pardiso_c () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#18 0x00002b02216f860d in mkl_pds_pardiso () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#19 0x00002b02219c0479 in mkl_feast_dfeast_scsrgv () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so
#20 0x00002b021f51a792 in dfeast_scsrgv_ () from /apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so

Can you please point out what could be the problem in my code?

Thanks,

Harshad

↧

uninstall+cluster studio2013 for windows64

October 26, 2014, 11:40 pm

Hello,

I can't uninstall Cluster Studio 2013 for Windows 64-bit; the uninstaller always crashes. The step at which it crashes is shown in the following figure.

Can you help me?

↧

wrong result when using openmp with DCT

October 27, 2014, 8:44 am

Hi,

I am trying to use OpenMP with the DCT transform to speed up the computation. The program works fine when I set omp_num_threads=1; when I set it to >1, I can see the CPU load double, but the result is wrong. Below is the code snippet. Could anyone help me out?

======================================================================

! prepare for DCTs
call d_init_trig_transform(nx-1,MKL_COSINE_TRANSFORM,ipar,dpar,ir)
call d_commit_trig_transform(alpha(:,1),handle,ipar,dpar,ir)

! forward transform
!$OMP PARALLEL DO
do i = 1,ny,1
    call d_forward_trig_transform(alpha(:,i),handle,ipar,dpar,ir)
end do
!$OMP END PARALLEL DO

! SOME PROCESSING IN FREQUENCY DOMAIN

! inverse transform
!$OMP PARALLEL DO
do i = 1,ny,1
    call d_backward_trig_transform(alpha(:,i),handle,ipar,dpar,ir)
end do
!$OMP END PARALLEL DO

! clean up transform
call free_trig_transform(handle,ipar,ir)

↧

The compiling problems in intel ODE solver

October 27, 2014, 9:09 pm

Hi,

I am installing the Intel ODE solver on a 64-bit Linux system, using the Intel Fortran compiler.

Following the installation manual, I've tried to run the examples by doing :

ifort -static iode_example_f.f -I../../include -L../../lib/intel64 -liode_intel64 -lm -o iode.out

However, I've got the following error message:
ld: cannot find -liode_intel64

I really have no idea why it can't find this library.

Could you give me some advice on this?

Regards,

Belmiro

↧

Linear systems with complex-number

October 28, 2014, 2:36 am

dear all,

I would like to know if there are any MKL routines to solve a sparse linear system in which some elements are complex numbers.

Thanks a lot

↧

Register Now for the Webinar: New Intel® Math Kernel Library Features Boost Performance for Tiny and Gigantic Computations

October 27, 2014, 10:14 am

Title: New Intel® Math Kernel Library Features Boost Performance for Tiny and Gigantic Computations

Date/Time: Tue, Oct 28, 2014 9:00 AM - 10:00 AM PDT

Registration Link: https://www1.gotomeeting.com/register/867043545

Description of the Webinar: Intel® Math Kernel Library (Intel® MKL) is a computational math library aimed at unleashing performance on Intel® architecture. Designed for scientific, engineering, and financial applications, it efficiently handles both very small and very large computations. Here, we'll introduce two new features in Intel MKL. The first helps programmers boost performance on a single CPU core with minimal effort when dealing with small data sets (for example, matrix multiplication for tiny matrices). The second, at the other end of the spectrum, efficiently solves large-scale sparse linear systems with tens of millions of equations on clusters. We'll focus on usage models and APIs for these new features and share relevant performance data.

Thank you,

Intel MKL Team

↧

triangular solver

October 28, 2014, 10:33 am

I am using mkl_?coosv, specifically the d variant, and compiled the program with 'ifort -openmp -mkl'. I have set 'mkl_omp_num_threads' and also 'omp_proc_bind=true'. I tested it with three different thread counts: 4, 8, and 16. I am getting the following timings: 0.36, 0.3, and 0.32. I am running it on a machine with 16 cores. Are these timings reasonable? Or is there anything else I should be doing before the runs? Thanks.

↧

Dense * Sparse matrix calculations, is there an easier way

October 28, 2014, 11:04 am

I am porting code (C, so row major) which makes use of ?gemm, ?syrk and ?syr2k calls from dense matrices to sparse matrices, and would like to know if there is a simpler way of calculating the various internal matrix products than the following:

?syrk: use ?csrmultd. I assume since this method only allows 1 based indexing the resulting dense matrix is column major, but I would like confirmation.

?syr2k: use two ?gemm calls here instead.

?gemm: This case gets fairly complicated, and it would be extremely nice if someone can tell me if there are methods / options I am overlooking which would simplify this. A and D are the dense result and multiplicand matrices, S['] is a sparse matrix which may be transposed.

A = S['] * D + A
- Use mkl_dcsrmm directly (using zero based indexing)

A = S['] * D' + A
Either
- Transpose D -> Dt
=> A = S['] * Dt + A
- use mkl_dcsrmm (using zero based indexing)
Or
- Convert S to one based indexing, forces mkl_dcsrmm to implicitly use col major C arrays (D' -> Dt)
- Calculate temp. matrix Tt = S['] * Dt via mkl_dcsrmm
- T' (row major) = Tt (col. major)
- calculate A = T' + A;

A = D * S['] + A
- Transpose equation:
-> A' = (D * S['])' + A' = S[!'] * D' + A'
Either:
- Convert S to one based indexing, forces mkl_dcsrmm to implicitly use col major C arrays (A' -> At, D' -> Dt)
=> At = S[!'] * Dt + At
-> Use mkl_dcsrmm
Or:
- Transpose A' => At, D' => Dt
=> At = S[!'] * Dt + At
- Use mkl_dcsrmm (using zero based indexing)
- Transpose At

A = D' * S['] + A
-> Transpose equation:
-> A' = (D' * S['])' + A' = S[!'] * D + A'
Either:
- Transpose A' => At
=> At = S[!'] * D + At
-> Use mkl_dcsrmm (using zero based indexing)
-> Transpose At => A'
Or:
-> Calculate temp. matrix T = S['] * D via mkl_dcsrmm
-> Calculate A = T' + A

In theory I could always store my sparse matrix as one based, and if I need to treat it as zero based add dummy rows / columns to the dense matrices to catch the additional row/column created when multiplying, which means converting between the indexing won't take any time.

One final question: Could someone confirm that it is possible to use mkl_?omatadd to calculate A = A + B without using a temp matrix? The documentation doesn't state whether the memory is allowed to overlap between input and output if no transposition is being done.
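
The two identities the cases above rely on can be checked with a small sketch: a row-major buffer read as column-major is the transpose, and (D * S)' equals S' * D'. This is plain Python on tiny dense matrices with made-up values; the "sparse" matrix is stored densely for clarity and no MKL routine is involved.

```python
def transpose(m):
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def flatten_row_major(m):
    return [x for row in m for x in row]

def read_col_major(buf, rows, cols):
    # Element (i, j) of a column-major rows x cols matrix lives at buf[i + j*rows].
    return [[buf[i + j * rows] for j in range(cols)] for i in range(rows)]

D = [[1, 2, 3],
     [4, 5, 6]]      # dense, 2 x 3
S = [[0, 7],
     [8, 0],
     [0, 9]]         # "sparse", 3 x 2, stored densely here

# 1) The row-major 2 x 3 buffer of D, read as column-major 3 x 2, is D'.
buf = flatten_row_major(D)
reinterpreted = read_col_major(buf, 3, 2)
print(reinterpreted == transpose(D))

# 2) (D * S)' == S' * D', the identity behind "transpose the equation".
lhs = transpose(matmul(D, S))
rhs = matmul(transpose(S), transpose(D))
print(lhs == rhs)
```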

↧

consecutive call of pardiso

October 28, 2014, 9:29 pm

Hi.

When solving a set of linear equations, a typical example program calls pardiso four times, with parameter phase = 11, 22, 33, and -1.

If I want to solve n linear systems, each with the same structure, I will call pardiso 4n times.

Is there a simpler way, where I initialize once, do all the calculations, and release once, so that I make fewer than 4n calls?

I guess calls with phase = 11, (22, 33, 0), (22, 33, 0), ... , 22, 33, -1 will work, but I'm not sure.

↧

Performance of matmul vs dgemm for small size matrices

October 29, 2014, 3:47 am

Hi,

My question is about improving the performance of the following line:

------------------------

MKM = MD*FA1 - MATMUL(MATMUL(MATMUL(ME,MQ),TRANSPOSE(MG)),TRANSPOSE(ME)) + MATMUL(MATMUL(MATMUL(ME,MG),VA),VR)

------------------------

this line is executed for every element within a finite element implementation and is the bottleneck according to performance wizard.

All the matrices are max 12x12 by size. I have tried using DGEMM in the following way:

------------------------

CALL DGEMM('N', 'N', 12, 3, 12, 1.0D0, ME, 12, MQ, 12, 0.0D0, MDUMMY3, 12)

CALL DGEMM('N', 'T', 12, 12, 3, 1.0D0, MDUMMY3, 12, MG, 12, 0.0D0, MDUMMY4, 12)

CALL DGEMM('N', 'T', 12, 12, 12, 1.0D0, MDUMMY4, 12, ME, 12, 0.0D0, MDUMMY5, 12)

CALL DGEMM('N', 'N', 12, 3, 12, 1.0D0, ME, 12, MG, 12, 0.0D0, MDUMMY6, 12)

CALL DGEMM('N', 'N', 12, 1, 3, 1.0D0, MDUMMY6, 12, VA, 12, 0.0D0, MDUMMY7, 12)

CALL DGEMM('N', 'N', 12, 12, 1, 1.0D0, MDUMMY7, 12, VR, 1, 0.0D0, MDUMMY8, 12)

MKM = MD*FA1 - MDUMMY5 + MDUMMY8

------------------------

however it did not provide any improvement (I think it was even a little bit slower).

I was wondering if you would know if any MKL function or setting would help to speed up this line.

Thank you very much in advance,

Murat
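
One thing that may be worth checking before tuning library calls is the association order of the products. The sketch below assumes the shapes implied by the DGEMM calls (ME 12x12, MQ and MG 12x3, VA 3x1, VR 1x12) and uses random integer test data, so the comparison is exact. It counts scalar multiplications for the original ordering and for a reassociated one that computes ME*MG once and uses ME*MQ*MG'*ME' = (ME*MQ)*(ME*MG)'. The MD*FA1 term is left out, and the helper names are hypothetical.

```python
import random

def matmul(a, b, counter):
    """Plain triple-loop product; counter[0] accumulates scalar multiplications."""
    n, k, m = len(a), len(b), len(b[0])
    counter[0] += n * k * m
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def rand(rows, cols, rng):
    # Small integers keep every product exact, so == comparisons are safe.
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
ME, MQ, MG = rand(12, 12, rng), rand(12, 3, rng), rand(12, 3, rng)
VA, VR = rand(3, 1, rng), rand(1, 12, rng)

# Original association, as in the MATMUL expression.
c_old = [0]
t1_old = matmul(matmul(matmul(ME, MQ, c_old), transpose(MG), c_old),
                transpose(ME), c_old)
t2_old = matmul(matmul(matmul(ME, MG, c_old), VA, c_old), VR, c_old)

# Reassociated: compute P = ME*MG once and reuse it in both terms.
c_new = [0]
P = matmul(ME, MG, c_new)                    # 12 x 3
Q = matmul(ME, MQ, c_new)                    # 12 x 3
t1_new = matmul(Q, transpose(P), c_new)      # (ME*MQ) * (ME*MG)'
t2_new = matmul(matmul(P, VA, c_new), VR, c_new)

print(t1_old == t1_new, t2_old == t2_new)
print("multiplications:", c_old[0], "->", c_new[0])
```

A lower multiplication count does not guarantee a shorter run time for matrices this small, where call overhead can dominate, so this is only a starting point for measurement.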

↧