There is a strange comment on Wikipedia about Intel(R) MKL

June 13, 2013, 10:17 pm

Latest and popular articles on Intel Technologies

This is a fairly generic post. I found a strange comment about Intel(R) MKL on Wikipedia; take a look at:

http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms

...
Intel MKL
The Intel Math Kernel Library, supporting the old Intel Pentium ( although there are some doubts about future support to the Pentium architecture ), Core and Itanium CPUs under Linux, Windows and Mac OS X.[9]
...

I checked the Release Notes of the MKL version 10 initial release, and this is a quote:

...
Supported processors - The following is a list of processors on which Intel(R) MKL is expected to run.

Intel(R) Core(TM) processor family
Intel(R) Xeon(R) processor family
Intel(R) Itanium(R) processor family
Intel(R) Pentium(R) 4 processor family
Intel(R) Pentium(R) III processor
Intel(R) Pentium(R) processor (300 MHz or faster)
Intel(R) Celeron(R) processor
AMD Athlon* and Opteron* processors
...

Also, the dedicated Wikipedia page for MKL needs to be updated: more general details are needed, and it would be nice to add a history of MKL.

↧

FFT scale factor usage is inconsistent

June 14, 2013, 1:27 pm

I downloaded a 30-day trial of MKL to play around with the 2D FFT and test the speed of the routines. The performance has been exceptional and I plan to purchase a license. But I ran into one issue that looks like a software bug.

I am coding in C#. If I use a simple real array input, forward transform, and backward transform, the resulting array, x_normal, contains the same values as my input array. But if I use real and imaginary array inputs, forward transform, and backward transform, the resulting array, x_normal_real, differs from the input array by the scale factor. In my case, I send an image of 1024 x 1024, so the scale factor is 1 / 1024 * 1 / 1024 = 0.0000009536743164. Is there a reason why complex 2D input is not scaled like real 2D input?

For more information, I include the relevant portions of my two code cases below.

Method 1: Real

var desc = new IntPtr();
int precision = DFTI.DOUBLE;
int forward_domain = DFTI.REAL; // DFTI.COMPLEX;
int dimension = 2;
int[] len = {rows, columns};

// Create the new DFTI descriptor
int ret = DFTI.DftiCreateDescriptor(ref desc, precision, forward_domain, dimension, len);

// Set up the scale factor
long transform_size = rows * columns;
double scale_factor = 1.0 / transform_size;
DFTI.DftiSetValue(desc, DFTI.BACKWARD_SCALE, scale_factor);

// Set up the transform parameters
DFTI.DftiSetValue(desc, DFTI.PLACEMENT, DFTI.NOT_INPLACE);
DFTI.DftiSetValue(desc, DFTI.PACKED_FORMAT, DFTI.PACK_FORMAT);

// Commit the descriptor
DFTI.DftiCommitDescriptor(desc);

// The data to be transformed
var x_normal = new double[rows * columns];
var x_transformed = new double[rows * columns];

// Initialize the data array (row-major: the row stride is the number of columns)
for (int y = 0; y < rows; y++) // actually v
    for (int x = 0; x < columns; x++) // actually u
        x_normal[y*columns + x] = ((frameImageData[y, x] > 100.0) ? 100.0 : frameImageData[y, x]) + baseline;

// Forward transform
DFTI.DftiComputeForward(desc, x_normal, x_transformed);

// Backward transform
DFTI.DftiComputeBackward(desc, x_transformed, x_normal);

DFTI.DftiFreeDescriptor(ref desc);

Method 2: Complex

var desc = new IntPtr();
int precision = DFTI.DOUBLE;
int forward_domain = DFTI.COMPLEX;
int dimension = 2;
int[] len = {rows, columns};

// Create the new DFTI descriptor
int ret = DFTI.DftiCreateDescriptor(ref desc, precision, forward_domain, dimension, len);

// Set up the scale factor
long transform_size = rows * columns;
double scale_factor = 1.0 / transform_size;
DFTI.DftiSetValue(desc, DFTI.BACKWARD_SCALE, scale_factor);

// Try floating-point and GetValue function
double backward_scale = 0.0;
DFTI.DftiGetValue(desc, DFTI.BACKWARD_SCALE, ref backward_scale);

// Set up the transform parameters
DFTI.DftiSetValue(desc, DFTI.PLACEMENT, DFTI.NOT_INPLACE);
DFTI.DftiSetValue(desc, DFTI.PACKED_FORMAT, DFTI.PACK_FORMAT);
DFTI.DftiSetValue(desc, DFTI.COMPLEX_STORAGE, DFTI.REAL_REAL);

// Commit the descriptor
DFTI.DftiCommitDescriptor(desc);

// The data to be transformed
var x_normal_real = new double[rows * columns];
var x_normal_imaginary = new double[rows * columns];
var x_transformed_real = new double[rows * columns];
var x_transformed_imaginary = new double[rows * columns];

// Initialize the data array (row-major: the row stride is the number of columns)
for (int y = 0; y < rows; y++) // actually v
    for (int x = 0; x < columns; x++) // actually u
        x_normal_real[y*columns + x] = ((frameImageData[y, x] > 100.0) ? 100.0 : frameImageData[y, x]) + baseline;

// Zero every imaginary element
for (int z = 0; z < rows * columns; z++)
    x_normal_imaginary[z] = 0.0;

// Forward transform
DFTI.DftiComputeForward(desc, x_normal_real, x_normal_imaginary, x_transformed_real, x_transformed_imaginary);

// Backward transform
DFTI.DftiComputeBackward(desc, x_transformed_real, x_transformed_imaginary, x_normal_real, x_normal_imaginary);

DFTI.DftiFreeDescriptor(ref desc);
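
The arithmetic behind the scale factor can be checked independently of MKL. The following is a minimal sketch in plain Python with a naive DFT, not the DFTI API; the names dft2 and round_trip are hypothetical helpers invented for this illustration. It only shows that an unscaled forward transform followed by an unscaled backward transform returns the input multiplied by rows * columns, and that a backward scale of 1 / (rows * columns) restores the input. It does not explain why the complex path in the C# code above behaves differently.

```python
import cmath

def dft2(a, sign, scale=1.0):
    """Naive 2D DFT of a rows x cols list of lists.

    sign=-1 gives the forward transform, sign=+1 the backward one;
    every output element is multiplied by `scale`.
    """
    rows, cols = len(a), len(a[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = 2.0 * cmath.pi * (u * y / rows + v * x / cols)
                    total += a[y][x] * cmath.exp(sign * 1j * angle)
            out[u][v] = scale * total
    return out

def round_trip(a, backward_scale):
    """Unscaled forward transform followed by a scaled backward transform."""
    return dft2(dft2(a, -1), +1, backward_scale)

# Small illustrative input (hypothetical data, 2 rows x 4 columns)
image = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0]]
n = len(image) * len(image[0])        # transform size: rows * columns

unscaled = round_trip(image, 1.0)     # elements come back multiplied by n
scaled = round_trip(image, 1.0 / n)   # elements match the input again

print(unscaled[0][1].real / image[0][1])   # close to n
print(scaled[0][1].real)                   # close to image[0][1]
```

For a 1024 x 1024 image the same reasoning gives n = 1048576, which is where the factor 1 / 1024 * 1 / 1024 in the post comes from.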

↧

Dynamically linking MKL

July 1, 2013, 5:11 pm

Hi, is it possible to dynamically link MKL? In particular, I want a drop-in replacement for BLAS and LAPACK on my system. With ATLAS I can just create symlinks named libblas.so and liblapack.so that point to the appropriate libraries, and most applications work correctly. Is this possible with any of the MKL libraries? Thanks.

↧

Compiling R with serial MKL (failed due to zdotu error?)

July 2, 2013, 6:23 am

Hi!

I'm trying to compile R 3.0.1 with composer_xe_2013.2.146 and the serial MKL. I followed the instructions from http://software.intel.com/en-us/articles/using-intel-mkl-with-r , yet at the configure step I get

checking whether double complex BLAS can be used... no

configure then goes on to compile R without linking the MKL (i.e., the resulting binary uses R's own BLAS implementation, which offers subpar performance).

Searching around the web, it appears that I am far from the only one who has run into this issue, yet I was unable to find any solution for it. (I found an old discussion about the topic here: http://software.intel.com/en-us/forums/topic/326016 ... yet it didn't include any solution steps, and in fact I believe the user ended up with an R version that didn't even link with the MKL.)

I investigated the configure-failure, and it stems from the following failed test:

A Fortran file (the attached conftestf.f) calls zdotu and checks the result against a manual dot product. It sets a failure flag if the zdotu result and the manual dot product don't match. This routine is then called from the C file conftest.c (I assume to test Fortran<>C interfacing or something), which returns the failure flag as its return code.

I've attached the files to this post (slightly modified so the failure flag gets printed to stdout). I compile the files with:

export LIBDIR="/apps/intel/compiler/composer_xe_2013.2.146/mkl/lib/intel64/:/apps/intel/compiler/composer_xe_2013.2.146/compiler/lib/intel64/"
ifort -c conftestf.f -L$LIBDIR -lmkl_gf_lp64 -lmkl_sequential -lmkl_core
icc -c conftest.c -L$LIBDIR -lmkl_gf_lp64 -lmkl_sequential -lmkl_core
icc conftestf.o conftest.o -o conftest -L$LIBDIR -lmkl_gf_lp64 -lmkl_sequential -lmkl_core
./conftest

The call to ./conftest will print out the failure flag, which on my machine will always output 1. This makes the configure-test fail and leads R to ignore the MKL I want to link it with.
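
The logic of such a check can be sketched in plain Python. This is only an illustration of the idea described above (compare a library dot product against a manual one and raise a flag on mismatch), not the actual contents of conftestf.f or conftest.c. The function names, the tolerance, and the test vectors are all hypothetical. It relies on the fact that zdotu is the unconjugated complex dot product, the sum of x(i)*y(i).

```python
def manual_dot(x, y):
    """Unconjugated complex dot product, accumulated element by element."""
    total = 0j
    for a, b in zip(x, y):
        total += a * b
    return total

def failure_flag(dot_impl, x, y, tol=1e-10):
    """Return 0 if dot_impl agrees with the manual dot product, 1 otherwise."""
    return 0 if abs(dot_impl(x, y) - manual_dot(x, y)) <= tol else 1

def good_zdotu(x, y):
    """Stand-in for a correctly working zdotu."""
    return sum(a * b for a, b in zip(x, y))

def broken_zdotu(x, y):
    """Stand-in for a miscommunicated result: conjugates x, like zdotc would."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

# Hypothetical test vectors
x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1 + 4j]

print(failure_flag(good_zdotu, x, y))    # 0
print(failure_flag(broken_zdotu, x, y))  # 1
```

A flag of 1 from the real test therefore means only that the value returned through the Fortran call did not match the manual sum; the sketch makes no claim about why that happens with a particular link line.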

For full reference, this is how I call ./configure:

/apps/intel/compiler/composer_xe_2013.2.146/bin/compilervars_global.sh intel64
/apps/intel/compiler/composer_xe_2013.2.146/mkl/bin/vars/mklvars.sh intel64
export CC='icc -std=c99 '
export F77='ifort '
export CXX='icpc '
export FC='ifort '
export CPPFLAGS="-O3 -DNDEBUG -g -march=native "
export CFLAGS=$CPPFLAGS
export FCFLAGS=$CPPFLAGS
export FFLAGS=$CPPFLAGS
export MKL_LIB_PATH=/apps/intel/compiler/composer_xe_2013.2.146/mkl/lib/intel64:/apps/intel/compiler/composer_xe_2013.2.146/compiler/lib/intel64/
export LD_LIBRARY_PATH=$MKL_LIB_PATH
export MKL=" -L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_sequential -lmkl_core "
./configure --with-blas="$MKL" --enable-threads=posix --with-lapack --enable-memory-profiling --enable-R-shlib

Attachment	Size
Download conftest.c	847 bytes
Download conftestf.f	415 bytes

↧

MKL FFT library performance varies from run to run by almost 100%

July 2, 2013, 12:05 pm

Hi there,

I'm trying to use the MKL 1D FFT routines; for example, I run a batch of 1M single-precision FFTs of size 1K.

If I run just the library call, the performance is very steady and very fast: about 0.3 seconds on my machine.

However, if I include the library call in my application, which is multi-threaded, the time for the library call varies between 0.3 and 0.6 seconds, with 0.5 seconds occurring most often.

Has anyone else experienced this? Am I making a mistake, or is there a way to achieve good, steady performance?

Thanks in advance!

↧

MKL Library Capability

July 5, 2013, 12:47 pm

We are developing a finite element analysis application involving matrix calculations. We are hoping to use MKL from Intel for our requirements.
We need to calculate

• Matrix Inverse
• Matrix Multiplication
• Matrix solving

Input matrix dimensions will be on the order of 10^8.

We would like to know whether the MKL library will be able to handle such inputs and calculations.
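
A quick back-of-the-envelope calculation helps frame the question. The sketch below is plain Python arithmetic, independent of MKL, and it assumes dense storage in double precision (8 bytes per element); the function name is hypothetical. Since the post does not say whether 10^8 is the matrix order or the total number of elements, both readings are shown. Sparse storage, which finite element matrices normally allow, would need far less memory than the dense figures.

```python
def dense_matrix_bytes(order, bytes_per_element=8):
    """Memory needed to store a dense order x order matrix."""
    return order * order * bytes_per_element

# Reading 1: the matrix order is 10^8 (10^8 rows and 10^8 columns)
print(dense_matrix_bytes(10**8))   # 80000000000000000 bytes, i.e. 80 petabytes

# Reading 2: the matrix has 10^8 elements in total (order 10^4)
print(dense_matrix_bytes(10**4))   # 800000000 bytes, i.e. 800 megabytes
```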

↧

Segfault with 8 threads linking against MKL 11.0.3

July 8, 2013, 4:46 am

Dear all,

I am experiencing a segfault on Linux with an application of mine when I link it against MKL shipped with composer_xe_2013.3.163 (Update 3 - March 2013), which should be 11.0.3 according to http://software.intel.com/en-us/articles/which-version-of-the-intel-ipp-....

My application is multi-threaded and it uses pthreads. The segfault happens in cblas_dgemm when I spawn 8 threads: runs with 1, 2 or 4 threads work fine. I am linking against libmkl_intel_lp64.so, libmkl_core.so, libmkl_sequential.so. I have the following environment:

MKL_DISABLE_FAST_MM=1
MKL_SERIAL=YES
MKL_NUM_THREADS=1

The same binary compiled with composer_xe_2013.3.163 runs perfectly on 8 threads if I point LD_LIBRARY_PATH to the MKL libraries shipped with Intel Compilers version 11.1.069. So it really seems to be a version-specific issue.

I have tried to set:

ulimit -s unlimited
MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_ALL=1"
OMP_NUM_THREADS=1
OMP_DYNAMIC=FALSE
MKL_DYNAMIC=FALSE
OMP_NESTED=FALSE

but it makes no difference. Here I attach the valgrind trace:

==20175== Thread 3:
==20175== Invalid read of size 8
==20175==    at 0x53DB0DA: mkl_serv_malloc (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so)
==20175==    by 0x860C01B: mkl_blas_mc_dgemm_get_bufs (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175==    by 0x8684768: mkl_blas_mc_xdgemm_par (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175==    by 0x8683B4B: mkl_blas_mc_xdgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175==    by 0x53ED8DB: mkl_blas_xdgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so)
==20175==    by 0x662A7CE: mkl_blas_dgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_sequential.so)
==20175==    by 0x4CF0AA8: DGEMM (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so)
==20175==    by 0x4D02452: cblas_dgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so)
==20175==    by 0x455204: pred_y_values (in /mnt/XI/home/toscopa1/open3dtools/bin/open3dqsar)
==20175==    by 0x4689F7: lmo_cv_thread (in /mnt/XI/home/toscopa1/open3dtools/bin/open3dqsar)
==20175==    by 0x3B0A00683C: start_thread (in /lib64/libpthread-2.5.so)
==20175==    by 0x3B094D4F8C: clone (in /lib64/libc-2.5.so)
==20175== Address 0xd0 is not stack'd, malloc'd or (recently) free'd
==20175==
==20175==
==20175== Process terminating with default action of signal 11 (SIGSEGV)
==20175== Access not within mapped region at address 0xD0
==20175==    at 0x53DB0DA: mkl_serv_malloc (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so)
==20175==    by 0x860C01B: mkl_blas_mc_dgemm_get_bufs (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175==    by 0x8684768: mkl_blas_mc_xdgemm_par (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175==    by 0x8683B4B: mkl_blas_mc_xdgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175==    by 0x53ED8DB: mkl_blas_xdgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so)
==20175==    by 0x662A7CE: mkl_blas_dgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_sequential.so)
==20175==    by 0x4CF0AA8: DGEMM (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so)
==20175==    by 0x4D02452: cblas_dgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so)
==20175==    by 0x455204: pred_y_values (in /mnt/XI/home/toscopa1/open3dtools/bin/open3dqsar)
==20175==    by 0x4689F7: lmo_cv_thread (in /mnt/XI/home/toscopa1/open3dtools/bin/open3dqsar)
==20175==    by 0x3B0A00683C: start_thread (in /lib64/libpthread-2.5.so)
==20175==    by 0x3B094D4F8C: clone (in /lib64/libc-2.5.so)

I consistently get this error on "Address 0xd0".
As I mentioned, my program works perfectly when linked against older Intel MKL versions, as well as ATLAS or Sun Performance Library.
I would be very glad if you could indicate a way to solve my problem.

Thanks, best regards,
Paolo

↧

Pardiso gives error when parallel factorization control set to 1

July 9, 2013, 12:26 am

Hi,

I attached a test program that gives error code '-1' after the PARDISO symbolic analysis stage if iparm[23] is set to 1. The error code is '0' if I set iparm[23] to '0'.

I am using Linux and MKL from composer_xe_2013.4.183.

Could anyone help me find out what the problem is here?

Thanks & Regards,

Xin

Attachment	Size
Download pardisoanalysisparallelerror.7z	30.64 MB

↧

Using MKL libs in Octave to run on Xeon PHI

July 9, 2013, 6:00 am

Hello,

I followed the article http://software.intel.com/en-us/articles/using-intel-mkl-in-gnu-octave to compile and link the MKL libraries with Octave.

After installing Octave I've checked with "ldd /usr/local/bin/octave" that all mkl libraries are correctly linked.

- Then I exported the environment variable that enables MIC automatic offload, following the documentation section "Setting Environment Variables for Automatic Offload":
export MKL_MIC_ENABLE=1

- Finally, I started Octave and ran a simple matrix multiplication (3000x3000 matrices, using DGEMM from the MKL BLAS). The micsmc tool shows that no coprocessor core is working, so automatic offload is not taking place.

To make sure that Octave uses the MKL dgemm function, I debugged the execution of a simple matrix multiplication. As expected, the function is called correctly:

Breakpoint 2, 0x00007ffff1d67980 in dgemm_ () from /opt/intel/parallel_studio_xe_2013_update3/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so

But all the work is done on the host processor, and the Xeon Phi coprocessor does nothing.

I performed one more test using an example dgemm program:

I took the dgemm example included in <install-dir>/Samples/en-US/mkl/tutorials.zip -> dgemm_example.c and modified the code to call the dgemm function instead of cblas_dgemm. After compiling it, linking it with the MKL libraries, and running a first test, I debugged the application with the environment variable MKL_MIC_ENABLE set to 1 and saw the following line:

0x00007ffff77ad980 in dgemm_ () from /opt/intel/parallel_studio_xe_2013_update3/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so

So the simple program dgemm_example.c calls exactly the same MKL function from the libmkl_intel_lp64.so library, and the execution is performed on the coprocessor with no problem!

Can you help me understand why automatic offload is not working in Octave?

Thanks for the help.

↧

Problem in calling MKL function of dgesvd

July 9, 2013, 2:24 pm

hi,

I followed the Intel manual to call the MKL routine dgesvd in my code as

call dgesvd( 'S', 'N', m, n, a, lda, s, u, ldu, vt, ldvt, work, lwork, info)

If I set the compiler option /iface:cref, the code works well. However, if the compiler options are set to /iface:cref /iface:mixed_str_len_arg, the call to dgesvd fails with "Access violation reading location 0x0000000000000001". Since I have to keep the latter compiler options, how can I resolve the problem?

Thanks in advance for any suggestion!

Jing

↧

Segmentation Fault in MKL PBLAS/ScaLAPACK

July 10, 2013, 9:27 am

Hi,

I am trying to use MKL PBLAS/ScaLAPACK routine as proposed in the following link: http://software.intel.com/en-us/articles/using-cluster-mkl-pblasscalapack-fortran-routine-in-your-c-program. The source code (downloadable from the same site) is also attached to this post.

I am using the Intel® Composer 2011.2.137, compiler icc 12.0.2 20110112, and OpenMPI 1.4.3.

Following the Intel® Math Kernel Library Link Line Advisor, I am compiling with

mpicc -w -o pdgemv pdgemv.c -I$(MKLROOT)/include -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -limf -lm -openmp -DMKL_ILP64

Compiling is fine, but running the program via

mpirun -n 4 ./pdgemv

causes the following segmentation fault:

[node266:15074] *** Process received signal ***
[node266:15074] Signal: Segmentation fault (11)
[node266:15074] Signal code: Address not mapped (1)
[node266:15074] Failing at address: 0x44000098
[node266:15074] [ 0] /lib64/libpthread.so.0 [0x3f8420eb10]
[node266:15074] [ 1] /openmpi/1.4.3/intel--co-2011.2.137--binary/lib/libmpi.so.0(MPI_Comm_size+0x5a) [0x2abdef96c17a]
[node266:15074] [ 2] /intel/co-2011.2.137/binary/mkl/lib/intel64/libmkl_blacs_intelmpi_ilp64.so(ilp64_Cblacs_pinfo+0x92) [0x2abdef3be4a2]
[node266:15074] *** End of error message ***

I don't understand what is wrong and hope someone can help me. Thanks and kind regards.

Massi

Attachment	Size
Download pdgemv.c	2.23 KB

↧

Intel® Math Kernel Library 11.0 update 5 is now available

June 24, 2013, 3:43 pm

Intel® Math Kernel Library (Intel® MKL) is a highly optimized, extensively threaded, and thread-safe library of mathematical functions for engineering, scientific, and financial applications that require maximum performance. The Intel MKL 11.0 Update 5 packages are now ready for download. Intel MKL is available as a stand-alone product and as a part of the Intel® Parallel Studio XE 2013, Intel® C++ Studio XE 2013, Intel® Composer XE 2013, Intel® Fortran Composer XE 2013, and Intel® C++ Composer XE 2013. Please visit the Intel® Software Evaluation Center to evaluate this product.

What's New in Intel® MKL 11.0 Update 5 : Release Notes

↧

2D FFT on 4D data real to complex and backward

July 11, 2013, 1:12 pm

Could I ask for help with a 2D transform on 4D data? I have an array theta(x,y,dz,z) and need to compute the FFT over x and y. I have written the following test code for this:

real :: Xin(x,y), Xout(x*y)
complex :: Yin(x,y), Yout(x*y)

dxT = x/2 + 1
L(1) = x
L(2) = y
strides_out(1) = 0
strides_out(2) = 1
strides_out(3) = dxT

do i=1,dz
do j = 1,z
    Xin = theta(:,:,i,j)
    StatusExp = DftiCreateDescriptor( FFT_HANDLE, DFTI_SINGLE,DFTI_REAL, 2, L )
    StatusExp = DftiSetValue(FFT_HANDLE,DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX)
    StatusExp = DftiSetValue( FFT_HANDLE, DFTI_PLACEMENT, DFTI_NOT_INPLACE )
    StatusExp = DftiSetValue(FFT_HANDLE, DFTI_OUTPUT_STRIDES, strides_out)
    StatusExp = DftiCommitDescriptor(FFT_HANDLE)
    StatusExp = DftiComputeForward(FFT_HANDLE, Xin(:,1), Yout)
    do k=1,dxT
      do l=1,y
   thetaFFT(k,l) = Yout(k + (l-1)*dxT)
   if(k>1) thetaFFT(x+2-k,l) = conjg(thetaFFT(k,l))
      enddo
    enddo
    StatusExp = DftiFreeDescriptor(FFT_HANDLE)
enddo
enddo

!***************************************************************************
! Backward

L(1) = x
L(2) = y
strides_out(1) = 0
strides_out(2) = 1
strides_out(3) = x

do i=1,dz
do j = 1,z
    Yin = thetaFFT(:,:,i,j)
    StatusExp = DftiCreateDescriptor( FFT_HANDLE, DFTI_SINGLE,DFTI_REAL, 2, L )
    StatusExp = DftiSetValue(FFT_HANDLE,DFTI_REAL_STORAGE, DFTI_REAL_REAL)
    StatusExp = DftiSetValue( FFT_HANDLE, DFTI_PLACEMENT, DFTI_NOT_INPLACE )
    StatusExp = DftiSetValue(FFT_HANDLE, DFTI_OUTPUT_STRIDES, strides_out)
    StatusExp = DftiCommitDescriptor(FFT_HANDLE)
    StatusExp = DftiComputeBackward(FFT_HANDLE, Yin(:,1), Xout)
    do k=1,x
      do l=1,y
   theta(k,l) = Xout(ix + (l-1)*dxT)/(x*y)
      enddo
    enddo
    StatusExp = DftiFreeDescriptor(FFT_HANDLE)
enddo
enddo

Why don't I get my test array theta back when running this routine?
I set theta = 1., run it through the forward and backward FFT, and get back a theta containing random numbers around 2. What's wrong?

Many thanks for any idea
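
One property that may be worth checking when rebuilding a full spectrum from the half that a real-to-complex transform returns: for real 2D input, the spectrum is conjugate-even in both indices together, X(kx, ky) = conj(X((nx - kx) mod nx, (ny - ky) mod ny)). The sketch below is plain Python with a naive DFT, not MKL, and all names and data in it are hypothetical. It rebuilds the full spectrum from rows kx = 0 .. nx/2 in two ways: mirroring both indices, and mirroring only the first index (which is what the loop over k and l in the forward section appears to do). Only the first variant reproduces the full transform.

```python
import cmath

def dft2(a):
    """Naive forward 2D DFT of an nx x ny list of lists (real input)."""
    nx, ny = len(a), len(a[0])
    return [[sum(a[x][y] * cmath.exp(-2j * cmath.pi * (kx * x / nx + ky * y / ny))
                 for x in range(nx) for y in range(ny))
             for ky in range(ny)]
            for kx in range(nx)]

def expand(half, nx, ny, mirror_ky):
    """Rebuild an nx x ny spectrum from rows kx = 0 .. nx//2.

    mirror_ky=True uses the 2D conjugate-even symmetry (both indices flipped);
    mirror_ky=False flips only kx, for comparison.
    """
    full = [[0j] * ny for _ in range(nx)]
    for kx in range(nx):
        for ky in range(ny):
            if kx <= nx // 2:
                full[kx][ky] = half[kx][ky]
            else:
                src_ky = (ny - ky) % ny if mirror_ky else ky
                full[kx][ky] = half[nx - kx][src_ky].conjugate()
    return full

def max_error(p, q):
    """Largest absolute element-wise difference between two 2D lists."""
    return max(abs(u - v) for rp, rq in zip(p, q) for u, v in zip(rp, rq))

# Hypothetical real test data, nx = 4 by ny = 3
data = [[1.0, 2.0, 0.0],
        [0.0, 3.0, 1.0],
        [4.0, 0.0, 2.0],
        [1.0, 1.0, 5.0]]
nx, ny = 4, 3

reference = dft2(data)
half = reference[: nx // 2 + 1]

both = expand(half, nx, ny, True)
first_only = expand(half, nx, ny, False)

print(max_error(reference, both))        # rounding-level error
print(max_error(reference, first_only))  # clearly nonzero
```

For a constant array such as theta = 1. the two variants coincide, because only the zero-frequency element is nonzero, so this particular difference would not show up with that test input.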

↧

matdescra(1) = 'D' in mkl_dcsrmv doesn't work?

July 12, 2013, 7:15 am

Hi,

I'm trying to write the equivalent of this MATLAB code in F#, with ILP64 MKL:

S = temp + (I - diag(diag(temp)))

where S, temp are matrices in Rn, I is the identity matrix in Rn, and diag(diag(temp)) is the matrix containing only the main diagonal of temp

In other words, given the matrix

[ a b c
d e f
g h i ]

I want the result

[1 b c
d 1 f
g h 1 ]

for any square sparse matrix.
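
As a reference for the intended result, here is a minimal dense sketch in plain Python (not F#, not MKL, and not sparse). Both function names are hypothetical. It evaluates S = temp + (I - diag(diag(temp))) element by element and compares it with simply overwriting the diagonal with ones.

```python
def unit_diagonal_formula(temp):
    """Evaluate S = temp + (I - diag(diag(temp))) element by element."""
    n = len(temp)
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            identity_ij = 1.0 if i == j else 0.0
            diag_ij = temp[i][j] if i == j else 0.0
            row.append(temp[i][j] + (identity_ij - diag_ij))
        result.append(row)
    return result

def unit_diagonal_direct(temp):
    """Copy temp and overwrite its main diagonal with ones."""
    n = len(temp)
    return [[1.0 if i == j else temp[i][j] for j in range(n)] for i in range(n)]

temp = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]

print(unit_diagonal_formula(temp))
# [[1.0, 2.0, 3.0], [4.0, 1.0, 6.0], [7.0, 8.0, 1.0]]
```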

I think I can perform part of this operation using mkl_dcsrmv, which computes

y := alpha*A*x + beta*y

where x and y are the values array of the identity matrix, alpha is -1.0 and beta is 1.0. According to the manual, I can operate on the diagonal by setting the first element of matdescra to 'D'.

Given the matrix

[1 2 3
4 5 6
7 8 9 ]

My calculation correctly returns the vector [-5 -14 -23] for a General matrix, but when I set matdescra(1) to 'D', I get [1 1 1] where I expect [0 -4 -8].

It's as though setting matdescra(1) to 'D' returns, not the main diagonal of my matrix, but [0 0 0].
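
The expected numbers can be reproduced with a small reference computation. The sketch below is plain Python, not MKL. The function csrmv_reference and its diagonal_only flag are hypothetical; the flag models the behavior expected in this post (use only the main diagonal of A) and is not a statement about what matdescra(1) = 'D' actually does inside mkl_dcsrmv. The arrays use the same one-based CSR layout as the test case.

```python
def csrmv_reference(alpha, vals, indx, pntrb, pntre, x, beta, y, diagonal_only=False):
    """Compute alpha*A*x + beta*y for A in four-array CSR form, one-based indices.

    With diagonal_only=True, off-diagonal entries of A are skipped.
    """
    result = []
    for row in range(len(y)):
        acc = 0.0
        # pntrb/pntre hold one-based start and one-past-end positions of each row
        for pos in range(pntrb[row] - 1, pntre[row] - 1):
            col = indx[pos] - 1
            if diagonal_only and col != row:
                continue
            acc += vals[pos] * x[col]
        result.append(alpha * acc + beta * y[row])
    return result

vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
indx = [1, 2, 3, 1, 2, 3, 1, 2, 3]
pntrb = [1, 4, 7]
pntre = [4, 7, 10]
x = [1.0, 1.0, 1.0]
y = [1.0, 1.0, 1.0]

general = csrmv_reference(-1.0, vals, indx, pntrb, pntre, x, 1.0, y)
diagonal = csrmv_reference(-1.0, vals, indx, pntrb, pntre, x, 1.0, y, diagonal_only=True)

print(general)   # [-5.0, -14.0, -23.0]
print(diagonal)  # [0.0, -4.0, -8.0]
```

An unchanged y of [1 1 1] is what the formula gives when A contributes nothing at all, which matches the observation that the call behaves as if the diagonal were [0 0 0].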

I include some F# code that reproduces the issue, error handling and memory cleanup elided.

let test_dcsrmv (structure:char) =
    let p x = PinnedArray.of_array x

    let mutable transOption = 'N'
    let mutable m = 3
    let mutable k = 3
    let mutable alpha = -1.0
    let mutable beta = 1.0
    let madescra = [| structure; '_'; 'N'; 'F'|] |> PinnedArray.of_array

    let indx = p [| 1; 2; 3;
                    1; 2; 3;
                    1; 2; 3; |]

    let vals = p [| 1.0; 2.0; 3.0;
                    4.0; 5.0; 6.0;
                    7.0; 8.0; 9.0; |]

    let pntrb = p [|1; 4; 7;|]
    let pntre = p [|4; 7; 10|]

    let x = p [| 1.0; 1.0; 1.0 |]
    let y = [| 1.0; 1.0; 1.0 |]
    let y_handle = p y
    mkl_dcsrmv(&&transOption, &&m, &&k, &&alpha, madescra.Ptr, vals.Ptr, indx.Ptr, pntrb.Ptr, pntre.Ptr, x.Ptr, &&beta, y_handle.Ptr)

let general = test_dcsrmv 'G' // returns [-5 -14 -23]
let diagonal = test_dcsrmv 'D' // returns [1 1 1]

↧

Intel MKL + ATL/COM DLL regsvr32 error in debug build, but not in release...

July 14, 2013, 9:43 am

Hello all,

I have a strange problem, similar to the one from this post :

http://software.intel.com/en-us/forums/topic/283594

I am running Windows 8 Pro 64-bit, using Visual Studio 2010 Ultimate, and compiling a 32-bit ATL/COM DLL.

MKL was linked to my Visual Studio 2010 project through the MKL integration into Visual Studio 2010: I just went to the project properties and chose (for all configurations) to use the sequential MKL library. (I wasn't even able to find where in my project the linking explicitly appears, because it does not show up in the usual places I used to modify to link MKL in my previous project, in the "pre-integration" days...)

In debug mode, when I build for the first time (or when I do a rebuild), I get the following output:

"C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\Microsoft.CppCommon.targets(744,5): warning MSB3073: The command "regsvr32 /s "C:\CODAGE\win8\mvs2010\MyBS\Toto\Debug\Toto.dll"" exited with code 3.

C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\Microsoft.CppCommon.targets(756,5): error MSB8011: Failed to register output. Please try enabling Per-user Redirection or register the component from a command prompt with elevated permissions."

When I build a second time (or do a simple build), I get no errors like the ones above, but the problem remains: the DLL cannot even be referenced in VBA, which indicates that the debug DLL is completely f***ed up...

So I first tried to run the "regsvr32 /s" command on my debug DLL with the highest permissions, which failed, and then I also tried enabling per-user redirection, with the same result.

Then I searched on Google and found the link I mentioned at the beginning of my post:

http://software.intel.com/en-us/forums/topic/283594

I followed the advice given there and renamed the 64-bit include, lib and bin directories, without success: still the same error in the debug build. Then I found this Intel Fortran related post:

http://software.intel.com/en-us/forums/topic/285673

where the advice was to run the problematic DLL through Dependency Walker. I did, and got this log message:

"Error: At least one module has an unresolved import due to a missing export function in an implicitly dependent module.
Error: Modules with different CPU types were found.
Warning: At least one delay-load dependency module was not found.
Warning: At least one module has an unresolved import due to a missing export function in a delay-load dependent module."

I searched everywhere in Dependency Walker, but did not find anything... I should mention that I have exactly the same issue when I build in release mode.

Help would be greatly appreciated !!

Thx a lot !

LvM

PS : I only have this related to the MKL in the code of the project compiling my dll :

#include "mkl_vsl_functions.h"
#include "mkl_vsl.h"

↧

Microsoft Visual Studio 2012 Compatibility?

July 16, 2013, 6:01 am

Hi,

Is MSVS2012 a supported environment for MKL 11.0 update 5? I am trying to get MKL 11.0 update 5 working with Visual Studio 2012. To get my environment variables setup, I have followed the setup instructions here:

http://software.intel.com/en-us/articles/intel-mkl-110-getting-started/#Environment

and I started following the instructions here:

http://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-compiling-and-linking-with-microsoft-visual-cc

However, when I right-click on a C++ project in Visual Studio 2012, I do not have the "Intel C++ Composer XE 2013 -> Select Build Components" options. MSVS2012 does have the option "HELP -> Intel C++ Composer XE 2013 -> Intel C++ Composer XE Help", but I cannot find any of the "Select Build Components" options that the tutorials use. Any help would be appreciated.

Cheers,

Matt

Attachment	Size
Download helpcorrect.png	37.95 KB
Download noselectoption.png	46.82 KB

↧

MKL 11.0 update 1 DGEMM, AVX2 instruction set

July 16, 2013, 9:46 pm

* OS: Windows 7

* CPU: i7 4770k Haswell

* MKL: 11.0 Update 1

The program crashes for an unknown reason when calling the DGEMM function with the transpose parameter set.

So DGEMM ( "N",...) is OK but DGEMM ( "T",...) fails.

I suspect a bug in the AVX2 code path used on the Haswell CPU, because the code runs fine with the AVX or SSE2 instruction set:

mkl_cbwr_set(MKL_CBWR_AVX); // code path change

Is there any suitable way to solve this problem?

↧

Yet another tale of LAPACK95 linker woe ("undefined reference")

July 18, 2013, 12:36 am

Once again, Eclipse has died wiping out the magical settings I used to link my Fortran code against the LAPACK95 libraries. I've resuscitated Eclipse but I'm back where I was 8 weeks ago poking futilely at the Link Advisor and Eclipse's compile and link options. (i.e. this is a linker options problem, not an Eclipse problem.)

The environment is Linux Mint, 64-bit. Before running Eclipse, environment variables are set with:

source /opt/intel/bin/compilervars.sh intel64
source /opt/intel/mkl/bin/mklvars.sh intel64 mod

This sets MKLROOT to

/opt/intel/composer_xe_2013.2.146/mkl

and LD_LIBRARY_PATH to

/opt/intel/composer_xe_2013.2.146/compiler/lib/intel64:/opt/intel/composer_xe_2013.2.146/mkl/lib/intel64:/opt/intel/composer_xe_2013.2.146/compiler/lib/intel64:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/composer_xe_2013.2.146/mpirt/lib/intel64:/opt/intel/composer_xe_2013.2.146/ipp/../compiler/lib/intel64:/opt/intel/composer_xe_2013.2.146/ipp/lib/intel64:/opt/intel/composer_xe_2013.2.146/compiler/lib/intel64:/opt/intel/composer_xe_2013.2.146/mkl/lib/intel64:/opt/intel/composer_xe_2013.2.146/tbb/lib/intel64

This is not a new configuration; it's been a stable working setup for months. Well, except for Eclipse occasionally starting on fire.

I used the Link Advisor to find the options to create a dynamically-linked binary with 32-bit integers (lp64) for the Intel64 architecture.

The last relevant bit of console output before the build process dies is:

Building file: ../src/fate.f90
Invoking: Intel(R) Intel(R) 64 Fortran Compiler
ifort -g -O0 -fpp -DDEBUG -warn declarations -warn unused -warn uncalled -ftrapuv -save -fpe0 -fp-model source -traceback -c -pg -Ddebug -stand -heap-arrays -I/opt/intel/composer_xe_2013.2.146/mkl/include/intel64/lp64 -I/opt/intel/composer_xe_2013.2.146/mkl/include -axSSE4.1 -o "src/fate.o" "../src/fate.f90"
Finished building: ../src/fate.f90

Building target: F2064a
Invoking: Intel(R) Fortran Linker
ifort -L/opt/intel/composer_xe_2013.2.146/mkl/lib/intel64 /opt/intel/composer_xe_2013.2.146/mkl/lib/intel64/libmkl_blas95_lp64.a /opt/intel/composer_xe_2013.2.146/mkl/lib/intel64/libmkl_lapack95_lp64.a -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -pg -o "F2064a" <giant redacted list of object files>
./src/m_numcon.o: In function `dsolv':
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1491: undefined reference to `dgesvx1_mkl95_'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1506: undefined reference to `dgesv1_mkl95_'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1521: undefined reference to `dgetrf_mkl95_'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1527: undefined reference to `dgetrs1_mkl95_'
make: *** [F2064a] Error 1

I get the same results whether the options are put before the source files or after.

I tried again, this time using the suggested options for creating a statically-linked binary.

Similarly, the last relevant bit of console output before the build process dies is:

Building target: F2064a
Invoking: Intel(R) Fortran Linker
ifort /opt/intel/composer_xe_2013.2.146/mkl/lib/intel64/libmkl_blas95_lp64.a /opt/intel/composer_xe_2013.2.146/mkl/lib/intel64/libmkl_lapack95_lp64.a -Wl,--start-group /opt/intel/composer_xe_2013.2.146/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2013.2.146/mkl/lib/intel64/libmkl_sequential.a /opt/intel/composer_xe_2013.2.146/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -pg -o "F2064a" <same giant list of redacted object files>
./src/m_numcon.o: In function `tridag':
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:695: undefined reference to `dgtsv_'
./src/m_numcon.o: In function `dsolv_sparse':
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1230: undefined reference to `dss_create_'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1239: undefined reference to `dss_define_structure_'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1249: undefined reference to `dss_reorder_'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1258: undefined reference to `dss_factor_real_d_'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1267: undefined reference to `dss_solve_real_d__'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1315: undefined reference to `dss_delete_'
./src/m_numcon.o: In function `dsolv':
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1462: undefined reference to `dgesvx_'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1477: undefined reference to `dgesv_'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1491: undefined reference to `dgesvx1_mkl95_'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1506: undefined reference to `dgesv1_mkl95_'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1521: undefined reference to `dgetrf_mkl95_'
/home/apthorpe/workspace/F2064a/Debug_Intel64/../src/m_numcon.f90:1527: undefined reference to `dgetrs1_mkl95_'
make: *** [F2064a] Error 1

I know what I'm missing is small, but I cannot for the life of me remember what byzantine and unmemorable bit of configuration needs to go where, and it's nothing that a cursory search could find. I'm at the end of my rope with this, so any help would be greatly appreciated.
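
One detail both link lines above share is that the MKL archives appear before the object files. GNU ld processes its inputs left to right and pulls a member out of a static archive only if that member defines a symbol that is still undefined at that point, so an archive listed ahead of the objects that need it can end up contributing nothing. Whether that is what happens here is an assumption, since reordering reportedly made no difference. The toy model below only illustrates the ordering rule; the function name link and the data layout are hypothetical.

```python
def link(items):
    """Toy single-pass linker: return the symbols left undefined.

    items is a list of ("object", (defined, undefined)) or
    ("archive", [(defined, undefined), ...]) entries, in command-line order.
    """
    defined, undefined = set(), set()

    def add_object(defs, refs):
        defined.update(defs)
        undefined.update(refs)
        undefined.difference_update(defined)

    for kind, payload in items:
        if kind == "object":
            add_object(*payload)
            continue
        # Archive: pull only members that satisfy a currently undefined symbol,
        # repeating until no further member is needed. The archive is not
        # revisited once the linker has moved past it.
        members = list(payload)
        pulled = True
        while pulled:
            pulled = False
            for member in list(members):
                defs, refs = member
                if defs & undefined:
                    add_object(defs, refs)
                    members.remove(member)
                    pulled = True
    return undefined


m_numcon = ("object", ({"dsolv"}, {"dgesv_"}))
mkl_archive = ("archive", [({"dgesv_"}, set())])
libs_first = link([mkl_archive, m_numcon])
libs_last = link([m_numcon, mkl_archive])
```

In this model libs_first still contains dgesv_ while libs_last is empty, which is why link lines conventionally list object files first and libraries after them.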

To summarize:

1) A solution exists. This was working fine until a few hours ago.

2) This is a problem with linker options, not with the IDE.

3) I've tried using the Link Advisor suggestions for several different methods of linking against LAPACK95 and none of them work.

Thanks much,

-- Bob

Aside: I'd have hoped that by now dev tools could scan the filesystem and sensibly sort out their dependencies; dynamic languages already do a pretty good job of this. It doesn't seem that compilers or linkers have gotten any smarter in the past 25 years or so, at least when it comes to chasing down libraries or include files. There is a reasonably small number of places these files can be, and the compiler/linker should have a fair idea of which combinations of libraries are compatible with each other, especially those supplied by a vendor. A naive scan of my filesystem shows roughly 7200 shared objects and 500 static libraries. Is it really that Herculean an effort to scan the files with find, nm, etc. and suggest compiler options, or provide better guidance than "couldn't find it, didn't look too hard, gave up"? I'm not arguing that we should be ignorant of our toolset, but I'd really rather be working on my code than playing "bring me a rock" with the linker (again). Or hosing down the charred remains of my IDE.
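
The kind of scan described in the aside can be sketched in a few lines. The following assumes GNU nm is on the PATH and that its default output has the form "address type name" for defined symbols, with undefined symbols shown as "U name" and no address. The function names are hypothetical, and the library paths in the usage note are only examples.

```python
import subprocess


def defined_symbols(nm_output):
    """Return the names that nm output lists as defined."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols appear as "<address> <type> <name>"; undefined ones
        # carry no address and archive member headers are a single field.
        if len(fields) == 3 and fields[1] != "U":
            names.add(fields[2])
    return names


def run_nm(path):
    """Run nm on one library or object file and return its text output."""
    result = subprocess.run(["nm", path], capture_output=True, text=True, check=False)
    return result.stdout


def libraries_defining(symbol, paths, nm=run_nm):
    """Return the paths whose nm output defines the given symbol."""
    return [p for p in paths if symbol in defined_symbols(nm(p))]
```

A call such as libraries_defining("dgesv_", list_of_library_paths) would then report which candidates are worth putting on the link line. For shared objects, nm -D (dynamic symbols) is usually the more useful listing, so run_nm would need that flag added.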

↧

minimum / optimal block size for ScaLAPACK and BLAS?

July 18, 2013, 9:01 pm

ScaLAPACK arrays are distributed in a block-cyclic fashion over the process "grid". ScaLAPACK then uses the PBLAS and BLACS to perform BLAS-like operations in a distributed SPMD fashion, which become, more or less, a mix of communication between processes and BLAS operations within each process.
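
To make the block-cyclic layout concrete, here is a minimal one-dimensional sketch of which process owns a given global index and how many indices each process ends up with. It assumes 0-based indices and that the first block lives on process 0. The function names are hypothetical; local_count is meant to mirror what ScaLAPACK's NUMROC utility computes under those assumptions.

```python
def owner(global_index, block_size, nprocs):
    """Process that owns a 0-based global index in a 1-D block-cyclic layout."""
    return (global_index // block_size) % nprocs


def local_count(n, block_size, proc, nprocs):
    """How many of n global indices land on process proc."""
    full_blocks, remainder = divmod(n, block_size)
    # Every process gets the same number of complete rounds of blocks.
    count = (full_blocks // nprocs) * block_size
    extra = full_blocks % nprocs
    if proc < extra:
        count += block_size      # one more full block from the last round
    elif proc == extra:
        count += remainder       # the trailing partial block, if any
    return count


# 9 rows, block size 2, 3 processes: blocks go to processes 0, 1, 2, 0, 1.
counts = [local_count(9, 2, p, 3) for p in range(3)]
```

With these numbers the processes hold 4, 3, and 2 rows, which shows how the block size feeds directly into load balance: larger blocks mean fewer, coarser pieces to spread over the grid.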

So the size of the block is going to affect the performance of both the communication and the BLAS calls, but the degree to which it does depends on the implementation. The MKL implementation is a black box to the end user (me). And I don't have an ATLAS-like search tool to point me toward the block size I should be using, especially when the parameters are things like {Gigabit Ethernet vs 10G InfiniBand vs ...} and {Westmere vs Sandy/Ivy Bridge vs Haswell}, etc.

So... are there any guidelines for the choice of block size when using MKL ScaLAPACK, LAPACK, and BLAS?

E.g. is it important for the ScaLAPACK block size to be a cache-friendly size (e.g. no larger than 1/2 of L1 or L2, etc.)?
Or, alternatively, does the ScaLAPACK block size primarily affect load balancing, since many of the algorithms work on successively smaller areas of the matrix, while being irrelevant to the efficiency of block-matrix multiplies at the BLAS level?
Perhaps the MKL Level-3 BLAS calls are themselves less sensitive to large block sizes? E.g. because there is re-blocking within gemm() etc. anyway, where threads are exploited by OpenMP etc.; maybe the MKL BLAS is already subdividing (re-blocking) to be as efficient as it can, given that it gets a large enough block?
If I want to avoid such hypothesized re-blocking, because for some reason there are places where I can manage this block size "for free" as a side effect of the way my code is structured, is there an optimal block size for MKL Level-3 BLAS calls?
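
To put a number on the "cache-friendly" hypothesis: a square nb x nb block of doubles occupies 8 * nb^2 bytes, so the largest block that fits in a given fraction of a cache follows directly. The 32 KiB L1 data cache and 256 KiB L2 used below are assumed per-core sizes for the processors mentioned, and the helper name is hypothetical. This is only arithmetic and says nothing about how MKL blocks internally.

```python
import math


def max_square_block(cache_bytes, fraction=0.5, elem_bytes=8):
    """Largest nb such that an nb x nb block fits in fraction * cache_bytes."""
    budget_elems = int(cache_bytes * fraction) // elem_bytes
    return math.isqrt(budget_elems)


L1_BYTES = 32 * 1024    # assumed L1 data cache size per core
L2_BYTES = 256 * 1024   # assumed L2 cache size per core

nb_l1 = max_square_block(L1_BYTES)   # half of L1
nb_l2 = max_square_block(L2_BYTES)   # half of L2
```

Under these assumptions the bounds come out at nb = 45 for half of L1 and nb = 128 for half of L2, which gives a rough scale for the block sizes the questions above are about.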

↧

Core Access Limit for PARDISO Solver?

July 23, 2013, 4:21 pm

I encountered a very strange problem and I would like to hear opinions and suggestions from the Gurus:

I use Visual Composer XE, ver 12.1, and coded the PARDISO solver into my program on a Windows 7 machine. The program runs on HP DL980 servers with Windows Server 2008 R2; each server has 10 CPUs (80 cores) and 1 TB of RAM. The program works well, and each session can access 50% of all cores, i.e. 40 cores. Now the IS&T personnel have turned on the HT (Hyper-Threading) switch, supposedly to speed up execution. The Task Manager now shows 160 logical cores instead of 80, BUT each program session can access only 20 cores, or only 12.5% of the total cores available! So instead of speeding up, the program slows down.

My questions are: Is this access limit caused by some internal conflict between PARDISO's numerical routines and the HT technology? Does PARDISO have any built-in limit on how many cores it can access, especially on HT-enabled machines? Is there any workaround or solution to increase the number of cores that a program can access? Any leads and suggestions on how to solve this issue are greatly appreciated.

↧