Unhandled exception at 0x011BB3FA (mkl_core.dll)

June 26, 2017, 5:05 pm

Latest and popular articles on Intel Technologies

Hello World!

I have installed Intel Parallel Studio XE 2017 and I'm having some problems with MKL. When I compile my code in C++ using Visual Studio 2013, I get the following error:

Unhandled exception at 0x0162B3FA (mkl_core.dll) in ConsoleApplication2.exe: 0xC0000005: Access violation writing location 0xEB36BCAC.

I have no idea how to fix it. I have tried many different combinations of linking paths for the compiler and I have read the manual, but nothing has helped.

Could anyone provide any support or mention if you have this problem as well?

Thank you,

Eduardo Alves.

Thread Topic:

Help Me

↧

Direct Sparse Solver for Clusters Crash when using MPI Nested Dissection Algorithm

June 10, 2017, 11:39 am

I have a code that calls the Direct Sparse Solver for Clusters interface. I get an error when I run it with the option for MPI-based nested dissection. Documentation can be found here: https://software.intel.com/en-us/mkl-developer-reference-c-cluster-sparse-solver-iparm-parameter

When I have iparam[1] = 3, everything works fine.

When I set iparam[1] = 10, I get no errors, no warnings, and no output, even with msglvl set to one. I assume this is because the system is crashing really hard.

I am using the 64-bit interface of the solver, and not using the MPI-based dissection is not an option (my matrix has 50 billion non-zero elements and is 12 billion by 12 billion). I am using the latest version of the MKL cluster library.

I just spent two weeks modifying the code to remove overlaps in the matrix elements to use this feature.

What is going wrong?

Edit #1: Changed iparam[39] to iparam[1].

Thread Topic:

Bug Report

↧

Disable the permutation during the schur complement computation

June 28, 2017, 6:34 am

Hi All,

I am using Pardiso from MKL to compute the Schur complement of a matrix, and I have noticed that it permutes the matrix during reordering and numerical factorization.

Is there a way to disable that permutation? I am doing a partial factorization of a submatrix as part of a larger project, and I just want to compute the Schur complement without any ordering or numerical pivoting.

Thanks

Thread Topic:

Question

↧

Crash while calling pdmr2d for a large matrix

June 30, 2017, 4:35 am

Hi,

I recently parallelized my Fortran 90 code using Intel MKL ScaLAPACK.

With relatively small matrices it worked fine. However, when I ran my code on a double-precision real matrix of size 40656 by 40656, it crashed.

The error message is as follows:

MKL_SCALAPACK_ALLOCATE in mr2d_malloc.c is unsuccessful, size = 13223282688

I have tried to resolve this problem or to find a workaround for a week, but I couldn't.

Any help or comments would be greatly appreciated.
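
For scale, the size reported in the error message matches the storage for one full copy of the matrix: 40656 x 40656 doubles at 8 bytes each is 13223282688 bytes. The C sketch below reproduces that arithmetic. The function names are hypothetical and not part of MKL or ScaLAPACK, and the sketch says nothing about why pdmr2d requests a buffer of that size. It also checks whether a value still fits in a signed 32-bit integer, which holds for the element count but not for the byte count.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to store a dense rows x cols matrix of doubles.
   Hypothetical helper for illustration; not an MKL or ScaLAPACK routine. */
uint64_t dense_double_bytes(uint64_t rows, uint64_t cols)
{
    assert(rows > 0 && cols > 0);
    return rows * cols * (uint64_t)sizeof(double);
}

/* Returns 1 if value fits into a signed 32-bit integer, otherwise 0. */
int fits_in_int32(uint64_t value)
{
    return value <= (uint64_t)INT32_MAX;
}
```

With the numbers from this post, dense_double_bytes(40656, 40656) gives 13223282688 bytes, about 12.3 GiB. The element count 40656 x 40656 passes fits_in_int32, while the byte count does not.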

Zone:

Server

Thread Topic:

Bug Report

↧

mkl_link_tool quiet output?

July 5, 2017, 6:41 am

All,

Perhaps a FAQ, but my search-fu is lacking. My question is: is there a way to get "quiet" output from the MKL Link Tool? For example:

(1259) $ $MKLROOT/tools/mkl_link_tool -libs --compiler=gnu_f --parallel=no

       Intel(R) Math Kernel Library (Intel(R) MKL) Link Tool v4.4
       ==========================================================

Output
======

Linking line:
 -L$(MKLROOT)/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl

Now, luckily the thing we care about is in stdout and the rest is stderr, but this does look a bit ungainly in, say, log output:

(1260) $ setenv YAYA `$MKLROOT/tools/mkl_link_tool -libs --compiler=gnu_f --parallel=no -libs --noteno`

       Intel(R) Math Kernel Library (Intel(R) MKL) Link Tool v4.4
       ==========================================================

Output
======

Linking line:
(1261) $ echo $YAYA
-L$(MKLROOT)/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl

I'd love it if I could pass in --quiet and then nothing would be output to stderr, only the link line. (Oh, and by the way, I am in a tcsh environment (for legacy reasons), so '2>/dev/null' is not an option thanks to csh being csh. I could work around that, but I would just like a --quiet so I can be lazy!)

↧

MKL, macOS, and GNU Fortran

July 5, 2017, 6:48 am

Intel MKL Gurus,

A quick question: why is GNU Fortran not an accepted compiler for MKL on macOS?

(1264) $ $MKLROOT/tools/mkl_link_tool --os=mac -libs --compiler=gnu_f --parallel=no

       Intel(R) Math Kernel Library (Intel(R) MKL) Link Tool v4.4
       ==========================================================

Not supported compiler for current configuration:gnu_f. Supported values for compiler: intel_c|intel_f|gnu_c|clang|pgi_f|pgi_c

Configuration
=============

OS:                     mac
Architecture:           intel64
Compiler:               gnu_f
Linking:                dynamic
Interface layer:        lp64
Parallel:               no
OpenMP library:         iomp5

I'm sure there is a good reason, but I use GNU Fortran a lot on my MacBook (hey, free compiler) and MKL is free(-ish), at least for my experimenting/fiddling, so it would be a nice option to use if I could.

Thanks,

Matt

↧

Intel MKL DFT gives incorrect result on odd number sampling points

July 7, 2017, 8:15 am

Hi all,

I'm using MKL DFTI to do 2D complex in-place DFTs. I found that when the size of my data array is odd, say 89x89 or 45x45, the DFT results are incorrect, whereas when the size is even, say 48x48 or 80x80, the results are correct. The MKL manual says that MKL supports DFTs of arbitrary size, so I wonder what could cause this odd/even problem. I would appreciate any help or hints.

Thanks,

Jianhua

↧

How do I link Intel MKL and libdl with gold linker?

July 7, 2017, 3:55 pm

I'm having a problem linking Intel MKL and libdl using the gold linker on CentOS:

When I run this script:

#!/bin/bash

MKL_INC=$MKL_INSTALL_DIR/include
MKL_LIB=$MKL_INSTALL_DIR/lib

. /opt/rh/devtoolset-6/enable

cat > t.c << end_mkltest

#include <dlfcn.h>
#include "mkl_service.h"

int main() {
    dlerror();              /* use libdl */
    mkl_set_num_threads(1); /* use mkl   */
}

end_mkltest

gcc -I$MKL_INC -c t.c -o t.o
gcc -L$MKL_LIB -fuse-ld=gold t.o -lmkl_rt -ldl

I get:

libmkl_rt.so: error: undefined reference to 'calloc'
libmkl_rt.so: error: undefined reference to 'realloc'
libmkl_rt.so: error: undefined reference to 'malloc'
libmkl_rt.so: error: undefined reference to 'free'

We're using:

CentOS 7.3
devtoolset-6
mkl-2017.2.174.tar.bz2

Any ideas?

↧

How does l_mklb work when running across a cluster?

July 9, 2017, 10:06 am

Hi all,

I am currently attempting to run l_mklb across a 110-node cluster, but I don't seem to understand the best syntax to run it with.

Relevant items:

20 Ps, 22 Qs, NB=192, 1237056 Ns...

Inside the runme_intel64_static I set:

export MPI_PROC_NUM=440

export MPI_PER_NODE=4

#mpirun -perhost ${MPI_PER_NODE} -np ${MPI_PROC_NUM} ./runme_intel64_prv "$@" | tee -a $OUT <-- This was the original command

mpirun -np ${MPI_PROC_NUM} -machinefile hostlist /mnt/shared/benchmarks/runme_intel64_prv "$@" | tee -a $OUT

Right now, on a 110-node cluster with 128 GB of RAM per node and Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz processors, I'm seeing starting numbers of around 150 TFlops.

I would expect to see more... So I guess my question is:

What are the best settings for runme_intel64_static?

On a normal HPL run I'd set the number of processes to the actual number of cores in the system, but if I do that using runme_intel64_static, I totally oversubscribe the nodes and the performance goes through the floor.

If someone can explain what each variable does inside the script so I can work out how to saturate the cluster efficiently, that would be great.
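
As a sanity check on the numbers in this post, the C sketch below verifies that the process grid matches the launch settings: P x Q = 20 x 22 = 440 equals MPI_PROC_NUM, and 440 processes at MPI_PER_NODE=4 fill exactly 110 nodes. It also computes the storage for the N x N double-precision matrix. The function names are hypothetical and are not part of the l_mklb package, and nothing here describes what the runme scripts do internally.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a P x Q process grid uses exactly nprocs MPI ranks, else 0.
   Hypothetical helper for illustration only. */
int grid_matches_ranks(int p, int q, int nprocs)
{
    assert(p > 0 && q > 0 && nprocs > 0);
    return p * q == nprocs;
}

/* Nodes needed to place nprocs ranks at ranks_per_node each (rounded up). */
int nodes_needed(int nprocs, int ranks_per_node)
{
    assert(nprocs > 0 && ranks_per_node > 0);
    return (nprocs + ranks_per_node - 1) / ranks_per_node;
}

/* Total bytes for the dense N x N double-precision matrix, over all ranks. */
uint64_t hpl_matrix_bytes(uint64_t n)
{
    return n * n * (uint64_t)sizeof(double);
}
```

With N = 1237056, hpl_matrix_bytes returns 12242460377088 bytes, roughly 12.2 TB in total, or about 111 GB (decimal) per node when spread evenly over 110 nodes.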

Thread Topic:

Help Me

↧

mkl convolution problems

July 9, 2017, 8:29 pm

Hi all,

When I used the DNN operations of the Math Kernel Library on Windows 7 with Visual Studio 2015, they worked well: the convolution and max-pooling functions gave the same results, only faster.

On Windows 10, however, the convolution functions give different values.

Do you have any ideas?

Thanks.

Zone:

Windows*

↧

Pardiso is much slower than Multi frontal Solver

July 10, 2017, 3:46 pm

Hello,

I use Pardiso to solve a matrix that comes from the fully coupled 3D Biot equations. For small problems, Pardiso has no trouble. However, as the size of the matrix increases, a problem appears.

Currently, the matrix is around 1e6 x 1e6 (1 million x 1 million), with nnz=77589160 (around 77.5 million). Phase 1 (reordering with iparm(2)=3) does not take much time, but Phase 2 (factorization) takes very long to finish. In my case, with a Core i7-6800K and 64 GB of RAM, Phase 2 took around 10 minutes.

I noticed that the CPU ran on only a single core. I did some research on the Intel forum and found that the reason is the fill-in process. Then I compared with the direct solver of Ansys, which I believe is a multifrontal solver. Because the finite element mesh was exported from Ansys, the matrix size is exactly the same. Ansys needed only 64 seconds for everything. Here is the log from Ansys.

Here is the log from Pardiso. Pardiso ran single-threaded, used more memory than Ansys, and was much slower. Ansys also used Metis as its reordering method. According to this article, Pardiso should be as fast as a multifrontal solver. So what is wrong here? What did I configure incorrectly?

Thank you very much.

--------------------

Pham Hung

↧

FFT incorrect output [regression]

July 12, 2017, 1:25 am

Hello everyone,

I've noticed that MKL gives me a wrong result when calculating a 3D real-to-complex in-place FFT with specific sizes. I found a test program on this forum to confirm that it's not a bug in my code; I'm attaching the source code with the settings I used.

The error is present using both Windows and GNU/Linux, and all the Intel MKL versions following 2017.0.0 are broken (the 2017.0.0 version is the last working for me). I can use a newer Intel C Compiler but have to link the older MKL library in order to get a working code so it seems to be a bug in the library itself.

Compiling on GNU/Linux with:

icc basic_sp_real_dft_3d.c -DMKL_ILP64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -o dfttest -qopenmp

Output:

Intel(R) Math Kernel Library Version 2017.0.3 Product Build 20170413 for Intel(R) 64 architecture applications
Example basic_sp_real_dft_3d
Forward-Backward single-precision in-place 3D real FFT
Configuration parameters:
 DFTI_PRECISION                = DFTI_SINGLE
 DFTI_FORWARD_DOMAIN           = DFTI_REAL
 DFTI_DIMENSION                = 3
 DFTI_LENGTHS                  = {360, 384, 433}
 DFTI_PLACEMENT                = DFTI_INPLACE
 DFTI_CONJUGATE_EVEN_STORAGE   = DFTI_COMPLEX_COMPLEX
Create DFTI descriptor
Set configuration: CCE storage
Set input  strides = { 0, 166656, 434, 1 }
Set output strides = { 0, 83328, 217, 1 }
Commit the descriptor
Allocate data array
Initialize data for r2c transform
Compute real-to-complex in-place transform
Verify the result
 Check if err is below errthr 7.7e-06
 Verified,  maximum error was 2.67e-07
Change strides to compute backward transform
Commit the descriptor
Initialize data for c2r transform
Compute backward transform
Verify the result
 Check if err is below errthr 7.7e-06
 x[359][2][0]:  expected 0,  got 0.0003608387,  err 0.000361
 Verification FAILED
 ERROR, status = 1
Free DFTI descriptor
Free data array
TEST FAILED

Thanks for pointing out if I'm doing anything wrong. It would be nice to get a working version otherwise :).
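
The strides printed in the log above are consistent with the usual in-place CCE layout for a row-major 360 x 384 x 433 real transform: the last dimension holds 433/2 + 1 = 217 complex values, so each real row is padded to 2 x 217 = 434 floats. The C sketch below recomputes both stride arrays from the lengths. The function name is hypothetical and the code does not call MKL; it only illustrates the layout arithmetic and does not explain the failed backward verification.

```c
#include <assert.h>

/*
 * Strides for an in-place real-to-complex 3D transform of a row-major
 * n0 x n1 x n2 array with complex-complex (CCE) output.
 * Element 0 of each array is the offset of the first element; elements 1..3
 * are the distances between consecutive indices in each dimension.
 * Real strides are counted in real elements, complex strides in complex
 * elements. Hypothetical helper; it does not touch any DFTI descriptor.
 */
void inplace_r2c_strides(long n0, long n1, long n2,
                         long real_strides[4], long cplx_strides[4])
{
    assert(n0 > 0 && n1 > 0 && n2 > 0);

    long nc = n2 / 2 + 1; /* complex values along the last dimension */
    long nr = 2 * nc;     /* padded length of one real row           */

    real_strides[0] = 0;
    real_strides[1] = n1 * nr;
    real_strides[2] = nr;
    real_strides[3] = 1;

    cplx_strides[0] = 0;
    cplx_strides[1] = n1 * nc;
    cplx_strides[2] = nc;
    cplx_strides[3] = 1;
}
```

For the lengths {360, 384, 433} this yields { 0, 166656, 434, 1 } for the real side and { 0, 83328, 217, 1 } for the complex side, the same values as in the log.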

Attachment	Size
Download basic_sp_real_dft_3d.c	11.02 KB

↧

Linking against correct version of libmkl_blacs_*.so

July 6, 2017, 6:04 am

Dear folks,

I am about to build an MPI executable with ifort/mpif90, linking it against MKL and the OpenMPI libraries. I have run into a serious problem with linking against the correct version of the mkl_blacs library. That is, I try to link with the library libmkl_blacs_ilp64.so (one of the MKL libraries), but ifort/mpif90 inserts an incorrect library symbol into the ELF object file of the executable:

libmkl_blacs_ilp64.so => not found

How should I tell the ifort/mpif90 compiler to link with libmkl_blacs_openmpi_ilp64.so and not with libmkl_blacs_ilp64.so?

Thanks for your help

Sebastian

Thread Topic:

Help Me

↧

Setting MKL to use with Intel ParallelAccelerator

July 12, 2017, 2:26 am

I am programming in Julia, and for parallel computing I am using the ParallelAccelerator.jl package. I was able to set up OpenBLAS with g++. Kindly help me set up Intel MKL with g++. I installed Julia using the macOS dmg package, so I am unable to find the Make.inc file. There is an article from Intel which explains setting up Julia with Intel MKL, but I am unable to set up Julia to use MKL; instead, it is currently using OpenBLAS, as you can see from the output below.

julia> versioninfo()
Julia Version 0.5.2
Commit f4c6c9d4bb (2017-05-06 16:34 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

julia> Pkg.build("ParallelAccelerator")
INFO: Building ParallelAccelerator
ParallelAccelerator: build.jl begin.
ParallelAccelerator: Building j2c-array shared library
System installed BLAS found
Checking for OpenMP support...
OpenMP support found in g++-7
Max OpenMP threads: 8
Using g++-7 to build ParallelAccelerator array runtime.
ParallelAccelerator: build.jl done.

This is how I set up OpenBLAS:

export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/local/opt/openblas/include/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/opt/openblas/lib/

Kindly let me know if there is any other way to set up MKL.

Thank You
Regards
Saran

Thread Topic:

Help Me

↧

undefined reference to `sgemm' Eigen3 + MKL

July 13, 2017, 8:27 am

Hi,

I have a project that runs fine in Windows with Eigen 3. When enabling MKL in Visual Studio with EIGEN_USE_MKL_ALL, I see speed-ups of about 5-15 percent.

When I try to get this to run on Linux, I get problems during linking. I am using Eigen3 (3.2.92) together with MKL (2017.4.196) in an Ubuntu 14.04 project. For the preprocessor I defined in the CMakeFile:

ADD_DEFINITIONS(-DEIGEN_USE_MKL_ALL) - I also tried ADD_DEFINITIONS(-DEIGEN_USE_BLAS) instead, as this is the part I benefit from the most.

For CXX_CMAKE_CXX_FLAGS I added -D TBB_USE_THREADING_TOOLS -DMKL_LP64

Added include directory: /opt/intel/compilers_and_libraries/linux/mkl/include

Added linker directory: /opt/intel/compilers_and_libraries/linux/mkl/lib/intel64

TARGET_LINK_LIBRARIES(visualizer mkl_core mkl_tbb_thread mkl_def mkl_intel_lp64 mkl_intel_ilp64 tbb ${catkin_LIBRARIES} ${SDL2_LIBRARIES} ${OPENGL_LIBRARIES} ${GLEW_LIBRARIES} ${PNG_LIBRARY} ${PROJECT_NAME} )

I also tried just using intel_lp64 or just using intel_ilp64, which doesn't seem to change the behaviour of this. What is the difference anyway?

When I build it (using catkin build), I get the following:

/home/elch/catkin-ws2/devel/.private/o_m/lib/libo_m.a(Qq.cpp.o): In function `Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>::run(long, long, long, float const*, long, float const*, long, float*, long, float, Eigen::internal::level3_blocking<float, float>&, Eigen::internal::GemmParallelInfo<long>*)':
/usr/include/eigen3/Eigen/src/Core/products/GeneralMatrixMatrix_MKL.h:112: undefined reference to `sgemm'
collect2: error: ld returned 1 exit status
make[2]: *** [/home/elch/catkin-ws2/devel/.private/o_m/lib/o_m/visualizer] Error 1
make[1]: *** [CMakeFiles/visualizer.dir/all] Error 2
make: *** [all] Error 2

Any advice?

Thanks!

↧

Pardiso sometimes slow reordering with weighted matching

July 13, 2017, 10:02 am

I have two matrices with the same sparsity structure but different values that take significantly different amounts of time (7s vs 37s) during the reordering phase. The diagnostics report that all of the time difference is inside "Time spent in allocation of internal data structures (malloc)". I find the time difference surprising, and I find it surprising that it would be in 'malloc'.

Is this all expected? Is there something I can do to increase performance here?

If I turn off weighted matching, the difference goes away and reordering is faster in both cases. I would simply turn weighted matching off, but I have found it's necessary to get accurate solutions on many of the other problems our users construct.

The matrices should be quite similar. They are nearby timesteps of a dynamic non-linear FEM simulation of an elastic cloth hanging under gravity.

If I run the same physical scenario on a low resolution cloth (20'000 triangles instead of 180'000 triangles), the slowdown doesn't appear.

My iparm settings are all zero, except for these ones:

iparm[0] = 1; // use my settings
iparm[1] = 2; // METIS
iparm[9] = 8; // pivot perturbation of 1.0E-8
iparm[10] = 1; // scaling
iparm[12] = 1; // weighted matching
iparm[17] = -1; // enable reporting
iparm[20] = 1;  // enable reporting

Reordering phase for the "fast" matrix:

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.051754 s
Time spent in reordering of the initial matrix (reorder)         : 0.002212 s
Time spent in symbolic factorization (symbfct)                   : 0.527827 s
Time spent in data preparations for factorization (parlist)      : 0.034113 s
Time spent in allocation of internal data structures (malloc)    : 6.139631 s
Time spent in additional calculations                            : 0.451844 s
Total time spent                                                 : 7.207380 s

Reordering phase for the "slow" matrix:

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.051663 s
Time spent in reordering of the initial matrix (reorder)         : 0.002180 s
Time spent in symbolic factorization (symbfct)                   : 0.529333 s
Time spent in data preparations for factorization (parlist)      : 0.033952 s
Time spent in allocation of internal data structures (malloc)    : 35.833652 s
Time spent in additional calculations                            : 0.451910 s
Total time spent                                                 : 36.902690 s

Matrix statistics after reordering phase for both matrices are exactly the same. The factorization and solution phases have almost the exact same runtimes for both matrices.

Statistics:
===========
Parallel Direct Factorization is running on 8 OpenMP

< Linear system Ax = b >
             number of equations:           1351803
             number of non-zeros in A:      9191733
             number of non-zeros in A (%): 0.000503

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    1115919
             size of largest supernode:               2555
             number of non-zeros in L:                93923643
             number of non-zeros in U:                1
             number of non-zeros in L+U:              93923644

Summary: ( factorization phase )
================
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 1.511711 s
Time spent in allocation of internal data structures (malloc)    : 0.000519 s
Time spent in additional calculations                            : 0.000004 s
Total time spent                                                 : 1.512234 s

Summary: ( solution phase )
================
Time spent in direct solver at solve step (solve)                : 0.229972 s
Time spent in additional calculations                            : 0.499647 s
Total time spent                                                 : 0.729619 s

I'm running Windows 10, compiling and linking with MSVC using toolset Visual Studio 2015 (v140). I'm not sure how to double-check the MKL version I'm using. MKL is in a folder named "compilers_and_libraries_2016.4.246". I have "Intel Parallel Studio 2016 Update 4 Professional Edition for Windows" installed. I have "Intel VTune Amplifier 2017 for Windows Update 4" installed. The only download that Intel Software Manager is recommending to me is an update for Parallel Studio 2015, which I ignore because I think it's old.

My compile line looks like:

/Yu"stdafx.h" /MP /GS /W3 /Gy /Zc:wchar_t /I"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\tbb\include" /I"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl\include" /Zi /Gm- /O2 /sdl /Fd"x64\Develop\vc140.pdb" /Zc:inline /fp:precise ... more defines ... /D "NDEBUG" /D "_CONSOLE" /D "WIN32_LEAN_AND_MEAN" /D "NOMINMAX" /D "_USE_MATH_DEFINES" /D "_SCL_SECURE_NO_WARNINGS" /D "_CRT_SECURE_NO_WARNINGS" /errorReport:prompt /WX- /Zc:forScope /Gd /Oi /MD /openmp- /Fa"x64\Develop\" /EHsc /nologo /Fo"x64\Develop\" /Fp"x64\Develop\solveLinearSystem.pch"

My link line looks like:

/OUT:"x64\Develop\solveLinearSystem.exe" /MANIFEST /NXCOMPAT /PDB:"x64\Develop\solveLinearSystem.pdb" /DYNAMICBASE ... some libs ... "tbb.lib" ... some libs ... "mkl_core.lib" "mkl_tbb_thread.lib" "mkl_intel_lp64.lib" ... some libs ... "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /DEBUG /MACHINE:X64 /OPT:NOREF /INCREMENTAL /PGD:"x64\Develop\solveLinearSystem.pgd" /SUBSYSTEM:CONSOLE /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /ManifestFile:"x64\Develop\solveLinearSystem.exe.intermediate.manifest" /OPT:NOICF /ERRORREPORT:PROMPT /NOLOGO /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\tbb\lib\intel64\vc14" /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl\lib\intel64_win" /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\compiler\lib\intel64_win" /TLBID:1

I'm happy to share the matrices if that's helpful. I have them in plain text, zipped to ~60MB each.

Thread Topic:

Question

↧

mkl_get_max_threads() return value unclear

July 14, 2017, 2:22 am

I'm a bit confused about the method

mkl_get_max_threads()

According to the help documentation, it should return the number of physical cores (provided that dynamic adjustment is enabled).

However, when I run the code below on my 4-core machine, I get back "1" for "nT". How so?

if ( 1 == mkl_get_dynamic( ) )
{
    long nT = mkl_get_max_threads();
    mkl_set_num_threads( nT );
}

Zone:

Windows*

Thread Topic:

Question

↧

About MKL bidiagonal decomposition

July 18, 2017, 10:24 am

Dear support team,

I ran into a problem when trying to run the ?unmbr() function from MKL. The bidiagonal terms are calculated correctly, but I was unable to multiply them back. Could you help me with it?

Thanks a lot!

float x_real[6 * 4] = { -0.57, -1.28, -0.39,  0.25,
    -1.93,  1.08, -0.31, -2.14,
     2.30,  0.24,  0.40, -0.35,
     1.93,  0.64, -0.66,  0.08,
     0.15,  0.30,  0.15, -2.13,
     0.02,  1.03, -1.43,  0.50};

int length[2] = { 4, 6 };

P = (MKL_Complex8*)MKL_malloc(sizeof(MKL_Complex8)*6*6, 32);
Q = (MKL_Complex8*)MKL_malloc(sizeof(MKL_Complex8)*6*6, 32);
TP = (MKL_Complex8*)MKL_malloc(sizeof(MKL_Complex8)*6, 32);
TQ = (MKL_Complex8*)MKL_malloc(sizeof(MKL_Complex8)*6, 32);

for (int j = 0; j < length[0] * length[1]; j++)
{
 Spec[j].real = x_real[j];
 Spec[j].imag = 0;
} 

LAPACKE_cgebrd(LAPACK_ROW_MAJOR, length[1], length[0], Spec, length[0], s1, s2, TP, TQ); //here is correct

for (int j = 0; j < length[1] * length[0] - 1; j++)
{
  P[j].real = Spec[j].real;
  P[j].imag = Spec[j].imag;
  Q[j].real = Spec[j].real;
  Q[j].imag = Spec[j].imag;

}


LAPACKE_cunmbr(LAPACK_ROW_MAJOR, 'Q', 'L', 'N', length[1], length[0], length[0], Q, length[0], TQ, S, length[1]);


LAPACKE_cunmbr(LAPACK_ROW_MAJOR, 'P', 'R', 'C', length[1], length[0], length[1], P, length[0], TP, S, length[1]);

↧

Adding write statements to a subroutine breaks the MKL lib

July 18, 2017, 10:35 am

Hi all,

Classic strange behaviour with write statements in a subroutine. Adding them breaks the MKL library.

Without the writes, the dolfyn CFD code (written in f2003) runs:

 Flag   Step    Res U     Res V     Res W     Mass      Res k     Res eps   Res T          U         V         W         P         k        eps        T
     :     1:  1.36E-04  7.41E-05  0.00E+00  5.71E-02  0.00E+00  0.00E+00  0.00E+00     4.58E+00 -1.65E+00  0.00E+00 -1.39E+01  0.00E+00  0.00E+00  0.00E+00
     :     2:  9.50E-04  1.02E-03  0.00E+00  3.35E-02  0.00E+00  0.00E+00  0.00E+00     4.73E+00 -1.57E+00  0.00E+00 -1.47E+01  0.00E+00  0.00E+00  0.00E+00

Adding them:

 *** Error: Failed preconditioner of mkl solver         -103
 *** Error: Failed in mkl solver dfgmres, i =         302 RCI          -1
 *** Error: Failed in mkl solver dfgmres, i =         302 RCI          -1
 *** Error: Failed in mkl solver dfgmres, i =         302 RCI          -1
 *** Error: Failed in mkl solver dfgmres, i =         302 RCI          -1
 Flag   Step    Res U     Res V     Res W     Mass      Res k     Res eps   Res T          U         V         W         P         k        eps        T
     :     1:  1.36E-04  7.41E-05  0.00E+00  3.61E-01  0.00E+00  0.00E+00  0.00E+00     4.68E+00 -2.52E+00  0.00E+00 -2.22E+01  0.00E+00  0.00E+00  0.00E+00
 *** Error: Failed preconditioner of mkl solver         -103
     :     2:  1.24E-03  1.54E-03  0.00E+00  6.94E-02  0.00E+00  0.00E+00  0.00E+00     4.92E+00 -1.41E+00  0.00E+00  1.13E+01  0.00E+00  0.00E+00  0.00E+00
 *** Error: Failed preconditioner of mkl solver         -103

The difference in the two subroutines is:

$ diff solver_mkl_OK.f90 solver_mkl_FOUT.f90
199c199
<        write(IOdef,*)'*** Error: Failed preconditioner of mkl solver'
---
>        write(IOdef,*)'*** Error: Failed preconditioner of mkl solver ',IERR
238a239
>      write(IOdbg,*)'*** start solver dfgmres'
242a244
>        !    dfgmres( n,    x,  b,  RCI_request,    ipar,    dpar,    tmp)
243a246
>        write(IOdbg,*)'*** solver dfgmres returns with ',RCI_request
250a254,256
>          ! multiply the matrix by tmp(ipar(22)), put the result in tmp(ipar(23)),
>          ! and return the control to the dfgmres routine
>
259c265,266
<          write(iodef,*)'*** Error: Failed in mkl solver'
---
>          write(IOdbg,*)'*** Error: Failed in mkl solver dfgmres, i =',i,'RCI',RCI_request
>          write(iodef,*)'*** Error: Failed in mkl solver dfgmres, i =',i,'RCI',RCI_request
270c277
<      write(IOdbg,*)'Leave MKL solve P'
---
>      write(IOdbg,*)'Leave MKL solve P',RCI_request

As you can see, there are no significant differences.

The full code, with two versions of the subroutine, is in the tarball along with a small demo data set. Unpack it on a Linux box with ifort, change into test and run:

$ ./go.sh

To reproduce the bug (or not).

$ uname -a
Linux tennekes 4.4.0-83-generic #106-Ubuntu SMP Mon Jun 26 17:54:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu 16.04.2 LTS
Intel i7-6700HQ CPU @ 2.60GHz
Lenovo Y700 16GB

Any hints or suggestions welcome, but this might interest the developers more.

Thank you!

Henk

Attachment	Size
Download test.tar.gz	638.6 KB

↧

Facing error when running PARDISO at phase11

July 20, 2017, 1:26 am

Hi,

I am coding in Fortran 90 in Microsoft VS2008. To solve a sparse matrix, I use the PARDISO solver in MKL. The matrix to be solved is complex.

When I tried to run the solver, it stopped at phase=11 with the following error:

"Access violation writing location 0xcdcdcdd5."

The disassembly stops in the MKL subroutine _mkl_pds_metis_pqueueupdateup.

Could anybody suggest what the possible reason for such an error could be?

Thank you!

↧