Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Warming up strategy for MIC dgemm call


In my computation, I manually offload part of the work to the MIC using offload pragmas. The offloaded computation includes a call to MKL's double-precision general matrix-matrix multiplication (dgemm). Work is divided between the host CPU and the MIC according to a performance model, which relies on dgemm performance (in Gflop/s) recorded offline by running a microbenchmark over various operand sizes (m, n, and k).

Before the actual computation starts, I run a warm-up dgemm call at the largest operand sizes I will encounter in my computation (in my case n = m ≈ 10000 and k ≈ 200). Even after the warm-up call, I observe that performance for some dgemm calls is still unexpectedly low:

k0 = 2,   m = 2405, n = 903,  k = 192, flop rate 67.2766

k0 = 2,   m = 2405, n = 903,  k = 192, flop rate 440.115

k0 = 17,  m = 2422, n = 1066, k = 192, flop rate 67.5244

k0 = 17,  m = 2422, n = 1066, k = 192, flop rate 599.45

k0 = 346, m = 2812, n = 1280, k = 2,   flop rate 1.49697

k0 = 346, m = 2812, n = 1280, k = 2,   flop rate 15.2189

The above are some of the anomalous measurements. m, n, and k are the dimensions of the dgemm call (k0 is the iteration number, irrelevant for the present discussion). Note that I run each call twice; the second time, the measured flop rate agrees nicely with the estimated value. In the real computation, however, I may not have the option of running each dgemm twice.
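For context, the flop rates above follow from the standard GEMM operation count, 2·m·n·k. A minimal helper (the name is illustrative, not from my code) for converting a timed call into Gflop/s:

```c
/* A dgemm call performs 2*m*n*k floating-point operations (one multiply
 * and one add per inner-product term).  Given the wall-clock time of the
 * call in seconds, this returns the achieved rate in Gflop/s. */
double gemm_gflops(long long m, long long n, long long k, double seconds)
{
    return 2.0 * (double)m * (double)n * (double)k / (seconds * 1e9);
}
```

For example, the first measurement above (m = 2405, n = 903, k = 192) is only about 0.83 Gflop of total work, so a cold call is very sensitive to one-time overheads.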

I am trying to understand what might cause this behaviour. Can such a performance anomaly be mitigated by warming up dgemm at different sizes? If so, which sizes should I run for warming up, and what is the minimum number of calls required? (I am presently using trial and error, assuming the anomaly can be mitigated by a series of warm-up calls of suitable sizes.)

(The computation is iterative in nature, so a large number of offloads are performed. If I incorrectly estimate the time taken by the computation on the MIC, the resulting load imbalance between the host CPU and the MIC can have a cascading effect on subsequent iterations due to the nature of the computation.)
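For what it's worth, the split I derive from the performance model is the usual equal-finish-time condition; a sketch, with illustrative names, assuming measured rates r_host and r_mic in Gflop/s:

```c
/* Fraction of the work to keep on the host so that host and MIC finish
 * at the same time: f*W/r_host = (1-f)*W/r_mic  =>
 * f = r_host / (r_host + r_mic).  Rates come from the offline
 * microbenchmark; an anomalously slow MIC dgemm therefore skews every
 * later split, which is the cascade mentioned above. */
double host_fraction(double r_host, double r_mic)
{
    return r_host / (r_host + r_mic);
}
```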


Performance of offloaded MKL FFTs on the MIC, anyone?


My initial experiments offloading MKL FFTs to the MIC (using C on Linux) give me approximately 9.3 GFLOPS of performance, judging by the [MIC Time] numbers reported when I set the environment variable OFFLOAD_REPORT to 1 (or 2). That is about 0.46% of the advertised peak performance of 2 TFLOPS. In fact it is even less than that once I take into account the time for data movement inside the offload section ([CPU Time] in the report).

Am I missing something?

I am curious to know if my numbers are way off or consistent with other benchmarks (I could not find any).

I would appreciate it if someone could point me to related information, or share whether they have had a different (or similar) experience.

The bottom line is that I hope there is something I can do to drastically improve its performance, but I have run out of ideas. Any help will be appreciated.

Thanks!

Fernando

 

Linking against both the sequential and threaded MKL


I have two DLLs that link against the static MKL libraries. One of the DLLs links against the sequential version and the other against the multi-threaded version. Both DLLs are then loaded into the same process, where multiple threads of that process may use the sequential and multi-threaded versions concurrently. Does anybody know whether this is safe to do?

Kind regards

Mark

djacobix only uses 4 threads on a 16-CPU virtual machine


Hello,

I'm using djacobix from Intel MKL. My test machine is a virtual Windows Server 2012 with 16 CPUs. I use the following statements in my code:

    mkl_set_dynamic(0);

    mkl_set_num_threads(12);

But when it runs, djacobix only uses 4 threads at a time. I found the topic "Why the MKL can only call 4 threads?" (https://software.intel.com/en-us/forums/topic/288645). It mentions that "MKL uses just 1 thread per core". I set the environment variable KMP_AFFINITY=verbose as suggested, and it gave me the following output:

OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.

OMP: Info #205: KMP_AFFINITY: cpuid leaf 11 not supported - decoding legacy APIC ids.

OMP: Info #149: KMP_AFFINITY: Affinity capable, using global cpuid info

OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

OMP: Info #156: KMP_AFFINITY: 16 available OS procs

OMP: Info #157: KMP_AFFINITY: Uniform topology

OMP: Info #159: KMP_AFFINITY: 16 packages x 1 cores/pkg x 1 threads/core (16 total cores)

OMP: Info #242: KMP_AFFINITY: pid 2828 thread 0 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

OMP: Info #242: KMP_AFFINITY: pid 2828 thread 1 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

OMP: Info #242: KMP_AFFINITY: pid 2828 thread 5 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

OMP: Info #242: KMP_AFFINITY: pid 2828 thread 3 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

OMP: Info #242: KMP_AFFINITY: pid 2828 thread 4 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

OMP: Info #242: KMP_AFFINITY: pid 2828 thread 6 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

OMP: Info #242: KMP_AFFINITY: pid 2828 thread 7 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

OMP: Info #242: KMP_AFFINITY: pid 2828 thread 2 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

OMP: Info #242: KMP_AFFINITY: pid 2828 thread 8 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

OMP: Info #242: KMP_AFFINITY: pid 2828 thread 9 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

OMP: Info #242: KMP_AFFINITY: pid 2828 thread 10 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

OMP: Info #242: KMP_AFFINITY: pid 2828 thread 11 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

Is it possible to use 12 threads in djacobix on this machine?

Thanks.

phase 13 does not work in cluster_sparse_solver


Hi:

    I take mkl/examples/cluster_sparse_solver/cl_solver_unsym_distr_c.c, compile it with mpic++ -cxx=icpc -mkl -xHost, and it works well. From the MKL documentation, I know that I can combine phase 11, phase 22, and phase 33 into phase 13. However, when I run the phase-13 variant with mpiexec -n 2, I get:

Fatal error in PMPI_Bcast: Other MPI error, error stack:

PMPI_Bcast(1606)......: MPI_Bcast(buf=0x7fff159c2100, count=100, MPI_LONG_LONG_INT, root=0, MPI_COMM_WORLD) failed

MPIR_Bcast_impl(1458).:

MPIR_Bcast(1482)......:

MPIR_Bcast_intra(1253):

MPIR_SMP_Bcast(1167)..: Failure during collective

I use mpicxx from MPICH 3.1, MKL 11.2, and icpc 15.0.0 on linux64.

Is this a bug?

how to split a matrix in cluster_sparse_solver: similar number of rows or similar number of nonzero elements?


Hi:

   I need to use cluster_sparse_solver to solve a large complex symmetric matrix, so I need to split the matrix into two parts and put them on two different compute nodes. The first part runs from row 1 to row n, the second part from row n+1 to the last row. Since the selection of row n is arbitrary: which selection gives the best performance, giving the two compute nodes a similar number of rows, or a similar number of nonzero elements?
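To make the second option concrete, here is a sketch (not an MKL API; names are illustrative) of picking the split row from a CSR row-pointer array so that both nodes get roughly the same number of nonzeros:

```c
/* Given the CSR row-pointer array `rowptr` of length n_rows+1
 * (rowptr[n_rows] == nnz), return the first row index r such that rows
 * [0, r) hold at least half of the nonzeros.  Rank 0 would then take
 * rows [0, r) and rank 1 would take rows [r, n_rows). */
int split_row_by_nnz(const int *rowptr, int n_rows)
{
    int half = rowptr[n_rows] / 2;
    int r = 1;
    while (r < n_rows && rowptr[r] < half)
        ++r;
    return r;
}
```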

dgemm - Intel Inspector: "uninitialized memory access"


Hi,

I launched a memory error analysis with Intel Inspector on a simple test of the dgemm function, and I got the warning "uninitialized memory access", located at the dgemm call.

Here is my test (really simple !) :

void dgemm_test()
{
  double *A,*B,*C;
  double alpha, beta;
  int m,i;

  m = 10;
  A = (double*)mkl_malloc((m*m)*sizeof(double),128);
  B = (double*)mkl_malloc((m*m)*sizeof(double),128);
  C = (double*)mkl_malloc((m*m)*sizeof(double),128);

  for(i=0; i<m*m; i++){
    A[i] = (double)(rand() % (m*m)) / (double)(m*m);
    B[i] = (double)(rand() % (m*m)) / (double)(m*m);
    C[i] = 0.0;
  }

  alpha = 1.0;
  beta = 0.0;
  cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, m, m, alpha, A, m, B, m, beta, C, m);

  mkl_free(A);
  mkl_free(B);
  mkl_free(C);
}

Each buffer appears to be initialized... Should I be worried about this warning? Is it expected behavior?

My configuration: Intel Inspector XE 2013.  MKL: 11.1.2 (32bit mode). CPU: Xeon E5-1620. OS: W7 64bit (SP1).

Thanks in advance for your help !

Problem building 64 bit numpy using MKL and vc11 (Windows)


 

Hi, I’ve built numpy (1.9) 64 bit using vc11, the Intel Fortran compiler and the MKL ‘mkl_rt’ library.

*Why? (See the end of the message for the reason, if interested.)

 

Any advice or assistance would be greatly appreciated.  If I can offer additional information, I will happily do so.

The build appears to go just fine (no errors noted), and numpy loads into python just fine as well.

(I note a warning:  ### Warning:  python_xerbla.c is disabled ### -- however, it doesn’t appear to be problematic?)

I have also confirmed that numpy sees the mkl blas and lapack libs. 

 

>>> numpy.__config__.show()

lapack_opt_info:

    libraries = ['mkl_lapack', 'mkl_rt']

    library_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\lib\\intel64']

    define_macros = [('SCIPY_MKL_H', None)]

    include_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\include']

blas_opt_info:

    libraries = ['mkl_rt']

    library_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\lib\\intel64']

    define_macros = [('SCIPY_MKL_H', None)]

    include_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\include']

openblas_lapack_info:

  NOT AVAILABLE

lapack_mkl_info:

    libraries = ['mkl_lapack', 'mkl_rt']

    library_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\lib\\intel64']

    define_macros = [('SCIPY_MKL_H', None)]

    include_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\include']

blas_mkl_info:

    libraries = ['mkl_rt']

    library_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\lib\\intel64']

    define_macros = [('SCIPY_MKL_H', None)]

    include_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\include']

mkl_info:

    libraries = ['mkl_rt']

    library_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\lib\\intel64']

    define_macros = [('SCIPY_MKL_H', None)]

    include_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\include']

 

Everything *looks* to be in order upon casual inspection (*I think*; please correct me if I'm wrong!)

However, there is no performance boost when running a few different tests in numpy (singular value decomposition, for example), and only a single thread appears to be in use.

 

Running numpy.test(‘full’) reveals 21 errors.

 

For instance,

LINK : fatal error LNK1104: cannot open file 'ifconsol.lib'

 

And the other is a recurring error with f2py:

 

ERROR: test_size.TestSizeSumExample.test_transpose

----------------------------------------------------------------------

Traceback (most recent call last):

  File "C:\Program Files\Side Effects Software\Houdini 13.0.509\python27\lib\site-packages\nose\case.py", line 371, in setUp

    try_run(self.inst, ('setup', 'setUp'))

  File "C:\Program Files\Side Effects Software\Houdini 13.0.509\python27\lib\site-packages\nose\util.py", line 478, in try_run

    return func()

  File "C:\Program Files\Side Effects Software\Houdini 13.0.509\python27\lib\site-packages\numpy\f2py\tests\util.py", line 353, in setUp

    module_name=self.module_name)

  File "C:\Program Files\Side Effects Software\Houdini 13.0.509\python27\lib\site-packages\numpy\f2py\tests\util.py", line 80, in wrapper

   raise ret

RuntimeError: Running f2py failed: ['-m', '_test_ext_module_5403', 'c:\\users\\jareyn~1\\appdata\\local\\temp\\tmpvykewl\\foo.f90']

Reading .f2py_f2cmap ...

        Mapping "real(kind=rk)" to "double"

Succesfully applied user defined changes from .f2py_f2cmap

 

 

Everything that requires configuration appears to be in agreement with this Intel application note, apart from the use of the Intel C++ compiler:

https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl

I have also referenced the Windows build docs on scipy.org:

http://www.scipy.org/scipylib/building/windows.html#building-scipy

Some info about my configuration:

site.cfg:

include_dirs = C:\Program Files (x86)\Intel\Composer XE 2015\mkl\include

library_dirs = C:\Program Files (x86)\Intel\Composer XE 2015\mkl\lib\intel64

mkl_libs = mkl_rt

 

PATH = (paths separated by line for easy reading)

C:\Program Files\Side Effects Software\Houdini 13.0.509\python27;

C:\Program Files\Side Effects Software\Houdini 13.0.509\python27\Scripts;

C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64;

C:\Program Files (x86)\Intel\Composer XE 2015\bin\intel64

 

LD_LIBRARY_PATH =

C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64;

C:\Program Files (x86)\Intel\Composer XE 2015\bin\intel64;

C:\Program Files (x86)\Intel\Composer XE 2015\mkl\lib\intel64;

C:\Program Files (x86)\Intel\Composer XE 2015\compiler\lib\intel64

 

Thank you in advance for your time,  

 

-Jay

PS - Please note that I've cross-posted this message to the numpy-discussion mailing list as well.  I will post any useful information I receive here.

=====

*why am I doing this? 

The reason I’m doing this is because I need numpy with MKL to run with the version of python that comes packaged with Houdini (Python 2.7.5 (default, Oct 24 2013, 17:49:49) [MSC v.1700 64 bit (AMD64)] on win32).

So, downloading a prebuilt 64 bit numpy isn’t an option due to the unavailability of a compatible compiler version.


pardiso_getdiag


Hello

I am calling pardiso_getdiag(pt, df, da, mnum, error) after the PARDISO call (phase = 22), and it returns the diagonal entries of the initial and the final (factorized) matrix.

As I noticed, the returned values are not in the same order as in the original matrix (they are permuted).

 

Is there a way to get these results in the same order as the original matrix?
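To illustrate what I'm after: if the permutation used internally were available as a vector perm, with perm[i] giving the original position of permuted entry i (this is an assumption on my part; I have not found such an output documented for pardiso_getdiag), the diagonals could be unshuffled like this:

```c
/* Hypothetical: restore permuted diagonal values d_perm[] to original
 * matrix order, assuming perm[i] is the original index of permuted
 * position i.  Such a perm vector is NOT documented for pardiso_getdiag;
 * this only illustrates the mapping I would like to apply. */
void unpermute_diag(const double *d_perm, const int *perm,
                    double *d_orig, int n)
{
    for (int i = 0; i < n; ++i)
        d_orig[perm[i]] = d_perm[i];
}
```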

 

Bulent

 

 

 

Xcode config scheme for composer 2015

I'm successfully using Composer 2015 Fortran from the command line, but I'm attempting to integrate it into Xcode 5 (Xcode 6.x is not working yet).

I'm using one of the demo programs to test compiling and linking against the libraries (random number generators, Mersenne Twister, etc.).

If I try to build in Xcode, I get the following linker error, even after setting DYLD_LIBRARY_PATH in Xcode:

Undefined symbols for architecture x86_64:
  "_vdrnguniform_", referenced from: _MAIN__ in iforta5IzJW.o
  "_vsldeletestream_", referenced from: _MAIN__ in iforta5IzJW.o
  "_vslnewstream_", referenced from: _MAIN__ in iforta5IzJW.o
ld: symbol(s) not found for architecture x86_64

I can compile and run from the command line using the following 

ifort mkl_vsl_uniform.f90   $MKLROOT/lib/libmkl_blas95_ilp64.a $MKLROOT/lib/libmkl_intel_ilp64.a $MKLROOT/lib/libmkl_core.a $MKLROOT/lib/libmkl_intel_thread.a   -o mkl_vsl_uniform

Removing $MKLROOT/lib/libmkl_intel_ilp64.a from the compilation gives the same error. So my question: is my DYLD_LIBRARY_PATH set correctly, or am I missing a linker setting somewhere? The following is my current command-line environment:

      Intel(R) Math Kernel Library (Intel(R) MKL) Link Tool v4.0
       ==========================================================

Output
======

Compiler option(s):
 -I/opt/intel/composer_xe_2015.0.077/mkl/include

Linking line:
 -L/opt/intel/composer_xe_2015.0.077/mkl/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lm

Environment variable(s):
export DYLD_LIBRARY_PATH=/opt/intel/composer_xe_2015.0.077/mkl/../compiler/lib:/opt/intel/composer_xe_2015.0.077/mkl/lib:$DYLD_LIBRARY_PATH;

 

mkl metis SIGSEGV


We have a symmetric matrix (attached). MKL crashes in METIS (see the attached call stack). Can you take a look?

symmetric_matrix.txt:

first line: number of equations (n)

second line: index based (0 means 0 based)

third line: number of nonzeros (nz)

rest: non zeros in triplet format (r c a)

Thanks,

Attachments: symmetric_matrix.txt (298.75 KB), callback.txt (870 bytes)

Question about MKL FEAST MPI


Hi,

I'm trying to figure out how to use the MPI FEAST eigensolver in MKL. In the C-MPI/3_sparse examples for FEAST, the matrices are created on all processes rather than distributed. Is there any way to distribute sparse matrices over all processes when solving for eigenvalues?

Thanks,

Harshad

Standalone MKL availability

Citation for non-linear optimizer "dtrnlspbc_solve"


Hello,

I'm using dtrnlspbc_solve to optimize some ecological model parameters. I checked the manual (Intel(R) Math Kernel Library Reference Manual), and it mentions that this solver uses a trust-region algorithm. Is there a paper or a book that I can cite when I describe this solver? Thanks.

Maosi

Elements of functional analysis


Good day!

I'm sorry if my question is improper, but I really want to know whether Intel MKL has any features for manipulating functions. More precisely, I now have the task of solving a system of non-linear equations. To do that, I need to obtain the partial derivatives of my functions. Can I do this using a standard MKL method, and what would such an implementation look like?


cluster_sparse_solver does not release physical memory for the rank=0 process


Hi:

   I have found that cluster_sparse_solver does not release physical memory for the rank=0 process.

   I use two processes with two threads each to solve an mtype=6 (complex and symmetric) matrix, using the distributed assembled matrix input format as well as distributed RHS elements. The full example is in the attached file.

   My test runs the same calculation in a loop, again and again. The result is correct in every iteration, and the physical memory used by rank 1 stays the same in each loop. However, the physical memory used by rank 0 keeps climbing. My computer has 16 GB of memory, and the physical memory occupancy is shown below:

loop    rank 0, phase 11 (%)    rank 0, phase 23 (%)    rank 1 (%)
0       4.7                     6.5                     4.6
1       5.7                     7.4
2       6.6                     8.3
3       7.5                     9.3
4       8.4                     10.2
5       9.4                     11.2

I use the following command to compile: mpic++ -cxx=icpc -std=c++1y -mkl -xHost plain.cpp

and to run: mpiexec -n 2 ./a.out

I use MPICH 3.1, MKL 11.2, and icpc 15.0.0 on linux64.

Attachment: mkl.tar.bz2 (63.82 MB)

[icc + MKL 11.1 + NumPy] numpy.test() error: undefined symbol _intel_fast_memset


I am trying to compile NumPy 1.8.1 with icc 14.0.1 20131008 and MKL 11.1. 
The compilation succeeds, but when I run numpy.test() the test fails with this error message: 

ImportError: /tmp/tmpPXvMOC/test_array_from_pyobj_ext.so: undefined symbol: _intel_fast_memset

I am following Intel's instructions, I run intel's compilervars.sh to setup the environment and then build numpy with this command:

python setup.py config --compiler=intelem --fcompiler=intelem  build_clib --compiler=intelem --fcompiler=intelem build_ext  --compiler=intelem --fcompiler=intelem install --prefix=$HOME/.local 2>&1 | tee -a BUILD.log

Using nm, I found that the _intel_fast_memset symbol is defined in libirc.so. I tried adding irc to site.cfg, but the error persists.
I also tried compiling numpy 1.9.0 and still get the same error. Manually specifying the libraries in site.cfg instead of relying on mkl_rt makes no difference either.

The lib directories of the intel compiler are correctly set in the LD_LIBRARY_PATH, and the intel dynamic libraries are linked to _dotblas.so.

ldd build/lib.linux-x86_64-2.7/numpy/core/_dotblas.so
	linux-vdso.so.1 =>  (0x00007fffa1dff000)
	libmkl_def.so => /cineca/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_def.so (0x00007f8e60281000)
	libmkl_intel_lp64.so => /cineca/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_intel_lp64.so (0x00007f8e5fb3d000)
	libmkl_intel_thread.so => /cineca/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_intel_thread.so (0x00007f8e5eb7e000)
	libmkl_core.so => /cineca/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_core.so (0x00007f8e5d4c0000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8e5d28e000)
	libpython2.7.so.1.0 => /cineca/prod/tools/python/2.7.5/gnu--4.6.3/lib/libpython2.7.so.1.0 (0x00007f8e5ceb4000)
	libimf.so => /cineca/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libimf.so (0x00007f8e5c9ec000)
	libsvml.so => /cineca/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libsvml.so (0x00007f8e5bdf5000)
	libirng.so => /cineca/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libirng.so (0x00007f8e5bbed000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f8e5b969000)
	libiomp5.so => /cineca/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libiomp5.so (0x00007f8e5b64e000)
	libgcc_s.so.1 => /cineca/prod/compilers/gnu/4.6.3/none/lib64/libgcc_s.so.1 (0x00007f8e5b439000)
	libintlc.so.5 => /cineca/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libintlc.so.5 (0x00007f8e5b1e2000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f8e5ae4e000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f8e5ac4a000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003b0a000000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007f8e5aa46000)

My site.cfg, the numpy.test() log, and the build log are attached; please let me know if you need anything else.
Thank you in advance for your help!

 

jacobian calculation


I have been developing neural nets using ifort. The current topic is Levenberg-Marquardt, in which the Jacobian is normally calculated by backpropagation and then used to build a quasi-Hessian; an equation of the form Ax = b is then solved to find x, the delta in the network weights.

I came across the jacobi family of routines the other day and was wondering whether they would be useful as an alternative way of computing the Jacobian. Most textbooks say that the Jacobian can be calculated via backpropagation or finite differences. Does djacobi use finite differences to calculate the Jacobian?

Is one method superior to the other, i.e. backpropagation vs. finite differences? Obviously, with neural nets, the key factors are the speed and performance of the algorithm.
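For comparison, the finite-difference scheme that routines like djacobi are built around can be sketched as follows (central differences; the function and parameter names are my own, not the MKL interface):

```c
#include <stdlib.h>

/* Central-difference Jacobian of f: R^n -> R^m at point x:
 * J[i][j] = (f_i(x + h*e_j) - f_i(x - h*e_j)) / (2h), stored row-major.
 * Each column costs two full evaluations of f, which is the trade-off
 * against backpropagation (one forward plus one backward pass). */
void fd_jacobian(void (*f)(const double *x, double *y, int m, int n),
                 const double *x, double *J, int m, int n, double h)
{
    double *xp = malloc(n * sizeof *xp);
    double *yp = malloc(m * sizeof *yp);
    double *ym = malloc(m * sizeof *ym);
    for (int j = 0; j < n; ++j) {
        for (int t = 0; t < n; ++t)
            xp[t] = x[t];
        xp[j] = x[j] + h;
        f(xp, yp, m, n);
        xp[j] = x[j] - h;
        f(xp, ym, m, n);
        for (int i = 0; i < m; ++i)
            J[i * n + j] = (yp[i] - ym[i]) / (2.0 * h);
    }
    free(xp); free(yp); free(ym);
}

/* Toy residual function for illustration: f(x) = (x0*x1, x0 + x1). */
static void demo_f(const double *x, double *y, int m, int n)
{
    (void)m; (void)n;
    y[0] = x[0] * x[1];
    y[1] = x[0] + x[1];
}
```

The truncation error of the central difference is O(h^2), which is usually accurate enough for a Levenberg-Marquardt step, but each Jacobian costs 2n function evaluations, whereas backpropagation gets all partials in a single backward pass.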

Help with Sparse matrix vector multiplication using MKL DIA routine


I am using the MKL library to perform sparse matrix-vector multiplication in diagonal (DIA) format. When I call the MKL mkl_sdiagemv function, I get the error "MKL ERROR: Parameter 4 was incorrect on entry to MKL_SDIAGEMV."

 

void mm_read(char* filename, int *m, int *n, int *nnz, int **rowptrs, int **colinds, float **vals, float **adia, int **distance, int idiag, int ndiag) {
// open file
FILE* mmfile = fopen(filename, "r");
assert(mmfile != NULL && "Read matrix file.");

// read MatrixMarket header
int status;
MM_typecode matcode;
status = mm_read_banner(mmfile, &matcode);
assert(status == 0 && "Parsed banner.");



status = mm_read_mtx_crd_size(mmfile, m, n, nnz);
assert(status == 0 && "Parsed matrix m, n, and nnz.");
printf("- matrix is %d-by-%d with %d nnz.\n", *m, *n, *nnz);


int *coo_rows = (int*) malloc(*nnz * sizeof(int));
int *coo_cols = (int*) malloc(*nnz * sizeof(int));
float *coo_vals = (float*) malloc(*nnz * sizeof(float));

// read COO values
int i = 0;
for ( i = 0; i < *nnz; i++)
    status = fscanf(mmfile, "%d %d %g\n",&coo_rows[i], &coo_cols[i], &coo_vals[i]);


*rowptrs = (int*) malloc((*m+1)*sizeof(int));
*colinds = (int*) malloc(*nnz*sizeof(int));
*vals = (float*) malloc(*nnz*sizeof(float));

// convert to CSR matrix
int info;
int job[] = {
    2, // job(1)=2 (coo->csr with sorting)
    1, // job(2)=1 (one-based indexing for csr matrix)
    1, // job(3)=1 (one-based indexing for coo matrix)
    0, // empty
    *nnz, // job(5)=nnz (sets nnz for csr matrix)
    0  // job(6)=0 (all output arrays filled)
};

mkl_scsrcoo(job, m, *vals, *colinds, *rowptrs, nnz, coo_vals, coo_rows, coo_cols, &info);
assert(info == 0 && "Converted COO->CSR");



 // DIA matrix dimensions and values

ndiag = 4;
idiag = 3;

*adia = (float*) malloc(*nnz * idiag * sizeof(float));
*distance = (int* ) malloc(idiag * sizeof(int));


int job1[] = {
    0,    // job(1)=0 (csr->dia conversion)
    1,    // job(2)=1 (one-based indexing for csr matrix)
    1,    // job(3)=1 (one-based indexing for dia matrix)
    2,    // job(4)
    *nnz, // job(5)
    10,   // job(6)
};

mkl_scsrdia(job1, m, *vals, *colinds, *rowptrs, *adia, &ndiag, *distance, &idiag, *vals, *colinds, *rowptrs, &info);
assert(info == 0 && "Converted CSR->DIA");


// free COO matrix
free(coo_rows);
free(coo_cols);
free(coo_vals);   }



 float * randvec(int n) {
float *v = (float*) malloc(n * sizeof(float));
int i = 0;
for (i = 0; i < n; i++)
    v[i] = rand() / (float) RAND_MAX;
return v;
}


   int main(int argc, char* argv[]) {

// require filename for matrix to test
if (argc != 2) {
    fprintf(stderr, "Usage: %s MATRIX.mm\n", argv[0]);
    exit(1);
}

int m, n, nnz;
int *rowptrs, *colinds;
float *vals;

float *adia;
int  *distance;
int idiag,  ndiag;
// read matrix from file
mm_read(argv[1], &m, &n, &nnz, &rowptrs, &colinds, &vals, &adia, &distance, &idiag, &ndiag);

// allocate vectors for computation
float *v = randvec(n);
float *cpu_answer = (float*) malloc(m*sizeof(float));


struct timeval start, end;
printf (" Running Intel(R) MKL from 1 to %i threads \n\n", mkl_get_max_threads());


mkl_sdiagemv ((char*)"N", &m, adia, &idiag, distance, &ndiag, v, cpu_answer);


// release memory
free(rowptrs);
free(colinds);
free(vals);
free(cpu_answer);
free(v);
free(adia);
free(distance);

return 0; }

 

ACADEMICS


Hello

I have not been able to find compatible MKL libraries for the Ubuntu 14.04 LTS version, and because of this I had to downgrade my system to Ubuntu 13.04. Could you please provide me with a link from which I can download MKL for 64-bit Ubuntu 14.04?

Thank you

AKSHAY BHATNAGAR

 
