FFT and MKL Problems

May 30, 2016, 2:57 pm

Latest and popular articles on Intel Technologies

The program never returns from the call that the main program makes to the subroutine FFT().

MKLVARS.BAT does not seem to set the path or environment variables.

This is the latest preview and VS 2015.

Attachment: Program073 - Wulf.zip (8.15 MB)

↧

Performance bug in GEMM?

June 4, 2016, 4:30 am

Hi all,

I just noticed a potential performance bug in the DGEMM implementation of MKL (16.0.1)
when using a single thread. I merely want to make someone at Intel aware of it, in case it is of interest.

Strangely, DGEMM performs better for beta=1 than for beta=0 in certain
situations. Here is an example:

Intel(R) Xeon(R) CPU E5-2650:
m=72, n=373248, k=72, beta=0.00 : 14.25 GF
m=72, n=373248, k=72, beta=1.00 : 18.36 GF

Intel(R) Xeon(R) CPU E5-2650:
m=72, n=373248, k=72, beta=0.00 : 19.25 GF
m=72, n=373248, k=72, beta=1.00 : 28.34 GF

As you can see, the performance difference is significant. It is
so significant that it pays off to set C to zero explicitly
before calling MKL and then use the more efficient beta=1
code path instead.

Here is a quick test driver:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

extern "C"
int dgemm_(char *transa, char *transb, int *m, int *
      n, int *k, double *alpha, double *a, int *lda,
      double *b, int *ldb, double *beta, double *c, int *ldc);

void trashCache(float* trash1, float* trash2, int nTotal){
   for(int i = 0; i < nTotal; i ++)
      trash1[i] += 0.99 * trash2[i];
}

int main(int argc, char ** argv)
{
  if(argc < 2 ){
   printf("Usage: <beta>\n");
   exit(-1);
  }
  float *trash1, *trash2;
  int nTotal = 1024*1024*100;
  trash1 = (float*) malloc(sizeof(float)*nTotal);
  trash2 = (float*) malloc(sizeof(float)*nTotal);

  int m = 72;
  int n = 72*72*72;
  int k = 72;
  double flops = 2.E-9 * m*n*k;
  double alpha=1;
  double beta=atof(argv[1]);
  double *A, *B, *C;
  int ret = posix_memalign((void**) &A, 64, sizeof(double) * m*k);
  ret += posix_memalign((void**) &B, 64, sizeof(double) * n*k);
  ret += posix_memalign((void**) &C, 64, sizeof(double) * m*n);

  double minTime = 1e100;
  for (int i=0; i<3; i++){
     trashCache(trash1, trash2, nTotal);
     double t = omp_get_wtime();
     dgemm_("T", "N", &m, &n, &k, &alpha, A, &m, B, &k, &beta, C, &m);
     t = omp_get_wtime() - t;
     minTime = (minTime < t) ? minTime : t;
  }
  printf("m=%d, n=%d, k=%d, beta=%.2f : %.2lf GF\n", m,n,k,beta,flops/minTime);

  free(A);
  free(B);
  free(C);
  free(trash1);
  free(trash2);
  return 0;
}
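
The driver's throughput figure comes from the conventional 2*m*n*k operation count for a matrix product. As a rough aid for reading the numbers above, the following Python sketch recomputes that count for the posted sizes and the size of C, which is the memory an explicit "set C to zero" pass would have to write before a beta=1 call. The helper names are illustrative and not part of MKL.

```python
def gemm_gflops(m, n, k, seconds):
    """Throughput in GFLOP/s using the conventional 2*m*n*k operation count."""
    return 2.0 * m * n * k / seconds * 1e-9


def matrix_megabytes(rows, cols, bytes_per_element=8):
    """Size of a dense rows-by-cols matrix of doubles, in MB (10**6 bytes)."""
    return rows * cols * bytes_per_element / 1e6


m, k = 72, 72
n = 72 * 72 * 72                      # 373248, as in the driver

total_gflop = 2.0 * m * n * k * 1e-9  # work per DGEMM call
c_megabytes = matrix_megabytes(m, n)  # what an explicit zeroing pass over C writes

print("work per call: %.3f GFLOP" % total_gflop)
print("size of C:     %.1f MB" % c_megabytes)
# Run time implied by the first reported rate (14.25 GF):
print("time at 14.25 GF: %.3f s" % (total_gflop / 14.25))
```

With m and k this small, each element of C receives only 2*k = 144 floating-point operations, so how C is read and written can plausibly account for a noticeable share of the run time. This is an observation about the arithmetic only, not a diagnosis of MKL's behavior.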

Best, Paul

↧

mkl and ipp merge modules

June 3, 2016, 3:33 pm

I was tasked with creating an installer for an application that was built with the MKL libraries. It seems to depend on "compiler", "mkl", and "ipp". The compiler part has a merge module that I can use directly, but I cannot find an "mkl" or "ipp" merge module. Do they exist?

↧

Printing Constant from mkl library

June 8, 2016, 12:06 pm

Hi.

I am trying to compile the following program (test.c) to find the value of a constant in MKL 10.0.4.023.

#include <stdio.h>
#include "mkl.h"
int main(void)
{
	int a = SPARSE_INDEX_BASE_ZERO;
	printf("%d\n", a);
	return 0;
}

with the following commands:

source compilervars.sh intel64

icc test.c -DMKL_ILP64 -I${MKLROOT}/include -L${MKLROOT}/lib/mic -lmkl_intel_ilp64 -lmkl_core -l mkl_sequential -lpthread -lm -ldl

The message that I get is:

-bash: MKLROOT: command not found
-bash: MKLROOT: command not found
find_constant.c(7): error: identifier "SPARSE_INDEX_BASE_ZERO" is undefined
int a = SPARSE_INDEX_BASE_ZERO;
^

compilation aborted for find_constant.c (code 2)

Is the program I wrote correct?

If not, how can I find out the values of the constants used in MKL?

Thank you.
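
Two separate things show up in the messages above. "-bash: MKLROOT: command not found" is what bash prints when a line contains $(MKLROOT), which is command substitution, rather than ${MKLROOT}, which is variable expansion. The "identifier is undefined" error means that no header included by the program declares that name in the installed version. One way to check the second point is to search the installed headers. The sketch below is a minimal, hypothetical helper (find_identifier is not an MKL tool) and assumes the headers live under $MKLROOT/include.

```python
import os
import re


def find_identifier(include_dir, name):
    """Return (path, line_number, text) for each header line containing `name` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    hits = []
    for root, _dirs, files in os.walk(include_dir):
        for filename in sorted(files):
            if not filename.endswith((".h", ".fi", ".f90")):
                continue
            path = os.path.join(root, filename)
            with open(path, "r", errors="replace") as handle:
                for number, text in enumerate(handle, start=1):
                    if pattern.search(text):
                        hits.append((path, number, text.rstrip()))
    return hits


if __name__ == "__main__":
    # MKLROOT is expected to be set by the MKL/compiler environment script.
    mklroot = os.environ.get("MKLROOT")
    if mklroot is None:
        print("MKLROOT is not set in this shell")
    else:
        include_dir = os.path.join(mklroot, "include")
        for path, number, text in find_identifier(include_dir, "SPARSE_INDEX_BASE_ZERO"):
            print("%s:%d: %s" % (path, number, text))
```

If the search prints nothing, the installed headers do not declare the identifier, and the program cannot compile against that version regardless of the link line.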

↧

MKL Quad precision

June 8, 2016, 11:51 pm

Hello,

We are using the MKL libraries from our C# code, in particular "?gesv" (zgesv) for solving linear systems. In some cases the precision of the results is not enough, so I would like to know whether it can be increased, for example by implementing a quad-precision calculation.

I don't know whether this is possible or how to do it, which is why I am asking here.

Thank you very much

Gianluca

↧

Unable to find reference to fftw3 subroutine

June 9, 2016, 4:56 am

Hello,
In my application I am trying to call the Intel MKL wrappers for FFTW3 from Fortran code. I followed these steps:

To build the fftw3 wrapper library, I followed the Intel procedure explained in https://software.intel.com/en-us/node/471470#68C58A01-0636-463B-8A6B-38C376D600B9
I built it with the following command in the 'fftw3x_cdft' directory under $MKLROOT/interfaces:

make libintel64

This compiles with the Intel compiler and Intel MPI and places the built library "libfftw3x_cdft_lp64.a" in $MKLROOT/lib/intel64/ and $MKLROOT/lib/intel64_lin/.

I have added required links to MKL as follows:

-Wl,--start-group  ${MKLROOT}/lib/intel64/libfftw3x_cdft_ilp64.a ${MKLROOT}/lib/intel64/libmkl_cdft_core.a ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -I${MKLROOT}/include -I${MKLROOT}/include/fftw -mkl -lfftw3x_cdft_lp64

After building, I use the library with the unmodified FFTW3 calling convention in Fortran, in two places:

 call fftw_mpi_execute_dft_r2c(p,q,r)
 call fftw_mpi_execute_dft_c2r(x,y,z)

But when I build the application, the linker reports an error in the following function:

inv.o: In function `inverse_fourier_':
inv.f90:(.text+0x86d): undefined reference to `fftw_mpi_execute_dft_c2r'

So I checked whether the library has wrappers for these subroutines.
Surprisingly, the interface definition exists for both, but the build process did not build the second wrapper subroutine, and the binary has no code for it.
I confirmed this by grepping under $MKLROOT:

$ grep -rn "fftw_mpi_execute_dft_r2c" *

include/fftw/fftw3-mpi.f03:409:    subroutine fftw_mpi_execute_dft_r2c(p,in,out) bind(C, name='fftw_mpi_execute_dft_r2c')
include/fftw/fftw3-mpi.f03:414:    end subroutine fftw_mpi_execute_dft_r2c
Binary file interfaces/fftw3x_cdft/obj_intel64_lp64/execute.o matches
Binary file interfaces/fftw3x_cdft/obj_intel64_ilp64/execute.o matches
Binary file lib/intel64_lin/obj_intel64_lp64/execute.o matches
Binary file lib/intel64_lin/libfftw3x_cdft_ilp64.a matches
Binary file lib/intel64_lin/libfftw3x_cdft_lp64.a matches
Binary file lib/intel64/obj_intel64_lp64/execute.o matches
Binary file lib/intel64/libfftw3x_cdft_ilp64.a matches
Binary file lib/intel64/libfftw3x_cdft_lp64.a matches

$ grep -rn "fftw_mpi_execute_dft_c2r" *

include/fftw/fftw3-mpi.f03:416:    subroutine fftw_mpi_execute_dft_c2r(p,in,out) bind(C, name='fftw_mpi_execute_dft_c2r')
include/fftw/fftw3-mpi.f03:421:    end subroutine fftw_mpi_execute_dft_c2r

Why is the build process not creating executable code for the c2r subroutine? Could anyone help me resolve this error?
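
A note on the check itself: grep on a binary only shows that the name occurs somewhere in the file; it does not distinguish a definition from a mere reference. The nm tool does, by printing a one-letter type next to each symbol ("T" for code defined in the archive, "U" for an undefined reference). The sketch below classifies a symbol from nm text output. The sample text is made up for illustration and is not output from the poster's library.

```python
def symbol_status(nm_output, symbol):
    """Classify `symbol` in `nm` text output as 'defined', 'undefined', or 'absent'."""
    status = "absent"
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[-1] != symbol:
            continue
        kind = fields[-2]          # one-letter nm symbol type
        if kind != "U":
            return "defined"       # e.g. 'T' (text/code), 'D' (data), 'W' (weak)
        status = "undefined"       # referenced here, expected from another object
    return status


# Illustrative text in the format printed by `nm libsomething.a` (not real output).
sample = """\
execute.o:
0000000000000000 T fftw_mpi_execute_dft_r2c
                 U fftw_mpi_execute_dft
"""

print(symbol_status(sample, "fftw_mpi_execute_dft_r2c"))   # defined
print(symbol_status(sample, "fftw_mpi_execute_dft"))       # undefined
print(symbol_status(sample, "fftw_mpi_execute_dft_c2r"))   # absent
```

Running nm on libfftw3x_cdft_lp64.a and feeding its output to such a check would show directly whether the c2r entry point is defined in the archive that is actually being linked.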

↧

Inspector Executor COO

June 9, 2016, 10:31 am

Hi,

I tried the Inspector-Executor interface with the CSR storage format and it works fine. But when I tried the COO version, I got a SPARSE_STATUS_NOT_SUPPORTED error in the mkl_sparse_optimize function. Here is my code:

//Get COO Arrays

//If you need the implementation of this function I can provide it. I tried sorting by column and by row; both give the same error

    ExtractCoo(argv[1], &nr, &nc, &nnz, &rows, &columns, &A);

    float * x = (float*)malloc(nc * sizeof(float)),
        *y = (float*)malloc(nr * sizeof(float));


    //3) Prepare X Array
    for (int i = 0; i < nc; i++)
    {
        x[i] = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);

    }

CALL_AND_CHECK_STATUS(
            mkl_sparse_s_create_coo(&cooInternal, SPARSE_INDEX_BASE_ONE, nr, nc,nnz, rows, columns, A),"Error in csrCreate \n");

 CALL_AND_CHECK_STATUS(
            mkl_sparse_set_mv_hint(cooInternal, SPARSE_OPERATION_NON_TRANSPOSE, martixDescription, runs),
            "Error after Sparse Hint \n");

 mkl_sparse_set_memory_hint (cooInternal, SPARSE_MEMORY_AGGRESSIVE);

CALL_AND_CHECK_STATUS(
            mkl_sparse_optimize(cooInternal),
            "Error after MKL_SPARSE_OPTIMIZE \n"); // Here I get an error

 // Cold Start
mkl_sparse_s_mv(SPARSE_OPERATION_NON_TRANSPOSE, 1, cooInternal, martixDescription, x, 0, y);

 for (int i = 0; i < runs; i++)
{
            stime = dsecnd();
            mkl_sparse_s_mv(SPARSE_OPERATION_NON_TRANSPOSE, 1, cooInternal, martixDescription, x, 0, y);
            etime = dsecnd();
            runResults[i] = (etime - stime);
 }

So what am I doing wrong?

Thanks,

Mohammad Almasri
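
Since the CSR path is reported to work, one possible fallback (an assumption, not something the post confirms) is to convert the COO triplets to CSR before creating the handle. MKL provides mkl_sparse_convert_csr for this; the plain-Python sketch below only illustrates what such a conversion does to the arrays, using zero-based indices. The function name coo_to_csr is hypothetical.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert zero-based COO triplets to CSR arrays (row_ptr, col_idx, values)."""
    # Count the entries of each row, then prefix-sum to get the row start offsets.
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]

    # Scatter each triplet into the next free slot of its row.
    col_idx = [0] * len(vals)
    values = [0.0] * len(vals)
    next_slot = list(row_ptr[:-1])
    for r, c, v in zip(rows, cols, vals):
        slot = next_slot[r]
        col_idx[slot] = c
        values[slot] = v
        next_slot[r] += 1
    return row_ptr, col_idx, values


# 2-by-3 example: entries (1,0)=3, (0,2)=1, (1,1)=4 given in arbitrary order.
row_ptr, col_idx, values = coo_to_csr(2, [1, 0, 1], [0, 2, 1], [3.0, 1.0, 4.0])
print(row_ptr, col_idx, values)
```

Entries within a row keep the order in which they appear in the COO input, so column indices inside a row are not necessarily sorted.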

↧

Memory buffer for in-place multi-dimensional FFT on clusters

June 12, 2016, 2:23 am

Hi,

I'm trying to compute in-place FFTs of 3-dimensional arrays on clusters. From my experiments with the MKL FFTW3 wrapper, creating an FFTW plan seems to allocate a memory buffer as large as the original array. Because available memory is limited, I would like to reduce the size of this buffer.

Is there any way to control the size of the buffer using the MKL FFTW3 wrapper? If not, I would also like to know whether this is possible at the level of the MKL FFT routines.

Best, -T

↧

Directly calling mkl from python, and try to use more than one thread.

June 14, 2016, 12:32 pm

I am doing some sparse matrix calculations and call MKL directly from Python.

That works, but only a single thread is used. In top, one CPU core shows 100% usage while the other cores show about 0%.

How can I make the MKL function use multiple threads?

I have tried setting the OMP_NUM_THREADS, MKL_NUM_THREADS, and MKL_DOMAIN_NUM_THREADS environment variables to 12.

The code also tries to set the number of MKL threads to 12 with mkl.mkl_set_num_threads(byref(c_int(num_cpu))).

Do the sparse matrix routines of MKL support multithreaded calculation?

The MKL version is 2016.

Thank you.

The code is below:

from ctypes import *
import scipy.sparse as spsp
import numpy as np
import multiprocessing as mp

# Load the shared library
mkl = cdll.LoadLibrary("libmkl_rt.so")


def get_csr_handle2(data, indices, indptr, shape):
	a_pointer   = data.ctypes.data_as(POINTER(c_float))
	ja_pointer  = indices.ctypes.data_as(POINTER(c_int))
	ia_pointer  = indptr.ctypes.data_as(POINTER(c_int))
	return (a_pointer, ja_pointer, ia_pointer, shape)


def get_csr_handle(A,clear=False):
	if clear == True:
		A.indptr[:] = 0
		A.indices[:] = 0
		A.data[:] = 0
	return get_csr_handle2(A.data, A.indices, A.indptr, A.shape)


def csr_t_dot_csr(A_handle, C_handle, nz=None):
	# Calculate (A.T).dot(A) and put result into C
	#
	# This uses one-based indexing
	#
	# Both C.data and A.data must be in np.float32 type.
	#
	# Number of nonzero elements in C must be greater than
	#     or equal to the size of C.data
	#
	# size of C.indptr must be greater than or equal to
	#     1 + (num rows of A).
	#
	# C_data    = np.zeros((nz), dtype=np.single)
	# C_indices = np.zeros((nz), dtype=np.int32)
	# C_indptr  = np.zeros((m+1),dtype=np.int32)

	(a_pointer, ja_pointer, ia_pointer, A_shape) = A_handle
	(c_pointer, jc_pointer, ic_pointer, C_shape) = C_handle

	trans_pointer   = byref(c_char('T'))
	sort_pointer    = byref(c_int(0))

	(m, n)          = A_shape
	sort_pointer        = byref(c_int(0))
	m_pointer           = byref(c_int(m))     # Number of rows of matrix A
	n_pointer           = byref(c_int(n))     # Number of columns of matrix A
	k_pointer           = byref(c_int(n))     # Number of columns of matrix B
	                                          # should be n when trans='T'
						  # Otherwise, I guess should be m
	###
	b_pointer   = a_pointer
	jb_pointer  = ja_pointer
	ib_pointer  = ia_pointer
	###
	if nz == None:
		nz = n*n #*n # m*m # Number of nonzero elements expected
			 # probably can use lower value for sparse
			 # matrices.
	nzmax_pointer   = byref(c_int(nz))
	 # length of arrays c and jc. (which are data and
	 # indices of csr_matrix). So this is the number of
	 # nonzero elements of matrix C
	 #
	 # This parameter is used only if request=0.
	 # The routine stops calculation if the number of
	 # elements in the result matrix C exceeds the
	 # specified value of nzmax.

	info = c_int(-3)
	info_pointer = byref(info)
	request_pointer_list = [byref(c_int(0)), byref(c_int(1)), byref(c_int(2))]
	return_list = []
	for ii in [0]:
		request_pointer = request_pointer_list[ii]
		ret = mkl.mkl_scsrmultcsr(trans_pointer, request_pointer, sort_pointer,
				    m_pointer, n_pointer, k_pointer,
				    a_pointer, ja_pointer, ia_pointer,
				    b_pointer, jb_pointer, ib_pointer,
				    c_pointer, jc_pointer, ic_pointer,
				    nzmax_pointer, info_pointer)
		info_val = info.value
		return_list += [ (ret,info_val) ]
	return return_list

def test():
	num_cpu = 12
	mkl.mkl_set_num_threads(byref(c_int(num_cpu))) # try to set number of mkl threads
	print "mkl get max thread:", mkl.mkl_get_max_threads()
	test_csr_t_dot_csr()

def test_csr_t_dot_csr():
	AA = np.random.choice([0,1], size=(12,750000), replace=True, p=[0.99,0.01])
	A_original = spsp.csr_matrix(AA)
	A = A_original.astype(np.float32).tocsc()
	A = spsp.csr_matrix( (A.data, A.indices, A.indptr) )

	A.indptr  += 1 # convert to 1-based indexing
	A.indices += 1 # convert to 1-based indexing
	A_ptrs = get_csr_handle(A)

	C = spsp.csr_matrix( np.ones((12,12)), dtype=np.float32)
	C_ptrs = get_csr_handle(C, clear=True)

	print "=call mkl function="

	while (True):
		return_list = csr_t_dot_csr(A_ptrs, C_ptrs)

if __name__ == "__main__":
	test()

By contrast, NumPy linked with MKL uses multiple threads in the following code without any environment variables being set.

import ctypes
mkl = ctypes.cdll.LoadLibrary("libmkl_rt.so")
print mkl.mkl_get_max_threads()
import numpy as np

a = np.random.normal( 0,1, (100,1000))

while True:
        a.dot(a.T)
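
As a cross-check of the thread settings, the sketch below (Python 3, whereas the code above is Python 2) queries MKL through the C entry points MKL_Set_Num_Threads and MKL_Get_Max_Threads declared in mkl_service.h, which take and return plain integers by value. The wrapper function mkl_max_threads and the default library name are assumptions for illustration; the function returns None when the library cannot be loaded.

```python
import ctypes


def mkl_max_threads(requested=None, libname="libmkl_rt.so"):
    """Optionally request a thread count, then return the maximum MKL reports.

    Returns None when the shared library cannot be loaded.
    """
    try:
        mkl = ctypes.CDLL(libname)
    except OSError:
        return None
    if requested is not None:
        # C entry point: the count is passed by value, not by reference.
        mkl.MKL_Set_Num_Threads(ctypes.c_int(requested))
    return mkl.MKL_Get_Max_Threads()


print("MKL max threads:", mkl_max_threads(12))
```

This only shows what MKL reports as its thread limit. Whether a particular sparse routine actually runs in parallel for a given problem size is a separate question that the post leaves open.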

↧

Dense matrix multiply transposed Sparse Matrix?

June 14, 2016, 3:48 pm

Hi,

I want to use MKL Sparse to multiply a dense matrix by a transposed sparse matrix. Specifically, given inputs A (dense, m-by-k) and B (sparse, n-by-k), I want to compute:

C = A * B'

where C is an m-by-n dense matrix. Note that the operand order differs from what mkl_?csrmm (https://software.intel.com/zh-cn/node/520832) defines, where the sparse matrix is the left operand. I am thinking about transposing A and C and transposing them back after the computation, but that consumes too much memory when A and C are both big. Any thoughts on that?

Thanks!
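
One observation that may help: C = A * B' is equivalent to C' = B * A', and a row-major m-by-k array A occupies exactly the same memory as a column-major k-by-m array A'. If the sparse routine can be made to treat the dense operands in the opposite layout from the one the caller uses, no explicit transposition or copy is needed; which layout mkl_?csrmm assumes for a given indexing base should be checked in its documentation. The plain-Python sketch below only illustrates that the product can be formed directly from the CSR rows of B without transposing anything. The function name is hypothetical.

```python
def dense_times_csr_transposed(A, n, row_ptr, col_idx, values):
    """Return C = A * B^T for dense A (m-by-k, list of rows) and CSR B (n-by-k, zero-based)."""
    m = len(A)
    C = [[0.0] * n for _ in range(m)]
    for j in range(n):                               # row j of B is column j of B^T
        for p in range(row_ptr[j], row_ptr[j + 1]):
            l, b_jl = col_idx[p], values[p]          # nonzero B[j][l]
            for i in range(m):
                C[i][j] += A[i][l] * b_jl
    return C


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]                                # 2-by-3 dense
# B = [[1, 0, 2],
#      [0, 3, 0]] stored as CSR (2-by-3)
row_ptr, col_idx, values = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]

print(dense_times_csr_transposed(A, 2, row_ptr, col_idx, values))
```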

↧

complex auto correlation using MKL

June 15, 2016, 10:28 am

I am trying to use MKL for the autocorrelation of a complex vector XX:

complex(kind=4),allocatable,dimension(:) :: XX, CXX

COMPLEX(kind=4),allocatable,dimension(:) :: CRXX

.......

! Do the 1 dimensional complex autocorrelation.
status = vslccorrnewtask1d(task, VSL_CORR_MODE_FFT, NZONE, NZONE, NZONE)
status = vslccorrexec1d(task, XX, 1, XX, 1, CRXX, 1)

I have a very naive implementation of autocorrelation with complex numbers that I use to verify the result of MKL's implementation, and it suggests that the MKL result is incorrect. The only way I can make it work is to pass the conjugate of XX instead of XX as the second input vector. This doesn't make much sense to me, and the result also seems to be scaled by 2.

I would appreciate it if someone could point out what I have done wrong in the code above and how I can get the correct result.

Regards,
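
For complex data there are two common conventions for correlation: one conjugates one of the inputs, the other multiplies the inputs as they are. A mismatch between the reference implementation and the library on this point would produce the kind of discrepancy described above, although the post does not establish which convention either side uses, and it does not explain the factor of 2. The sketch below is a naive reference in Python that computes both variants so they can be compared against the library output. The function name and the lag range are assumptions.

```python
def correlate(x, y, conjugate_first=True):
    """Naive correlation r[k] = sum_n f(x[n]) * y[n + k] for lags k = 0 .. len(y) - 1.

    f is complex conjugation when conjugate_first is True, the identity otherwise.
    """
    result = []
    for k in range(len(y)):
        acc = 0j
        for n in range(len(x)):
            if n + k < len(y):
                a = x[n].conjugate() if conjugate_first else x[n]
                acc += a * y[n + k]
        result.append(acc)
    return result


x = [1 + 1j, 2 + 0j]
print(correlate(x, x, conjugate_first=True))    # lag 0 is the real sum of |x[n]|**2
print(correlate(x, x, conjugate_first=False))   # lag 0 is the sum of x[n]**2
```

With conjugation, the zero-lag value of an autocorrelation is real and equals the signal energy, which makes a quick sanity check for whichever result is being tested.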

↧

Dummy Libraries Suddenly are Desired

June 15, 2016, 7:06 pm

All,

Earlier today, I recompiled some code after something was accidentally deleted. Much to my surprise, the following errors occurred:

icpc: error #10236 File not found: '/opt/intel/composer_xe_2015.2.164/mkl/lib/em64t/libmkl_lapack.a'

icpc: error #10236 File not found: '/opt/intel/composer_xe_2015.2.164/mkl/lib/em64t/libmkl_em64t.a'

icpc: error #10236 File not found: '/opt/intel/composer_xe_2015.2.164/mkl/lib/em64t/libguide.a'

Now I know what you're all thinking: "Gee, check out all the documentation from before that says those are dummy libraries that were dropped in 2010...". Here's the strange thing: I've never had em64t in my path at all. My path is mkl/lib/intel64, and this code has worked perfectly for some time with the current compilers. So I'm very confused.

At first I thought 'I'll just link em64t to intel64', but I would still need to create those dummy libraries. I've attempted to look into this, but I'll be honest, I'm not too familiar with how to create my own links (yes, I read the documentation, and exactly what MKLINCLUDE or MKLPATH are meant to be is never really called out).

Has anyone gone through the trouble of creating these dummy libraries themselves? If so, what is the exact syntax that should be used? I think I can figure it out. MKLPATH in my case would be /opt/intel/composer_xe_2015.2.164/mkl/lib/..., but I'm not sure what MKLINCLUDE would be. Lastly, if I knew what MKLINCLUDE was, I think I could create libmkl_em64t, but I can't figure out how to do the others. I looked here to see layers for libmkl_em64t.

↧

Gaussian Random Numbers

June 16, 2016, 7:18 am

Dear Intel:

I have been using the BM (Box-Muller) Gaussian random number routines. The underlying method was developed in about '58 for the Army by Muller and Box at Princeton. Your routines reference one of their minor notes as the basis for your routines. The real math is contained in Technical Reports 9 and 13 from Princeton. Number 9 has been scanned but is quite hard to get hold of: one has to ask for it as a PDF, and the quality is not good. We have asked for number 13 to be scanned, and hopefully it will become available.

I strongly suggest you get hold of copies and put them on your website; they provide a lot of detail that makes the routines much easier to use.

John

↧

Writing c extension for python that calls mkl

June 16, 2016, 9:46 am

I tried calling MKL directly from Python with ctypes, but in that case MKL uses only a single CPU. The cause of that problem is unknown.

As an alternative approach, I am writing a C extension for Python that calls MKL.

The following C extension can be imported into Python without problems. However, when I call the function, it produces the following error message:

Intel MKL FATAL ERROR: Cannot load libmkl_mc.so or libmkl_def.so

What are the correct options for the icc compiler that I should use in setup.py?

I took some of the options in setup.py from the Intel link line advisor, but I could not put all of them into setup.py.

mkl_helper.h

#include "Python.h"
#include "mkl.h"
#include "numpy/arrayobject.h"

static PyObject* test4 (PyObject *self, PyObject *args)
{
	// test4 (m, n,
	//        a, ja, ia,
	//        c, jc, ic)

	PyArrayObject *shape_array;
	PyArrayObject *a_array;   // csr_matrix.data
	PyArrayObject *ja_array;  // csr_matrix.indices
	PyArrayObject *ia_array;  // csr_matrix.indptr
	PyArrayObject *c_array;
	PyArrayObject *jc_array;
	PyArrayObject *ic_array;

	if (!PyArg_ParseTuple(args, "O!O!O!O!O!O!O!",&PyArray_Type, &shape_array,&PyArray_Type, &a_array,&PyArray_Type, &ja_array,&PyArray_Type, &ia_array,&PyArray_Type, &c_array,&PyArray_Type, &jc_array,&PyArray_Type, &ic_array))
	{
		return NULL;
	}

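	int *ptr_int        = (int *) shape_array->data;
	int m               = ptr_int[0];
	int n               = ptr_int[1];
	int k               = n;

	// data arrays hold floats; indices and indptr hold integers,
	// whose width must match the MKL integer type in use
	float *a_data_ptr   = (float *) a_array->data;
	int   *ja_data_ptr  = (int *)   ja_array->data;
	int   *ia_data_ptr  = (int *)   ia_array->data;
	float *c_data_ptr   = (float *) c_array->data;
	int   *jc_data_ptr  = (int *)   jc_array->data;
	int   *ic_data_ptr  = (int *)   ic_array->data;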

	char trans  = 'T';
	int sort    = 0;
	int nzmax   = n*n;
	int info    = -3;
	int request = 0;

	mkl_scsrmultcsr(&trans, &request, &sort,&m, &n, &k,
			    a_data_ptr, ja_data_ptr, ia_data_ptr,
			    a_data_ptr, ja_data_ptr, ia_data_ptr,
			    c_data_ptr, jc_data_ptr, ic_data_ptr,&nzmax, &info);

	return PyInt_FromLong(info);
}


static struct PyMethodDef methods[] = {
    {"test4", test4, METH_VARARGS, "test2(arr1)\n take a numpy array and return its shape as a tuple"},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC
initmkl_helper (void)
{
    (void)Py_InitModule("mkl_helper", methods);
    import_array();
}

setup.py

from distutils.core import setup, Extension
import numpy as np
extra_link_args=["-Bstatic","-I${MKLROOT}/include", "-L{$MKLROOT}/lib/intel64/"]
extra_link_args += ["-mkl"]
extra_link_args += ["-lrt" ]
extra_link_args += ["-L${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a", "-L${MKLROOT}/lib/intel64/libmkl_core.a", "-L${MKLROOT}/lib/intel64/libmkl_intel_thread.a", "-lpthread", "-lm", "-ldl"]

extra_link_args += ["-DMKL_ILP64", "-qopenmp" ,"-I${MKLROOT}/include"]

ext_modules = [ Extension('mkl_helper', sources = ['mkl_helper.c'], extra_link_args=extra_link_args) ]


setup(
        name = 'mkl_helper',
        version = '1.0',
        include_dirs = [np.get_include()], #Add Include path of numpy
        ext_modules = ext_modules
)
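
A remark on the option list above: distutils hands each entry of extra_link_args to the compiler as a literal string, without shell expansion, so "${MKLROOT}" is not replaced by the variable's value (and "{$MKLROOT}" in the first entry is additionally misspelled). Also, -L followed by the path of a .a file names a search directory, not a library to link. A minimal sketch of an alternative, assuming MKLROOT is set in the environment and assuming the single dynamic library mkl_rt is an acceptable choice, is to expand the paths in Python and pass them through the dedicated Extension arguments. The helper name mkl_build_settings is hypothetical, and whether this resolves the load error is not established by the post.

```python
import os


def mkl_build_settings(mklroot=None):
    """Return (include_dirs, library_dirs, libraries) with MKLROOT expanded in Python."""
    mklroot = mklroot or os.environ.get("MKLROOT")
    if not mklroot:
        raise RuntimeError("MKLROOT is not set; source the MKL environment script first")
    include_dirs = [os.path.join(mklroot, "include")]
    library_dirs = [os.path.join(mklroot, "lib", "intel64")]
    # Assumption: link against the single dynamic library instead of the
    # interface/threading/core layers listed in the original setup.py.
    libraries = ["mkl_rt", "pthread", "m", "dl"]
    return include_dirs, library_dirs, libraries
```

The three lists would then be passed as Extension('mkl_helper', sources=['mkl_helper.c'], include_dirs=..., library_dirs=..., libraries=...), with the NumPy include directory appended to include_dirs.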

test.py

import mkl_helper
from ctypes import POINTER, c_float, c_int

import numpy as np
import scipy.sparse as spsp

def get_csr_handle2(data, indices, indptr, shape):
        a_pointer   = data.ctypes.data_as(POINTER(c_float))
        ja_pointer  = indices.ctypes.data_as(POINTER(c_int))
        ia_pointer  = indptr.ctypes.data_as(POINTER(c_int))
        return (a_pointer, ja_pointer, ia_pointer, shape)

def get_csr_handle(A,clear=False):
        if clear == True:
                A.indptr[:] = 0
                A.indices[:] = 0
                A.data[:] = 0
        return get_csr_handle2(A.data, A.indices, A.indptr, A.shape)

print "test4"

test_size = 2
AA = np.random.choice([0,1], size=(test_size,750000), replace=True, p=[0.99,0.01])
A_original = spsp.csr_matrix(AA)
A = A_original.astype(np.float32).tocsc()
A = spsp.csr_matrix( (A.data, A.indices, A.indptr) )

A.indptr  += 1 # convert to 1-based indexing
A.indices += 1 # convert to 1-based indexing

C = spsp.csr_matrix( np.ones((test_size,test_size)), dtype=np.float32)

↧

dfeast_scsrev problem

June 17, 2016, 2:42 am

Hi everybody,
I have a problem with eigenvalue calculations for my system using dfeast_scsrev.
As you know, this function is supposed to solve the standard eigenvalue problem for sparse matrices.
My code was working fine previously, but since I changed to a bigger system (65536x65536) I get a segmentation fault. When I run my code under gdb, I see that the memory issue is in the function mkl_feast_dfeast_srci(), which I do not call in my program.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff54aa08a in mkl_feast_dfeast_srci () from /opt/apps/intel/2016-2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64/libmkl_core.so

This function belongs to the Reverse Communication Interface, which I am not using directly. Does anyone have experience with this?
Thank you in advance.
Regards,

↧

Linpack runs only on cores, not threads?

June 17, 2016, 2:42 am

Hi there,

I have used the Linpack binaries for quite a long time for various purposes, mostly performance and stability diagnostics.

I noticed that with current versions of the binaries and current CPUs, Linpack runs only on the cores and no longer on the HT threads (I am quite sure this was different earlier). For example, with an E5-2683 v4 (16 cores, 32 threads) I see only 16 CPUs in use in the OS.

Any idea why this changed? Is the reason the CPU or the binary?

Thank you for your help

Regards

Martin

↧

License with Mathematica

June 17, 2016, 5:33 am

Hi everybody,

I have a license question.

We are using functions from a Mathematica script inside our application. The DLLs generated from Mathematica (Wolfram) use the MKL DLLs from Intel.

If we want to distribute this application, do we need to purchase an Intel license ourselves? Or is this already covered by the license used by Wolfram/Mathematica?

Any help is appreciated!

Kind regards,

Frank

↧

scalapack libraries not available

June 20, 2016, 10:39 am

Hi there,

I have downloaded and installed parallel_studio_xe_2016_update3 and then migrated an MPI project to the new machine. When I try to link with mkl_blas95_lp64.lib mkl_lapack95_lp64.lib mkl_scalapack_lp64.lib mkl_intel_lp64.lib mkl_core.lib mkl_sequential.lib mkl_blacs_msmpi_lp64.lib impi.lib

as the link line advisor https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor recommends, some libraries are missing:

>ipo: error #11018: Cannot open mkl_blacs_msmpi_lp64.lib
1>LINK : fatal error LNK1181: cannot open input file 'mkl_scalapack_lp64.lib'

I then checked the paths but could not find these libraries in the installation. What might be wrong?

I am going to rerun the installation to see whether I missed something. Is update3 a full-fledged installation, or do I have to install another package first?

Many thanks in advance !

↧

Poisson solver on a sphere

June 22, 2016, 1:44 am

Hi~

I'm trying to use the Poisson Solver routines to solve the Poisson equation on a whole sphere. After reading the manual, I compiled and ran the example code 'd_sph_with_poles_f.f90' with the command 'make libintel64 function=d_sph_with_poles_f threading=sequential'. The solution of the Helmholtz problem seems to be OK, but when I set q = 0.0D0 to solve the Poisson problem, I get the following message:

"The problem is degenerate due to rounding errors. The approximate solution that provides the minimal Euclidean norm of the solution will be computed"

Disabling 'if (stat.ne.0) goto 999' allows me to get the solution. I changed the parameters to 'np=360,nt=180' and plotted the result. It seems to be OK.

I also tested another Poisson problem:

u(ip,it) = cos(theta_i)*((sin(theta_i))**2.0)*cos(2.0*(phi_i-1.5))
f(ip,it) = -cos(2.0*(phi_i-1.5))*cos(theta_i)*(-4.0+4.0*(cos(theta_i))**2.0-8.0*(sin(theta_i))**2.0)

The result is also satisfactory.

I found a previous topic about this problem ( https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/... ), which solves the problem in a similar way.

My question is: why did this warning occur? Can the solution of the Poisson problem computed by those routines still be trusted?

Thank you!
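
Some mathematical background that seems relevant to the message: on a closed surface such as the whole sphere, the Poisson problem (q = 0) determines u only up to an additive constant, and a solution exists only if the right-hand side integrates to zero over the sphere. A warning about degeneracy and a minimal-norm solution is at least consistent with that, though the post does not confirm that this is what triggers it. The sketch below checks the zero-integral condition numerically for the second test problem, using a simple midpoint rule; the grid sizes and function names are assumptions.

```python
import math


def sphere_integral(f, n_theta=180, n_phi=360):
    """Midpoint-rule approximation of the integral of f(theta, phi) over the unit sphere."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for it in range(n_theta):
        theta = (it + 0.5) * d_theta          # colatitude in (0, pi)
        for ip in range(n_phi):
            phi = (ip + 0.5) * d_phi          # longitude in (0, 2*pi)
            total += f(theta, phi) * math.sin(theta)
    return total * d_theta * d_phi


def rhs(theta, phi):
    """Right-hand side f of the second test problem in the post."""
    c, s = math.cos(theta), math.sin(theta)
    return -math.cos(2.0 * (phi - 1.5)) * c * (-4.0 + 4.0 * c ** 2 - 8.0 * s ** 2)


print("integral of f over the sphere: %.3e" % sphere_integral(rhs))
print("integral of 1 over the sphere: %.6f" % sphere_integral(lambda t, p: 1.0))
```

If the integral of a right-hand side is far from zero, the continuous problem has no solution at all, and any computed result should be read as a least-squares type approximation rather than a solution.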

↧