Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

matrix inverse FLOPS


Hi,

What FLOP count is required for a 16x16 MKL_Complex8 matrix inversion using cpotrf and then cpotri?

How many CPU clock cycles should it take on an Atom E3826 CPU and on an i5-3470 CPU?

Is there any performance difference between a 32-bit and a 64-bit Linux operating system (for those specific CPUs)?

Thanks , Nimrod
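
For reference, a minimal sketch of that two-call inversion through the LAPACKE interface (note the assumption that the matrix is Hermitian positive definite, which cpotrf/cpotri require). The flop counts in the comments are the standard LAPACK operation-count formulas, not measured values; at n = 16 the runtime is dominated by call overhead and cache effects rather than arithmetic, so clock-cycle numbers for a specific CPU are best obtained by measuring.

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    const MKL_INT n = 16;
    MKL_Complex8 a[16 * 16];                 /* Hermitian positive-definite matrix, column-major */

    /* Trivial HPD example: the identity matrix. Replace with real data. */
    for (int i = 0; i < n * n; i++) { a[i].real = 0.0f; a[i].imag = 0.0f; }
    for (int i = 0; i < n; i++)     { a[i * n + i].real = 1.0f; }

    /* Cholesky factorization: roughly n^3/3 complex flops. */
    MKL_INT info = LAPACKE_cpotrf(LAPACK_COL_MAJOR, 'L', n, a, n);
    if (info != 0) { printf("cpotrf failed: %d\n", (int) info); return 1; }

    /* Inversion from the Cholesky factor: roughly 2n^3/3 complex flops. */
    info = LAPACKE_cpotri(LAPACK_COL_MAJOR, 'L', n, a, n);
    if (info != 0) { printf("cpotri failed: %d\n", (int) info); return 1; }

    /* Total: roughly n^3 (about 4096) complex multiply-adds for n = 16, i.e. on the
       order of 10^4 real floating-point operations. Cycle counts can be measured
       with mkl_get_cpu_clocks() around the two calls. */
    return 0;
}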

 


Problems while using PARDISO


Hi,

I have a question about PARDISO under Windows. I have written Fortran code to solve a very large sparse symmetric problem, and I use a subroutine to set the parameters for PARDISO and solve the system. Following the demo provided in the MKL folder, I included the mkl_pardiso.f90 file at the very beginning of the main program and also added "use mkl_pardiso". When I build the solution, it gives the following error message:

Error  1  Compilation Aborted (code 1). 

If I then comment out the inclusion of mkl_pardiso.f90, the error disappears, the code runs without any further errors, and the results are correct. However, if I clean the successfully built solution, keep mkl_pardiso.f90 commented out, and rebuild, more errors appear:

Error    3     error #6404: This name does not have a type, and must have an explicit type.   [PT]    

Error    2     error #6457: This derived type name has not been declared.   [MKL_PARDISO_HANDLE]   

Error    4     error #6458: This name must be the name of a variable with a derived type (structure type)   [PT]     

Error    1     error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [MKL_PARDISO]  

Error    5    Compilation Aborted (code 1)

So the question is: why does this happen, and how can I correctly use PARDISO in my subroutine? Any thoughts will be much appreciated.

 

warnings and remarks from MKL, what is the reason?


Dear all,

I am testing the MUMPS Fortran 90 interface for a simple block right-hand-side solution; however, I get a bunch of warnings and remarks, all related to the spblas, lapack, and rci interfaces.

What could be the reason for these warnings and remarks? The attached file is an example of the output of the make process. You can examine the compile flags and direct me as necessary.

Best regards,

Umut

Attachment: mkl_simple.log (12.05 KB)

How to do batch FFT


Hi,

I would like to know whether there is any way to do batched FFTs. CUFFT provides a way to do batch FFT; is there a function in MKL that serves the same purpose?

 

Thanks

sivaramakrishna.
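
MKL's DFTI interface does support batched transforms through the DFTI_NUMBER_OF_TRANSFORMS and DFTI_INPUT_DISTANCE parameters. A minimal C sketch (single precision, in-place; the function name, fft_len and batch are illustrative, not from the original post):

#include <mkl_dfti.h>

/* Perform 'batch' in-place complex FFTs of length 'fft_len', stored contiguously
   in 'data' (batch * fft_len complex elements). */
MKL_LONG batched_fft(MKL_Complex8 *data, MKL_LONG fft_len, MKL_LONG batch)
{
    DFTI_DESCRIPTOR_HANDLE handle = NULL;
    MKL_LONG status;

    status = DftiCreateDescriptor(&handle, DFTI_SINGLE, DFTI_COMPLEX, 1, fft_len);
    if (status != 0) return status;

    /* Transform 'batch' vectors per compute call, each starting 'fft_len'
       complex elements after the previous one. */
    status = DftiSetValue(handle, DFTI_NUMBER_OF_TRANSFORMS, batch);
    if (status == 0) status = DftiSetValue(handle, DFTI_INPUT_DISTANCE, fft_len);
    if (status == 0) status = DftiSetValue(handle, DFTI_PLACEMENT, DFTI_INPLACE);
    if (status == 0) status = DftiCommitDescriptor(handle);

    /* One call computes the whole batch; MKL threads it internally. */
    if (status == 0) status = DftiComputeForward(handle, data);

    DftiFreeDescriptor(&handle);
    return status;
}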

Cannot find BLAS on a machine with MKL when installing scipy via pip


I installed Intel MKL and other libraries for a customized numpy. Here is my `~/.numpy-site.cfg`:

    [DEFAULT]
    library_dirs = /usr/lib:/usr/local/lib
    include_dirs = /usr/include:/usr/local/include
    
    [mkl]
    library_dirs = /opt/intel/mkl/lib/intel64/
    include_dirs = /opt/intel/mkl/include/
    mkl_libs = mkl_intel_ilp64, mkl_intel_thread, mkl_core, mkl_rt
    lapack_libs =
    
    [amd]
    amd_libs = amd
    
    [umfpack]
    umfpack_libs = umfpack
    
    [djbfft]
    include_dirs = /usr/local/djbfft/include
    library_dirs = /usr/local/djbfft/lib

This configuration file worked fine during the installation of numpy. But when I was installing scipy via `pip3 install scipy`, it reported:

    numpy.distutils.system_info.BlasNotFoundError:
        Blas (http://www.netlib.org/blas/) libraries not found.
        Directories to search for the libraries can be specified in the
        numpy/distutils/site.cfg file (section [blas]) or by setting
        the BLAS environment variable.

In my mind MKL is an implementation of BLAS, so just specifying MKL should be fine. I've tried:

 1. `export LD_LIBRARY_PATH=/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH`

 2. `export BLAS=/opt/intel/mkl/lib/intel64`

 3. Copy the content in the `[mkl]` section and paste into the `[blas]` section in the file `~/.numpy-site.cfg`

But none of these works. So what is going wrong? Does scipy respect `~/.numpy-site.cfg`? Thank you.

Internal consistency check failure when using DSS


Hi there,

I am working on a project that uses DSS to solve large sparse linear equations. However, I run into an internal consistency check failure, and I can see no problem with my code. Could anyone help me figure out the problem? Thank you so much. The attached file is my project.


Inner boundary condition setting for solving the Poisson equation


Dear all, I need your help with setting an inner boundary condition.

For simplicity, take a 2-D Poisson equation as an example, corresponding to s_Poisson_2D_f.f90 in the MKL examples.

Case A

In case A, it is clear that we can assign the following arrays to set the boundary condition, as in s_Poisson_2D_f.f90:

bd_ax(iy) = 1.0E0,

bd_bx(iy) = 1.0E0 ,

bd_ay(ix) = -2.0*pi*sin(2*pi*(ix-1)/nx),

bd_by(ix) =  2.0*pi*sin(2*pi*(ix-1)/nx)

Case B

In case B, there are additional charge sources inside the analysis volume. The potential of the black box is set to 0, and the line charge density of the long wire is given.

So how can the problem of case B be solved with s_Poisson_2D_f.f90 from the MKL library?

 

Attachments: 1.jpg (9.22 KB), 2_0.jpg (14.17 KB)

Out of memory error with Cpzgemr2d


Hello everybody, I am trying to distribute a square complex (double precision) matrix over a BLACS grid using the Cpzgemr2d function. The global matrix resides on a single "master" node, and I need to distribute it over my grid as a preliminary step of a linear system solution. Everything works fine when I run my code with matrices of about 2 GB or smaller, using various BLACS grids (2x2, 3x3, 4x4, etc.) with a row/column block size of 64. However, when I increase the matrix size to, for example, 4 GB, with the same grid and block configuration, I receive an "xxmr2d: out of memory" error after the call to Cpzgemr2d, which fails to complete the matrix distribution. Memory availability on my test machine is quite large (more than 40 GB per MPI task), and I am only allocating the memory required for the global and local matrix blocks, so I do not think memory itself is the cause of the problem.

Is there any size limit for the above data distribution procedure? My test machine is a Linux cluster, and I am working with the MKL ScaLAPACK 64-bit library.

Many thanks in advance for your help!


dss_solver gives non-stable solve


Hi there,

I am using mkl_dss.f77 to solve large sparse nonlinear equations by iteratively calling the DSS solver. I noticed that, even though I set the same initial values, I do not get the same result every time. The DSS solver returns slightly different results on each run, and the differences accumulate over the iterations, which leads to a non-reproducible solution of the large sparse nonlinear equations.

How can I make the MKL DSS solver give reproducible results? Could anyone give me some advice on this subject?
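
Run-to-run variation like this usually comes from the non-deterministic reduction order in threaded code. One thing that may help, assuming a reasonably recent MKL (11.0 or later), is Conditional Numerical Reproducibility together with a fixed thread count. A minimal C sketch of the idea follows; the same settings are available from Fortran, or via the MKL_CBWR environment variable, and the thread count 4 is just an example.

#include <mkl.h>

/* Request reproducible MKL results across runs on the same machine.
   Call before any other MKL routine. */
void setup_reproducible_mkl(void)
{
    /* MKL_CBWR_COMPATIBLE is the most conservative (and slowest) branch;
       MKL_CBWR_AUTO also fixes the run-to-run behavior for a given CPU. */
    if (mkl_cbwr_set(MKL_CBWR_COMPATIBLE) != MKL_CBWR_SUCCESS) {
        /* Reproducibility mode not available in this MKL version. */
    }

    /* A fixed number of threads avoids changes in the parallel decomposition. */
    mkl_set_num_threads(4);
    mkl_set_dynamic(0);
}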

IMPI dapl fabric error


Hi, I'm trying to run the HPL benchmark on an Ivy Bridge Xeon processor with two Xeon Phi 7120P MIC cards, using the offload xhpl binary from Intel Linpack.

It throws the following error:

$ bash runme_offload_intel64

This is a SAMPLE run script.  Change it to reflect the correct number

of CPUs/threads, number of nodes, MPI processes per node, etc..



MPI_RANK_FOR_NODE=1 NODE=1, CORE=, MIC=1, SHARE=

MPI_RANK_FOR_NODE=0 NODE=0, CORE=, MIC=0, SHARE=

[1] MPI startup(): dapl fabric is not available and fallback fabric is not enabled

[0] MPI startup(): dapl fabric is not available and fallback fabric is not enabled

I found the same errors discussed on this forum and learned that the I_MPI_DEVICES variable should be unset. That made HPL run, but performance is very low, only about 50% efficiency. On another node with the same hardware, HPL efficiency is 84%. Below is a short excerpt of the openibd status output from both systems, which shows the difference.

Node with 84% HPL efficiency:
    Currently active MLX4_EN devices:
    (none listed)

Node with 50% HPL efficiency:
    Currently active MLX4_EN devices:
    eth0

Can someone guide me on how to resolve this?

 

PARDISO optimization?


Here goes,

Relevant information (I hope):

  • Using PARDISO to solve a linear system of equations Ax = b
  • Matrix A can be extremely large and sparse, is nonsymmetric, and is converted to CSR format using an MKL routine

Below is the section of code where I call PARDISO; it works and everything is fine. I was hoping someone could look at my setup and let me know whether there is something I can do to make it even faster, or whether it is as fast as it is going to get.

I also call mkl_set_num_threads(n) before this code to make use of multiple cores according to the desired n.

CODE

//  Call the PARDISO solver
MKL_INT pt[64], iparm[64];      // note: MKL's C examples declare pt as void* pt[64]; it must be able to hold pointers
for (i = 0; i < 64; i++) {
    pt[i] = 0;                  // internal solver memory pointers, must be zeroed before the first call
    iparm[i] = 0;
}

MKL_INT *perm;
perm = (MKL_INT*) mkl_malloc(m * sizeof(MKL_INT), 16);
if (perm == NULL)
{
    cout << ">>> error allocating perm" << endl;
    return (0);
}

MKL_INT maxfct, mnum, mtype, phase, nrhs, msglvl, error;
maxfct = 1;
mnum   = 1;
mtype  = 11;                    // real, nonsymmetric matrix
nrhs   = 1;
msglvl = 1;                     // print statistical information
iparm[0]  = 1;                  // do not use the default iparm values
iparm[1]  = 3;                  // parallel (OpenMP) nested-dissection reordering
iparm[26] = 1;                  // check the matrix for errors
iparm[34] = 1;                  // zero-based indexing
iparm[60] = 0;                  // in-core mode
error = 0;

//  PARDISO direct solve: analysis, numerical factorization and solve in one call (phase 13)
phase = 13;
pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, acsr, ia, ja, perm, &nrhs, iparm, &msglvl, b, solution, &error);
if (error != 0) {
    cout << "ERROR during solution: " << error << endl;
}

 

Again, all I am really asking is whether my call could be any better, or whether this is already optimal. Let me know if any additional information is needed; I will be checking my email frequently today so I can respond quickly.

Sincerely, Jared
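
One common optimization, when the same sparsity pattern or the same factorization is reused across many solves (e.g. in a time-stepping or Newton loop), is to split phase 13 into its parts so the expensive analysis and factorization steps are not repeated. A sketch under that assumption, reusing the variables from the snippet above (and the arrays acsr, ia, ja, b, solution from earlier in the program); for a single one-shot solve, phase 13 is already essentially what PARDISO needs to do.

double ddum = 0.0;   // dummy rhs/solution for the phases that do not use them

// Phase 11: analysis and reordering -- run once per sparsity pattern.
phase = 11;
pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, acsr, ia, ja, perm, &nrhs,
        iparm, &msglvl, &ddum, &ddum, &error);

// Phase 22: numerical factorization -- rerun only when the values of A change.
phase = 22;
pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, acsr, ia, ja, perm, &nrhs,
        iparm, &msglvl, &ddum, &ddum, &error);

// Phase 33: solve -- comparatively cheap, can be repeated for many right-hand sides b.
phase = 33;
pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, acsr, ia, ja, perm, &nrhs,
        iparm, &msglvl, b, solution, &error);

// Phase -1: release the solver's internal memory when completely done.
phase = -1;
pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, acsr, ia, ja, perm, &nrhs,
        iparm, &msglvl, &ddum, &ddum, &error);

As in the snippet above, 'error' should be checked after each call.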

FFT not working for N=2048


Hello,

I have a problem with the MKL FFT: it stops working, with no error indication, for N=2048.

Here is my implementation:

	MKL_LONG status;
	int nThreads = omp_get_num_threads();
	int th=0;

	DFTI_DESCRIPTOR_HANDLE handle = 0;
	status=DftiCreateDescriptor(&handle, DFTI_SINGLE, DFTI_COMPLEX,1,(MKL_LONG) nChannels);
	status=DftiSetValue(handle, DFTI_PLACEMENT, DFTI_INPLACE);
	status=DftiSetValue (handle, DFTI_NUMBER_OF_USER_THREADS, nThreads);
	status=DftiCommitDescriptor(handle);	
	#pragma omp parallel for shared(nChannels,nThreads, input) private(th)
	for (th = 0; th < nSpectra; th++) {
		DftiComputeForward(handle, &input[th*nChannels]);
	}
	DftiFreeDescriptor(&handle);

where N = nChannels. It works fine for N=1024 as well as for N=1536. All statuses are zero up to DftiComputeForward, which hangs and never returns.

Thanks.
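
One detail worth checking (a guess, not a confirmed diagnosis): omp_get_num_threads() returns 1 when called outside a parallel region, so DFTI_NUMBER_OF_USER_THREADS is being set to 1 while several threads then call DftiComputeForward on the same descriptor, which the DFTI interface does not support. A sketch of the adjusted setup, reusing nChannels, nSpectra and input from the code above:

#include <mkl_dfti.h>
#include <omp.h>

	MKL_LONG status;
	int nThreads = omp_get_max_threads();   /* maximum team size, valid outside a parallel region */

	DFTI_DESCRIPTOR_HANDLE handle = 0;
	status = DftiCreateDescriptor(&handle, DFTI_SINGLE, DFTI_COMPLEX, 1, (MKL_LONG) nChannels);
	status = DftiSetValue(handle, DFTI_PLACEMENT, DFTI_INPLACE);
	/* Must match the number of threads that will share this descriptor. */
	status = DftiSetValue(handle, DFTI_NUMBER_OF_USER_THREADS, nThreads);
	status = DftiCommitDescriptor(handle);

	#pragma omp parallel for
	for (int th = 0; th < nSpectra; th++) {
		DftiComputeForward(handle, &input[th * nChannels]);
	}
	DftiFreeDescriptor(&handle);

An alternative is to let a single thread issue one batched transform (DFTI_NUMBER_OF_TRANSFORMS = nSpectra, DFTI_INPUT_DISTANCE = nChannels), which avoids sharing the descriptor across user threads altogether.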

 

Explicit SVD vs SVD with eigen solver


Hi, 

could you clarify the following two questions.

  1. Consider the following mathematical problem. Given a general, full-rank, square, nonsymmetric matrix A of size 13 000 x 13 000, I want to compute its SVD with all singular values and all left/right singular vectors. When I solve it with an SVD driver routine (e.g. LAPACKE_sgesdd), it takes about 2x longer than solving two eigendecomposition problems instead: one for A*A' (giving the left singular vectors of A) and one for A'*A (giving the right singular vectors).

    Is this normal behavior, or am I missing something and there is a more proper/faster way to compute the SVD explicitly (via an SVD driver routine) for this matrix type?
  2. For the same problem, is there any way in MKL to compute only a small subset (say, one left and one right singular vector) corresponding to the largest singular values, saving a considerable amount of time (at least 30%)? (A possible approach is sketched below.)

Thank you in advance.
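
On question 2: recent LAPACK releases (3.6 and later) and the MKL versions that ship them provide ?gesvdx, which computes only a selected range of singular values and the corresponding singular vectors. Whether it actually saves 30% depends on the matrix, so treat the following as a sketch to benchmark rather than a guarantee, and verify the exact LAPACKE prototype against your mkl_lapacke.h; the function name top_k_svd and the workspace size 12*min(m,n) for superb are assumptions based on the gesvdx documentation.

#include <stdlib.h>
#include <mkl.h>

/* Compute the k largest singular values of an m x n row-major matrix 'a',
   with the corresponding left singular vectors in u (m x k, ldu = k)
   and right singular vectors in vt (k x n, ldvt = n). */
int top_k_svd(float *a, MKL_INT m, MKL_INT n, MKL_INT k,
              float *s, float *u, float *vt)
{
    MKL_INT ns = 0;                                   /* number of singular values found */
    MKL_INT minmn = (m < n) ? m : n;
    MKL_INT *superb = (MKL_INT *) malloc(12 * minmn * sizeof(MKL_INT));

    /* range = 'I': compute singular values with indices il..iu (1 = largest). */
    MKL_INT info = LAPACKE_sgesvdx(LAPACK_ROW_MAJOR, 'V', 'V', 'I',
                                   m, n, a, n,
                                   0.0f, 0.0f,        /* vl, vu: unused for range 'I' */
                                   1, k, &ns,
                                   s, u, k, vt, n, superb);
    free(superb);
    return (int) info;
}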

Access to previous versions of Intel MKL


I want to use the DLLs and libs of Intel MKL version 10.3.7.1 in my software program. If I purchase a license for the latest version of Intel MKL, for example 11.0, do I also get access to a previous version, i.e. 10.3.7.1?

Please let me know if anybody knows about this. Thank you in advance.

Problem using DSS


Hi!

I'm taking my first steps with MKL. After installing and linking the libraries (I have Ubuntu 12.04, 64-bit), I tried to modify the direct-solver example dss_sym_c.c so that I could solve my sparse symmetric system, which is stored in a binary file (1-based indexing). I changed the CSR input, but after that an unclassifiable error occurred at the "reorder" step of the program. What can be the reason for it?

Here's the code:

#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include "mkl_dss.h"
#include "mkl_types.h"

MKL_INT
main ()
{
  MKL_INT i;
  _MKL_DSS_HANDLE_t handle;
  _INTEGER_t error;
  _CHARACTER_t statIn[] = "determinant";
  _DOUBLE_PRECISION_t statOut[5];
  MKL_INT opt = MKL_DSS_DEFAULTS;
  MKL_INT sym = MKL_DSS_SYMMETRIC;
  MKL_INT type = MKL_DSS_POSITIVE_DEFINITE;
  MKL_INT nRows;
  MKL_INT nCols;
  MKL_INT nNonZeros;
  MKL_INT nRhs;
  _INTEGER_t *rowIndex;
  _INTEGER_t *columns;
  _DOUBLE_PRECISION_t *values;
  _DOUBLE_PRECISION_t *rhs;
  _DOUBLE_PRECISION_t *solValues;

  FILE *In;
  MKL_INT n, nz;
  MKL_INT *I = NULL, *J = NULL;
  _DOUBLE_PRECISION_t *val = NULL, *sol = NULL, *RHS = NULL;

  if ( ( In = fopen ("matr_rhs_binary", "rb" ) ) == NULL ) {
    fprintf (stderr, "Error opening file\n");		
    return (-1);
  }

  fread ( &n, sizeof(MKL_INT), 1, In );
  printf ("n = %d\n", n);	

  fread ( &nz, sizeof(MKL_INT), 1, In );
  printf ("nz = %d\n", nz);

  I = (MKL_INT *) malloc ( sizeof (I[0]) * (n + 1));                             // csr row pointers for matrix A
  J = (MKL_INT *) malloc ( sizeof (J[0]) * nz);                                  // csr column indices for matrix A
  val = (_DOUBLE_PRECISION_t *) malloc ( sizeof (val[0]) * nz);                  // csr values for matrix A	
  sol = (_DOUBLE_PRECISION_t *) malloc ( sizeof (sol[0]) * n);
  RHS = (_DOUBLE_PRECISION_t *) malloc ( sizeof (RHS[0]) * n);

  for (i = 0; i < n + 1; i++)
    fread ( &(I[i]), sizeof(I[0]), 1, In );
  for (i = 0; i < nz; i++)
    fread ( &(val[i]), sizeof(val[0]), 1, In );
  for (i = 0; i < nz; i++)
    fread ( &(J[i]), sizeof(J[0]), 1, In );
  for (i = 0; i < n; i++)
    fread ( &(RHS[i]), sizeof(RHS[0]), 1, In);	
	
  fclose (In);

  nRows = n;
  nCols = n;
  nNonZeros = nz;
  nRhs = n;

  rowIndex = (_INTEGER_t *) malloc ( sizeof (rowIndex[0]) * (nRows + 1) );
  columns = (_INTEGER_t *) malloc ( sizeof (columns[0]) * nNonZeros );
  values = (_DOUBLE_PRECISION_t *) malloc ( sizeof (values[0]) * nNonZeros );
  rhs = (_DOUBLE_PRECISION_t *) malloc ( sizeof (rhs[0]) * nCols );
  solValues = (_DOUBLE_PRECISION_t *) malloc ( sizeof (solValues[0]) * nCols );

  for (i = 0; i < n; i++) {
    rowIndex[i] = I[i];
    rhs[i] = RHS[i];
  }
  for (i = 0; i < nz; i++) {
    columns[i] = J[i];
    values[i] = val[i];
  }

  error = dss_create (handle, opt);
  if (error != MKL_DSS_SUCCESS)
    goto printError;

  error = dss_define_structure (handle, sym, rowIndex, nRows, nCols,
				columns, nNonZeros);
  if (error != MKL_DSS_SUCCESS)
    goto printError;

  error = dss_reorder (handle, opt, 0);
  if (error != MKL_DSS_SUCCESS)
    goto printError;

  error = dss_factor_real (handle, type, values);
  if (error != MKL_DSS_SUCCESS)
    goto printError;

  error = dss_solve_real (handle, opt, rhs, nRhs, solValues);
  if (error != MKL_DSS_SUCCESS)
    goto printError;
  printf ("check\n");

  if (nRows < nNonZeros)
    {
      error = dss_statistics (handle, opt, statIn, statOut);
      if (error != MKL_DSS_SUCCESS)
	goto printError;

      printf (" determinant power is %g \n", statOut[0]);
      printf (" determinant base is %g \n", statOut[1]);
      printf (" Determinant is %g \n", (pow (10.0, statOut[0])) * statOut[1]);
    }

  error = dss_delete (handle, opt);
  if (error != MKL_DSS_SUCCESS)
    goto printError;

  printf (" Solution array: ");
  for (i = 0; i < nCols; i++)
    printf (" %g", solValues[i]);
  printf ("\n");
  exit (0);
printError:
  printf ("Solver returned error code %d\n", error);
  exit (1);
}

And another question: when I compile pardiso_sym_c.c, several errors occur: undefined references to functions like atan2, sin, cos, log10, etc., even though I compile with the -lm option. Is it because of:

/usr/bin/ld: skipping incompatible /opt/intel/composer_xe_2013_sp1.0.080/mkl/lib/ia32/libmkl_intel_thread.so when searching for -lmkl_intel_thread
/usr/bin/ld: skipping incompatible /opt/intel/composer_xe_2013_sp1.0.080/mkl/lib/ia32/libmkl_intel_thread.a when searching for -lmkl_intel_thread
/usr/bin/ld: skipping incompatible /opt/intel/composer_xe_2013_sp1.0.080/mkl/lib/ia32/libmkl_core.so when searching for -lmkl_core
/usr/bin/ld: skipping incompatible /opt/intel/composer_xe_2013_sp1.0.080/mkl/lib/ia32/libmkl_core.a when searching for -lmkl_core

How can I fix it?

Thanks! Any help is appreciated!
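
On the reorder failure, one thing worth double-checking (a guess based only on the code above): in CSR storage rowIndex has nRows + 1 entries, but the copy loop above fills only the first n, so the final row pointer (which for 1-based indexing must equal nNonZeros + 1) is left uninitialized when it is passed to dss_define_structure. A corrected copy, reusing the variables from the program above, would look like:

  /* Sketch only: reuses the variables from the program above. */
  for (i = 0; i < n + 1; i++)        /* n + 1 row pointers, not n */
    rowIndex[i] = I[i];
  for (i = 0; i < n; i++)
    rhs[i] = RHS[i];
  for (i = 0; i < nz; i++) {
    columns[i] = J[i];
    values[i] = val[i];
  }

Note also that nRhs is the number of right-hand sides, so for a single right-hand side it should be 1 rather than n. As for the linker output, the "skipping incompatible ... ia32" lines mean that 32-bit MKL libraries are being offered to what is presumably a 64-bit link; pointing the link line at mkl/lib/intel64 (or building 32-bit throughout) is needed, and may well be related to the unresolved math-symbol errors.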

 

