Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
Viewing all 3005 articles
Browse latest View live

How to delete mkl out-of-core temporary files


Hi,

I'm using pardiso_64 in MKL 11.3.3.1.

When run in out-of-core mode, temporary files are generated. I have set the environment variable MKL_PARDISO_OOC_KEEP_FILE = 1 according to https://software.intel.com/en-us/mkl-developer-reference-c-intel-mkl-par... in order to delete the temporary files when computations are complete. (I have also tried = 0, as "keep file = 1" seemed counterintuitive, but that's beside the point.)

I assumed that files would be deleted when I called pardiso_64 with phase=-1 in order to release internal memory. But the files are not deleted.

How can I get pardiso to delete temporary out-of-core files?

Best,
Jens

Thread Topic: 

Help Me

problem when using cluster solver with distributed CSR format of input data


I encountered the following error when using the cluster solver:

ERROR during symbolic factorization: -2

The example (cl_solver_sym_distr_f.f) provided with Cluster compiler works well. But the matrix I provided in the attachment produces the above error. 

I used Intel MPI to compile the code (also in the attachment). The executable compiled with Intel MPI yields the above error, but it can run using OpenMPI's mpirun.

Could you help me check my code? Both the source code and data are in the attachment.

Thank you very much!

Qian

 

Runtime error when building with mkl dylib


When I built my program with MKL statically (.a files), everything was fine. But when I built the code with the .dylib files and ran it, I got the following error:

dyld: Symbol not found: _MKL_Detect_Cpu_Global_Lock
  Referenced from: /opt/intel/compilers_and_libraries_2017.4.181/mac/mkl/lib/libmkl_intel_lp64.dylib
  Expected in: flat namespace
 in /opt/intel/compilers_and_libraries_2017.4.181/mac/mkl/lib/libmkl_intel_lp64.dylib
Abort trap: 6

What does this usually imply? I tried to add the directory containing .dylib files into $PATH, but it didn't help.

 

Thread Topic: 

Question

config_number_of_transforms.c example typo

Trust Region Size Parameter Choice


Over this period, we have developed a calculation method that uses the MKL Trust Region API (with constraints).
We have built a procedure that works with this TR algorithm.
The procedure is called 4 times. Each call uses the previous results as input, and the trust region size parameter changes accordingly: 100, 10, 1, 0.1

We know that there is no criterion for choosing the region size parameter, so we decided to adopt this rule.

Is this choice correct, or is it enough to always use a single value?

Maybe the order of these values makes a difference!

 

Thank you very much

Gianluca

KMP_AFFINITY


Hi,

Does anyone know whether I can change KMP_AFFINITY for a sub-process invoked from my program? My experiment shows it does not work with the Intel compiler, but it does with the gcc compiler.

Here is the example:

Let's say I have KMP_AFFINITY=scatter, which is for my main process. Then, inside the main process, before invoking another executable as the sub-process, putenv is used to set KMP_AFFINITY=none for the sub-process.

Is this supposed to work? My run shows that KMP_AFFINITY=none does not apply to the sub-process if the Intel compiler is used to compile and link my main program, but it does with the gcc compiler.

When I double-check the environment in the sub-process, there is one extra environment variable for my exe built with the Intel compiler:

__KMP_REGISTERED_LIB_23907=0xacfa1d0-cafe8af0-libiomp5.a

 

What does this variable do, and how can this difference be explained? Thank you

Hongwei
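
To separate plain environment inheritance from anything the OpenMP runtime does, the mechanism described above can be tested on its own. This is a minimal sketch, assuming a POSIX system; it uses setenv and popen rather than the putenv call mentioned in the post, and the helper names child_env_value and set_and_query are made up for illustration. It only shows what a child process inherits, not how libiomp5 interprets the value.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Asks a child shell to echo the variable, and returns what the child saw
// (empty string if the variable is unset or the child could not be started).
std::string child_env_value(const std::string& name) {
    assert(!name.empty());
    const std::string cmd = "echo \"$" + name + "\"";
    FILE* pipe = popen(cmd.c_str(), "r");   // POSIX: runs cmd via /bin/sh
    if (!pipe) return "";
    std::string out;
    char buf[256];
    while (std::fgets(buf, sizeof buf, pipe)) out += buf;
    pclose(pipe);
    // Strip the trailing newline printed by echo.
    while (!out.empty() && (out.back() == '\n' || out.back() == '\r')) out.pop_back();
    return out;
}

// Changes the variable in this process, then reports what a child inherits.
std::string set_and_query(const std::string& name, const std::string& value) {
    setenv(name.c_str(), value.c_str(), 1);  // 1 = overwrite an existing value
    return child_env_value(name);
}
```

If set_and_query("KMP_AFFINITY", "none") returns "none" here but the real sub-process still behaves differently, the difference is more likely in how the runtime reads its settings than in the environment passed to the child.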

 

MKL library


Sir

I have to install a code. It requires linking the LAPACK and BLAS libraries.
The code was written in 2009 using MKL version 8. According to it, the link settings are:

LROOT = /opt/intel/mkl/lib/intel64/
LAPACK = -lmkl_lapack -lmkl
BLAS = -L$(LROOT) -lmkl_intel64 -lguide -lpthread

LFLAGS = $(LIBSCE) $(BLAS) $(LAPACK)

Now I have the 2016 version of MKL. It does not have guide, mkl, pthread, etc.
I know
-lmkl_lapack is replaced by -lmkl_lapack95_ilp64

How do I modify these settings for the 2016 version so that the code links and compiles?

thanks

ab

Thread Topic: 

How-To

is the implemented Trust Region method deterministic?


In this case, results cannot change if the initial conditions and constraints are the same ...

This is what happens in our implementation.

Thank you

Gianluca


Direct Sparse Solver for Clusters Crash when using MPI Nested Dissection Algorithm


I have a code that calls the Direct Sparse Solver for Clusters interface. I get an error when I run it using the option for MPI-based nested dissection. Documentation can be found here: https://software.intel.com/en-us/mkl-developer-reference-c-cluster-sparse-solver-iparm-parameter

When I have iparm[39] set to 3 everything works fine.

When I set it to 10, I get no errors, no warnings, and no output when msglvl is set to one. I assume this is because the system is crashing really hard.

I am using the 64-bit interface of the solver, and not using the MPI-based dissection is not an option (my matrix has 50 billion non-zero elements and is 12 billion by 12 billion). I am using the latest version of the MKL cluster library.

I just spent two weeks modifying the code to remove overlaps in the matrix elements to use this feature.

What is going wrong?

 

 

Thread Topic: 

Bug Report

Small matrix speed optimization


Hello all,

Now that I can run MKL 2017, I have a couple of follow-up questions that hopefully deserve a thread of their own. I am doing some mode matching, and consequently I need matrix inversions on matrices of order 10x10 up to 50x50 most of the time (the maximum size would be somewhere around 200x200, but very rarely; they will almost exclusively be in the 10-50 range). I have optimized the non-MKL parts of my code so they take under 10% of the total simulation time, so any speedup in the MKL functions would be greatly beneficial, if possible. I have set mkl_num_threads to max and build in release mode, ia32, with O2 optimization (optimized for speed), to make it as fast as I currently know how. I only have a couple of matrices to invert per frequency point (5 to 6), and the code must execute one frequency point at a time. My questions are as follows:

1) Is there a way to improve  the performance of the mkl functions in any way (by setting some flags in the program itself, or in Visual studio, or am I missing some functions that are better in this situation, or something else completely), either in the mkl 2017 or the old mkl 10.0.012 that I have? I am using cblas_zgemm, cblas_zdscal, zgetri, zgetrf, vzSqrt, cblas_zaxpy, but most of the time is spent in matrix inversion, so zgetri, zgetrf take most of the time. Are there better functions than these or can I set some additional flags to make them faster?

2) Since the optimization is for speed and not size, I of course expected the output (exe) to be bigger, but can someone explain and/or help me reduce the size of the exe with MKL 2017? It is 3x bigger than with the old MKL 10.0.012. Unfortunately this is a small problem for me, and I would be very happy if it could be mitigated in any way (without optimizing for size).

3) Some of the matrices are symmetric, and I was hoping that with the symmetric counterparts of zgetrf and zgetri, that is zsytrf and zsytri, I would theoretically get a 2x speedup, but for some reason the speed is the same. Is this expected? Are my matrices too small for any noticeable effect? Am I missing something? In both cases I feed the functions the full matrix to invert. While debugging I can see that only half of the matrix elements are calculated by the symmetric functions, and I fill in the symmetric elements myself, but there is no speedup.

Any information, even if not good is welcome. Thank you all in advance.

Thread Topic: 

How-To

Benchmarking MKL GEMV


Hello,

I am trying to compare my own implementation of GEMV with the MKL. For benchmarking I use the following code:

size_t M = 64; // rows
size_t N = 2; // columns

// allocate memory
float *matrix = (float*) mkl_malloc(M*N * sizeof(float), 64);
float *vector = (float*) mkl_malloc(N   * sizeof(float), 64);
float *result = (float*) mkl_malloc(M   * sizeof(float), 64);

// execute warm up calls
for (size_t i = 0; i < NUM_WARMUPS; ++i) {
    cblas_sgemv(CblasRowMajor, CblasNoTrans, M, N, 1.0f,
                matrix, N,
                vector, 1,
                0.0f,
                result, 1);
}

// measure runtime
float avg_runtime = 0;
for (size_t i = 0; i < NUM_EVALUATIONS; ++i) {
    auto start = dsecnd();
    cblas_sgemv(CblasRowMajor, CblasNoTrans, M, N, 1.0f,
                matrix, N,
                vector, 1,
                0.0f,
                result, 1);
    auto end = dsecnd();
    float runtime = (end - start) * 1000;

    avg_runtime += runtime;
}
avg_runtime /= NUM_EVALUATIONS;
std::cout << "avg_runtime: "<< avg_runtime << std::endl;

// free buffers
mkl_free(matrix);
mkl_free(vector);
mkl_free(result);

On my system this gives me an average runtime of around 0.0003ms with the first evaluation taking around 0.002ms. Because the average seemed really fast, even for the small input size, I printed the runtimes of all 200 evaluations to make sure my calculation of the average value was correct. If I add a 

std::cout << runtime << std::endl;

in line 29 the measured runtimes are way higher and every one of the 200 evaluations takes around 0.002ms. This seems more plausible compared to other libraries and my own implementation.

It seems like the compiler does some optimization to my code and notices that I call the routine with the exact same input multiple times. Can anyone confirm this? What is the suggested way of benchmarking MKL routines?

Thanks in advance!
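
One common way to time very short calls, not specific to MKL, is to time a batch of calls per sample so the measured interval is well above the timer resolution, and to consume the result so the work cannot be optimized away. The following is a minimal sketch under those assumptions. gemv_ref is a hypothetical stand-in for the cblas_sgemv call in the post, std::chrono::steady_clock replaces dsecnd, and the names time_per_call_ms, samples and batch are illustrative only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Stand-in for the routine under test (hypothetical; in the real benchmark
// this body would be the cblas_sgemv call). Row-major y = A * x.
void gemv_ref(std::size_t M, std::size_t N, const float* A, const float* x, float* y) {
    for (std::size_t i = 0; i < M; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < N; ++j) acc += A[i * N + j] * x[j];
        y[i] = acc;
    }
}

// Times `batch` back-to-back calls per sample and returns the smallest
// per-call time in milliseconds over `samples` samples.
double time_per_call_ms(std::size_t M, std::size_t N, int samples, int batch, float* checksum) {
    assert(samples > 0 && batch > 0 && checksum != nullptr);
    std::vector<float> A(M * N, 1.0f), x(N, 2.0f), y(M, 0.0f);
    using clock = std::chrono::steady_clock;
    double best = -1.0;
    float sink = 0.0f;
    for (int s = 0; s < samples; ++s) {
        const auto t0 = clock::now();
        for (int b = 0; b < batch; ++b) {
            x[0] = 2.0f + static_cast<float>(b);  // vary the input between calls
            gemv_ref(M, N, A.data(), x.data(), y.data());
            sink += y[0];                         // consume the result
        }
        const auto t1 = clock::now();
        const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count() / batch;
        if (best < 0.0 || ms < best) best = ms;
    }
    *checksum = sink;  // returned so the caller can print or check it
    return best;
}
```

Reporting the minimum (or median) over several samples, rather than the mean of single-call timings, tends to be less sensitive to one-off effects such as the slower first evaluation mentioned above.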

Thread Topic: 

Question

TR solver question


When the solver returns RCI_Request = 2 to calculate the Jacobian, can I assume that x has not changed since the previous calculation of the function value?

It would be a great performance boost if I could, because my calculations of the function value and the Jacobian are not separable, and I could just use my stored Jacobian.
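
Independently of what the solver guarantees, the stored Jacobian can be reused safely by keying it on x. This is a minimal sketch, assuming the function value and Jacobian come from one combined user routine; the class name FJCache and the callback signature are hypothetical and not part of the MKL API.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Caches the result of a combined residual/Jacobian evaluation, keyed on x.
class FJCache {
public:
    // eval(x, f, J) is a user callback (hypothetical signature) that fills
    // the residual vector f and the Jacobian J in one pass.
    using Eval = std::function<void(const std::vector<double>&,
                                    std::vector<double>&,
                                    std::vector<double>&)>;

    explicit FJCache(Eval eval) : eval_(std::move(eval)) { assert(eval_); }

    const std::vector<double>& f(const std::vector<double>& x) { refresh(x); return f_; }
    const std::vector<double>& jac(const std::vector<double>& x) { refresh(x); return J_; }
    int evaluations() const { return count_; }

private:
    void refresh(const std::vector<double>& x) {
        if (valid_ && x == last_x_) return;  // exact match: same point, reuse
        eval_(x, f_, J_);
        last_x_ = x;
        valid_ = true;
        ++count_;
    }

    Eval eval_;
    std::vector<double> last_x_, f_, J_;
    bool valid_ = false;
    int count_ = 0;
};
```

In the RCI loop, the function-value request would read cache.f(x) and the Jacobian request would read cache.jac(x); if x really is unchanged between the two requests, the second lookup costs only a vector comparison.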

Struggling to get Automatic Offload working with MIC/MKL 2017


I have a MIC card in a Microway XEON Workstation which seems to be functioning as expected (see micinfo debug output)

 

After updating to MKL 2017 Update 3, I am struggling to set AO to function. I created a simple DGEMM test program and have been calling DGEMM with  square matrix sizes up to 16384, and cannot get AO to "kick-in".

In prior versions of MKL, I could see AO working  at sizes of about 4096x4096 on this same machine.

The following env vars are set.

MKL_MIC_ENABLE=1
OFFLOAD_REPORT=2
MKL_MIC_DISABLE_HOST_FALLBACK=1
MIC_LD_LIBRARY_PATH=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017\windows\mkl\lib\intel64_win_mic

 

>micinfo
MicInfo Utility Log
Copyright 2011-2013 Intel Corporation All Rights Reserved.

Created Wed Jun 14 11:34:46 2017

        System Info
                HOST OS                 : Windows
                OS Version              : Microsoft Windows 7 Professi
                Driver Version          : 3.3.30726.0
                MPSS Version            : 3.3.30726.0
                Host Physical Memory    : 32709 MB

Device No: 0, Device Name: mic0

        Version
                Flash Version            : 2.1.02.0390
                SMC Firmware Version     : 1.16.5078
                SMC Boot Loader Version  : 1.8.4326
                uOS Version              : 2.6.38.8+mpss3.3
                Device Serial Number     : ADKC32800563

        Board
                Vendor ID                : 0x8086
                Device ID                : 0x225d
                Subsystem ID             : 0x3608
                Coprocessor Stepping ID  : 2
                PCIe Width               : x16
                PCIe Speed               : 5 GT/s
                PCIe Max payload size    : 256 bytes
                PCIe Max read req size   : 512 bytes
                Coprocessor Model        : 0x01
                Coprocessor Model Ext    : 0x00
                Coprocessor Type         : 0x00
                Coprocessor Family       : 0x0b
                Coprocessor Family Ext   : 0x00
                Coprocessor Stepping     : C0
                Board SKU                : C0PRQ-3120/3140 P/A
                ECC Mode                 : Enabled
                SMC HW Revision          : Product 300W Active CS

        Cores
                Total No of Active Cores : 57
                Voltage                  : 1039000 uV
                Frequency                : 1100000 kHz
              

 

how to deal with real and structurally symmetric matrix's parameters in PARDISO


Hi

I want to use PARDISO to solve a problem in which the matrix is real and structurally symmetric. I have read the PARDISO Version 5.0.0 Reference Sheet - Fortran, and Parallel Sparse Direct Solver PARDISO | User Guide Version 5.0.0. I have also studied some code that solves problems with symmetric or non-symmetric matrices. I know the parameter mtype needs to be 1, since my matrix is real and structurally symmetric. But I do not know how to deal with the other parameters.

Any help would be appreciated.

Regards, 
rf.qian

Thread Topic: 

How-To

Intel MKL DftiComputeForward how to get full transform matrix from CCE format in C


I'm trying to implement a 2-dimensional Fourier transform using the MKL FFT functions.

I'm interested in transforming from the space domain (i.e., my input signal is a 2D MxN matrix of `double`s) to the frequency domain (i.e., a 2D MxN output matrix of complexes with double accuracy, `MKL_Complex16`) and then back to the space domain after some filtering.

Based on the examples provided with Intel's MKL (i.e., basic_dp_real_dft_2d.c etc.) I've created the following MATLAB-ish function:

    bool fft2(double *in, int m, int n, MKL_Complex16 *out) {
      bool ret(false);
      DFTI_DESCRIPTOR_HANDLE hand(NULL);
      MKL_LONG dim[2] = {m, n};
      if(!DftiCreateDescriptor(&hand, DFTI_DOUBLE, DFTI_REAL, 2, dim)) {
        if(!DftiSetValue(hand, DFTI_PLACEMENT, DFTI_NOT_INPLACE)) {
          if(!DftiSetValue(hand, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX)) {
            MKL_LONG rs[3] = {0, n, 1};
            if(!DftiSetValue(hand, DFTI_INPUT_STRIDES, rs)) {
              MKL_LONG cs[3] = {0, n / 2 + 1, 1};
              if(!DftiSetValue(hand, DFTI_OUTPUT_STRIDES, cs)) {
                if(!DftiCommitDescriptor(hand)) {
                  ret = !DftiComputeForward(hand, in, out);
                }
              }
            }
          }
        }
      }
      DftiFreeDescriptor(&hand);
      return ret;
    }

Because I want to do some DSP (e.g., Gaussian filtering), I have to do matrix multiplications, so I want the full transformation matrix instead of the CCE-format matrix that DftiComputeForward outputs.

**How can I reconstruct the full transformation matrix of an arbitrary sized 2d signal (i.e., matrix) from the CCE format in C matrix that I get as output from DftiComputeForward function?**

For example if I have the following 2D real signal:

    0.1, 0.2, 0.3
    0.4, 0.5, 0.6
    0.7, 0.8, 0.9

Its full transformation matrix would be:

     4.5 + 0j,         -0.45 + 0.259808j, -0.45 - 0.259808j
    -1.35 + 0.779423j,  0    - 0j,         0    - 0j
    -1.35 - 0.779423j,  0    + 0j,         0    + 0j

However the result from `DftiComputeForward` in CCE is:

     4.5 + 0j,  -0.45 + 0.259808j, -1.35 + 0.779423j,
     0   - 0j,  -1.35 - 0.779423j,  0    + 0j,
     0   + 0j,   0 + 0j,            0    + 0j
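
For a real input, the missing columns follow from conjugate symmetry: X[i][j] = conj(X[(M-i) mod M][(N-j) mod N]). A minimal sketch of the expansion is given here, assuming the half spectrum is stored row-major with a row stride of N/2+1 (the DFTI_OUTPUT_STRIDES set in the function above). It uses std::complex<double> instead of MKL_Complex16, and the name expand_cce is made up for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Expands an M x (N/2+1) half spectrum (row-major, row stride N/2+1) into
// the full M x N spectrum of a real signal using
//   X[i][j] = conj(X[(M-i) % M][(N-j) % N]).
std::vector<cplx> expand_cce(const std::vector<cplx>& half, std::size_t M, std::size_t N) {
    assert(M > 0 && N > 0);
    const std::size_t H = N / 2 + 1;   // number of stored columns
    assert(half.size() >= M * H);
    std::vector<cplx> full(M * N);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            if (j < H) {
                full[i * N + j] = half[i * H + j];            // stored directly
            } else {
                const std::size_t im = (M - i) % M;           // mirrored row
                const std::size_t jm = N - j;                 // mirrored column, 1 <= jm < H
                full[i * N + j] = std::conj(half[im * H + jm]);
            }
        }
    }
    return full;
}
```

With the 3x3 example above, the six stored values are read as three rows of two, and the third column comes out as -0.45 - 0.259808j, 0, 0, which matches the full matrix shown.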


syrk mkl armadillo wrong output


I have written the following simple program for syrk using armadillo (arma.sourceforge.net).

Environment : Rhea from OLCF. https://www.olcf.ornl.gov/computing-resources/rhea/

MKL : Tried with version 16 and 17. Problem occurs in both.

#define ARMA_DONT_USE_WRAPPER
#define ARMA_USE_BLAS

#include <iostream>
#include <armadillo>
using namespace std;
using namespace arma;
int main() {
  int m = 10000;
  int n = 50;
  fmat A;
  A.load("H_init.csv");
  cout << "A::"<< A.n_rows << "x"<< A.n_cols << endl;
  fmat AtA = arma::zeros<fmat>(n, n);
  AtA = A.t() * A;
  cout << "AtA "<< endl;
  cout << max(max(AtA)) << " " << min(min(AtA)) << " " << norm(AtA, "fro") << endl;
  return 0;
}

I have also attached H_init.csv along with this email. I compile using the following three procedures.

Compilation 1 (gcc with mkl): Based on the article https://software.intel.com/en-us/articles/a-new-linking-model-single-dyn.... I compile as "g++ hth.cpp -o hth -O2 -I ~/armadillo-7.800.1/include/ -fopenmp -lmkl_rt"

Compilation 2 (with icc and mkl): icc hth.cpp -o hth -O2 -I ~/armadillo-7.800.1/include/ -fopenmp -mkl

Compilation 3 (gcc with intel linker recommendation): g++ hth.cpp -o hth -O2 -I ~/armadillo-7.800.1/include/ -fopenmp -lmkl_rt  -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl

Compilation method 1 produces wrong output for HtH. Compilations 2 and 3 work fine. In fact, in the case of compilation 3, even the ordering of the libraries appears to matter: if the ordering is changed a bit, it produces wrong output.

Output of compile 1: (Wrong)


A::10000x50

AtA 

24044.6 3697.91 350951

 

Output out of compile 2: (Right)

A::10000x50

AtA 

3436.05 2437.46 126222

 

Output out of compile 3: (Right)

 


A::10000x50

AtA 

3436.05 2437.46 126222

 

Am I making any mistake on this? Can't I link w/ gcc using -lmkl_rt?


Attachment: H_init.tar.gz (application/x-gzip, 4.22 MB)

Thread Topic: 

Bug Report

.NET Memory Usage - MKL under .NET


As every .NET developer knows, memory usage is managed by the Garbage Collector. This layer determines when memory is released and how to reorganize it. It allocates space for each thread separately and avoids conflicts.

Because of this, we programmers often don’t know exactly what really happens at this level.

In general, this is enough, because the GC was built to let developers concentrate on higher levels.

But sometimes, especially when you pass pointers to memory blocks, such as arrays, to API functions, and specifically to MKL API functions, it is important to know what happens behind the scenes.

IntPtr x = new IntPtr(0);
double[] x_init = new double[n]; // managed source array holding the initial values
x = mkl_malloc(sizeof(double) * n, 64);
Marshal.Copy(x_init, 0, x, n);
//use x pointer …
mkl_free(ref x);

This set of instructions defines a memory space for an array of doubles and assigns a memory pointer to x. This pointer x is then passed to a function like this:

[DllImport("mkl_rt.dll", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true, SetLastError = false)]
        internal static extern int dtrnlspbc_init(
           ref IntPtr handle,
           ref int n,
           ref int m,
           IntPtr x,
           IntPtr LW,
           IntPtr UP,
           double[] eps,
           ref int iter1,
           ref int iter2,
           ref double rs
        );

Everything seems to work, but this is a very subtle feeling!

Yes, because memory is in the heap area managed by the GC and can be moved or reorganized. This means that your pointer x is not reliable.

We spent a lot of time fighting strange results, sometimes good, sometimes not. Where was the trick? Was it Intel's fault or our fault? I don’t like this word “fault”, but the question was: where was the issue?

In the end, we found the solution!

It was: pin the pointer with the correct syntax and methods.

GCHandle x_handle = GCHandle.Alloc(x_init, GCHandleType.Pinned);
x = x_handle.AddrOfPinnedObject();
//use x pointer …
x_handle.Free();

Now the GC can’t move the array, and the pointer used by the API function always stays valid.

I hope these notes are useful for the poor developer, always alone in an ocean of troubles.

 

Gianluca

 

Direct Sparse Solver for Clusters - Pardiso Memory Allocation Error


Hi,

Once again having some trouble with the Direct Sparse Solver for clusters. I am getting the following error when running on a single process

entering matrix solver
*** Error in PARDISO  (     insufficient_memory) error_num= 1
*** Error in PARDISO memory allocation: MATCHING_REORDERING_DATA, allocation of 1 bytes failed
total memory wanted here: 142 kbyte

=== PARDISO: solving a real structurally symmetric system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000005 s
Time spent in reordering of the initial matrix (reorder)         : 0.000000 s
Time spent in symbolic factorization (symbfct)                   : 0.000000 s
Time spent in allocation of internal data structures (malloc)    : 0.000465 s
Time spent in additional calculations                            : 0.000080 s
Total time spent                                                 : 0.000550 s

Statistics:
===========
Parallel Direct Factorization is running on 1 OpenMP

< Linear system Ax = b >
             number of equations:           6
             number of non-zeros in A:      8
             number of non-zeros in A (%): 22.222222

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    0
             size of largest supernode:               0
             number of non-zeros in L:                0
             number of non-zeros in U:                0
             number of non-zeros in L+U:              0

ERROR during solution: 4294967294

It just hangs when running on a single process. Below is the CSR format of my matrix and the provided RHS to solve for:

CSR row values
0
2
6
9
12
16
18

CSR col values
0
1
0
1
2
3
1
2
4
1
3
4
2
3
4
5
4
5

Rank 0 rhs vector :
1
0
0
0
0
1
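
Before handing arrays like these to the solver, it can help to confirm independently that they form a valid pattern for the declared matrix type. The sketch below is a minimal check, assuming zero-based indexing (the post sets iparm[34] = 1) and 64-bit indices; the function names csr_well_formed and csr_structurally_symmetric are made up for illustration and are not part of MKL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using idx = std::int64_t;

// True if (ia, ja) is a well-formed zero-based CSR pattern for an n x n
// matrix: ia has n+1 non-decreasing entries from 0 to nnz, and the column
// indices of each row are in range and strictly increasing.
bool csr_well_formed(idx n, const std::vector<idx>& ia, const std::vector<idx>& ja) {
    const idx nnz = static_cast<idx>(ja.size());
    if (n < 0 || ia.size() != static_cast<std::size_t>(n) + 1) return false;
    if (ia[0] != 0 || ia[n] != nnz) return false;
    for (idx r = 0; r < n; ++r) {
        if (ia[r] > ia[r + 1] || ia[r + 1] > nnz) return false;
        for (idx k = ia[r]; k < ia[r + 1]; ++k) {
            if (ja[k] < 0 || ja[k] >= n) return false;
            if (k > ia[r] && ja[k] <= ja[k - 1]) return false;
        }
    }
    return true;
}

// True if every stored entry (r, c) has a stored counterpart (c, r).
bool csr_structurally_symmetric(idx n, const std::vector<idx>& ia, const std::vector<idx>& ja) {
    if (!csr_well_formed(n, ia, ja)) return false;
    for (idx r = 0; r < n; ++r) {
        for (idx k = ia[r]; k < ia[r + 1]; ++k) {
            const idx c = ja[k];
            // Rows are sorted, so a binary search over row c is enough.
            if (!std::binary_search(ja.begin() + ia[c], ja.begin() + ia[c + 1], r)) return false;
        }
    }
    return true;
}
```

The row and column values listed above pass both checks for n = 6 with 18 stored entries, which may be useful to compare against the 8 non-zeros reported in the solver log.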

Now my calling file looks like:

void SolveMatrixEquations(MKL_INT numRows, MatrixPointerStruct &cArrayStruct, const std::pair<MKL_INT,MKL_INT>& rowExtents)
{

	double pressureSolveTime = -omp_get_wtime();

	MKL_INT mtype = 1;  /* set matrix type to "real structurally symmetric" */
	MKL_INT nrhs = 1;  /* number of right hand sides. */

	void *pt[64] = { 0 }; //internal memory Pointer

						  /* Cluster Sparse Solver control parameters. */
	MKL_INT iparm[64] = { 0 };
	MKL_INT maxfct, mnum, phase=13, msglvl, error;

	/* Auxiliary variables. */
	float   ddum; /* float dummy   */
	MKL_INT idum; /* Integer dummy. */
	MKL_INT i, j;

	/* -------------------------------------------------------------------- */
	/* .. Init MPI.                                                         */
	/* -------------------------------------------------------------------- */

	int     mpi_stat = 0;
	int     comm, rank;
	mpi_stat = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
	comm = MPI_Comm_c2f(MPI_COMM_WORLD);

	/* -------------------------------------------------------------------- */
	/* .. Setup Cluster Sparse Solver control parameters.                                 */
	/* -------------------------------------------------------------------- */
	iparm[0] = 0; /* Solver default parameters overridden with provided by iparm */
	iparm[1] =3; /* Use METIS for fill-in reordering */
	//iparm[1] = 10; /* Use parMETIS for fill-in reordering */
	iparm[5] = 0; /* Write solution into x */
	iparm[7] = 2; /* Max number of iterative refinement steps */
	iparm[9] = 8; /* Perturb the pivot elements with 1E-13 */
	iparm[10] = 0; /* Don't use non-symmetric permutation and scaling MPS */
	iparm[12] = 0; /* Switch on Maximum Weighted Matching algorithm (default for non-symmetric) */
	iparm[17] = 0; /* Output: Number of non-zeros in the factor LU */
	iparm[18] = 0; /* Output: Mflops for LU factorization */
	iparm[20] = 0; /*change pivoting for use in symmetric indefinite matrices*/
	iparm[26] = 1;
	iparm[27] = 0; /* Single precision mode of Cluster Sparse Solver */
	iparm[34] = 1; /* Cluster Sparse Solver use C-style indexing for ia and ja arrays */

	iparm[39] = 2; /* Input: matrix/rhs/solution stored on master */
	iparm[40] = rowExtents.first+1;
	iparm[41] = rowExtents.second+1;
	maxfct = 3; /* Maximum number of numerical factorizations. */
	mnum = 1; /* Which factorization to use. */
	msglvl = 1; /* Print statistical information in file */
	error = 0; /* Initialize error flag */
	//cout << "Rank "<< rank << ": "<< iparm[40] << ""<< iparm[41] << endl;
#ifdef UNIT_TESTS
	//msglvl = 0;
#endif




	phase = 11;
	#ifndef UNIT_TESTS
	if (rank == 0)printf("Restructuring system...\n");
	cout << "Restructuring system...\n" << endl;
	#endif

	cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase,&numRows, &ddum, cArrayStruct.rowIndexArray, cArrayStruct.colIndexArray, &idum, &nrhs, iparm, &msglvl,&ddum, &ddum, &comm, &error);
	if (error != 0)
	{
		cout << "\nERROR during solution: "<< error << endl;
		exit(error);
	}


	phase = 23;

#ifndef UNIT_TESTS
//	if (rank == 0) printf("\nSolving system...\n");
	printf("\nSolving system...\n");
#endif

	cluster_sparse_solver_64(pt, &maxfct, &mnum, &mtype, &phase,&numRows, cArrayStruct.valArray, cArrayStruct.rowIndexArray, cArrayStruct.colIndexArray, &idum, &nrhs, iparm, &msglvl,
		cArrayStruct.rhsVector, cArrayStruct.pressureSolutionVector, &comm, &error);
	if (error != 0)
	{
		cout << "\nERROR during solution: "<< error << endl;
		exit(error);
	}

	phase = -1; /* Release internal memory. */
	cluster_sparse_solver_64(pt, &maxfct, &mnum, &mtype, &phase,&numRows, &ddum, cArrayStruct.rowIndexArray, cArrayStruct.colIndexArray, &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &comm, &error);
	if (error != 0)
	{
		cout << "\nERROR during release memory: "<< error << endl;
		exit(error);
	}
	/* Check residual */

	pressureSolveTime += omp_get_wtime();


#ifndef UNIT_TESTS
	//cout << "Pressure Solve Time: "<< pressureSolveTime << endl;
#endif

	//TestPrintCsrMatrix(cArrayStruct,rowExtents.second-rowExtents.first +1);
}

This is based on the format of one of the examples. I am trying to use the ILP64 interface because my example system is very large (16 billion non-zeros). I am using the Intel C++ compiler 2017 as part of the Intel Composer XE Cluster Edition Update 1. I am using the following lines in my CMake files:

TARGET_COMPILE_OPTIONS(${MY_TARGET_NAME} PUBLIC "-mkl:cluster" "-DMKL_ILP64" "-I$ENV{MKLROOT}/include")
TARGET_LINK_LIBRARIES(${MY_TARGET_NAME} "-Wl,--start-group $ENV{MKLROOT}/lib/intel64/libmkl_intel_ilp64.a $ENV{MKLROOT}/lib/intel64/libmkl_intel_thread.a $ENV{MKLROOT}/lib/intel64/libmkl_core.a $ENV{MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl")

What is interesting is that this same code runs perfectly fine on my Windows development machine. Porting it to my Linux cluster is causing issues. Any ideas?

I am currently awaiting the terribly long download for the update 4 Composer XE package. But I don't have much hope of that fixing it because this code used to run fine on this system. 

HyperThreading and CPU usage


Hi everyone,

I tried LAPACKE_dgels and changed no thread-number settings at all, so I guess the default thread count (the same as the number of physical cores) is used. As I watch the CPU usage while the code is running, it peaks at 50%. I guess that means using 50% of the CPU made the calculation run as fast as it could, and using more than 50% of the CPU through hyper-threading would only slow it down? Do I understand this correctly?

 

 

Thread Topic: 

Question

How can I interrupt / abort LAPACK and BLAS methods which do not support callback ?


I'm computing some SVDs and other time-consuming things using the mkl C libraries.

I've found that some methods implement a progress call back (https://software.intel.com/en-us/mkl-developer-reference-c-mkl-progress), but that does not seem to be the case for the calls I'm interested in ( _gesvd, _gesdd, _gemm, _imatcopy ).

Is there a clean work-around for this issue?  I would ideally like to be able to post "progress", but at least I would like to be able to cleanly "abort" the computation by user-interaction.
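
For operations that split into independent blocks, such as the _gemm case, one possible workaround is to issue the work in panels and poll an abort flag between them, which also gives a natural place to post progress. This is only a sketch under that assumption: the inner loops are a plain reference multiply standing in for one library call per panel, the name panel_multiply is hypothetical, and the idea does not carry over directly to routines like _gesvd or _gesdd that cannot be chunked this way.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B for row-major n x n matrices, processed in row panels.
// Between panels the abort flag is polled and progress is published.
// Returns false if the computation was abandoned before finishing.
bool panel_multiply(const std::vector<double>& A, const std::vector<double>& B,
                    std::vector<double>& C, std::size_t n, std::size_t panel_rows,
                    const std::atomic<bool>& abort_requested,
                    std::atomic<std::size_t>& rows_done) {
    assert(panel_rows > 0 && A.size() == n * n && B.size() == n * n);
    C.assign(n * n, 0.0);
    rows_done.store(0);
    for (std::size_t r0 = 0; r0 < n; r0 += panel_rows) {
        if (abort_requested.load()) return false;   // user asked to stop
        const std::size_t r1 = std::min(n, r0 + panel_rows);
        // In real code this panel would be one library call on a sub-block.
        for (std::size_t i = r0; i < r1; ++i)
            for (std::size_t k = 0; k < n; ++k)
                for (std::size_t j = 0; j < n; ++j)
                    C[i * n + j] += A[i * n + k] * B[k * n + j];
        rows_done.store(r1);                        // progress for a UI thread
    }
    return true;
}
```

A UI thread can set abort_requested and read rows_done while a worker thread runs the loop; the panel size trades responsiveness against per-call overhead.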

Thread Topic: 

Question