Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

LAPACKE_zgesvxx with small size systems


We use the LAPACKE_zgesvxx function to solve very large systems of linear equations.

This function is very fast for matrices of that size, but for small matrices (say, smaller than 10x10) it is very slow.
Do you provide an optimized method for systems of this size?

We have to solve a large number of small systems.
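Not an official recommendation, just a minimal sketch of a common workaround: for many independent small systems, the plain LU driver LAPACKE_zgesv (no equilibration or iterative refinement) called inside a threaded loop is often faster than zgesvxx. The helper name solve_many_small, the packed column-major layout, and the n <= 16 bound are assumptions.

    #include <mkl.h>
    #include <omp.h>

    /* Solve 'count' independent n-by-n complex systems A[i] x = b[i].
       a holds count column-major n*n blocks, b holds count right-hand sides. */
    void solve_many_small(int count, int n, MKL_Complex16 *a, MKL_Complex16 *b)
    {
        #pragma omp parallel for
        for (int i = 0; i < count; ++i) {
            lapack_int ipiv[16];                     /* assumes n <= 16 */
            LAPACKE_zgesv(LAPACK_COL_MAJOR, n, 1,
                          a + (size_t)i * n * n, n,  /* i-th matrix, overwritten by its LU */
                          ipiv,
                          b + (size_t)i * n, n);     /* i-th RHS, overwritten by the solution */
        }
    }

If the outer loop is threaded like this, it usually makes sense to link the sequential MKL layer or call mkl_set_num_threads(1) so the small factorizations are not threaded internally as well.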

Best regards

Gianluca


Problem with mkl fft dynamic array


 

Hi, I want to use the MKL FFT with multithreading, but I ran into a problem. I get a segmentation fault (core dumped) even when I just create a simple 2D dynamic array. I also checked the multithreaded example with a static array, and it worked perfectly, but when I create the array dynamically it gives me an error. Could you tell me what I am doing wrong, please?

 

This is my code:

#include "mkl_dfti.h"

#include <omp.h>

#include <math.h>

#include <complex.h>

#include <float.h>

#include <sys/time.h>

 

int main(int argc, char *argv[])

{

    int k, j, N = atoi(argv[1]);

    MKL_Complex8 **data = malloc(N*sizeof(MKL_Complex8*));

    for(k=0; k<N; k++){

       data[k] = malloc(N*sizeof(MKL_Complex8));

    }

 

    printf(

      "Before FFT: N %d.\n",

      N);

 

    MKL_LONG len[2] = {N, N};

    DFTI_DESCRIPTOR_HANDLE FFT;

    int th;

    printf("Create...\n");

    DftiCreateDescriptor (&FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len);

    DftiCommitDescriptor (FFT);

 

    printf("Execute...\n");

    struct timeval start, end;

    gettimeofday(&start, NULL);

 

    DftiComputeForward (FFT, data);

 

    DftiFreeDescriptor (&FFT);

 

    gettimeofday(&end, NULL);

 

    double tstart = start.tv_sec + start.tv_usec/1000000.;

    double tend = end.tv_sec + end.tv_usec/1000000.;

    double t_sec = (tend - tstart);

    int S = N*N;

    double speed = (double) 5*S*log2(S)/t_sec*1e-6;

 

    printf("Time: %0.6f, Speed: %0.6f\n", t_sec, speed);

 

    for(j=0; j<N; j++) {

       free(data[j]);

    }

    free(data);

 

    return 0;

}
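A guess at the cause, not a confirmed diagnosis: DftiComputeForward for a 2D in-place transform expects one contiguous buffer of N*N elements, while the code above passes an array of N separate row pointers. A minimal contiguous variant might look like this (the row-major layout and 64-byte alignment are assumptions):

    /* one contiguous buffer of N*N complex values; element (k, j) is data[(size_t)k*N + j] */
    MKL_Complex8 *data = mkl_malloc((size_t)N * N * sizeof(MKL_Complex8), 64);
    /* ... fill data ... */
    DftiComputeForward(FFT, data);   /* pass the buffer itself, not an MKL_Complex8** */
    mkl_free(data);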

 

Link error 2019 accessing _DPOSV


I am converting a Fortran DLL library project from an earlier version of the Microsoft IDE to VS 2008, using Intel Fortran 10.0.3.24 and the corresponding MKL. When the linker step of the project runs, I get missing routines:

1>MTXSOLVE.obj : error LNK2019: unresolved external symbol _DPOSV referenced in function MTXSOLVE
1>MTXSOLVE.obj : error LNK2001: unresolved external symbol _DSPTRF
1>MTXSOLVE.obj : error LNK2001: unresolved external symbol _DSPTRS
1>MTXSOLVE.obj : error LNK2001: unresolved external symbol _DPPSV

I think these are LAPACK routines in MKL. I am building the IA-32 version of the DLL and I explicitly include the path to the ia32 MKL lib directory. Where did I go wrong? Here is the linker command line:

/OUT:"Debug/Upid_Fortran.dll" /INCREMENTAL:NO /NOLOGO /LIBPATH:"C:\Program Files\Intel\MKL\10.0.3.021\ia32\lib" /MANIFEST /MANIFESTFILE:"C:\Users\goldend\Documents\Development\UPID\trunk\UPID_Fortran\debug\upid_fortran.dll.intermediate.manifest" /DEBUG /PDB:"Debug/Upid_Fortran.pdb" /SUBSYSTEM:WINDOWS /IMPLIB:"Debug/Upid_Fortran.lib" /DLL kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib
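None of the MKL .lib files appear on that command line, only the library path. For reference, a hedged sketch of the additional linker inputs typically used with the layered MKL 10.x libraries on IA-32 (the exact names depend on the calling convention and threading layer chosen, so treat these as assumptions to verify against the MKL 10 linking documentation):

    mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib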

 

 

 

 

 

 


MKL changes line endings of stdout to text mode on Windows

Quadratic programming in the Intel Math Kernel Library


I am an Intel Fortran commercial user looking for a general quadratic programming routine, but I was unable to find one in the math library. The existing routines are limited to the constraint l1 <= x <= l2. See

https://software.intel.com/en-us/node/471098#7CF8EA20-5C99-4E1D-A8D6-C62...

In MATLAB, the general solution is quadprog, with constraints

A x <= b,   Aeq x = beq,   lb <= x <= ub

https://www.mathworks.com/help/optim/ug/quadprog.html?requestedDomain=ww...

Our constraint requirement is minimization subject to A x <= b.
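For reference, the general problem that quadprog solves, stated here only to pin down what "general QP" means in this request (H and f are the quadratic and linear objective terms):

    minimize    (1/2) x^T H x + f^T x
    subject to  A x <= b,   Aeq x = beq,   lb <= x <= ub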

 

 

 

 

 

 

Thread Topic: Question

djacobi doesn't exit


(Nonlinear least squares problem without constraints)
I have implemented a C# trust-region algorithm using the MKL API (e.g. dtrnlspbc_solve, djacobi, ...) for our purposes.
It works very well, but sometimes it does not.

Sometimes it does not exit; djacobi seems to stay in a loop (at least that is how it looks while debugging).
In the attached image you can see the CPU usage; I executed the method three times.
The first time the method terminated and the CPU usage returned to its previous value,
and likewise the second time.
But the third time something remains in use!

I have created a dedicated class with a method that runs the algorithm as a thread.
After 60 seconds, if the solver has not finished, it kills the thread, but that does not work either.

I have tried different tricks, but none of them solved the problem.

Any suggestion?

Thank you very much

Below is the solver main loop:

 

while (!bSuccessful)
            {
                if (mbShowMessages) Console.WriteLine("Step " + iStep.ToString());

                if (RCI_Solve(ref handle, fvec, fjac, ref iRCI_Request) != TR_SUCCESS)
                {
                    Console.WriteLine("RCI_Solve() retun error!");
                    RCI_FreeBuffers();
                    iError = 1;
                    goto end;
                }
                bSuccessful =
                    iRCI_Request == -1 || iRCI_Request == -2 || iRCI_Request == -3 ||
                    iRCI_Request == -4 || iRCI_Request == -5 || iRCI_Request == -6;

                if (iRCI_Request == 1)
                {
                    if (mbShowMessages) Console.WriteLine("go objective_function()");
                    objective_function(ref m, ref n, x, fvec);
                }
                if (iRCI_Request == 2)
                {
                    if (mbShowMessages) Console.WriteLine("go djacobi()");
                    int iRes = djacobi(objective_function, ref n, ref m, fjac, x, ref jac_eps);
                    if (iRes != TR_SUCCESS)
                    {
                        if (iRes == TR_INVALID_OPTION)
                        {
                            Console.WriteLine("error in djacobi: invalid options.");
                        }
                        else if (iRes == TR_OUT_OF_MEMORY)
                        {
                            Console.WriteLine("error in djacobi: out of memory.");
                        }

                        RCI_FreeBuffers();
                        iError = 1;
                        goto end;
                    }

                }

                if (iStep > MAX_STEP)
                {
                    Console.WriteLine(string.Format("Too many external loop! (max {0})", MAX_STEP));
                    RCI_FreeBuffers();
                    iError = 1;
                    goto end;
                }

                iStep++;
            }

 

Attachment: cpu_usage.png (29.12 KB)

djacobix with user defined function imported from a module


Hi there, I'm new to this forum, so I hope I'm posting this in the correct section.

 

 

I'm trying to use the MKL trust-region algorithm to solve a nonlinear system of equations in a Fortran program. I started from the example provided online (ex_nlsqp_f90_x.f90, https://software.intel.com/en-us/node/501498) and everything works correctly. Now, because I have to use this in a much bigger program, I need the user-defined objective function to be loaded from a separate module. Hence, I split the example into two separate files, but I'm not able to make it compile correctly.

So here is the code for the module, which contains the user-defined data structure and the objective function:

module modFun
implicit none
private
public my_data, extended_powell

type :: my_data
      integer a
      integer sum
end type my_data


contains

subroutine extended_powell (m, n, x, f, user_data)
    implicit none
    integer, intent(in) :: m, n
    real*8 , intent(in) :: x(n)
    real*8, intent(out) :: f(m)
    type(my_data) :: user_data
    integer i

    user_data%sum = user_data%sum + user_data%a
    do i = 1, n/4
        f(4*(i-1)+1) = x(4*(i-1)+1) + 10.0 * x(4*(i-1)+2)
        f(4*(i-1)+2) = 2.2360679774998 * (x(4*(i-1)+3) - x(4*(i-1)+4))
        f(4*(i-1)+3) = ( x(4*(i-1)+2) - 2.0 * x(4*(i-1)+3) )**2
        f(4*(i-1)+4) = 3.1622776601684 * (x(4*(i-1)+1) - x(4*(i-1)+4))**2
    end do
end subroutine extended_powell

end module modFun
!    nonlinear least square problem without boundary constraints
    include 'mkl_rci.f90'
program EXAMPLE_EX_NLSQP_F90_X
    use MKL_RCI
    use MKL_RCI_type
    use modFun

!   user's objective function
!   n - number of function variables
!   m - dimension of function value
    integer n, m
    parameter (n = 4)
    parameter (m = 4)
!   precisions for stop-criteria (see manual for more details)
    real*8 eps(6)
!   solution vector. contains values x for f(x)
    real*8 x(n)
!   jacobi matrix
    real*8 fjac(m*n)
...
!   Additional users data
    type(my_data), target :: m_data
...
djacobix (extended_powell,n,m,fjac,x,eps(1),%val(loc(m_data))) /= &
                TR_SUCCESS)
...
end program

The problem appears to be some characteristic of argument 5, which is the user-defined data, hence it has its own type. I tried changing the intent, but it doesn't work:

ex_nlsqp_f90_x_M.f90(170): error #7065: The characteristics of dummy argument 5 of the associated actual procedure differ from the characteristics of dummy argument 5 of the dummy procedure.   [EXTENDED_POWELL]

            if (djacobix (extended_powell,n,m,fjac,x,eps(1),%val(loc(m_data))) /= &

--------------------------^

 

On the other hand, http://technion.ac.il/doc/intel/mkl/mkl_manual/osr/functn_djacobix.htm states that the first argument of djacobix (i.e. the objective function subroutine) needs to be declared as external, which can't be done here because the objective function needs to reside in a module.

Can anyone help me with this problem by providing a working example? Thanks a lot. Andrea

 

 

Thread Topic: Question

dss_reorder gives access violation error.


I'm trying to solve a sparse symmetric system.

Matrix A looks like this (I set only one half, as in the example code dss_sym_c.c):

[[  1.17  -0.08   0.    -0.09   0.     0.     0.     0.     0.  ]
 [  0.     6.98  -1.73   0.    -4.16   0.     0.     0.     0.  ]
 [  0.     0.    10.5    0.     0.    -7.77   0.     0.     0.  ]
 [  0.     0.     0.     3.68  -0.86   0.    -1.73   0.     0.  ]
 [  0.     0.     0.     0.    13.55  -2.14   0.    -5.37   0.  ]
 [  0.     0.     0.     0.     0.    19.9    0.     0.    -8.97]
 [  0.     0.     0.     0.     0.     0.     4.04  -1.3    0.  ]
 [  0.     0.     0.     0.     0.     0.     0.    10.23  -2.55]
 [  0.     0.     0.     0.     0.     0.     0.     0.    12.53]]

And the RHS:

[1,2,3,4,5,6,7,8,9]

The sample code after substituting my data:

#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include "mkl_dss.h"
#include "mkl_types.h"
/*
** Define the array and rhs vectors
*/
#define NROWS       9
#define NCOLS       9
#define NNONZEROS   23
#define NRHS        1
static const MKL_INT nRows = NROWS;
static const MKL_INT nCols = NCOLS;
static const MKL_INT nNonZeros = NNONZEROS;
static const MKL_INT nRhs = NRHS;

static _INTEGER_t rowIndex[NROWS + 1] = { 1,4,7,10,13,16,19,21,23,24 };
static _INTEGER_t columns[NNONZEROS] = { 4, 2, 1, 5, 3, 2, 6, 4, 3, 7, 5, 4, 8, 6, 5, 9, 7, 6, 8, 7, 9, 8, 9 };
static _DOUBLE_PRECISION_t values[NNONZEROS] = { -0.09,-0.08,1.17,-4.16,-1.73,6.98,-7.77,0,10.5,-1.73,-0.86,3.68,-5.37,-2.14,13.55,-8.97,0,19.90,-1.30,4.04,-2.55,10.23,12.53 };
static _DOUBLE_PRECISION_t rhs[NCOLS] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

int main()
{

    /* Allocate storage for the solver handle and the right-hand side. */
    _DOUBLE_PRECISION_t solValues[NROWS];
    _MKL_DSS_HANDLE_t handle;
    _INTEGER_t error;
    MKL_INT opt = MKL_DSS_DEFAULTS;
    MKL_INT sym = MKL_DSS_NON_SYMMETRIC;
    MKL_INT type = MKL_DSS_POSITIVE_DEFINITE;
    /* --------------------- */
    /* Initialize the solver */
    /* --------------------- */
    error = dss_create(handle, opt);
    if (error != MKL_DSS_SUCCESS)
        goto printError;
    /* ------------------------------------------- */
    /* Define the non-zero structure of the matrix */
    /* ------------------------------------------- */
    error = dss_define_structure(handle, sym, rowIndex, nRows, nCols, columns, nNonZeros);
    if (error != MKL_DSS_SUCCESS)
        goto printError;
    /* ------------------ */
    /* Reorder the matrix */
    /* ------------------ */
    error = dss_reorder(handle, opt, 0); // <<<< --------- This line gives access violation error.
    if (error != MKL_DSS_SUCCESS)
        goto printError;
    /* ------------------ */
    /* Factor the matrix  */
    /* ------------------ */
    error = dss_factor_real(handle, type, values);
    if (error != MKL_DSS_SUCCESS)
        goto printError;
    /* ------------------------ */
    /* Get the solution vector  */
    /* ------------------------ */
    error = dss_solve_real(handle, opt, rhs, nRhs, solValues);
    if (error != MKL_DSS_SUCCESS)
        goto printError;

    /* -------------------------- */
    /* Deallocate solver storage  */
    /* -------------------------- */
    error = dss_delete(handle, opt);
    if (error != MKL_DSS_SUCCESS)
        goto printError;
    /* ---------------------- */
    /* Print solution vector  */
    /* ---------------------- */
    printf(" Solution array: ");
    for (int i = 0; i < nCols; i++)
        printf(" %g", solValues[i]);
    printf("\n");

    getchar();

    exit(0);
printError:
    printf("Solver returned error code %d\n", error);
    exit(1);
}

 

The Python code with which I checked my matrix:

import numpy as np
from scipy.sparse import csr_matrix

indptr = np.array([1,4,7,10,13,16,19,21,23,24])
indices = np.array([4,2,1,5,3,2,6,4,3,7,5,4,8,6,5,9,7,6,8,7,9,8,9])

indptr-=1;
indices-=1;

data = np.array([-0.09,-0.08,1.17,-4.16,-1.73,6.98,-7.77,0,10.5,-1.73,-0.86,3.68,-5.37,-2.14,13.55,-8.97,0,19.90,-1.30,4.04,-2.55,10.23,12.53 ])

print(csr_matrix( (data,indices,indptr), shape=(9,9) ).todense())

Can you help me please? What did I do wrong?
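A guess rather than a verified diagnosis: MKL DSS expects the column indices within each row in ascending order, and a half-stored (upper-triangle) matrix is normally declared MKL_DSS_SYMMETRIC rather than MKL_DSS_NON_SYMMETRIC. Under those assumptions, the arrays above would become:

    MKL_INT sym = MKL_DSS_SYMMETRIC;   /* only the upper triangle is supplied */
    /* columns sorted ascending within each row, values permuted to match */
    static _INTEGER_t columns[NNONZEROS] =
        { 1,2,4,  2,3,5,  3,4,6,  4,5,7,  5,6,8,  6,7,9,  7,8,  8,9,  9 };
    static _DOUBLE_PRECISION_t values[NNONZEROS] =
        { 1.17,-0.08,-0.09,  6.98,-1.73,-4.16,  10.5,0,-7.77,
          3.68,-0.86,-1.73,  13.55,-2.14,-5.37,  19.90,0,-8.97,
          4.04,-1.30,  10.23,-2.55,  12.53 };

The rowIndex array is unchanged.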



mkl 11.3.3 memory leak


We observed the following memory leak from MKL:

10:57:09,158              7   3.179332e-08   7.099199e-08   2.239095e-14   8.614772e-15
10:57:21,509 0 leaked Serializable objects
10:57:22,421 ==14604==
10:57:22,427 ==14604== HEAP SUMMARY:
10:57:22,428 ==14604==     in use at exit: 1,009,739 bytes in 1,503 blocks
10:57:22,428 ==14604==   total heap usage: 2,080,855 allocs, 2,079,352 frees, 288,620,780 bytes allocated
10:57:22,428 ==14604==
10:57:22,650 ==14604== 184 bytes in 1 blocks are possibly lost in loss record 608 of 730
10:57:22,653 ==14604==    at 0x4C26B5D: malloc (vg_replace_malloc.c:299)
10:57:22,654 ==14604==    by 0x24EE8AB6: mkl_serv_allocate (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_core.so)
10:57:22,654 ==14604==    by 0x22CBED34: DGELSY (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_intel_lp64.so)
10:57:22,654 ==14604==    by 0x22EC02A0: LAPACKE_dgelsy_work (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_intel_lp64.so)
10:57:22,654 ==14604==    by 0x22EC01C3: LAPACKE_dgelsy (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_intel_lp64.so)

10:57:22,655 {
10:57:22,655    <insert_a_suppression_name_here>
10:57:22,655    Memcheck:Leak
10:57:22,655    match-leak-kinds: possible
10:57:22,655    fun:malloc
10:57:22,655    fun:mkl_serv_allocate
10:57:22,655    fun:DGELSY
10:57:22,656    fun:LAPACKE_dgelsy_work
10:57:22,656    fun:LAPACKE_dgelsy
10:57:22,656    fun:_Z10gradientReILi3EEvRSt6vectorI6VectorILi3EdESaIS2_EEjRKS0_IS1_IXT_EdESaIS6_EE
10:57:22,656    fun:_ZN10EvaluatorTI8BaryCellLi3EdE18prepareVertexBasedEv
10:57:22,656    fun:_ZN10EvaluatorTI8BaryCellLi3EdE9setOffsetERK7Element
10:57:22,656    fun:_ZNK13ViscousEngineI8BaryCelldE7reserveERK7ElementRK10FieldIndexI4CellER9AssemblerIdE
10:57:22,656    fun:_ZNK11DiscretizerIdE7exploreI4Cell8FvRegionEEvRKT0_RK9ModelPart
10:57:22,656    fun:_ZNK8TermEvalI20BasicRegionCondition22BasicConditionFaceTermE16SpecificDelegateclERK6FvTermRK8FvObject
10:57:22,656    fun:_ZN12FvTermHelper14evaluateOnPartI8FvRegionEEjRK6FvTermRKT_RK19TermAcceptorManagerb
10:57:22,656    fun:_ZN16VerifiedFaceTerm16verifiedEvaluateERK8FvDomainRKbS4_
10:57:22,656    fun:_ZN8FaceTerm8evaluateERK8FvDomain
10:57:22,656    fun:_ZN11DiscretizerIdE10discretizeERK8FvDomainb
10:57:22,656    fun:_ZN13ViscousSolver13solveSpecificIdEEvRK16FvRepresentationR14NamedResiduals
10:57:22,656    fun:_ZN13ViscousSolver15iterationUpdateER14NamedResiduals
10:57:22,656    fun:_ZN14RunnableSolver23doSolverIterationUpdateERK13SolverManagerR14NamedResiduals
10:57:22,656    fun:_ZN14RunnableSolver7iterateEv
10:57:22,656    fun:_ZN18SimulationIterator15startSimulationEP14RunnableSolveriNS_7RunModeEb
10:57:22,656    fun:_ZN14StepSimulation7executeERK10PropertiesRS0_
10:57:22,656    fun:_ZN10Controller14executeCommandER7CommandRK10PropertiesRS2_
10:57:22,657    fun:_ZN10Controller15processCommandsEv
10:57:22,657    fun:_ZN17CommandController23SerialMasterCommandLoop5startEv
10:57:22,657 }
10:57:22,657 ==14604== 256 bytes in 1 blocks are possibly lost in loss record 621 of 730
10:57:22,657 ==14604==    at 0x4C26B5D: malloc (vg_replace_malloc.c:299)
10:57:22,657 ==14604==    by 0x24EEA011: mm_account_ptr_by_tid..0 (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_core.so)
10:57:22,657 ==14604==    by 0x24EE83E9: mkl_serv_allocate (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_core.so)
10:57:22,657 ==14604==    by 0x22EC0176: LAPACKE_dgelsy (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_intel_lp64.so)
10:57:22,657 ==14604==    by 0x2178E611: void gradientRe<3>(std::vector<Vector<3, double>, std::allocator<Vector<3, double> > >&, unsigned int, std::vector<Vector<3, double>, std::allocator<Vector<3, double> > > const&) (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/star/lib/linux-x86_64-2.5/gnu4.8/lib/libViscousModel.so)


10:57:22,658 {
10:57:22,658    <insert_a_suppression_name_here>
10:57:22,658    Memcheck:Leak
10:57:22,658    match-leak-kinds: possible
10:57:22,659    fun:malloc
10:57:22,659    fun:mm_account_ptr_by_tid..0
10:57:22,659    fun:mkl_serv_allocate
10:57:22,659    fun:LAPACKE_dgelsy
10:57:22,659    fun:_Z10gradientReILi3EEvRSt6vectorI6VectorILi3EdESaIS2_EEjRKS0_IS1_IXT_EdESaIS6_EE
10:57:22,659    fun:_ZN10EvaluatorTI8BaryCellLi3EdE18prepareVertexBasedEv
10:57:22,659    fun:_ZN10EvaluatorTI8BaryCellLi3EdE9setOffsetERK7Element
10:57:22,659    fun:_ZNK13ViscousEngineI8BaryCelldE7reserveERK7ElementRK10FieldIndexI4CellER9AssemblerIdE
10:57:22,659    fun:_ZNK11DiscretizerIdE7exploreI4Cell8FvRegionEEvRKT0_RK9ModelPart
10:57:22,659    fun:_ZNK8TermEvalI20BasicRegionCondition22BasicConditionFaceTermE16SpecificDelegateclERK6FvTermRK8FvObject
10:57:22,659    fun:_ZN12FvTermHelper14evaluateOnPartI8FvRegionEEjRK6FvTermRKT_RK19TermAcceptorManagerb
10:57:22,659    fun:_ZN16VerifiedFaceTerm16verifiedEvaluateERK8FvDomainRKbS4_
10:57:22,659    fun:_ZN8FaceTerm8evaluateERK8FvDomain
10:57:22,659    fun:_ZN11DiscretizerIdE10discretizeERK8FvDomainb
10:57:22,659    fun:_ZN13ViscousSolver13solveSpecificIdEEvRK16FvRepresentationR14NamedResiduals
10:57:22,659    fun:_ZN13ViscousSolver15iterationUpdateER14NamedResiduals
10:57:22,659    fun:_ZN14RunnableSolver23doSolverIterationUpdateERK13SolverManagerR14NamedResiduals
10:57:22,659    fun:_ZN14RunnableSolver7iterateEv
10:57:22,659    fun:_ZN18SimulationIterator15startSimulationEP14RunnableSolveriNS_7RunModeEb
10:57:22,660    fun:_ZN14StepSimulation7executeERK10PropertiesRS0_
10:57:22,660    fun:_ZN10Controller14executeCommandER7CommandRK10PropertiesRS2_
10:57:22,660    fun:_ZN10Controller15processCommandsEv
10:57:22,660    fun:_ZN17CommandController23SerialMasterCommandLoop5startEv
10:57:22,660    fun:_ZN17CommandController15processCommandsEv
10:57:22,660 }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ More log/stack-trace ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
10:57:22,666 ==14604== 557,216 bytes in 1 blocks are possibly lost in loss record 730 of 730
10:57:22,666 ==14604==    at 0x4C26B5D: malloc (vg_replace_malloc.c:299)
10:57:22,666 ==14604==    by 0x24EE8AB6: mkl_serv_allocate (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_core.so)
10:57:22,666 ==14604==    by 0x2EEF2954: mkl_blas_avx_xdtrsm (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_avx.so)
10:57:22,666 ==14604==    by 0x23739106: mkl_blas_dtrsm_omp_driver_v1 (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_intel_thread.so)
10:57:22,666 ==14604==    by 0x237169BE: mkl_blas_dtrsm_host (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_intel_thread.so)
10:57:22,666 ==14604==    by 0x237340D1: mkl_blas_dtrsm (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_intel_thread.so)
10:57:22,666 ==14604==    by 0x2541ACC3: mkl_lapack_dgelsy (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_core.so)
10:57:22,666 ==14604==    by 0x22CBEDFA: DGELSY (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_intel_lp64.so)
10:57:22,667 ==14604==    by 0x22EC02A0: LAPACKE_dgelsy_work (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_intel_lp64.so)
10:57:22,667 ==14604==    by 0x22EC01C3: LAPACKE_dgelsy (in /home/install3/lin64/12.03.075_01/STAR-CCM+12.03.075/mkl/11.3.3/linux/lib/intel64/libmkl_intel_lp64.so)

10:57:22,668    <insert_a_suppression_name_here>
10:57:22,668    Memcheck:Leak
10:57:22,668    match-leak-kinds: possible
10:57:22,668    fun:malloc
10:57:22,668    fun:mkl_serv_allocate
10:57:22,668    fun:mkl_blas_avx_xdtrsm
10:57:22,668    fun:mkl_blas_dtrsm_omp_driver_v1
10:57:22,668    fun:mkl_blas_dtrsm_host
10:57:22,668    fun:mkl_blas_dtrsm
10:57:22,669    fun:mkl_lapack_dgelsy
10:57:22,669    fun:DGELSY
10:57:22,669    fun:LAPACKE_dgelsy_work
10:57:22,669    fun:LAPACKE_dgelsy
10:57:22,669    fun:_Z10gradientReILi3EEvRSt6vectorI6VectorILi3EdESaIS2_EEjRKS0_IS1_IXT_EdESaIS6_EE
10:57:22,669    fun:_ZN10EvaluatorTI8BaryCellLi3EdE18prepareVertexBasedEv
10:57:22,669    fun:_ZN10EvaluatorTI8BaryCellLi3EdE9setOffsetERK7Element
10:57:22,669    fun:_ZNK13ViscousEngineI8BaryCelldE7reserveERK7ElementRK10FieldIndexI4CellER9AssemblerIdE
10:57:22,669    fun:_ZNK11DiscretizerIdE7exploreI4Cell8FvRegionEEvRKT0_RK9ModelPart
10:57:22,669    fun:_ZNK8TermEvalI20BasicRegionCondition22BasicConditionFaceTermE16SpecificDelegateclERK6FvTermRK8FvObject
10:57:22,669    fun:_ZN12FvTermHelper14evaluateOnPartI8FvRegionEEjRK6FvTermRKT_RK19TermAcceptorManagerb
10:57:22,669    fun:_ZN16VerifiedFaceTerm16verifiedEvaluateERK8FvDomainRKbS4_
10:57:22,669    fun:_ZN8FaceTerm8evaluateERK8FvDomain
10:57:22,669    fun:_ZN11DiscretizerIdE10discretizeERK8FvDomainb
10:57:22,669    fun:_ZN13ViscousSolver13solveSpecificIdEEvRK16FvRepresentationR14NamedResiduals
10:57:22,669    fun:_ZN13ViscousSolver15iterationUpdateER14NamedResiduals
10:57:22,669    fun:_ZN14RunnableSolver23doSolverIterationUpdateERK13SolverManagerR14NamedResiduals
10:57:22,669    fun:_ZN14RunnableSolver7iterateEv
10:57:22,670 }
10:57:22,670 ==14604== LEAK SUMMARY:
10:57:22,670 ==14604==    definitely lost: 0 bytes in 0 blocks
10:57:22,670 ==14604==    indirectly lost: 0 bytes in 0 blocks
10:57:22,670 ==14604==      possibly lost: 627,584 bytes in 5 blocks
10:57:22,670 ==14604==    still reachable: 19,178 bytes in 150 blocks
10:57:22,670 ==14604==                       of which reachable via heuristic:
10:57:22,670 ==14604==                         stdstring          : 8,021 bytes in 130 blocks
10:57:22,670 ==14604==         suppressed: 362,977 bytes in 1,348 blocks
10:57:22,670 ==14604== Reachable blocks (those to which a pointer was found) are not shown.
10:57:22,670 ==14604== To see them, rerun with: --leak-check=full --show-leak-kinds=all
10:57:22,670 ==14604==
10:57:22,670 ==14604== For counts of detected and suppressed errors, rerun with: -v
10:57:22,670 ==14604== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 3999 from 202)
10:57:22,741 Design STAR-CCM+ simulation completed
10:57:22,741 Server process exited with code : 231
10:57:22,745
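These "possibly lost" blocks look like buffers held by MKL's internal memory manager (mkl_serv_allocate) rather than true leaks, though that is an assumption. A sketch of the usual way to release them before shutdown so valgrind stops flagging them:

    #include <mkl.h>

    /* Call near shutdown, after the last MKL call on all threads. */
    void release_mkl_buffers(void)
    {
        mkl_free_buffers();               /* returns memory held by MKL's memory manager */

        int nbuffers = 0;                 /* optionally report what MKL still holds */
        MKL_INT64 bytes_left = mkl_mem_stat(&nbuffers);
        (void)bytes_left;
    }

Setting the environment variable MKL_DISABLE_FAST_MM=1 is another commonly suggested way to rule the memory manager in or out.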

Electrostatic Boundary Value Problem with 3-D Boundary


Hello,

Does anyone know if it's possible (and how) to solve a boundary value problem in Cartesian coordinates with a 3-D boundary on one face (let's say the z=0 plane). For example, I want to solve Laplace's equation in the domain 0<x<L, 0<y<L, 0<z<L, with Dirichlet BC's on all but the z=0 plane, and have several (let's say two) cylinders of length L_cyl protruding into the domain that also have Dirichlet BC's on their surfaces. Is that possible in the MKL Poisson Library?

 

Thank you,

Ryan

Thread Topic: Question

mkl 2017.2.174 linux vs 2017.2.187 windows


I downloaded the latest MKL for both Linux and Windows and noticed the minor version is slightly different: 2017.2.174 on Linux and 2017.2.187 on Windows. Are they supposed to be identical?

Simple vectorization question


 

Hi,

I wrote a simple function and executed it on a KNL processor (68 cores, flat memory mode with quadrant clustering, using MCDRAM) using only one thread and n = 10,000,000. I execute this function 100 times and take the average, then calculate the GFLOPS with the formula gflops = (1e-9 * 2.0 * n) / execution_time.

double multiplyAccum(long n,double *A, double *B)
{
    long i;
    double result = 0;
    #pragma novector
    //#pragma simd
    for ( i = 0; i < n; i++ )
    {
        result += A[i] * B[i];
    }
    return result;
}
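For reference, a sketch of the measurement harness the numbers below assume (the function name and repetition count come from the description above; the allocation, initialization, and timer are assumptions):

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    double multiplyAccum(long n, double *A, double *B);

    int main(void)
    {
        long n = 10000000;                        /* 10,000,000 elements */
        double *A = malloc(n * sizeof *A), *B = malloc(n * sizeof *B);
        for (long i = 0; i < n; i++) { A[i] = 1.0; B[i] = 2.0; }

        double sum = 0.0, t0 = omp_get_wtime();
        for (int rep = 0; rep < 100; rep++)       /* 100 runs, averaged */
            sum += multiplyAccum(n, A, B);
        double t = (omp_get_wtime() - t0) / 100.0;

        printf("result %g, GFLOP/s %g\n", sum, (1e-9 * 2.0 * n) / t);
        free(A); free(B);
        return 0;
    }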

1) When I use #pragma novector, I get 0.839571 GFLOPS/s

This is the compiler report for the loop:

      remark #15319: loop was not vectorized: novector directive used
      remark #25439: unrolled with remainder by 8  
      remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
      remark #25457: Number of partial sums replaced: 1

2) When I use #pragma simd, I get 1.495788 GFLOPS/s

This is the compiler report for the loop:

      remark #15388: vectorization support: reference A_34279 has aligned access   [ multiplyAccum.cpp(64,3) ]
      remark #15388: vectorization support: reference B_34279 has aligned access   [ multiplyAccum.cpp(64,3) ]
      remark #15305: vectorization support: vector length 8
      remark #15399: vectorization support: unroll factor set to 8
      remark #15309: vectorization support: normalized vectorization overhead 0.446
      remark #15301: SIMD LOOP WAS VECTORIZED
      remark #15448: unmasked aligned unit stride loads: 2 
      remark #15475: --- begin vector loop cost summary ---
      remark #15476: scalar loop cost: 9 
      remark #15477: vector loop cost: 0.870 
      remark #15478: estimated potential speedup: 10.280 
      remark #15488: --- end vector loop cost summary ---
      remark #25015: Estimate of max trip count of loop=156250

The estimated potential speedup is 10x, but I only get about 1.8x. What is the explanation for this?

 

Thanks,

How to use MKL for FFT?


I am a new student of MKL and the C language.

Now, when I replace my FFT function with DftiComputeForward, I cannot get the correct output. Why?

Realft(info.x, info.fftN);  // FFT: info.x is float, in-out; x[1] is real, x[2] is complex.

 

  MKL_Complex8* x_cmplx = 0;
 x_cmplx = (MKL_Complex8*)mkl_malloc((info.fftN/2 + 1) * sizeof(MKL_Complex8), 64);
 DftiComputeForward(hand, info.x,x_cmplx);
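A hedged sketch of the descriptor setup that an out-of-place real-to-complex forward transform usually needs (the handle name hand, the length info.fftN, and single precision are carried over from the snippet above as assumptions); without DFTI_NOT_INPLACE and the complex (CCE) storage setting, the result will not land in x_cmplx in the expected layout:

    DFTI_DESCRIPTOR_HANDLE hand = NULL;
    MKL_LONG status;

    /* 1D real-to-complex transform of length info.fftN, single precision */
    status = DftiCreateDescriptor(&hand, DFTI_SINGLE, DFTI_REAL, 1, (MKL_LONG)info.fftN);
    /* write the result into a separate complex buffer ... */
    status = DftiSetValue(hand, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    /* ... holding fftN/2 + 1 complex values (conjugate-even storage) */
    status = DftiSetValue(hand, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
    status = DftiCommitDescriptor(hand);

    status = DftiComputeForward(hand, info.x, x_cmplx);
    DftiFreeDescriptor(&hand);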


_MSC_VER mismatch again


Hi,

Today I upgraded my MKL installation from 2017.0 to 2017.2 and recompiled my application; I then got the following link error:
1>mkl_tbb_thread.lib(vml_tbb_threading_templates.obj) : error LNK2038: mismatch detected for '_MSC_VER': value '1600' doesn't match value '1900' in libboost_system-vc140-mt-1_62.lib(error_code.obj)
I found that a similar bug existed in a previous MKL version:
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/...

Please fix it in the next MKL release.

Thanks.

Thread Topic: Bug Report

libiomp5md.dll not found - VS2015 update 3


I have a fresh install of VS2015 Update 3 and MKL 2017.2.187. It is a stand-alone MKL license; I don't have the wider Intel compiler tools.

When I run my project, I get an error about a missing libiomp5md.dll. Using Process Monitor, I can see that C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.2.187\windows\redist\ia32_win\mkl has been added to the PATH environment variable, but the missing DLL is in a compiler sub-folder that isn't on the path. The same problem occurs with a 64-bit build as well.

This is a regression from MKL version 2017.0.109, which I was previously using.

I have worked around it by copying all the redist files into the mkl folder that is added to the path. Is there an MSBuild file that should be fixed instead?

It seems similar to https://software.intel.com/en-us/comment/1890655#comment-1890655: something isn't setting up the PATH, and I guess it is because I don't have the Intel compiler installed. Is this something that isn't covered in your testing?

Thread Topic: Bug Report

Weird multi-core scaling behaviour on Ivy Bridge-EP for MKL DGEMM


On IVB, it appears MKL DGEMM decides to run on only eight cores when it is told to run on nine. Behaviour on SNB, HSW, BDW is fine. I tried different IVB chips, but to no avail. When instructed to run on ten cores, all ten cores are used (so it doesn't appear to be a thread pinning issue). I've seen irregularities on IVB chips in RAPL reported power consumption when going from eight to nine to ten cores. Is this related and expected behaviour (i.e. an optimization) because DGEMM knows something I don't? Is there a workaround to force it to use nine cores?
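Not a confirmed workaround, only the knobs usually tried first (whether they override DGEMM's internal core-count choice in this case is exactly the open question):

    #include <mkl.h>

    /* Call before the first DGEMM: request exactly nine threads and stop MKL
       from adjusting the count on its own. */
    void force_nine_mkl_threads(void)
    {
        mkl_set_dynamic(0);          /* same effect as MKL_DYNAMIC=FALSE */
        mkl_set_num_threads(9);      /* or set MKL_NUM_THREADS=9 in the environment */
        /* mkl_domain_set_num_threads(9, MKL_DOMAIN_BLAS); narrows the request to BLAS only */
    }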

Fast Helmholtz Solver : ipar documentation


Hello,

I am calling the MKL shared libraries (2017.2.187) from the Julia language in order to solve a 2D Cartesian Helmholtz equation (a Poisson equation, to be more specific), and I think I found some discrepancies in the documentation for the ipar array (https://software.intel.com/en-us/node/522080).

Here is the julia code:

# julia code: www.julialang.org

# call [...]\mkl\bin\mklvars.bat intel64 ilp64  first to set the environment

  MKL_INT = Int32;

  r = linspace(0,1,132);
  z = linspace(0,2.5,136);

  nx = length(r);
  ny = length(z);

  in_ax = r[1];
  in_bx = r[nx];
  in_ay = z[1];
  in_by = z[ny];

  in_BCtype    = Array{UInt8}(4);
  in_BCtype[1] = UInt8('D');
  in_BCtype[2] = UInt8('N');
  in_BCtype[3] = UInt8('N');
  in_BCtype[4] = UInt8('N');

  in_q = 0.0;

  inout_ipar       = Array{MKL_INT}(128);
  inout_ipar[:]    = 123123123; #dummy value in order to identify unaltered entries
  inout_dpar       = Array{Float64}(Int32(5*nx/2+7));
  inout_dpar[:]    = 0.0;
  inout_stat       = 0;

  inout_f          = Array{Float64}(Int32((nx+1)*(ny+1)));
  inout_f[:]       = 0.0;
  in_bd_ax         = Array{Float64}(Int32(ny));
  in_bd_ax[:]      = 0.0;
  in_bd_bx         = copy(in_bd_ax);
  in_bd_ay         = Array{Float64}(Int32(nx));
  in_bd_ay[:]      = 0.0;
  in_bd_by         = copy(in_bd_ay);
  inout_xhandle    = Ptr{Any};

  ccall((:D_INIT_HELMHOLTZ_2D, "mkl_rt"),  # function and library
        Ptr{Void},              # ReturnType Void --> no return value
        (Ptr{Float64},  #ax     # ArgumentTypes as a Tuple
        Ptr{Float64},   #bx
        Ptr{Float64},   #ay
        Ptr{Float64},   #by
        Ptr{MKL_INT},   #nx
        Ptr{MKL_INT},   #ny
        Ptr{UInt8},     #BCtype
        Ptr{Float64},   #q
        Ptr{MKL_INT},   #ipar
        Ptr{Float64},   #dpar
        Ptr{MKL_INT},   #stat
        ),
        &in_ax,         #ax     # Arguments passed by reference
        &in_bx,         #bx
        &in_ay,         #ay
        &in_by,         #by
        &nx,            #nx
        &ny,            #ny
        in_BCtype,      #BCtype
        &in_q,          #q
        inout_ipar,     #ipar
        inout_dpar,     #dpar
        &inout_stat,    #stat
        );

  # print ipar array elements in a tabular manner with 0-based indices
  # as presented in the documentation
  println("Values of ipar after INIT call:")
  println("idx\tvalue")
  for i=1:length(inout_ipar)
	if inout_ipar[i]==123123123
		println("$(i-1)\tunset");
	else
		println("$(i-1)\t$(inout_ipar[i])");
    end
  end

  ccall((:D_COMMIT_HELMHOLTZ_2D, "mkl_rt"),
        Ptr{Void},              # ReturnType Void
        (Ptr{Float64},  #f      # ArgumentTypes as a Tuple
        Ptr{Float64},   #bd_ax
        Ptr{Float64},   #bd_bx
        Ptr{Float64},   #bd_ay
        Ptr{Float64},   #bd_by
        Ptr{Any},       #xhandle
        Ptr{MKL_INT},   #ipar
        Ptr{Float64},   #dpar
        Ptr{MKL_INT},   #stat
        ),
        inout_f,        # Arguments passed by reference
        in_bd_ax,
        in_bd_bx,
        in_bd_ay,
        in_bd_by,
        &inout_xhandle,
        inout_ipar,
        inout_dpar,
        &inout_stat,
        );


  # print ipar array elements in a tabular manner with 0-based indices
  # as presented in the documentation
  println("Values of ipar after COMMIT call:")
  println("idx\tvalue")
  for i=1:length(inout_ipar)
	if inout_ipar[i]==123123123
		println("$(i-1)\tunset");
	else
		println("$(i-1)\t$(inout_ipar[i])");
    end
  end

 

And this is the output:

Values of ipar after INIT call:
idx     value
0       0
1       1
2       1
3       unset
4       unset
5       unset
6       0
7       1
8       1
9       1
10      unset
11      unset
12      132
13      136
14      unset
15      unset
16      unset
17      unset
18      unset
19      unset
20      unset
21      unset
22      unset
23      1
24      1
25      unset
26      unset
27      unset
28      unset
29      unset
30      unset
31      unset
32      unset
33      unset
34      unset
35      unset
36      unset
37      unset
38      unset
39      unset
40      unset
41      unset
42      unset
43      unset
44      unset
45      unset
46      unset
47      unset
48      unset
49      unset
50      unset
51      unset
52      unset
53      unset
54      unset
55      unset
56      unset
57      unset
58      unset
59      unset
60      unset
61      unset
62      unset
63      unset
64      unset
65      unset
66      unset
67      unset
68      unset
69      unset
70      unset
71      unset
72      unset
73      unset
74      unset
75      unset
76      unset
77      unset
78      unset
79      unset
80      unset
81      unset
82      unset
83      unset
84      unset
85      unset
86      unset
87      unset
88      unset
89      unset
90      unset
91      unset
92      unset
93      unset
94      unset
95      unset
96      unset
97      unset
98      unset
99      unset
100     unset
101     unset
102     unset
103     unset
104     unset
105     unset
106     unset
107     unset
108     unset
109     unset
110     unset
111     unset
112     unset
113     unset
114     unset
115     unset
116     unset
117     unset
118     unset
119     unset
120     unset
121     unset
122     unset
123     unset
124     unset
125     unset
126     unset
127     unset

 

Values of ipar after COMMIT call:
idx     value
0       0
1       1
2       1
3       unset
4       unset
5       unset
6       0
7       1
8       1
9       1
10      unset
11      unset
12      132
13      136
14      unset
15      6
16      138
17      unset
18      unset
19      139
20      337
21      unset
22      unset
23      1
24      1
25      unset
26      unset
27      unset
28      unset
29      unset
30      unset
31      unset
32      unset
33      unset
34      unset
35      unset
36      unset
37      unset
38      unset
39      unset
40      132
41      1
42      1
43      unset
44      unset
45      2
46      0
47      1
48      1
49      1
50      0
51      unset
52      unset
53      unset
54      unset
55      unset
56      unset
57      unset
58      unset
59      unset
60      unset
61      unset
62      unset
63      unset
64      unset
65      unset
66      unset
67      unset
68      unset
69      unset
70      unset
71      unset
72      unset
73      unset
74      unset
75      unset
76      unset
77      unset
78      unset
79      unset
80      unset
81      unset
82      unset
83      unset
84      unset
85      unset
86      unset
87      unset
88      unset
89      unset
90      unset
91      unset
92      unset
93      unset
94      unset
95      unset
96      unset
97      unset
98      unset
99      unset
100     unset
101     unset
102     unset
103     unset
104     unset
105     unset
106     unset
107     unset
108     unset
109     unset
110     unset
111     unset
112     unset
113     unset
114     unset
115     unset
116     unset
117     unset
118     unset
119     unset
120     unset
121     unset
122     unset
123     unset
124     unset
125     unset
126     unset
127     unset

And it seems to me that the following issues arise:

  • Not only is ipar[3] unused (as stated in the docs), but so are ipar[4] and ipar[5], because the values for my selected boundary conditions are stored in ipar[6-9] and not in ipar[4-7] as stated in the docs. This shifts the index of all following information by 2. From now on I will use the "new" indices.
  • The value in ipar[16] equals ipar[15]+ipar[12]+0, not ipar[15]+ipar[12]+1 as stated in the docs.
  • In the tabular description of indices 17 to 22 for the 2D Cartesian case:
    • ipar[21] is unset, but the docs state that it should be ipar[20]+1. Maybe this is only set later in the 2D Helmholtz solver procedure, so it might not be an issue.
    • The docs state that ipar[22] should be ipar[21]+3*ipar[14]/4, but ipar[14] is unset because nz is not used in the 2D case. So the table entry for the 2D Cartesian case should say "unused" for ipar[22].

Greetings,

Lars

Thread Topic: Bug Report

Intel® MKL 2018 Beta is now available


Intel® MKL 2018 Beta is now available as part of the Parallel Studio XE 2018 Beta.

Check the Join the Intel® Parallel Studio XE 2018 Beta program post to learn how to join the Beta program and provide your feedback.

What's New in Intel® MKL 2018 Beta:

  • DNN:
    • Added initial convolution and inner product optimizations for Intel(R) Xeon Phi(TM) processors based on Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of AVX512_4FMAPS and AVX512_4VNNIW instruction groups.
    • Average pooling has an option to include padding into mean values computation
  • BLAS Features:
    • Introduced optimized integer matrix-matrix multiplication routines (GEMM_S16S16S16 and GEMM_S16S16S32) to work with quantized matrices for all architectures.
    • Introduced ?TRSM_BATCH to complement the batched BLAS for all architectures
  • BLAS Optimizations:
    • Optimized SGEMM, GEMM_S16S16S16 and GEMM_S16S16S32 for Intel(R) Xeon Phi(TM) processors based on Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of AVX512_4FMAPS and AVX512_4VNNIW instruction groups
    • Improved ?GEMM_BATCH performance for all architectures
    • Improved single and multi-threaded {D,S}SYMV performance for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and the Intel® Xeon Phi™ processor x200
  • Sparse BLAS:
    • Improved performance of CSRMV/BSRMV functionality for Intel® AVX-512 instruction set in Inspector-Executor mode
  • LAPACK:
    • Introduced factorization and solve routines based on Aasen's algorithm: ?sytrf_aa/?hetrf_aa, ?sytrs_aa/?hetrs_aa
  • Vector Mathematics:
    • Added 24 new functions: v?Fmod, v?Remainder, v?Powr, v?Exp2; v?Exp10; v?Log2; v?Logb; v?Cospi; v?Sinpi; v?Tanpi; v?Acospi; v?Asinpi; v?Atanpi; v?Atan2pi; v?Cosd; v?Sind; v?Tand; v?CopySign; v?NextAfter; v?Fdim; v?Fmax; v?Fmin; v?MaxMag; v?MinMag
  • Library Engineering:
    • Introduced support for Intel(R) Xeon Phi(TM) processors based on Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of AVX512_4FMAPS and AVX512_4VNNIW instruction groups.

Optimizations are not dispatched unless explicitly enabled with the mkl_enable_instructions function call or the MKL_ENABLE_INSTRUCTIONS environment variable (a short illustration follows the list below).

  • Documentation: 
  • Hardware support for Intel® Xeon Phi™ coprocessors (code name Knights Corner) is removed. Customers are recommended to stay on MKL 2017 if they continue to use and develop for Intel® Xeon Phi™ coprocessors (Knights Corner).
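For illustration only, a minimal sketch of enabling a dispatch level programmatically; MKL_ENABLE_AVX512 is used here as an assumption, and the constant covering the AVX512_4FMAPS/4VNNIW group may differ by release, so check mkl.h for the exact name.

    #include <mkl.h>

    int main(void)
    {
        /* Ask MKL to dispatch AVX-512 code paths; returns 1 on success,
           0 if the request cannot be honored on this CPU or library build. */
        int ok = mkl_enable_instructions(MKL_ENABLE_AVX512);
        return ok ? 0 : 1;
    }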

3D Interpolation?


Hello,

I'm interested in interpolating large 3D data sets (let's say, for example, 200 X 200 X 2000 =80,000,000 data points). That is, given a set of coordinates (x_i,y_j,z_k) defining a Cartesian mesh in 3D, with known function values on the mesh f(x_i, y_j, z_k), I'd like to calculate the function values f(x,y,z) at arbitrary points in space within the computation volume. I've noticed that the 3D data processing functions in IPP have this functionality built in. I've also noticed that MKL has 1D algorithms built in. Is there a way to call the interpolation functions used in IPP by themselves? Or, is 3D interpolation built into MKL somewhere? If someone has an example of using an Intel library to interpolate a simple function like sin(x*y*z), or knows where to find such an example, I would greatly appreciate it. Please help, I'm totally stumped at this point.

Thank you for your time,

Ryan H

When I_MPI_FABRICS=shm, the MPI_Bcast message size can't be larger than 64 KB


I run MPI on a single workstation (2 x E5-2690).

When I export I_MPI_FABRICS=shm, MPI_Bcast cannot broadcast messages larger than 64 KB.

But when I export I_MPI_FABRICS={shm,tcp}, everything is OK.

Is there a limit for shm? Can I adjust it?

Thread Topic: Question