Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Different results from getrf/getrs, DSS and Intel PARDISO

Dear all,

I have a small nonsymmetric linear system represented by a matrix in CSR format (file fort.106), and I need to solve it. I tried three different approaches. First, I converted the three CSR arrays to a dense matrix with mkl_ddnscsr; solving with getrf/getrs (methbutton=6) produces reasonable results. To exploit the sparsity of the system, I then applied Intel DSS (methbutton=7). However, the results obtained with this method differ from the getrf/getrs results by far more than machine precision. Going one step further, Intel PARDISO (methbutton=8) produces:

forrtl: severe (174): SIGSEGV, segmentation fault occurred

Things I have tried in order to avoid the problems:

- https://software.intel.com/en-us/articles/determining-root-cause-of-sigs...

- checked the sparse matrix with the sparse matrix checker routines (no error)

In principle, I would have said that my matrix is the problem. But in that case I would expect getrf/getrs to fail as well, and since it does work, I suspect the sparse solvers are somehow the issue.
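When a dense and a sparse solver disagree, one way to decide which result to trust is to compute the residual of each solution against the original CSR data, rather than comparing the solution vectors with each other. If both residuals are tiny, the matrix is probably ill-conditioned and both answers may be acceptable; if only one is small, the input handed to the other solver (for example its indexing convention) is the first suspect. A minimal C sketch follows. The function name is hypothetical, and it assumes one-based three-array CSR with 32-bit int indices, whereas the program in this post is built with -i8, so a real port would use 64-bit integers.

```c
#include <assert.h>

/* Hypothetical helper: infinity-norm residual max_i |b_i - (A*x)_i| for a
   square matrix of order n in three-array CSR with ONE-based indices
   (ia has n+1 entries and ia[0] == 1; ja holds column numbers from 1). */
double csr_residual_inf(int n, const int *ia, const int *ja, const double *a,
                        const double *x, const double *b)
{
    assert(n > 0 && ia[0] == 1);
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        double ax = 0.0;
        /* row i occupies a[ia[i]-1 .. ia[i+1]-2] in zero-based C terms */
        for (int p = ia[i] - 1; p < ia[i + 1] - 1; ++p)
            ax += a[p] * x[ja[p] - 1];
        double r = b[i] - ax;
        if (r < 0.0) r = -r;   /* absolute value without libm */
        if (r > worst) worst = r;
    }
    return worst;
}
```

Calling it once with the getrf/getrs solution and once with the DSS solution (same ia, ja, a and right-hand side) gives two numbers that can be compared directly; dividing by the largest entry of b gives a relative residual when b has large entries.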

You can find my code attached (2Modes.f90). The arrays representing the matrix are in fort.106; the program reads them automatically, so you can just compile and run it. Compiling works fine with:

ifort -o 2Modes.out 2Modes.f90 ${MKLROOT}/lib/intel64/libmkl_blas95_ilp64.a ${MKLROOT}/lib/intel64/libmkl_lapack95_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm -ldl -i8 -I${MKLROOT}/include/intel64/ilp64 -I${MKLROOT}/include

The methods can be switched with the methbutton variable in line 41. I tried to keep the code as simple as possible; in principle, I extracted it from the examples that Intel provides. It would be nice if someone could take a look at this. Thank you in advance.

Best,

Horst K.

Attachment: Code.zip (application/zip, 34.88 KB)

Intel MKL ERROR: Parameter 4 was incorrect on entry to DSTEIN2

Hi,

Some users are seeing the following error messages when running VASP built against Intel 2018.0.3:

Intel MKL ERROR: Parameter 4 was incorrect on entry to DSTEIN2.
Intel MKL ERROR: Parameter 6 was incorrect on entry to DSTEIN2

The behavior appeared as the MPI rank count increased, and https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17897.html also suggests that this error might be expected once the rank count exceeds some level.

Is this expected behavior from MKL, and if so, what is the guidance on choosing the MPI rank count relative to matrix properties (size?)?

Thanks

Performance characteristics of cblas_gemm_s16s16s32

Hi,

I'm interested in getting more details on the performance characteristics of the function cblas_gemm_s16s16s32. In my application, the performance gain over cblas_sgemm is lower than I had hoped.

Here is my test configuration, which is larger than what would typically be used in my application (a seq2seq model):

CblasColMajor

M = 1024
K = 512
N = 2048

TRANS_A = FALSE
TRANS_B = TRUE

And here are some single-threaded results on an Intel(R) Core(TM) i7-6700K (AVX2), averaged over 1000 samples:

* cblas_sgemm: 17.7135 ms
* cblas_gemm_s16s16s32: 15.5617 ms
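To put the two timings on a common scale, they can be converted into throughput by counting 2*M*N*K operations per GEMM, the usual convention. A small C sketch with hypothetical helper names:

```c
#include <assert.h>

/* Operation count of a GEMM C(MxN) = A(MxK) * B(KxN), counted as
   2*M*N*K (one multiply plus one add per term). */
double gemm_ops(long m, long n, long k) {
    assert(m > 0 && n > 0 && k > 0);
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Throughput in giga-operations per second for a run lasting ms milliseconds. */
double gops(double ops, double ms) {
    assert(ms > 0.0);
    return ops / (ms * 1.0e-3) / 1.0e9;
}
```

With M = 1024, N = 2048 and K = 512 this is 2,147,483,648 operations per call, so 17.7135 ms corresponds to about 121 GFLOP/s for cblas_sgemm and 15.5617 ms to about 138 GOP/s for cblas_gemm_s16s16s32, a ratio of roughly 1.14. For comparison, and assuming two 256-bit FMA units at about 4 GHz, the single-precision peak of one such core is roughly 128 GFLOP/s, so the sgemm figure is already close to that bound.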

Are these values expected? Do I need to do something specific to get more performance out of cblas_gemm_s16s16s32?

Thanks,

Guillaume

Random Number Generator

Hi,

I have developed a function to generate a vector of random numbers. However, there are two problems:

1. When the number of random numbers to be generated is too large (e.g. 50000^2), it outputs: MKL ERROR: Parameter 3 was incorrect on entry to vdRngUniform.

2. When the left bound is a positive number (e.g. a = 300, b = 100), the function violates the left-bound condition and returns a minimum value of zero! However, if 'a' is a negative number, it works correctly!

Can you please help? Thank you.

Afshin

 

#include <stdlib.h>
#include <time.h>
#include "mkl_vsl.h"

int RNG_UNIF ( long int N, double a, double b, double *P )
{
    // N number of random values to be generated
    // a is the left bound
    // b is the right bound

    VSLStreamStatePtr stream;
    int errcode = 0;
    srand(time(0));
    unsigned int seed = (unsigned int) rand();

    /***** Initialize *****/
    errcode = vslNewStream( &stream, VSL_BRNG_MT2203, seed );
    if (errcode != VSL_STATUS_OK) return errcode;

    /***** Call RNG *****/
    // N is passed as MKL_INT (a 32-bit int with the LP64 interface)
    errcode = vdRngUniform( VSL_RNG_METHOD_UNIFORM_STD_ACCURATE, stream, N, P, a, b );

    /***** Release the stream even if generation failed *****/
    vslDeleteStream( &stream );
    return errcode;
}
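Two observations that may explain the symptoms, offered as a hedged reading of the MKL documentation rather than a confirmed diagnosis. First, the third argument of vdRngUniform is the count n, of type MKL_INT; with the LP64 interface MKL_INT is a 32-bit int, and 50000^2 = 2,500,000,000 exceeds INT_MAX = 2,147,483,647, which matches the "Parameter 3" message. Second, the documentation describes the output interval as [a, b) and requires a < b, so a = 300, b = 100 lies outside the supported domain. The sketch below, with hypothetical helper names, splits a large request into chunks that each fit in a 32-bit count and checks the bounds before any MKL call; the MKL loop itself appears only as a comment so the block compiles without MKL.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helpers: split `total` values into chunks whose sizes fit in
   a 32-bit MKL_INT (LP64 interface assumed), so that each chunk can be the
   n argument of one vdRngUniform call on the same stream. */
long long chunk_count(long long total, int max_chunk) {
    assert(total >= 0 && max_chunk > 0);
    return (total + max_chunk - 1) / max_chunk;
}

/* Size of chunk k (0-based); only the last chunk can be shorter. */
int chunk_size(long long total, int max_chunk, long long k) {
    assert(k >= 0 && k < chunk_count(total, max_chunk));
    long long left = total - k * (long long)max_chunk;
    return left > max_chunk ? max_chunk : (int)left;
}

/* The documented domain of vdRngUniform is a < b. */
int uniform_bounds_ok(double a, double b) { return a < b; }

/* Intended use with MKL (not compiled in this sketch):
 *   for (long long k = 0; k < chunk_count(total, INT_MAX); ++k)
 *       status = vdRngUniform(VSL_RNG_METHOD_UNIFORM_STD_ACCURATE, stream,
 *                             chunk_size(total, INT_MAX, k),
 *                             P + k * (long long)INT_MAX, a, b);
 */
```

Keeping the chunks on one stream avoids re-seeding between calls; the pointer P itself must of course point to a buffer of at least total doubles.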

 

Package script bug: link_install.sh and tr -s [:blank:]

Having installed the package intel-comp-l-all-vars-19.0.1-144 via apt on Ubuntu, I've run into a problem with the script at /opt/intel/compilers_and_libraries_2019.1.144/linux/bin/link_install.sh which is run as part of the installation process.  On my system this script fails and prevents apt from completing.

This is because it contains a few instances of lines like:

str=$(echo $str | tr -s [:blank:] | sed 's/^ *//g')

The problem here is the

[:blank:]

On most systems it will do the right thing, but square brackets actually denote a bash glob. The unquoted word [:blank:] matches any single-character file name made of one of the characters :, b, l, a, n or k. So if such files happen to exist (in your root directory, from where apt runs the script), bash will substitute their names in place of [:blank:] and tr will get the wrong arguments.
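The glob behaviour can be reproduced with the POSIX fnmatch() function, which follows the same pattern-matching notation that the shell uses for pathname expansion. A small C sketch (helper names are hypothetical; shell options such as nullglob or failglob, which change what happens when nothing matches, are not modelled):

```c
#include <assert.h>
#include <fnmatch.h>

/* Unquoted, the word [:blank:] is a bracket expression: it matches exactly
   ONE character from the set { ':', 'b', 'l', 'a', 'n', 'k' }. */
int unquoted_word_matches(const char *file_name) {
    assert(file_name != 0);
    return fnmatch("[:blank:]", file_name, 0) == 0;
}

/* Quoted, "[:blank:]" reaches tr intact, where it names the POSIX class of
   blanks (space and tab). The glob spelling of that class is [[:blank:]]. */
int is_posix_blank(const char *one_char) {
    assert(one_char != 0);
    return fnmatch("[[:blank:]]", one_char, 0) == 0;
}
```

With files /a and /n present, both names match the unquoted word, so bash hands tr the arguments a and n instead of [:blank:]. Quoting the word keeps it away from pathname expansion, so tr receives the class name it expects.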

I happen to have both a /a and a /n on my systems, so what gets run is "tr -s a n", which replaces every "a" with "n" and squeezes repeated "n"s... definitely not what the author intended! The symptom is lines such as

/opt/intel/compilers_and_libraries_2019.1.144/linux/bin/link_install.sh: line 565: =/opt/intel/compilers_nd_librnries: No such file or directory

output by apt (or dpkg) before the installation aborts.

Simple fix: put every instance of [:blank:] in quotes: "[:blank:]".

mkl/blas routine for C=AA'B

Hi there,

Is there any MKL/BLAS function that performs the operation C=AA'B in one go? Currently I use an intermediate array T and dgemm: T=A'B; C=AT. I am wondering whether there is a more efficient way, since A is always the same matrix.
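Since A never changes, one option worth weighing (an assumption, not something tested in this post) is to form the symmetric matrix G = AA' once, for example with dsyrk, and then compute each C = GB with a single dsymm call. Whether that pays off depends on the shapes. With A of size m x k and B of size m x n (hypothetical names), a C sketch of the per-product multiply-add counts:

```c
#include <assert.h>

/* Multiply-adds for C = A*(A'*B) done as T = A'*B (k x n, cost k*m*n)
   followed by C = A*T (m x n, cost m*k*n); A is m x k, B is m x n. */
long long madds_two_gemms(long long m, long long k, long long n) {
    assert(m > 0 && k > 0 && n > 0);
    return 2 * m * k * n;
}

/* Multiply-adds per product once G = A*A' (m x m) has been precomputed:
   C = G*B costs m*m*n. */
long long madds_with_gram(long long m, long long n) {
    assert(m > 0 && n > 0);
    return m * m * n;
}

/* Precomputing G saves arithmetic per product exactly when m*m*n < 2*m*k*n,
   i.e. when m < 2k. */
int gram_is_cheaper(long long m, long long k, long long n) {
    return madds_with_gram(m, n) < madds_two_gemms(m, k, n);
}
```

Ignoring the one-time cost of forming G (roughly m*m*k/2 multiply-adds with dsyrk), the precomputed route does less arithmetic per product only when m < 2k; for a tall A with m much larger than k, the two-GEMM approach stays cheaper, although memory traffic can shift the balance in practice.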

Thanks.

eigenvalues and eigenvectors

I'm working on quantum structures, so I need to calculate eigenvalues and eigenvectors at very high resolution, and I want to work with sizes such as n = 10000 and above. The MKL library that I use is not well suited to this; can you suggest an alternative program? My laptop has 4 cores at 3.6 GHz and 12 GB RAM, and I also have a host computer with 200 cores and 120 GB RAM.

My code:

c       use imsl
      INCLUDE 'LINK_FNL_STATIC.H'
      USE EVESB_INT
C      USE EVESF_INT
c      USE EPISF_INT
c      USE EPISB_INT
C      USE CSDER_INT 
C      USE CSINT_INT 
 
         IMPLICIT NONE
      INTEGER I,II,J,K,L,M,N,NDATA,NINTV,LDA,LDEVEC,NCODA,NEVAL,
     $NEVEC,INT_TIME,IK,MXEVAL
      PARAMETER (M=101,N=M*M,NCODA=M,NDATA=N,LDA=N,LDEVEC=N
     $,NEVEC=4,MXEVAL=4)
C=======================================================================        
      REAL*8 A(LDA,N),ALPHA,AALPHA,PI,BREAK(NDATA),AA,BB,OTOP1U,SEBIN,
     $XX,YY,ZZ,EPSILON,LAMDA,DLAMDA,RRO,VB(N),PSO(N),VDC(M,M),C,VO,
     $VVO,DU,RO1RO2,RO3,RO4,A1,A2,A3,A4,B1,B2,B3,B4,PSU(M,M),
     $PSIN(M,M),PSF(M,M),DRO,TOP,RO,EM,VL(N),DZ,AYIL,RYIL,F,ETA,B,GAMA,
     $RR,U,H,XI,YI,KZZ,INTEN,HPLANCK,VVVO,DX,DY,X(M),Y(M),MY,RRIC,RIC,KZ
     $,P,EO,EB,EIK,EUC,VM(N),TOPKISI,KISIBIR,KISIUC,EPS,FXSU,FXOU,OTOPXU
     $,TOPXU,BETA1,BETA3,TOPBETA,TOPYU,M12,HW,T,EF,EIN,BETA3U,BETA3A,BET
     $A1U,BETA1A,E,FXOA1,FXSA1,FXOA2,FXSA2,NR,R,OTOPXA1,OTOPXA2,TOPXA1,T
     $OPXA2,TOPYA1,TOPYA2,TOPSON,EPSO,EVAL(NEVEC),EVEC(LDEVEC,NEVEC),F1
     $U,TOP1U,F1SU,OSI,VS(N),EYUKU,SIGMA,TZAMAN,CISIK,KISIBIRA,KISIBIR
     $U,KISIUCU,KISIUCA,RR1,R1,RR2,R2,TTB,TB,PII,XLAMDA,DXLAMDA,SAY,OTOP
     $XA,TOPYA,OEBIN,FK0,FK1,TOPXA,FXSA,ATA,FXOA,PS,RO1,RO2,FXSA3,
     $TOPXA3,OTOPXA3,OTOP1OU,OTOP2OU,F2U,F2SU,OTOP2U,M11,M22,KS1U,
     $KS31A,KS31U,KS32U,KS33U,KS34U,KS32A,KS33A,TOPKS,TOP2U,BETA31U,
     $BETA32U,BETA33U,BETA31A,KS1A,KS1,KS3
      REAL*8 OTOPYU,OTOPYA,TOPZU,TOPZA,Z,LL,LA,OTOPYA3,TOPYA3,FXOA3
C=======================================================================    
      LOGICAL SMALL
      CHARACTER*8 CHAR_TIME
      CALL TIME(CHAR_TIME)
      WRITE(*,*)'TIME1=', CHAR_TIME
      PI=4.0D0*DATAN(1.0D0)
C=======================================================================
C      OPEN(1,FILE='ALGAAS k  L60.DAT')
c    OPEN(2,FILE='ALGAAS s-PISI_L20.DAT')
c      OPEN(3,FILE='ALGAAS silindir A-E.DAT')
c    OPEN(4,FILE='ALGAAS PISILER DELTOID00t70.DAT')
c    OPEN(5,FILE='AlGaAs  r2 B VE ENERJI M1 KZ2.DAT')
C    OPEN(6,FILE='GAALAS IC BARIYER VE TABAN ENERJI.DAT')
c    OPEN(7,FILE='GAALAS  BAGLANMA ENERJI-TBB  LAZER 00KARE XIYI00.DAT')
      OPEN(8,FILE='AlGaAs 1-2 BETA INTEN03 ALFA 60 k.DAT')
      OPEN(9,FILE='AlGaAs 1-2 K_INDEX INTEN03 ALFA 60 k.DAT')

C%%%%%%%%%%%%%%%%%%%%%%%%%%%% CONSTANTS %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      INTEN=0.300000E10     !MEGAWATT/CM^2 ----> CONVERTED TO PER SQUARE METRE
      AALPHA=60.0D0 !LASER AMPLITUDE (ANGSTROM)
      LL=105.0
C=================================ALGAAS===================================
      MY=0.06650D0
      EPSILON=10.9      !     13.18  10.90D0 
C****************** ALGAN************************************************
C      MY=0.13D0
C      EPSILON=9.7      !HIGH-FREQUENCY 5.3, STATIC 9.7 
C===================GaINAS ===============================================
c      MY=0.023+0.037*0.3+0.003*(0.3)**2
C      EPSILON=15.1-2.87*0.3+0.67*(0.3)**2!STATIC
c       EPSILON=12.3-1.4*0.3 !HIGH-FREQUENCY
C==========================================================================      
      RYIL=(13605.698110D0*MY/(EPSILON**2))
      AYIL=0.52917724820D0*EPSILON/MY

C==================POTENTIAL============================================
c      VO=228000000000000.0D0/RYIL  !FOR ALGAAS
      VO=228.0/RYIL !CAUTION
C      VO=345.0D0/RYIL  !FOR ALGAN
c      VO=227.70D0/RYIL !FOR GAINAS
c      DO 5555 TTB=5.0,200,5.0
c    DO 5555 AALPHA=0,100,5.0
c      DO 5555 B=0.0D0,20.0D0,1.0D0 
C     DO 5555 RRIC=50.0,150.0,5.0
C     DO 5555 VVVO=100.0,300.0,10.0
      
      F=00.0D0      !ELECTRIC FIELD STRENGTH (KV/CM)
      B=0.0D0       !MAGNETIC FIELD STRENGTH (TESLA)

      XI=0.0000001D0      !IMPURITY POSITION
      YI=0.0000001D0      !IMPURITY POSITION
      RR=220.0012345670D0    !OUTER WIDTH
      RRIC=50.00D0   !INNER WELL WIDTH
      TTB=50.0
C======================= OPTICAL TRANSITION COEFFICIENTS ======================
      EYUKU=1.60217733E-19!DSQRT(2.0D0) !COULOMB
      SIGMA=3.0E22        !M^-3, CARRIER DENSITY


      TZAMAN=5E12         !PICOSECONDS CONVERTED TO SECONDS, AS A MULTIPLIER
C      TZAMAN=(1.0/1.5)*1E12         !Algan
      CISIK=2.99792458E8  !METRE/SECOND
      EPSO=8.854187817E-12!C^2/(NEWTON.METRE^2)
      NR=3.2!DSQRT(EPSILON)
      HPLANCK=1.05457266E-34!J.SECOND
C******************  CONVERSIONS  **************************************
C**********************************************************************
      ALPHA=AALPHA/AYIL
      LA=LL/AYIL
      RIC=RRIC/AYIL    
      R=RR/AYIL
      RR1=40.0
      RR2=150.0
      R1=RR1/AYIL
      R2=RR2/AYIL
      TB=TTB/AYIL
C     VO=VVVO/RYIL
      KZ=0.0D0/(Ric)!WAVE NUMBER 
      EM=0.0D0      !AZIMUTHAL MAGNETIC FIELD
C======================================================================
      ETA=0.010D0*AYIL*F/RYIL
      GAMA=4.254381195E-6*EPSILON*EPSILON*B/(MY*MY)
      DX=(2.0D0*R)/REAL(M-1)
      DY=(2.0D0*R)/REAL(M-1)    
         DZ=(2.0D0*R)/REAL(M-1)    
      DRO=R/REAL(M-1)
      AA=4.0D0/(DX*DX)
      BB=-1.0D0/(DX*DX)
C***********************************************************************
      PRINT*,'EPSILON:',EPSILON,'LASER IS ACTIVE IF EPSILON=10.89'
      PRINT*,'RYIL:',RYIL
      PRINT*,'MY:',MY
      PRINT*,'AALPHA',AALPHA,'ANGSTROM' 
      PRINT*,'B=',B
      PRINT*,'M=',EM
      PRINT*,'KZ=',KZ
      PRINT*,'RRIC',RRIC
      PRINT*,'GAMMA=',GAMA
      PRINT*,'VO=',VO*RYIL
      PRINT*,'INTEN=',INTEN
      
C************** AZIMUTHAL MAGNETIC FIELD  *******************************
      II=1 
                DO K=1,M
              X(K)=-R+REAL(K-1)*DX
      IF(ABS(X(K)).LE.0.000000001) GO TO 32
32              DO L=1,M  
                Y(L)=-R+REAL(L-1)*DY
      IF(ABS(Y(L)).LE.0.0000000001) GOTO 33
                RO=DSQRT(X(K)*X(K)+Y(L)*Y(L))
                IF(RO.LT.RIC)THEN
      VB(II) =((EM*EM/(RO*RO))+KZ*KZ-(GAMA*RO*RO*KZ/RIC)
     $+(GAMA*GAMA*(RO**4)/(4*(RIC**2))))
      VS(II)=0.0!0.25*GAMA*GAMA*RO*RO
                ELSE 
      VB(II)=0.0!(EM*EM/(RO*RO))+KZ*KZ+2*KZ*GAMA*RIC*LOG(RIC/RO)+
C     $GAMA*GAMA*RO*RO*(LOG(RIC/RO)**2)
      VS(II)=0.0!0.25*GAMA*GAMA*RO*RO
                END IF
33      II=II+1  
                   END DO 
                 END DO   
C********************* APPLYING LASER DRESSING ******************************
      PRINT*,'APPLYING LASER DRESSING'
      II=1
      DO 999 L=1,M
        YY=Y(L)
             DO 888 K=1,M
           XX=X(K)
C=======================================================================      
             TOP=0.0D0
             DU=0.0010D0      
      DO U=0.0D0,2.0D0*PI,DU  
          TOPSON=VVO(XX+ALPHA*DSIN(U),YY,RYIL,AYIL,R1,R2,TB,RIC,LA,VO)
             TOP=TOP+(TOPSON)*DU  
      END DO 
             TOP=TOP/(2.0D0*PI)
C=======================================================
      VDC(K,L)=TOP
         VL(II)=VDC(K,L)
      VM(II)=VL(II)+VB(II)+VS(II)
       II=II+1
      WRITE(1,19)X(K)*AYIL,Y(L)*AYIL,VDC(K,L)*RYIL
888    CONTINUE 
999    CONTINUE
      PRINT*, 'WELL DEFINED', 'VB'

19      FORMAT(3(2X,F14.8))
18      FORMAT(5(2X,F14.8))
177     FORMAT(4(2X,F14.8))
C*********************  MATRIX *****************************************
      A=0.0D0                      !A MATRIX, BAND STORAGE (2M+1,N=M*M)
      DO L=M+1,M*M       
      A(1,L)=BB             
      ENDDO

            DO L=2,M*M       
            A(M,L)=BB             
            ENDDO
                DO L=M+1,M*M-1,M       
                   A(M,L)=0.0D0             
                    ENDDO

                DO L=1,M*M       
                A(M+1,L)=AA+Vm(L)             
                ENDDO
C                        DO L=1,M*M-1       
C                        A(M+2,L)=BB             
C                        END DO
C                                DO L=1,M*M-M       
C                                A(2*M+1,L)=BB             
C                                ENDDO
C***********  PRINTING THE MATRIX TO THE SCREEN *******************************
c     DO 40 K=1,M+1 
c      WRITE(*,'(1X,6(F6.1))') (A(K,L),L=1,m)
c      WRITE(1,'(1X,6(F6.1))') (A(K,L),L=1,m)
c40    CONTINUE
c      PAUSE
c    STOP
C***********************************************************************
      SMALL =.TRUE.
      CALL DEVESB(N,NEVEC,A,LDA,NCODA,SMALL,EVAL,EVEC,LDEVEC)
C      CALL DEVESF (N, NEVEC, A, LDA, SMALL, EVAL, EVEC, LDEVEC)
C      PRINT*,'PERFORMANCE INDEX=',PII
C      PII=EPISB(NEVEC,A,NCODA,EVAL,EVEC)
c      PII= EPISF(NEVEC,A,EVAL,EVEC)
      CALL TIME(CHAR_TIME)
      WRITE(*,*)'TIME2=', CHAR_TIME
            
      EO=(EVAL(NEVEC))*RYIL       !GROUND STATE IN MEV
      EB=(EVAL(NEVEC-1))*RYIL     !1ST EXCITED STATE IN MEV
      EIK=(EVAL(NEVEC-2))*RYIL    !2ND EXCITED STATE IN MEV
      EUC=(EVAL(NEVEC-3))*RYIL    !3RD EXCITED STATE IN MEV
C=======================================================================
      II=1
      DO L=1,M
        DO K=1,M
          PSIN(K,L)=EVEC(II,NEVEC)*AYIL*1E-10 !WAVE FUNCTIONS IN METRES 
          PSF(K,L)=EVEC(II,NEVEC-1)*AYIL*1E-10 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
            PSU(K,L)=EVEC(II,NEVEC-2)*AYIL*1E-10
            PSO(II)=EVEC(II,NEVEC)
      WRITE(4,511)X(K)*AYIL,Y(L)*AYIL,PSIN(K,L)/1E-10,PSF(K,L)/1E-10,
     $EVEC(II,NEVEC-2)*AYIL*1E-10/1E-10
          II=II+1
          ENDDO
      ENDDO
      PRINT*, 'EO=',EO,'MEV'
      PRINT*, 'E1=',EB,'MEV'
      PRINT*, 'E2=',EIK,'MEV'
      PRINT*, 'E3=',EUC,'MEV'
      WRITE(3,188)AALPHA,EO,EB,EIK,EUC
c      WRITE(5,188)B,EO,EB,EIK,EUC
c       WRITE(5,*)B,EO
C     WRITE(6,*)RRIC,EO
      EF=EB
      EIN=EO
   
C*********************   M12 CALCULATION   ********************************
       OTOPXU=0.0D0
       OTOP1OU=0.0D0
       OTOP2OU=0.0D0
      OTOPXA1=0.0D0
      OTOPXA2=0.0D0
        OTOPXA3=0.0
            TOPYU=0.0D0
             TOP1U=0.0D0
              TOP2U=0.0D0
            TOPYA1=0.0D0
            TOPYA2=0.0D0
              TOPYA3=0.0
      DO 400 L=1,M
      YY=Y(L)
C===================X PART OF THE INTEGRAL STARTS======================
          FXOU=0.0D0
           F1U=0.0D0
           F2U=0.0D0
           
          FXOA1=0.0D0
          FXOA2=0.0D0
            FXOA3=0.0
              TOPXU=0.0D0
                TOP1U=0.0D0
                TOP2U=0.0D0
              TOPXA1=0.0D0
               TOPXA2=0.0D0
               TOPXA3=0.0D0
         II=1
               RO=DSQRT(XX*XX+YY*YY)
               
       DO 300 K=1,M
      XX=X(K)*AYIL*1E-10 !CONVERTED TO METRES 
       yy=y(L)*AYIL*1E-10
      FXSU=PSIN(K,L)*xx*PSF(K,L) !FOR X POLARIZATION USE XX, FOR Y USE YY 
      F1SU=PSIN(K,L)*xx*PSIN(K,L)
      F2SU=PSF(K,L)*xx*PSF(K,L)
       
      FXSA1=PSIN(K,L)*PSIN(K,L)
      FXSA2=PSF(K,L)*PSF(K,L)
       FXSA3=PSU(K,L)*PSU(K,L)
       
         TOPXU=TOPXU+(FXSU+FXOU)*(DX)/2.0D0
          TOP1U=TOP1U+(F1SU+F1U)*DX/2.0D0
          TOP2U=TOP2U+(F2SU+F2U)*DX/2.0D0
            TOPXA1=TOPXA1+(FXSA1+FXOA1)*(DX)/2.0D0
            TOPXA2=TOPXA2+(FXSA2+FXOA2)*(DX)/2.0D0
              TOPXA3=TOPXA3+(FXSA3+FXOA3)*(DX)/2.0D0
                FXOU=FXSU
                F1U=F1SU
                 F2U=F2SU
                    FXOA1=FXSA1
                    FXOA2=FXSA2
                      FXOA3=FXSA3
300    CONTINUE
C==================== X PART FINISHED ==============================
          TOPYU=TOPYU+(OTOPXU+TOPXU)*(DY)/2.0D0
          TOP1U=TOP1U+(OTOP1U+TOP1U)*DY/2.0D0
           TOP2U=TOP2U+(OTOP2U+TOP2U)*DY/2.0D0
         TOPYA1=TOPYA1+(OTOPXA1+TOPXA1)*(DY)/2.0D0
         TOPYA2=TOPYA2+(OTOPXA2+TOPXA2)*(DY)/2.0D0
           TOPYA3=TOPYA3+(OTOPXA3+TOPXA3)*(DY)/2.0D0
            OTOPXU=TOPXU
            OTOP1U=TOP1U
            OTOP2U=TOP2U
               OTOPXA1=TOPXA1
               OTOPXA2=TOPXA2
                 OTOPXA3=TOPXA3
400    CONTINUE
C%%%%%%%%%%%%%%% NORMALIZE PSI %%%%%%%%%%%%%%%%%%%%%%%%%%%%
      DO K=1,M
      DO L=1,M
      WRITE(2,17)X(K)*AYIL,Y(L)*AYIL,(PSIN(K,L)/DSQRT(TOPYA1))**2,
     $(PSF(K,L)/DSQRT(TOPYA2))**2,(PSU(K,L)/DSQRT(TOPYA3))**2,VDC(K,L)
     $*RYIL    
      ENDDO
      ENDDO
      PRINT*, 'TOPYA3',TOPYA3
      PRINT*, 'TOPYA2',TOPYA2
      PRINT*, 'TOPYA1',TOPYA1
      M12=(TOPYU*EYUKU)/(DSQRT(TOPYA1)*DSQRT(TOPYA2))
      M11=(TOPYU*EYUKU)/(DSQRT(TOPYA1)*DSQRT(TOPYA1))
      M22=(TOPYU*EYUKU)/(DSQRT(TOPYA2)*DSQRT(TOPYA2))
     
c      OSI=(2*MY*9.1093897E-31/(HPLANCK**2))*((EB-EO)*1.6021773E-22)*
c     $(((TOPROU)/(DSQRT((TOPYA1))*DSQRT((TOPYA2))))**2)
c    PRINT*, 'M12=',M12
c      WRITE(13,*)B,OSI
C     WRITE(10,*)B,VS(II)
C     PRINT *, 'B=',B, OSI
C     PAUSE
C      STOP
C//////////////////////   OPTICAL TRANSITION   ///////////////////////////////
C########### 1ST- AND 3RD-ORDER ABSORPTION COEFFICIENTS ####################

C######################################################################
      DO 4444 HW=0.0D0,250.0D0,1.0D0 
      BETA1U=SIGMA*(HW*1.6021773E-22)*(M12*M12)*TZAMAN
      BETA1A=CISIK*EPSO*NR*(((((EF-EIN-HW)*1.6021773E-22)**2))+
     $(HPLANCK*TZAMAN)**2)
      

           BETA1=BETA1U/BETA1A
   
      BETA3U=INTEN*2.0*SIGMA*((M12)**4)*(HW*1.6021773E-22)*TZAMAN
      BETA31U=DABS((M22-M11)/(2.0*M12))**2
      BETA32U=(((EF-EIN-HW)*1.6021773E-22)**2)-(HPLANCK*TZAMAN)**2
      BETA33U=2*((EF-EIN)*1.6021773E-22)*((EF-EIN-HW)*1.6021773E-22)
     
      BETA3A=CISIK*CISIK*EPSO*EPSO*NR*NR*((((((EF-EIN-HW)*1.6021773E-22)
     $)**2)+(HPLANCK*TZAMAN)**2)**2)
      BETA31A=((EF-EIN)*1.6021773E-22)**2+(HPLANCK*TZAMAN)**2
      
      
      BETA3=-(BETA3U/BETA3A)*(1-(((BETA31U)*(BETA32U+BETA33U))/BETA31A))
    
      TOPBETA= BETA1+BETA3
      
C***************************************************************
C*************** DIRECT ABSORPTION COEFFICIENT ********************
      KS1U=SIGMA*((EF-EIN-HW)*1.6021773E-22)*((M12)**2)
      KS1A=2.0*NR*NR*EPSO*((((EF-EIN-HW)*1.6021773E-22)**2
     $)+(HPLANCK*TZAMAN)**2)
     
        KS1=KS1U/KS1A
      
      KS31U=INTEN*SIGMA*((EF-EIN-HW)*1.6021773E-22)*((M12)**4)
      KS31A=NR*NR*NR*EPSO*EPSO*CISIK*((((EF-EIN-HW)*1.6021773E-22)**2
     $+(HPLANCK*TZAMAN)**2)**2)
      
      KS32U=DABS((M22-M11)/(2.0*M12))**2
      KS33U=((EF-EIN)*1.6021773E-22)*((EF-EIN-HW)*1.6021773E-22)**2
      KS34U=((HPLANCK*TZAMAN)**2)*(3*((EF-EIN)*1.6021773E-22)-2*
     $(HW*1.6021773E-22))
      KS32A=((EF-EIN)*1.6021773E-22)**2+(HPLANCK*TZAMAN)**2
      KS33A=(EF-EIN-HW)*1.6021773E-22
      
      KS3=-(KS31U/KS31A)*(1-KS32U*((KS33U-KS34U)/(KS32A*KS33A)))
      
      TOPKS=KS1+KS3

      
c    PRINT*,'TOPKISI',TOPKS     
      WRITE(8,51)HW,BETA1/1e4,BETA3/1e4,TOPBETA/1e4
      WRITE(9,51)HW,Ks1,Ks3,TOPKS
c      WRITE(12,*)INTEN/1E7,TOPBETA/1E2
4444  CONTINUE !PHOTON ENERGY (HW) LOOP, ALPHA LOOP
C################## IMPURITY #########################################
       
C        XLAMDA=0.1
C       DXLAMDA=0.010D0
C       SAY=0.
C       OEBIN=-1.0D30
C150      CONTINUE       
C=====================================================Z part of the integral starts==========
C     OTOPYU=0.0D0 
C        OTOPYA=0.0D0
C
C        TOPZU=0.0D0
C        TOPZA=0.0D0 
C        DZ=0.10D0

C        DO 500 Z=-R,R,DZ
C        IF(ABS(Z).LE.0.000000010D0)GOTO 500
C
C
C

C=====================================================y part of the integral starts==========
C        OTOPXU=0.0D0 
C        OTOPXA=0.0D0
C        TOPYU=0.0D0
C     TOPYA=0.0D0 
C          II=1
C       
C          DO 400 J=1,M       
C          YY=Y(J)
C====================================================x part of the integral starts===========
C          FXOU=0.
C          FXOA=0.
C          TOPXU=0.
C          TOPXA=0.
C           
C
C     DO 300 I=1,M
C     XX=X(I)
C            RO1=DSQRT((XX-XI+ALPHA)**2+(YY-YI)**2+Z*Z)
C              RO2=DSQRT((XX-XI-ALPHA)**2+(YY-YI)**2+Z*Z)
C            PS=EVEC(II,NEVEC)*DEXP(-DABS(RO1)+DABS(RO2)/(2.0D0)*XLAMDA)

C              ATA=((1.0D0/RO1)+(1.0D0/RO2))/2.0D0
C              FXSU=(PS*ATA*PS)
C            FXSA=(PS*PS)
    
C               TOPXU=TOPXU+(FXSU+FXOU)*DX/2.0D0
C                 TOPXA=TOPXA+(FXSA+FXOA)*DX/2.0D0

C     II=II+1
    
C          FXOU=FXSU
C          FXOA=FXSA
C            WRITE(*,*)'RO1',RO1
C            WRITE(*,*)'RO2',RO2
C            WRITE(*,*)'PS',PS
C            WRITE(*,*)'ata',ata
C         WRITE(*,301)Z,XX,YY,TOPXA,TOPXU
C300    CONTINUE
C301    FORMAT(5(2X,F10.6))       
C=================================================== x part finished=========================

C           TOPYU=TOPYU+(OTOPXU+TOPXU)*DY/2.0D0
C         TOPYA=TOPYA+(OTOPXA+TOPXA)*DY/2.0D0
      
C         OTOPXU=TOPXU
C         OTOPXA=TOPXA   

C400    CONTINUE
C=================================================== y part finished=========================
                 

C           TOPZU=TOPZU+(OTOPYU+TOPYU)*DZ/2.0D0
C         TOPZA=TOPZA+(OTOPYA+TOPYA)*DZ/2.0D0
      
C         OTOPYU=TOPYU
C         OTOPYA=TOPYA   

C500    CONTINUE
C=================================================== Z part finished=========================

C        SEBIN=-(1.0D0/XLAMDA**2.)+2.0D0*(TOPZU/TOPZA) !binding energy
C       WRITE(*,*)XLAMDA,SEBIN,SAY
           
C       PAUSE
C       STOP

C====================================refining the binding energy=======
C       IF(SEBIN.LT.OEBIN)THEN               
C                IF(SAY.GT.5)GO TO 250
C                DXLAMDA=-DXLAMDA/5.0D0
C                SAY=SAY+1
C                ENDIF

C          XLAMDA=XLAMDA+DXLAMDA
C        OEBIN=SEBIN     
     
C     GO TO 150 
                         
C250    CONTINUE    
C===========================================binding energy found more precisely========= 
C========================================
C            CALL TIME(char_time)
C            WRITE(*,*)'TIME3=', char_time        
C            WRITE(7,*)ttb,SEBIN*RYIL
C            WRITE(*,*)TTB,SEBIN*RYIL

C========================================  
     
C700    CONTINUE

51     FORMAT(4(1X,F15.11))
511    FORMAT(5(2X,F25.19))
17     FORMAT(6(2X,F20.14))
16     FORMAT(3(2X,F14.8))
188    FORMAT(5(2X,F14.8))
C      PAUSE
C      STOP
c5555  CONTINUE
      PAUSE
      STOP
      END
C=======================================================
C============================ FUNCTIONS ================
C=======================================================
C=======================================================
      FUNCTION VVO(XX,YY,RYIL,AYIL,R1,R2,TB,RIC,LA,VO)
      IMPLICIT REAL*8 (A-H,O-Z)
      REAL*8 LA
c==================== with deltoid barrier ==================
c      rdis=150.0/ayil
c      if (abs(xx).ge.(abs(rdis)-abs(yy)))vvo=vo
c      if(abs(xx).lt.(abs(rdis)-abs(yy)).and.abs(xx).gt.(abs(ric+tb)-
c     $abs(yy)))vvo=0.0
c      if(abs(xx).le.(abs(ric+tb)-abs(yy)).and.abs(xx).ge.(abs(ric)-
c     $abs(yy)))vvo=vo
c      if(abs(xx).lt.(abs(ric)-abs(yy)))VVO=0.0

C============== SQUARE =====================================
      IF (ABS(XX).Le.(LA/2.0).AND.ABS(YY).Le.(LA/2.0))THEN
      VVO=0.0
      ELSE
      VVO=VO
      END IF
    
      
C******************** TRIANGLE*****************************************
c        IF(ABS(XX).LT.(100/AYIL))THEN
c          IF(ABS(yy).LE.ABS((50./AYIL)+(xx/2.0)))THEN
c             VVO=0.0
c          ELSE
c             VVO=VO
c          END IF
c       ELSE
c         VVO=VO
c       END IF

c////////////////////DELTOID/////////////////////////////////////////
c      IF(ABS(YY).LT.(LA/SQRT(2.0)))THEN
c         IF(ABS(XX).GT.ABS((LA/SQRT(2.0))-ABS(YY)))THEN
c      VVO=VO
c          ELSE
c          VVO=0.0
c          END IF
c       ELSE
c       VVO=VO
c       END IF
            
C333333333333333333333333333333 DOUBLE SQUARE 333333333333333333333333333333333333333      
C      IF(ABS(YY).LT.(75./AYIL))THEN
C         IF(ABS(XX).LT.(150.0/AYIL).AND.ABS(XX).GT.(50./AYIL))THEN
C          VVO=0.0
C         ELSE
C         VVO=VO
C         END IF
C      ELSE
C      VVO=VO
C      ENDIF
      
      
      
C22222222222222222222222222222222 CYLINDER WITH BARRIER2222222222222222222222222
C    RO=SQRT((XX)**2+(YY)**2)
C      IF(RO.GE.R2) VVO=VO
C      IF(RO.LT.R2.AND.RO.GT.(R1+TB))VVO=0.0
C      IF(RO.LE.(R1+TB).AND.RO.GE.R1)VVO=VO     
C      IF(RO.LT.R1) VVO=0.0
C111111111111111111111 SINGLE CYLINDER 1111111111111111111111111111111    
c         RO=SQRT((XX)**2+(YY)**2)
c       IF(RO.GT.(LA/2.0))THEN
c               VVO=VO
c         ELSE
c              VVO=0.0D0
c         ENDIF
C000000000000000000000   PARABOLIC  00000000000000000000000000000
C    IF(RRO.GE.RIC)THEN
C                      VVO=VO
C                  ELSE
C                      VVO=VO*((1/RIC)**2)*RRO*RRO
C                  ENDIF
C0000000000000000000 HONEYCOMB 00000000000000000000000000000000000000000000
C      VVO=VO
C      A1=150.0/AYIL !X1 COORDINATE
C      A2=-150.0/AYIL!X2 COORDINATE
C      A3=-150.0/AYIL!X3 COORDINATE
C      A4=150.0/AYIL!X4 COORDINATE
C      B1=150.0/AYIL!Y1 COORDINATE
C      B2=150.0/AYIL!Y2 COORDINATE
C      B3=-150.0/AYIL!Y3 COORDINATE
C      B4=-150.0/AYIL!Y4 COORDINATE
C      R1=25.0/AYIL!RADIUS OF CIRCLE NO. 1
C      R2=25.0/AYIL!RADIUS OF CIRCLE NO. 2
C      R3=25.0/AYIL!RADIUS OF CIRCLE NO. 3
C      R4=25.0/AYIL!RADIUS OF CIRCLE NO. 4
C       R5=50.0/AYIL!RADIUS OF CIRCLE NO. 5
C      
C      RO=SQRT((XX)**2+(YY)**2)
C      RO1=SQRT((XX-A1)**2+(YY-B1)**2)
C         RO2=SQRT((XX-A2)**2+(YY-B2)**2)
C            RO3=SQRT((XX-A3)**2+(YY-B3)**2)
C               RO4=SQRT((XX-A4)**2+(YY-B4)**2)
C               IF(RO1.LT.R1)VVO=0.0
C                IF(RO2.LT.R2)VVO=0.0
C                 IF(RO3.LT.R3)VVO=0.0
C                  IF(RO4.LT.R4)VVO=0.0
C                    IF(RO.LT.r5)VVO=0.0
          
              
      RETURN 
      END
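On the memory side, the program declares A(LDA,N) with LDA=N, so the array holds N*N REAL*8 values, yet after zeroing it the matrix-filling code writes nonzero values only into rows 1, M and M+1, i.e. the NCODA+1 = M+1 rows of band storage that DEVESB works with. Assuming the band routine accepts LDA = NCODA+1 (please check the IMSL documentation for DEVESB; LAPACK's dsbevx, available in MKL, likewise takes a symmetric band matrix with leading dimension at least KD+1), a quick C sketch of the storage cost with hypothetical helper names:

```c
#include <assert.h>

/* Bytes for a full n x n REAL*8 array, as in A(LDA,N) with LDA = N. */
long long full_bytes(long long n) {
    assert(n > 0);
    return n * n * 8;
}

/* Bytes for symmetric band storage with ncoda codiagonals:
   (ncoda + 1) rows by n columns of 8-byte reals. */
long long band_bytes(long long n, long long ncoda) {
    assert(n > 0 && ncoda >= 0 && ncoda < n);
    return (ncoda + 1) * n * 8;
}
```

For M = 101 (N = 10201, NCODA = 101) the full array needs 832,483,208 bytes (about 0.83 GB), while band storage needs 8,324,016 bytes (about 8.3 MB). For M = 201 (N = 40401) the full array alone would already need about 13 GB, more than the 12 GB laptop mentioned in this post.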
 

mkl(dgemm) performance problems on "superlarge" processors

Hi,

I was running two consecutive dgemm operations, T=AB and C=A'T, with A=(56,000x400,000), B=(400,000x30), T=(56,000x30) and C of the same size as B.

Depending on the CPU, I measured these wall-clock times (for the dgemm operations only):

Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz with 36 (real) cores, 46080 KB cache, 250GB of RAM

T=AB: 3.73 seconds,

C=A'T: 4.17 seconds

 

Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 56 (real) cores, 19712 KB cache, 2TB of RAM

T=AB: 91.47 seconds

C=A'T: 232.78 seconds

What was particularly striking was that T=AB used all 56 cores, whereas C=A'T used only half of them.

The KMP setting was: KMP_AFFINITY=compact,1,0,granularity=fine

 

I am wondering whether the poor performance on the latter machine is solely attributable to its architecture, and therefore set in stone, or whether I can tune MKL/KMP environment variables to increase performance.
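To compare the two machines on a common scale, the wall-clock times can be turned into throughput. Both products perform 2 x 56,000 x 400,000 x 30 floating-point operations; a small C sketch with hypothetical helper names:

```c
#include <assert.h>

/* Floating-point operations of a GEMM with a p x q output and inner
   dimension r (one multiply and one add per term). */
double gemm_flops(double p, double q, double r) {
    assert(p > 0 && q > 0 && r > 0);
    return 2.0 * p * q * r;
}

/* Throughput in GFLOP/s for a run lasting `seconds`. */
double gflops(double flops, double seconds) {
    assert(seconds > 0);
    return flops / seconds / 1e9;
}

/* Size in GB (10^9 bytes) of a rows x cols array of doubles. */
double gigabytes_of_doubles(double rows, double cols) {
    return rows * cols * 8.0 / 1e9;
}
```

That is about 1.34 x 10^12 operations per product, so the timings correspond to roughly 360 and 322 GFLOP/s on the E5-2697 v4 machine versus roughly 14.7 and 5.8 GFLOP/s on the Gold 5117 machine. A itself occupies about 179 GB, and each of its elements takes part in only about 60 floating-point operations per product, so both products stream a very large matrix from memory for comparatively little arithmetic. That makes memory placement across the sockets of the larger machine a plausible factor to investigate; this is a hypothesis, not a diagnosis.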

Thanks


Eigenvalue solver dfeast_syev does not find values (info=1)

I am trying to use dfeast_syev to find the eigenvectors of a 4x4 matrix. I used FEAST because I've found that other methods give incorrect values. I call the routine as follows:

    dfeast_syev(&uplo, &N, MM, &lda, fpm, epsout, &loop, &emin, &emax, N, (**EVal)->elm, (**EVec)-> elm, (MKL_INT*)&m0, (**res)->elm, &info);

N=4

lda=4

uplo='F'

emin=0.1

emax=10

m0=4

fpm - default

MM (debugger view of the 16 doubles, i.e. the 4x4 identity matrix):

1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1

The result is info=1, m0=0, loop=0, even though I believe this is correct usage. The answer should be 4 degenerate eigenvalues equal to 1.

If I change lda to 5 and add 4 zeros to the matrix, so it becomes 5x4, the routine finds 1 correct eigenvalue. 

What is going on? Why do I have to change lda to 5?

 

DGELSS Issue

I am attempting to use this routine but am getting an exception:

Exception thrown at 0x01FA5428 (mkl_core.dll) in Console1.exe: 0xC0000005: Access violation reading location 0x00000000.

I wonder if anyone knows how I might fix this.

Thanks, Angus. 

 

 

Attachment: ISSUE.png (image/png, 192.76 KB)

PARDISO: how to continue past a zero pivot error

Hi everyone,

I get zero pivot error -4, but I know that this error is expected under the conditions of my calculation.

So I want to obtain a solution using zero pivot replacement (the value i_parm(10)) without stopping the calculation.

Is this possible, and how should I set the parameters?

 

mkl_set_num_threads() doesn't work

Hello everyone!

I am using the MxNet library with MKL support. I need my MxNet predictor to use only one thread for calculations. For this purpose I set the environment variable OMP_NUM_THREADS=1, and that works fine for me. Then I read in the MKL documentation that there is a dedicated function, mkl_set_num_threads(), and that it is equivalent to OMP_NUM_THREADS. So I removed the environment variable and called mkl_set_num_threads() instead, before creating and using the predictor. But it had no effect: my predictor still used more than one thread for calculations, as if the call were ignored. Maybe I am missing something. Any ideas? Thanks.

Rare crashes on MKL

I have implemented an image-processing algorithm in C++ using, among other things, the FFTW wrappers in the MKL library (version 2018.3.210).

I am working on an x64 machine with an Intel Xeon E5-1650 v3 3.5 GHz processor and Windows 7 as the OS.

I have used MS Visual Studio 2015 as my IDE for development and debugging. The application is multi-threaded via the C++11 <thread> library.

When running the application over and over again, I see that about 1% of the runs crash.

When I attached it to my IDE and looked at the crash dumps, I saw that the crashes always occur on the call:

thePlan = fftw_plan_many_dft(.....);

with the exception "Unhandled exception at someaddress (mkl_avx2.dll) in MyApp.exe: 0xC00000005 access violation reading location 0x0000000000000"

or

the exception "Unhandled exception at someaddress (mkl_avx2.dll) in MyApp.exe 0xC00000005 access violation reading location 0x0000000000018"

1. I have checked that all inputs to the call are valid: the pointers were allocated with fftw_malloc(), and the other inputs have legitimate sizes and types.

2. I have run the application with Windows Application Verifier attached to my IDE and got no warnings or errors.

3. I have run the application with Windows global flags attached to my IDE, with all possible heap-corruption checks enabled, and got no exceptions.

What else can I do to debug these crashes?
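The standard FFTW3 manual states that only the execute functions (fftw_execute and its new-array variants) are thread-safe, and that all other routines, including the planner and fftw_destroy_plan, should be called from only one thread at a time. Whether MKL's FFTW3 wrappers relax that rule is not established here. As an assumption worth testing, one way to rule it out is to serialize every plan creation and destruction behind a single process-wide lock. The minimal sketch below is written in C with POSIX threads; planner_enter and planner_leave are hypothetical names, and in the C++11 application a std::mutex with std::lock_guard plays the same role.

```c
#include <assert.h>
#include <pthread.h>

/* One process-wide lock guarding every FFTW planner call. */
static pthread_mutex_t planner_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: call before fftw_plan_many_dft() or fftw_destroy_plan(). */
int planner_enter(void) {
    int rc = pthread_mutex_lock(&planner_lock);
    assert(rc == 0);
    return rc;
}

/* Hypothetical helper: call right after the planner call returns. */
int planner_leave(void) {
    int rc = pthread_mutex_unlock(&planner_lock);
    assert(rc == 0);
    return rc;
}
```

In each worker thread, the existing `thePlan = fftw_plan_many_dft(.....);` line goes between planner_enter() and planner_leave(). The execute calls stay outside the lock, so the FFTs themselves still run in parallel. If the 1% crash rate disappears with the lock in place, that points to concurrent planning as the cause.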

Question about the mkl_?omatcopy

I cannot understand the manual's description of the following parameters of this function.

rows   The number of rows in matrix B (the destination matrix).

cols   The number of columns in matrix B (the destination matrix).

ldb    If ordering = 'R' or 'r', ldb represents the number of elements in array
       b between adjacent rows of matrix B.
         • If trans = 'T' or 't' or 'C' or 'c', ldb must be at least equal to rows.
         • If trans = 'N' or 'n' or 'R' or 'r', ldb must be at least equal to cols.

       If ordering = 'C' or 'c', ldb represents the number of elements in array
       b between adjacent columns of matrix B.
         • If trans = 'T' or 't' or 'C' or 'c', ldb must be at least equal to cols.
         • If trans = 'N' or 'n' or 'R' or 'r', ldb must be at least equal to rows.

Please see the following code from the official MKL examples:

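#include <stdio.h>
#include "mkl.h"   /* declares mkl_domatcopy */
/* print_matrix() is a printing helper from the MKL example sources (not shown here) */

int main(int argc, char *argv[])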
{
  size_t n=3, m=5;
  double src[] = {
    1.,   2.,   3.,   4.,   5.,
    6.,   7.,   8.,   9.,   10.,
    11.,  12.,  13.,  14.,  15.
  }; /* source matrix */
  double dst[8]; /* destination matrix */
  size_t src_stride = 5;
  size_t dst_stride = 2;

  printf("\nThis is example of using mkl_domatcopy\n");

  printf("INPUT DATA:\nSource matrix:\n");
  print_matrix(n, m, 'd', src);

  /*
  **  Source submatrix(2,4) a will be transposed
  */
  mkl_domatcopy('R'        /* row-major ordering */,
                'T'        /* A will be transposed */,
                2          /* rows */,
                4          /* cols */,
                1.         /* scales the input matrix */,
                src        /* source matrix */,
                src_stride /* src_stride */,
                dst        /* destination matrix */,
                dst_stride /* dst_stride */);
  /*  New matrix: dst = {
  **      1,  6,
  **      2,  7,
  **      3,  8,
  **      4,  9,
  **    }
  */
  printf("OUTPUT DATA:\nDestination matrix:\n");
  print_matrix(4, 2, 'd', dst);

  return 0;
}

The new matrix should have 4 rows and 2 columns, but the code passes 2 and 4.

If the code is correct, then rows should be described as the number of rows of the matrix before the operation is applied (that is, of the source matrix), not of the destination matrix.

Would you please look into the manual and give me some advice?

Thanks.
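The example is consistent with rows and cols describing the source matrix A before op() is applied: the call passes the 2 x 4 submatrix of src and gets back a 4 x 2 result. The manual's own ldb rule points the same way. For row-major ordering with trans = 'T' it requires ldb >= rows, and ldb must cover the column count of B, which is the row count of A. The unoptimized reference model below (a hypothetical function, not MKL's implementation) reproduces the example's output under that reading.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model of row-major B := alpha * op(A) for real data.
   rows x cols is the shape of the SOURCE A; op(A) is cols x rows when trans
   is 'T'/'C' (conjugation is a no-op for real numbers). */
void omatcopy_rowmajor_ref(char trans, size_t rows, size_t cols, double alpha,
                           const double *a, size_t lda, double *b, size_t ldb) {
    int transpose = (trans == 'T' || trans == 't' || trans == 'C' || trans == 'c');
    assert(lda >= cols);                       /* each row of A holds cols values */
    assert(ldb >= (transpose ? rows : cols));  /* each row of B holds rows or cols values */
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            double v = alpha * a[i * lda + j];
            if (transpose)
                b[j * ldb + i] = v;  /* A(i,j) lands in B(j,i) */
            else
                b[i * ldb + j] = v;  /* A(i,j) lands in B(i,j) */
        }
    }
}
```

With the example's arguments ('T', 2, 4, src stride 5, dst stride 2), dst receives 1, 6, 2, 7, 3, 8, 4, 9, which matches the 4 x 2 result in the example's comment.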

 

 

How to access the number of non-zero elements in sparse_matrix_t?

I am writing a Python wrapper for calling the 'mkl_sparse_spmm' function. 

In order to export the result of matrix-matrix multiplication to a Python object, I need to know the size of the 'col_idx' or 'values' array in the MKL export routines. How could I get a hold of it?

Incidentally, the documentation for the 'mkl_sparse_spmm' function does not state the format of the returned matrix 'C'. Is it the same format as the matrix 'A' (or 'B')?
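One way to get the array length, assuming the result is exported with mkl_sparse_d_export_csr (which returns the 4-array CSR layout rows_start, rows_end, col_indx and values), is to sum the per-row counts. Row i stores rows_end[i] - rows_start[i] entries, and the total is the number of entries to copy out of col_indx and values. When the rows are stored contiguously, which I would expect but have not verified for spmm output, the total also equals rows_end[rows-1] - rows_start[0]. Below is a minimal sketch with a hypothetical helper, where csr_int stands in for MKL_INT under an assumed LP64 interface.

```c
#include <assert.h>

/* Assumption: LP64 interface, so MKL_INT is a 32-bit int. */
typedef int csr_int;

/* Hypothetical helper: number of stored entries of a CSR matrix given the
   rows_start/rows_end arrays from the export routine. Only differences are
   used, so 0-based and 1-based indexing give the same count. */
csr_int csr_nnz(const csr_int *rows_start, const csr_int *rows_end, csr_int rows) {
    assert(rows >= 0);
    csr_int nnz = 0;
    for (csr_int i = 0; i < rows; ++i) {
        assert(rows_end[i] >= rows_start[i]);
        nnz += rows_end[i] - rows_start[i];
    }
    return nnz;
}
```

For the Python wrapper, if the rows are not contiguous, a safer approach is to copy row by row using the same rows_start/rows_end offsets (minus the index base).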


Downloading old versions of MKL

Intel Pardiso error numerical factorization

Dear Pardiso users,

I am using Intel Pardiso to solve a sparse system. For small matrices it works perfectly and very fast. However, as the system size increases, I found that for some parameter sets the numerical factorization fails. Since I have to solve the system several times and it does not change much from one step to the next, I use the CGS algorithm (iparm(4) = 91). Now the following problem occurs: in the first step the solver solves the system directly (iparm(4) = 0), and the following error is produced:

*** error PARDISO: iterative refinement  contraction rate is greater than 0.9, interrupt

Unfortunately, neither the documentation nor the forum/Intel website provides any further information about this error. Below are the iparm parameters I have used. They mostly correspond to the default values; however, iparm(4) is set to perform CGS steps.

call mkl_set_dynamic ( 0 )

 iparm(1)  = 1  ! do not use default values
 iparm(2)  = 3  ! fill-in reordering from METIS
 iparm(3)  = 1  ! Number of processors
 iparm(4)  = 91 ! iterative-direct algorithm
 iparm(8)  = 10 ! Max. number of iterative refinement steps on entry
 iparm(10) = 13 ! perturb the pivot elements with 1E-13
 iparm(11) = 1  ! use nonsymmetric permutation and scaling MPS
 iparm(13) = 1  ! Improved accuracy using nonsymmetric weighted matching
 iparm(21) = 1  ! Apply 1x1 diagonal pivoting during the factorization process
 iparm(24) = 1  ! Parallel factorization control
 iparm(25) = 1  ! Parallel forward/backward solve control
 iparm(27) = 1  ! checks whether column indices are sorted in increasing order within each row
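For reference, the MKL PARDISO documentation describes iparm(4) as 10*L + K. K selects the mode: 0 always factorizes as the phase requests; 1 runs CGS preconditioned by a previously computed LU, for nonsymmetric or structurally symmetric matrices; 2 runs CG preconditioned by a previous Cholesky factor, for symmetric positive definite matrices. L sets the CGS stopping tolerance eps = 10^(-L). So 91 requests CGS with a tolerance of 1e-9, and iparm(20) reports the outcome (negative on failure). A small decoder in C, as a sketch with hypothetical names:

```c
#include <assert.h>

typedef struct {
    int k;          /* K: 0 = factorize, 1 = CGS (nonsymmetric), 2 = CG (SPD) */
    int l;          /* L: stopping exponent */
    double eps_cgs; /* stopping tolerance 10^(-L) */
} cgs_setting;

/* Hypothetical decoder for iparm(4) = 10*L + K (iparm[3] in 0-based C). */
cgs_setting decode_iparm4(int iparm4) {
    assert(iparm4 >= 0);
    cgs_setting s;
    s.k = iparm4 % 10;
    s.l = iparm4 / 10;
    s.eps_cgs = 1.0;
    for (int i = 0; i < s.l; ++i)  /* compute 10^(-L) by repeated division */
        s.eps_cgs /= 10.0;
    return s;
}
```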

I would like to share code that reproduces the problem; however, the matrices are 1.2 GB in size, and, as I wrote, the problem only occurs for large systems. Below is an extract of my code that solves the system.

ik = 0
cgsxcounter = 0

do

   ik = ik + 1
   if (ik.ge.3) then
      write(*,*) 'no solution found'
      stop
   end if

   if (cgsxcounter.eq.0) then

      iparm(4) = 0

      !Release all memory
      phase = -1
      call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, ddum, idum, idum, idum, nrhs, iparm, msglvl, ddum, ddum, error)

      !Reordering and Symbolic Factorization. This step also allocates all memory that is necessary for the factorization
      phase = 11 ! only reordering and symbolic factorization
      call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, error)
      if (error.ne.0) write(*,*) 'Reordering and Symbolic Factorization wrong: ', error

      cgsxcounter = 1

   end if

   !Factorization
   phase = 22 ! only factorization
   call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, error)
   if (error.ne.0) stop

   !Back substitution and iterative refinement
   phase = 33 ! only substitution
   call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, error)

   if (iparm(20).lt.0) then
      write(*,*) 'Try again'
      cgsxcounter = 0
   else
      exit
   end if

end do

iparm(4) = 91

It would be nice if you could provide more information about the problem. What can cause such an error, and what can be done to avoid it?

Thanks in advance,

Horst

Finding the eigenvalues (diagonalizing) of a block-diagonal matrix

I have to diagonalize a large matrix, which takes a lot of time. The matrix size is 10,000 x 10,000.

This matrix is the Hamiltonian of a spin system, which has some block structure. Is there a way to diagonalize the full matrix by diagonalizing each block?

Basically, I want to:

1. permute the matrix to reduce it to a block structure;

2. diagonalize each block.

I would appreciate any help.
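If a permutation matrix P makes P^T H P = diag(H_1, ..., H_k), then the eigenvalues of H are the union of the eigenvalues of the blocks. Each block eigenvector, padded with zeros and mapped back through P, is an eigenvector of H. The blocks are the connected components of the graph that links i and j whenever H[i][j] != 0, so step 1 can be automated. The following is a minimal union-find sketch in C for a dense, row-major real Hamiltonian. The function names are hypothetical, and exact zeros are assumed; entries that are merely tiny would need a threshold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Union-find root lookup with path halving. */
static size_t uf_find(size_t *parent, size_t x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Hypothetical helper: labels basis indices 0..n-1 with block ids
   0..nblocks-1 so that h[i*n+j] != 0 implies label[i] == label[j].
   h is dense row-major n x n; parent is n entries of scratch space.
   Returns the number of blocks; sorting indices by label gives P. */
size_t find_blocks(const double *h, size_t n, size_t *parent, size_t *label) {
    assert(h != NULL && parent != NULL && label != NULL);
    for (size_t i = 0; i < n; ++i)
        parent[i] = i;
    /* Join i and j whenever they are coupled by a nonzero entry. */
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            if (h[i * n + j] != 0.0 || h[j * n + i] != 0.0) {
                size_t ri = uf_find(parent, i), rj = uf_find(parent, j);
                if (ri != rj)
                    parent[rj] = ri;
            }
    /* Number the components in order of their smallest index. */
    size_t nblocks = 0;
    for (size_t i = 0; i < n; ++i)
        label[i] = SIZE_MAX;
    for (size_t i = 0; i < n; ++i) {
        size_t r = uf_find(parent, i);
        if (label[r] == SIZE_MAX)
            label[r] = nblocks++;
    }
    for (size_t i = 0; i < n; ++i)
        label[i] = label[uf_find(parent, i)];
    return nblocks;
}
```

Each block can then be gathered into a small dense matrix and passed to a symmetric eigensolver such as LAPACKE_dsyev (or a Hermitian one if H is complex, in which case the nonzero test must check both real and imaginary parts). For spin Hamiltonians the blocks often coincide with sectors of a conserved quantity such as total S_z, so the labels can also be assigned directly from the basis states.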

Similar question for mathematica : https://mathematica.stackexchange.com/questions/170008/finding-the-eigen...

Thanks.

Using FMA in MKL routines

Hey everyone,

I couldn't find any old topics that dealt with this question in detail, so here I am asking it again: is there a way to enable FMA math when using the MKL routines? Here is a sample program that, when run on MSVC 2017 with the latest MKL version (details in the output below) and an AVX2 processor, DOES NOT use FMA:

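#include <math.h>
#include <stdio.h>
#include "mkl.h"   /* mkl_get_version, MKLVersion, cblas_sdot */

void print_mkl_info() {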
    MKLVersion Version;
    mkl_get_version(&Version);
    printf("Major version:           %d\n",Version.MajorVersion);
    printf("Minor version:           %d\n",Version.MinorVersion);
    printf("Update version:          %d\n",Version.UpdateVersion);
    printf("Product status:          %s\n",Version.ProductStatus);
    printf("Build:                   %s\n",Version.Build);
    printf("Platform:                %s\n",Version.Platform);
    printf("Processor optimization:  %s\n",Version.Processor);
    printf("================================================================\n");
    printf("\n");
}

float standard_dot_product(float* a, float* b) {
    float c = 0.0f;
    for (int i = 0; i < 4; i++) {
        c = c + (a[i] * b[i]);
    }
    return c;
}

float standard_fma_dot_product(float* a, float* b) {
    float c = 0.0f;
    for (int i = 0; i < 4; i++) {
        c = fmaf(a[i], b[i], c);
    }
    return c;
}

float mkl_dot_product(float* a, float* b) {
    return cblas_sdot(4, a, 1, b, 1);
}

int main() {
    print_mkl_info();
    float a[4] = { 1.907607, -.7862027, 1.148311, .9604002 };
    float b[4] = { -.9355000, -.6915108, 1.724470, -.7097529 };
    printf("Standard dot product is:     %.23f\n", standard_dot_product(a, b));
    printf("Standard FMA dot product is: %.23f\n", standard_fma_dot_product(a, b));
    printf("MKL dot product is:          %.23f\n", mkl_dot_product(a, b));
    return 0;
}

The above program outputs the following (compiled with /fp:fast and /O2; note that changing /O2 to /O1 changes the result of standard_dot_product, but not that of the CBLAS routine):

 

Major version:           2019
Minor version:           0
Update version:          2
Product status:          Product
Build:                   20190118
Platform:                32-bit
Processor optimization:  Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

Standard dot product is:     0.05768233537673950195313
Standard FMA dot product is: 0.05768235772848129272461
MKL dot product is:          0.05768233537673950195313

 

So is there any way to generate results with FMA in such cases? Or am I being a knobhead and missing something?

 

THANKS!

Swat

dfeast_sygv -4 error, BUT B IS POSITIVE DEFINITE!!!

Hello all,

I am using the Extended Eigensolver routines, specifically the dfeast_sygv function, and I get the following error:

==>INFO code =: -4
Intel MKL Extended Eigensolvers Error: Matrix B is not positive definite.

But the matrix B is positive definite!!!

B =

+2.222e-01      +0.000e+00      +1.111e-01      +0.000e+00      +1.111e-01      +0.000e+00      +5.556e-02      +0.000e+00
+0.000e+00      +2.222e-01      +0.000e+00      +1.111e-01      +0.000e+00      +1.111e-01      +0.000e+00      +5.556e-02
+1.111e-01      +0.000e+00      +2.222e-01      +0.000e+00      +5.556e-02      +0.000e+00      +1.111e-01      +0.000e+00
+0.000e+00      +1.111e-01      +0.000e+00      +2.222e-01      +0.000e+00      +5.556e-02      +0.000e+00      +1.111e-01
+1.111e-01      +0.000e+00      +5.556e-02      +0.000e+00      +2.222e-01      +0.000e+00      +1.111e-01      +0.000e+00
+0.000e+00      +1.111e-01      +0.000e+00      +5.556e-02      +0.000e+00      +2.222e-01      +0.000e+00      +1.111e-01
+5.556e-02      +0.000e+00      +1.111e-01      +0.000e+00      +1.111e-01      +0.000e+00      +2.222e-01      +0.000e+00
+0.000e+00      +5.556e-02      +0.000e+00      +1.111e-01      +0.000e+00      +1.111e-01      +0.000e+00      +2.222e-01

 

Using the dfeast_syev function gives the following eigenvalues:

+5.556e-02
+5.556e-02
+1.667e-01
+1.667e-01
+1.667e-01
+1.667e-01
+5.000e-01
+5.000e-01

All are positive.
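One way to double-check the exact buffer that is passed to dfeast_sygv (same storage, same leading dimension) is an independent factorization test. A symmetric matrix is positive definite exactly when the square-root-free Cholesky factorization B = L D L^T exists with all pivots D_jj > 0. If this check passes on the very array handed to MKL, then the -4 is more likely caused by how the arguments are passed (for example uplo, ldb, or the order of the A and B arguments) than by B itself. That is an assumption to verify, not a diagnosis. Below is a minimal C sketch with a hypothetical function name; only the lower triangle is read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: returns 1 if the symmetric n x n matrix a (row-major,
   leading dimension lda) has an LDL^T factorization with all D > 0, i.e. it
   is numerically positive definite, and 0 otherwise. work must hold
   n*n + n doubles; a is not modified. */
int is_positive_definite(const double *a, size_t n, size_t lda, double *work) {
    assert(lda >= n);
    double *l = work;          /* unit lower-triangular factor, row-major n x n */
    double *d = work + n * n;  /* diagonal pivots */
    for (size_t j = 0; j < n; ++j) {
        double dj = a[j * lda + j];
        for (size_t k = 0; k < j; ++k)
            dj -= l[j * n + k] * l[j * n + k] * d[k];
        if (!(dj > 0.0))
            return 0;          /* non-positive (or NaN) pivot */
        d[j] = dj;
        l[j * n + j] = 1.0;
        for (size_t i = j + 1; i < n; ++i) {
            double s = a[i * lda + j];
            for (size_t k = 0; k < j; ++k)
                s -= l[i * n + k] * l[j * n + k] * d[k];
            l[i * n + j] = s / dj;
        }
    }
    return 1;
}
```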

Here is the matrix A for the generalized eigenvalue problem:

A =

+4.569e+02      +0.000e+00      -3.046e+02      +0.000e+00      +7.616e+01      +0.000e+00      -2.285e+02      +0.000e+00
+0.000e+00      +4.569e+02      +0.000e+00      +7.616e+01      +0.000e+00      -3.046e+02      +0.000e+00      -2.285e+02
-3.046e+02      +0.000e+00      +4.569e+02      +0.000e+00      -2.285e+02      +0.000e+00      +7.616e+01      +0.000e+00
+0.000e+00      +7.616e+01      +0.000e+00      +4.569e+02      +0.000e+00      -2.285e+02      +0.000e+00      -3.046e+02
+7.616e+01      +0.000e+00      -2.285e+02      +0.000e+00      +4.569e+02      +0.000e+00      -3.046e+02      +0.000e+00
+0.000e+00      -3.046e+02      +0.000e+00      -2.285e+02      +0.000e+00      +4.569e+02      +0.000e+00      +7.616e+01
-2.285e+02      +0.000e+00      +7.616e+01      +0.000e+00      -3.046e+02      +0.000e+00      +4.569e+02      +0.000e+00
+0.000e+00      -2.285e+02      +0.000e+00      -3.046e+02      +0.000e+00      +7.616e+01      +0.000e+00      +4.569e+02

Can anybody help me here?

Am I doing something wrong?

I'm using Linux (Debian testing).

Thanks in advance.

P.S. Sorry about my English.
