Different results getrf/getrs, dss and intel pardiso

January 30, 2019, 6:21 am

Latest and popular articles on Intel Technologies

Dear all,

I have a small nonsymmetric linear system whose matrix is stored in CSR format (file fort.106), and I want to solve it. I tried three approaches. First, I converted the three CSR vectors to a dense matrix with mkl_ddnscsr and solved the system with getrf/getrs (methbutton=6), which produces reasonable results. Second, to exploit the sparsity of the system, I used Intel DSS (methbutton=7); however, its results differ from those of getrf/getrs by far more than machine precision. Third, Intel PARDISO (methbutton=8) fails with:

forrtl: severe (174): SIGSEGV, segmentation fault occurred

Things I have tried in order to resolve the problems:

-https://software.intel.com/en-us/articles/determining-root-cause-of-sigs...

-checked the sparse matrix with sparse matrix checker routines (no error)

At first I suspected that my matrix was the problem. But in that case I would expect getrf/getrs to fail as well. Since it works, I suspect that the issue lies with the sparse solvers.

You can find my code attached (2Modes.f90). The vectors representing the matrix are in fort.106. The program reads them automatically, so you can just compile and run it. Compiling works fine with:

ifort -o 2Modes.out 2Modes.f90 ${MKLROOT}/lib/intel64/libmkl_blas95_ilp64.a ${MKLROOT}/lib/intel64/libmkl_lapack95_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm -ldl -i8 -I${MKLROOT}/include/intel64/ilp64 -I${MKLROOT}/include

The method can be switched with methbutton in line 41. I tried to keep the code as simple as possible; it is essentially extracted from the examples that Intel provides. It would be nice if someone could take a look at this. Thank you in advance.

Best,

Horst K.

Attachment	Size
Download Code.zip	34.88 KB

↧

Intel MKL ERROR: Parameter 4 was incorrect on entry to DSTEIN2

January 30, 2019, 4:04 pm

Hi,

Some users are seeing the following error messages when running VASP, built against Intel 2018.0.3.

Intel MKL ERROR: Parameter 4 was incorrect on entry to DSTEIN2.
Intel MKL ERROR: Parameter 6 was incorrect on entry to DSTEIN2

The behavior was noticed as the MPI rank count increased, and https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17897.html also suggests that it might be expected once the rank count exceeds some level.

Is this expected behavior from MKL, and if so, what is the guidance on MPI rank count versus matrix properties (size?)?

Thanks

↧

Performance characteristics of cblas_gemm_s16s16s32

January 31, 2019, 4:15 am

Hi,

I would like more details on the performance characteristics of cblas_gemm_s16s16s32. In my application, the performance gain over cblas_sgemm is lower than I had hoped.

Here is my test configuration, which is larger than what would typically be used in my application (a seq2seq model):

CblasColMajor

M = 1024
K = 512
N = 2048

TRANS_A = FALSE
TRANS_B = TRUE

And here are some single-threaded results on an Intel(R) Core(TM) i7-6700K (AVX2), averaged over 1000 samples:

* cblas_sgemm: 17.7135 ms
* cblas_gemm_s16s16s32: 15.5617 ms

Are these values expected? Do I need to do something specific to get more performance out of cblas_gemm_s16s16s32?
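
For reference, the two timings can be converted to throughput with plain arithmetic. The sketch below assumes the usual GEMM operation count of 2*M*N*K (one multiply and one add per term); the helper names gemm_ops, gemm_gops and report are made up for this illustration and are not MKL functions.

```c
#include <assert.h>
#include <stdio.h>

/* Operation count of one GEMM call: one multiply and one add per (m, n, k) triple. */
double gemm_ops(long long m, long long n, long long k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Throughput in 1e9 operations per second for a run time given in milliseconds. */
double gemm_gops(long long m, long long n, long long k, double millis)
{
    assert(millis > 0.0);
    return gemm_ops(m, n, k) / (millis * 1e-3) / 1e9;
}

/* Print the throughput of both measurements and the speedup between them. */
void report(void)
{
    const long long m = 1024, k = 512, n = 2048;
    const double t_sgemm = 17.7135;  /* ms, cblas_sgemm timing from the post */
    const double t_s16 = 15.5617;    /* ms, cblas_gemm_s16s16s32 timing from the post */

    printf("cblas_sgemm:          %.1f Gop/s\n", gemm_gops(m, n, k, t_sgemm));
    printf("cblas_gemm_s16s16s32: %.1f Gop/s\n", gemm_gops(m, n, k, t_s16));
    printf("speedup:              %.3f\n", t_sgemm / t_s16);
}
```

Calling report() from main prints both rates and the ratio of the two timings, which makes it easier to compare the measurements against the peak rate of the processor.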

Thanks,

Guillaume

↧

Random Number Generator

January 31, 2019, 9:32 am

Hi,

I have developed a function to generate a vector of random numbers. However, there are two problems:

1. When the number of random numbers to be generated is too large (e.g. 50000^2), the call fails with: MKL ERROR: Parameter 3 was incorrect on entry to vdRngUniform.

2. When the left bound is a positive number (e.g. a = 300, b = 100), the function violates the left bound and returns a minimum value of zero. However, if 'a' is a negative number, it works correctly.

Can you please help? Thank you.

Afshin

int RNG_UNIF ( long int N, double a, double b, double *P )
{
    // N number of random values to be generated
    // a is the left bound
    // b is the right bound
	
    VSLStreamStatePtr stream;
    int errcode = 0;
    srand(time(0));    
    long seed = rand();
   	
    /***** Initialize *****/
    errcode = vslNewStream( &stream, VSL_BRNG_MT2203, seed ); 
    if (errcode != 0) goto err;
    /***** Call RNG *****/    	
    errcode = vdRngUniform( VSL_RNG_METHOD_UNIFORM_STD_ACCURATE, stream, N, P, a, b );
    if (errcode != 0) goto err;

    vslDeleteStream(&stream);
   
    err:
    return errcode; 	

}
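
One possible reading of both symptoms, stated as an assumption rather than a diagnosis: if the program links the LP64 interface, MKL integers are 32-bit, and 50000^2 = 2,500,000,000 exceeds INT_MAX (2,147,483,647), so the element count cannot be passed in one call. In the second case the example bounds have a > b, so the interval is given in reverse order. The sketch below contains no MKL calls; next_chunk, chunk_calls and order_bounds are hypothetical helpers that only show how a large request could be split and how the bounds could be put in order before calling the generator.

```c
#include <assert.h>
#include <limits.h>

/* Size of the next request: at most INT_MAX elements, so it fits a 32-bit count. */
int next_chunk(long long remaining)
{
    assert(remaining >= 0);
    return remaining > INT_MAX ? INT_MAX : (int)remaining;
}

/* Number of generator calls needed to produce n values chunk by chunk. */
long long chunk_calls(long long n)
{
    long long calls = 0;
    while (n > 0) {
        n -= next_chunk(n);   /* the real generator call would go here */
        ++calls;
    }
    return calls;
}

/* Make sure *a <= *b by swapping if needed; returns 1 if a swap happened. */
int order_bounds(double *a, double *b)
{
    if (*a > *b) {
        double t = *a;
        *a = *b;
        *b = t;
        return 1;
    }
    return 0;
}
```

In RNG_UNIF, the loop body would call the generator once per chunk on P plus a 64-bit offset. Whether swapping the bounds is the intended behavior depends on what a and b are meant to represent.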

↧

Package script bug: link_install.sh and tr -s [:blank:]

February 1, 2019, 5:18 pm

Having installed the package intel-comp-l-all-vars-19.0.1-144 via apt on Ubuntu, I've run into a problem with the script at /opt/intel/compilers_and_libraries_2019.1.144/linux/bin/link_install.sh which is run as part of the installation process. On my system this script fails and prevents apt from completing.

This is because it contains a few instances of lines like:

str=$(echo $str | tr -s [:blank:] | sed 's/^ *//g')

The problem here is the [:blank:].

On most systems it will do the right thing -- but square brackets actually denote a bash glob. So if it happens to match files called e.g. "b", "l" or "a" (in your root directory, from where apt runs the script), bash will substitute those in place of [:blank:] and tr will get the wrong arguments.

I happen to have both a /a and a /n on my systems, so what gets run is "tr -s a n", which replaces every "a" with an "n"... definitely not what the author intended! The symptom is lines such as

/opt/intel/compilers_and_libraries_2019.1.144/linux/bin/link_install.sh: line 565: =/opt/intel/compilers_nd_librnries: No such file or directory

output by apt (or dpkg) before the installation aborts.

Simple fix: put every instance of [:blank:] in quotes: "[:blank:]".
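
The glob behavior described above can be checked without touching the installer. The sketch below uses the POSIX fnmatch function, which applies shell-style pattern matching; glob_matches is a hypothetical wrapper written for this illustration. Unquoted, [:blank:] is an ordinary bracket expression over the characters ':', 'b', 'l', 'a', 'n' and 'k', so it matches one-letter file names such as "a" or "n" and does not match a space.

```c
#include <assert.h>
#include <fnmatch.h>

/* Returns 1 if name matches the shell pattern, 0 otherwise. */
int glob_matches(const char *pattern, const char *name)
{
    assert(pattern && name);
    return fnmatch(pattern, name, 0) == 0;
}
```

For example, glob_matches("[:blank:]", "a") reports a match, while the character class only has its intended meaning inside an outer bracket expression, as in "[[:blank:]]". Quoting the argument in the script keeps the shell from expanding it, so tr receives the literal string.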

↧

mkl/blas routine for C=AA'B

February 4, 2019, 11:15 pm

Hi there,

Is there any mkl/blas function that performs the operation C=AA'B in one go? Currently I use an intermediate array T and two dgemm calls: T=A'B; C=AT. I am wondering whether there is a more efficient way, since A is always the same matrix.
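
To make the two-step evaluation concrete, here is a plain-C sketch with naive loops standing in for the two dgemm calls. It assumes row-major storage with A of size m x k, B of size m x n, a workspace T of size k x n and a result C of size m x n; the function names are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* T (k x n) = A' * B, with A (m x k) and B (m x n), all row-major. */
void at_times_b(size_t m, size_t k, size_t n,
                const double *A, const double *B, double *T)
{
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (size_t i = 0; i < m; ++i)
                s += A[i * k + p] * B[i * n + j];
            T[p * n + j] = s;
        }
}

/* C (m x n) = A * T, with A (m x k) and T (k x n), all row-major. */
void a_times_t(size_t m, size_t k, size_t n,
               const double *A, const double *T, double *C)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (size_t p = 0; p < k; ++p)
                s += A[i * k + p] * T[p * n + j];
            C[i * n + j] = s;
        }
}

/* C = A * (A' * B), using the caller-provided k x n workspace T. */
void aatb(size_t m, size_t k, size_t n,
          const double *A, const double *B, double *T, double *C)
{
    assert(A && B && T && C);
    at_times_b(m, k, n, A, B, T);
    a_times_t(m, k, n, A, T, C);
}
```

The two steps cost about 4*m*k*n operations in total. Since A is fixed, forming the m x m product AA' once and reusing it would cost about 2*m*m*n operations per new B, which can only pay off when m is small relative to k and AA' fits in memory.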

Thanks.

↧

eigen value and eigen vector

February 5, 2019, 12:29 am

I'm working on quantum structures, so I need to calculate eigenvalues and eigenvectors with high precision, and I want to work with sizes such as n = 10000 and above. The library I use is not well suited to this; can you suggest an alternative routine? I have a laptop with 4 cores at 3.6 GHz and 12 GB of RAM,
and I also have access to a host computer with 200 cores and 120 GB of RAM.

My code:

c use imsl
INCLUDE 'LINK_FNL_STATIC.H'
USE EVESB_INT
C USE EVESF_INT
c USE EPISF_INT
c USE EPISB_INT
C USE CSDER_INT
C USE CSINT_INT

    IMPLICIT NONE
INTEGER I,II,J,K,L,M,N,NDATA,NINTV,LDA,LDEVEC,NCODA,NEVAL,
$NEVEC,INT_TIME,IK,MXEVAL
PARAMETER (M=101,N=M*M,NCODA=M,NDATA=N,LDA=N,LDEVEC=N
$,NEVEC=4,MXEVAL=4)
C=======================================================================
   REAL*8 A(LDA,N),ALPHA,AALPHA,PI,BREAK(NDATA),AA,BB,OTOP1U,SEBIN,
$XX,YY,ZZ,EPSILON,LAMDA,DLAMDA,RRO,VB(N),PSO(N),VDC(M,M),C,VO,
$VVO,DU,RO1RO2,RO3,RO4,A1,A2,A3,A4,B1,B2,B3,B4,PSU(M,M),
$PSIN(M,M),PSF(M,M),DRO,TOP,RO,EM,VL(N),DZ,AYIL,RYIL,F,ETA,B,GAMA,
$RR,U,H,XI,YI,KZZ,INTEN,HPLANCK,VVVO,DX,DY,X(M),Y(M),MY,RRIC,RIC,KZ
$,P,EO,EB,EIK,EUC,VM(N),TOPKISI,KISIBIR,KISIUC,EPS,FXSU,FXOU,OTOPXU
$,TOPXU,BETA1,BETA3,TOPBETA,TOPYU,M12,HW,T,EF,EIN,BETA3U,BETA3A,BET
$A1U,BETA1A,E,FXOA1,FXSA1,FXOA2,FXSA2,NR,R,OTOPXA1,OTOPXA2,TOPXA1,T
$OPXA2,TOPYA1,TOPYA2,TOPSON,EPSO,EVAL(NEVEC),EVEC(LDEVEC,NEVEC),F1
$U,TOP1U,F1SU,OSI,VS(N),EYUKU,SIGMA,TZAMAN,CISIK,KISIBIRA,KISIBIR
$U,KISIUCU,KISIUCA,RR1,R1,RR2,R2,TTB,TB,PII,XLAMDA,DXLAMDA,SAY,OTOP
$XA,TOPYA,OEBIN,FK0,FK1,TOPXA,FXSA,ATA,FXOA,PS,RO1,RO2,FXSA3,
$TOPXA3,OTOPXA3,OTOP1OU,OTOP2OU,F2U,F2SU,OTOP2U,M11,M22,KS1U,
$KS31A,KS31U,KS32U,KS33U,KS34U,KS32A,KS33A,TOPKS,TOP2U,BETA31U,
$BETA32U,BETA33U,BETA31A,KS1A,KS1,KS3
   REAL*8 OTOPYU,OTOPYA,TOPZU,TOPZA,Z,LL,LA,OTOPYA3,TOPYA3,FXOA3
C=======================================================================
LOGICAL SMALL
CHARACTER*8 CHAR_TIME
CALL TIME(CHAR_TIME)
WRITE(*,*)'TIME1=', CHAR_TIME
   PI=4.0D0*DATAN(1.0D0)
C=======================================================================
C OPEN(1,FILE='ALGAAS k L60.DAT')
c   OPEN(2,FILE='ALGAAS s-PISI_L20.DAT')
c OPEN(3,FILE='ALGAAS silindir A-E.DAT')
c   OPEN(4,FILE='ALGAAS PISILER DELTOID00t70.DAT')
c   OPEN(5,FILE='AlGaAs r2 B VE ENERJI M1 KZ2.DAT')
C   OPEN(6,FILE='GAALAS IC BARIYER VE TABAN ENERJI.DAT')
c   OPEN(7,FILE='GAALAS BAGLANMA ENERJI-TBB LAZER 00KARE XIYI00.DAT')
   OPEN(8,FILE='AlGaAs 1-2 BETA INTEN03 ALFA 60 k.DAT')
   OPEN(9,FILE='AlGaAs 1-2 K_INDEX INTEN03 ALFA 60 k.DAT')

C%%%%%%%%%%%%%%%%%%%%%%%%%%%% CONSTANTS %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
INTEN=0.300000E10 !MEGAWATT/CM^2, CONVERTED TO PER M^2
AALPHA=60.0D0 !LASER AMPLITUDE (ANGSTROM)
LL=105.0
C=================================ALGAAS===================================
MY=0.06650D0
   EPSILON=10.9 ! 13.18 10.90D0
C****************** ALGAN************************************************
C MY=0.13D0
C EPSILON=9.7 !YÜKSEK FREKANS 5.3!STATIK9.7
C===================GaINAS ===============================================
c MY=0.023+0.037*0.3+0.003*(0.3)**2
C EPSILON=15.1-2.87*0.3+0.67*(0.3)**2!STATIK
c EPSILON=12.3-1.4*0.3 !YUKSEK
C==========================================================================
   RYIL=(13605.698110D0*MY/(EPSILON**2))
   AYIL=0.52917724820D0*EPSILON/MY

C==================POT=================================================
c VO=228000000000000.0D0/RYIL !ALGAAS IçIN
VO=228.0/RYIL !DİKKAT
C VO=345.0D0/RYIL !ALGAN IçIN
c VO=227.70D0/RYIL !GAINAS IçIN
c DO 5555 TTB=5.0,200,5.0
c   DO 5555 AALPHA=0,100,5.0
c DO 5555 B=0.0D0,20.0D0,1.0D0
C DO 5555 RRIC=50.0,150.0,5.0
C DO 5555 VVVO=100.0,300.0,10.0

F=00.0D0 !ELECTRIC FIELD STRENGTH (KV/CM)
   B=0.0D0 !MAGNETIC FIELD STRENGTH (TESLA)

   XI=0.0000001D0 !IMPURITY ATOM POSITION
   YI=0.0000001D0 !IMPURITY ATOM POSITION
   RR=220.0012345670D0 !OUTER WIDTH
   RRIC=50.00D0 !INNER WELL WIDTH
TTB=50.0
C======================= OPTICAL TRANSITION COEFFICIENTS ==============
EYUKU=1.60217733E-19!DSQRT(2.0D0) !COULOMB
SIGMA=3.0E22 !CARRIER DENSITY (M^-3)

TZAMAN=5E12 !PICOSECONDS CONVERTED TO SECONDS, USED AS A MULTIPLIER
C TZAMAN=(1.0/1.5)*1E12 !Algan
CISIK=2.99792458E8 !METERS/SECOND
EPSO=8.854187817E-12!C^2/(NEWTON.M^2)
NR=3.2!DSQRT(EPSILON)
HPLANCK=1.05457266E-34!J.S
C****************** CONVERSIONS *************************************
C**********************************************************************
   ALPHA=AALPHA/AYIL
LA=LL/AYIL
   RIC=RRIC/AYIL
   R=RR/AYIL
RR1=40.0
RR2=150.0
R1=RR1/AYIL
R2=RR2/AYIL
TB=TTB/AYIL
C VO=VVVO/RYIL
KZ=0.0D0/(Ric)!WAVE NUMBER
   EM=0.0D0 !AZIMUTHAL MAGNETIC FIELD
C======================================================================
   ETA=0.010D0*AYIL*F/RYIL
GAMA=4.254381195E-6*EPSILON*EPSILON*B/(MY*MY)
   DX=(2.0D0*R)/REAL(M-1)
   DY=(2.0D0*R)/REAL(M-1)
    DZ=(2.0D0*R)/REAL(M-1)
DRO=R/REAL(M-1)
   AA=4.0D0/(DX*DX)
   BB=-1.0D0/(DX*DX)
C***********************************************************************
   PRINT*,'EPSILON:',EPSILON,'EPSILON=10,89 ISE LAZER AKTIF'
PRINT*,'RYIL:',RYIL
PRINT*,'MY:',MY
   PRINT*,'AALPHA',AALPHA,'ANGUSTRON'
PRINT*,'B=',B
PRINT*,'M=',EM
PRINT*,'KZ=',KZ
PRINT*,'RRIC',RRIC
PRINT*,'GAMMA=',GAMA
PRINT*,'VO=',VO*RYIL
PRINT*,'INTEN=',INTEN

C************** AZIMUTHAL MAGNETIK ALAN *******************************
   II=1
DO K=1,M
   X(K)=-R+REAL(K-1)*DX
IF(ABS(X(K)).LE.0.000000001) GO TO 32
32 DO L=1,M
Y(L)=-R+REAL(L-1)*DY
IF(ABS(Y(L)).LE.0.0000000001) GOTO 33
RO=DSQRT(X(K)*X(K)+Y(L)*Y(L))
IF(RO.LT.RIC)THEN
VB(II) =((EM*EM/(RO*RO))+KZ*KZ-(GAMA*RO*RO*KZ/RIC)
$+(GAMA*GAMA*(RO**4)/(4*(RIC**2))))
VS(II)=0.0!0.25*GAMA*GAMA*RO*RO
ELSE
VB(II)=0.0!(EM*EM/(RO*RO))+KZ*KZ+2*KZ*GAMA*RIC*LOG(RIC/RO)+
C $GAMA*GAMA*RO*RO*(LOG(RIC/RO)**2)
VS(II)=0.0!0.25*GAMA*GAMA*RO*RO
END IF
33 II=II+1
   END DO
END DO
C********************* LAZER GIYDIRILIYOR ******************************
   PRINT*,'LAZER GIYDIRILIYOR'
   II=1
   DO 999 L=1,M
YY=Y(L)
DO 888 K=1,M
   XX=X(K)
C=======================================================================
   TOP=0.0D0
   DU=0.0010D0
   DO U=0.0D0,2.0D0*PI,DU
TOPSON=VVO(XX+ALPHA*DSIN(U),YY,RYIL,AYIL,R1,R2,TB,RIC,LA,VO)
   TOP=TOP+(TOPSON)*DU
   END DO
   TOP=TOP/(2.0D0*PI)
C=======================================================
   VDC(K,L)=TOP
    VL(II)=VDC(K,L)
VM(II)=VL(II)+VB(II)+VS(II)
II=II+1
WRITE(1,19)X(K)*AYIL,Y(L)*AYIL,VDC(K,L)*RYIL
888   CONTINUE
999   CONTINUE
    PRINT*, 'KUYU TANIMLANDI', 'VB'

19 FORMAT(3(2X,F14.8))
18 FORMAT(5(2X,F14.8))
177 FORMAT(4(2X,F14.8))
C********************* MATRIS *****************************************
A=0.0D0 !AMATRIS BOLOK(2M+1,N=M*M)
DO L=M+1,M*M
A(1,L)=BB
ENDDO

DO L=2,M*M
   A(M,L)=BB
ENDDO
   DO L=M+1,M*M-1,M
   A(M,L)=0.0D0
ENDDO

               DO L=1,M*M
               A(M+1,L)=AA+Vm(L)
               ENDDO
C                       DO L=1,M*M-1
C                       A(M+2,L)=BB
C                       END DO
C                               DO L=1,M*M-M
C                               A(2*M+1,L)=BB
C                               ENDDO
C*********** MATRIS EKRANA YAZDıRıLıYOR *******************************
c DO 40 K=1,M+1
c WRITE(*,'(1X,6(F6.1))') (A(K,L),L=1,m)
c WRITE(1,'(1X,6(F6.1))') (A(K,L),L=1,m)
c40 CONTINUE
c PAUSE
c   STOP
C***********************************************************************
SMALL =.TRUE.
CALL DEVESB(N,NEVEC,A,LDA,NCODA,SMALL,EVAL,EVEC,LDEVEC)
C CALL DEVESF (N, NEVEC, A, LDA, SMALL, EVAL, EVEC, LDEVEC)
C PRINT*,'PERFORMANS INDEX=',PII
C PII=EPISB(NEVEC,A,NCODA,EVAL,EVEC)
c PII= EPISF(NEVEC,A,EVAL,EVEC)
CALL TIME(CHAR_TIME)
WRITE(*,*)'TIME2=', CHAR_TIME

   EO=(EVAL(NEVEC))*RYIL !GROUND STATE, IN MEV
   EB=(EVAL(NEVEC-1))*RYIL !1ST EXCITED STATE, IN MEV
   EIK=(EVAL(NEVEC-2))*RYIL !2ND EXCITED STATE, IN MEV
EUC=(EVAL(NEVEC-3))*RYIL !3RD EXCITED STATE, IN MEV
C=======================================================================
   II=1
   DO L=1,M
   DO K=1,M
   PSIN(K,L)=EVEC(II,NEVEC)*AYIL*1E-10 !PSI VALUES IN METERS
   PSF(K,L)=EVEC(II,NEVEC-1)*AYIL*1E-10 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
PSU(K,L)=EVEC(II,NEVEC-2)*AYIL*1E-10
PSO(II)=EVEC(II,NEVEC)
WRITE(4,511)X(K)*AYIL,Y(L)*AYIL,PSIN(K,L)/1E-10,PSF(K,L)/1E-10,
$EVEC(II,NEVEC-2)*AYIL*1E-10/1E-10
   II=II+1
ENDDO
   ENDDO
   PRINT*, 'EO=',EO,'MEV'
   PRINT*, 'E1=',EB,'MEV'
   PRINT*, 'E2=',EIK,'MEV'
PRINT*, 'E3=',EUC,'MEV'
   WRITE(3,188)AALPHA,EO,EB,EIK,EUC
c WRITE(5,188)B,EO,EB,EIK,EUC
c WRITE(5,*)B,EO
C WRITE(6,*)RRIC,EO
   EF=EB
   EIN=EO

C********************* M12 CALCULATION ***************************
OTOPXU=0.0D0
OTOP1OU=0.0D0
OTOP2OU=0.0D0
   OTOPXA1=0.0D0
   OTOPXA2=0.0D0
OTOPXA3=0.0
   TOPYU=0.0D0
TOP1U=0.0D0
TOP2U=0.0D0
   TOPYA1=0.0D0
   TOPYA2=0.0D0
TOPYA3=0.0
   DO 400 L=1,M
   YY=Y(L)
C=================== X PART OF THE INTEGRAL BEGINS =================
   FXOU=0.0D0
F1U=0.0D0
F2U=0.0D0

   FXOA1=0.0D0
   FXOA2=0.0D0
FXOA3=0.0
       TOPXU=0.0D0
TOP1U=0.0D0
TOP2U=0.0D0
   TOPXA1=0.0D0
TOPXA2=0.0D0
TOPXA3=0.0D0
II=1
RO=DSQRT(XX*XX+YY*YY)

DO 300 K=1,M
   XX=X(K)*AYIL*1E-10 !CONVERTED TO METERS
yy=y(K)*AYIL*1E-10
FXSU=PSIN(K,L)*xx*PSF(K,L) !USE XX FOR X-POLARIZATION, YY FOR Y
F1SU=PSIN(K,L)*xx*PSIN(K,L)
F2SU=PSF(K,L)*xx*PSF(K,L)

   FXSA1=PSIN(K,L)*PSIN(K,L)
   FXSA2=PSF(K,L)*PSF(K,L)
FXSA3=PSU(K,L)*PSU(K,L)

   TOPXU=TOPXU+(FXSU+FXOU)*(DX)/2.0D0
TOP1U=TOP1U+(F1SU+F1U)*DX/2.0D0
TOP2U=TOP2U+(F2SU+F2U)*DX/2.0D0
   TOPXA1=TOPXA1+(FXSA1+FXOA1)*(DX)/2.0D0
   TOPXA2=TOPXA2+(FXSA2+FXOA2)*(DX)/2.0D0
TOPXA3=TOPXA3+(FXSA3+FXOA3)*(DX)/2.0D0
   FXOU=FXSU
F1U=F1SU
F2U=F2SU
   FXOA1=FXSA1
   FXOA2=FXSA2
FXOA3=FXSA3
300 CONTINUE
C==================== X PART FINISHED ============================
TOPYU=TOPYU+(OTOPXU+TOPXU)*(DY)/2.0D0
TOP1U=TOP1U+(OTOP1U+TOP1U)*DY/2.0D0
TOP2U=TOP2U+(OTOP2U+TOP2U)*DY/2.0D0
       TOPYA1=TOPYA1+(OTOPXA1+TOPXA1)*(DY)/2.0D0
   TOPYA2=TOPYA2+(OTOPXA2+TOPXA2)*(DY)/2.0D0
TOPYA3=TOPYA3+(OTOPXA3+TOPXA3)*(DY)/2.0D0
   OTOPXU=TOPXU
OTOP1U=TOP1U
OTOP2U=TOP2U
       OTOPXA1=TOPXA1
   OTOPXA2=TOPXA2
OTOPXA3=TOPXA3
400 CONTINUE
C%%%%%%%%%%%%%%% NORMALIZE PISI %%%%%%%%%%%%%%%%%%%%%%%%%%%%
   DO K=1,M
   DO L=1,M
   WRITE(2,17)X(K)*AYIL,Y(L)*AYIL,(PSIN(K,L)/DSQRT(TOPYA1))**2,
$(PSF(K,L)/DSQRT(TOPYA2))**2,(PSU(K,L)/DSQRT(TOPYA3))**2,VDC(K,L)
$*RYIL
   ENDDO
   ENDDO
PRINT*, 'TOPYA3',TOPYA3
PRINT*, 'TOPYA2',TOPYA2
PRINT*, 'TOPYA1',TOPYA1
M12=(TOPYU*EYUKU)/(DSQRT(TOPYA1)*DSQRT(TOPYA2))
M11=(TOPYU*EYUKU)/(DSQRT(TOPYA1)*DSQRT(TOPYA1))
M22=(TOPYU*EYUKU)/(DSQRT(TOPYA2)*DSQRT(TOPYA2))

c OSI=(2*MY*9.1093897E-31/(HPLANCK**2))*((EB-EO)*1.6021773E-22)*
c $(((TOPROU)/(DSQRT((TOPYA1))*DSQRT((TOPYA2))))**2)
c   PRINT*, 'M12=',M12
c WRITE(13,*)B,OSI
C WRITE(10,*)B,VS(II)
C PRINT *, 'B=',B, OSI
C PAUSE
C STOP
C////////////////////// OPTICAL TRANSITION ////////////////////////
C########### 1ST AND 3RD ORDER ABSORPTION COEFFICIENT #################

C######################################################################
C########### 1ST AND 3RD ORDER ABSORPTION COEFFICIENT #################
DO 4444 HW=0.0D0,250.0D0,1.0D0
BETA1U=SIGMA*(HW*1.6021773E-22)*(M12*M12)*TZAMAN
BETA1A=CISIK*EPSO*NR*(((((EF-EIN-HW)*1.6021773E-22)**2))+
$(HPLANCK*TZAMAN)**2)

BETA1=BETA1U/BETA1A

BETA3U=INTEN*2.0*SIGMA*((M12)**4)*(HW*1.6021773E-22)*TZAMAN
BETA31U=DABS((M22-M11)/(2.0*M12))**2
BETA32U=(((EF-EIN-HW)*1.6021773E-22)**2)-(HPLANCK*TZAMAN)**2
BETA33U=2*((EF-EIN)*1.6021773E-22)*((EF-EIN-HW)*1.6021773E-22)

BETA3A=CISIK*CISIK*EPSO*EPSO*NR*NR*((((((EF-EIN-HW)*1.6021773E-22)
$)**2)+(HPLANCK*TZAMAN)**2)**2)
BETA31A=((EF-EIN)*1.6021773E-22)**2+(HPLANCK*TZAMAN)**2


BETA3=-(BETA3U/BETA3A)*(1-(((BETA31U)*(BETA32U+BETA33U))/BETA31A))

   TOPBETA= BETA1+BETA3

C***************************************************************
C*************** DIRECT ABSORPTION COEFFICIENT *****************
   KS1U=SIGMA*((EF-EIN-HW)*1.6021773E-22)*((M12)**2)
KS1A=2.0*NR*NR*EPSO*((((EF-EIN-HW)*1.6021773E-22)**2
$)+(HPLANCK*TZAMAN)**2)

   KS1=KS1U/KS1A

KS31U=INTEN*SIGMA*((EF-EIN-HW)*1.6021773E-22)*((M12)**4)
KS31A=NR*NR*NR*EPSO*EPSO*CISIK*((((EF-EIN-HW)*1.6021773E-22)**2
$+(HPLANCK*TZAMAN)**2)**2)

KS32U=DABS((M22-M11)/(2.0*M12))**2
KS33U=((EF-EIN)*1.6021773E-22)*((EF-EIN-HW)*1.6021773E-22)**2
KS34U=((HPLANCK*TZAMAN)**2)*(3*((EF-EIN)*1.6021773E-22)-2*
$(HW*1.6021773E-22))
KS32A=((EF-EIN)*1.6021773E-22)**2+(HPLANCK*TZAMAN)**2
KS33A=(EF-EIN-HW)*1.6021773E-22

KS3=-(KS31U/KS31A)*(1-KS32U*((KS33U-KS34U)/(KS32A*KS33A)))

   TOPKS=KS1+KS3

c   PRINT*,'TOPKISI',TOPKS
WRITE(8,51)HW,BETA1/1e4,BETA3/1e4,TOPBETA/1e4
   WRITE(9,51)HW,Ks1,Ks3,TOPKS
c WRITE(12,*)INTEN/1E7,TOPBETA/1E2
4444 CONTINUE !HW PHOTON ENERGY LOOP, ALPHA LOOP
C################## IMPURITY ATOM ########################################

C XLAMDA=0.1
C   DXLAMDA=0.010D0
C   SAY=0.
C   OEBIN=-1.0D30
C150 CONTINUE
C=====================================================integralin Z kısmı başlıyor==========
C OTOPYU=0.0D0
C   OTOPYA=0.0D0
C
C   TOPZU=0.0D0
C   TOPZA=0.0D0
C DZ=0.10D0

C       DO 500 Z=-R,R,DZ
C IF(ABS(Z).LE.0.000000010D0)GOTO 500
C
C
C
C
C=====================================================integralin y kısmı başlıyor==========
C OTOPXU=0.0D0
C   OTOPXA=0.0D0
C   TOPYU=0.0D0
C TOPYA=0.0D0
C II=1
C
C DO 400 J=1,M
C YY=Y(J)
C====================================================integralin x kısmı başlıyor===========
C   FXOU=0.
C   FXOA=0.
C   TOPXU=0.
C   TOPXA=0.
C
C
C   DO 300 I=1,M
C   XX=X(I)
C   RO1=DSQRT((XX-XI+ALPHA)**2+(YY-YI)**2+Z*Z)
C RO2=DSQRT((XX-XI-ALPHA)**2+(YY-YI)**2+Z*Z)
C           PS=EVEC(II,NEVEC)*DEXP(-DABS(RO1)+DABS(RO2)/(2.0D0)*XLAMDA)

C ATA=((1.0D0/RO1)+(1.0D0/RO2))/2.0D0
C FXSU=(PS*ATA*PS)
C   FXSA=(PS*PS)

C   TOPXU=TOPXU+(FXSU+FXOU)*DX/2.0D0
C TOPXA=TOPXA+(FXSA+FXOA)*DX/2.0D0

C   II=II+1

C   FXOU=FXSU
C   FXOA=FXSA
C WRITE(*,*)'RO1',RO1
C WRITE(*,*)'RO2',RO2
C WRITE(*,*)'PS',PS
C WRITE(*,*)'ata',ata
C WRITE(*,301)Z,XX,YY,TOPXA,TOPXU
C300 CONTINUE
C301 FORMAT(5(2X,F10.6))
C=================================================== x kısmı bitti=========================

C TOPYU=TOPYU+(OTOPXU+TOPXU)*DY/2.0D0
C   TOPYA=TOPYA+(OTOPXA+TOPXA)*DY/2.0D0

C   OTOPXU=TOPXU
C   OTOPXA=TOPXA

C400 CONTINUE
C=================================================== y kısmı bitti=========================

C TOPZU=TOPZU+(OTOPYU+TOPYU)*DZ/2.0D0
C   TOPZA=TOPZA+(OTOPYA+TOPYA)*DZ/2.0D0

C   OTOPYU=TOPYU
C   OTOPYA=TOPYA

C500 CONTINUE
C=================================================== Z kısmı bitti=========================

C SEBIN=-(1.0D0/XLAMDA**2.)+2.0D0*(TOPZU/TOPZA) !bağlanma enerjisiC
C WRITE(*,*)XLAMDA,SEBIN,SAY

C PAUSE
C STOP

C====================================bağlanma enerjisi için hassaslaştırma yapılıyor=======
C IF(SEBIN.LT.OEBIN)THEN
C       IF(SAY.GT.5)GO TO 250
C           DXLAMDA=-DXLAMDA/5.0D0
C           SAY=SAY+1
C   ENDIF

C XLAMDA=XLAMDA+DXLAMDA
C   OEBIN=SEBIN

C   GO TO 150

C250 CONTINUE
C===========================================bağlanma enerjisi daha hassas bulundu=========
C========================================
C CALL TIME(char_time)
C WRITE(*,*)'TIME3=', char_time
C WRITE(7,*)ttb,SEBIN*RYIL
C WRITE(*,*)TTB,SEBIN*RYIL

C========================================

C700 CONTINUE

51 FORMAT(4(1X,F15.11))
511 FORMAT(5(2X,F25.19))
17 FORMAT(6(2X,F20.14))
16 FORMAT(3(2X,F14.8))
188 FORMAT(5(2X,F14.8))
C PAUSE
C STOP
c5555 CONTINUE
PAUSE
   STOP
   END
C=======================================================
C============================ FUNCTIONS ================
C=======================================================
C=======================================================
   FUNCTION VVO(XX,YY,RYIL,AYIL,R1,R2,TB,RIC,LA,VO)
   IMPLICIT REAL*8 (A-H,O-Z)
REAL*8 LA
c==================== deltoid bariyerli ==================
c rdis=150.0/ayil
c if (abs(xx).ge.(abs(rdis)-abs(yy)))vvo=vo
c if(abs(xx).lt.(abs(rdis)-abs(yy)).and.abs(xx).gt.(abs(ric+tb)-
c $abs(yy)))vvo=0.0
c if(abs(xx).le.(abs(ric+tb)-abs(yy)).and.abs(xx).ge.(abs(ric)-
c $abs(yy)))vvo=vo
c if(abs(xx).lt.(abs(ric)-abs(yy)))VVO=0.0

C============== SQUARE ===================================
IF (ABS(XX).Le.(LA/2.0).AND.ABS(YY).Le.(LA/2.0))THEN
VVO=0.0
ELSE
VVO=VO
END IF

C******************** UCGEN*****************************************
c IF(ABS(XX).LT.(100/AYIL))THEN
c IF(ABS(yy).LE.ABS((50./AYIL)+(xx/2.0)))THEN
c VVO=0.0
c ELSE
c VVO=VO
c END IF
c ELSE
c VVO=VO
c END IF

c////////////////////DELTOİD/////////////////////////////////////////
c IF(ABS(YY).LT.(LA/SQRT(2.0)))THEN
c IF(ABS(XX).GT.ABS((LA/SQRT(2.0))-ABS(YY)))THEN
c VVO=VO
c ELSE
c VVO=0.0
c END IF
c ELSE
c VVO=VO
c END IF

C333333333333333333333333333333 DOUBLE KARE 333333333333333333333333333333333333333
C IF(ABS(YY).LT.(75./AYIL))THEN
C IF(ABS(XX).LT.(150.0/AYIL).AND.ABS(XX).GT.(50./AYIL))THEN
C VVO=0.0
C ELSE
C VVO=VO
C END IF
C ELSE
C VVO=VO
C ENDIF



C22222222222222222222222222222222 SILINDIR barıyerli2222222222222222222222222
C   RO=SQRT((XX)**2+(YY)**2)
C IF(RO.GE.R2) VVO=VO
C IF(RO.LT.R2.AND.RO.GT.(R1+TB))VVO=0.0
C IF(RO.LE.(R1+TB).AND.RO.GE.R1)VVO=VO
C IF(RO.LT.R1) VVO=0.0
C111111111111111111111 1SİLİNDİR 1111111111111111111111111111111
c    RO=SQRT((XX)**2+(YY)**2)
c    IF(RO.GT.(LA/2.0))THEN
c VVO=VO
c ELSE
c VVO=0.0D0
c ENDIF
C000000000000000000000 PARABOLIC 00000000000000000000000000000
C   IF(RRO.GE.RIC)THEN
C               VVO=VO
C   ELSE
C               VVO=VO*((1/RIC)**2)*RRO*RRO
C               ENDIF
C0000000000000000000 PETEK 00000000000000000000000000000000000000000000
C VVO=VO
C A1=150.0/AYIL !X1 KOORDINATI
C A2=-150.0/AYIL!X2 KOORDINATI
C A3=-150.0/AYIL!X3 KOORDINATI
C A4=150.0/AYIL!X4 KOORDINATI
C B1=150.0/AYIL!Y1 KOORDINATI
C B2=150.0/AYIL!Y2 KOORDINATI
C B3=-150.0/AYIL!Y3 KOORDINATI
C B4=-150.0/AYIL!Y4 KOORDINATI
C R1=25.0/AYIL!1 NOLU DAIRE Y CAPI
C R2=25.0/AYIL!2 NOLU DAIRE Y CAPI
C R3=25.0/AYIL!3 NOLU DAIRE Y CAPI
C R4=25.0/AYIL!4 NOLU DAIRE Y CAPI
C R5=50.0/AYIL!4 NOLU DAIRE Y CAPI
C
C RO=SQRT((XX)**2+(YY)**2)
C RO1=SQRT((XX-A1)**2+(YY-B1)**2)
C RO2=SQRT((XX-A2)**2+(YY-B2)**2)
C RO3=SQRT((XX-A3)**2+(YY-B3)**2)
C RO4=SQRT((XX-A4)**2+(YY-B4)**2)
C IF(RO1.LT.R1)VVO=0.0
C IF(RO2.LT.R2)VVO=0.0
C IF(RO3.LT.R3)VVO=0.0
C IF(RO4.LT.R4)VVO=0.0
C IF(RO.LT.r5)VVO=0.0


RETURN
   END

↧

mkl(dgemm) performance problems on "superlarge" processors

February 5, 2019, 1:13 pm

Hi,

I was running two consecutive dgemm operations, T=AB and C=A'T, with A=(56,000x400,000), B=(400,000x30), T=(56,000x30), and C of the same size as B.

Depending on the CPU, I measured these wall clock times (for the dgemm operations only):

Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz with 36 (real) cores, 46080 KB cache, 250GB of RAM

T=AB: 3.73 seconds,

C=A'T: 4.17 seconds

Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 56 (real) cores, 19712 KB cache, 2TB of RAM

T=AB: 91.47 seconds

C=A'T: 232.78 seconds

What was particularly striking was that T=AB used all 56 cores, whereas C=A'T used only half of them.

The KMP setting was: KMP_AFFINITY=compact,1,0,granularity=fine

I am wondering whether the poor performance of the latter CPU is solely attributable to its architecture and therefore set in stone, or whether I can tune the MKL/KMP environment variables to improve it.

Thanks

↧

Eigenvalue solver dfeast_syev does not find values (info=1)

February 6, 2019, 4:31 am

I am trying to use dfeast_syev to find the eigenvectors of a 4x4 matrix. I chose FEAST because I found that other methods give incorrect values. I call the routine as follows:

dfeast_syev(&uplo, &N, MM, &lda, fpm, epsout, &loop, &emin, &emax, N, (**EVal)->elm, (**EVec)-> elm, (MKL_INT*)&m0, (**res)->elm, &info);

N=4

lda=4

uplo='F'

emin=0.1

emax=10

m0=4

fpm - default

MM
       [0]    1.0000000000000000   double
       [1]    0.0000000000000000   double
       [2]    0.0000000000000000   double
       [3]    0.0000000000000000   double
       [4]    0.0000000000000000   double
       [5]    1.0000000000000000   double
       [6]    0.0000000000000000   double
       [7]    0.0000000000000000   double
       [8]    0.0000000000000000   double
       [9]    0.0000000000000000   double
       [10]   1.0000000000000000   double
       [11]   0.0000000000000000   double
       [12]   0.0000000000000000   double
       [13]   0.0000000000000000   double
       [14]   0.0000000000000000   double
       [15]   1.0000000000000000   double

The result is info=1, m0=0, loop=0. As far as I can tell, this is correct usage, and the answer should be 4 degenerate eigenvalues equal to 1.

If I change lda to 5 and add 4 zeros to the matrix, so that it becomes 5x4, the routine finds 1 correct eigenvalue.

What is going on? Why do I have to change lda to 5?

↧

DGELSS Issue

February 7, 2019, 2:18 am

I am attempting to use this routine but am getting an exception:

Exception thrown at 0x01FA5428 (mkl_core.dll) in Console1.exe: 0xC0000005: Access violation reading location 0x00000000.

I wonder if anyone knows how I might fix this.

Thanks, Angus.

Attachment	Size
Download ISSUE.png	192.76 KB

↧

PARDISO, How to escape the occurrence of zero pivot error

February 6, 2019, 8:41 pm

Hi everyone,

I get the zero pivot error (-4), but I know that it is caused by the conditions of my calculation.

So I would like to obtain a solution using zero pivot replacement (the value of i_parm(10)) without stopping the calculation.

Is this possible, and how should I set the parameters?

↧

mkl_set_num_threads() doesn't work

February 8, 2019, 10:43 am

Hello everyone!

I am using the MXNet library with MKL support, and I need my MXNet predictor to use only one thread for its calculations. Setting the environment variable OMP_NUM_THREADS = 1 works fine for me. Then I read in the MKL documentation that there is a dedicated function, mkl_set_num_threads(), that is equivalent to OMP_NUM_THREADS. So I removed the environment variable and instead called mkl_set_num_threads() before creating and using the predictor. But it had no effect: my predictor still used more than one thread, as if the call had been ignored. Maybe I am missing something. Any ideas? Thanks.

↧

Rare crashes on MKL

February 9, 2019, 10:52 pm

I have implemented an image processing algorithm in C++ using (among other things) the FFTW wrappers in the MKL library (version 2018.3.210).

I am working on an x64 machine with an Intel Xeon E5-1650 v3 3.5 GHz processor and Windows 7 as the OS.

I used MS Visual Studio 2015 as my IDE for development and debugging; the application is multi-threaded via the C++11 <thread> library.

When running the application over and over again, I see that it crashes in about 1% of the runs.

When I attached it to my IDE and looked at the crash dumps, I saw that the crashes always occur on the call:

thePlan = fftw_plan_many_dft(.....);

with either the exception "Unhandled exception at someaddress (mkl_avx2.dll) in MyApp.exe: 0xC0000005: access violation reading location 0x0000000000000"

or the exception "Unhandled exception at someaddress (mkl_avx2.dll) in MyApp.exe: 0xC0000005: access violation reading location 0x0000000000018"

1. I have checked that all inputs to the call are valid (pointers were allocated with fftw_malloc(), and the other inputs have legitimate sizes and types).

2. I have run the application with Windows ApplicationVerifier attached to my IDE and got no warnings or errors.

3. I have run the application with Windows global flags attached to my IDE with all possible heap corruption checks and got no exceptions.

What else can I do to debug these crashes?

↧

Question about the mkl_?omatcopy

February 11, 2019, 4:34 am

I can not understand the manual about the following parameters of the function.

rows    The number of rows in matrix B (the destination matrix).

cols    The number of columns in matrix B (the destination matrix).

ldb     If ordering = 'R' or 'r', ldb represents the number of elements in array b between adjacent rows of matrix B.

        • If trans = 'T' or 't' or 'C' or 'c', ldb must be at least equal to rows.
        • If trans = 'N' or 'n' or 'R' or 'r', ldb must be at least equal to cols.

        If ordering = 'C' or 'c', ldb represents the number of elements in array b between adjacent columns of matrix B.

        • If trans = 'T' or 't' or 'C' or 'c', ldb must be at least equal to cols.
        • If trans = 'N' or 'n' or 'R' or 'r', ldb must be at least equal to rows.

Please see this code from the official MKL examples:

#include <stdio.h>
#include "mkl.h"

/* print_matrix() is a helper from the MKL examples' common code (not shown here). */

int main(int argc, char *argv[])
{
    size_t n=3, m=5;
    double src[] = {
        1.,   2.,   3.,   4.,   5.,
        6.,   7.,   8.,   9.,   10.,
        11., 12., 13., 14., 15.
    }; /* source matrix */
    double dst[8]; /* destination matrix */
    size_t src_stride = 5;
    size_t dst_stride = 2;

    printf("\nThis is example of using mkl_domatcopy\n");

    printf("INPUT DATA:\nSource matrix:\n");
    print_matrix(n, m, 'd', src);

    /*
    ** Source submatrix(2,4) a will be transposed
    */
    mkl_domatcopy('R'        /* row-major ordering */,
                  'T'        /* A will be transposed */,
                  2          /* rows */,
                  4          /* cols */,
                  1.         /* scales the input matrix */,
                  src        /* source matrix */,
                  src_stride /* src_stride */,
                  dst        /* destination matrix */,
                  dst_stride /* dst_stride */);
    /* New matrix: dst = {
    **      1, 6,
    **      2, 7,
    **      3, 8,
    **      4, 9,
    **    }
    */
    printf("OUTPUT DATA:\nDestination matrix:\n");
    print_matrix(4, 2, 'd', dst);

    return 0;
}

The new matrix has 4 rows and 2 columns, but the code passes rows = 2 and cols = 4.

If the code is correct, rows should be described as the number of rows of the destination matrix without the operation (that is, before the transposition is applied).
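To make the two readings concrete, here is a minimal plain-C sketch of a row-major transposed copy. It is not MKL's implementation, and the name ref_omatcopy_row_trans is made up for illustration. It assumes that rows and cols describe the source submatrix, which is the reading that reproduces the 4 x 2 result shown in the example's comment and agrees with the quoted rule that ldb must be at least rows when trans = 'T' in row-major ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not part of MKL): row-major, transposed,
 * scaled out-of-place copy. rows/cols are the dimensions of the SOURCE
 * submatrix, so the destination has cols rows and rows columns. */
void ref_omatcopy_row_trans(size_t rows, size_t cols, double alpha,
                            const double *a, size_t lda,
                            double *b, size_t ldb)
{
    assert(lda >= cols); /* each source row holds at least cols elements */
    assert(ldb >= rows); /* each destination row holds at least rows elements */
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            /* element (i, j) of the source lands at (j, i) of the destination */
            b[j * ldb + i] = alpha * a[i * lda + j];
        }
    }
}
```

Called with rows = 2, cols = 4, lda = 5 and ldb = 2 on the src array from the example, this fills dst with the pairs (1, 6), (2, 7), (3, 8), (4, 9), the same layout the example prints.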

Would you please look into the manual and give me some advice?

Thanks.

↧

How to access the number of non-zero elements in sparse_matrix_t?

February 9, 2019, 4:31 pm

I am writing a Python wrapper for calling the 'mkl_sparse_spmm' function.

In order to export the result of matrix-matrix multiplication to a Python object, I need to know the size of the 'col_idx' or 'values' array in the MKL export routines. How could I get a hold of it?

Incidentally, the documentation for the 'mkl_sparse_spmm' function does not state the format of the returned matrix 'C'. Is it the same format as the matrix 'A' (or 'B')?
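One possible approach, sketched here under assumptions: if the result is exported with mkl_sparse_d_export_csr, the routine hands back rows_start and rows_end arrays, and the length of the column-index and values arrays can be derived from them. The helper below is plain C with no MKL dependency; csr_nnz is a made-up name, it uses int where real code would use MKL_INT, and it assumes the rows are stored contiguously and in order.

```c
#include <assert.h>

/* Hypothetical helper: number of stored entries in a 4-array CSR matrix.
 * rows_start[i] and rows_end[i] delimit row i inside the column-index and
 * values arrays. The index base (0 or 1) cancels out in the difference. */
int csr_nnz(int rows, const int *rows_start, const int *rows_end)
{
    assert(rows > 0);
    return rows_end[rows - 1] - rows_start[0];
}
```

A Python wrapper could compute the same difference from the exported arrays before copying the data out.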

↧

Downloading old versions of MKL

February 11, 2019, 10:51 am

Disregard, I figured it out.

↧

Intel Pardiso error numerical factorization

February 12, 2019, 6:56 am

Dear Pardiso users,

I am using Intel Pardiso to solve a sparse system. For small matrices it works perfectly and very fast. However, with increasing system size I found that for some sets of parameters the numerical factorization fails. Since I have to solve the system several times and the system does not change much from step to step, I use the CGS algorithm (iparm(4) = 91). Now the following problem occurs: as a first step the system is solved directly (iparm(4) = 0), and the following error is produced:

*** error PARDISO: iterative refinement contraction rate is greater than 0.9, interrupt

Unfortunately, neither the documentation nor the forum/Intel website provides any further information about this error. Below are the iparm parameters I used. They are almost the default values, except that iparm(4) is set to perform CGS steps.

call mkl_set_dynamic ( 0 )

 iparm(1)  = 1  ! do not use default values
 iparm(2)  = 3  ! fill-in reordering from METIS
 iparm(3)  = 1  ! Number of processors
 iparm(4)  = 91 ! iterative-direct algorithm
 iparm(8)  = 10 ! Max. number of iterative refinement steps on entry
 iparm(10) = 13 ! perturb the pivot elements with 1E-13
 iparm(11) = 1  ! use nonsymmetric permutation and scaling MPS
 iparm(13) = 1  ! Improved accuracy using nonsymmetric weighted matching
 iparm(21) = 1  ! Apply 1x1 diagonal pivoting during the factorization process
 iparm(24) = 1  ! Parallel factorization control
 iparm(25) = 1  ! Parallel forward/backward solve control
 iparm(27) = 1  ! checks whether column indices are sorted in increasing order within each row

I would like to share code that reproduces the problem; however, the matrices are 1.2 GB in size. As I wrote, the problem only occurs for large systems. Below is an extract of my code that solves the system.

ik = 0
cgsxcounter = 0

do

  ik = ik + 1
  if (ik.ge.3) then
    write(*,*) 'no solution found'
    stop
  end if

  if (cgsxcounter.eq.0) then

    iparm(4) = 0

    !Release all memory
    phase = -1
    call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, ddum, idum, idum, idum, nrhs, iparm, msglvl, ddum, ddum, error)

    !Reordering and Symbolic Factorization, This step also allocates all memory that is necessary for the factorization
    phase = 11 ! only reordering and symbolic factorization
    call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, error)
    if (error.ne.0) write(*,*) 'Reordering and Symbolic Factorization wrong: ', error

    cgsxcounter=1

  end if

  !Factorization.
  phase = 22 ! only factorization
  call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, error)
  if (error.ne.0) stop

  !Back substitution and iterative refinement
  phase = 33 ! only substitution
  call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, error)

  if (iparm(20).lt.0) then
    write(*,*) 'Try again'
    cgsxcounter = 0
  else
    exit
  end if

end do

iparm(4) = 91

It would be nice if you could provide more information about the problem. What can cause such an error? What can be done to avoid it?
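Independently of what the solver reports, one generic diagnostic is to compute the residual of the returned solution yourself. The sketch below is plain C and does not call PARDISO; csr_relative_residual is a made-up name. It assumes the full matrix is stored in CSR with 1-based IA/JA arrays as in the Fortran extract above (so it does not apply to triangle-only symmetric storage), and it uses int indices where a pardiso_64 setup would use 64-bit integers.

```c
#include <assert.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical helper: relative residual max|b - A*x| / max|b| for a CSR
 * matrix with 1-based row pointers ia (n + 1 entries) and column indices ja. */
double csr_relative_residual(int n, const int *ia, const int *ja,
                             const double *val, const double *x,
                             const double *b)
{
    assert(n > 0);
    double rmax = 0.0, bmax = 0.0;
    for (int i = 0; i < n; i++) {
        double r = b[i];
        /* subtract row i of A times x; shift the 1-based indices to 0-based */
        for (int k = ia[i] - 1; k < ia[i + 1] - 1; k++)
            r -= val[k] * x[ja[k] - 1];
        if (abs_d(r) > rmax) rmax = abs_d(r);
        if (abs_d(b[i]) > bmax) bmax = abs_d(b[i]);
    }
    /* fall back to the absolute residual when b is all zeros */
    return bmax > 0.0 ? rmax / bmax : rmax;
}
```

A large value after the direct solve would suggest that the factorization itself is inaccurate for that parameter set, rather than only the refinement step.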

Thanks in advance,

Horst

↧

Finding the eigenvalues (diagonalizing) of a block-diagonal matrix

February 13, 2019, 4:36 am

I have to diagonalize a large matrix, which takes a lot of time. The matrix size is 10,000 x 10,000.

This matrix is the Hamiltonian of a spin system, which has some block structure. Is there a way to diagonalize the full matrix by diagonalizing each block?

Basically I want to

1. permute the matrix to reduce it to a block structure

2. diagonalize each block.
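Step 1 can be sketched as finding the connected components of the matrix's nonzero pattern. The code below is only an illustration under assumptions: it takes a dense, symmetric, row-major matrix, treats an entry as a coupling when it is exactly nonzero (real data may need a tolerance), and the names find_root and find_blocks are made up. Indices that end up with the same label belong to the same block.

```c
#include <assert.h>

/* Follow parent links to the representative of i's block. */
static int find_root(int *parent, int i)
{
    while (parent[i] != i) {
        parent[i] = parent[parent[i]]; /* path halving */
        i = parent[i];
    }
    return i;
}

/* Hypothetical helper: label[i] receives the block id of basis index i for
 * a dense symmetric n x n row-major matrix h. Returns the number of blocks. */
int find_blocks(int n, const double *h, int *label)
{
    assert(n > 0);
    /* label[] first serves as the union-find parent array */
    for (int i = 0; i < n; i++)
        label[i] = i;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (h[i * n + j] == 0.0)
                continue;
            int ri = find_root(label, i);
            int rj = find_root(label, j);
            /* attach the larger root to the smaller one, so every parent
             * link points to a smaller index */
            if (ri < rj)
                label[rj] = ri;
            else if (rj < ri)
                label[ri] = rj;
        }
    }
    /* Convert parent links to consecutive block ids. Parents have smaller
     * indices, so label[p] already holds a block id when i is reached. */
    int nblocks = 0;
    for (int i = 0; i < n; i++) {
        int p = label[i];
        label[i] = (p == i) ? nblocks++ : label[p];
    }
    return nblocks;
}
```

Sorting the indices by label gives the permutation; each block can then be passed to a symmetric eigensolver on its own, and the eigenvalues of the full matrix are the union of the block eigenvalues.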

I would appreciate any help.

Similar question for Mathematica: https://mathematica.stackexchange.com/questions/170008/finding-the-eigen...

Thanks.

↧

Using FMA in MKL routines

February 13, 2019, 7:03 am

Hey everyone,

I couldn't find any old topics that dealt with this question in detail, so here I am asking it again: is there a way to enable FMA math when using the MKL routines? Here is a sample routine that when run on MSVC 2017 with the latest MKL version (details in the output below) and an AVX2 processor DOES NOT use FMA:

#include <stdio.h>
#include <math.h>
#include "mkl.h"

void print_mkl_info() {
    MKLVersion Version;
    mkl_get_version(&Version);
    printf("Major version:           %d\n",Version.MajorVersion);
    printf("Minor version:           %d\n",Version.MinorVersion);
    printf("Update version:          %d\n",Version.UpdateVersion);
    printf("Product status:          %s\n",Version.ProductStatus);
    printf("Build:                   %s\n",Version.Build);
    printf("Platform:                %s\n",Version.Platform);
    printf("Processor optimization:  %s\n",Version.Processor);
    printf("================================================================\n");
    printf("\n");
}

float standard_dot_product(float* a, float* b) {
    float c = 0.0f;
    for (int i = 0; i < 4; i++) {
        c = c + (a[i] * b[i]);
    }
    return c;
}

float standard_fma_dot_product(float* a, float* b) {
    float c = 0.0f;
    for (int i = 0; i < 4; i++) {
        c = fmaf(a[i], b[i], c);
    }
    return c;
}

float mkl_dot_product(float* a, float* b) {
    return cblas_sdot(4, a, 1, b, 1);
}

int main() {
    print_mkl_info();
    float a[4] = { 1.907607, -.7862027, 1.148311, .9604002 };
    float b[4] = { -.9355000, -.6915108, 1.724470, -.7097529 };
    printf("Standard dot product is:     %.23f\n", standard_dot_product(a, b));
    printf("Standard FMA dot product is: %.23f\n", standard_fma_dot_product(a, b));
    printf("MKL dot product is:          %.23f\n", mkl_dot_product(a, b));
    return 0;
}

The above program outputs (compiled with FP:FAST and O2. Note that changing O2 to O1 changes the result of the standard_dot_product function, but not of the CBLAS routine):

Major version:           2019
Minor version:           0
Update version:          2
Product status:          Product
Build:                   20190118
Platform:                32-bit
Processor optimization:  Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

Standard dot product is:     0.05768233537673950195313
Standard FMA dot product is: 0.05768235772848129272461
MKL dot product is:          0.05768233537673950195313

So is there any way to generate results with FMA in such cases? Or am I being a knobhead and missing something?
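For readers wondering why the two numbers in the output can differ at all, here is a small sketch. It does not show how to enable FMA inside MKL. It only contrasts a product that is rounded to float before the addition with one that is kept unrounded. The function names are made up, and fused_like emulates the fused behaviour through double arithmetic rather than a hardware FMA instruction, so in rare cases it can round differently from a true fmaf.

```c
#include <assert.h>

/* Multiply, round the product to float, then add: two roundings. */
float mul_then_add(float a, float b, float c)
{
    volatile float p = a * b; /* volatile forces the rounded product to be stored */
    return p + c;
}

/* Keep the product unrounded before the addition, as a fused multiply-add
 * does. A float*float product is exact in double (24 + 24 bits fit in 53). */
float fused_like(float a, float b, float c)
{
    double exact = (double)a * (double)b;
    return (float)(exact + (double)c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. Rounding it to float drops the last term, so mul_then_add returns 0, while fused_like returns 2^-24.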

THANKS!

Swat

↧

dfeast_sygv -4 error, BUT B IS POSITIVE DEFINITE!!!

February 13, 2019, 7:18 am

Hello all,

I am using the extended eigensolver routines, specifically the dfeast_sygv function, and I get the following error:

==>INFO code =: -4
Intel MKL Extended Eigensolvers Error: Matrix B is not positive definite.

But the matrix B is positive definite!!!

B =

+2.222e-01      +0.000e+00      +1.111e-01      +0.000e+00      +1.111e-01      +0.000e+00      +5.556e-02      +0.000e+00
+0.000e+00      +2.222e-01      +0.000e+00      +1.111e-01      +0.000e+00      +1.111e-01      +0.000e+00      +5.556e-02
+1.111e-01      +0.000e+00      +2.222e-01      +0.000e+00      +5.556e-02      +0.000e+00      +1.111e-01      +0.000e+00
+0.000e+00      +1.111e-01      +0.000e+00      +2.222e-01      +0.000e+00      +5.556e-02      +0.000e+00      +1.111e-01
+1.111e-01      +0.000e+00      +5.556e-02      +0.000e+00      +2.222e-01      +0.000e+00      +1.111e-01      +0.000e+00
+0.000e+00      +1.111e-01      +0.000e+00      +5.556e-02      +0.000e+00      +2.222e-01      +0.000e+00      +1.111e-01
+5.556e-02      +0.000e+00      +1.111e-01      +0.000e+00      +1.111e-01      +0.000e+00      +2.222e-01      +0.000e+00
+0.000e+00      +5.556e-02      +0.000e+00      +1.111e-01      +0.000e+00      +1.111e-01      +0.000e+00      +2.222e-01

Using the dfeast_syev function on B gives the following eigenvalues:

+5.556e-02
+5.556e-02
+1.667e-01
+1.667e-01
+1.667e-01
+1.667e-01
+5.000e-01
+5.000e-01

All are positive.
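The claim can also be cross-checked without any eigensolver. The sketch below is plain C, not an MKL routine, and is_positive_definite is a made-up name. It attempts an unpivoted LDL^T factorization of a symmetric matrix stored in full and reports whether every pivot is positive, which for a symmetric matrix is equivalent to positive definiteness. B_FROM_POST holds the rounded values printed above.

```c
#include <assert.h>

#define NMAX 64

/* B as printed in the post (values rounded to four digits), row-major. */
static const double B_FROM_POST[64] = {
    2.222e-01, 0.0, 1.111e-01, 0.0, 1.111e-01, 0.0, 5.556e-02, 0.0,
    0.0, 2.222e-01, 0.0, 1.111e-01, 0.0, 1.111e-01, 0.0, 5.556e-02,
    1.111e-01, 0.0, 2.222e-01, 0.0, 5.556e-02, 0.0, 1.111e-01, 0.0,
    0.0, 1.111e-01, 0.0, 2.222e-01, 0.0, 5.556e-02, 0.0, 1.111e-01,
    1.111e-01, 0.0, 5.556e-02, 0.0, 2.222e-01, 0.0, 1.111e-01, 0.0,
    0.0, 1.111e-01, 0.0, 5.556e-02, 0.0, 2.222e-01, 0.0, 1.111e-01,
    5.556e-02, 0.0, 1.111e-01, 0.0, 1.111e-01, 0.0, 2.222e-01, 0.0,
    0.0, 5.556e-02, 0.0, 1.111e-01, 0.0, 1.111e-01, 0.0, 2.222e-01
};

/* Hypothetical helper: returns 1 if the symmetric n x n matrix a (row-major,
 * full storage, only the lower triangle is read) has all positive LDL^T
 * pivots, and 0 otherwise. */
int is_positive_definite(int n, const double *a)
{
    assert(n > 0 && n <= NMAX);
    double l[NMAX][NMAX];
    double d[NMAX];
    for (int j = 0; j < n; j++) {
        double dj = a[j * n + j];
        for (int k = 0; k < j; k++)
            dj -= l[j][k] * l[j][k] * d[k];
        if (dj <= 0.0)
            return 0; /* non-positive pivot: not positive definite */
        d[j] = dj;
        for (int i = j + 1; i < n; i++) {
            double s = a[i * n + j];
            for (int k = 0; k < j; k++)
                s -= l[i][k] * l[j][k] * d[k];
            l[i][j] = s / dj;
        }
    }
    return 1;
}
```

If this check passes on the array that is actually handed to dfeast_sygv, the message may have more to do with how B is passed (storage order, leading dimension, which triangle is referenced) than with B itself; that is only a guess and not something the post confirms.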

Here is the matrix A for the generalized eigenvalue problem:

A =

+4.569e+02      +0.000e+00      -3.046e+02      +0.000e+00      +7.616e+01      +0.000e+00      -2.285e+02      +0.000e+00
+0.000e+00      +4.569e+02      +0.000e+00      +7.616e+01      +0.000e+00      -3.046e+02      +0.000e+00      -2.285e+02
-3.046e+02      +0.000e+00      +4.569e+02      +0.000e+00      -2.285e+02      +0.000e+00      +7.616e+01      +0.000e+00
+0.000e+00      +7.616e+01      +0.000e+00      +4.569e+02      +0.000e+00      -2.285e+02      +0.000e+00      -3.046e+02
+7.616e+01      +0.000e+00      -2.285e+02      +0.000e+00      +4.569e+02      +0.000e+00      -3.046e+02      +0.000e+00
+0.000e+00      -3.046e+02      +0.000e+00      -2.285e+02      +0.000e+00      +4.569e+02      +0.000e+00      +7.616e+01
-2.285e+02      +0.000e+00      +7.616e+01      +0.000e+00      -3.046e+02      +0.000e+00      +4.569e+02      +0.000e+00
+0.000e+00      -2.285e+02      +0.000e+00      -3.046e+02      +0.000e+00      +7.616e+01      +0.000e+00      +4.569e+02

Can anybody help me here?

Am I getting something wrong?

I'm using Linux (Debian testing).

Thanks in advance.

P.S. Sorry about my English.

↧