Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

dll files to include with the program

Dear Intel support,

I am linking my program with the Single Dynamic Library (mkl_rt.dll). Could you please let me know exactly which DLLs from the redistributable folder we should include with our software? Currently it fails to run unless we include all of the DLLs in the program folder.

Thanks,

 


Calls using FFTW wrapper don't seem to be thread safe

Hi,

I am attempting to add an OpenMP layer of parallelism to an MPI code written in Fortran. However, I am not doing this correctly, as it does not appear to be thread safe. I can say this with some confidence because adding an !$OMP CRITICAL region around the library call results in a correct answer. I believe that this is a result of the FFTW wrappers.

Having looked at the MKL documentation, I see a mention of "fftw3_mkl.number_of_user_threads". I imagine that this would resolve the issue, but I haven't been able to implement it successfully. Is there any chance anyone has working sample code (written in Fortran) that shows the correct use of this object?

To clarify: I am attempting to execute in parallel the iterations of a loop that contains calls to the FFT routines; this is essentially serial FFTs in parallel. I realise that there is an argument for using parallel FFTs here, but that would require a substantially larger coding effort given the code surrounding the FFT library calls. That will be my next job.

Many thanks,

Karl

MKL - different result in each processor?

Hello,

I am having problems with an MPI code using Intel MKL and ifort (Composer version: 13.1.0.146). Each processor has exactly the same matrix and should perform the same sequential operations. Each processor is expected to obtain exactly the same values, since they use the same binaries and the same libraries, and each node is in fact identical (2 Sandy Bridge EP E5-2670 processors in each node). However, routines such as CGEMM and CGESVD produce slightly different values on each processor, with a variation of the order of 1e-6 to 1e-8. This does not always happen, and it seems to depend on the number of processors being used.

Is this behaviour expected at all? The difference is below the machine precision (considering single precision), but aren't the individual cores supposed to perform the roundoffs in the same manner? If this behaviour is not expected, I could provide some example matrices.
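
One general mechanism that can produce differences of this size, independent of MKL, is that floating-point addition is not associative: if a reduction is grouped differently (for example because of a different thread count or data alignment), the rounding can differ. The sketch below only illustrates that effect in plain Python with double precision; it makes no claim about what CGEMM or CGESVD do internally.

```python
# Floating-point addition is not associative: the same three numbers,
# added with a different grouping, can round to different results.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # group the first two terms first
right = a + (b + c)   # group the last two terms first

print(left == right)        # the two groupings disagree in the last bit
print(abs(left - right))    # the gap is on the order of one rounding error
```

The same effect appears in single precision at a correspondingly larger scale, which is why bitwise-identical results generally require a fixed order of operations.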

Thanks in advance

pardiso produces strange result

Hi all,

MKL-pardiso gives back a totally wrong solution vector for one of my smaller test cases in C with a sparse unsymmetric 96x96 matrix. The solution clearly satisfies A*x != b.

Attached is the example file pardiso_unsym_c.c from the MKL example collection in which I replaced the data for ia[], ja[] and a[] with my matrix data. There are no other changes in the program, except that I print the solution vector at the end. For me this program gives a wrong result.

For comparison I also attach a matlab script solving the linear system based on the identical matrix data (copy-pasted from the C-code). The matrix has a decent condition number and matlab gives the correct result.

I tested the C-code with MKL-pardiso using different number of threads, different platforms (Mac OSX and Linux), as well as using the original pardiso version of pardiso-project.org, but the error persisted (different numbers, though). An increased value of iparm[7] for iterative refinement also shows no improvement. 

On the other hand, trying the matrix data with other libraries, like 'eigen' or 'SuperLU' gives correct results. 

I would be more than happy if anybody had any suggestions on this problem or could point out some mistake on my side...

Best regards & thanks,
Manuel

Segfault when using threaded 1D DFT on AVX platforms

Hi,

I've recently ported some code to a 64-bit CentOS 6 server that supports AVX instructions, and I think I have encountered a bug in the MKL DFT routines when threading is enabled. When I try to take an 80640-point complex 1D forward DFT, I get a segfault if I call mkl_set_num_threads with any number greater than 1, yet the code works fine with mkl_set_num_threads(1). I'm not sure whether this has been documented or encountered by others, but for me it seems to be limited to my 64-bit AVX platform: when I compile on a 64-bit SSE4.2 platform, the code runs fine with no segfault. I've attached the test code that I've been running to debug. For reference, I am compiling with:

icpc -O3 -xHost test.cpp -openmp -liomp5 -lpthread -lm -lmkl_core -lmkl_intel_lp64 -lmkl_intel_thread

Here are my system stats:

Compiler: Intel Composer XE 2013.0.79 (MKL v11.0)

OS: 64-bit Linux CentOS 6.4

CPU: Xeon E5-2690@2.9GHz

Also, when I run the core dump through gdb, I get the following back-trace:

mkl_dft_avx_xc_4step_1_2 ()

step1234 ()

ttl_parallel_team ()

L_kmp_invoke_pass_parms ()

Is this a bug or am I just doing something wrong with my DFT?

Thanks,

Nick

Attachment: test.cpp (1.15 KB)

Looks like a function is missing

Hello!

Any help will be highly appreciated.

A strange thing occurs when I try to build a custom DLL.

The make command I use looks like this: nmake ia32 buf_lib= export=func_list name=mkl_1

where func_list contains only one function:

DGEMM

And everything works perfectly.

Then I add one more function to func_list so it looks like this:

DGEMM
DGEMV

and build another DLL:

nmake ia32 buf_lib= export=func_list name=mkl_2

The matrix-matrix product still works, but the dgemv function does not seem to work.

Matrix A is a 10x10 matrix containing 1 in each position.

B is a 10-dimensional vector containing 1 in each position.

C is a 10-dimensional vector containing 1 in each position.

This call of dgemv

CBLAS.dgemv(CBLAS.TRANSPOSE.NoTrans, 10, 10, 25.0, A, 10, B, 10, 11.0, C, 10); (C := 25*A*B+11*C)

is successful and shows no link errors, which means that a new entry point was created.

However, this call should change the C vector and put 261 (25*10 + 11) in each position, but it doesn't. It leaves C unchanged.
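
For the all-ones data described above, the definition C := alpha*A*B + beta*C gives 25*10 + 11 = 261 in every position. The sketch below is a plain-Python model of that definition, not MKL code; the function name dgemv_notrans is hypothetical. Note that in the standard CBLAS signature the argument following each vector is its increment (stride), and the call above passes 10 there. The sketch uses an increment of 1, which is what two contiguous 10-element vectors would need; whether the C# wrapper maps its arguments the same way is an assumption.

```python
def dgemv_notrans(m, n, alpha, a, lda, x, incx, beta, y, incy):
    """Reference y := alpha*A*x + beta*y for a row-major A, no transpose.

    Plain-Python model of the BLAS definition (positive increments only).
    """
    for i in range(m):
        acc = 0.0
        for j in range(n):
            acc += a[i * lda + j] * x[j * incx]
        y[i * incy] = alpha * acc + beta * y[i * incy]
    return y


A = [1.0] * 100   # 10x10 matrix of ones, stored row by row
B = [1.0] * 10    # input vector of ones
C = [1.0] * 10    # output vector, initially ones

dgemv_notrans(10, 10, 25.0, A, 10, B, 1, 11.0, C, 1)
print(C)          # every entry is 25*10 + 11
```

With an increment of 10 instead of 1, the same loop would index far beyond the end of a 10-element vector, so the increments are worth checking before suspecting the custom DLL.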

Moreover, I found it strange that mkl_1.dll is about 995 KB and mkl_2.dll is about 996 KB. Is it possible that the additional function dgemv adds only 1 KB to the output DLL?

I also tried other naming styles in the func_list file, like cblas_dgemm and cblas_dgemv, but it doesn't change anything.

It looks like my BLAS library is incomplete. What can I do?

Maybe I should change some of the object file library files in /%MKL_ROOT%/lib/ia32/ ?

Problems calling mkl_rt.dll in C# / program crashes

Hi,

We have been using Intel MKL 11.02 for a few weeks, and a first C++ example calling the includes works fine. But we have some trouble importing mkl_rt.dll in C#. When calling the DLL, the program crashes without any notice. We have tried several functions (MKL_Set_Num_Threads, VDDIV, VDADD, DASUM, MKL_Get_Max_Threads) and always get the same result. Additionally, we tested the 32-bit and 64-bit versions of mkl_rt.dll and also the Intel C# example "vddiv", and every time the program stops without any message.

using System.Runtime.InteropServices;

namespace MKL_Test
{
    public unsafe class MKL_Wrapper
    {
        // Import MKL_Get_Max_Threads from the single dynamic library.
        [DllImport("mkl_rt.dll", CallingConvention = CallingConvention.Cdecl, ExactSpelling=true, SetLastError = false, EntryPoint = "MKL_Get_Max_Threads")]
        public static extern int Get_Max_Threads();
    }
}

The function is then called by:  int anzahl = MKL_Test.MKL_Wrapper.Get_Max_Threads();

---
Are there any further issues we have not taken into account?
Our test files are attached.

Many thanks in advance,
Philipp Wollmann

Attachment: mkl-test.zip (1.13 MB)

Data fitting task creation/destruction does not appear to be thread safe

Hi Everyone,

I'm using the data fitting routines in MKL (w_mkl_11.0.3.171) for spline interpolation. The Fortran routine where these are called is used in a C++ OpenMP multi-threaded application, inside a parallel for loop over completely independent datasets.

The first attempt produced random crashes and wrong results, which suggested a possible data race. When I added an OMP CRITICAL section around the task creation (dfdNewTask1D) and destruction (dfDeleteTask), the code started working perfectly, producing the same results as the single-core invocation.

I think the task creation and destruction are not currently thread safe, but an (admittedly quick) look at the manual did not turn up any warning about this.

Is this a known issue?

Thanks and regards,
Federico 


Eigenvalue solver dfeast_scsrgv overwrites input arguments?

I am experimenting with the symmetric generalized eigenvalue solver dfeast_scsrgv.

Running it with a test matrix, I find that it overwrites the pointer values for e and x on the stack.

dfeast_scsrgv (&uplo, &n, a, rows, columns, 
b, rows, columns,
fpm, &epsout, &loop, &emin, &emax, &m0, eigs, eigv, &m, &res, &info);

exits with info=0, m=1, suggesting one eigenvalue was found but eigs and eigv have been changed and no longer point to the original arrays.

inputs are declared as

double* a = (double*) mkl_malloc(nnz*sizeof(double), MKL_ALIGN);
double* b = (double*) mkl_malloc(nnz*sizeof(double), MKL_ALIGN);
int* columns = (int*) mkl_malloc(nnz*sizeof(int), MKL_ALIGN);
int* rows = (int*) mkl_malloc((n+1)*sizeof(int), MKL_ALIGN);

char uplo = 'U'; //upper triangular matrices

MKL_INT fpm[128];
feastinit(fpm);

MKL_INT m0 = 60;
double emin = 2;
double emax = 150;

//output
double epsout = 0, res = 0;
double *eigs = (double*) mkl_malloc(m0 * sizeof(double), MKL_ALIGN);
double *eigv = (double*) mkl_malloc(m0 * n * sizeof(double), MKL_ALIGN);
MKL_INT loop=0, m=0, info = 0;

Setting a data breakpoint suggests it happens at mkl_core.dll!000007fee5371abf()  

I thought I had the wrong size for one of the arguments, but they look ok, and there doesn't appear to be a variable adjacent on the stack.

Test project is attached, VS2010 with MKL 11.0.3

Has anyone seen anything similar?

Attachment: eigentest.zip (13.77 MB)

Error running NumPy example: MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so.

Hi,

I built NumPy and SciPy according to the instructions posted at http://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl. I'm using the free version of Composer XE 2013 for Linux. I have added the appropriate directories to LD_LIBRARY_PATH and updated ldconfig. NumPy and SciPy both seem to build and install just fine, but when I run the example program at the end of the page, it fails at the first matrix multiplication line C = A*B (just before start = time.time() ) with the error MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so.

I've tried everything I can think of, but I still can't get rid of the error. I recently applied the MKL patch posted here: http://software.intel.com/en-us/articles/svd-multithreading-bug-in-mkl

Could this have something to do with it?
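
As a first check when chasing this kind of message, it can help to confirm that the named files are actually present in a directory listed in LD_LIBRARY_PATH. The sketch below does only that; the helper name find_library is hypothetical, and it does not model the loader's other search locations (such as the ldconfig cache or rpath entries), nor does it diagnose why loading fails when the files are present.

```python
import os


def find_library(name, search_path):
    """Return the directories in `search_path` that contain the file `name`.

    `search_path` is a string of directories separated by os.pathsep,
    in the same form as the LD_LIBRARY_PATH environment variable.
    """
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    for lib in ("libmkl_avx.so", "libmkl_def.so"):
        found = find_library(lib, path)
        print(lib, "->", found if found else "not found on LD_LIBRARY_PATH")
```

Running it from the same shell that launches Python shows whether the environment seen by the interpreter matches the one that was configured.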

Thanks,
Matt

MKL buffer management and linking issues (incl crashes)

Our application MOSEK links with the static version of MKL, and we have some issues in that regard.

Note that our application is a .so or DLL that our users link with other applications, and those users may also use MKL. For instance, our application is linked to MATLAB (www.mathworks.com), which also uses MKL.

1. The first issue is that you say mkl_free_buffers can always be called, i.e.

   http://software.intel.com/en-us/forums/topic/277599

In our experience that is not the case if MKL is called from multiple threads, because then the application may crash. Should we always be able to call mkl_free_buffers unconditionally?

2. On 64-bit Linux using Intel C 13.0.0, our application runs fine if we do not call mkl_thread_free_buffers (we use that function instead of mkl_free_buffers because the latter has the issues mentioned under 1). However, if we do call mkl_thread_free_buffers, it crashes. Should it not always work?

3. It seems that if we call mkl_disable_fast_mm, the problems go away. However, if we do that, is only our static MKL library affected? Or is the user's application, e.g. MATLAB, also affected?

4. Do you have any information about how to deal with a situation where an application may be linked with two different versions of MKL, one static and one dynamic, for instance?

To us, the buffer management seems to be a major pain to get information about and to figure out. Can you shed any light on the issues we are having?

Error message when running Intel® Optimized LINPACK Benchmark for Linux* OS on Intel Phi cards.

Hi,

I am trying the Intel® Optimized LINPACK Benchmark for Linux* OS on a configuration with multiple Intel Xeon Phi cards.

 (http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-D15B5C2F-07AC-4449-B148-6AF1DFDE674D.htm).

 

My test environment :

  1. AIC Sandy Bridge EP-4S server system with Sandy Bridge EP-4S *4 + 98GB memory
  2. Intel Xeon Phi : 3 pcs of 3110 and 4 pcs of 3115
  3. OS: Redhat Enterprise Linux 6.2 x64
  4. Xeon Phi MPSS: KNC_gold_update_2-2.1.5889-16-rhel-6.2.tar
  5. Intel Composer XE : l_ccompxe_2013.3.163.tgz
  6. Intel MPI : l_mpi_p_4.1.0.024.tgz or l_mpi_p_4.1.0.030.tgz

After running the runme_xeon64_ao script, which enables acceleration by offloading computations to the Intel Xeon Phi coprocessors available on the system, I found that when I increase the HPL problem size (Ns) beyond a certain range, the Linpack process (xlinpack_xeon64) runs endlessly and never finishes, and related error messages appear in the host system log. For example, with a 7-card Phi configuration, I hit this problem when I set the HPL problem size (Ns) to 46000. It is related to the number of Phi cards: with a 1-card configuration, I can increase the HPL problem size (Ns) to 100000 without problems.

The error message is below:

 

__scif_fence_wait 3041 err -16

dma_mark_wait 1080 TO chan 0x0

drain_dma_intr 1151 err -16

micscif_rma_destroy_temp_windows 2082 DMA channel 0 hung ep->state 2 window->dma_mark 0x1c0 channel_mark 0x1c2

------------[ cut here ]------------

WARNING: at /home/build/sandbox/mpss/MPSS_4982/k1om/rhel-6.2/mpss/.rpmbuild_4982/BUILD/intel-mic-kmod-2.1.4982/micscif_rma.c:2084 micscif_rma_destroy_temp_windows+0x314/0x540 [mic]() (Not tainted)

Hardware name: SB301-TO

Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 mic(U) microcode sg ixgbe dca mdio sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp e1000e i2c_i801 i2c_core ext4 mbcache jbd2 sr_mod cdrom usb_storage sd_mod crc_t10dif ahci isci libsas scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 2812, comm: SCIF_MISC Not tainted 2.6.32-220.el6.x86_64 #1

Call Trace:

 [<ffffffff81069b77>] ? warn_slowpath_common+0x87/0xc0

 [<ffffffff81069bca>] ? warn_slowpath_null+0x1a/0x20

 [<ffffffffa0235664>] ? micscif_rma_destroy_temp_windows+0x314/0x540 [mic]

 [<ffffffffa02321b5>] ? micscif_rma_handle_remote_fences+0x155/0x380 [mic]

 [<ffffffff814eca40>] ? thread_return+0x4e/0x77e

 [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20

 [<ffffffffa022a0f0>] ? micscif_misc_handler+0x0/0xc0 [mic]

 [<ffffffffa022a10a>] ? micscif_misc_handler+0x1a/0xc0 [mic]

 [<ffffffffa022a0f0>] ? micscif_misc_handler+0x0/0xc0 [mic]

 [<ffffffff8108b2b0>] ? worker_thread+0x170/0x2a0

 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40

 [<ffffffff8108b140>] ? worker_thread+0x0/0x2a0

 [<ffffffff81090886>] ? kthread+0x96/0xa0

 [<ffffffff8100c14a>] ? child_rip+0xa/0x20

 [<ffffffff810907f0>] ? kthread+0x0/0xa0

 [<ffffffff8100c140>] ? child_rip+0x0/0x20

---[ end trace e0d2c31584645743 ]---

memory of MKL routines

1. Does the pdposv routine, which solves a symmetric positive definite system of linear equations, allocate any large memory besides the matrices A and B?

2. For solving a symmetric positive definite system of linear equations, which routine needs the least additional memory, and approximately how much?

Intel(R) Composer XE 2013 SP1 Beta has begun

Hello Intel MKL users,

The beta program for our compilers and libraries has begun. The Intel(R) Composer XE 2013 SP1 beta contains Intel(R) MKL 11.1 beta, in which we've extended our Intel(R) Xeon Phi(TM) coprocessor support to Windows hosts and expanded numerical reproducibility to programs with unaligned data. Look through the invitation below and join us.

Dear Developer,

You are invited to the Intel® Composer XE 2013 SP1 beta program for Intel compilers and libraries. Please reply to this email for beta participation questions or to opt-out of further contact with Intel.

The products contain exciting new technologies and useful improvements to Intel's existing software development tools:

• New install option that dynamically downloads and installs only the components you select.
• Preview feature offering Windows-based C and C++ support for Intel® Graphics Technology for 32-bit Windows* applications. See the product release notes for limitations.
• New debugging support on Linux* and OS X* via the GNU Project Debugger* (GDB*) with Intel extensions for branch tracing, data race detection, Pointer Checker support, and support for Intel® Transactional Synchronization Extensions.
• Compiler support for SIMD and target constructs of upcoming OpenMP* 4.0 features as defined in the RC2 release candidate. Includes support for taskgroup constructs, new forms of atomic capture and update, and improved thread affinity controls.
• New Fortran features including co-array support on Intel® Xeon Phi(TM) coprocessors and user defined derived-type input and output.
• The Conditional Numerical Reproducibility (CNR) feature in the Intel® Math Kernel Library (Intel® MKL) has been extended to provide reproducible results for unaligned memory.
• Other new features include additional C++11 support, and an optional GUI-based installer on Linux*. See the release notes for details.

When you are ready to get started, follow this link to register, and download the beta software. You will be asked to complete a short pre-beta survey: http://softwareproductsurvey.intel.com/survey/150266/123c

For more details and information on this beta program, please read the details below. Additional information on the new features can be found in the FAQ and What's New documents available upon registration.

We greatly value your input on these new features.

Sincerely,

The Intel Beta Program Team
Developer Products Division
Intel Corporation

=============================================================

Intel® Composer XE 2013 SP1 Beta Details

This beta software is available for IA-32 architecture-based processors and Intel® 64 architecture-based processors for Windows*, Linux*, and OS X* and consists of the following components:

Intel® C++ and Fortran Compilers 14.0 beta
Intel® Math Kernel Library 11.1 beta
Intel® Integrated Performance Primitives 8.0 beta
Intel® Threading Building Blocks 4.1
GNU Project Debugger* (GDB*)
Intel® Debugger Extension for Intel® MIC Architecture Applications

Questions?
Please reply to this email for beta participation questions or to opt-out of further contact with Intel. And speaking of questions, there is a pre- and post-beta questionnaire. Your responses and your feedback during the beta help us engineer the best possible developer tools for your software engineering work.

Beta duration
Beta starts now and ends on July 17, 2013. We ask you to submit issues by June 18 so that we can address them in the planned production release. The beta license provided will expire on Sept. 27, 2013.

Support:
Technical support is available through Intel® Premier Support (http://premier.intel.com). All software for the Intel® Composer XE 2013 SP1 beta is provided through the Intel® Software Development Products Registration Center (http://registrationcenter.intel.com). This includes updates to components. If there are any problems or questions on using either site, please reply to this email.

To enroll in this beta program:
Complete the pre-beta survey and registration following the link: http://softwareproductsurvey.intel.com/survey/150266/123c/?LQID=7&source=cppfor 
- Information collected from the pre-beta survey will be used to evaluate beta testing coverage. The Intel Privacy Policy is available at: http://www.intel.com/sites/corporate/privacy.htm?iid=homepage+ftr_privacy
- Keep the beta product serial number provided for future reference
- Upon registration, you will be taken to the beta download page in the Intel® Software Development Products Registration Center 
- After registration, you will be able to download all available beta products at any time by returning to the Intel Software Development Products Registration Center at http://registrationcenter.intel.com

Notes: At the end of the beta program uninstall all beta product software. Use of the beta software for commercially released products or for externally published performance data is prohibited by the End User License Agreement.

Your next steps:
- Review the Intel® Composer XE 2013 SP1 beta release notes and FAQs
- Install the Intel Composer XE 2013 SP1 beta product(s)
- Try it out and share your experience with us!
- Submit any issues or feedback via Intel® Premier Support early and often!

Intel and Xeon Phi are trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.
Copyright © 2013, Intel Corporation. All rights reserved.

Brandon Hewitt
Technical Consulting Engineer
Tools Knowledge Base: http://software.intel.com/en-us/articles/tools
Software Product support info: http://www.intel.com/software/support

mkl_dcsradd appears to produce an error

Dear all

I use the function as follows:

call mkl_dcsradd(trans,request,sort,m,n,a,ja,ia,beta,b,jb,ib,c,jc,ic,nzmax,info)

The lengths of arrays a and b are 3569442 and 1156736, respectively.

m=n=250080

When I choose nzmax=9452356, my sample code cannot run because of this error: program Exception - access violation.

With nzmax=1500000, the program runs, but the result is not correct.

I would like to know whether a memory overflow occurs because of the value of nzmax.
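
One way to reason about a suitable nzmax, sketched here in plain Python rather than with MKL, is to count the structural nonzeros of C = A + beta*B directly from the two sparsity patterns. The count can never exceed nnz(A) + nnz(B), which for the sizes quoted above would be 3569442 + 1156736 = 4726178 if the array lengths equal the nonzero counts, and it can never be smaller than the larger of the two. The function name csr_add_nnz and the small 3x3 matrices are hypothetical; the one-based indexing is assumed to match the Fortran-style CSR arrays in the call above.

```python
def csr_add_nnz(m, ia, ja, ib, jb):
    """Count structural nonzeros of A + beta*B for two m-row CSR matrices.

    ia/ib are row-pointer arrays of length m + 1 and ja/jb hold the column
    indices; both use one-based indexing (Fortran-style CSR).
    """
    total = 0
    for row in range(m):
        cols_a = ja[ia[row] - 1 : ia[row + 1] - 1]
        cols_b = jb[ib[row] - 1 : ib[row + 1] - 1]
        # An entry of C exists wherever A or B has an entry in this row.
        total += len(set(cols_a) | set(cols_b))
    return total


# A has entries at (1,1), (2,2), (3,1), (3,3); B at (1,1), (1,3), (3,3).
ia, ja = [1, 2, 3, 5], [1, 2, 1, 3]
ib, jb = [1, 3, 3, 4], [1, 3, 3]

nnz_c = csr_add_nnz(3, ia, ja, ib, jb)
upper_bound = len(ja) + len(jb)
print(nnz_c, upper_bound)
```

Under that reading, a value of nzmax below the length of a alone cannot hold the full result, and the arrays c and jc need at least nzmax elements each; this is a sanity check on sizes, not a statement about the cause of the access violation.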

 


fftw3 redistribute

Hello,

I have purchased Intel C++ Composer in order to use MKL (MSVS 2010, Intel 32-bit architecture, C/C++ language, MS Windows 7).

I'm trying to redistribute functions from the fftw3 libraries to colleagues without an extra installation of MKL on their own PCs.

I'm using the 'fftw_plan_dft_r2c_2d' function. How can my colleagues also use this function?

Which files do I have to share with them so they can use that function?

I'm new to this kind of activity, so please describe the procedure in detail.

Thank you.

Best regards,

JYSong

 

non-linear optimization routines

Hi,

I am trying to use the trust-region methods within MKL 11 and have a couple of questions:

a) What is the layout of the Jacobian matrix?

I expect it to be a Fortran-style matrix with m rows (the dimension of the mapped values f) and n columns (the dimension of x). Is this correct?

b) concerning the jacobian calculations :

1) Can I completely ignore the provided interface and supply my own evaluation instead? Is the Jacobian matrix all that I need to provide?

(Unfortunately, the examples provided are not very illuminating about the part of the API concerning the Jacobian evaluations. As a result, I still do not understand what it does beyond the obvious/expected.)

2) If I provide a NULL pointer, will the Jacobian be calculated internally using numerical differentiation, leaving aside questions of efficiency for the moment? I believe not, because a) it is not mentioned in the notes and b) there is no place to insert the function pointer for the function evaluation. (Wouldn't that simplify the interface a lot? The RCI use here seems quite typical, at least as far as the examples show.) The question is still valid, though, since in this case the code could request function values, no?

3) When calculating the Jacobian, one of the inputs is the function values at the point of evaluation. As far as I can see from the examples, there is no accompanying evaluation of the function values. Is this always the case, so that I can use as the fvec input the one held by the pointer passed to the solver routines?
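
To make the layout question in a) concrete, here is a small plain-Python sketch of a user-supplied Jacobian evaluation by central differences, stored as a flat array in column-major (Fortran) order with m rows and n columns. This is an illustration of the layout the post describes, which is not confirmed here as MKL's convention; central_jacobian and residuals are hypothetical names and this is not MKL's own Jacobian routine.

```python
def central_jacobian(f, x, m, eps=1e-6):
    """Approximate the m-by-n Jacobian of f at x by central differences.

    The result is a flat list in column-major (Fortran) order, so that
    d f_i / d x_j is stored at index i + j * m.
    """
    n = len(x)
    fjac = [0.0] * (m * n)
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            fjac[i + j * m] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return fjac


def residuals(x):
    # Hypothetical test function with m = 3 outputs and n = 2 inputs.
    return [x[0] ** 2, x[0] * x[1], 3.0 * x[1]]


fjac = central_jacobian(residuals, [1.0, 2.0], m=3)
print([round(v, 6) for v in fjac])
```

In this layout the first m entries are the derivatives with respect to x(1) and the next m entries those with respect to x(2), which is the arrangement to compare against the documentation.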

Finally, is there a more detailed write-up than the one provided in the reference manual? (The reference Conn00 comes with a rather hefty toll ;-) ).

Thank you very much in advance, for your help,

Petros

Linking MKL is very hard - can't link using Visual Studio

Visual Studio 2012, Parallel Studio XE 2013, 64-bit Win7. C++ console project, Intel c++13.0 compiler, attempting to compile 64-bit (but same errors occur with 32 bit). Started VS from /start/all programs/Parallel Studio XE 2013/Parallel Studio XE 2013 with VS2012

Trying to run one of the examples, cblas_dgemvx to assess MKL for my project. I need to do very fast matrix-vector operations.

The c file builds but link fails.

Settings are :

Intel performance libraries - Use MKL - sequential, Use ILP64 - yes

Includes:

C:\Program Files (x86)\Intel\Composer XE 2013\mkl\include

Libs:

 C:\Program Files (x86)\Intel\Composer XE 2013\compiler\lib\intel64

C:\Program Files (x86)\Intel\Composer XE 2013\mkl\lib\intel64

Command line additional options for compiler :  /DMKL_ILP64  -I%MKLROOT%/include /Qmkl

Linker inputs (used link line adviser).

mkl_intel_ilp64.lib mkl_sequential.lib mkl_core.lib

no linker command line options.

I followed the instructions in MKL11 docs, "Creating, Configuring, and Running the Intel® C/C++ and/or Visual C++* 2008 Project"

They're out of date but easily modified for VS2012.

Build output appears below. What else do I need to do ? Shouldn't be this hard. The compiler works for non-MKL projects.

 

1>------ Build started: Project: MKL_CBLAS (Intel C++ 13.0), Configuration: Release x64 ------

1>ipo : warning #11021: unresolved PrintVectorD

1>          Referenced in ipo_6504obj3.obj

1>ipo : warning #11021: unresolved PrintArrayD

1>          Referenced in ipo_6504obj3.obj

1>ipo : warning #11021: unresolved PrintParameters

1>          Referenced in ipo_6504obj3.obj

1>ipo : warning #11021: unresolved GetArrayD

1>          Referenced in ipo_6504obj3.obj

1>ipo : warning #11021: unresolved GetVectorD

1>          Referenced in ipo_6504obj3.obj

1>ipo : warning #11021: unresolved GetCblasCharParameters

1>          Referenced in ipo_6504obj3.obj

1>ipo : warning #11021: unresolved GetScalarsD

1>          Referenced in ipo_6504obj3.obj

1>ipo : warning #11021: unresolved GetIntegerParameters

1>          Referenced in ipo_6504obj3.obj

1>ipo : error #11023: Not all components required for linking are present on command line

1>  xilink: executing 'link'

1>ipo_6504obj3.obj : error LNK2019: unresolved external symbol GetIntegerParameters referenced in function main

1>ipo_6504obj3.obj : error LNK2019: unresolved external symbol GetScalarsD referenced in function main

1>ipo_6504obj3.obj : error LNK2019: unresolved external symbol GetCblasCharParameters referenced in function main

1>ipo_6504obj3.obj : error LNK2019: unresolved external symbol GetVectorD referenced in function main

1>ipo_6504obj3.obj : error LNK2019: unresolved external symbol GetArrayD referenced in function main

1>ipo_6504obj3.obj : error LNK2019: unresolved external symbol PrintParameters referenced in function main

1>ipo_6504obj3.obj : error LNK2019: unresolved external symbol PrintVectorD referenced in function main

1>ipo_6504obj3.obj : error LNK2019: unresolved external symbol PrintArrayD referenced in function main

1>C:\Users\Rodney\Documents\Visual Studio 2012\Projects\MKL_CBLAS\x64\Release\MKL_CBLAS.exe : fatal error LNK1120: 8 unresolved externals

========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
Cluster FFT of 1D array using MPI

Dear Intel Members,

I have a question about distributing data among processes when using Intel MKL cluster FFT functions for 1D transforms. I am testing 1D FFT of an array of six complex numbers. The result obtained using one process is correct. However, the result obtained using two processes has wrong indices (the numbers are correct).  The results and source codes are pasted below. I am confused about how data are distributed among processes, although I have read through the manual. Any advice is greatly appreciated.

Sirui

________________________________________

one process results:

 local_nx=           6 for rank=           0
 local_x_start=           1 for rank=           0
 local_out_nx=           6 for rank=           0
 local_out_x_start=           1 for rank=           0
 local_size=           6 for rank=           0

 input global array
 (0.0000000E+00,1.000000)           1
 (1.000000,1.000000)           2
 (2.000000,1.000000)           3
 (3.000000,1.000000)           4
 (4.000000,1.000000)           5
 (5.000000,1.000000)           6

 output local array with global index
 (15.00000,6.000000)           1 for rank=           0
 (-3.000000,5.196153)           2 for rank=           0
 (-3.000000,1.732051)           3 for rank=           0
 (-3.000000,0.0000000E+00)           4 for rank=           0
 (-3.000000,-1.732051)           5 for rank=           0
 (-3.000000,-5.196152)           6 for rank=           0
 successfully done

Two processes result:

 local_nx=           4 for rank=           0
 local_x_start=           1 for rank=           0
 local_out_nx=           4 for rank=           0
 local_out_x_start=           1 for rank=           0
 local_size=           4 for rank=           0

 input global array
 (0.0000000E+00,1.000000)           1
 (1.000000,1.000000)           2
 (2.000000,1.000000)           3
 (3.000000,1.000000)           4
 (4.000000,1.000000)           5
 (5.000000,1.000000)           6

 local_nx=           2 for rank=           1
 local_x_start=           5 for rank=           1
 local_out_nx=           2 for rank=           1
 local_out_x_start=           5 for rank=           1
 local_size=           3 for rank=           1

 (-3.000000,0.0000000E+00)           5 for rank=           1
 (-3.000000,-1.732051)           6 for rank=           1
 (-3.000000,-5.196152)           7 for rank=           1
 output local array with global index
 (15.00000,6.000000)           1 for rank=           0
 (-3.000000,5.196153)           2 for rank=           0
 (-3.000000,1.732051)           3 for rank=           0
 (-3.000000,-1.732051)           4 for rank=           0
 successfully done

My source code is the following.

      program main
      USE mmpivardef
      USE MKL_CDFT
      USE mpi

      IMPLICIT NONE
      complex(4), allocatable, dimension(:) :: in,work,in_local
      INTEGER(4), parameter :: N=6
      integer(4) :: i,j,localsize
      INTEGER(4) :: status,local_nx,x_start,local_out_nx,out_xstart
      TYPE(DFTI_DESCRIPTOR_DM), POINTER :: My_Desc1_Handle

      CALL MPI_INIT(ierr)
      CALL MPI_COMM_DUP(MPI_COMM_WORLD,MCW,ierr)
      CALL MPI_COMM_RANK(MCW,rank,ierr)
      CALL MPI_COMM_SIZE(MCW,msize,ierr)

      allocate(in(N))

      status = DftiCreateDescriptorDM(MCW,My_Desc1_Handle, &
                DFTI_SINGLE,DFTI_COMPLEX,1,N)
      status = DftiGetValueDM(My_Desc1_Handle,CDFT_LOCAL_SIZE,localsize)
      status = DftiGetValueDM(My_Desc1_Handle,CDFT_LOCAL_NX,local_nx)
      status = DftiGetValueDM(My_Desc1_Handle,CDFT_LOCAL_X_START, x_start)
      status = DftiGetValueDM(My_Desc1_Handle,CDFT_LOCAL_OUT_NX, local_out_nx)
      status = DftiGetValueDM(My_Desc1_Handle,CDFT_LOCAL_OUT_X_START, out_xstart)

      write(*,*) 'local_nx=',local_nx,'for rank=',rank
      write(*,*) 'local_x_start=',x_start,'for rank=',rank
      write(*,*) 'local_out_nx=',local_out_nx,'for rank=',rank
      write(*,*) 'local_out_x_start=',out_xstart,'for rank=',rank
      write(*,*) 'local_size=',localsize,'for rank=',rank
      write(*,*)

      ALLOCATE(in_local(localsize))
      ALLOCATE(work(localsize))
      status = DftiSetValueDM(My_Desc1_Handle,CDFT_WORKSPACE,work)

      do i=1,N
        j=i-1
        in(i)=cmplx(j,1)
      enddo
      IF (rank.eq.0) THEN
      write(*,*) 'input global array'
      Do i=1,N
        write(*,*) in(i),i
      ENDDO
      ENDIF
      write(*,*)

      DO i=1,localsize
        in_local(i) = in(i+x_start-1)
      ENDDO

      status = DftiCommitDescriptorDM(My_Desc1_Handle)

      status = DftiComputeForwardDM(My_Desc1_Handle,in_local)

      IF (rank.eq.0) write(*,*) 'output local array with global index'
       DO i=1,localsize
        write(*,*) in_local(i),i+x_start-1,'for rank=',rank
       ENDDO

      status = DftiFreeDescriptorDM(My_Desc1_Handle)

      DEALLOCATE(in_local,work,in)

      IF (rank.eq.0) write(*,*) 'successfully done'
      CALL MPI_FINALIZE(ierr)
      end program
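
As a cross-check on the expected values, the six-point transform can be computed directly from the DFT definition in plain Python. The sketch below uses the forward convention X[k] = sum over n of x[n]*exp(-2*pi*i*k*n/N) and reproduces the single-process output listed above, so that listing can serve as the reference ordering for the two-process run. It is not MKL code, and dft is a hypothetical helper name.

```python
import cmath


def dft(x):
    """Direct O(N^2) forward DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


# Same input as the Fortran loop: in(i) = cmplx(i-1, 1) for i = 1..6.
x = [complex(j, 1) for j in range(6)]
result = dft(x)

for k, value in enumerate(result, start=1):
    print(k, round(value.real, 6), round(value.imag, 6))
```

One observation from the listing, offered tentatively: the copy and print loops in the program run to localsize (CDFT_LOCAL_SIZE) and offset by x_start, while the descriptor also reports local_nx, local_out_nx, and local_out_x_start separately. On rank 1 the reported values (localsize 3, x_start 5) make those loops touch global index 7 of a six-element array, so it may be worth looping over local_nx for input and local_out_nx with out_xstart for output.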

Gear Method

Does the Intel Math Kernel Library have a subroutine to solve coupled ODEs using Gear's method?

I solved this some time ago using the DIVPAG subroutine from the IMSL Library, but now I'm trying to solve it without using IMSL or any non-free library.
