Dear Intel team,
I'm currently using 'MKL2017 update 1' and 'MPICH3.1.4'.
I have 2 machines with 512GB of memory each.
When I tried to solve SPD sparse matrices with 488 million and 1500 million elements on these two machines, MKL showed the following error:
===============================================================================================
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1610)........: MPI_Bcast(buf=0x2ab9fd981080, count=438400118, MPI_LONG_LONG_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1462)...:
MPIR_Bcast(1486)........:
MPIR_Bcast_intra(1295)..:
MPIR_Bcast_binomial(252): message sizes do not match across processes in the collective routine: Received -32766 but expected -787766352
[proxy:0:0@phas0007] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:0@phas0007] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:0@phas0007] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@phas0007] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@phas0007] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@phas0007] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@phas0007] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion
===============================================================================================
(However, in the 100 million case, the 'cluster_sparse_solver_64' function worked fine.)
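One observation on the log above, offered as a guess rather than a diagnosis: the failing broadcast sends count=438400118 values of MPI_LONG_LONG_INT. Assuming 8 bytes per value, the total is more than a signed 32-bit integer can hold, and wrapping that byte total to 32 bits reproduces the "expected -787766352" figure in the error message. The short Python check below only does this arithmetic; the variable names are made up for illustration and nothing here calls MKL or MPI.

```python
# Arithmetic check on the numbers in the MPI_Bcast error line.
# Assumption: MPI_LONG_LONG_INT occupies 8 bytes (typical on 64-bit Linux).

INT32_MAX = 2**31 - 1

count = 438400118      # "count=" value copied from the log
elem_size = 8          # assumed size of one MPI_LONG_LONG_INT in bytes

total_bytes = count * elem_size

# Emulate storing the byte total in a signed 32-bit integer (two's complement wrap).
wrapped = (total_bytes + 2**31) % 2**32 - 2**31

print("total bytes      :", total_bytes)
print("exceeds INT32_MAX:", total_bytes > INT32_MAX)
print("wrapped to int32 :", wrapped)
```

If this reading is right, the message size in bytes overflowed a 32-bit integer somewhere inside the MPI layer, which would fit with the smaller 100 million case running without trouble.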
Could you please let me know how to deal with this problem?
Thank you so much in advance.
Have a nice day.
Regards,
Yong-hee
P.S. If I use Intel MPI instead of MPICH, would that give a correct result?