Hi Edo,
I am using mpirun with nwchem_openmpi, which gives the error in the top post. I also tried mpirun with nwchem_mpich, which gives the following error:
Fatal error in PMPI_Errhandler_set: Invalid communicator, error stack:
PMPI_Errhandler_set(117): MPI_Errhandler_set(comm=0x5651f5a0, errh=0x565202c0) failed
PMPI_Errhandler_set(70).: Invalid communicator
Fatal error in PMPI_Errhandler_set: Invalid communicator, error stack:
PMPI_Errhandler_set(117): MPI_Errhandler_set(comm=0xc11955a0, errh=0xc11962c0) failed
PMPI_Errhandler_set(70).: Invalid communicator
Fatal error in PMPI_Errhandler_set: Invalid communicator, error stack:
PMPI_Errhandler_set(117): MPI_Errhandler_set(comm=0xbbb8f5a0, errh=0xbbb902c0) failed
PMPI_Errhandler_set(70).: Invalid communicator
and the output file contained the following:
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[52842,1],3]
Exit code: 1
--------------------------------------------------------------------------
When I run ldd on the nwchem_mpich and nwchem_openmpi binaries, I see that they link against both libmpi.so.1 and libmpi.so.12. That same mix is what caused the ELPA error in the binary I compiled myself. Removing ELPA has solved the problem so far for my own build, but it should be fixed in the repository packages as well.
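For anyone who wants to repeat the ldd check on their own binary, here is a minimal Python sketch. It assumes ldd prints one dependency per line in the usual `name => path (address)` form; the function name `mpi_sonames` and the warning text are my own and not part of NWChem or any MPI package.

```python
import re
import subprocess
import sys


def mpi_sonames(ldd_output):
    """Return the sorted, distinct libmpi.so.N names found in ldd output."""
    # Assumption: each dependency line starts with the soname, e.g.
    # "    libmpi.so.12 => /some/path/libmpi.so.12 (0x...)".
    names = re.findall(r"^\s*(libmpi\.so\.\d+)\s", ldd_output, flags=re.MULTILINE)
    return sorted(set(names))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: check_mpi.py /path/to/binary")
    # Run ldd on the binary given on the command line and capture its output.
    result = subprocess.run(
        ["ldd", sys.argv[1]], capture_output=True, text=True, check=True
    )
    found = mpi_sonames(result.stdout)
    print("libmpi sonames:", found)
    if len(found) > 1:
        # More than one libmpi version in the same process is the situation
        # described above.
        print("warning: binary links against more than one libmpi version")
```

Run it with the path to the binary as the only argument, for example the full path of the installed nwchem_mpich or nwchem_openmpi executable. If more than one libmpi version is listed, the binary pulls in two MPI libraries, as described above.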