NWChem MPI-PR does not work with OpenMPI 2.1.1.


I just compiled OpenMPI 2.1.1 with a very basic configuration for a KNL cluster interconnected with Mellanox EDR InfiniBand:
../configure \
  --prefix=/home/users/astar/ihpc/chiensh/apps/openmpi/2.1.1 \
  CC=icc FC=ifort CXX=icpc \
  -without-slurm --with-verbs=/usr \
  --with-hwloc=/home/users/astar/ihpc/chiensh/apps/hwloc/1.11.7 \
  --enable-mpi-thread-multiple

The MPI has been tested and verified by different MPI benchmarks, so it should be working correctly.

I then built GA 5.6.1 with this configuration:
../ga-5.6.1/configure \
  --prefix=/home/users/astar/ihpc/chiensh/nwchem-29377-openmpi/src/tools/install \
  --with-tcgmsg \
  --with-mpi="-I/home/users/astar/ihpc/chiensh/apps/openmpi/2.1.1/include -L/opt/pbs/default/lib -L/home/users/astar/ihpc/chiensh/apps/openmpi/2.1.1/lib -L/home/users/astar/ihpc/chiensh/apps/openmpi/2.1.1/lib -lmpi_mpifh -lmpi_usempif08 -lmpi -lpthread" \
  --enable-peigs --enable-underscoring --disable-mpi-tests \
  --with-scalapack8="-L/home/users/app/intel_psxe_2017_update4/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -ldl -lmkl_scalapack_ilp64 -lmkl_blacs_openmpi_ilp64" \
  --with-lapack="-L/home/users/app/intel_psxe_2017_update4/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -ldl" \
  --with-blas8="-L/home/users/app/intel_psxe_2017_update4/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -ldl" \
  CC=icc CXX=icpc F77=ifort FFLAGS=-no-vec CFLAGS=-no-vec \
  INTEL_64ALIGN=1 ARMCI_DEFAULT_SHMMAX_UBOUND=131072 \
  --with-mpi-pr


I then linked it into NWChem, but the resulting binary aborts at startup with the following error messages:

chiensh@r1i0n27:~/nwtest$ mpirun -machinefile hosts  -np 2 --mca btl ^sm --mca shmem_mmap_enable_nfs_warning 0 --mca orte_base_help_aggregate 0   $HOME/nwchem-29377-openmpi/bin/LINUX64/nwchem W2.nw
manpath: warning: $MANPATH set, inserting /etc/man_db.conf
[0] ../../ga-5.6.1/comex/src-mpi-pr/groups.c:462: comex_group_init: Assertion `0 == status' failed[0] Received an Error in
Communication: (-1) comex_assert_fail
[1] ../../ga-5.6.1/comex/src-mpi-pr/groups.c:462: comex_group_init: Assertion `0 == status' failed[1] Received an Error in
Communication: (-1) comex_assert_fail
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------



P.S. I also built GA 5.5 and 5.6, but all of them fail to run with the same error.
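
In case it helps narrow this down, here is a minimal sketch for checking which MPI shared libraries the NWChem binary resolves to at run time, since passing benchmarks does not by itself show that NWChem picks up the same OpenMPI 2.1.1 install. It assumes a Linux node with ldd available and a dynamically linked binary; mpi_libs_of is a helper name made up for illustration and is not part of NWChem, GA, or OpenMPI.

```shell
#!/bin/sh
# Sketch: list the MPI-related shared libraries a binary resolves to at run time.
# mpi_libs_of is a hypothetical helper, not part of NWChem, GA, or OpenMPI.
mpi_libs_of() {
    bin="$1"
    if [ ! -x "$bin" ]; then
        echo "not an executable: $bin" >&2
        return 2
    fi
    # ldd prints one "name => path (address)" line per shared library.
    # Keep only libmpi* and the OpenMPI runtime libraries (libopen-rte, libopen-pal).
    # "|| true" keeps the exit status 0 when nothing matches.
    ldd "$bin" 2>/dev/null | grep -E 'libmpi|libopen-(rte|pal)' || true
}

# Usage with the binary from the mpirun line above:
#   mpi_libs_of "$HOME/nwchem-29377-openmpi/bin/LINUX64/nwchem"
```

Every path printed should point into the /home/users/astar/ihpc/chiensh/apps/openmpi/2.1.1 prefix. Empty output may simply mean MPI was linked statically, in which case this check says nothing either way.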

