MPI errors on CentOS 7.3


Clicked A Few Times
Trying to get NWChem 6.6 working on our HPC Grid running CentOS 7.3.

Tried both OpenMPI and MPICH 3.0 as supplied with the CentOS distribution. I can get both to compile, but they die at runtime with the following results:

OpenMPI:
2:ga_iter_lsolve: dgesv failed:Received an Error in Communication


MPI_ABORT was invoked on rank 2 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.




MPICH:
0:geom_binvr: dsyev failed:Received an Error in Communication
application called MPI_Abort(comm=0x84000004, 0) - process 0


I've run multiple MPI test suites and as far as I can tell the core OpenMPI and MPICH infrastructures are working properly within the HPC Grid environment.
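
For reference, a minimal sanity check of the kind I ran looks like this (a sketch only; hello.c is just an illustrative file name, and mpicc/mpirun are the wrappers from the CentOS packages):

cat > hello.c <<'EOF'
/* Minimal MPI smoke test: every rank reports in. */
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d OK\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
mpicc hello.c -o hello
mpirun -np 4 ./hello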


OpenMPI version compiled with:
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export NWCHEM_TOP=/mnt/research/deej/src/nwchem/6.6/nwchem-6.6

export USE_NOFSCHECK=TRUE
export USE_NOIO=TRUE

export ARMCI_NETWORK=MPI_TS
export LARGE_FILES=TRUE
export MRCC_THEORY=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=1677721600

export USE_PYTHONCONFIG=y
export PYTHONHOME=/usr
export PYTHONVERSION=2.7
export USE_PYTHON64=y
export PYTHONLIBTYPE=so

export BLASOPT="-L/usr/lib64/atlas -llapack -lf77blas -latlas"
export HAS_BLAS=y

export SCALAPACK="-L/usr/lib64/openmpi/lib -lscalapack -lmpiblacs"
export SCALAPACK_LIBS="-L/usr/lib64/openmpi/lib -lscalapack -lmpiblacs"
export USE_SCALAPACK=y

export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib/:$LD_LIBRARY_PATH
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/lib64/openmpi
export MPI_LIB=/usr/lib64/openmpi/lib
export MPI_INCLUDE=/usr/include/openmpi-x86_64
export LIBMPI="-pthread -m64 -I/usr/lib64/openmpi/lib -Wl,-rpath -Wl,/usr/lib64/openmpi/lib -Wl,--enable-new-dtags -L/usr/lib64/openmpi/lib -lmpi_usempi -lmpi_mpifh -lmpi"

make FC=mpif90 CC=mpicc nwchem_config NWCHEM_MODULES="all python"
make FC=mpif90 CC=mpicc >& make.log
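
After the build, one quick check that the binary really picked up the intended MPI stack is to inspect its shared-library dependencies (the path below is the standard output location for NWCHEM_TARGET=LINUX64):

ldd $NWCHEM_TOP/bin/LINUX64/nwchem | grep -i mpi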



MPICH version compiled with:

export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export NWCHEM_TOP=/mnt/research/deej/src/nwchem/6.6/nwchem-6.6

export USE_NOFSCHECK=TRUE
export USE_NOIO=TRUE

export ARMCI_NETWORK=MPI_TS
export LARGE_FILES=TRUE
export MRCC_THEORY=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=1677721600

export USE_PYTHONCONFIG=y
export PYTHONHOME=/usr
export PYTHONVERSION=2.7
export USE_PYTHON64=y
export PYTHONLIBTYPE=so

export BLASOPT="-L/usr/lib64/atlas -llapack -lf77blas -latlas"
export HAS_BLAS=y

export SCALAPACK="-L/usr/lib64/mpich/lib -lscalapack -lmpiblacs"
export SCALAPACK_LIBS="-L/usr/lib64/mpich/lib -lscalapack -lmpiblacs"
export USE_SCALAPACK=y

export PATH=/usr/lib64/mpich/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/mpich/lib/:$LD_LIBRARY_PATH
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/lib64/mpich
export MPI_LIB=/usr/lib64/mpich/lib
export MPI_INCLUDE=/usr/include/mpich-x86_64
export LIBMPI="-lmpichf90 -lmpich -lopa -lmpl -lrt -lpthread"

make nwchem_config NWCHEM_MODULES="all python"
make FC=/usr/lib64/mpich/bin/mpif90 >& make.log


Any help would be greatly appreciated!

Forum Vet
Please do the following:

unset MPI_LIB
unset LIBMPI
unset MPI_INCLUDE
cd $NWCHEM_TOP/src
rm -f 32_to_64 64_to_32
make 64_to_32
export BLAS_SIZE=4
export SCALAPACK_SIZE=4
export USE_64TO32=y
rm -rf tools/build tools/install
make >& make.log 
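
For what it's worth, the likely root cause of the dgesv/dsyev failures is an integer-width mismatch: with NWCHEM_TARGET=LINUX64, NWChem defaults to 8-byte integers, while the ATLAS and ScaLAPACK packages shipped with CentOS are built with 4-byte integers, so the dimensions passed to those routines are misread. The 64_to_32 conversion together with BLAS_SIZE=4 and SCALAPACK_SIZE=4 makes the two sides agree. Once rebuilt, a quick functional check is to run one of the bundled QA tests, along these lines (assuming the stock QA layout in the source tree):

cd $NWCHEM_TOP/QA
./runtests.mpi.unix procs 4 dft_he2+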


Further information can be found at:

http://nwchemgit.github.io/index.php/Compiling_NWChem#NWChem_6.6_on_Centos_7.1

http://nwchemgit.github.io/index.php/Compiling_NWChem#Optimized_math_libraries

Clicked A Few Times
Thank you for the assistance.

I had started out by using the CentOS 7.1 instructions you mention, but couldn't get a clean compile, which is why I started adding the MPI and other environment variables. However, your comments pushed me back towards a more minimalist config, and the following seems to work, resulting in an OpenMPI-aware binary:

Start with a clean source tree by untarring the original file.

Patch as appropriate.

export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export NWCHEM_TOP=/mnt/research/deej/src/nwchem/6.6/nwchem-6.6

export USE_64TO32=y
export USE_NOFSCHECK=TRUE
export USE_NOIO=TRUE

export ARMCI_NETWORK=MPI_TS
export LARGE_FILES=TRUE

export USE_PYTHONCONFIG=y
export PYTHONHOME=/usr
export PYTHONVERSION=2.7

export BLAS_SIZE=4
export BLASOPT="-L/usr/lib64/atlas -llapack -lf77blas -latlas"
export HAS_BLAS=y

export SCALAPACK_SIZE=4
export SCALAPACK="-L/usr/lib64/openmpi/lib -lscalapack -lmpiblacs"
export SCALAPACK_LIBS="-L/usr/lib64/openmpi/lib -lscalapack -lmpiblacs"

export USE_MPI=y

make nwchem_config NWCHEM_MODULES="all python"
make 64_to_32
make >& make.log
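
For anyone following along, the resulting binary then runs under mpirun in the usual way, e.g. (water.nw is a placeholder input file):

export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
mpirun -np 4 $NWCHEM_TOP/bin/LINUX64/nwchem water.nw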


Thanks again!

Forum Vet
I am glad you made good progress.
I strongly suggest you apply the patches from
http://nwchemgit.github.io/index.php/Download#Patches_for_the_27746_revision_of_NWChem_6.6
Since you are using OpenMPI, the following patch might help you avoid segmentation faults at startup:
http://nwchemgit.github.io/download.php?f=Ga_argv.patch.gz
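
For the record, these patches are gzipped diffs applied from the src directory; something like the following should work (double-check the patch level against the instructions on the download page):

cd $NWCHEM_TOP/src
wget -O Ga_argv.patch.gz 'http://nwchemgit.github.io/download.php?f=Ga_argv.patch.gz'
gzip -d Ga_argv.patch.gz
patch -p0 < Ga_argv.patch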

Clicked A Few Times
Thanks! We do have most of the patches on that page applied:

Config_libs66.patch
Cosmo_dftprint.patch
Cosmo_meminit.patch
Dplot_tolrho.patch
Driver_smalleig.patch
Ga_argv.patch
Ga_defs.patch
Gcc6_optfix.patch
Raman_displ.patch
Sym_abelian.patch
Tddft_mxvec20.patch
Txs_gcc6.patch
Xatom_vdw.patch
Xccvs98.patch
Zgesvd.patch

