Hi,
I’m running NWCHEM on a large cluster with Qlogic InfiniBand (IB); my install script is below. The build completes successfully, and the program runs fine on 1 node using 16 processors (the NWCHEM input file and run script are shown below). It also runs fine across 4 nodes using all the processors (4 nodes × 16 processors = 64 processors in total). However, as soon as I use 5 or more nodes, it fails with the following in the output file:
argument 1 = h2o.nw
112:armci_ListenSockAll: listen failed: 0
32:armci_ListenSockAll: listen failed: 0
(rank:112 hostname:node31 pid:117359):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/sockets/sockets.c:armci_ListenSockAll():614 cond:0
(rank:32 hostname:node26 pid:47485):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/sockets/sockets.c:armci_ListenSockAll():614 cond:0
0:Terminate signal was sent, status=: 15
(rank:0 hostname:node24 pid:17912):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigTermHandler():472 cond:0
Could the problem be with the install of NWCHEM, a memory problem, or is it a Qlogic IB problem? Any help would be much appreciated.
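To make it easier to see which ranks and hosts hit the assertion, a small helper along these lines could be run against the output file. This is only a sketch: `failed_ranks` is a hypothetical name, and the pattern assumes the `ARMCI DASSERT fail` line format shown in the log above.

```shell
# failed_ranks: read NWChem output on stdin and print
# "<rank> <hostname> <function>" for every ARMCI DASSERT line.
# Assumes the line layout seen in the log above:
#   (rank:N hostname:H pid:P):ARMCI DASSERT fail. <file>:<function>():<line> cond:0
failed_ranks() {
  sed -n 's/^(rank:\([0-9]*\) hostname:\([^ ]*\) pid:[0-9]*):ARMCI DASSERT fail\. [^:]*:\([A-Za-z_]*\)().*/\1 \2 \3/p'
}

# Usage (h2o.out is the output file written by the run script):
#   failed_ranks < h2o.out
```

For the log above, this would list ranks 112 and 32 failing in `armci_ListenSockAll` (which lives in the sockets device source file according to the path in the message) and rank 0 in `SigTermHandler`.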
Install script:
#!/bin/bash
export NWCHEM_TOP=/home/nwchem-6.1.1-src
export NWCHEM_TARGET=LINUX64
export ARMCI_NETWORK=SPAWN
export IB_HOME=/usr
export IB_INCLUDE=/usr/include/infiniband
export IB_LIB=/usr/lib64
export IB_LIB_NAME="-lrt -lnsl -lutil -lrdmacm -libumad -libverbs -ldl -lpthread -lm -Wl,--export-dynamic -lrt -lnsl -lutil"
export MSG_COMMS=MPI
export TCGRSH=/usr/bin/ssh
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/opt/openmpi-1.6-intel
export MPI_LIB=$MPI_LOC/lib
export MPI_INCLUDE=$MPI_LOC/include
export LIBMPI="-lpthread -lmpi_f90 -lmpi_f77 -lmpi -ldl -Wl,--export-dynamic -lnsl -lutil"
export NWCHEM_MODULES="all"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=259738112
export MKLROOT=/opt/intel-12.1/mkl
export MKLPATH=$MKLROOT/lib/intel64
export BLASOPT="-Wl,--start-group $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread -lm"
export FC=ifort
export CC=icc
make nwchem_config
make FC=$FC CC=$CC
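Before building, it may be worth confirming that the InfiniBand libraries named in `IB_LIB_NAME` are actually present in `IB_LIB`. The helper below is a minimal sketch of such a check; `missing_libs` is a hypothetical name, and it only looks for `libNAME.so` or `libNAME.a` directly in the given directory.

```shell
# missing_libs DIR NAME...
# Print each NAME for which neither DIR/libNAME.so nor DIR/libNAME.a exists.
# Prints nothing when every library is found.
missing_libs() {
  dir=$1
  shift
  for name in "$@"; do
    if [ ! -e "$dir/lib$name.so" ] && [ ! -e "$dir/lib$name.a" ]; then
      echo "$name"
    fi
  done
}

# Usage with the IB libraries from the install script above:
#   missing_libs "$IB_LIB" rdmacm ibumad ibverbs
```

An empty result only shows that the files exist; it does not show that the build actually linked against them.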
NWCHEM input file:
start h2o-test
title "test"
geometry units angstrom
O -0.23615400 0.02856700 3.59321400
H -0.46943300 -0.85241100 3.91668900
H -1.02284300 0.57807900 3.71903800
end
basis
* library 6-31G
end
driver
maxiter 2000
end
dft
xc b3lyp
end
scratch_dir /scratch/NWCHEM
ecce_print ecce.out
task dft optimize
NWCHEM run script:
#!/bin/bash
#SBATCH --time=4:00:00
#SBATCH -N 4
#SBATCH --job-name h2o
nodes=4
cores=16
mpiexec -npernode $cores -n $(($cores*$nodes)) /home/nwchem-6.1.1-src/bin/LINUX64/nwchem h2o.nw > h2o.out
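The node count appears twice in the run script (`#SBATCH -N 4` and `nodes=4`), so the two can drift apart when moving to 5 or more nodes. A possible variant, sketched below, takes the node count from `SLURM_JOB_NUM_NODES`, which Slurm sets inside a job. The helper name `build_launch_cmd` is hypothetical, and the sketch only prints the command so it can be inspected before launching.

```shell
# build_launch_cmd NODES CORES_PER_NODE
# Print the mpiexec command line for the given node and core counts.
build_launch_cmd() {
  total=$(( $1 * $2 ))
  echo "mpiexec -npernode $2 -n $total /home/nwchem-6.1.1-src/bin/LINUX64/nwchem h2o.nw"
}

# Inside a Slurm job, SLURM_JOB_NUM_NODES holds the allocated node count;
# fall back to 4 (the value in the script above) when it is not set.
nodes=${SLURM_JOB_NUM_NODES:-4}
cores=16

build_launch_cmd "$nodes" "$cores"
```

To actually launch, the printed command would be run with its output redirected to `h2o.out`, as in the original script.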