3:58:12 PM PDT - Thu, Aug 23rd 2012
Hi,
Hopefully someone can help me get NWChem running in parallel. I have an InfiniBand connection with QLogic; therefore, I have been trying to compile with ARMCI_NETWORK=MPI-SPAWN. Below is my install script. When installing, everything finishes fine; I can even run in parallel within one node. However, when I try using more than one node, I get the following error in my .out file:
argument 1 = PCBM.nw
chama18.40582ipath_userinit: assign_context command failed: Network is down
chama17.41893ipath_userinit: assign_context command failed: Network is down
0:Terminate signal was sent, status=: 15
(rank:0 hostname:chama17 pid:41859):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigTermHandler():472 cond:0
Any suggestions?
#!/bin/bash
module unload openmpi-intel/1.4
module load openmpi-intel/1.6
export NWCHEM_TOP=/home/mefoste/chama/nwchem-6.1.1-src
export NWCHEM_TARGET=LINUX64
export ARMCI_NETWORK=MPI-SPAWN
export IB_HOME=/usr
export IB_INCLUDE=/usr/include
export IB_LIB=/usr/lib64
export IB_LIB_NAME="-libverbs -libumad -lpthread "
export MSG_COMMS=MPI
export TCGRSH=/usr/bin/ssh
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/opt/openmpi-1.6-intel
export MPI_LIB=$MPI_LOC/lib
export MPI_INCLUDE=$MPI_LOC/include
export LIBMPI="-lpthread -L$MPI_LIB -lmpi_f90 -lmpi_f77 -lmpi"
export NWCHEM_MODULES="all"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=2147483647
export MKLROOT=/opt/intel-12.1/mkl
export HAS_BLAS=yes
export BLASOPT="-Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_ilp64.a $MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm"
export FC=ifort
export CC=icc
cd $NWCHEM_TOP/src
make nwchem_config
make FC=$FC CC=$CC