Hi, I am trying to run NWChem 6.1.1 on a cluster. I compiled NWChem in my local user directory. Here are the environment variables I used for the build:
export NWCHEM_TOP="/home/diego/Software/NWchem/nwchem-6.1.1"
export TARGET=LINUX64
export LARGE_FILES=TRUE
export ENABLE_COMPONENT=yes
export TCGRSH=/usr/bin/ssh
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export LIB_DEFINES="-DDFLT_TOT_MEM=16777216"
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export IB_HOME=/usr
export IB_INCLUDE=$IB_HOME/include/infiniband
export IB_LIB=$IB_HOME/lib64
export IB_LIB_NAME="-libumad -libverbs -lpthread -lrt"
export ARMCI_NETWORK=OPENIB
export MKLROOT="/opt/intel/mkl"
export MKL_INCLUDE=$MKLROOT/include/intel64/ilp64
export BLAS_LIB="-L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm"
export BLASOPT="$BLAS_LIB"
export BLAS_SIZE=8
export SCALAPACK_SIZE=8
export SCALAPACK="-L$MKLROOT/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm"
export SCALAPACK_LIB="$SCALAPACK"
export USE_SCALAPACK=y
export MPI_HOME=/opt/intel/impi/4.0.3.008
export MPI_LOC=$MPI_HOME
export MPI_LIB=$MPI_LOC/lib64
export MPI_INCLUDE=$MPI_LOC/include64
export LIBMPI="-lmpigf -lmpigi -lmpi_ilp64 -lmpi"
export CXX=/opt/intel/bin/icpc
export CC=/opt/intel/bin/icc
export FC=/opt/intel/bin/ifort
export PYTHONPATH="/usr"
export PYTHONHOME="/usr"
export PYTHONVERSION="2.6"
export USE_PYTHON64=y
export PYTHONLIBTYPE=so
export MPICXX=$MPI_LOC/bin/mpiicpc
export MPICC=$MPI_LOC/bin/mpiicc
export MPIF77=$MPI_LOC/bin/mpiifort
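For completeness, after exporting these variables the build itself followed the standard NWChem procedure, roughly as below (I may be paraphrasing the exact make invocation):
cd $NWCHEM_TOP/src
make nwchem_config NWCHEM_MODULES="all python"   # configure the selected module set
make FC=ifort CC=icc                             # build with the Intel compilers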
Input file:
start
memory global 1000 mb heap 100 mb stack 600 mb
title "ZrB10 CCSD(T) single point"
echo
scratch_dir /scratch/users
charge -1
geometry units angstrom
Zr 0.00001 -0.00002 0.12043
B 2.46109 0.44546 -0.10200
B 2.25583 -1.07189 -0.09994
B 1.19305 -2.20969 -0.10354
B -0.32926 -2.46629 -0.09796
B -1.72755 -1.82109 -0.10493
B -2.46111 -0.44543 -0.10198
B -2.25583 1.07193 -0.09983
B -1.19306 2.20972 -0.10337
B 0.32924 2.46632 -0.09779
B 1.72753 1.82112 -0.10485
end
scf
DOUBLET; UHF
THRESH 1.0e-10
TOL2E 1.0e-8
maxiter 200
end
tce
ccsd(t)
maxiter 200
freeze atomic
end
basis
Zr library def2-tzvp
B library def2-tzvp
end
ecp
Zr library def2-ecp
end
task tce energy
PBS submit file:
#!/bin/bash
#PBS -N ZrB10_UHF
#PBS -l nodes=10:ppn=16
#PBS -q CA
BIN=/home/diego/Software/NWchem/nwchem-6.1.1/bin/LINUX64
source /opt/intel/impi/4.0.3.008/bin/mpivars.sh
source /home/diego/Software/NWchem/vars
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64:/opt/intel/impi/4.0.3/intel64/lib
#ulimit -s unlimited
#ulimit -d unlimited
#ulimit -l unlimited
#ulimit -n 32767
export ARMCI_DEFAULT_SHMMAX=8000
#export MA_USE_ARMCI_MEM=TRUE
cd $PBS_O_WORKDIR
NP=$(wc -l < $PBS_NODEFILE)        # total number of MPI processes: one line per core in the nodefile
sort -u $PBS_NODEFILE > mpd.hosts  # one entry per node
time mpirun -f mpd.hosts -np $NP $BIN/nwchem ZrB10.nw > ZrB10.log
exit 0
Memory per core: 2 GB of RAM (each node has 16 cores and 32 GB of RAM; 10 nodes in total).
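For reference, my reading of the memory directive in the input (please correct me if this is wrong):
  heap    100 MB
  stack   600 MB
  global 1000 MB
  --------------
  total  1700 MB per MPI process, i.e. 16 x 1700 MB ≈ 27 GB per 32 GB node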
Other relevant system settings:
kernel.shmmax = 68719476736
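In case it matters, these are the checks I run on a compute node for the relevant limits (only the shmmax value is reproduced above):
ulimit -l                     # max locked (pinned) memory, relevant for InfiniBand memory registration
cat /proc/sys/kernel/shmmax   # max shared memory segment size
cat /proc/sys/kernel/shmall   # total shared memory pages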
The error in the log file is:
Last System Error Message from Task 32:: Cannot allocate memory
(rank:32 hostname:node32 pid:27391):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_pin_contig_hndl():1142 cond:(memhdl->memhndl!=((void *)0))
Varying the stack, heap, or global memory, as well as ARMCI_DEFAULT_SHMMAX, does not really change anything (if I set them low, a different error occurs). Setting MA_USE_ARMCI_MEM=y/n has no effect either. From the assertion it looks like ARMCI fails while trying to pin (register) memory for the InfiniBand transport.
Output of ldd /home/diego/Software/NWchem/nwchem-6.1.1/bin/LINUX64/nwchem:
linux-vdso.so.1 => (0x00007ffff7ffe000)
libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x0000003f3aa00000)
libmkl_scalapack_ilp64.so => not found
libmkl_intel_ilp64.so => not found
libmkl_sequential.so => not found
libmkl_core.so => not found
libmkl_blacs_intelmpi_ilp64.so => not found
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003f39200000)
libm.so.6 => /lib64/libm.so.6 (0x0000003f38600000)
libmpigf.so.4 => not found
libmpi_ilp64.so.4 => not found
libmpi.so.4 => not found
libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x000000308aa00000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x000000308a600000)
librt.so.1 => /lib64/librt.so.1 (0x0000003f39a00000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000003f3c200000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003f38e00000)
libc.so.6 => /lib64/libc.so.6 (0x0000003f38a00000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ffff7dce000)
/lib64/ld-linux-x86-64.so.2 (0x0000003f38200000)
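Note that the ldd output above was taken in a plain login shell; to check whether the "not found" entries resolve under the job environment, something like the following could be run (output not pasted here):
source /opt/intel/impi/4.0.3.008/bin/mpivars.sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64
ldd /home/diego/Software/NWchem/nwchem-6.1.1/bin/LINUX64/nwchem | grep "not found"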
So what could be the reason for the failure? Any help would be appreciated.
Diego