NWChem 6.1.1 CCSD(T) parallel run


Hi, I am trying to run NWChem 6.1.1 on a cluster. I compiled NWChem in my local user directory. Here are the environment variables I used to compile:

export NWCHEM_TOP="/home/diego/Software/NWchem/nwchem-6.1.1"
export TARGET=LINUX64
export LARGE_FILES=TRUE
export ENABLE_COMPONENT=yes
export TCGRSH=/usr/bin/ssh
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export LIB_DEFINES="-DDFLT_TOT_MEM=16777216"

export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y

export IB_HOME=/usr
export IB_INCLUDE=$IB_HOME/include/infiniband
export IB_LIB=$IB_HOME/lib64
export IB_LIB_NAME="-libumad -libverbs -lpthread -lrt"
export ARMCI_NETWORK=OPENIB

export MKLROOT="/opt/intel/mkl"
export MKL_INCLUDE=$MKLROOT/include/intel64/ilp64

export BLAS_LIB="-L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm"
export BLASOPT="$BLAS_LIB"
export BLAS_SIZE=8
export SCALAPACK_SIZE=8
export SCALAPACK="-L$MKLROOT/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm"
export SCALAPACK_LIB="$SCALAPACK"
export USE_SCALAPACK=y

export MPI_HOME=/opt/intel/impi/4.0.3.008
export MPI_LOC=$MPI_HOME
export MPI_LIB=$MPI_LOC/lib64
export MPI_INCLUDE=$MPI_LOC/include64
export LIBMPI="-lmpigf -lmpigi -lmpi_ilp64 -lmpi"

export CXX=/opt/intel/bin/icpc
export CC=/opt/intel/bin/icc
export FC=/opt/intel/bin/ifort

export PYTHONPATH="/usr"
export PYTHONHOME="/usr"
export PYTHONVERSION="2.6"
export USE_PYTHON64=y
export PYTHONLIBTYPE=so

export MPICXX=$MPI_LOC/bin/mpiicpc
export MPICC=$MPI_LOC/bin/mpiicc
export MPIF77=$MPI_LOC/bin/mpiifort
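
For reference, the standard NWChem build sequence with these variables is roughly the following (just a sketch, the exact make options may have differed):

cd $NWCHEM_TOP/src
make nwchem_config
make FC=ifort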


Input file:

start
memory global 1000 mb heap 100 mb stack 600 mb 
title "ZrB10 CCSD(T) single point"
echo 
scratch_dir /scratch/users
charge -1
geometry units angstrom
Zr          0.00001        -0.00002         0.12043
B           2.46109         0.44546        -0.10200
B           2.25583        -1.07189        -0.09994
B           1.19305        -2.20969        -0.10354
B          -0.32926        -2.46629        -0.09796
B          -1.72755        -1.82109        -0.10493
B          -2.46111        -0.44543        -0.10198
B          -2.25583         1.07193        -0.09983
B          -1.19306         2.20972        -0.10337
B           0.32924         2.46632        -0.09779
B           1.72753         1.82112        -0.10485
end
scf 
DOUBLET; UHF
THRESH 1.0e-10
TOL2E 1.0e-8
maxiter 200
end 
tce
 ccsd(t)
 maxiter 200
 freeze atomic
end 
basis
Zr library def2-tzvp 
B library def2-tzvp
end
ecp
Zr library def2-ecp
end
task tce energy


PBS submit file:

#!/bin/bash
#PBS -N ZrB10_UHF
#PBS -l nodes=10:ppn=16
#PBS -q CA
BIN=/home/diego/Software/NWchem/nwchem-6.1.1/bin/LINUX64
source /opt/intel/impi/4.0.3.008/bin/mpivars.sh
source /home/diego/Software/NWchem/vars
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64:/opt/intel/impi/4.0.3/intel64/lib

#ulimit -s unlimited
#ulimit -d unlimited
#ulimit -l unlimited
#ulimit -n 32767    
export ARMCI_DEFAULT_SHMMAX=8000
#export MA_USE_ARMCI_MEM=TRUE
cd $PBS_O_WORKDIR

# total number of MPI processes: one line per core slot in $PBS_NODEFILE
NP=$(wc -l < $PBS_NODEFILE)

# one entry per node for the mpd host file
sort -u $PBS_NODEFILE > mpd.hosts
time mpirun -f mpd.hosts -np $NP $BIN/nwchem ZrB10.nw > ZrB10.log
exit 0


The memory per processor is 2 GB of RAM (16 cores and 32 GB of RAM per node, 10 nodes in total), and the other relevant settings are:

kernel.shmmax = 68719476736
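
As a rough back-of-the-envelope check (my own arithmetic, assuming the memory directive in the input applies per MPI process):

# per process: 100 MB heap + 600 MB stack + 1000 MB global = 1700 MB
# per node:    16 processes x 1700 MB = 27200 MB, i.e. about 27 GB,
# which leaves roughly 5 GB of the 32 GB per node for the OS, buffers,
# and the ARMCI shared-memory / pinned regions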


The error output is:

Last System Error Message from Task 32:: Cannot allocate memory


(rank:32 hostname:node32 pid:27391):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_pin_contig_hndl():1142 cond:(memhdl->memhndl!=((void *)0))


Varying the stack, heap, or global memory and ARMCI_DEFAULT_SHMMAX does not really change anything (if I set them too low, a different error occurs). Setting MA_USE_ARMCI_MEM=y/n has no effect either.
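
As far as I understand, ARMCI_DEFAULT_SHMMAX is given in MB, so 8000 should sit well below the 64 GB kernel.shmmax above. Since the assertion comes from armci_pin_contig_hndl (InfiniBand memory registration) and the ulimit lines in the submit script are commented out, one quick check is the locked-memory limit the job actually gets on each node, e.g. (a sketch, assuming passwordless ssh and the mpd.hosts file written by the job):

for host in $(cat mpd.hosts); do
    ssh "$host" 'printf "%s: locked memory limit: " "$(hostname)"; ulimit -l'
done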

ldd /home/diego/Software/NWchem/nwchem-6.1.1/bin/LINUX64/nwchem:
        linux-vdso.so.1 =>  (0x00007ffff7ffe000)
        libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x0000003f3aa00000)
        libmkl_scalapack_ilp64.so => not found
        libmkl_intel_ilp64.so => not found
        libmkl_sequential.so => not found
        libmkl_core.so => not found
        libmkl_blacs_intelmpi_ilp64.so => not found
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003f39200000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003f38600000)
        libmpigf.so.4 => not found
        libmpi_ilp64.so.4 => not found
        libmpi.so.4 => not found
        libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x000000308aa00000)
        libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x000000308a600000)
        librt.so.1 => /lib64/librt.so.1 (0x0000003f39a00000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000003f3c200000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003f38e00000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003f38a00000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ffff7dce000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003f38200000)
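
The "not found" entries above are presumably just because I ran ldd from a plain login shell; with the same environment the job sources they should resolve (a sketch of the check):

source /opt/intel/impi/4.0.3.008/bin/mpivars.sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64
ldd /home/diego/Software/NWchem/nwchem-6.1.1/bin/LINUX64/nwchem | grep -E 'mkl|mpi'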

So what could be the reason for the failure? Any help would be appreciated.

Diego