NWChem Efficiency


Hi,
I’m wondering whether I have NWChem running efficiently and correctly. As a test, I timed a single-point B3LYP/6-31G* calculation on PCBM (88 atoms) with NWChem 6.1.1 and with Gaussian09. Here are the results:

Gaussian09 – 1 node; 16 cores; Total Time ~8 mins
NWChem – 4 nodes; 64 cores; Total Time ~26 mins
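
To put the two runs on a common footing, here is a rough sketch that converts each into core-minutes (cores multiplied by wall-clock minutes). The helper name `core_minutes` is made up for illustration, and the inputs are just the approximate timings quoted above, so treat the result as an estimate rather than a benchmark.

```shell
#!/bin/bash
# Core-minutes = cores x wall-clock minutes.
# Inputs are the approximate timings from the post, not precise measurements.
core_minutes() {
    local cores=$1 minutes=$2
    echo $(( cores * minutes ))
}

g09=$(core_minutes 16 8)      # Gaussian09: 1 node, 16 cores, ~8 min
nwchem=$(core_minutes 64 26)  # NWChem: 4 nodes, 64 cores, ~26 min

echo "Gaussian09: ${g09} core-minutes"
echo "NWChem:     ${nwchem} core-minutes"
echo "Ratio:      $(( nwchem / g09 ))x"
```

By this measure the NWChem run used roughly an order of magnitude more compute than the Gaussian09 run for the same calculation.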

What is going on? That is a huge difference. I can’t run G09 on multiple nodes for a direct comparison; however, I ran the same NWChem calculation on 1 node (16 cores) and it took over an hour. Below is my NWChem install script:

#!/bin/bash
  
export NWCHEM_TARGET=LINUX64
 
export ARMCI_NETWORK=MPI-MT
export IB_HOME=/usr
export IB_INCLUDE=/usr/include/infiniband
export IB_LIB=/usr/lib64
export IB_LIB_NAME="-lrdmacm -libumad -libverbs -lpthread -lrt"
export MSG_COMMS=MPI
export TCGRSH=/usr/bin/ssh
export SLURM=y
 
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/opt/mvapich2-intel-psm-1.7
export MPI_LIB=$MPI_LOC/lib
export MPI_INCLUDE=$MPI_LOC/include
export LIBMPI="-lpthread -L$MPI_LIB -lmpichf90 -lmpich -lmpl -lrdmacm -libverbs"
export MV2_ENABLE_AFFINITY=0
 
 
export NWCHEM_MODULES="all"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=259738112 
 
export MKLROOT=/opt/intel-12.1/mkl
export HAS_BLAS=yes
export BLASOPT="-Wl,--start-group  $MKLROOT/lib/intel64/libmkl_intel_ilp64.a $MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm"
 
export FC=mpif90
export CC=mpicc
 
make nwchem_config


Do I have something configured wrong?
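
One setting that can be sanity-checked from the script alone is the `DFLT_TOT_MEM` value. The sketch below converts it to MiB per process and per 16-process node. It assumes the value counts 8-byte words (doubles) rather than bytes, and that one process runs per core; both are assumptions worth confirming against the NWChem build documentation. The helper name `words_to_mib` is made up for illustration.

```shell
#!/bin/bash
# ASSUMPTION: DFLT_TOT_MEM is given in 8-byte words (doubles), not bytes.
# ASSUMPTION: one NWChem process per core, 16 cores per node.
words_to_mib() {
    local words=$1
    # words * 8 bytes, then bytes -> KiB -> MiB (integer division truncates)
    echo $(( words * 8 / 1024 / 1024 ))
}

dflt_tot_mem=259738112   # value from LIB_DEFINES in the install script
procs_per_node=16

per_proc=$(words_to_mib "$dflt_tot_mem")
per_node=$(( per_proc * procs_per_node ))

echo "Default memory per process: ${per_proc} MiB"
echo "Default memory per node:    ${per_node} MiB"
```

If the per-node figure is more than the node's physical RAM, or far less than what is available, the compiled-in default may be worth revisiting or overriding in the input file.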