NWChem Efficiency


Clicked A Few Times
Hi,
I’m wondering if I have NWChem running efficiently/correctly. As a test, I determined how long it takes to do a single-point B3LYP/6-31G* calculation on PCBM (88 atoms) with NWChem (6.1.1) and Gaussian09. Here are the results:

Gaussian09 – 1 node; 16 cores; Total Time ~8 mins
NWChem – 4 nodes; 64 cores; Total Time ~26 mins

What is the deal? This is a huge difference… I can’t run G09 on multiple nodes for a direct comparison; however, I ran the same NWChem calculation on 1 node (16 cores) and it took over an hour. Below is my NWChem build script:

#!/bin/bash
  
export NWCHEM_TARGET=LINUX64
 
export ARMCI_NETWORK=MPI-MT
export IB_HOME=/usr
export IB_INCLUDE=/usr/include/infiniband
export IB_LIB=/usr/lib64
export IB_LIB_NAME="-lrdmacm -libumad -libverbs -lpthread -lrt"
export MSG_COMMS=MPI
export TCGRSH=/usr/bin/ssh
export SLURM=y
 
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/opt/mvapich2-intel-psm-1.7
export MPI_LIB=$MPI_LOC/lib
export MPI_INCLUDE=$MPI_LOC/include
export LIBMPI="-lpthread -L$MPI_LIB -lmpichf90 -lmpich -lmpl -lrdmacm -libverbs"
export MV2_ENABLE_AFFINITY=0
 
 
export NWCHEM_MODULES="all"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=259738112 
 
export MKLROOT=/opt/intel-12.1/mkl
export HAS_BLAS=yes
export BLASOPT="-Wl,--start-group  $MKLROOT/lib/intel64/libmkl_intel_ilp64.a $MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm"
 
export FC=mpif90
export CC=mpicc
 
make nwchem_config


Do I have something configured wrong?

Forum Vet
I see from another thread that you have InfiniBand with QLogic, for which you use MPI-MT.

Can you check which compilers were actually used? Setting FC and CC to mpif90 and mpicc does not let our configure environment figure out which compiler sits underneath, which means it will not apply compiler optimization flags. If you are using the Intel compilers, you should set FC=ifort and CC=icc (or gcc) so the make environment recognizes them and uses the best compile flags.
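As a sketch, the change to the build script would be along these lines (assuming the Intel compilers are on the PATH and the MPI libraries are still picked up through MPI_LOC, MPI_LIB, and LIBMPI as in the script above):

# point the build at the serial Intel compilers so the NWChem makefiles
# recognize them and apply their tuned optimization flags
export FC=ifort
export CC=icc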

One other cautionary note: make sure you compare apples to apples. For DFT, make sure the same integral screening tolerances are used, and that the same grid size (same number of grid points) is used; see the sketch below.
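For reference, a minimal sketch of the kind of input this refers to (the dft-block keywords are standard NWChem; the grid quality and tol2e values shown are only illustrative and should be chosen to match the settings used in the G09 run):

basis
 * library 6-31G*
end

dft
 xc b3lyp
 grid fine                 # quadrature grid quality (coarse/medium/fine/xfine)
 tolerances tol2e 1d-10    # two-electron integral screening threshold
end

task dft energy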

Bert


Quote: Mef362, Oct 3rd 11:56 pm (original post quoted in full above)

Clicked A Few Times
Thanks for the reply, Bert.
I know I should compare “apples to apples”, but I just didn’t expect the defaults to give such different results. I will use the same integral screening tolerances and grid size next time.

Clicked A Few Times
I tried setting FC=ifort and CC=icc, but I got compile errors stating that certain libraries could not be found. I also tried compiling with MPI-SPAWN. It compiled, but I had problems running it: it ran fine on 1 node but not on multiple nodes. When executing on multiple nodes, nothing was written to the output file, although the job appeared to be running on all the nodes (verified with the top command).

Clicked A Few Times
I can use ifort and icc with OpenMPI, MKL, and MPI-SPAWN, but I get a different error:

 argument  1 = PCBM.nw
0:Terminate signal was sent, status=: 15
(rank:0 hostname:xxx25 pid:23193):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigTermHandler():472 cond:0


ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 940
Last System Error Message from Task 0:: No such file or directory
Last System Error Message from Task 1:: No such file or directory
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source             
nwchem             000000000397DECA  Unknown               Unknown  Unknown
nwchem             000000000397C9C6  Unknown               Unknown  Unknown
nwchem             000000000392DCE0  Unknown               Unknown  Unknown
nwchem             00000000038DA33E  Unknown               Unknown  Unknown
nwchem             00000000038DF793  Unknown               Unknown  Unknown
nwchem             0000000002BB088D  Unknown               Unknown  Unknown
nwchem             0000000002B9250F  Unknown               Unknown  Unknown
nwchem             0000000002BB074B  Unknown               Unknown  Unknown
libc.so.6          00002AAAAB9FF900  Unknown               Unknown  Unknown
libc.so.6          00002AAAABAB32C3  Unknown               Unknown  Unknown
libmpi.so.1        00002AAAAB6B1D52  Unknown               Unknown  Unknown
libmpi.so.1        00002AAAAB6B2B26  Unknown               Unknown  Unknown
libmpi.so.1        00002AAAAB6B29C2  Unknown               Unknown  Unknown
libmpi.so.1        00002AAAAB6E322C  Unknown               Unknown  Unknown
libmpi.so.1        00002AAAAB611C76  Unknown               Unknown  Unknown
mca_coll_tuned.so  00002AAAB226CACB  Unknown               Unknown  Unknown
mca_coll_tuned.so  00002AAAB226C6DB  Unknown               Unknown  Unknown
mca_coll_tuned.so  00002AAAB22643A7  Unknown               Unknown  Unknown
mca_coll_sync.so   00002AAAB205B651  Unknown               Unknown  Unknown
mca_dpm_orte.so    00002AAAB00CA558  Unknown               Unknown  Unknown
libmpi.so.1        00002AAAAB62362D  Unknown               Unknown  Unknown
nwchem             0000000002BA8F5E  Unknown               Unknown  Unknown
nwchem             0000000002B97CEF  Unknown               Unknown  Unknown
nwchem             0000000002B93A89  Unknown               Unknown  Unknown
nwchem             0000000002B9A8A6  Unknown               Unknown  Unknown
nwchem             0000000002BFA4CD  Unknown               Unknown  Unknown
nwchem             0000000002AC25B3  Unknown               Unknown  Unknown
nwchem             000000000041788B  Unknown               Unknown  Unknown
nwchem             00000000004176AC  Unknown               Unknown  Unknown
libc.so.6          00002AAAAB9EBCDD  Unknown               Unknown  Unknown
nwchem             00000000004175A9  Unknown               Unknown  Unknown
Last System Error Message from Task 2:: No such file or directory


I can’t use MPI-MT with OpenMPI because multi-threading is not supported.
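For reference, one way to check what thread level a given OpenMPI build actually provides is to query ompi_info (a sketch; the exact wording of the output varies between versions):

# look for the "Thread support" line; ARMCI_NETWORK=MPI-MT needs MPI_THREAD_MULTIPLE
ompi_info | grep -i thread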

