Solved: Nwchem 6.3 running 2-5 times slower than 6.1.1


I've experimented a little bit with nwchem 6.3 (17 May release), and it appears to run much slower than 6.1.1.

For a benchmark calculation which typically takes only about 40 seconds in 6.1.1 and 6.1, I end up with 190 seconds in nwchem 6.3. All cores are engaged though, and there's nothing odd in top.

The input file is shown below:
scratch_dir /scratch
start benzene 

geometry units angstroms
C  0.000  1.396  0.000
C  1.209  0.698  0.000
C  1.209 -0.698  0.000
C  0.000 -1.396  0.000
C -1.209 -0.698  0.000
C -1.209  0.698  0.000
H  0.000  2.479  0.000
H  2.147  1.240  0.000
H  2.147 -1.240  0.000
H  0.000 -2.479  0.000
H -2.147 -1.240  0.000
H -2.147  1.240  0.000
end

basis
 H library "6-31+g*" 
 c library "6-31+g*"
end
dft
        direct
end

task dft optimize


The hardware in that particular case is an AMD FX 8150 (8 cores) with 32 GB RAM, running as a local calculation on Debian wheezy/stable with openmpi 1.3 and ACML 5.3.1. Nwchem was compiled as shown here:
export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export PYTHONVERSION=2.7
export PYTHONHOME=/usr
export BLASOPT="-L/opt/acml/acml5.3.1/gfortran64_fma4_int64/lib -lacml"
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export LIBRARY_PATH="$LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/acml/acml5.3.1/gfortran64_fma4_int64/lib"
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran 2> make.err 1>make.log
cd $NWCHEM_TOP/contrib
export FC=gfortran
./getmem.nwchem


I've also tested it on an i5-2400 (four cores) with 16 GB RAM, using openblas instead of ACML but otherwise identical parameters. With 6.1.1 it takes ca. 50 s (cpu)/52 s (wall) vs 126 s (cpu)/127 s (wall) for nwchem 6.3.

Finally, I have tested this on a dual-socket Xeon cluster running ROCKS 5.4.3 (based on CentOS 5.6) using openblas. Using six of the eight available cores (again, the job is contained on a single node) I get 254 seconds for nwchem 6.3 vs 121 seconds for nwchem 6.1.1.

In each pair of cases the same build file was used for both 6.1.1 and 6.3.
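
To put the three comparisons side by side, here is a rough sketch that turns the timings quoted above into slowdown factors. The `slowdown` helper and the machine labels are just names made up for this illustration, and for the i5-2400 I've used the wall times; the other two pairs are the figures as reported.

```shell
#!/bin/sh
# Hypothetical helper: reads lines of "label  t_6.1.1  t_6.3" (seconds)
# and prints how many times slower 6.3 is than 6.1.1 for each line.
slowdown() {
    awk '{ printf "%s: %.2fx slower\n", $1, $3 / $2 }'
}

# Timings in seconds as quoted in this post (labels are arbitrary).
slowdown <<'EOF'
FX-8150 40 190
i5-2400 52 127
Xeon-6-cores 121 254
EOF
```

The ratios come out between roughly 2x and 5x, with the largest gap on the AMD/ACML machine.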

The question is: is this normal? Is this an issue with nwchem 6.3? Or the way I build it?