Regarding timings


Clicked A Few Times
Hi NWCHEM users,

I am new to NWChem, and I am a little worried about the scaling (timings) when running in parallel.

I am using the latest version of NWChem (Nwchem-6.3.revision1), built with Intel MKL BLAS + ScaLAPACK.

CPU info:

vendor_id  : GenuineIntel
cpu family  : 6
model  : 23
model name  : Intel(R) Xeon(R) CPU X5472 @ 3.00GHz
stepping  : 6
cpu MHz  : 3000.041
cache size  : 6144 KB




Timings:
1 proc - 154 sec
2 proc - 96 sec
4 proc - 52 sec
8 proc - 69 sec (not happy with this timing)




input file:
echo

title monomer
start monomer

geometry
C -0.00100000 -0.00800000 0.00400000
C 0.00300000 -0.01000000 1.40500000
C 1.20400000 -0.00900000 2.11700000
C 2.41600000 -0.00700000 1.41300000
C 2.43200000 -0.00600000 0.01600000
C 1.21600000 -0.00800000 -0.68100000
H -0.96900000 -0.01300000 1.88700000
H 3.31900000 -0.00800000 2.01300000
H 1.28400000 -0.01000000 -1.76300000
C -1.35400000 -0.01000000 -0.66600000
C 1.30200000 -0.01100000 3.62300000
C 3.68800000 -0.00600000 -0.82000000
O -2.38800000 -0.03900000 -0.00100000
O 3.63000000 -0.02100000 -2.04900000
O 2.39600000 -0.01400000 4.18600000
N -1.38200000 0.02500000 -2.02800000
N 4.88200000 0.01400000 -0.16300000
N 0.13600000 -0.00600000 4.32900000
H -0.55900000 0.05800000 -2.60500000
H -2.28700000 0.02400000 -2.47400000
H -0.77600000 -0.01300000 3.90500000
H 0.20300000 -0.01300000 5.33600000
H 4.97000000 0.02500000 0.83900000
H 5.72100000 0.01200000 -0.72400000
end

basis
* library 6-31g*
end

scf
semidirect
thresh 1.0e-6
end

dft
XC b3lyp
end

task dft




build script

export NWCHEM_TOP=/home/karteek/softwares/test/nwchem-6.3-src.2013-05-28
export LARGE_FILES=TRUE
export ENABLE_COMPONENT=yes
export TCGRSH=/usr/bin/ssh
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all

export CC=icc
export FC=ifort

export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y

export MPI_LOC=/opt/intel/impi/4.0.2.003/intel64
export MPI_LIB=$MPI_LOC/lib
export MPI_INCLUDE=$MPI_LOC/include
export LIBMPI=" -lmpigf -lmpigi -lmpi_ilp64 -lmpi -lpthread -lm"


export PYTHONHOME=/usr/bin/python-2.6
export PYTHONVERSION=2.6
export USE_PYTHON64=y


export MKL_HOME=/opt/intel/mkl
export MKL_LIB=$MKL_HOME/lib/intel64
export MKL_INCLUDE=$MKL_HOME/include/intel64/ilp64

export HAS_BLAS=y
export USE_SCALAPACK=y

export BLASOPT="-L$MKL_LIB -lmkl_blas95_ilp64 -lmkl_solver_ilp64_sequential -lmkl_sequential -lmkl_core -lmkl_intel_ilp64 -lpthread -lm"
export BLAS_SIZE=8
export SCALAPACK_SIZE=8

export SCALAPACK="-L$MKL_LIB -lmkl_scalapack_ilp64 -lmkl_lapack95_ilp64 -lmkl_core -lmkl_sequential -lmkl_intel_ilp64 -lmkl_blacs_intelmpi_ilp64 -lpthread -lm"


make nwchem_config


make > make.log




Looking forward to your help. Thanks in advance.

Karteek Kumar

Forum Vet
Running 8 processes on a 4-core system?
Karteek,
The CPU you are using has only four cores (see the Intel documentation at http://ark.intel.com/products/34447).

The scalability you can get is limited by the available hardware resources.

Cheers, Edo

Clicked A Few Times
In /proc/cpuinfo I can see 8 processors.

On the same machine, I have run several other programs on 8 processes.

Also, when I run top and press 1, I can see 8 CPUs.

Thanks
Karteek
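
One way to reconcile "four cores per CPU" with "8 processors in /proc/cpuinfo" is to count sockets and physical cores separately from logical processors. The sketch below is only an illustration: it assumes the x86 Linux /proc/cpuinfo layout with `processor`, `physical id`, and `core id` fields, and the `SAMPLE` text and the function name `count_cpus` are hypothetical, not taken from the machine discussed in this thread.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_processors, sockets, physical_cores) from /proc/cpuinfo text."""
    logical = 0
    sockets = set()
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
            physical_id = None  # reset for the new processor entry
        elif key == "physical id":
            physical_id = value
            sockets.add(value)
        elif key == "core id":
            # A physical core is identified by its (socket, core id) pair.
            cores.add((physical_id, value))
    return logical, len(sockets), len(cores)


# Hypothetical excerpt: two sockets with two cores each, no hyper-threading.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 1
core id : 0

processor : 3
physical id : 1
core id : 1
"""

print(count_cpus(SAMPLE))
```

On a real Linux machine the same function could be fed the contents of /proc/cpuinfo, for example `count_cpus(open("/proc/cpuinfo").read())`. If the number of physical cores equals the number of logical processors, the 8 processors are real cores (for instance two quad-core sockets) rather than hyper-threads.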

Forum Vet
Have you tried a run with 6 processes?
Edo

Clicked A Few Times
I have just tried it with 6 processes.

It finishes in 51 sec.

Thanks
karteek

Forum Vet
Have you tried running direct (i.e., replacing the semidirect line with direct)?

Clicked A Few Times
With the direct option,

timings:

1 proc - 316 sec
2 proc - 201 sec
4 proc - 102 sec
6 proc - 81 sec
8 proc - 72 sec


Thanks
karteek

Forum Vet
Quote: Karteek, Aug 13th 8:04 pm
With the direct option,

timings:

1 proc - 316 sec
2 proc - 201 sec
4 proc - 102 sec
6 proc - 81 sec
8 proc - 72 sec


This looks better.
Is your computer an SGI ICE?
My deduction (from the benchmark numbers you reported) is that with 8 processes you were saturating the I/O bandwidth of the available filesystem.

Cheers, Edo
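
To make the comparison between the two sets of timings concrete, the speedup T(1)/T(n) and the parallel efficiency (speedup divided by n) can be tabulated. The sketch below uses only the wall times posted in this thread for the Xeon machine; the function name `scaling_table` is a hypothetical helper, and the numbers it prints are simple ratios, not a diagnosis.

```python
def scaling_table(timings):
    """Map process count -> (speedup, efficiency) relative to the 1-process time."""
    base = timings[1]
    return {n: (base / t, base / t / n) for n, t in sorted(timings.items())}


# Wall times in seconds, as reported in the thread (Xeon X5472 machine).
semidirect = {1: 154, 2: 96, 4: 52, 6: 51, 8: 69}
direct = {1: 316, 2: 201, 4: 102, 6: 81, 8: 72}

for label, timings in (("semidirect", semidirect), ("direct", direct)):
    print(label)
    for n, (speedup, efficiency) in scaling_table(timings).items():
        print(f"  {n} proc: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

With these figures the direct run keeps getting faster up to 8 processes, while the semidirect run gets slower beyond 6, which is consistent with the suggestion that disk I/O becomes the bottleneck in the semidirect case. The direct run is still slower in absolute terms at low process counts because the integrals are recomputed instead of being read back.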

Clicked A Few Times
Thanks Edo,

I have also compiled and tested on a different computer (with a very large memory).

CPUINFO:

vendor_id  : AuthenticAMD
cpu family  : 21
model  : 1
model name  : AMD Opteron(tm) Processor 6282 SE
stepping  : 2
cpu MHz  : 2599.926
cache size  : 2048 KB

Build file: same as above

Input file: same as above

Timings with the semidirect option:
1 proc - 186 sec
2 proc - 103 sec
4 proc - 53 sec
6 proc - 39 sec
8 proc - 31 sec
12 proc - 27 sec
16 proc - 22 sec
20 proc - 20 sec
24 proc - 19 sec
32 proc - 18 sec


Beyond 16 processes, there seems to be no significant scaling.

Did something go wrong during compilation?

In my build file, I set the paths for BLAS and ScaLAPACK, but when I checked src/tools/build/config.log, it appears to be using the internal LAPACK. Is this correct?


Thanks in advance

Karteek
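
A plateau like this does not necessarily point to a build problem. As a rough check, one can assume Amdahl's law, T(n) = T(1) * (s + (1 - s)/n), and solve for the non-parallel fraction s from two of the posted timings. This model is an assumption introduced here for illustration, not something stated in the thread; it ignores I/O, memory bandwidth, and communication costs. The helper names `serial_fraction` and `amdahl_time` are hypothetical.

```python
def serial_fraction(t1, tn, n):
    """Solve Amdahl's law T(n) = T(1) * (s + (1 - s) / n) for s."""
    return (tn / t1 - 1.0 / n) / (1.0 - 1.0 / n)


def amdahl_time(t1, s, n):
    """Predicted wall time on n processes for serial fraction s."""
    return t1 * (s + (1.0 - s) / n)


# Semidirect wall times in seconds, as reported for the Opteron 6282 SE machine.
opteron = {1: 186, 2: 103, 4: 53, 6: 39, 8: 31,
           12: 27, 16: 22, 20: 20, 24: 19, 32: 18}

s = serial_fraction(opteron[1], opteron[32], 32)
print(f"estimated serial fraction: {s:.3f}")
print(f"upper bound on speedup (1/s): {1.0 / s:.1f}")
for n in sorted(opteron):
    print(f"{n:2d} proc: measured {opteron[n]} sec, model {amdahl_time(opteron[1], s, n):.0f} sec")
```

If the model times roughly follow the measured ones, the flattening beyond 16 processes is about what a small calculation like this 24-atom molecule would be expected to show, whatever linear algebra library the build picked up.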

