I'm running this nwchem.nw on a single machine with 4 CPUs (8 virtual CPUs):
geometry nocenter noautosym
C 0.2265688 -0.56580271 0.37053473
N 1.53104275 -1.25352933 0.37698414
C 0.24487845 0.94565579 0.82884627
C -1.18278502 1.53035594 0.91007832
C 0.94733822 1.06970643 2.19799655
C -0.22962089 -0.60972299 -1.08178558
O 0.51218343 -0.53807897 -2.07032273
O -1.61832134 -0.67908207 -1.18057738
H -0.50313117 -1.10973073 0.9958513
H 2.01037607 -1.19295784 1.28583059
H 2.11635497 -0.8912782 -0.39447838
H 0.83331353 1.49436672 0.06509178
H -1.741318 1.35179101 -0.02207033
H -1.14539339 2.61527111 1.11156942
H -1.73810741 1.04465878 1.7344188
H 0.9927738 2.12810382 2.50783434
H 1.98011451 0.68259268 2.15005599
H 0.385448 0.50413825 2.96567565
H -1.83911536 -0.65210679 -2.16786412
end
start
basis
* library 3-21G
end
dft
xc xpbe96 cpbe96
mult 1
end
task dft gradient
Parallelizing speeds it up by less than 2x, even with 8 processes:
$ time nwchem nwchem.nw
# skip...
real 0m56.131s
user 0m53.700s
sys 0m1.297s
$ time mpirun -n 2 nwchem nwchem.nw
# skip...
real 0m45.799s
user 1m15.131s
sys 0m14.534s
$ time mpirun -n 4 nwchem nwchem.nw
# skip...
real 0m36.546s
user 1m48.988s
sys 0m33.324s
$ time mpirun -n 8 nwchem nwchem.nw
# skip...
real 0m32.027s
user 2m52.518s
sys 1m2.363s
Increasing the number of MPI processes causes a steep increase in user time, and sys time grows even faster.
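To put numbers on the scaling, here is a small Python sketch that computes the speedup, parallel efficiency, and total CPU time (user + sys) from the `time` output above. It is not part of NWChem or MPICH; the names `timings` and `summarize` are made up for illustration, and the values are simply copied from my runs.

```python
# Wall-clock ("real"), user and sys times in seconds, copied from the
# `time` output above and keyed by the number of MPI processes.
timings = {
    1: (56.131, 53.700, 1.297),
    2: (45.799, 75.131, 14.534),
    4: (36.546, 108.988, 33.324),
    8: (32.027, 172.518, 62.363),
}


def summarize(timings):
    """Return rows of (nproc, speedup, efficiency, cpu_seconds).

    speedup     = serial wall time / parallel wall time
    efficiency  = speedup / number of processes
    cpu_seconds = user + sys, summed over all processes
    """
    base_real = timings[1][0]
    rows = []
    for nproc in sorted(timings):
        real, user, sys_time = timings[nproc]
        speedup = base_real / real
        efficiency = speedup / nproc
        cpu_seconds = user + sys_time
        rows.append((nproc, speedup, efficiency, cpu_seconds))
    return rows


if __name__ == "__main__":
    for nproc, speedup, efficiency, cpu_seconds in summarize(timings):
        print(
            f"n={nproc}: speedup {speedup:.2f}x, "
            f"efficiency {efficiency:.0%}, CPU time {cpu_seconds:.1f}s"
        )
```

With these numbers, the speedup at 8 processes is only about 1.75x, while the total CPU time is more than four times that of the serial run.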
Is something wrong? Why doesn't it speed up more?
I'm using mpich-3.2.1 on FreeBSD.
Would using OpenMPI improve the performance in this case?