Hi all,
We need to install NWChem for the users of our shared cluster here at the university.
Note that we did not have the following issue on our old CentOS 6 system, but we are facing it on our new Red Hat 7 system:
Linux ... 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
I am using the following H2O input file for testing (the issue is the same with some other "basic" input files):
start h2o_freq
charge 1
geometry units angstroms
O 0.0 0.0 0.0
H 0.0 0.0 1.0
H 0.0 1.0 0.0
end
basis
H library sto-3g
O library sto-3g
end
scf
uhf; doublet
print low
end
title "H2O+ : STO-3G UHF geometry optimization"
task scf optimize
basis
H library 6-31g**
O library 6-31g**
end
title "H2O+ : 6-31g** UMP2 geometry optimization"
task mp2 optimize
mp2; print none; end
scf; print none; end
title "H2O+ : 6-31g** UMP2 frequencies"
task mp2 freq
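For reference, the two launch configurations look roughly like this (illustrative mpirun invocations, not our exact SLURM batch scripts; the `--map-by` flag is OpenMPI syntax for spreading ranks across nodes):

```shell
# Case 1: 2 MPI ranks on a single node
mpirun -np 2 nwchem h2o.nw > h2o_1node.out

# Case 2: 2 MPI ranks, 1 per node across 2 nodes
# (ppr:1:node = "one process per resource, resource = node")
mpirun -np 2 --map-by ppr:1:node nwchem h2o.nw > h2o_2nodes.out
```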
When using 2 MPI processes on a single node, the walltime is around 15 seconds.
Here are the first few and last few lines of the output:
Job information
---------------
hostname = node404.oscar.ccv.brown.edu
program = /gpfs/runtime/opt/nwchem/6.8-openmpi/bin/nwchem
date = Mon Jun 18 11:44:13 2018
compiled = Fri_Jun_15_16:31:11_2018
source = /gpfs/runtime/opt/nwchem/6.8-openmpi/src/nwchem-6.8
nwchem branch = 6.8
nwchem revision = v6.8-47-gdf6c956
ga revision = ga-5.6.3
use scalapack = F
input = h2o.nw
prefix = h2o_freq.
data base = ./h2o_freq.db
status = startup
nproc = 2
time left = 3598s
.
.
.
.
.
----------------------------------------------------------------------------
Normal Eigenvalue || Projected Derivative Dipole Moments (debye/angs)
Mode [cm**-1] || [d/dqX] [d/dqY] [d/dqZ]
------ ---------- || ------------------ ------------------ -----------------
1 -0.000 || -1.131 0.000 0.000
2 -0.000 || 1.701 0.000 0.404
3 -0.000 || -0.651 0.000 1.057
4 0.000 || 0.000 -0.044 0.000
5 0.000 || 0.000 2.480 0.000
6 0.000 || 0.000 2.480 0.000
7 1484.716 || 0.000 0.000 2.112
8 3460.149 || -0.000 0.000 1.877
9 3551.507 || 3.435 0.000 -0.000
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Normal Eigenvalue || Projected Infra Red Intensities
Mode [cm**-1] || [atomic units] [(debye/angs)**2] [(KM/mol)] [arbitrary]
------ ---------- || -------------- ----------------- ---------- -----------
1 -0.000 || 0.055473 1.280 54.077 3.034
2 -0.000 || 0.132537 3.058 129.203 7.249
3 -0.000 || 0.066795 1.541 65.115 3.653
4 0.000 || 0.000084 0.002 0.082 0.005
5 0.000 || 0.266538 6.149 259.834 14.578
6 0.000 || 0.266538 6.149 259.834 14.578
7 1484.716 || 0.193397 4.462 188.533 10.577
8 3460.149 || 0.152660 3.522 148.821 8.349
9 3551.507 || 0.511546 11.802 498.680 27.978
----------------------------------------------------------------------------
vib:animation F
Task times cpu: 8.2s wall: 9.3s
NWChem Input Module
-------------------
Summary of allocated global arrays
-----------------------------------
No active global arrays
GA Statistics for process 0
------------------------------
create destroy get put acc scatter gather read&inc
calls: 1.78e+04 1.78e+04 2.40e+05 5.73e+04 7.71e+04 2485 0 1.39e+04
number of processes/call 1.03e+00 1.04e+00 1.06e+00 0.00e+00 0.00e+00
bytes total: 6.87e+07 4.90e+07 2.00e+07 4.00e+02 0.00e+00 1.11e+05
bytes remote: 5.69e+06 7.90e+06 3.69e+06 0.00e+00 0.00e+00 0.00e+00
Max memory consumed for GA by this process: 514056 bytes
...
Total times cpu: 12.3s wall: 14.7s
MA_summarize_allocated_blocks: starting scan ...
MA_summarize_allocated_blocks: scan completed: 0 heap blocks, 0 stack blocks
MA usage statistics:
allocation statistics:
heap stack
---- -----
current number of blocks 0 0
maximum number of blocks 25 51
current total bytes 0 0
maximum total bytes 31471376 22510232
maximum total K-bytes 31472 22511
maximum total M-bytes 32 23
When running with 1 process on each of 2 nodes, the walltime is 227 seconds:
Job information
---------------
hostname = node404.oscar.ccv.brown.edu
program = /gpfs/runtime/opt/nwchem/6.8-openmpi/bin/nwchem
date = Mon Jun 18 11:25:00 2018
compiled = Fri_Jun_15_16:31:11_2018
source = /gpfs/runtime/opt/nwchem/6.8-openmpi/src/nwchem-6.8
nwchem branch = 6.8
nwchem revision = v6.8-47-gdf6c956
ga revision = ga-5.6.3
use scalapack = F
input = h2o.nw
prefix = h2o_freq.
data base = ./h2o_freq.db
status = startup
nproc = 2
time left = 3599s
.
.
.
.
.
----------------------------------------------------------------------------
Normal Eigenvalue || Projected Derivative Dipole Moments (debye/angs)
Mode [cm**-1] || [d/dqX] [d/dqY] [d/dqZ]
------ ---------- || ------------------ ------------------ -----------------
1 -0.000 || -0.651 0.000 1.057
2 0.000 || 0.000 -0.044 0.000
3 0.000 || 0.000 2.480 0.000
4 0.000 || 0.000 2.480 0.000
5 0.000 || -1.131 0.000 0.000
6 0.000 || 1.701 0.000 0.404
7 1484.768 || 0.000 0.000 2.112
8 3460.171 || 0.000 0.000 1.877
9 3551.514 || -3.435 0.000 0.000
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Normal Eigenvalue || Projected Infra Red Intensities
Mode [cm**-1] || [atomic units] [(debye/angs)**2] [(KM/mol)] [arbitrary]
------ ---------- || -------------- ----------------- ---------- -----------
1 -0.000 || 0.066797 1.541 65.117 3.653
2 0.000 || 0.000084 0.002 0.082 0.005
3 0.000 || 0.266531 6.149 259.828 14.578
4 0.000 || 0.266531 6.149 259.828 14.578
5 0.000 || 0.055472 1.280 54.077 3.034
6 0.000 || 0.132548 3.058 129.215 7.250
7 1484.768 || 0.193382 4.461 188.519 10.577
8 3460.171 || 0.152668 3.522 148.828 8.350
9 3551.514 || 0.511486 11.800 498.622 27.976
----------------------------------------------------------------------------
vib:animation F
Task times cpu: 134.5s wall: 135.4s
NWChem Input Module
-------------------
Summary of allocated global arrays
-----------------------------------
No active global arrays
GA Statistics for process 0
------------------------------
create destroy get put acc scatter gather read&inc
calls: 1.77e+04 1.77e+04 3.30e+05 6.34e+04 8.87e+04 2475 0 2.57e+04
number of processes/call 1.02e+00 1.03e+00 1.05e+00 0.00e+00 0.00e+00
bytes total: 7.45e+07 5.29e+07 2.42e+07 4.00e+02 0.00e+00 2.05e+05
bytes remote: 5.85e+06 8.06e+06 3.69e+06 0.00e+00 0.00e+00 0.00e+00
Max memory consumed for GA by this process: 514056 bytes
...
Total times cpu: 225.2s wall: 227.3s
MA_summarize_allocated_blocks: starting scan ...
MA_summarize_allocated_blocks: scan completed: 0 heap blocks, 0 stack blocks
MA usage statistics:
allocation statistics:
heap stack
---- -----
current number of blocks 0 0
maximum number of blocks 25 51
current total bytes 0 0
maximum total bytes 31471376 22510232
maximum total K-bytes 31472 22511
maximum total M-bytes 32 23
1) I am currently using OpenMPI version 2.0.3. MVAPICH2 version 2.3 is even worse: the same job takes about 10 minutes on 2 nodes with 1 process per node.
2) I compiled with the Intel 2017 compilers; using an older version did not make a difference.
3) We now have NWChem version 6.8 installed, as shown in the output above, but the issue was first observed with version 6.6.
4) Again, this was not an issue on our CentOS 6 system, but it is on our Red Hat 7 system. (There are certainly other differences as well, such as the MPI/SLURM configuration, that could be the cause; I mention the OS just in case it is related.)
In short, running even a simple NWChem job with 2 processes spread across 2 nodes takes about an order of magnitude longer than running with 2 processes on a single node.
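In case it helps with diagnosis: one thing we can check on our side is whether OpenMPI is actually selecting the fast interconnect between nodes or silently falling back to TCP. These commands are illustrative (standard OpenMPI diagnostics, not output from the runs above):

```shell
# List the byte-transfer layers (BTLs) this OpenMPI build supports
ompi_info | grep btl

# Re-run the 2-node case with verbose BTL selection to see which
# transport each rank ends up using for inter-node communication
mpirun -np 2 --map-by ppr:1:node --mca btl_base_verbose 100 nwchem h2o.nw
```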
Has anyone else encountered a similar problem?