While playing around with the pspw module in NWChem 6.3, I noticed that jobs finish without issue on my AMD Athlon II X3 445, AMD Phenom II X6 1055T and Intel i5-2400 nodes, but fail to converge on my FX8150 and FX8350 nodes. Instead, the structures blow up. I've tried no external BLAS, OpenBLAS and ACML (int; with and without fma4), all with the same results. However, the jobs do converge on the Bulldozer and Vishera (FX 8x50) CPUs when using NWChem 6.1.1, which would indicate an issue specific to NWChem 6.3.
Edit: I've never noticed any issues using the DFT module -- it's only pspw that's causing the issue, and only in NWChem 6.3.
(There's also a very minor typo in the nwchem code (src/band/minimizer/band_minimizer.F) -- it says Grassman/Stiefel with one n in Grassmann, rather than two.)
I'm not trying to do any production work in pspw, but rather I just want to report a possible bug.
The following job was used (it's just a random test job that triggers the issue):
scratch_dir /home/me/scratch
Title "pspw test job"
Start biphenyl_cation_twisted-1
echo
charge 1
geometry autosym units angstrom
C 0.00000 -3.54034 0.00000
C -1.20296 -2.84049 -0.216000
C -1.20944 -1.46171 -0.206253
C 0.00000 -0.721866 0.00000
C 1.20944 -1.46171 0.206253
C 1.20296 -2.84049 0.216000
C 0.00000 0.721866 0.00000
C 1.20944 1.46171 -0.206253
C 1.20296 2.84049 -0.216000
C -1.20944 1.46171 0.206253
C 0.00000 3.54034 0.00000
C -1.20296 2.84049 0.216000
H 0.00000 -4.62590 0.00000
H -2.12200 -3.38761 -0.395378
H -2.13673 -0.938003 -0.401924
H 2.12200 -3.38761 0.395378
H 2.12200 3.38761 -0.395378
H -2.13673 0.938003 0.401924
H 0.00000 4.62590 0.00000
H -2.12200 3.38761 0.395378
H 2.13673 0.938003 -0.401924
H 2.13673 -0.938003 0.401924
end
ecce_print /home/me/neon/job1056/ecce.out
nwpw
simulation_cell
lattice_vectors
2.000000e+01 0.000000e+00 0.000000e+00
0.000000e+00 2.000000e+01 0.000000e+00
0.000000e+00 0.000000e+00 2.000000e+01
end
mult 2
np_dimensions -1 -1
tolerances 1e-7 1e-7
end
driver
default
end
task pspw optimize
The following nodes worked fine:
AMD Athlon II X3, 8 GB RAM
AMD Phenom II X6, 8 GB RAM
Intel i5-2400, 16 GB RAM
The final structure was this:
22
Structure 1
C 0.00000 -3.43835 0.00000
C -1.17952 -2.76904 -0.27325
C -1.18743 -1.41309 -0.25617
C 0.00000 -0.70724 0.00000
C 1.18743 -1.41309 0.25617
C 1.17952 -2.76904 0.27325
C 0.00000 0.70724 0.00000
C 1.18743 1.41309 -0.25617
C 1.17952 2.76904 -0.27325
C -1.18743 1.41309 0.25617
C 0.00000 3.43835 0.00000
C -1.17952 2.76904 0.27325
H 0.00000 -4.51201 0.00000
H -2.07930 -3.32101 -0.49999
H -2.09224 -0.87241 -0.48912
H 2.07930 -3.32101 0.49999
H 2.07930 3.32101 -0.49999
H -2.09224 0.87241 0.48912
H 0.00000 4.51201 0.00000
H -2.07930 3.32101 0.49999
H 2.09224 0.87241 -0.48912
H 2.09224 -0.87241 0.48912
cat nwch.nwout|grep "Total PSPW energy"
Total PSPW energy : -0.7403126784E+02
Total PSPW energy : -0.7403944621E+02
Total PSPW energy : -0.7404121161E+02
Total PSPW energy : -0.7404171961E+02
Total PSPW energy : -0.7404173291E+02
Total PSPW energy : -0.7404176133E+02
Total PSPW energy : -0.7404176138E+02
Total PSPW energy : -0.7404178719E+02
Total PSPW energy : -0.7404179836E+02
Total PSPW energy : -0.7404181219E+02
Total PSPW energy : -0.7404181529E+02
Total PSPW energy : -0.7404183416E+02
Total PSPW energy : -0.7404183384E+02
Total PSPW energy : -0.7404182554E+02
Total PSPW energy : -0.7404183894E+02
Total PSPW energy : -0.7404184777E+02
Total PSPW energy : -0.7404184781E+02
Total PSPW energy : -0.7404185248E+02
Total PSPW energy : -0.7404185252E+02
Total PSPW energy : -0.7404183390E+02
Total PSPW energy : -0.7404185160E+02
Total PSPW energy : -0.7404185179E+02
Total PSPW energy : -0.7404185172E+02
Total PSPW energy : -0.7404185163E+02
Total PSPW energy : -0.7404185167E+02
The step/energy data for the first geometry cycle behaves as expected:
== Energy Calculation ==
====== Grassmann conjugate gradient iteration ======
>>> ITERATION STARTED AT Wed Nov 13 14:25:02 2013 <<<
iter. Energy DeltaE DeltaRho
------------------------------------------------------
- 15 steepest descent iterations performed
10 -0.3692699170E+02 -0.26760E+01 0.53590E-02
- 10 steepest descent iterations performed
20 -0.6158204365E+02 -0.41876E+00 0.31092E-03
- 10 steepest descent iterations performed
30 -0.7003979846E+02 -0.12758E+00 0.42769E-04
- 10 steepest descent iterations performed
40 -0.7263392229E+02 -0.34324E-01 0.21020E-04
- 10 steepest descent iterations performed
50 -0.7336419645E+02 -0.13767E-01 0.10561E-04
- 10 steepest descent iterations performed
60 -0.7380342913E+02 -0.14352E-01 0.99511E-06
- 10 steepest descent iterations performed
70 -0.7397649640E+02 -0.61870E-02 0.24790E-04
80 -0.7401568945E+02 -0.25650E-02 0.10674E-04
[..]
270 -0.7403126784E+02 -0.99193E-07 0.18719E-09
*** tolerance ok. iteration terminated
>>> ITERATION ENDED AT Wed Nov 13 14:49:47 2013 <<<
The following nodes lead to exploding structures:
AMD FX8150, 32 GB RAM
AMD FX8350, 32 GB RAM
22
Structure 23
C 0.00000 -3.28702 0.00000
C -3.07661 -4.04679 -3.00814
C -2.98013 -0.93045 -3.30301
C 0.00000 -1.11917 0.00000
C 2.98013 -0.93045 3.30301
C 3.07661 -4.04679 3.00814
C 0.00000 1.11917 0.00000
C 2.98013 0.93045 -3.30301
C 3.07661 4.04679 -3.00814
C -2.98013 0.93045 3.30301
C 0.00000 3.28702 0.00000
C -3.07661 4.04679 3.00814
H 0.00000 -4.78561 0.00000
H -4.23747 -6.33817 -3.55692
H -4.05310 0.84076 -3.86806
H 4.23747 -6.33817 3.55692
H 4.23747 6.33817 -3.55692
H -4.05310 -0.84076 3.86806
H 0.00000 4.78561 0.00000
H -4.23747 6.33817 3.55692
H 4.05310 -0.84076 -3.86806
H 4.05310 0.84076 3.86806
cat nwch.nwout|grep "Total PSPW energy"
Total PSPW energy : 0.7974031861E+02
Total PSPW energy : 0.7246518114E+02
Total PSPW energy : 0.6606772951E+02
Total PSPW energy : 0.5673312109E+02
Total PSPW energy : 0.4840063736E+02
Total PSPW energy : 0.4059627772E+02
Total PSPW energy : 0.3478817836E+02
Total PSPW energy : 0.2592070851E+02
Total PSPW energy : 0.1961993049E+02
Total PSPW energy : 0.1337599142E+02
Total PSPW energy : 0.8368934197E+01
Total PSPW energy : 0.4070828454E+01
Total PSPW energy : 0.4890631969E+00
Total PSPW energy : -0.5836265579E+01
Total PSPW energy : -0.7890745466E+01
Total PSPW energy : -0.1408910732E+02
Total PSPW energy : -0.1376509590E+02
Total PSPW energy : -0.1592566062E+02
Total PSPW energy : -0.1795967556E+02
Total PSPW energy : -0.2040058313E+02
Total PSPW energy : -0.2225738586E+02
Total PSPW energy : -0.2338027226E+02
Total PSPW energy : -0.2338044505E+02
Total PSPW energy : -0.2461935725E+02
Total PSPW energy : -0.2515495572E+02
Total PSPW energy : -0.2562143558E+02
Total PSPW energy : -0.2561106019E+02
Total PSPW energy : -0.2618964488E+02
Total PSPW energy : -0.2615732606E+02
Total PSPW energy : -0.2636682412E+02
Total PSPW energy : -0.2633912461E+02
Total PSPW energy : -0.2639189830E+02
Total PSPW energy : -0.2638634234E+02
Total PSPW energy : -0.2646209976E+02
Total PSPW energy : -0.2645190214E+02
Total PSPW energy : -0.2649565926E+02
Total PSPW energy : -0.2655957858E+02
Total PSPW energy : -0.2634583373E+02
Total PSPW energy : -0.2656000596E+02
Total PSPW energy : -0.2615164061E+02
Total PSPW energy : -0.2648948900E+02
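For comparing a good and a bad run without reading the whole list, here is a minimal sketch that summarises the "Total PSPW energy" lines of an output file. The function name pspw_energy_check is hypothetical (it is not part of NWChem), and the only assumption about the output format is that the energy is the last field on each matching line, as in the listings above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of NWChem: summarise the geometry-step
# energies printed as "Total PSPW energy : <value>" in an output file.
pspw_energy_check() {
    awk '
        /Total PSPW energy/ {
            e = $NF + 0                     # last field, e.g. -0.7403126784E+02
            n++
            if (e > 0) positive++           # a positive total energy is suspicious here
            if (n > 1 && e > prev) rises++  # energy went up relative to the previous step
            prev = e
        }
        END {
            printf "steps=%d positive=%d rises=%d last=%.8f\n", n, positive, rises, prev
        }
    ' "$1"
}
```

Usage would be along the lines of `pspw_energy_check nwch.nwout` on each node. A few small rises are normal during an optimization; the field to look at for this bug is "positive".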
The step/energy data for the first cycle has positive, rather than negative, energies that descend towards zero:
== Energy Calculation ==
====== Grassmann conjugate gradient iteration ======
>>> ITERATION STARTED AT Wed Nov 13 23:30:15 2013 <<<
iter. Energy DeltaE DeltaRho
------------------------------------------------------
- 15 steepest descent iterations performed
10 0.9582453153E+02 -0.17155E+00 0.52948E-04
- 10 steepest descent iterations performed
20 0.9035909073E+02 -0.19625E+00 0.47200E-05
- 10 steepest descent iterations performed
30 0.8605965298E+02 -0.65226E-01 0.72813E-05
- 10 steepest descent iterations performed
40 0.8398586767E+02 -0.73599E-01 0.65304E-06
- 10 steepest descent iterations performed
50 0.8228907713E+02 -0.30066E-01 0.22806E-05
- 10 steepest descent iterations performed
60 0.8161797538E+02 -0.21635E-01 0.33675E-06
- 10 steepest descent iterations performed
70 0.8081694861E+02 -0.16072E-01 0.31301E-06
- 10 steepest descent iterations performed
80 0.8048826777E+02 -0.10237E-01 0.27982E-06
- 10 steepest descent iterations performed
90 0.8013146841E+02 -0.20216E-02 0.15217E-06
100 0.8000423850E+02 -0.22420E-01 0.14079E-04
- 10 steepest descent iterations performed
110 0.7984322686E+02 -0.14763E-02 0.19813E-06
120 0.7979788211E+02 -0.13952E-01 0.70838E-05
- 10 steepest descent iterations performed
130 0.7974031861E+02 -0.64662E-03 0.10059E-06
140 0.7974031861E+02 0.17764E-14 0.31320E-30
*** energy going up. iteration not terminated
*** tolerance ok. iteration terminated
>>> ITERATION ENDED AT Wed Nov 13 23:49:46 2013 <<<
NWChem was compiled using the following script:
export NWCHEM_TOP=`pwd`
export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export PYTHONVERSION=2.7
export PYTHONHOME=/usr
export BLASOPT="-L/opt/openblas/lib -lopenblas"
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export LIBRARY_PATH="$LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/openblas/lib"
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
export ARMCI_NETWORK=SOCKETS
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran 1> make.log 2>make.err
cd $NWCHEM_TOP/contrib
export FC=gfortran
./getmem.nwchem
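Since the same build script behaves differently depending on the CPU, it may help to record which SIMD extensions each node reports. The sketch below is only an aid for collecting that information and makes no claim about the cause. The function name simd_flags and the list of flags it looks for are my own choices, and it assumes a Linux-style /proc/cpuinfo with a "flags" line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list selected SIMD-related CPU flags, sorted,
# on one line. Reads /proc/cpuinfo unless another file is given.
simd_flags() {
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" \
        | tr ' ' '\n' \
        | grep -E -x 'sse4_1|sse4_2|avx|avx2|fma|fma4|xop' \
        | sort \
        | paste -s -d ' ' -
}
```

Running `simd_flags` on a working node and on a failing node gives two short lines that can be compared directly.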
The main thing that shows up in make.err for NWChem 6.3 on the Bulldozer cores that doesn't show up for e.g. the Phenom II CPU is:
/usr/bin/ld: Warning: alignment 16 of symbol `cface_' in /opt/nwchem/nwchem-6.3-src.2013-05-28/lib/LINUX64/libstepper.a(stpr_face.o) is smaller than 32 in /opt/nwchem/nwchem-6.3-src.2013-05-28/lib/LINUX64/libstepper.a(stpr_partit.o)
This doesn't show up for nwchem 6.1.1.
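To check whether other symbols trigger the same linker warning, or to compare make.err between builds, the affected symbol names can be pulled out of the log. This is a rough sketch that assumes the warning text has exactly the form quoted above; the function name ld_alignment_warnings is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct symbol named in an ld
# "Warning: alignment N of symbol `name' ..." line of a build log.
ld_alignment_warnings() {
    grep 'Warning: alignment' "$1" \
        | sed -E "s/.*of symbol \`([^']+)'.*/\1/" \
        | sort -u
}
```

For example, `ld_alignment_warnings make.err` in the 6.3 and 6.1.1 source trees would show which symbols, if any, differ between the two builds.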
Using nwchem 6.1.1 both FX8150 and FX8350 work:
====== Grassmann conjugate gradient iteration ======
>>> ITERATION STARTED AT Thu Nov 14 16:38:02 2013 <<<
iter. Energy DeltaE DeltaRho
------------------------------------------------------
10 -0.5440813275E+02 -0.23233E+01 0.72286E-02
- 10 steepest descent iterations performed
20 -0.7080853812E+02 -0.14463E+00 0.19753E-03
- 10 steepest descent iterations performed
30 -0.7308197239E+02 -0.20838E-01 0.16698E-04
- 10 steepest descent iterations performed
40 -0.7349928511E+02 -0.14939E-01 0.93367E-03
- 10 steepest descent iterations performed
50 -0.7361462983E+02 -0.92625E-02 0.88102E-05
60 -0.7369416281E+02 -0.57087E-02 0.34645E-04
[..]
310 -0.7403126685E+02 -0.12879E-06 0.44149E-09
320 -0.7403126810E+02 -0.86892E-07 0.25214E-09
*** tolerance ok. iteration terminated
>>> ITERATION ENDED AT Thu Nov 14 17:03:46 2013 <<<