Solved: NWChem 6.3 running 2-5 times slower than 6.1.1


Gets Around
I've experimented a little bit with nwchem 6.3 (17 May release), and it appears to run much slower than 6.1.1.

For a benchmark calculation which typically takes only about 40 seconds in 6.1.1 and 6.1, I end up with 190 seconds in nwchem 6.3. All cores are engaged though, and there's nothing odd in top.

The input file is shown below:
scratch_dir /scratch
start benzene 

geometry units angstroms
C  0.100  1.396  0.000
C  1.209  0.698  0.000
C  1.209 -0.698  0.000
C  0.000 -1.396  0.000
C -1.209 -0.698  0.000
C -1.209  0.698  0.000
H  0.000  2.479  0.000
H  2.147  1.240  0.000
H  2.147 -1.240  0.000
H  0.000 -2.479  0.000
H -2.147 -1.240  0.000
H -2.147  1.240  0.000
end

basis
 H library "6-31+g*" 
 c library "6-31+g*"
end
dft
        direct
end

task dft optimize


The hardware in that particular case is an AMD FX 8150 (8 cores) with 32 GB RAM, running as a local calculation on debian wheezy/stable with openmpi 1.3 and ACML 5.3.1. Nwchem was compiled as shown here:
export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export PYTHONVERSION=2.7
export PYTHONHOME=/usr
export BLASOPT="-L/opt/acml/acml5.3.1/gfortran64_fma4_int64/lib -lacml"
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export LIBRARY_PATH="$LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/acml/acml5.3.1/gfortran64_fma4_int64/lib"
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran 2> make.err 1>make.log
cd $NWCHEM_TOP/contrib
export FC=gfortran
./getmem.nwchem


I've also tested it on an i5-2400 (four cores) with 16 GB RAM and openblas instead of ACML, but otherwise identical parameters. With 6.1.1 it takes ca 50 s (cpu)/52 s (wall) vs 126/127 seconds for nwchem 6.3.

Finally, I have tested this on a dual-socket Xeon cluster running ROCKS 5.4.3 (based on CentOS 5.6) using openblas. Using six cores out of the available eight (again, it is contained on a single node) I get 254 seconds for nwchem 6.3 vs 121 seconds for nwchem 6.1.1.

In each pair of cases the same build file was used for both 6.1.1 and 6.3.

The question is: is this normal? Is this an issue with nwchem 6.3? Or the way I build it?


Gets Around
Same for me.
nwchem 6.3 is about 30% slower in HF SCF than the 6.1.1 version.

Forum Vet
ARMCI_NETWORK=SOCKETS should be faster
The default ARMCI_NETWORK is now MPI-TS. In the case of runs that use a single node,
ARMCI_NETWORK=SOCKETS might give faster run times.

Here is how to switch
export ARMCI_NETWORK=SOCKETS
cd $NWCHEM_TOP/src/tools
rm -rf build install
make
cd ..
make link

Gets Around
Cheers Edo.

Using SOCKETS brought down the times a fair bit:
On the amd fx8150 where 6.1.1 takes 44+/-3 seconds, 6.3 now takes 56+/-1 seconds (10 repeats).
On the intel i5-2400 where 6.1.1 took 50 s, 6.3 now takes 70 s.
On the dual-socket Xeon node 6.1.1 took 78 seconds (retested), while 6.3 now takes 107 seconds.

My times are now more in line with Vladimir's i.e. 30% above 6.1.1.
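
For anyone who wants to redo the arithmetic, the relative slowdown can be worked out with a small awk call. This is only a sketch: `slowdown_pct` is a helper name made up here, and the inputs are simply the timings quoted above (6.1.1 first, 6.3 with SOCKETS second).

```shell
# slowdown_pct OLD NEW: percent by which NEW exceeds OLD
# (hypothetical helper, not part of NWChem)
slowdown_pct() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (new - old) / old * 100 }'
}

slowdown_pct 44 56    # AMD FX 8150
slowdown_pct 50 70    # Intel i5-2400
slowdown_pct 78 107   # dual-socket Xeon node
```

That works out to roughly 27%, 40% and 37% for the three machines.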

Forum Vet
Unfortunately, I have just realized that there was an issue in $NWCHEM_TOP/src/tools/GNUmakefile that was preventing ARMCI_NETWORK from being correctly recognized.
Therefore, I am not 100% sure that your last build did get SOCKETS in place (you can check it by analyzing tools/build/config.log)

Anyhow,
the patch is available at
http://nwchemgit.github.io/images/Armcisock.patch.gz

To install:
export ARMCI_NETWORK=SOCKETS
cd $NWCHEM_TOP/src
wget http://nwchemgit.github.io/images/Armcisock.patch.gz
gzip -d Armcisock.patch.gz
patch -p0 < Armcisock.patch
cd tools
rm -rf build install
make
cd ..
make link


Thanks again for your valuable feedback

Gets Around
1. Looking through src/tools/build/config.log for SOCKETS for an UNPATCHED build:
configure:6762: result: LINUX64
configure:10516: WARNING: No ARMCI_NETWORK specified, defaulting to SOCKETS
configure:12957: checking whether to enable assertions

and
configure:42954: **************************************************************
configure:42956:  Global Arrays (GA) configured as follows:
configure:42958: **************************************************************
configure:42960: 
configure:42962:                 TARGET=LINUX64
configure:42964:              MSG_COMMS=MPI
configure:42966:             GA_MP_LIBS= -lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread
configure:42968:          GA_MP_LDFLAGS= -L/usr/lib/openmpi/lib
configure:42970:         GA_MP_CPPFLAGS= -I/usr/lib/openmpi/include
configure:42972:          ARMCI_NETWORK=SOCKETS
configure:42974:  ARMCI_NETWORK_LDFLAGS

and
ARMCI_NETWORK_SOCKETS_FALSE='#'
ARMCI_NETWORK_SOCKETS_TRUE='

(there are two apostrophes at the end of the second line)
and
#define DATA_SERVER 1
#define SOCKETS 1
#define ENABLE_ARMCI_MEM_OPTION 1
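
The lines quoted above can be pulled out of config.log without reading the whole file. A minimal sketch, assuming the log sits at the path mentioned in this thread and contains `ARMCI_NETWORK=...` lines like the ones shown; `armci_network_in_log` is a made-up helper name.

```shell
# armci_network_in_log LOGFILE: list the distinct ARMCI_NETWORK values
# recorded in a config.log (hypothetical helper name)
armci_network_in_log() {
  sed -n 's/.*ARMCI_NETWORK=\([A-Za-z0-9_-]*\).*/\1/p' "$1" | sort -u
}

# Path as given in this thread; only run the check if the log exists
log="${NWCHEM_TOP:-.}/src/tools/build/config.log"
if [ -f "$log" ]; then
  armci_network_in_log "$log"
  # Show any "defaulting to ..." warning together with its line number
  grep -n 'defaulting to' "$log" || true
fi
```

If more than one value is printed, different parts of the configure run saw different settings, which would be worth reporting.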


2. The patched version of 6.3
Re-running my short calculation ten times with the PATCHED version I get 56 s cpu and 58 s wall for every single run i.e. the unpatched version must have recognised SOCKETS. There is at least no functional difference.

Forum Vet
SOCKETS recognized
Yes,
You were right by saying that SOCKETS got recognized. I forgot that was the backup in the ARMCI autoconf structure.

As far as the 6.3 vs 6.1 comparison is concerned, what kind of benchmark was it?
If it was DFT or HF, did you check if the SCF took the same number of iterations?

Thanks, Edo

Gets Around
Edo,
I get four geometry steps for both 6.1.1 and 6.3.

The 'benchmark' is the benzene dft optimization in my first post in this thread. It's not much of a benchmark, but I have used it extensively in the past to do quick tests. I'll be happy to run a longer, better one -- with SOCKETS at the moment it's a difference of mere seconds.

Example (AMD FX 8150, 8 cores, 32 gb, ACML 5.3.1 (int, fma4, gfortran)) using 6.1.1
@ Step       Energy      Delta E   Gmax     Grms     Xrms     Xmax   Walltime
@ ---- ---------------- -------- -------- -------- -------- -------- --------
@    0    -230.08801905  0.0D+00  0.07175  0.01289  0.00000  0.00000     12.5
@    1    -230.10026114 -1.2D-02  0.01388  0.00296  0.03473  0.13124     21.3
@    2    -230.10088594 -6.2D-04  0.00276  0.00056  0.01271  0.03164     32.8
@    3    -230.10091297 -2.7D-05  0.00009  0.00002  0.00139  0.00502     40.0
@    4    -230.10091305 -7.9D-08  0.00002  0.00000  0.00010  0.00029     44.6
@    4    -230.10091305 -7.9D-08  0.00002  0.00000  0.00010  0.00029     44.6


Example using 6.3 without specifying SOCKETS
@ Step       Energy      Delta E   Gmax     Grms     Xrms     Xmax   Walltime
@ ---- ---------------- -------- -------- -------- -------- -------- --------
@    0    -230.08801906  0.0D+00  0.07175  0.01289  0.00000  0.00000     38.2
@    1    -230.10023344 -1.2D-02  0.01700  0.00285  0.03171  0.12027     89.2
@    2    -230.10089367 -6.6D-04  0.00258  0.00048  0.01180  0.03599    117.1
@    3    -230.10091297 -1.9D-05  0.00015  0.00003  0.00162  0.00529    164.1
@    4    -230.10091306 -9.2D-08  0.00001  0.00000  0.00014  0.00035    191.5
@    4    -230.10091306 -9.2D-08  0.00001  0.00000  0.00014  0.00035    191.5


Example using 6.3 while specifying SOCKETS:
@ Step       Energy      Delta E   Gmax     Grms     Xrms     Xmax   Walltime
@ ---- ---------------- -------- -------- -------- -------- -------- --------
@    0    -230.08801906  0.0D+00  0.07175  0.01289  0.00000  0.00000     12.5
@    1    -230.10028794 -1.2D-02  0.01611  0.00278  0.03101  0.12160     27.3
@    2    -230.10089384 -6.1D-04  0.00270  0.00048  0.01079  0.03564     35.6
@    3    -230.10091290 -1.9D-05  0.00026  0.00005  0.00124  0.00482     49.6
@    4    -230.10091306 -1.7D-07  0.00000  0.00000  0.00019  0.00048     58.0
@    4    -230.10091306 -1.7D-07  0.00000  0.00000  0.00019  0.00048     58.0



Edit: the above was obviously the number of DFT optimisation steps, not the SCF iterations that you asked for. There's a difference here:

6.1.1 does 9, 7, 6, 4, 5 and 2 SCF iterations.
6.3 does 10, 8, 7, 7, 7, 7 and 7 SCF iterations.

(note: I've repeated the calcs so often that the output below might not be from the exact same run as the output above)

6.1.1
 d= 0,ls=0.0,diis     1   -230.1846873327 -4.34D+02  1.53D-02  2.12D+00     2.1
 d= 0,ls=0.0,diis     2   -230.0401177850  1.45D-01  1.44D-02  2.62D-01     2.9
 d= 0,ls=0.0,diis     3   -229.9189400075  1.21D-01  6.88D-03  1.11D+00     3.7
 d= 0,ls=0.0,diis     4   -230.0859601490 -1.67D-01  2.03D-03  1.51D-02     4.5
 d= 0,ls=0.0,diis     5   -230.0874343200 -1.47D-03  6.04D-04  4.09D-03     5.3
 d= 0,ls=0.0,diis     6   -230.0879684761 -5.34D-04  1.70D-04  3.63D-04     6.1
 d= 0,ls=0.0,diis     7   -230.0880070289 -3.86D-05  8.09D-05  8.00D-05     7.0
 d= 0,ls=0.0,diis     8   -230.0880181862 -1.12D-05  1.98D-05  6.07D-06     7.8
 d= 0,ls=0.0,diis     9   -230.0880190533 -8.67D-07  4.12D-06  1.45D-07     8.6
 d= 0,ls=0.0,diis     1   -230.0940137547 -4.32D+02  3.37D-03  2.61D-02    12.3
 d= 0,ls=0.0,diis     2   -230.0985355007 -4.52D-03  1.32D-03  1.33D-02    13.2
 d= 0,ls=0.0,diis     3   -230.0983868780  1.49D-04  8.98D-04  1.27D-02    14.0
 d= 0,ls=0.0,diis     4   -230.1001155261 -1.73D-03  2.72D-04  9.31D-04    14.8
 d= 0,ls=0.0,diis     5   -230.1002464623 -1.31D-04  7.14D-05  1.15D-04    15.6
 d= 0,ls=0.0,diis     6   -230.1002609443 -1.45D-05  1.96D-05  1.74D-06    16.4
 d= 0,ls=0.0,diis     7   -230.1002611376 -1.93D-07  6.88D-06  3.08D-07    17.2
 d= 0,ls=0.0,diis     1   -230.1006971476 -4.33D+02  6.87D-04  9.65D-04    21.0
 d= 0,ls=0.0,diis     2   -230.1007907138 -9.36D-05  3.24D-04  6.71D-04    21.8
 d= 0,ls=0.0,diis     3   -230.1008012839 -1.06D-05  1.84D-04  5.24D-04    22.6
 d= 0,ls=0.0,diis     4   -230.1008761633 -7.49D-05  6.07D-05  3.72D-05    23.4
 d= 0,ls=0.0,diis     5   -230.1008816761 -5.51D-06  1.01D-05  1.13D-06    24.2
 d= 0,ls=0.0,diis     6   -230.1008818053 -1.29D-07  3.79D-06  1.74D-07    25.0
 d= 0,ls=0.0,diis     1   -230.1008836620 -4.33D+02  7.86D-05  1.23D-05    26.1
 d= 0,ls=0.0,diis     2   -230.1008848385 -1.18D-06  3.70D-05  8.57D-06    26.9
 d= 0,ls=0.0,diis     3   -230.1008849763 -1.38D-07  2.07D-05  6.71D-06    27.7
 d= 0,ls=0.0,diis     4   -230.1008859360 -9.60D-07  6.90D-06  4.69D-07    28.5
 d= 0,ls=0.0,diis     1   -230.1009013337 -4.33D+02  1.36D-04  4.95D-05    32.2
 d= 0,ls=0.0,diis     2   -230.1009089686 -7.63D-06  6.20D-05  3.09D-05    33.0
 d= 0,ls=0.0,diis     3   -230.1009089399  2.87D-08  4.18D-05  2.70D-05    33.8
 d= 0,ls=0.0,diis     4   -230.1009127547 -3.81D-06  1.09D-05  1.57D-06    34.7
 d= 0,ls=0.0,diis     5   -230.1009129705 -2.16D-07  2.91D-06  2.17D-07    35.5
 d= 0,ls=0.0,diis     1   -230.1009130375 -4.33D+02  2.05D-05  1.19D-07    39.2
 d= 0,ls=0.0,diis     2   -230.1009130490 -1.15D-08  3.57D-06  9.41D-08    40.0



6.3 (sockets):
 d= 0,ls=0.0,diis     1   -230.1846873327 -4.34D+02  1.53D-02  2.12D+00     1.7
 d= 0,ls=0.0,diis     2   -230.0401171170  1.45D-01  1.44D-02  2.62D-01     2.4
 d= 0,ls=0.0,diis     3   -229.9189392569  1.21D-01  6.88D-03  1.11D+00     3.2
 d= 0,ls=0.0,diis     4   -230.0859593631 -1.67D-01  2.02D-03  1.51D-02     4.0
 d= 0,ls=0.0,diis     5   -230.0874336088 -1.47D-03  6.05D-04  4.09D-03     4.7
 d= 0,ls=0.0,diis     6   -230.0879684673 -5.35D-04  2.03D-04  3.63D-04     5.5
 d= 0,ls=0.0,diis     7   -230.0880070291 -3.86D-05  2.52D-04  7.99D-05     6.3
 d= 0,ls=0.0,diis     8   -230.0879494068  5.76D-05  1.54D-04  4.63D-04     7.0
 d= 0,ls=0.0,diis     9   -230.0880181652 -6.88D-05  2.67D-05  6.27D-06     7.8
 d= 0,ls=0.0,diis    10   -230.0880190566 -8.91D-07  3.58D-06  1.37D-07     8.6
 d= 0,ls=0.0,diis     1   -230.0936723836 -4.33D+02  3.20D-03  2.75D-02    12.2
 d= 0,ls=0.0,diis     2   -230.0984010434 -4.73D-03  1.37D-03  1.42D-02    13.0
 d= 0,ls=0.0,diis     3   -230.0982162514  1.85D-04  9.29D-04  1.38D-02    13.8
 d= 0,ls=0.0,diis     4   -230.1001000219 -1.88D-03  2.75D-04  9.56D-04    14.5
 d= 0,ls=0.0,diis     5   -230.1002342285 -1.34D-04  7.22D-05  1.19D-04    15.3
 d= 0,ls=0.0,diis     6   -230.1002491001 -1.49D-05  1.96D-05  1.73D-06    16.1
 d= 0,ls=0.0,diis     7   -230.1002492881 -1.88D-07  1.65D-05  3.35D-07    16.9
 d= 0,ls=0.0,diis     8   -230.1002492104  7.77D-08  9.64D-06  8.94D-07    17.6
 d= 0,ls=0.0,diis     1   -230.1002121452 -4.33D+02  3.33D-04  3.14D-04    18.6
 d= 0,ls=0.0,diis     2   -230.1002675856 -5.54D-05  1.43D-04  1.58D-04    19.3
 d= 0,ls=0.0,diis     3   -230.1002652577  2.33D-06  9.86D-05  1.54D-04    20.1
 d= 0,ls=0.0,diis     4   -230.1002862056 -2.09D-05  2.94D-05  1.11D-05    20.9
 d= 0,ls=0.0,diis     5   -230.1002877573 -1.55D-06  7.69D-06  1.40D-06    21.7
 d= 0,ls=0.0,diis     6   -230.1002879343 -1.77D-07  1.99D-06  1.61D-08    22.4
 d= 0,ls=0.0,diis     7   -230.1002879361 -1.75D-09  1.65D-06  3.26D-09    23.2
 d= 0,ls=0.0,diis     1   -230.1006320799 -4.33D+02  8.28D-04  1.29D-03    26.8
 d= 0,ls=0.0,diis     2   -230.1007729436 -1.41D-04  3.60D-04  9.01D-04    27.6
 d= 0,ls=0.0,diis     3   -230.1007791735 -6.23D-06  2.17D-04  7.47D-04    28.3
 d= 0,ls=0.0,diis     4   -230.1008878211 -1.09D-04  6.13D-05  3.94D-05    29.1
 d= 0,ls=0.0,diis     5   -230.1008935156 -5.69D-06  1.22D-05  2.51D-06    29.9
 d= 0,ls=0.0,diis     6   -230.1008938217 -3.06D-07  4.72D-06  1.55D-07    30.7
 d= 0,ls=0.0,diis     7   -230.1008938410 -1.93D-08  2.33D-06  8.52D-09    31.4
 d= 0,ls=0.0,diis     1   -230.1009059134 -4.33D+02  1.01D-04  2.75D-05    35.0
 d= 0,ls=0.0,diis     2   -230.1009108873 -4.97D-06  4.12D-05  1.31D-05    35.8
 d= 0,ls=0.0,diis     3   -230.1009106237  2.64D-07  2.90D-05  1.33D-05    36.6
 d= 0,ls=0.0,diis     4   -230.1009124240 -1.80D-06  8.44D-06  9.03D-07    37.4
 d= 0,ls=0.0,diis     5   -230.1009125483 -1.24D-07  2.28D-06  1.29D-07    38.1
 d= 0,ls=0.0,diis     6   -230.1009125647 -1.64D-08  4.62D-07  8.26D-10    38.9
 d= 0,ls=0.0,diis     7   -230.1009125648 -8.00D-11  3.84D-07  2.41D-10    39.7
 d= 0,ls=0.0,diis     1   -230.1009127343 -4.33D+02  1.57D-05  6.68D-07    40.6
 d= 0,ls=0.0,diis     2   -230.1009128522 -1.18D-07  6.43D-06  3.39D-07    41.4
 d= 0,ls=0.0,diis     3   -230.1009128494  2.77D-09  4.48D-06  3.14D-07    42.2
 d= 0,ls=0.0,diis     4   -230.1009128916 -4.22D-08  1.34D-06  2.33D-08    43.0
 d= 0,ls=0.0,diis     5   -230.1009128948 -3.20D-09  3.60D-07  3.29D-09    43.7
 d= 0,ls=0.0,diis     6   -230.1009128952 -4.10D-10  6.91D-08  9.27D-11    44.5
 d= 0,ls=0.0,diis     7   -230.1009128952  5.68D-14  5.56D-08  7.97D-11    45.3
 d= 0,ls=0.0,diis     1   -230.1009130006 -4.33D+02  1.15D-05  3.22D-07    48.9
 d= 0,ls=0.0,diis     2   -230.1009130278 -2.72D-08  6.00D-06  2.58D-07    49.6
 d= 0,ls=0.0,diis     3   -230.1009130351 -7.32D-09  3.42D-06  1.79D-07    50.4
 d= 0,ls=0.0,diis     4   -230.1009130611 -2.60D-08  1.07D-06  1.14D-08    51.2
 d= 0,ls=0.0,diis     5   -230.1009130628 -1.67D-09  1.96D-07  5.17D-10    52.0
 d= 0,ls=0.0,diis     6   -230.1009130628 -5.19D-11  6.89D-08  1.26D-10    52.7
 d= 0,ls=0.0,diis     7   -230.1009130628 -6.65D-12  3.19D-08  7.77D-11    53.5
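
The per-cycle counts listed earlier in this post can be extracted from an output file instead of counted by hand. A rough sketch, assuming the SCF lines look exactly like the ones above (first field `d=`, iteration number in the third field); `scf_counts` and `benzene.out` are names made up for illustration.

```shell
# scf_counts OUTFILE: print the number of SCF iterations in each SCF cycle.
# A new cycle is assumed to start whenever the iteration counter is back at 1.
scf_counts() {
  awk '
    $1 == "d=" && $2 ~ /diis/ {
      # Counter restarted: record the length of the cycle that just ended
      if ($3 == 1 && n > 0) { out = out sep n; sep = ", " }
      n = $3 + 0
    }
    END { if (n > 0) out = out sep n; print out }
  ' "$1"
}

# Usage, with a hypothetical output file name:
[ -f benzene.out ] && scf_counts benzene.out
```

The last iteration number of each cycle is taken as its length, so lines that use a different convergence scheme label than `diis` would be skipped by this filter.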

Gets Around
Edo,
there's another thing puzzling me:
Quote:
You were right by saying that SOCKETS got recognized. I forgot that was the backup in the ARMCI autoconf structure.


Looking through src/tools/build/config.log for both building with export ARMCI_NETWORK=SOCKETS and without, I see the message about defaulting to SOCKETS. Yet the performance with and without explicitly setting ARMCI_NETWORK to SOCKETS is very different. Either different stages of the compilation default to different, well, defaults, or something fishy is going on.

Based on what you said in your first post,
Quote:

The default ARMCI_NETWORK is now MPI-TS.

I guess the defaults are currently inconsistent, which is something that should be fixed somewhere down the line.

Forum Vet
Fix for reducing the number of SCF cycles
The following is a fix, not fully validated yet, for getting the number of SCF iterations
of the DFT code from the 6.3 version close to the 6.1 behavior

The patch is available at the following URL
http://nwchemgit.github.io/images/Iswtch.patch.gz

To apply

cd $NWCHEM_TOP/src
wget http://nwchemgit.github.io/images/Iswtch.patch.gz
gzip -d Iswtch.patch.gz
patch -p0 < Iswtch.patch
cd nwdft/scf_dft
make
cd ../..
make link

Gets Around
I've patched and tested it, and discovered that I made a typo above in the list of the number of SCF iterations -- 6.3 does one more cycle than 6.1.1 it seems (still the same number of DFT optimisation steps, 4, though)

I think it worked: I get 10, 6, 6, 2 and 3 SCF iterations
cf.
Quote:

6.1.1 does 9, 7, 6, 4, 5 and 2 SCF iterations.
6.3 does 10, 8, 7, 7, 7, 7 and 7 SCF iterations.


I've got too many versions and too much data floating around, so I'll start with freshly extracted sources and redo the tests.

From what I can see the execution times are now down to about the same as 6.1.1 when you patch 6.3 with Iswtch.patch and set ARMCI_NETWORK=SOCKETS:
 Total times  cpu:       39.1s     wall:       40.9s
 Total times  cpu:       45.3s     wall:       47.1s
 Total times  cpu:       46.1s     wall:       48.1s
 Total times  cpu:       41.9s     wall:       43.7s
 Total times  cpu:       47.2s     wall:       51.2s
 Total times  cpu:       44.6s     wall:       47.2s
 Total times  cpu:       38.9s     wall:       40.8s
 Total times  cpu:       44.9s     wall:       47.1s
 Total times  cpu:       44.7s     wall:       46.9s
 Total times  cpu:       45.9s     wall:       47.9s



Thank you Edo!

I'll do some more tests and report back if it looks different.
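
To summarise repeated runs like the ten "Total times" lines above, the mean can be computed straight from the output files. This is a sketch only: `mean_times` and the `run_*.out` file names are assumptions, and the parsing relies on the `cpu:` and `wall:` labels appearing as shown.

```shell
# mean_times FILE...: average the cpu and wall values from "Total times" lines
# (hypothetical helper; the trailing "s" on each value is dropped by awk's
# string-to-number conversion)
mean_times() {
  grep -h "Total times" "$@" | awk '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "cpu:")  cpu  += $(i + 1)
        if ($i == "wall:") wall += $(i + 1)
      }
      n++
    }
    END { if (n) printf "runs=%d cpu=%.1f wall=%.1f\n", n, cpu / n, wall / n }
  '
}

# Usage, with hypothetical file names:
# mean_times run_*.out
```

For the ten runs listed above this comes to about 43.9 s cpu and 46.1 s wall.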

Gets Around
Edo,
I've started with fresh sources, and have compared patched and unpatched versions (with openblas or acml).

It worked.
I get
  patched: 10,8,6,4,3 scf steps
unpatched: 10,8,7,7,7,5 scf steps

I also confirmed that unpatched 6.3, in addition to doing more SCF cycles per optimisation step, also does one extra set of cycles (1-7-1-7) during the second DFT geometry optimisation step:
(cat test.out|egrep "d= 0|@")

Unpatched version
@    2    -230.10089843 -4.4D-04  0.00174  0.00035  0.00946  0.02027     30.0
 d= 0,ls=0.0,diis     1   -230.1009079200 -4.33D+02  1.34D-04  2.13D-05    29.3
 d= 0,ls=0.0,diis     2   -230.1009116988 -3.78D-06  3.31D-05  6.41D-06    30.1
 d= 0,ls=0.0,diis     3   -230.1009113295  3.69D-07  2.19D-05  8.30D-06    30.9
 d= 0,ls=0.0,diis     4   -230.1009124163 -1.09D-06  7.07D-06  5.27D-07    31.7
 d= 0,ls=0.0,diis     5   -230.1009124888 -7.24D-08  1.69D-06  8.46D-08    32.4
 d= 0,ls=0.0,diis     6   -230.1009124997 -1.10D-08  4.71D-07  7.33D-10    33.2
 d= 0,ls=0.0,diis     7   -230.1009124998 -6.67D-11  3.56D-07  2.42D-10    34.0
 d= 0,ls=0.0,diis     1   -230.1009125958 -4.33D+02  1.55D-05  2.78D-07    34.9
 d= 0,ls=0.0,diis     2   -230.1009126449 -4.91D-08  3.61D-06  8.82D-08    35.7
 d= 0,ls=0.0,diis     3   -230.1009126412  3.69D-09  2.42D-06  1.06D-07    36.5
 d= 0,ls=0.0,diis     4   -230.1009126548 -1.36D-08  7.88D-07  6.60D-09    37.3
 d= 0,ls=0.0,diis     5   -230.1009126557 -8.82D-10  2.04D-07  1.32D-09    38.0
 d= 0,ls=0.0,diis     6   -230.1009126558 -1.67D-10  3.35D-08  7.94D-11    38.8
 d= 0,ls=0.0,diis     7   -230.1009126558  1.36D-12  1.38D-08  7.66D-11    39.6
@    3    -230.10091266 -1.4D-05  0.00021  0.00004  0.00193  0.00737     44.0


Patched version:
@    2    -230.10090574 -2.5D-04  0.00126  0.00030  0.00679  0.01605     29.3
 d= 0,ls=0.0,diis     1   -230.1009101932 -4.33D+02  1.13D-04  1.13D-05    28.7
 d= 0,ls=0.0,diis     2   -230.1009123213 -2.13D-06  2.11D-05  2.38D-06    29.4
 d= 0,ls=0.0,diis     3   -230.1009122054  1.16D-07  1.34D-05  2.98D-06    30.2
 d= 0,ls=0.0,diis     4   -230.1009125831 -3.78D-07  4.50D-06  2.33D-07    31.0
@    3    -230.10091258 -6.8D-06  0.00029  0.00006  0.00150  0.00369     35.4


This is reproducible -- I've tested it on five different machines, using intel or amd, and linked against acml or openblas.

Anyway, the patch makes the performance of 6.3 comparable to that of 6.1.1.
Thanks again!

