Dear all,
I have compiled a parallel version of NWChem and started testing it with the QA tests. The following are my build script, install script, and settings file:
# file content of compile_nwchem.sh
#!/bin/bash
# intel compilers
source /opt/intel/2013/bin/compilervars.sh intel64
export NWCHEM_TOP=/home/user/Documents/codes/nwchem-6.1.1-src
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=16777216
export PYTHONHOME=/usr
export PYTHONVERSION=2.6
export USE_PYTHON64=y
export PYTHONLIBTYPE=so
export MKLROOT=/opt/intel/2013/mkl
export BLASOPT="-Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_ilp64.a $MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm"
export FC="ifort"
export CC="icc $CFLAGS"
cd $NWCHEM_TOP/src
make realclean
cp util/*.fh include/.
pwd
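# Quote the assignments so that a CC value containing spaces (e.g. "icc $CFLAGS") stays one argument
make "FC=$FC" "CC=$CC" nwchem_config
make "FC=$FC" "CC=$CC" 32_to_64
make "FC=$FC" "CC=$CC" -j4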
And the install script:
#!/bin/bash
# intel compilers
export NWCHEM_TOP=/home/user/Documents/codes/nwchem-6.1.1-src
export NWCHEM=/opt/nwchem_intel2013_openmpi16
mkdir -p $NWCHEM/bin $NWCHEM/data
cp $NWCHEM_TOP/bin/LINUX64/nwchem $NWCHEM/bin
cp $NWCHEM_TOP/bin/LINUX64/depend.x $NWCHEM/bin/
chmod a+rx $NWCHEM/bin
cd $NWCHEM_TOP/src/
cp -r data $NWCHEM
cd $NWCHEM_TOP/src/basis
cp -r libraries $NWCHEM/data/
cd $NWCHEM_TOP/src/nwpw
cp -r libraryps $NWCHEM/data/
chmod -R 755 $NWCHEM/data/*
#content of .nwchemrc
nwchem_basis_library /opt/nwchem_intel2013_openmpi16/data/libraries/
nwchem_nwpw_library /opt/nwchem_intel2013_openmpi16/data/libraryps/
ffield amber
amber_1 /opt/nwchem_intel2013_openmpi16/data/amber_s/
amber_2 /opt/nwchem_intel2013_openmpi16/data/amber_q/
amber_3 /opt/nwchem_intel2013_openmpi16/data/amber_x/
amber_4 /opt/nwchem_intel2013_openmpi16/data/amber_u/
spce /opt/nwchem_intel2013_openmpi16/data/solvents/spce.rst
charmm_s /opt/nwchem_intel2013_openmpi16/data/charmm_s/
charmm_x /opt/nwchem_intel2013_openmpi16/data/charmm_x/
I have now run some of the QM tests, and some of them failed. The failures appear to come mostly from numerical precision differences, but I cannot tell whether they are serious. Please help me identify which ones matter. Because there are many output files, I show the differences for only a few of them. I have not run the full test suite, since I only did single-process calculations. There are problems running the QA tests in parallel, which I want to address later.
[user@localhost testoutputs]$ cat ../singleqmtests.log | grep -c OK
120
[user@localhost testoutputs]$ cat ../singleqmtests.log | grep -c fail
12
Out of about 66 jobs, 6 failed. (Each job, whether OK or failed, is counted twice in the log, so the 120 "OK" matches correspond to 60 passing jobs and the 12 "fail" matches to 6 failed jobs.)
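For reference, the job counts are just the grep counts halved. A minimal sketch of that arithmetic, assuming each job really writes exactly two matching lines to the log (count_jobs is a name I made up for illustration, it is not part of the NWChem QA scripts):

```shell
# Hypothetical helper: convert grep line counts into job counts.
# Assumption: every job writes exactly two "OK" or two "fail" lines to the log.
count_jobs() {
    local log="$1"
    local ok fail
    # grep -c prints the number of matching lines (0 if there are none)
    ok=$(grep -c OK "$log")
    fail=$(grep -c fail "$log")
    echo "passed=$((ok / 2)) failed=$((fail / 2)) total=$(((ok + fail) / 2))"
}
```

Called as `count_jobs ../singleqmtests.log`, the counts shown above (120 and 12) would give passed=60 failed=6 total=66.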
Take dft_cr2 as an example:
[user@localhost testoutputs]$ diff dft_cr2.ok.out.nwparse dft_cr2.out.nwparse
61c61
< Effective nuclear repulsion energy (a.u.) 189.5566
---
> Effective nuclear repulsion energy (a.u.) 189.5565
The difference is only one unit in the last printed digit, but I cannot be sure it is harmless.
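One way to separate last-digit differences like this from real discrepancies would be to compare the two numbers against an explicit tolerance. The following is only a sketch: within_tol is a hypothetical helper of mine, and the tolerance is my own choice, not the criterion the NWChem QA scripts use.

```shell
# Hypothetical helper: succeed (exit status 0) if two numbers agree
# within an absolute tolerance, fail (exit status 1) otherwise.
# Usage: within_tol REFERENCE VALUE TOLERANCE
within_tol() {
    awk -v a="$1" -v b="$2" -v tol="$3" 'BEGIN {
        d = a - b
        if (d < 0) d = -d          # absolute difference
        if (d <= tol + 0) exit 0   # within tolerance
        exit 1                     # outside tolerance
    }'
}
```

For example, `within_tol 189.5566 189.5565 0.00015 && echo "rounding-level difference"` treats anything up to 1.5 units in the fourth decimal place as rounding noise, so the dft_cr2 values above would pass.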
In prop_h2o, however, the differences are much larger:
[user@localhost testoutputs]$ diff prop_h2o.ok.out.nwparse prop_h2o.out.nwparse
78,84c78,84
< XYZ 0.000 0.000 0.000
< isotropic = 232.456
< anisotropy = 38.112
< isotropic = 28.667
< anisotropy = 6.794
< isotropic = 28.667
< anisotropy = 6.794
---
> XYZ -0.000 0.000 -0.000
> isotropic = 223.154
> anisotropy = 38.458
> isotropic = 31.395
> anisotropy = 2.916
> isotropic = 30.362
> anisotropy = 4.268
122c122
And from the MD runs, I get the following differences:
[user@localhost testoutputs]$ diff ethanol_ti.ok.tst ethanol_ti.tst
5c5
< Energy = -8.957E+03
---
> Energy = -8.944E+03
8c8
< Energy = -8.961E+03
---
> Energy = -8.948E+03
11c11
< Energy = -8.955E+03
---
> Energy = -8.942E+03
14c14
< Energy = -8.957E+03
---
> Energy = -8.942E+03
17c17
< Energy = -8.957E+03
---
> Energy = -8.945E+03
20c20
< Energy = -8.956E+03
---
> Energy = -8.942E+03
23c23
< Energy = -8.959E+03
---
> Energy = -8.946E+03
26c26
< Energy = -8.964E+03
---
> Energy = -8.951E+03
29c29
< Energy = -8.957E+03
---
> Energy = -8.943E+03
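In every pair above my energy is higher than the reference by 12 to 15 units in the printed E+03 format. To put that on a relative scale, here is a small sketch (rel_diff_percent is a hypothetical helper name, and whether such a deviation is acceptable for this test is exactly what I do not know):

```shell
# Hypothetical helper: print the relative difference, in percent,
# between a reference value and a computed value.
# Usage: rel_diff_percent REFERENCE VALUE
rel_diff_percent() {
    awk -v ref="$1" -v val="$2" 'BEGIN {
        d = val - ref
        if (d < 0) d = -d          # absolute difference
        r = ref + 0
        if (r < 0) r = -r          # magnitude of the reference
        if (r == 0) { print "undefined"; exit 1 }
        printf "%.3f\n", 100 * d / r
    }'
}
```

For the first pair, `rel_diff_percent -8.957E+03 -8.944E+03` gives about 0.145, i.e. roughly 0.15 percent.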
Can someone tell me what I have done wrong and what I should do to fix these errors?
In addition, when I run the dna case in the examples folder, I get the following error:
Task times cpu: 7.0s wall: 8.4s
NWChem Input Module
-------------------
Analysis Input Module
---------------------
Analysis Module
---------------
Reference coordinates read from dna_em.qrs
Number of atoms is 388
Topology read from dna.top
Opening trj file dna_md.trj
Closing trj file
Trajectory file header from dna_md.trj
Opening trj file dna_md.trj
Opening copy file dna_super.trj
0:Segmentation Violation error, status=: 11
(rank:0 hostname:localhost.localdomain pid:3543):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
0: ARMCI aborting 11 (0xb).
0: ARMCI aborting 11 (0xb).
system error message: Inappropriate ioctl for device
Thanks in advance!!!