Compiling NWChem on CentOS 6.1 with OpenMPI 1.6, ifort and icc


Dear all,

I have compiled a parallel version of NWChem and started testing it with the QA suite. My build script, install script and .nwchemrc are as follows:

# file content of compile_nwchem.sh
#!/bin/bash
# intel compilers
source /opt/intel/2013/bin/compilervars.sh intel64
export NWCHEM_TOP=/home/user/Documents/codes/nwchem-6.1.1-src
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=16777216

export PYTHONHOME=/usr
export PYTHONVERSION=2.6
export USE_PYTHON64=y
export PYTHONLIBTYPE=so

export MKLROOT=/opt/intel/2013/mkl
export BLASOPT="-Wl,--start-group  $MKLROOT/lib/intel64/libmkl_intel_ilp64.a $MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm"

export FC="ifort"
export CC="icc $CFLAGS"

cd $NWCHEM_TOP/src
make realclean
cp util/*.fh include/.

pwd
# quote the variables so that any flags in $CFLAGS stay part of CC
make FC="$FC" CC="$CC" nwchem_config
make FC="$FC" CC="$CC" 32_to_64
make FC="$FC" CC="$CC" -j4


The install script:

#!/bin/bash
# intel compilers
export NWCHEM_TOP=/home/user/Documents/codes/nwchem-6.1.1-src
export NWCHEM=/opt/nwchem_intel2013_openmpi16

mkdir -p $NWCHEM/bin $NWCHEM/data
cp $NWCHEM_TOP/bin/LINUX64/nwchem $NWCHEM/bin
cp $NWCHEM_TOP/bin/LINUX64/depend.x $NWCHEM/bin/
chmod a+rx $NWCHEM/bin

cd $NWCHEM_TOP/src/
cp -r data $NWCHEM

cd $NWCHEM_TOP/src/basis
cp -r libraries $NWCHEM/data/

cd $NWCHEM_TOP/src/nwpw
cp -r libraryps $NWCHEM/data/
chmod -R 755 $NWCHEM/data/*


#content of .nwchemrc
nwchem_basis_library /opt/nwchem_intel2013_openmpi16/data/libraries/
nwchem_nwpw_library /opt/nwchem_intel2013_openmpi16/data/libraryps/
ffield amber
amber_1 /opt/nwchem_intel2013_openmpi16/data/amber_s/
amber_2 /opt/nwchem_intel2013_openmpi16/data/amber_q/
amber_3 /opt/nwchem_intel2013_openmpi16/data/amber_x/
amber_4 /opt/nwchem_intel2013_openmpi16/data/amber_u/
spce    /opt/nwchem_intel2013_openmpi16/data/solvents/spce.rst
charmm_s /opt/nwchem_intel2013_openmpi16/data/charmm_s/
charmm_x /opt/nwchem_intel2013_openmpi16/data/charmm_x/
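Since the install script copies several directories by hand, one quick sanity check is whether every absolute path named in .nwchemrc actually exists. The following is only a minimal sketch: the function name `check_nwchemrc` is made up for illustration, and it assumes each line has the form `key path`, skipping entries such as `ffield amber` whose second field is not an absolute path.

```shell
# Hypothetical helper: report .nwchemrc entries whose path does not exist.
# Assumes one "key path" pair per line; non-absolute values are ignored.
check_nwchemrc() {
    missing=0
    while read -r key path; do
        case "$path" in
            /*)
                if [ ! -e "$path" ]; then
                    echo "missing: $key -> $path"
                    missing=1
                fi
                ;;
        esac
    done < "$1"
    return "$missing"
}
```

It could be called as `check_nwchemrc ~/.nwchemrc`; it prints nothing and returns 0 when all listed paths are present.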


I have now run some of the QM tests and a few of them fail. Most of the failures look like precision differences, but I cannot tell whether they are serious, so please help me judge. Because there are many output files, I show only a few of the differences here. I have not run the full test suite, only single-process calculations. Running the QA tests in parallel has its own problems, which I want to address later.

[user@localhost testoutputs]$ cat ../singleqmtests.log | grep -c OK
120
[user@localhost testoutputs]$ cat ../singleqmtests.log | grep -c fail
12


Of about 66 jobs, 6 failed. (Each job, whether it passes or fails, contributes two matching lines to the log, so the counts above are twice the number of jobs.)
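The arithmetic behind that job count can be sketched as follows. The helper name `jobs_from_lines` is hypothetical, and the sketch assumes every job writes exactly two status lines to the log, as observed above.

```shell
# Hypothetical helper: convert a count of status lines into a count of jobs,
# assuming every job writes exactly two status lines to the log.
jobs_from_lines() {
    echo $(( $1 / 2 ))
}

passed=$(jobs_from_lines 120)   # 120 lines matching "OK"
failed=$(jobs_from_lines 12)    # 12 lines matching "fail"
echo "passed=$passed failed=$failed total=$(( passed + failed ))"
```

With the counts from the log this gives 60 passed and 6 failed, 66 jobs in total.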
Take dft_cr2 as an example:
[user@localhost testoutputs]$ diff dft_cr2.ok.out.nwparse dft_cr2.out.nwparse
61c61
< Effective nuclear repulsion energy (a.u.) 189.5566
---
> Effective nuclear repulsion energy (a.u.) 189.5565


The difference here is in the last printed digit, so it looks minute, but I cannot be sure it is harmless.
In prop_h2o the differences are much larger:

[user@localhost testoutputs]$ diff prop_h2o.ok.out.nwparse prop_h2o.out.nwparse 
78,84c78,84
< XYZ 0.000 0.000 0.000
< isotropic = 232.456
< anisotropy = 38.112
< isotropic = 28.667
< anisotropy = 6.794
< isotropic = 28.667
< anisotropy = 6.794
---
> XYZ -0.000 0.000 -0.000
> isotropic = 223.154
> anisotropy = 38.458
> isotropic = 31.395
> anisotropy = 2.916
> isotropic = 30.362
> anisotropy = 4.268
122c122
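One way to separate rounding-level differences from real ones is to compare each pair of values by relative difference. This is only a sketch: the function name `within_tol` and the tolerance of 1e-5 are assumptions chosen for illustration, not thresholds used by the NWChem QA scripts.

```shell
# Hypothetical helper: succeed if |new - ref| / |ref| <= tol.
# Falls back to the absolute difference when ref is zero.
within_tol() {
    awk -v ref="$1" -v new="$2" -v tol="$3" 'BEGIN {
        ref += 0; new += 0; tol += 0          # force numeric comparison
        d = new - ref; if (d < 0) d = -d
        r = ref;       if (r < 0) r = -r
        if (r == 0) exit !(d <= tol)
        exit !(d / r <= tol)
    }'
}

# Values taken from the dft_cr2 and prop_h2o diffs above.
if within_tol 189.5566 189.5565 1e-5; then
    echo "dft_cr2 nuclear repulsion: within tolerance"
else
    echo "dft_cr2 nuclear repulsion: differs"
fi
if within_tol 232.456 223.154 1e-5; then
    echo "prop_h2o isotropic: within tolerance"
else
    echo "prop_h2o isotropic: differs"
fi
```

Under this assumed tolerance the dft_cr2 value passes, while the first prop_h2o isotropic value, which is off by roughly 4%, does not.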


From the MD runs, I get the following differences:

[user@localhost testoutputs]$ diff ethanol_ti.ok.tst ethanol_ti.tst
5c5
< Energy           =   -8.957E+03
---
> Energy           =   -8.944E+03
8c8
< Energy           =   -8.961E+03
---
> Energy           =   -8.948E+03
11c11
< Energy           =   -8.955E+03
---
> Energy           =   -8.942E+03
14c14
< Energy           =   -8.957E+03
---
> Energy           =   -8.942E+03
17c17
< Energy           =   -8.957E+03
---
> Energy           =   -8.945E+03
20c20
< Energy           =   -8.956E+03
---
> Energy           =   -8.942E+03
23c23
< Energy           =   -8.959E+03
---
> Energy           =   -8.946E+03
26c26
< Energy           =   -8.964E+03
---
> Energy           =   -8.951E+03
29c29
< Energy           =   -8.957E+03
---
> Energy           =   -8.943E+03
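To see the size of each shift at a glance, the `<` and `>` lines of the diff can be paired up and subtracted. The sketch below assumes diff output in exactly the format shown above, with one reference line followed by one new line per hunk. The function name `energy_deltas` is made up for illustration.

```shell
# Hypothetical helper: read "diff" output on stdin and print, for each hunk,
# the reference energy, the new energy and their difference (new - ref).
energy_deltas() {
    awk '
        /^< Energy/ { ref = $NF }
        /^> Energy/ { printf "%s %s %.1f\n", ref, $NF, $NF - ref }
    '
}

# Example with the first hunk from the diff above.
energy_deltas <<'EOF'
5c5
< Energy           =   -8.957E+03
---
> Energy           =   -8.944E+03
EOF
```

On the real files this would be run as `diff ethanol_ti.ok.tst ethanol_ti.tst | energy_deltas`. In the hunks shown above the new energies are all higher than the reference by 13 to 15 units, about 0.15%, which is a systematic offset and not scatter in the last digit.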


Can someone tell me what I have done wrong and what I should do to fix these errors?

In addition, when I run the dna case in the examples folder, I get the following error:

 Task  times  cpu:        7.0s     wall:        8.4s


                                NWChem Input Module
                                -------------------


                               Analysis Input Module
                               ---------------------


                                  Analysis Module
                                  ---------------


 Reference coordinates read from dna_em.qrs

 Number of atoms is   388
 Topology read from dna.top

 Opening trj file dna_md.trj
 Closing trj file

 Trajectory file header from dna_md.trj

 Opening trj file dna_md.trj

 Opening copy file dna_super.trj

0:Segmentation Violation error, status=: 11
(rank:0 hostname:localhost.localdomain pid:3543):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
  0: ARMCI aborting 11 (0xb).
  0: ARMCI aborting 11 (0xb).
system error message: Inappropriate ioctl for device


Thanks in advance!!!