Compiling NWChem on CentOS 6.1 with Open MPI 1.6, ifort and icc


Clicked A Few Times
Dear all,

I have compiled a parallel version of NWChem and started testing it against the QA suite. These are my settings files:

# file content of compile_nwchem.sh
#!/bin/bash
# intel compilers
source /opt/intel/2013/bin/compilervars.sh intel64
export NWCHEM_TOP=/home/user/Documents/codes/nwchem-6.1.1-src
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=16777216

export PYTHONHOME=/usr
export PYTHONVERSION=2.6
export USE_PYTHON64=y
export PYTHONLIBTYPE=so

export MKLROOT=/opt/intel/2013/mkl
export BLASOPT="-Wl,--start-group  $MKLROOT/lib/intel64/libmkl_intel_ilp64.a $MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm"

export FC="ifort"
export CC="icc $CFLAGS"

cd $NWCHEM_TOP/src
make realclean
cp util/*.fh include/.

pwd
make FC=$FC CC=$CC nwchem_config
make FC=$FC CC=$CC 32_to_64
make FC=$FC CC=$CC -j4


And the install script:

#!/bin/bash
# intel compilers
export NWCHEM_TOP=/home/user/Documents/codes/nwchem-6.1.1-src
export NWCHEM=/opt/nwchem_intel2013_openmpi16

mkdir -p $NWCHEM/bin $NWCHEM/data
cp $NWCHEM_TOP/bin/LINUX64/nwchem $NWCHEM/bin
cp $NWCHEM_TOP/bin/LINUX64/depend.x $NWCHEM/bin/
chmod a+rx $NWCHEM/bin

cd $NWCHEM_TOP/src/
cp -r data $NWCHEM

cd $NWCHEM_TOP/src/basis
cp -r libraries $NWCHEM/data/

cd $NWCHEM_TOP/src/nwpw
cp -r libraryps $NWCHEM/data/
chmod -R 755 $NWCHEM/data/*


#content of .nwchemrc
nwchem_basis_library /opt/nwchem_intel2013_openmpi16/data/libraries/
nwchem_nwpw_library /opt/nwchem_intel2013_openmpi16/data/libraryps/
ffield amber
amber_1 /opt/nwchem_intel2013_openmpi16/data/amber_s/
amber_2 /opt/nwchem_intel2013_openmpi16/data/amber_q/
amber_3 /opt/nwchem_intel2013_openmpi16/data/amber_x/
amber_4 /opt/nwchem_intel2013_openmpi16/data/amber_u/
spce    /opt/nwchem_intel2013_openmpi16/data/solvents/spce.rst
charmm_s /opt/nwchem_intel2013_openmpi16/data/charmm_s/
charmm_x /opt/nwchem_intel2013_openmpi16/data/charmm_x/
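
A quick sanity check for a .nwchemrc like the one above is to confirm that every absolute path it names exists once the install script has run. A minimal sketch, assuming one "key value" pair per line as shown; check_nwchemrc is a hypothetical helper, not part of NWChem:

```shell
# Sketch only: check_nwchemrc is an illustrative helper name.
# Reads "key value" lines and reports absolute paths that do not exist.
check_nwchemrc() {
    rc_file=$1
    rc_status=0
    while read -r key value; do
        case $value in
            # Only values that look like absolute paths are checked;
            # entries such as "ffield amber" are skipped.
            /*) [ -e "$value" ] || { echo "missing: $key -> $value"; rc_status=1; } ;;
        esac
    done < "$rc_file"
    return $rc_status
}
```

For example, check_nwchemrc ~/.nwchemrc prints one line per missing path and returns non-zero if any path is missing.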


I have now run some of the QM tests and a few of them fail. Most of the failures look like precision differences, but I cannot tell whether they are serious, so please help me judge. Because there are many files, I show only a few of the differences here. I have not run the full test suite: so far I have only done single-process calculations. There are also problems running the QA tests in parallel, which I want to address later.

user@localhost testoutputs$ cat ../singleqmtests.log | grep -c OK
120
user@localhost testoutputs$ cat ../singleqmtests.log | grep -c fail
12


Out of about 66 jobs, 6 failed. (Each job contributes two matches to the log, whether it passes or fails, so 120 "OK" and 12 "fail" matches correspond to 60 passed and 6 failed jobs.)
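
The arithmetic above can be scripted. A minimal sketch, assuming (as described) that every job contributes two matching lines to the log; summarize_qa_log is a hypothetical helper name:

```shell
# Sketch only: summarize_qa_log is an illustrative helper name.
# Assumes each job writes two lines containing "OK" or two containing "fail".
summarize_qa_log() {
    log=$1
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    ok=$(grep -c OK "$log" || true)
    fail=$(grep -c fail "$log" || true)
    echo "passed=$((ok / 2)) failed=$((fail / 2)) total=$(((ok + fail) / 2))"
}
```

It would be called as summarize_qa_log ../singleqmtests.log.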
Take dft_cr2 as an example:
[user@localhost testoutputs]$ diff dft_cr2.ok.out.nwparse dft_cr2.out.nwparse
61c61
< Effective nuclear repulsion energy (a.u.) 189.5566
---
> Effective nuclear repulsion energy (a.u.) 189.5565


The difference seems minute, but I cannot be sure.
In prop_h2o, on the other hand, the differences are larger:

[user@localhost testoutputs]$ diff prop_h2o.ok.out.nwparse prop_h2o.out.nwparse 
78,84c78,84
< XYZ 0.000 0.000 0.000
< isotropic = 232.456
< anisotropy = 38.112
< isotropic = 28.667
< anisotropy = 6.794
< isotropic = 28.667
< anisotropy = 6.794
---
> XYZ -0.000 0.000 -0.000
> isotropic = 223.154
> anisotropy = 38.458
> isotropic = 31.395
> anisotropy = 2.916
> isotropic = 30.362
> anisotropy = 4.268
122c122
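
To separate rounding noise from real discrepancies in diffs like these, the values can be compared with a relative tolerance instead of digit by digit. A rough sketch using awk; within_rel_tol and the 1e-5 threshold in the example are illustrative choices, not the criterion used by the NWChem QA scripts:

```shell
# Sketch only: within_rel_tol is an illustrative helper name.
# Succeeds (exit 0) if |ref - val| <= tol * |ref|, fails (exit 1) otherwise.
within_rel_tol() {
    awk -v ref="$1" -v val="$2" -v tol="$3" 'BEGIN {
        diff = ref - val; if (diff < 0) diff = -diff
        mag = ref + 0;    if (mag < 0) mag = -mag
        exit (diff <= tol * mag) ? 0 : 1
    }'
}
```

With these definitions, within_rel_tol 189.5566 189.5565 1e-5 succeeds (the dft_cr2 case), while within_rel_tol 232.456 223.154 1e-5 fails (the first prop_h2o isotropic value).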


And from the MD runs, I get the following differences:

[user@localhost testoutputs]$ diff ethanol_ti.ok.tst ethanol_ti.tst
5c5
< Energy           =   -8.957E+03
---
> Energy           =   -8.944E+03
8c8
< Energy           =   -8.961E+03
---
> Energy           =   -8.948E+03
11c11
< Energy           =   -8.955E+03
---
> Energy           =   -8.942E+03
14c14
< Energy           =   -8.957E+03
---
> Energy           =   -8.942E+03
17c17
< Energy           =   -8.957E+03
---
> Energy           =   -8.945E+03
20c20
< Energy           =   -8.956E+03
---
> Energy           =   -8.942E+03
23c23
< Energy           =   -8.959E+03
---
> Energy           =   -8.946E+03
26c26
< Energy           =   -8.964E+03
---
> Energy           =   -8.951E+03
29c29
< Energy           =   -8.957E+03
---
> Energy           =   -8.943E+03


Can someone tell me what I have done wrong and what I should do to fix these errors?

Also, when I run the dna case in the examples folder, I get the following error:

 Task  times  cpu:        7.0s     wall:        8.4s


                                NWChem Input Module
                                -------------------


                               Analysis Input Module
                               ---------------------


                                  Analysis Module
                                  ---------------


 Reference coordinates read from dna_em.qrs

 Number of atoms is   388
 Topology read from dna.top

 Opening trj file dna_md.trj
 Closing trj file

 Trajectory file header from dna_md.trj

 Opening trj file dna_md.trj

 Opening copy file dna_super.trj

0:Segmentation Violation error, status=: 11
(rank:0 hostname:localhost.localdomain pid:3543):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
  0: ARMCI aborting 11 (0xb).
  0: ARMCI aborting 11 (0xb).
system error message: Inappropriate ioctl for device


Thanks in advance!!!

Clicked A Few Times
Here is the ldd output for my nwchem binary (it seems to be asked for in other threads):

[user@localhost md]$ ldd /opt/nwchem_intel2013_openmpi16/bin/nwchem
        linux-vdso.so.1 =>  (0x00007fc85f1dd000)
        libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x0000003cae200000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a17800000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003a17c00000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000003a28800000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003a17400000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003a17000000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003a16c00000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003a1dc00000)

Forum Vet
Sacch
The Intel compiler seems to be generating bad code.
Please do the following:

cd $NWCHEM_TOP/src/NWints/hondo
touch hnd_giaxyz.F
make FC=ifort FOPTIMIZE="-O1"
cd ../..
make FC=ifort link

Cheers, Edo

Clicked A Few Times
Dear Edo,

The ldd output is now:
[user@localhost LINUX64]$ l
total 77916
drwxrwxr-x. 2 user user     4096 Nov  9 23:40 .
drwxrwxr-x. 3 user user     4096 Oct 12 19:10 ..
-rwxrwxr-x. 1 user user    20548 Nov  9 22:40 depend.x
-rwxrwxr-x. 1 user user 79750571 Nov  9 23:40 nwchem
[user@localhost LINUX64]$ ldd ./nwchem
        linux-vdso.so.1 =>  (0x00007fff2f5ff000)
        libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x0000003cae200000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a17800000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000003a28800000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003a17400000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003a17c00000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003a17000000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003a1dc00000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003a16c00000)


The result of running nwchem on dna still fails.


[user@localhost LINUX64]$ cd ../../examples/md/dna
[user@localhost dna]$ rm ./*
rm: cannot remove `./temp': Is a directory
[user@localhost dna]$ rm .header
[user@localhost dna]$ cp temp/* .
[user@localhost dna]$ l
total 44

drwxr-x---.  3 user user  4096 Nov  9 23:50 .
drwxr-x---. 23 user user  4096 Oct 31 15:19 ..
-rw-r-----.  1 user user   420 Nov  9 23:50 dna.nw
-rw-r-----.  1 user user 28631 Nov  9 23:50 dna.pdb
drwxrwxr-x.  2 user user  4096 Nov  1 16:17 temp

       Job information
           ---------------

    hostname      = localhost.localdomain
    program       = /opt/nwchem_intel2013_openmpi16/bin/nwchem
    date          = Fri Nov  9 23:51:08 2012                  

    compiled      = Fri_Nov_09_22:46:51_2012
    source        = /home/user/Documents/codes/nwchem-6.1.1-src
    nwchem branch = 6.1.1                                      
    input         = dna.nw                                     
    prefix        = dna.                                       
    data base     = ./dna.db                                   
    status        = startup                                    
    nproc         =        1                                   
    time left     =     -1s                 

          Memory information
           ------------------

    heap     =  131072001 doubles =   1000.0 Mbytes
    stack    =  131072001 doubles =   1000.0 Mbytes
    global   =  262144000 doubles =   2000.0 Mbytes (distinct from heap & stack)
    total    =  524288002 doubles =   4000.0 Mbytes                             
    verify   = yes                                                              
    hardfail = no                                                               
...
Force field                           amber

 Directories used for fragment and segment files

                                       /opt/nwchem_intel2013_openmpi16/data/amber_s/
                                       /opt/nwchem_intel2013_openmpi16/data/amber_q/
                                       /opt/nwchem_intel2013_openmpi16/data/amber_x/
                                       /opt/nwchem_intel2013_openmpi16/data/amber_u/
                                       ./                                           

 Parameter files used to resolve force field parameters

                                       /opt/nwchem_intel2013_openmpi16/data/amber_s/amber.par
                                       /opt/nwchem_intel2013_openmpi16/data/amber_q/amber.par
                                       /opt/nwchem_intel2013_openmpi16/data/amber_x/amber.par
                                       /opt/nwchem_intel2013_openmpi16/data/amber_u/amber.par
                                       ./amber.par                     

 PDB geometry                          dna.pdb                                               

 Created segment                       ./DT_5.sgm
 Created segment                       ./DG.sgm  
 Created segment                       ./DC.sgm  
 Created segment                       ./DA_3.sgm

 Created sequence                      ./dna.seq

 
 Parameter file                        /opt/nwchem_intel2013_openmpi16/data/amber_s/amber.par
 Parameter file                        /opt/nwchem_intel2013_openmpi16/data/amber_q/amber.par
 Parameter file                        /opt/nwchem_intel2013_openmpi16/data/amber_x/amber.par
 Parameter file                        /opt/nwchem_intel2013_openmpi16/data/amber_u/amber.par
                                                                                             
 Total charge                              0.000000                            

Created topology                      dna.top

 Topology                              dna.top

 No command file found: Default restart directives

 Solute centered in x-dimension
 Solute centered in y-dimension
 Solute centered in z-dimension

 Boxsize determined to                     1.968800    2.003300    2.599200

 Created restart                       dna_em.rst

Reference coordinates read from dna_em.qrs

 Number of atoms is   388
 Topology read from dna.top

 Opening trj file dna_md.trj
 Closing trj file

 Trajectory file header from dna_md.trj

 Opening trj file dna_md.trj

 Opening copy file dna_super.trj

0:Segmentation Violation error, status=: 11
(rank:0 hostname:localhost.localdomain pid:25539):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
  0: ARMCI aborting 11 (0xb).
  0: ARMCI aborting 11 (0xb).
system error message: Inappropriate ioctl for device




I am wondering whether there is a clean way to put these commands into my scripts?

Forum Vet
Quote:Sacch Nov 9th 8:14 am


The result of running nwchem on dna still fails.




0:Segmentation Violation error, status=: 11
(rank:0 hostname:localhost.localdomain pid:25539):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
  0: ARMCI aborting 11 (0xb).
  0: ARMCI aborting 11 (0xb).
system error message: Inappropriate ioctl for device




The easiest fix is to recompile all of NWChem using the FOPTIMIZE=-O1 argument.

Quote:

I am wondering whether there is a clean way to put these commands into my scripts?


I am not quite sure which commands you are referring to. Could you please
give me more details?

Thanks, Edo

Clicked A Few Times
Hi Edo,

The compilation log: https://www.dropbox.com/s/30qmbn41easl004/addO1.log

The script:

#!/bin/bash
# intel compilers
source /opt/intel/2013/bin/compilervars.sh intel64

export NWCHEM_TOP=/home/user/Documents/codes/nwchem-6.1.1-src
export NWCHEM_TARGET=LINUX64

export NWCHEM_MODULES="all python"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=16777216

export PYTHONHOME=/usr
export PYTHONVERSION=2.6
export USE_PYTHON64=y
export PYTHONLIBTYPE=so

#sed -i 's/libpython$(PYTHONVERSION).a/libpython$(PYTHONVERSION).$(PYTHONLIBTYPE)/g' config/makefile.h
export MKLROOT=/opt/intel/2013/mkl
export BLASOPT="-Wl,--start-group  $MKLROOT/lib/intel64/libmkl_intel_ilp64.a $MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm"

export FC="ifort"
export CC="icc $CFLAGS"

cd $NWCHEM_TOP/src

make realclean
cp util/*.fh include/.

pwd

make FC=$FC FOPTIMIZE="-O1" CC=$CC nwchem_config
make FC=$FC FOPTIMIZE="-O1" CC=$CC 32_to_64
make FC=$FC FOPTIMIZE="-O1" CC=$CC -j4


There seem to be some forbidden words in this forum: when those words appear, the post cannot be published.

Thanks in advance!

Forum Vet
need to set FDEBUG, too
Unfortunately, you also need to set FDEBUG, since its current value is -O2.
Therefore, I suggest you use the following compilation command:

make FC=$FC FOPTIMIZE="-O1" FDEBUG="-O0 -g" CC=$CC -j4
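
To keep such flags identical across the nwchem_config, 32_to_64 and main build steps of the script, the shared arguments can live in one place. A minimal sketch; run_make and DRY_RUN are hypothetical names introduced here, and the flag values are simply the ones suggested above:

```shell
# Sketch only: run_make and DRY_RUN are illustrative names,
# not part of the NWChem build system.
run_make() {
    # Prepend the shared arguments to whatever targets/options were passed.
    # Quoting keeps "-O0 -g" together as a single FDEBUG value.
    set -- FC="${FC:-ifort}" CC="${CC:-icc}" FOPTIMIZE="-O1" FDEBUG="-O0 -g" "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print one argument per line instead of running make.
        printf '%s\n' make "$@"
    else
        make "$@"
    fi
}
```

The three make lines in the script would then become run_make nwchem_config, run_make 32_to_64 and run_make -j4; setting DRY_RUN=1 first shows the arguments without building anything.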

Clicked A Few Times
Hi Edo,

The compilation log is here:
https://www.dropbox.com/s/lqmf5mvgkxc336i/addO1debug.log

And the output of the DNA example run is here:
https://www.dropbox.com/s/nv6hpunnup3r14o/runadddrbug.log

ldd nwchem gives output similar to what I posted above, but with different addresses.

It doesn't seem to work. Is there anything I can do to find out why it failed?

Many thanks,

Alvyn

Forum Vet
QM tests
Alvyn
Before dealing with the MD problems, I would like to know whether the QM tests are running OK.
Is this the case?
As far as the MD examples are concerned, I am not 100% confident they are compatible with NWChem 6.1.
Instead, I suggest you run a few examples from a tutorial we routinely use to show how
to use QM/MM. I have uploaded a tarball:

http://nwchemgit.github.io/images/Tutorialqmm.tgz

Please download and un-tar the tarball and run the examples following the instructions in the included handouts.pdf file.

Thanks, Edo

Clicked A Few Times
Hi Edo,

I am running NWChem in a virtual machine under VirtualBox. For some unknown reason the run failed at some point and the machine hung. The following zip file contains the failed cases and the log file of the single-process run, "xrun.log".

https://www.dropbox.com/s/5bjowj1dqbzw6q2/failed.zip

I am wondering whether I should follow someone else's working script or keep doing what I have been doing. Any suggestions? I need to learn to run some small cases on this machine before a mini supercomputer arrives at our facility.

Thanks and regards,

Alvyn

Forum Vet
Alvyn,
I had a look at some of the QM failures and they seem harmless to me.
Some of the QMMM numbers are clear failures.
Did you try to run the QMMM tutorial I sent you?

Clicked A Few Times
Hi Edo,

The prepare step went well and was quick. When I ran equilibrate on a single core, it went well too, I think. With the parallel run, there were errors.

The following files are the diffs and the output of the failed run:

https://www.dropbox.com/s/j35iy6i9w8occu8/dyn-0.out.diff
https://www.dropbox.com/s/qyd2dn5i0hn98z7/dyn-0.parallel.out
https://www.dropbox.com/s/sfnx5p3c5vzcxfs/h2o_dyn.out.diff
https://www.dropbox.com/s/yzk3mmsp55bk0rc/h2o_dyn.rst.diff

In dyn-0.parallel.out, I found that nproc=1 in my output, whereas in the tutorial it is 16.
The command I used to run the code was

mpiexec -np 4 /opt/nwchem_intel2013_openmpi16/bin/nwchem ./dyn-0.nw &> dyn-0.out &


after launching an mpd with

/opt/intel/2013/composer_xe_2013.0.079/mpirt/bin/intel64/mpd &


I guess there is something wrong with how I launch the program in parallel mode.

And for the optimize job:

https://www.dropbox.com/s/u7gixxc72ijld8q/opt-0.out.diff

I noticed that the Bq-nuclear interaction energy differs from the tutorial output throughout the calculation.

For the rst files, the tutorial output has a section near the end of the file that is absent from my rst file. The section looks like this:

>          2         2         0       229
>          4         4         0       552
>          6         6         0       898
>          8         8         0      1186
>         10        10         0      1428
>       1     14
>          1         1         1         0
>          0         0         1       218
>          3         3         1       538
>          4         4         1       862
>          5         5         1      1133


https://www.dropbox.com/s/xwsaxama8jq1amr/h2o_opt.rst.diff

BTW, while browsing the forum, I found that the config.h in my armci build folder was not generated correctly.

Here is the file $NWCHEM_TOP/src/tools/build/armci/config.log:

https://www.dropbox.com/s/a9a6gdqc5g2b525/config.log

However, I have no idea how to fix this error, or how to tell whether it is the culprit.

Many thanks,

Alvyn

Clicked A Few Times
It turned out that I was running the code with the wrong mpiexec. Now it runs well on multiple cores. Thanks Edo.
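
A simple guard against this kind of mix-up is to check which mpiexec the shell resolves before launching. A minimal sketch, assuming the MPI used for the build is installed under a known prefix; check_mpiexec is a hypothetical helper, and the prefix passed to it is whatever your Open MPI installation uses:

```shell
# Sketch only: check_mpiexec is an illustrative helper name.
# Reports whether the mpiexec found on PATH lives under the given prefix.
check_mpiexec() {
    expected_prefix=$1
    found=$(command -v mpiexec) || { echo "mpiexec not found on PATH"; return 2; }
    case $found in
        "$expected_prefix"/*) echo "ok: $found" ;;
        *) echo "mismatch: $found is not under $expected_prefix"; return 1 ;;
    esac
}
```

Running the check before mpiexec -np 4 would flag a launcher that belongs to a different MPI installation than the one NWChem was built with.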

