Compiling NWChem on CentOS 6.1 with OpenMPI 1.6, ifort, and icc

Clicked A Few Times

3:57:52 AM PST - Wed, Nov 7th 2012

Dear all,

I have compiled a parallel version of NWChem and started testing it with the QA suite. The scripts and settings I used are as follows:

# file content of compile_nwchem.sh
#!/bin/bash
# intel compilers
source /opt/intel/2013/bin/compilervars.sh intel64
export NWCHEM_TOP=/home/user/Documents/codes/nwchem-6.1.1-src
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=16777216

export PYTHONHOME=/usr
export PYTHONVERSION=2.6
export USE_PYTHON64=y
export PYTHONLIBTYPE=so

export MKLROOT=/opt/intel/2013/mkl
export BLASOPT="-Wl,--start-group  $MKLROOT/lib/intel64/libmkl_intel_ilp64.a $MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm"

export FC="ifort"
export CC="icc $CFLAGS"

cd $NWCHEM_TOP/src
make realclean
cp util/*.fh include/.

pwd
make FC="$FC" CC="$CC" nwchem_config
make FC="$FC" CC="$CC" 32_to_64
make FC="$FC" CC="$CC" -j4

and the install script:

#!/bin/bash
# intel compilers
export NWCHEM_TOP=/home/user/Documents/codes/nwchem-6.1.1-src
export NWCHEM=/opt/nwchem_intel2013_openmpi16

mkdir -p $NWCHEM/bin $NWCHEM/data
cp $NWCHEM_TOP/bin/LINUX64/nwchem $NWCHEM/bin
cp $NWCHEM_TOP/bin/LINUX64/depend.x $NWCHEM/bin/
chmod a+rx $NWCHEM/bin

cd $NWCHEM_TOP/src/
cp -r data $NWCHEM

cd $NWCHEM_TOP/src/basis
cp -r libraries $NWCHEM/data/

cd $NWCHEM_TOP/src/nwpw
cp -r libraryps $NWCHEM/data/
chmod -R 755 $NWCHEM/data/*

#content of .nwchemrc
nwchem_basis_library /opt/nwchem_intel2013_openmpi16/data/libraries/
nwchem_nwpw_library /opt/nwchem_intel2013_openmpi16/data/libraryps/
ffield amber
amber_1 /opt/nwchem_intel2013_openmpi16/data/amber_s/
amber_2 /opt/nwchem_intel2013_openmpi16/data/amber_q/
amber_3 /opt/nwchem_intel2013_openmpi16/data/amber_x/
amber_4 /opt/nwchem_intel2013_openmpi16/data/amber_u/
spce    /opt/nwchem_intel2013_openmpi16/data/solvents/spce.rst
charmm_s /opt/nwchem_intel2013_openmpi16/data/charmm_s/
charmm_x /opt/nwchem_intel2013_openmpi16/data/charmm_x/
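
A missing data directory is one thing worth ruling out before looking at the test results. The following is a minimal sketch, assuming the .nwchemrc layout shown above (a keyword followed by a path). The function name check_nwchemrc is hypothetical and not part of NWChem.

```shell
#!/bin/bash
# Hypothetical helper (not part of NWChem): report .nwchemrc entries whose
# path does not exist. Entries whose value is not an absolute path, such as
# "ffield amber", are skipped.
check_nwchemrc() {
    local rc_file="$1" key value status=0
    # "|| [ -n "$key" ]" also handles a last line without a trailing newline.
    while read -r key value || [ -n "$key" ]; do
        case "$value" in
            /*)
                if [ ! -e "$value" ]; then
                    echo "missing: $key -> $value"
                    status=1
                fi
                ;;
        esac
    done < "$rc_file"
    return "$status"
}

# Check the per-user file if it is present.
rc_path="${HOME:-.}/.nwchemrc"
if [ -f "$rc_path" ]; then
    check_nwchemrc "$rc_path" || echo "some entries in $rc_path need attention"
fi
```

The function returns a non-zero status when at least one path is missing, so it can also be used as a guard at the top of a run script.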

I have now run some of the QM tests and a few of them failed. Most of the failures look like precision differences, but I cannot tell whether they are serious, so please help me judge. Since there are many files, I show only a few of the differences here. The tests are not complete: I have only run single-process calculations so far. Running the QAs in parallel has problems of its own, which I want to address later.

user@localhost testoutputs$ cat ../singleqmtests.log | grep -c OK
120
user@localhost testoutputs$ cat ../singleqmtests.log | grep -c fail
12

Out of about 66 jobs, 6 failed. (Each failed job contributes two "fail" matches to the log, and each passing job two "OK" matches.)
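
The arithmetic behind these numbers can be written down as a small helper. This is only a sketch: the function name summarize_qa_log is hypothetical, and it assumes that every job writes exactly two matching lines to the log, as described above.

```shell
#!/bin/bash
# Hypothetical helper: convert the two grep line counts into job counts,
# assuming every job writes exactly two matching lines to the log.
summarize_qa_log() {
    local log_file="$1" ok_lines fail_lines
    ok_lines=$(grep -c OK "$log_file")
    fail_lines=$(grep -c fail "$log_file")
    echo "passed=$((ok_lines / 2)) failed=$((fail_lines / 2)) total=$(((ok_lines + fail_lines) / 2))"
}
```

Called as summarize_qa_log ../singleqmtests.log, the counts 120 and 12 shown above give passed=60 failed=6 total=66.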
Take dft_cr2 as an example:

[user@localhost testoutputs]$ diff dft_cr2.ok.out.nwparse dft_cr2.out.nwparse
61c61
< Effective nuclear repulsion energy (a.u.) 189.5566
---
> Effective nuclear repulsion energy (a.u.) 189.5565

The difference seems minute, but I cannot be sure.
In prop_h2o, on the other hand, the differences are larger:

[user@localhost testoutputs]$ diff prop_h2o.ok.out.nwparse prop_h2o.out.nwparse 
78,84c78,84
< XYZ 0.000 0.000 0.000
< isotropic = 232.456
< anisotropy = 38.112
< isotropic = 28.667
< anisotropy = 6.794
< isotropic = 28.667
< anisotropy = 6.794
---
> XYZ -0.000 0.000 -0.000
> isotropic = 223.154
> anisotropy = 38.458
> isotropic = 31.395
> anisotropy = 2.916
> isotropic = 30.362
> anisotropy = 4.268
122c122

And from the MD runs, I get the following differences:

[user@localhost testoutputs]$ diff ethanol_ti.ok.tst ethanol_ti.tst
5c5
< Energy           =   -8.957E+03
---
> Energy           =   -8.944E+03
8c8
< Energy           =   -8.961E+03
---
> Energy           =   -8.948E+03
11c11
< Energy           =   -8.955E+03
---
> Energy           =   -8.942E+03
14c14
< Energy           =   -8.957E+03
---
> Energy           =   -8.942E+03
17c17
< Energy           =   -8.957E+03
---
> Energy           =   -8.945E+03
20c20
< Energy           =   -8.956E+03
---
> Energy           =   -8.942E+03
23c23
< Energy           =   -8.959E+03
---
> Energy           =   -8.946E+03
26c26
< Energy           =   -8.964E+03
---
> Energy           =   -8.951E+03
29c29
< Energy           =   -8.957E+03
---
> Energy           =   -8.943E+03

Can someone tell me what I have done wrong and what I should do to fix these errors?
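
One way to separate last-digit differences from larger ones is to compare the values by relative difference. The following is a minimal sketch, not the QA suite's own comparison logic. The function name within_tol and the default tolerance of 1e-5 are arbitrary choices made for illustration.

```shell
#!/bin/bash
# within_tol REF NEW [TOL]: succeed if |REF - NEW| / max(|REF|, |NEW|) <= TOL.
# The default tolerance of 1e-5 is an arbitrary choice for this sketch.
within_tol() {
    awk -v a="$1" -v b="$2" -v tol="${3:-1e-5}" 'BEGIN {
        a += 0; b += 0; tol += 0              # force numeric interpretation
        d = a - b; if (d < 0) d = -d          # absolute difference
        m = (a < 0) ? -a : a
        n = (b < 0) ? -b : b
        if (n > m) m = n                      # larger magnitude of the two
        exit !(m == 0 || d / m <= tol)        # exit status 0 means "within"
    }'
}

# Values copied from the diffs in this post; the labels are informal.
while read -r label ref new; do
    if within_tol "$ref" "$new"; then
        verdict="within tolerance"
    else
        verdict="exceeds tolerance"
    fi
    printf '%-28s %s\n' "$label" "$verdict"
done <<'EOF'
dft_cr2_nuclear_repulsion 189.5566 189.5565
prop_h2o_isotropic 232.456 223.154
ethanol_ti_energy -8.957E+03 -8.944E+03
EOF
```

With this tolerance the dft_cr2 value is accepted and the other two are flagged. Whether a flagged difference is a real problem still depends on the quantity being compared.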

Also, when I run the dna case in the examples folder, I get the following error:

 Task  times  cpu:        7.0s     wall:        8.4s


                                NWChem Input Module
                                -------------------


                               Analysis Input Module
                               ---------------------


                                  Analysis Module
                                  ---------------


 Reference coordinates read from dna_em.qrs

 Number of atoms is   388
 Topology read from dna.top

 Opening trj file dna_md.trj
 Closing trj file

 Trajectory file header from dna_md.trj

 Opening trj file dna_md.trj

 Opening copy file dna_super.trj

0:Segmentation Violation error, status=: 11
(rank:0 hostname:localhost.localdomain pid:3543):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
  0: ARMCI aborting 11 (0xb).
  0: ARMCI aborting 11 (0xb).
system error message: Inappropriate ioctl for device

Thanks in advance!!!

Clicked A Few Times

5:59:14 AM PST - Wed, Nov 7th 2012

Here is my ldd output for nwchem (it seems to be requested in other threads):

[user@localhost md]$ ldd /opt/nwchem_intel2013_openmpi16/bin/nwchem
        linux-vdso.so.1 =>  (0x00007fc85f1dd000)
        libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x0000003cae200000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a17800000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003a17c00000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000003a28800000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003a17400000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003a17000000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003a16c00000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003a1dc00000)

Forum Vet

4:44:11 PM PST - Wed, Nov 7th 2012
Sacch,

The Intel compiler seems to be generating bad code. Please do the following:

cd $NWCHEM_TOP/src/NWints/hondo
touch hnd_giaxyz.F
make FC=ifort FOPTIMIZE="-O1"
cd ../..
make FC=ifort link

Cheers, Edo
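
The steps in this reply can be wrapped in a small function so that they fit into a build script. This is a sketch under stated assumptions: the function name rebuild_low_opt and the DRY_RUN switch are hypothetical, and NWCHEM_TOP is expected to be set as in the compile script earlier in the thread.

```shell
#!/bin/bash
# Sketch: rebuild one source file at a lower optimization level, then relink.
# DRY_RUN is a hypothetical switch for this sketch; it defaults to 1 so the
# commands are only printed. Set DRY_RUN=0 to execute them.
rebuild_low_opt() {
    local subdir="$1" source_file="$2" run="echo"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        run=""
    fi
    (
        set -e                                   # stop at the first failing step
        top="${NWCHEM_TOP:?NWCHEM_TOP must be set}"
        $run cd "$top/src/$subdir"
        $run touch "$source_file"                # make the file look modified
        $run make FC=ifort FOPTIMIZE="-O1"       # recompile at -O1
        $run cd "$top/src"
        $run make FC=ifort link                  # relink the nwchem binary
    )
}
```

For example, rebuild_low_opt NWints/hondo hnd_giaxyz.F prints the five commands from the reply. The subshell keeps the caller's working directory unchanged.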

Clicked A Few Times

9:14:49 AM PST - Fri, Nov 9th 2012

Dear Edo,

The ldd output is now:

[user@localhost LINUX64]$ l
total 77916
drwxrwxr-x. 2 user user     4096 Nov  9 23:40 .
drwxrwxr-x. 3 user user     4096 Oct 12 19:10 ..
-rwxrwxr-x. 1 user user    20548 Nov  9 22:40 depend.x
-rwxrwxr-x. 1 user user 79750571 Nov  9 23:40 nwchem
[user@localhost LINUX64]$ ldd ./nwchem
        linux-vdso.so.1 =>  (0x00007fff2f5ff000)
        libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x0000003cae200000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a17800000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000003a28800000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003a17400000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003a17c00000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003a17000000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003a1dc00000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003a16c00000)

The result of running nwchem on dna still fails.


[user@localhost LINUX64]$ cd ../../examples/md/dna
[user@localhost dna]$ rm ./*
rm: cannot remove `./temp': Is a directory
[user@localhost dna]$ rm .header
[user@localhost dna]$ cp temp/* .
[user@localhost dna]$ l
total 44

drwxr-x---.  3 user user  4096 Nov  9 23:50 .
drwxr-x---. 23 user user  4096 Oct 31 15:19 ..
-rw-r-----.  1 user user   420 Nov  9 23:50 dna.nw
-rw-r-----.  1 user user 28631 Nov  9 23:50 dna.pdb
drwxrwxr-x.  2 user user  4096 Nov  1 16:17 temp

       Job information
           ---------------

    hostname      = localhost.localdomain
    program       = /opt/nwchem_intel2013_openmpi16/bin/nwchem
    date          = Fri Nov  9 23:51:08 2012                  

    compiled      = Fri_Nov_09_22:46:51_2012
    source        = /home/user/Documents/codes/nwchem-6.1.1-src
    nwchem branch = 6.1.1                                      
    input         = dna.nw                                     
    prefix        = dna.                                       
    data base     = ./dna.db                                   
    status        = startup                                    
    nproc         =        1                                   
    time left     =     -1s                 

          Memory information
           ------------------

    heap     =  131072001 doubles =   1000.0 Mbytes
    stack    =  131072001 doubles =   1000.0 Mbytes
    global   =  262144000 doubles =   2000.0 Mbytes (distinct from heap & stack)
    total    =  524288002 doubles =   4000.0 Mbytes                             
    verify   = yes                                                              
    hardfail = no                                                               
...
Force field                           amber

 Directories used for fragment and segment files

                                       /opt/nwchem_intel2013_openmpi16/data/amber_s/
                                       /opt/nwchem_intel2013_openmpi16/data/amber_q/
                                       /opt/nwchem_intel2013_openmpi16/data/amber_x/
                                       /opt/nwchem_intel2013_openmpi16/data/amber_u/
                                       ./                                           

 Parameter files used to resolve force field parameters

                                       /opt/nwchem_intel2013_openmpi16/data/amber_s/amber.par
                                       /opt/nwchem_intel2013_openmpi16/data/amber_q/amber.par
                                       /opt/nwchem_intel2013_openmpi16/data/amber_x/amber.par
                                       /opt/nwchem_intel2013_openmpi16/data/amber_u/amber.par
                                       ./amber.par                     

 PDB geometry                          dna.pdb                                               

 Created segment                       ./DT_5.sgm
 Created segment                       ./DG.sgm  
 Created segment                       ./DC.sgm  
 Created segment                       ./DA_3.sgm

 Created sequence                      ./dna.seq

 
 Parameter file                        /opt/nwchem_intel2013_openmpi16/data/amber_s/amber.par
 Parameter file                        /opt/nwchem_intel2013_openmpi16/data/amber_q/amber.par
 Parameter file                        /opt/nwchem_intel2013_openmpi16/data/amber_x/amber.par
 Parameter file                        /opt/nwchem_intel2013_openmpi16/data/amber_u/amber.par
                                                                                             
 Total charge                              0.000000                            

Created topology                      dna.top

 Topology                              dna.top

 No command file found: Default restart directives

 Solute centered in x-dimension
 Solute centered in y-dimension
 Solute centered in z-dimension

 Boxsize determined to                     1.968800    2.003300    2.599200

 Created restart                       dna_em.rst

Reference coordinates read from dna_em.qrs

 Number of atoms is   388
 Topology read from dna.top

 Opening trj file dna_md.trj
 Closing trj file

 Trajectory file header from dna_md.trj

 Opening trj file dna_md.trj

 Opening copy file dna_super.trj

0:Segmentation Violation error, status=: 11
(rank:0 hostname:localhost.localdomain pid:25539):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
  0: ARMCI aborting 11 (0xb).
  0: ARMCI aborting 11 (0xb).
system error message: Inappropriate ioctl for device

I am wondering is there a way to cleanly put the commands into my scripts?

Forum Vet

1:00:59 PM PST - Mon, Nov 12th 2012
Quote: Sacch Nov 9th 8:14 am
The result of running nwchem on dna still fails.
0:Segmentation Violation error, status=: 11
(rank:0 hostname:localhost.localdomain pid:25539):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
  0: ARMCI aborting 11 (0xb).
  0: ARMCI aborting 11 (0xb).
system error message: Inappropriate ioctl for device

The easiest fix is to recompile the whole of NWChem using the FOPTIMIZE=-O1 argument.

Quote: I am wondering is there a way to cleanly put the commands into my scripts?

I am not quite sure which commands you are referring to. Could you please give me more details?

Thanks, Edo

Clicked A Few Times

4:25:45 AM PST - Wed, Nov 14th 2012

Hi Edo,

The compiling result https://www.dropbox.com/s/30qmbn41easl004/addO1.log

The script

#!/bin/bash
# intel compilers
source /opt/intel/2013/bin/compilervars.sh intel64

export NWCHEM_TOP=/home/user/Documents/codes/nwchem-6.1.1-src
export NWCHEM_TARGET=LINUX64

export NWCHEM_MODULES="all python"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=16777216

export PYTHONHOME=/usr
export PYTHONVERSION=2.6
export USE_PYTHON64=y
export PYTHONLIBTYPE=so

#sed -i 's/libpython$(PYTHONVERSION).a/libpython$(PYTHONVERSION).$(PYTHONLIBTYPE)/g' config/makefile.h
export MKLROOT=/opt/intel/2013/mkl
export BLASOPT="-Wl,--start-group  $MKLROOT/lib/intel64/libmkl_intel_ilp64.a $MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm"

export FC="ifort"
export CC="icc $CFLAGS"

cd $NWCHEM_TOP/src

make realclean
cp util/*.fh include/.

pwd

make FC="$FC" FOPTIMIZE="-O1" CC="$CC" nwchem_config
make FC="$FC" FOPTIMIZE="-O1" CC="$CC" 32_to_64
make FC="$FC" FOPTIMIZE="-O1" CC="$CC" -j4

There seem to be some forbidden words in this forum. When those words appear, the post cannot be published.

Thanks in advance!

Forum Vet

11:33:13 AM PST - Wed, Nov 14th 2012
need to set FDEBUG, too
Unfortunately, you also need to set FDEBUG, since its current value is -O2. I therefore suggest that you use the following compilation command:

make FC=$FC FOPTIMIZE="-O1" FDEBUG="-O0 -g" CC=$CC -j4
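
Since the build script calls make three times, one way to keep the flags consistent is to collect them in a bash array. This is a sketch, not a tested build recipe: print_build_commands is a hypothetical helper that only prints the commands, and the thread does not confirm whether the nwchem_config and 32_to_64 targets need these variables at all.

```shell
#!/bin/bash
# Sketch: keep the make variables in one array so that every invocation
# receives the same flags and values with spaces stay single arguments.
FC="${FC:-ifort}"
CC="${CC:-icc}"
make_vars=(FC="$FC" CC="$CC" FOPTIMIZE="-O1" FDEBUG="-O0 -g")

# print_build_commands is a hypothetical helper: it prints the three make
# invocations in shell-quoted form instead of running them.
print_build_commands() {
    local target
    for target in nwchem_config 32_to_64 -j4; do
        printf 'make'
        printf ' %q' "${make_vars[@]}" "$target"
        printf '\n'
    done
}

print_build_commands
```

To run the build instead of printing it, each line would become make "${make_vars[@]}" followed by the target, for example make "${make_vars[@]}" nwchem_config.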

Clicked A Few Times

4:53:23 PM PST - Thu, Nov 15th 2012
Hi Edo,

The compilation log is attached here:
https://www.dropbox.com/s/lqmf5mvgkxc336i/addO1debug.log

And the result of running the DNA example is here:
https://www.dropbox.com/s/nv6hpunnup3r14o/runadddrbug.log

ldd nwchem gives output similar to what I posted above, but with different addresses.

It still does not seem to work. Is there anything I can do to find out why it fails?

Many thanks,

Alvyn

Forum Vet

10:15:14 AM PST - Fri, Nov 16th 2012
QM tests
Alvyn,

Before dealing with the MD problems, I would like to know whether the QM tests are running OK. Is this the case?

As far as the MD examples are concerned, I am not 100% confident that they are compatible with NWChem 6.1. Instead, I suggest that you run a few examples from a tutorial we routinely use to show how to run QM/MM. I have uploaded a tarball:

http://nwchemgit.github.io/images/Tutorialqmm.tgz

Please download and untar it, then run the examples following the instructions in the included handouts.pdf file.

Thanks, Edo

Clicked A Few Times

4:58:37 AM PST - Sat, Nov 17th 2012
Hi Edo,

I am running nwchem in a virtual machine under VirtualBox. For some unknown reason the run failed at one point and the machine hung. The following zip file contains the failed cases and the log file of the single-process run, "xrun.log":

https://www.dropbox.com/s/5bjowj1dqbzw6q2/failed.zip

I am wondering whether I should follow someone else's working script or keep doing what I have been doing. Any suggestions? I need to learn to run some small cases on this machine before a mini supercomputer arrives at our facility.

Thanks and regards,

Alvyn

Forum Vet

11:55:22 AM PST - Mon, Nov 19th 2012
Alvyn, I had a look at some of the QM failures and they seem harmless to me. Some of the QMMM numbers are clear failures. Did you try to run the QMMM tutorial I sent you?

Clicked A Few Times

10:52:57 PM PST - Mon, Nov 19th 2012

Hi Edo,

The prepare step went well and was quick. The equilibrate step on a single core seemed to go well too. With the parallel run, there were errors.

The following files are the diffs and the output of the failed run:

https://www.dropbox.com/s/j35iy6i9w8occu8/dyn-0.out.diff
https://www.dropbox.com/s/qyd2dn5i0hn98z7/dyn-0.parallel.out
https://www.dropbox.com/s/sfnx5p3c5vzcxfs/h2o_dyn.out.diff
https://www.dropbox.com/s/yzk3mmsp55bk0rc/h2o_dyn.rst.diff

In dyn-0.parallel.out, I found that nproc=1 in my output, whereas in the tutorial it is 16.
The command I used to run the code was

mpiexec -np 4 /opt/nwchem_intel2013_openmpi16/bin/nwchem ./dyn-0.nw &> dyn-0.out &

after launching an mpd through

/opt/intel/2013/composer_xe_2013.0.079/mpirt/bin/intel64/mpd &

I guess there is something wrong with how I launch the program in parallel mode.

And, for the optimize job:

https://www.dropbox.com/s/u7gixxc72ijld8q/opt-0.out.diff

I noticed that the Bq-nuclear interaction energy differs from the tutorial output throughout the calculation.

As for the rst files, the tutorial output has a section near the end of the file that is absent from my rst file. It looks like this:

>          2         2         0       229
>          4         4         0       552
>          6         6         0       898
>          8         8         0      1186
>         10        10         0      1428
>       1     14
>          1         1         1         0
>          0         0         1       218
>          3         3         1       538
>          4         4         1       862
>          5         5         1      1133

https://www.dropbox.com/s/xwsaxama8jq1amr/h2o_opt.rst.diff

BTW, while browsing the forum, I found that the config.h in my armci build folder was not generated correctly.

Here is the file $NWCHEM_TOP/src/tools/build/armci/config.log

https://www.dropbox.com/s/a9a6gdqc5g2b525/config.log

However, I have no idea how to fix this error or whether it is the culprit.

Many thanks,

Alvyn

Clicked A Few Times

2:48:23 AM PST - Fri, Nov 23rd 2012
It turned out that I had been running the code with the wrong mpiexec. Now it runs well on multiple cores. Thanks, Edo.
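
For anyone who sees nproc = 1 under mpiexec -np 4, a quick check of which launcher the shell picks up may help. The sketch below rests on assumptions: classify_mpi_launcher is a hypothetical helper, the banner patterns are guesses at typical version strings, and not every launcher accepts --version.

```shell
#!/bin/bash
# Hypothetical helper: guess the MPI family from a launcher's version text.
# The patterns are assumptions about typical banner strings.
classify_mpi_launcher() {
    case "$1" in
        *"Open MPI"*|*OpenRTE*) echo "openmpi" ;;
        *Intel*)                echo "intel" ;;
        *)                      echo "unknown" ;;
    esac
}

# Report which mpiexec comes first on PATH and what it appears to be.
launcher="$(command -v mpiexec || true)"
if [ -n "$launcher" ]; then
    banner="$("$launcher" --version 2>&1 < /dev/null || true)"
    echo "mpiexec resolves to $launcher ($(classify_mpi_launcher "$banner"))"
else
    echo "no mpiexec found on PATH"
fi
```

If the reported launcher does not belong to the MPI that nwchem was built against, calling the intended mpiexec by its full path is one way to rule out a PATH mix-up.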
