Compiling NWChem Advice


Clicked A Few Times
Hi All,

I came across the following website with good instructions for compiling NWChem:

HPC Advisory Council - NWChem Best Practice


cheers,

Chris.

Forum Vet
Thanks for sharing. Bert


Quote:Chrisn77 Dec 13th 11:35 am


Clicked A Few Times
After building NWChem following these directions, I got the following error message right after the program initialized:

                                NWChem DFT Module
-----------------


Ti2O4 1Ag C2h BP86/aug-cc-pVDZ(-PP) TDDFT


rank 0 in job 5 computea-01_34827 caused collective abort of all ranks
 exit status of rank 0: killed by signal 9

I was able to run a simple H2O calculation, but as soon as I ran a slightly larger calculation, the job failed. I am using the following compilers and libraries:

/app/intel/Compiler/11.1/072/bin/intel64/ifort
/app/intel/Compiler/11.1/064/bin/intel64/icc
/app/intel/impi/4.0.0.025/
Interconnect is OpenIB.
And I am running on a single node.

Any help will be greatly appreciated.

Shenggang.

Forum Vet
How much memory are you asking for, and how much memory do you have? It's unclear why the calculation would abort at this stage. Could you post the input deck and a bit more of the output?
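For anyone hitting the same thing: "killed by signal 9" is frequently the kernel's out-of-memory killer, so it is worth comparing what the node has against what the job requests. A small sketch of that check, using plain Linux commands rather than anything NWChem-specific:

free -m                              # physical and swap memory on the node, in MB
grep -c '^processor' /proc/cpuinfo   # core count, i.e. the number of processes usually started
# Total memory used is roughly (memory per NWChem process) x (number of processes);
# it has to fit comfortably in physical RAM.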

Bert

Quote:Shenggangli Jun 13th 7:09 am

Clicked A Few Times
The input is pretty regular; I only request 512 MB of memory per processor, and the node has 16 GB in total. The same job runs with a build using the GNU compilers.

echo
start ti2o4.1ag.c2h.tddft.bp86.ad
memory 512 mb
title "Ti2O4 1Ag C2h BP86/aug-cc-pVDZ(-PP) TDDFT"
charge 0

geometry
 TI     0.000000    1.358790    0.000000
 O      0.000000    0.000000    1.259175
 O      0.000000    0.000000   -1.259175
 TI     0.000000   -1.358790    0.000000
 O      1.390706   -2.234545    0.000000
 O     -1.390706    2.234545    0.000000
symmetry c2h
end

basis spherical
O library aug-cc-pVDZ
# cc-pVDZ-PP
Ti s
     1179.85   0.000280
175.264 0.001649
15.4371 0.054881
5.46437 -0.384598
1.11430 0.726613
0.474843 0.464209
0.084602 0.018129
0.032314 -0.003034
Ti s
     1179.85   -0.000073
175.264 -0.000495
15.4371 -0.013825
5.46437 0.105472
1.11430 -0.248678
0.474843 -0.287478
0.084602 0.598418
0.032314 0.558246
Ti s
     1179.85   0.000171
175.264 0.000836
15.4371 0.032278
5.46437 -0.235207
1.11430 0.924182
0.474843 -0.350426
0.084602 -1.780466
0.032314 1.729235
Ti s
     0.032314   1.0
Ti p
     30.4206   0.005258
8.26043 -0.072253
2.11785 0.314105
0.960790 0.506006
0.418681 0.290650
0.116841 0.027032
0.039440 -0.003770
Ti p
     30.4206   -0.001192
8.26043 0.018066
2.11785 -0.087998
0.960790 -0.174911
0.418681 -0.045726
0.116841 0.558068
0.039440 0.556683
Ti p
     30.4206   -0.001470
8.26043 0.022249
2.11785 -0.105142
0.960790 -0.216938
0.418681 -0.081574
0.116841 0.803506
0.039440 0.315929
Ti p
     0.039440   1.0
Ti d
     21.7936   0.025345
6.7014 0.112634
2.30235 0.289321
0.81126 0.427201
0.268201 0.390352
0.077546 0.148302
Ti d
     21.7936   -0.025853
6.7014 -0.118705
2.30235 -0.292861
0.81126 -0.337560
0.268201 0.306027
0.077546 0.711637
Ti d
     0.077546   1.0
Ti f
     0.4980   1.0
# aug-cc-pVDZ-PP
Ti s
     0.0123   1.0
Ti p
     0.0133   1.0
Ti d
     0.0224   1.0
Ti f
     0.1185   1.0
end

ecp
Ti nelec 10
Ti S
2 12.68545278327510 130.80332818219401
2 5.60443403028738 20.28841794549380
Ti P
2 12.16470843145440 32.17007242591840
2 11.83573845353970 54.94536681653020
2 4.20933536504579 2.08550079252638
2 4.87027289311104 6.50179581409069
Ti D
2 16.69258854674990 -9.43456596695940
2 17.44887629757160 -15.01358236125450
2 4.87042604412202 0.06111200391995
2 4.88118144645471 0.05753055973493
Ti F
2 8.70406407124751 -1.96536087719181
2 10.12717654527650 -3.75480071683613
end

dft
  vectors swap 28 29 output ti2o4.1ag.c2h.tddft.bp86.ad.movecs
xc becke88 perdew86
grid xfine
mult 1
iterations 100
end
task dft

Forum Vet
This sounds like a compiler issue. I've seen others report problems with the latest Intel compiler. Do you have access to another compiler (other than the GNU one that works), or to Intel 10?
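For what it's worth, switching the tree to a different Intel release usually comes down to sourcing that compiler's environment script and rebuilding from a clean source tree. A rough sketch, where the install paths are assumptions that have to match the local setup:

source /opt/intel/fce/10.1.026/bin/ifortvars.sh   # hypothetical Intel 10.1 Fortran install
source /opt/intel/cce/10.1.026/bin/iccvars.sh     # hypothetical Intel 10.1 C install
export FC=ifort
export CC=icc
cd $NWCHEM_TOP/src
make realclean                                    # drop objects built with the other compiler
make nwchem_config NWCHEM_MODULES=all
make FC=ifort CC=icc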

Bert



Quote:Shenggangli Jul 18th 1:55 am

Clicked A Few Times
We do. I will try some older versions of the Intel compilers and let you know what happens.

Thanks.

Quote:Bert Jul 18th 8:59 pm


Clicked A Few Times
I doubt it is really a compiler problem. As indicated by the following output (NWChem built with Intel 10.1.025, MKL 10.1, and MPICH), it may have something to do with MPI. The same MPI library works for Molpro and other programs.

 
NWChem SCF Module
-----------------


                       WO3 1A1 C3v B3LYP/aug-cc-pVDZ(-PP)



 ao basis        = "ao basis"
 functions       = 123
 atoms           = 4
 closed shells   = 19
 open shells     = 0
 charge          = 0.00
 wavefunction    = RHF
 input vectors   = atomic
 output vectors  = ./wo3.1a1.c3v.tddft.b3lyp.ad.movecs
 use symmetry    = T
 symmetry adapt  = T


Summary of "ao basis" -> "ao basis" (spherical)
------------------------------------------------------------------------------
Tag Description Shells Functions and Types
---------------- ------------------------------ ------ ---------------------
O aug-cc-pVDZ 9 23 4s3p2d
W user specified 16 54 5s5p4d2f


     Symmetry analysis of basis
--------------------------

        a'         74
        a"         49

p2_26088: p4_error: net_recv read: probable EOF on socket: 1
p4_26092: p4_error: net_recv read: probable EOF on socket: 1
p1_26087: p4_error: net_recv read: probable EOF on socket: 1
Last System Error Message from Task 0:: Inappropriate ioctl for device
rm_l_2_26091: (1.585938) net_send: could not write to fd=7, errno = 32
p3_26090: p4_error: net_recv read: probable EOF on socket: 1
rm_l_1_26089: (1.589844) net_send: could not write to fd=6, errno = 32
   p4_error: latest msg from perror: Bad file descriptor
rm_l_5_26096: (1.582031) net_send: could not write to fd=11, errno = 9
rm_l_5_26096: p4_error: net_send write: -1
rm_l_3_26093: (1.585938) net_send: could not write to fd=8, errno = 32
rm_l_4_26095: (1.585938) net_send: could not write to fd=9, errno = 32
[0] MPI Abort by user Aborting program !
0:Child process terminated prematurely, status=: 11
(rank:0 hostname:n13 pid:26015):ARMCI DASSERT fail. signaltrap.c:SigChldHandler():167 cond:0
[0] Aborting program!
p0_26015: p4_error: : 0
p3_26090: (7.585938) net_send: could not write to fd=8, errno = 32
p0_26015: (11.617188) net_send: could not write to fd=4, errno = 32
mpiexec: Warning: task 0 exited with status 1.

Quote:Shenggangli Jul 19th 2:09 am


Forum Vet
Since NWChem doesn't really use much MPI, but instead runs directly over ARMCI (the GA network layer), which uses the IB API, I doubt it is MPI. Can you post the complete build environment (i.e., which environment variables you have set)?
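One quick way to capture those settings in one go is something along these lines (the variable list is only the usual suspects, not exhaustive):

env | grep -E '^(NWCHEM|USE_|ARMCI|MPI_|LIBMPI|BLASOPT|FC=|CC=|FOPTIMIZE|COPTIMIZE|LARGE_FILES)' | sort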

Bert


Quote:Shenggangli Jul 21st 8:49 am


Clicked A Few Times
Hi Bert, I used ifort/icc 10.1.026. Here is my build script:

#!/bin/bash

export NWCHEM_TOP=/home/sli/source/nwchem-6.0
export LARGE_FILES=TRUE
export NWCHEM_TARGET=LINUX64
export USE_MPI=y
export USE_MPIF=y
export FOPTIMIZE="-O3 -axSTPOW"
export COPTIMIZE="-O3 -axSTPOW"
export MPI_LIB=/usr/lib64/MPICH/p4/intel
export MPI_INCLUDE=/usr/include
export LIBMPI="-lmpich"
export BLASOPT="-L/opt/intel/Compiler/11.0/084/mkl/lib/em64t -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -liomp5 -lpthread"
export FC=ifort
export CC=icc
export NWCHEM_MODULES=all
make FC=ifort CC=icc NWCHEM_TOP=/home/sli/source/nwchem-6.0 LARGE_FILES=TRUE NWCHEM_TARGET=LINUX64 USE_MPI=y USE_MPIF=y FOPTIMIZE="-O3 -axSTPOW" COPTIMIZE="-O3 -axSTPOW" MPI_LIB=/usr/lib64/MPICH/p4/intel MPI_INCLUDE=/usr/include LIBMPI="-lmpich" BLASOPT="-L/opt/intel/Compiler/11.0/084/mkl/lib/em64t -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -liomp5 -lpthread" NWCHEM_MODULES=all nwchem_config
make FC=ifort CC=icc NWCHEM_TOP=/home/sli/source/nwchem-6.0 LARGE_FILES=TRUE NWCHEM_TARGET=LINUX64 USE_MPI=y USE_MPIF=y FOPTIMIZE="-O3 -axSTPOW" COPTIMIZE="-O3 -axSTPOW" MPI_LIB=/usr/lib64/MPICH/p4/intel MPI_INCLUDE=/usr/include LIBMPI="-lmpich" BLASOPT="-L/opt/intel/Compiler/11.0/084/mkl/lib/em64t -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -liomp5 -lpthread" NWCHEM_MODULES=all

Clicked A Few Times
The following is what I did with the latest version of the Intel compilers (version 2011 Update 4, I believe). Does using TCGMSG affect performance significantly?

#!/bin/bash
export LARGE_FILES=TRUE
export ENABLE_COMPONENT=yes
export TCGRSH=/usr/bin/ssh
export USE_MPIF=y
export FOPTIMIZE="-O3 -axSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 -no-prec-div -funroll-loops -unroll-aggressive"
export COPTIMIZE="-O3 -axSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 -no-prec-div -funroll-loops"
export BLASOPT="-L/opt/intel/composerxe/mkl/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core"
export FC=ifort
export CC=icc

#export USE_GPROF=yes
export USE_SUBGROUPS=yes
export USE_MPI=yes
#export OLD_GA=yes
export MSG_COMMS=MPI
export USE_PYTHON64=yes
export MPI_LOC=/usr
export MPI_INCLUDE=$MPI_LOC/include
export MPI_LIB=$MPI_LOC/lib64/MPICH/p4/intel
export LIBMPI="-lmpich-intel -lmpi -lmpichfarg -lbproc -lmpe"
#export LIBMPI="-lfmpich -lmpich -lpthread" # MPICH2 1.2
#export LIBMPI="-lmpichf90 -lmpich -lmpl -lpthread" # MPICH2 1.3.1
export PYTHONHOME=/usr
export PYTHONVERSION=2.4
export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export NWCHEM_EXECUTABLE=$NWCHEM_TOP/bin/LINUX64/nwchem
export PYTHONPATH=./:$NWCHEM_TOP/contrib/python/
cd $NWCHEM_TOP/src
#make DIAG=PAR FC=gfortran CC=gcc CDEBUG="-g -ffpe-trap=invalid,zero,overflow" FDEBUG="-g -ffpe-trap=invalid,zero,overflow" FOPTIMIZE="-g -ffpe-trap=invalid,zero,overflow" COPTIMIZE="-g -ffpe-trap=invalid,zero,overflow" nwchem_config
#make DIAG=PAR FC=gfortran CC=gcc CDEBUG="-g -ffpe-trap=invalid,zero,overflow" FDEBUG="-g -ffpe-trap=invalid,zero,overflow" FOPTIMIZE="-g -ffpe-trap=invalid,zero,overflow" COPTIMIZE="-g -ffpe-trap=invalid,zero,overflow" $1
#make DIAG=PAR FC=gfortran CC=gcc CDEBUG="-pg -g" FDEBUG="-pg -g" FOPTIMIZE="-pg -g -O0" COPTIMIZE="-pg -g -O0" nwchem_config
#make DIAG=PAR FC=gfortran CC=gcc CDEBUG="-pg -g" FDEBUG="-pg -g" FOPTIMIZE="-pg -g -O0" COPTIMIZE="-pg -g -O0" $1
make DIAG=PAR FC=ifort CC=icc LARGE_FILES=TRUE ENABLE_COMPONENT=yes TCGRSH=/usr/bin/ssh USE_MPIF=y FOPTIMIZE="-O3 -axSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 -no-prec-div -funroll-loops -unroll-aggressive" COPTIMIZE="-O3 -axSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 -no-prec-div -funroll-loops" BLASOPT="-L/opt/intel/composerxe/mkl/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core" USE_SUBGROUPS=yes USE_MPI=yes MSG_COMMS=MPI USE_PYTHON64=yes MPI_LOC=/usr MPI_INCLUDE=/usr/include MPI_LIB=/usr/lib64/MPICH/p4/intel LIBMPI="-lmpich-intel -lmpi -lmpichfarg -lbproc -lmpe" PYTHONHOME=/usr PYTHONVERSION=2.4 NWCHEM_TOP=/home/sli/source/nwchem-6.0 NWCHEM_TARGET=LINUX64 NWCHEM_MODULES="all python" NWCHEM_EXECUTABLE=/home/sli/source/nwchem-6.0/bin/LINUX64/nwchem PYTHONPATH=./:/home/sli/source/nwchem-6.0/contrib/python/ nwchem_config
make DIAG=PAR FC=ifort CC=icc LARGE_FILES=TRUE ENABLE_COMPONENT=yes TCGRSH=/usr/bin/ssh USE_MPIF=y FOPTIMIZE="-O3 -axSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 -no-prec-div -funroll-loops -unroll-aggressive" COPTIMIZE="-O3 -axSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 -no-prec-div -funroll-loops" BLASOPT="-L/opt/intel/composerxe/mkl/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core" USE_SUBGROUPS=yes USE_MPI=yes MSG_COMMS=MPI USE_PYTHON64=yes MPI_LOC=/usr MPI_INCLUDE=/usr/include MPI_LIB=/usr/lib64/MPICH/p4/intel LIBMPI="-lmpich-intel -lmpi -lmpichfarg -lbproc -lmpe" PYTHONHOME=/usr PYTHONVERSION=2.4 NWCHEM_TOP=/home/sli/source/nwchem-6.0 NWCHEM_TARGET=LINUX64 NWCHEM_MODULES="all python" NWCHEM_EXECUTABLE=/home/sli/source/nwchem-6.0/bin/LINUX64/nwchem PYTHONPATH=./:/home/sli/source/nwchem-6.0/contrib/python/ $1

Forum Vet
Do you get NWChem to run correctly with this environment?

With TCGMSG, parallel performance beyond one node will not be good, as the communication goes over sockets. To go beyond one node you should choose the appropriate network and set ARMCI_NETWORK and the other parameters defined in the INSTALL file.
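As a rough sketch of what that looks like for an InfiniBand machine (the include and library paths below are assumptions that have to match the local OFED installation; the INSTALL file has the authoritative list of variables):

export ARMCI_NETWORK=OPENIB
export IB_INCLUDE=/usr/include   # directory containing infiniband/verbs.h (assumed)
export IB_LIB=/usr/lib64         # directory containing libibverbs (assumed)
cd $NWCHEM_TOP/src
make realclean && make nwchem_config NWCHEM_MODULES=all && make FC=ifort CC=icc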

Bert

Guest
No, both builds (with versions 10.1 and 2011) fail to run.

In my experience, the TCGMSG build usually works, although I get poor performance on this dual quad-core processor system even within a single node. How well does NWChem scale on these dual quad-core systems (I mean on just one node)?
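For reference, a single-node run with an MPI build is typically launched along these lines; the binary path, input file, and process count here are only illustrative:

mpirun -np 8 $NWCHEM_TOP/bin/LINUX64/nwchem ti2o4.nw > ti2o4.out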

Quote:Bert Aug 9th 12:18 am

