CCSD(T) issues with NWChem 6.1.1


Forum Vet
I hope somebody can help me with a running (or maybe compiling?) problem with NWChem 6.1.1.
I should say first that I am just the system administrator who compiled the software for some users and am now trying to run a simple job to test the installation, so I don't have the background of a "real" NWChem user.

Everything is fine as long as the job runs on a single host, but as soon as it spans more than one host the CCSD(T) calculation breaks with ARMCI errors like:


================================================================
             the segmented parallel ccsd program:   24 nodes  
================================================================

  level of theory    ccsd(t)
  number of core          8
  number of occupied     22
  number of virtual     241
  number of deleted       0
  total functions       274
  number of shells      122
  basis label           566

  ==== ccsd parameters ====
  iprt   =     0
  convi  =  0.100E-07
  maxit  =  35
  mxvec  =   5
  memory    235792780
 IO offset    20.0000000000000     
IO error message >End of File
file_read_ga: failing writing to /cl_tmp/winkl/1a_DC_CCSDT.t2
Failed reading restart vector from /cl_tmp/winkl/1a_DC_CCSDT.t2
Using MP2 initial guess vector

 -------------------------------------------------------------------------
   iter    correlation    delta      rms       T2      Non-T2     Main
             energy       energy     error     ampl     ampl      Block
                                               time     time      time
-------------------------------------------------------------------------
0: error ival=4
(rank:0 hostname:f38 pid:26622):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_call_data_server():2193 cond:(pdscr->status==IBV_WC_SUCCESS)

Submit File (job scheduler is Son of Grid Engine):

#export MA_USE_ARMCI_MEM=YES
export ARMCI_DEFAULT_SHMMAX=16384
cd $TMPDIR
mpiexec /software/nwchem/nwchem-6.1.1/bin/LINUX64/nwchem /cl_tmp/winkl/1a_CCSDT_DZ.nw > /cl_tmp/winkl/1a_CCSDT_DZ.out
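
A side note on the script above: as far as I understand, with Open MPI the environment variables exported in the job script are not automatically forwarded to ranks on other nodes. A sketch of forwarding the ARMCI variable explicitly with mpiexec's -x option (same values and paths as above):

  # Sketch: explicitly forward the ARMCI variable to all ranks via Open MPI's -x option
  export ARMCI_DEFAULT_SHMMAX=16384
  cd $TMPDIR
  mpiexec -x ARMCI_DEFAULT_SHMMAX /software/nwchem/nwchem-6.1.1/bin/LINUX64/nwchem \
      /cl_tmp/winkl/1a_CCSDT_DZ.nw > /cl_tmp/winkl/1a_CCSDT_DZ.out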

NWChem Input File:
  ECHO
START 1a_DC_CCSDT
PERMANENT_DIR /cl_tmp/winkl
SCRATCH_DIR /tmp
TITLE "Benzene + MeOH (MP2/6-31+G(d,p) geometry, CCSD(T)/aug-cc-pVDZ BSSE"
MEMORY stack 1700 heap 100 global 1700 MB
Geometry noautoz noautosym
C -0.037516 1.497715 -0.368972
C 1.183399 0.816740 -0.443203
C 1.201236 -0.579954 -0.530828
C -0.001026 -1.295847 -0.545287
C -1.221398 -0.615635 -0.466875
C -1.240076 0.781046 -0.379227
H -0.051502 2.578197 -0.295989
H 2.113885 1.370596 -0.425910
H 2.146259 -1.106454 -0.585914
H 0.013113 -2.376935 -0.612861
H -2.152107 -1.169760 -0.472446
H -2.184250 1.307292 -0.312447
O 0.066650 0.168253 2.832692
C 0.093368 -1.236539 3.073676
H 0.039393 0.309660 1.877253
H -0.799336 -1.731543 2.683798
H 0.123668 -1.363055 4.152641
H 0.978240 -1.705779 2.636901
END
basis "ao basis" spherical
C library aug-cc-pVDZ
H library aug-cc-pVDZ
O library aug-cc-pVDZ
bqC library C aug-cc-pVDZ
bqH library H aug-cc-pVDZ
bqO library O aug-cc-pVDZ
end
SCF
DIRECT
THRESH 1.0E-8
END
ccsd
freeze core atomic
thresh 1d-8
maxiter 35
end


Varying stack, heap, global and ARMCI_DEFAULT_SHMMAX does not really change anything (if I set them too low, a different error occurs). Setting MA_USE_ARMCI_MEM to y or n has no effect either.

The cluster on which I am trying to run the program has 12 cores per node, 4 GB of memory per core and Mellanox InfiniBand. The job scheduler allows jobs to use 45 GB of memory per node.
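
As a rough sanity check of the per-node memory budget (assuming one NWChem process per core, i.e. 12 per node):

  # Rough budget from the MEMORY directive: stack + heap + global per process,
  # times 12 processes per node; should stay below the 45 GB the scheduler allows
  echo $(( (1700 + 100 + 1700) * 12 )) MB    # 42000 MB per node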

The memory system settings on each host are:

 /etc/security/limits.conf: memlock and stack: unlimited (soft and hard)

 /proc/sys/kernel/shmmax:
68719476736 (= 64 GiB)

InfiniBand registerable memory: 128 GiB/node
  (PAGE_SIZE = 4096 Bytes, log_num_mtt = 25, log_mtts_per_seg = 0; I did not change log_mtts_per_seg because Mellanox has advised the OpenMPI community not to change the default setting)
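
For reference, the 128 GiB figure follows directly from those parameters; a quick check, assuming the standard mlx4_core sysfs paths:

  # Registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE
  log_num_mtt=$(cat /sys/module/mlx4_core/parameters/log_num_mtt)            # 25 here
  log_mtts_per_seg=$(cat /sys/module/mlx4_core/parameters/log_mtts_per_seg)  # 0 here
  page_size=$(getconf PAGE_SIZE)                                             # 4096
  echo $(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size / 1024 / 1024 / 1024 )) GiB   # 128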

The software is compiled with Intel Composer XE 13 and OpenMPI 1.6.4.
I should add that the Intel compiler and OpenMPI otherwise run well on the cluster.

Compilation settings/options:

  source /software/Intel/composer_xe_2013.3.163/mkl/bin/mklvars.csh
setenv FC ifort
setenv CC icc
setenv NWCHEM_TOP /software/nwchem/nwchem-6.1.1
setenv NWCHEM_TARGET LINUX64
setenv TARGET LINUX64
setenv NWCHEM_MODULES all
setenv USE_MPI y
setenv USE_MPIF y
setenv MPI_LOC /software/openmpi/1.6.4/intel
setenv MPI_LIB $MPI_LOC/lib
setenv MPI_INCLUDE $MPI_LOC/include
setenv LARGE_FILES TRUE
setenv IB_HOME /usr
setenv IB_LIB /usr/lib64
setenv IB_INCLUDE /usr/include
setenv IB_LIB_NAME "-libumad -libverbs -lpthread"
setenv IBV_FORK_SAFE 1
setenv LIBMPI "-lmpi_f90 -lmpi_f77 -lmpi -ldl -lm -Wl,--export-dynamic -lrt -lnsl -lutil"
setenv HAS_BLAS y
setenv BLAS_SIZE 8
setenv BLAS_LIB "-L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm"
setenv BLASOPT "$BLAS_LIB"
setenv OMP_NUM_THREADS 1
setenv ARMCI_NETWORK OPENIB

I also applied the following patches before compiling:
  Giaxyz.patch.gz, Texasmem2.patch.gz, Dstebz3.patch.gz
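
For completeness, the build commands themselves were the usual NWChem 6.x sequence (a sketch with the settings above, not a verbatim log):

  cd $NWCHEM_TOP/src
  make nwchem_config NWCHEM_MODULES="all"
  make FC=ifort CC=icc >& make.log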

The compilation finishes successfully. Afterwards I ran the memory script:
   cd $NWCHEM_TOP/contrib
./getmem.nwchem
Total Memory  : 49419604 Kb
No. of processors  : 12
Total Memory/proc  : 4118300 KB = 4 GB
Executing make LIB_DEFINES+=" -DDFLT_TOT_MEM=523210240"
/software/nwchem/nwchem-6.1.1/bin/LINUX64/depend.x -I/software/nwchem/nwchem-6.1.1/src/tools/install/include > dependencies
   ifort -i8 -I/software/Intel/composer_xe_2013.3.163/mkl/include 
-I/software/Intel/composer_xe_2013.3.163/mkl/mkl/include -c -i8 -g
   -I. -I/software/nwchem/nwchem-6.1.1/src/include 
-I/software/nwchem/nwchem-6.1.1/src/tools/install/include -DEXT_INT -DLINUX
   -DLINUX64 -DPARALLEL_DIAG -DDFLT_TOT_MEM=523210240   memory_def.F
...

ldd /software/nwchem/nwchem-6.1.1/bin/LINUX64/nwchem:
       linux-vdso.so.1 =>  (0x00007fff69bff000)
libmkl_intel_ilp64.so => /software/Intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_ilp64.so (0x00002b619cb92000)
libmkl_sequential.so => /software/Intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_sequential.so (0x00002b619d2a9000)
libmkl_core.so => /software/Intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so (0x00002b619d956000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003565600000)
libm.so.6 => /lib64/libm.so.6 (0x0000003a6ae00000)
libmpi_f90.so.1 => /software/openmpi/1.6.4/intel/lib/libmpi_f90.so.1 (0x00002b619ebcb000)
libmpi_f77.so.1 => /software/openmpi/1.6.4/intel/lib/libmpi_f77.so.1 (0x00002b619edce000)
libmpi.so.1 => /software/openmpi/1.6.4/intel/lib/libmpi.so.1 (0x00002b619f006000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003a6a200000)
librt.so.1 => /lib64/librt.so.1 (0x0000003566e00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003a7aa00000)
libutil.so.1 => /lib64/libutil.so.1 (0x00000033c2600000)
libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x0000003a9b800000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000003a9b400000)
libc.so.6 => /lib64/libc.so.6 (0x0000003a6a600000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003a75600000)
/lib64/ld-linux-x86-64.so.2 (0x0000003a69e00000)
libifport.so.5 => /software/Intel/composer_xe_2013.3.163/compiler/lib/intel64/libifport.so.5 (0x00002b619f405000)
libifcore.so.5 => /software/Intel/composer_xe_2013.3.163/compiler/lib/intel64/libifcore.so.5 (0x00002b619f634000)
libimf.so => /software/Intel/composer_xe_2013.3.163/compiler/lib/intel64/libimf.so (0x00002b619f96a000)
libintlc.so.5 => /software/Intel/composer_xe_2013.3.163/compiler/lib/intel64/libintlc.so.5 (0x00002b619fe27000)
libsvml.so => /software/Intel/composer_xe_2013.3.163/compiler/lib/intel64/libsvml.so (0x00002b61a0075000)
libifcoremt.so.5 => /software/Intel/composer_xe_2013.3.163/compiler/lib/intel64/libifcoremt.so.5 (0x00002b61a0a3f000)
libirng.so => /software/Intel/composer_xe_2013.3.163/compiler/lib/intel64/libirng.so (0x00002b61a0da5000)


So what could be the reason for the failure? Any help would be appreciated.

Ursula

Forum Vet
I did test runs on 32 and on 24 processors and didn't see any problems.

The configuration used is:

setenv ARMCI_DEFAULT_SHMMAX 2048
setenv MA_USE_ARMCI_MEM Y

I assume you used two nodes, each with 12 cores. One option would be to increase ARMCI_DEFAULT_SHMMAX to 3072, giving a similar SHMMAX-to-core ratio.

Another test worth pursuing is running with 8 cores per node.
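
(The 3072 figure is just the same per-core ratio scaled to 12 cores; a rough sketch, assuming the 2048 MB above corresponds to 8 cores per node:)

  # Hypothetical ratio check: 2048 MB / 8 cores = 256 MB per core,
  # so a 12-core node would want about 12 * 256 = 3072 MB
  echo $(( 2048 / 8 * 12 ))   # 3072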

Bert

Clicked A Few Times
I tried out your proposals.
Setting ARMCI_DEFAULT_SHMMAX lower than 4096 makes the jobs crash immediately after submission with the following:

in the output-file:
"rank:23 hostname:f37 pid:22579):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_pin_contig_hndl():1142 cond:(memhdl->memhndl!=((void *)0))"
....

and in the SGE error file:
Last System Error Message from Task 16:: Cannot allocate memory
Last System Error Message from Task 20:: Cannot allocate memory
...

Something is wrong with memory, and I guess it is a system setting, but I have run out of ideas about what to set differently from what I described above.
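
One more check I can still do (a sketch; it only verifies what the job processes actually see, since limits.conf settings do not always reach daemon-launched jobs) is to put this at the top of the job script before mpiexec:

  # Print the limits seen by scheduler-launched processes on the execution host
  ulimit -l    # locked memory; should report "unlimited" for IB registration
  ulimit -s    # stack size
  cat /proc/sys/kernel/shmmax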

Ursula

Forum Vet
Ursula,
The failure you are experiencing is likely due to the configuration of your InfiniBand kernel driver.
From what I see in your detailed report, the current value of
/sys/module/mlx4_core/parameters/log_mtts_per_seg
seems to be set to 0.

After reading the two web pages listed at the bottom, I think that for RDMA over IB to work you need log_mtts_per_seg to be set to a non-zero value, for example 1.

http://web.archive.org/web/20121013223600/http:/www.ibm.com/developerworks/wikis/display/h...

https://groups.google.com/forum/?fromgroups=#!topic/hpctools/d_5mu1tNh7E
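
A sketch of how such a change is usually made persistent (the exact file name and reload procedure depend on your distribution; the values are only an example):

  # Example only: set a non-zero log_mtts_per_seg for mlx4_core and reload the driver
  echo "options mlx4_core log_num_mtt=24 log_mtts_per_seg=1" > /etc/modprobe.d/mlx4_core.conf
  # then unload/reload mlx4_core (stop InfiniBand services first) or reboot the node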

Clicked A Few Times
Thank you.
I tried out "log_mtts_per_seg=1" (with "log_num_mtt=24") and "log_mtts_per_seg=3" (with "log_num_mtt=22").
Unfortunately it does not change anything (the error messages stay the same).

Could it be a compilation problem after all? Have you tried Intel Composer XE 13 with OpenMPI 1.6.4 on (Mellanox) InfiniBand hosts yourselves?

I also made a build with MVAPICH2 1.9; the result is even worse: the jobs hang immediately after submission and simply do nothing.

Forum Vet
Ursula,
Do you happen to have Intel Xeon Phi cards attached to the nodes of your cluster?
Edo

Clicked A Few Times
Edo,
no, there are no such cards involved.
Ursula

Just Got Here
Ursula,
I have the same problem. Is there a solution for it yet?

