CCSD(T) on SGI UV1000


Help needed!
0) I am having trouble running CCSD(T) calculations.
The job is a CCSD(T)/aug-cc-pVDZ calculation of (H2O)n with n=11.
I successfully ran up to n=6. Skipping n=7-10, I then tried n=11,
but it failed. I have rerun the job with different numbers of nodes,
and the error does not seem to depend on the node count. The largest number of nodes I tried is 60.
The last few lines of the output are:
0:CreateSharedRegion:kr_malloc failed KB=: 1548452
(rank:0 hostname:ccuv1ka.center.ims.ac.jp pid:47810):ARMCI DASSERT fail. ../../ga-5-1/armci/src/memory/shmem.c:Create_Shared_Region():1188 cond:0

The details follow.

1) The computer system and the NWChem build
SGI UV1000 at the Research Center for Computational Science (RCCS) in Okazaki (perhaps better known as the
computer center of the Institute for Molecular Science)
CPU: Intel Xeon E7-8837
   576 cores
Memory: 16 GB/core

NWChem 6.1 was compiled with the following environment settings:


setenv NWCHEM_TOP /home2/users/myaccount/program2012/nwchem-6.1
setenv NWCHEM_TARGET LINUX64
setenv NWCHEM_MODULES all

setenv USE_MPI y
setenv USE_MPIF y
setenv USE_MPIF4 y

setenv MPI_LOC /opt/sgi/mpt/mpt-2.05
setenv MPI_LIB /opt/sgi/mpt/mpt-2.05/lib
setenv MPI_INCLUDE /opt/sgi/mpt/mpt-2.05/include
setenv LIBMPI "-lmpi"

setenv CC icc
setenv CXX icpc
setenv FC ifort
setenv F77 ifort
setenv F90 ifort



2) input file


start w11_P4205_ccsd
  # from p.27 of manual

memory total 16000 mb
  #  for ccuv (SGI UV1000 has 16GB/core)

include /home/users/ae6/nw_calc/.def_scratch

charge 0


geometry units angstrom
  # insert a xyz file
  # geometry units angstrom determined by Sotiris
O   -0.89748105    0.19007807    2.41553173
H -0.90291269 -0.78982041 2.22872917
H -1.04785349 0.27853849 3.36471241
O 1.39877181 0.56736651 0.91789436
H 1.81866430 -0.31073616 1.03344319
H 0.62848267 0.55182303 1.52779025
O 0.90206313 0.44181904 -1.82183644
H 1.36993685 1.21679615 -2.16228341
H 1.00082599 0.52185251 -0.84646752
O -0.66287491 -2.36910810 1.73040685
H 0.29520467 -2.46605025 1.57607600
H -1.06049245 -2.53529272 0.85010455
O -1.51520884 -2.61006183 -0.93637255
H -2.20050226 -3.18853328 -1.29294824
H -1.76415373 -1.68198056 -1.22549659
O 1.22769215 -2.37937925 -1.58501744
H 1.21134500 -1.46149368 -1.91660881
H 0.28025260 -2.60853792 -1.49252851
O 2.09334964 -2.12957989 0.91511356
H 1.86835131 -2.29795608 -0.04311880
H 2.88556577 -2.64907069 1.09850351
O -2.34666113 1.60589920 0.37931333
H -1.96192984 1.17886162 1.16585687
H -1.78867173 2.39647873 0.23785913
O -0.61749903 3.81777583 -0.09636677
H 0.32546558 3.58863637 -0.22608917
H -0.84594567 4.37438441 -0.85098537
O 2.05080080 2.91341547 -0.30375456
H 1.99955778 2.13307419 0.29025643
H 2.75249231 3.46685605 0.06100942
O -1.90623274 -0.11951976 -1.69528543
H -2.15309886 0.50202474 -0.96739409
H -1.01011718 0.17690106 -1.93873929
end

basis spherical
H library aug-cc-pvdz
O library aug-cc-pvdz
end

ccsd
MAXITER 100
FREEZE atomic
end


title "CCSD(T) single point"

task CCSD(T)

ccsd; print none; end
scf; print none; end
---

3) The last few lines of the output

------------------------------------------
MP2 Energy (coupled cluster initial guess)
------------------------------------------
Reference energy: -836.552750653743715
MP2 Corr. energy: -2.488442765580423
Total MP2 energy: -839.041193419324145


****************************************************************************
the segmented parallel ccsd program: 60 nodes
****************************************************************************

level of theory    ccsd(t)
number of core 11
number of occupied 44
number of virtual 396
number of deleted 0
total functions 451
number of shells 209
basis label 566



  ***** ccsd parameters *****
iprt = 0
convi = 0.100E-05
maxit = 100
mxvec = 5
memory 1048351105
IO offset 20.0000000000000
IO error message >End of File
file_read_ga: failing writing to ./w11_P4205_ccsd.t2
Failed reading restart vector from ./w11_P4205_ccsd.t2
Using MP2 initial guess vector




 iter   correlation     delta        rms       T2       Non-T2    Main
        energy          energy       error     ampl     ampl      Block
                                               time     time      time


0:CreateSharedRegion:kr_malloc failed KB=: 1548452
(rank:0 hostname:ccuv1ka.center.ims.ac.jp pid:47810):ARMCI DASSERT fail. ../../ga-5-1/armci/src/memory/shmem.c:Create_Shared_Region():1188 cond:0

ARMCI_DEFAULT_SHMMAX
Dear Prof. Iwata,
Your calculations are crashing while creating shared memory segments.
If you set the environmental variable ARMCI_DEFAULT_SHMMAX to a value of 2048 (or larger),
you should be able to overcome this problem.
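As a rough check of why 2048 should be enough, the sketch below converts the size reported in the error line (KB=: 1548452) to MB and compares it with the suggested limit. It assumes that this KB figure is the size of the shared-memory segment ARMCI tried to create, and that ARMCI_DEFAULT_SHMMAX is expressed in MB (as the 4096*1024*1024 example further down implies). The variable names failed_kb and failed_mb are illustrative only.

```sh
#!/bin/sh
# Segment size from the error line "0:CreateSharedRegion:kr_malloc failed KB=: 1548452"
# (assumption: this is the size of the shared segment that could not be created).
failed_kb=1548452

# Convert KB to MB; shell integer division rounds down.
failed_mb=$(( failed_kb / 1024 ))
echo "failed request: ${failed_mb} MB"

# Limit suggested in the reply (assumed to be in MB).
ARMCI_DEFAULT_SHMMAX=2048
export ARMCI_DEFAULT_SHMMAX
# csh/tcsh users, like the build environment in the question, would write instead:
#   setenv ARMCI_DEFAULT_SHMMAX 2048

if [ "$failed_mb" -lt "$ARMCI_DEFAULT_SHMMAX" ]; then
    echo "a ${ARMCI_DEFAULT_SHMMAX} MB limit covers the ${failed_mb} MB request"
fi
```

The failed request works out to about 1512 MB, which fits under 2048 MB. The variable has to be set in the environment the NWChem processes are launched from.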
Please keep in mind that ARMCI_DEFAULT_SHMMAX (given in MB) must not exceed the kernel parameter
kernel.shmmax (given in bytes).
Only root can change kernel.shmmax, so you might have to ask the system
administrator to do it.
For example, if kernel.shmmax is 4294967296, as in the example below,
ARMCI_DEFAULT_SHMMAX can be at most 4096 (4294967296 = 4096*1024*1024).

$ sysctl kernel.shmmax
kernel.shmmax = 4294967296
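The bytes-to-MB conversion in the example above can be scripted as follows. This is a minimal sketch: max_armci_shmmax_mb is a hypothetical helper, not part of NWChem or ARMCI, and it assumes the kernel.shmmax value fits in the shell's signed integer arithmetic.

```sh
#!/bin/sh
# Hypothetical helper: convert kernel.shmmax (bytes) into the largest
# ARMCI_DEFAULT_SHMMAX value (MB) that stays within the kernel limit.
max_armci_shmmax_mb() {
    shmmax_bytes=$1
    echo $(( shmmax_bytes / 1024 / 1024 ))
}

# Value from the sysctl example above.
max_armci_shmmax_mb 4294967296

# On a live system the byte count can be read directly
# (sysctl -n prints only the value):
#   max_armci_shmmax_mb "$(sysctl -n kernel.shmmax)"
```

With 4294967296 bytes this prints 4096, matching the limit stated in the reply.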

Cheers, Edo

