TCE ARMCI DASSERT fail in running NWChem-6.1


Just Got Here
Hello,

We built NWChem 6.1 with intel-11.0.083 and Openmpi-1.4.2 on an InfiniBand (IB) cluster back in 2012. We did not see any problems until recently, when a user tried to run TCE (I guess nobody had used TCE in NWChem before) with a large memory size (a small test seems fine).

For large memory, the error message is


Last System Error Message from Task 0:: Inappropriate ioctl for device


MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 4 DUP FROM 0
with errorcode 1880870161.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.


0:0:createfile: failed ga_create size=:: 1880870161
(rank:0 hostname:hnd17 pid:25208):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0


The input file is
---
restart ni-c2h4 ccsd
 title "UCCSD calculation of NiC2H4"
echo

memory total 20000 Mb
geometry units angstroms
symmetry c2v
Ni 0.000000 0.000000 1.781951
C 0.000000 -0.700112 -0.461338
H 0.928257 1.259678 -0.576692
H -0.928257 -1.259678 -0.576692
end
BASIS spherical
(omit the basis sets here)

relativistic
 douglas-kroll
end
scf
 SINGLET  rohf
maxiter 300
end
TCE
MAXITER 300
FREEZE core atomic
print debug
SCF
diis 8
2eorb
2emet 13
split 2
CCSD(T)
END
TASK TCE ENERGY
---

Is this a TCE issue or a system IB bandwidth issue? Thanks.

Forum Vet
It's a memory problem.
How many processors did you use? Most of the memory usage per processor gets reduced by using more processors in the calculation.


PS NWChem 6.5 has been out for a while. We have improved some of the error messages that the TCE uses in 6.5.
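
To put the failed allocation in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that the size reported by the failed ga_create (1880870161) counts 8-byte double-precision elements and that the array is spread evenly over all processes; neither assumption is confirmed in this thread, and the function names are made up for illustration.

```python
# Assumption: the size in "failed ga_create size=:: 1880870161" is a count
# of 8-byte doubles, and the array is distributed evenly across processes.
BYTES_PER_DOUBLE = 8
FAILED_ARRAY_ELEMENTS = 1_880_870_161  # taken from the error message above


def array_bytes(elements, bytes_per_element=BYTES_PER_DOUBLE):
    """Total size of the array in bytes."""
    return elements * bytes_per_element


def per_process_gib(elements, nproc):
    """Share of the array held by each process, in GiB (even split assumed)."""
    return array_bytes(elements) / nproc / 2**30


if __name__ == "__main__":
    total_gib = array_bytes(FAILED_ARRAY_ELEMENTS) / 2**30
    print(f"whole array: {total_gib:.2f} GiB")
    # Hypothetical process counts, chosen only to show the trend.
    for nproc in (1, 4, 16, 64):
        share = per_process_gib(FAILED_ARRAY_ELEMENTS, nproc)
        print(f"{nproc:3d} processes: {share:6.2f} GiB each")
```

Under these assumptions the single array is about 14 GiB in total, so the per-process share only becomes small once the job is spread over many processes.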

Just Got Here
NWChem-6.5, error for TCE
I compiled NWChem-6.5 on our IB cluster using intel-12.1.3 and Openmpi-1.6.2. It compiled successfully and the simple test runs were fine. However, the TCE part still failed in the same way as I posted for NWChem-6.1 before. The error message is



Fock matrix recomputed
1-e file size   =            18218
1-e file name = ./nic2h4.f1
Cpu & wall time / sec 4628.5 4653.5
4-electron integrals stored in orbital form
create a file: size = 195297739 file name = ./nic2h4.v2

Zero scratch handle:      -999 size: 195297739
nblocks: 66 size: 2959057

v2    file size   =        195297739
4-index algorithm nr. 13 is used
imaxsize = 30
imaxsize ichop = 0
create a file: size = 1880870161 file name = ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
resource(s) deleted
0:allocate: failed to attach to shared region id=: 1245195
(rank:0 hostname:red21 pid:15221):ARMCI DASSERT fail. ../../ga-5-3/armci/src/memory/shmem.c:armci_allocate():1133 cond:0
Last System Error Message from Task 0:: Cannot allocate memory



For your information, I used these settings for my run on 4 CPUs (each CPU has 20 GB of memory allocated):

memory total 15000 Mb

export ARMCI_DEFAULT_SHMMAX=8192
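
As a sanity check on these numbers, the following Python sketch compares the nominal memory budget with the array that failed to allocate. It rests on several assumptions that are not stated in this thread: that "memory total" applies per process and is split by NWChem's default 25% heap / 25% stack / 50% global, that "Mb" means 2**20 bytes, that ARMCI_DEFAULT_SHMMAX is given in the same units, and that the reported size 1880870161 counts 8-byte doubles. It only does the arithmetic and does not diagnose the failure.

```python
# All constants come from the post above; their interpretation is assumed.
MEMORY_TOTAL_MB = 15000              # "memory total 15000 Mb", assumed per process
NPROC = 4                            # "4 cpus"
ARMCI_DEFAULT_SHMMAX_MB = 8192       # assumed to be in the same MB units
FAILED_ARRAY_ELEMENTS = 1_880_870_161  # "create a file: size = 1880870161"


def default_partition_mb(total_mb):
    """Assumed default split of 'memory total': 25% heap, 25% stack, 50% global."""
    return {
        "heap": total_mb * 0.25,
        "stack": total_mb * 0.25,
        "global": total_mb * 0.50,
    }


def elements_to_mb(elements, bytes_per_element=8):
    """Convert a count of doubles to MB, taking 1 MB as 2**20 bytes."""
    return elements * bytes_per_element / 2**20


if __name__ == "__main__":
    part = default_partition_mb(MEMORY_TOTAL_MB)
    aggregate_global_mb = part["global"] * NPROC
    array_mb = elements_to_mb(FAILED_ARRAY_ELEMENTS)
    print(f"global memory per process : {part['global']:.0f} MB")
    print(f"aggregate global memory   : {aggregate_global_mb:.0f} MB")
    print(f"failed array, total       : {array_mb:.0f} MB")
    print(f"failed array, per process : {array_mb / NPROC:.0f} MB")
    print(f"ARMCI_DEFAULT_SHMMAX      : {ARMCI_DEFAULT_SHMMAX_MB} MB")
```

If the assumptions hold, the failed array alone takes roughly half of the aggregate global memory available to the four processes, before counting the v2 file and any other arrays.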

The main compiling settings are
--
export NWCHEM_TARGET=LINUX64
export ARMCI_NETWORK=OPENIB
export IB_INCLUDE=/usr/include/infiniband
export IB_LIB=/usr/lib64
export MSG_COMMS=MPI
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/opt/sharcnet/openmpi/1.6.2/intel
export MPI_LIB=$MPI_LOC/lib
export MPI_INCLUDE=$MPI_LOC/include
export LIBMPI="-lmpi_f90 -lmpi_f77 -lmpi -ldl -Wl, -lnsl -lutil"

export NWCHEM_MODULES=all

export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export LIB_DEFINES="-DDFLT_TOT_MEM=16777216"
# MKL
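
For reference, the DFLT_TOT_MEM value in the build settings above can be converted to a more familiar unit. This small Python sketch assumes the value is counted in 8-byte doubles, which is how the NWChem build notes are usually read; the function name is illustrative only.

```python
# Assumption: DFLT_TOT_MEM is a count of 8-byte doubles, not bytes.
DFLT_TOT_MEM = 16_777_216  # from LIB_DEFINES="-DDFLT_TOT_MEM=16777216"


def dflt_tot_mem_mib(words, bytes_per_word=8):
    """Default total memory in MiB for a given word count."""
    return words * bytes_per_word / 2**20


if __name__ == "__main__":
    print(f"default total memory: {dflt_tot_mem_mib(DFLT_TOT_MEM):.0f} MiB")
```

Under that assumption the compiled-in default is 128 MiB per process, so any larger job depends on the "memory" line in the input file, as in the runs described here.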



Any help is welcome!
Thanks a lot.

