PBE+CCSD calculation memory problem


Clicked A Few Times
Hi all,

I am having a problem with a PBE+CCSD total energy calculation (NWChem 6.8) for the C2H4 molecule. It looks like a memory allocation issue. Here is the error message (24 processors):

half-transformed integrals in memory

2-e (intermediate) file size =      2469456640
2-e (intermediate) file name = ./C2H4.v2i
1: WARNING:armci_set_mem_offset: offset changed 247229054976 to 247233249280
Cpu & wall time / sec           14.1           20.9
0:CreateSharedRegion:kr_malloc failed KB=: -1855277
(rank:0 hostname:cfa057 pid:21093):ARMCI DASSERT fail. ../../ga-5.6.3/armci/src/memory/shmem.c:Create_Shared_Region():1209 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
application called MPI_Abort(comm=0x84000001, -1855277) - process 0

I tried different combinations of memory partitioning and increased the total memory, but nothing worked. The only thing that helped somewhat was increasing the number of processors. I ran the same job (same input file) with 48 processors. It stored the 2-e files but still crashed. Here is the error message:

2-e (intermediate) file size =      2469456640
2-e (intermediate) file name = ./C2H4.v2i
25: WARNING:armci_set_mem_offset: offset changed -104622718976 to -104525406208
1: WARNING:armci_set_mem_offset: offset changed 508886515712 to 508983828480
Cpu & wall time / sec            8.6           13.5

tce_mo2e: fast2e=1
2-e integrals stored in memory

2-e file size   =       1910008264
2-e file name = ./C2H4.v2
Cpu & wall time / sec 58.4 64.5
T1-number-of-tasks 12

t1 file size   =             2816
t1 file name = ./C2H4.t1
t1 file handle = -998
T2-number-of-boxes 78
0: error ival=4

Here is the Input file:

charge 0
memory stack 1600 mb heap 200 mb global 2000 mb

geometry units angstrom noautosym
zmatrix
C
C 1 RCC
H 2 RCH 1 TH
H 2 RCH 1 TH 3 180.0
H 1 RCH 2 TH 3 0.0
H 1 RCH 2 TH 3 180.0
variables
RCC 1.3342
RCH 1.0823
constants
TH 121.44
end
end

basis spherical
  * library aug-cc-pvtz
end

DFT
  ODFT
direct
vectors output c2h4_pbe.movecs
grid fine
iterations 300
xc xpbe96 cpbe96
mult 1
convergence energy 1e-10
convergence gradient 1e-10
convergence density 1e-10
TOLERANCES accCoul 10
TOLERANCES tol_rho 1e-10
noio
END

TCE
DFT
CCSD
THRESH 10e-15
END

set lindep:n_dep 0

TASK TCE ENERGY


I tried changing the 2emet value, but that didn't help either. Is there anything I can do to reduce the memory requirements? I would be grateful for any help.
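For reference, the memory directive in the input above applies per process, so the footprint on a node grows with the number of ranks placed on it. The sketch below only does that arithmetic. The function names are hypothetical, and the figure of 24 ranks on one node is an assumption, since the post does not say how the 24 processors were distributed across nodes.

```shell
#!/bin/bash
# Sum the three pools of an NWChem "memory" directive (all values in MB).
per_process_mb() {
    local stack_mb=$1 heap_mb=$2 global_mb=$3
    echo $(( stack_mb + heap_mb + global_mb ))
}

# Total requested by all ranks sharing one node.
# ranks_per_node is an assumption; it depends on how the job was launched.
per_node_mb() {
    local per_proc_mb=$1 ranks_per_node=$2
    echo $(( per_proc_mb * ranks_per_node ))
}

# Values from "memory stack 1600 mb heap 200 mb global 2000 mb".
per_proc=$(per_process_mb 1600 200 2000)
echo "per process: ${per_proc} MB"                                      # per process: 3800 MB
echo "per node (24 ranks assumed): $(per_node_mb "$per_proc" 24) MB"    # 91200 MB
```

Comparing the per-node figure with the physical memory of the node shows whether the requested partitioning can fit at all.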

Thank you

Forum Vet
Please post the ARMCI_NETWORK used for compilation and as many details as possible about the compilation itself and the run settings (i.e. environment variables).
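One way to gather the requested run settings is sketched below. The function name is hypothetical, the filter only picks up variables whose names start with ARMCI_ or NWCHEM_, and the /proc path assumes a Linux node.

```shell
#!/bin/bash
# Print the environment variables relevant to an NWChem/ARMCI run, sorted by name.
# Prints nothing if no matching variables are set.
report_nwchem_env() {
    env | grep -E '^(ARMCI_|NWCHEM_)' | sort
}

report_nwchem_env

# Kernel shared-memory segment limit in bytes (Linux only; skipped if unavailable).
cat /proc/sys/kernel/shmmax 2>/dev/null || true
```

Running this inside the job script, just before the NWChem launch line, captures the settings the compute node actually sees rather than those of the login shell.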

Clicked A Few Times
Hi Edoapra,

Thank you for the fast response. ARMCI_NETWORK was set to OPENIB.

Here is the relevant part of the script:
#!/bin/bash
export NWCHEM_TOP=/usr/local/src/nwchem-6.8
export NWCHEM_MODULES=all
export NWCHEM_TARGET=LINUX64
export NWCHEM_LONG_PATHS=y
export ARMCI_NETWORK=OPENIB

However, ARMCI_OPENIB_DEVICE was not set. I noticed on your website that when ARMCI_NETWORK=OPENIB, ARMCI_OPENIB_DEVICE=mlx4_0 should be set as well. Should we recompile?

In addition, kernel.shmmax = 68719476736, and I do not set ARMCI_DEFAULT_SHMMAX in my job script.
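Since kernel.shmmax is reported in bytes while ARMCI_DEFAULT_SHMMAX is, to the best of my knowledge, given in megabytes, a quick conversion makes the two comparable. This is a minimal sketch and the function name is hypothetical.

```shell
#!/bin/bash
# Convert a byte count to whole megabytes (1 MB = 1048576 bytes).
bytes_to_mb() {
    echo $(( $1 / 1048576 ))
}

# Value of kernel.shmmax quoted in the post.
shmmax_bytes=68719476736
echo "kernel.shmmax = $(bytes_to_mb "$shmmax_bytes") MB"   # kernel.shmmax = 65536 MB
```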

Let me know if you need further info.

Forum Vet
ARMCI_DEFAULT_SHMMAX>=8192
You need to set ARMCI_DEFAULT_SHMMAX to at least 8192 for this job to run.
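In a job script, this advice could be applied roughly as follows. It is a sketch only: the helper name is hypothetical, it assumes any value already set is a plain integer, and the commented launch line is a placeholder to be replaced by whatever your scheduler and MPI installation require.

```shell
#!/bin/bash
# Raise ARMCI_DEFAULT_SHMMAX to a minimum value, keeping a larger value if one is already set.
ensure_min_shmmax() {
    local min_value=$1
    local current_value=${ARMCI_DEFAULT_SHMMAX:-0}
    if [ "$current_value" -lt "$min_value" ]; then
        export ARMCI_DEFAULT_SHMMAX=$min_value
    fi
}

# 8192 is the minimum given in the reply above.
ensure_min_shmmax 8192
echo "ARMCI_DEFAULT_SHMMAX=${ARMCI_DEFAULT_SHMMAX}"

# Hypothetical launch line; adapt to your cluster:
# mpirun -np 24 nwchem C2H4.nw
```

The variable has to be exported before the launch line so that every NWChem process inherits it.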

