Is there a systematic way of finding out how much memory is needed?


Dear NWChem users,
I'm using NWChem on an InfiniBand cluster and struggling with memory problems when running TDDFT. The input is:

Title "dye2Nex"
Start dye2Nex
set fock:replicated logical .false.

permanent_dir /Data/Users/syesylevsky/QM/dye2/N
memory total 400 mb

echo
charge 0

geometry noautosym units angstrom
C     0.00000     0.00000     0.00000
C 1.36800 0.00000 0.00000
C -0.774000 1.26900 0.00000
C 0.0560000 2.48300 0.00300000
O 2.11800 1.15900 -0.00500000
O -0.652000 -1.20200 0.00400000
C 2.28500 3.49300 0.00900000
C 1.70000 4.74800 0.0160000
C 0.309000 4.88600 0.0130000
C -0.507000 3.76700 0.00500000
O -1.99700 1.27200 0.00200000
C 1.45200 2.36000 0.00300000
H -1.58400 -1.02300 0.0550000
H 3.37500 3.38100 0.00500000
H 2.33400 5.64100 0.0240000
H -0.135000 5.88700 0.0160000
H -1.59900 3.87300 -0.00100000
C 2.22300 -1.17800 -0.00300000
C 4.14100 -2.24800 0.313000
O 3.55400 -0.999000 0.414000
C 5.46200 -2.57000 0.622000
C 5.82700 -3.89700 0.443000
C 4.91700 -4.85600 -0.0240000
C 3.60400 -4.52700 -0.330000
C 1.97000 -2.48200 -0.356000
C 3.20900 -3.20300 -0.158000
H 6.16900 -1.81700 0.984000
H 5.25600 -5.89000 -0.149000
H 2.89600 -5.27800 -0.693000
H 1.03900 -2.91400 -0.717000
H 6.85300 -4.20500 0.672000
end

ecce_print ecce.out

basis "ao basis" spherical print
H library "3-21G"
O library "3-21G"
C library "3-21G"
END

dft
 mult 1
XC b3lyp
iterations 5000
mulliken
direct
end

driver
 default
maxiter 2000
end

tddft
 nroots 3
target 1
end

task tddft optimize


When I run this, I get the following error:

2: error ival=5
(rank:2 hostname:mesocomte87 pid:9679):ARMCI DASSERT fail.
../../ga-5-1/armci/src/devices/openib/openib.c:armci_call_data_server():2193
cond:(pdscr->status==IBV_WC_SUCCESS)
1: error ival=10
(rank:1 hostname:mesocomte65 pid:18582):ARMCI DASSERT fail.
../../ga-5-1/armci/src/devices/openib/openib.c:armci_send_complete():459
cond:(pdscr->status==IBV_WC_SUCCESS)
5: error ival=10
(rank:5 hostname:mesocomte19 pid:20956):ARMCI DASSERT fail.
../../ga-5-1/armci/src/devices/openib/openib.c:armci_call_data_server():2193
cond:(pdscr->status==IBV_WC_SUCCESS)
0:Terminate signal was sent, status=: 15
(rank:0 hostname:mesocomte21 pid:30562):ARMCI DASSERT fail.
../../ga-5-1/armci/src/common/signaltrap.c:SigTermHandler():472 cond:0


As advised on this forum, I set
export ARMCI_DEFAULT_SHMMAX=2048
but this does not help. I spent a lot of time trying different memory values and finally got it working with

memory stack 150 mb heap 50 mb global 200 mb

but this was blind guesswork, which I really don't want to repeat for every new system or basis set.
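
For reference, here is the arithmetic behind those numbers as a small Python sketch. It assumes the default split of `memory total` is 25% heap, 25% stack and 50% global, and that the limit applies per process, which is how I understand the documentation. The helper names and the figure of 8 processes per node are made up for illustration; they are not taken from my actual job.

```python
# Hypothetical helpers, not part of NWChem: they only reproduce the
# arithmetic of the "memory" directive under the assumptions stated above.

def default_split(total_mb):
    """Split a 'memory total' value 25% heap / 25% stack / 50% global (assumed default)."""
    return {
        "heap": total_mb * 0.25,
        "stack": total_mb * 0.25,
        "global": total_mb * 0.50,
    }

def node_footprint_mb(per_process_mb, procs_per_node):
    """Memory requested on one node, assuming the directive applies per process."""
    return per_process_mb * procs_per_node

# Split implied by "memory total 400 mb"
default = default_split(400)

# Split that got further: "memory stack 150 mb heap 50 mb global 200 mb"
manual = {"stack": 150, "heap": 50, "global": 200}

print("default split (MB):", default)
print("manual split total (MB):", sum(manual.values()))
# 8 processes per node is an illustrative assumption
print("per-node request (MB):", node_footprint_mb(sum(manual.values()), 8))
```

So the manual setting keeps the same 400 MB total and the same global share, and only moves 50 MB from heap to stack.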

EDIT: it crashed after a few hours. I still can't get it running.

Is there a good, systematic way of finding out how much memory a particular job needs to run normally in a parallel environment? Which diagnostic messages should I use for this?

Thank you very much in advance!

Semen