Dear NWChem users,
I'm running NWChem on an InfiniBand cluster and struggling with memory problems when doing TDDFT. The input is:
Title "dye2Nex"
Start dye2Nex
set fock:replicated logical .false.
permanent_dir /Data/Users/syesylevsky/QM/dye2/N
memory total 400 mb
echo
charge 0
geometry noautosym units angstrom
C 0.00000 0.00000 0.00000
C 1.36800 0.00000 0.00000
C -0.774000 1.26900 0.00000
C 0.0560000 2.48300 0.00300000
O 2.11800 1.15900 -0.00500000
O -0.652000 -1.20200 0.00400000
C 2.28500 3.49300 0.00900000
C 1.70000 4.74800 0.0160000
C 0.309000 4.88600 0.0130000
C -0.507000 3.76700 0.00500000
O -1.99700 1.27200 0.00200000
C 1.45200 2.36000 0.00300000
H -1.58400 -1.02300 0.0550000
H 3.37500 3.38100 0.00500000
H 2.33400 5.64100 0.0240000
H -0.135000 5.88700 0.0160000
H -1.59900 3.87300 -0.00100000
C 2.22300 -1.17800 -0.00300000
C 4.14100 -2.24800 0.313000
O 3.55400 -0.999000 0.414000
C 5.46200 -2.57000 0.622000
C 5.82700 -3.89700 0.443000
C 4.91700 -4.85600 -0.0240000
C 3.60400 -4.52700 -0.330000
C 1.97000 -2.48200 -0.356000
C 3.20900 -3.20300 -0.158000
H 6.16900 -1.81700 0.984000
H 5.25600 -5.89000 -0.149000
H 2.89600 -5.27800 -0.693000
H 1.03900 -2.91400 -0.717000
H 6.85300 -4.20500 0.672000
end
ecce_print ecce.out
basis "ao basis" spherical print
H library "3-21G"
O library "3-21G"
C library "3-21G"
END
dft
mult 1
XC b3lyp
iterations 5000
mulliken
direct
end
driver
default
maxiter 2000
end
tddft
nroots 3
target 1
end
task tddft optimize
When I run this, I get the following error:
2: error ival=5
(rank:2 hostname:mesocomte87 pid:9679):ARMCI DASSERT fail.
../../ga-5-1/armci/src/devices/openib/openib.c:armci_call_data_server():2193
cond:(pdscr->status==IBV_WC_SUCCESS)
1: error ival=10
(rank:1 hostname:mesocomte65 pid:18582):ARMCI DASSERT fail.
../../ga-5-1/armci/src/devices/openib/openib.c:armci_send_complete():459
cond:(pdscr->status==IBV_WC_SUCCESS)
5: error ival=10
(rank:5 hostname:mesocomte19 pid:20956):ARMCI DASSERT fail.
../../ga-5-1/armci/src/devices/openib/openib.c:armci_call_data_server():2193
cond:(pdscr->status==IBV_WC_SUCCESS)
0:Terminate signal was sent, status=: 15
(rank:0 hostname:mesocomte21 pid:30562):ARMCI DASSERT fail.
../../ga-5-1/armci/src/common/signaltrap.c:SigTermHandler():472 cond:0
As advised on this forum, I set
export ARMCI_DEFAULT_SHMMAX=2048
but this does not help. I spent a lot of time trying different memory values and finally got it working with
memory stack 150 mb heap 50 mb global 200 mb
but this was blind guesswork, which I really don't want to repeat for every new system or basis set.
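For reference, here is a minimal Python sketch of the arithmetic behind that split. It assumes the `memory` directive is applied per process, so the per-node footprint scales with the number of processes on a node. The helper names and the `procs_per_node` value of 8 are hypothetical and for illustration only; they are not taken from my actual job setup.

```python
def memory_directive(stack_mb, heap_mb, global_mb):
    """Build an NWChem 'memory' line and return it with the per-process total (MB)."""
    total_mb = stack_mb + heap_mb + global_mb
    line = f"memory stack {stack_mb} mb heap {heap_mb} mb global {global_mb} mb"
    return line, total_mb


def per_node_mb(total_per_process_mb, procs_per_node):
    """Memory requested on one node, assuming the directive is per process."""
    return total_per_process_mb * procs_per_node


# The split I ended up with; it adds up to the original 'memory total 400 mb'.
line, total = memory_directive(stack_mb=150, heap_mb=50, global_mb=200)
print(line)
print("per-process total (MB):", total)

# procs_per_node = 8 is an assumed value, not my real cluster layout.
print("per-node request (MB):", per_node_mb(total, procs_per_node=8))
```

This only checks that a chosen split is consistent and fits within a node's RAM; it does not say how much stack, heap, or global memory the job actually needs.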
EDIT: it crashed after a few hours. I still can't get it running.
Is there a good, systematic way of finding out how much memory a particular job needs to run normally in a parallel environment? Which diagnostic messages should I use for this?
Thank you very much in advance!
Semen