12:28:36 PM PST - Thu, Dec 20th 2012
Thank you very much Bert.
I've tried this for an EOM-CCSDT calculation with a def2-SV(P) basis set, but after a significant number of iterations the job dies with the following error message:
Iteration 18 using 54 trial vectors
72: error ival=4
(rank:72 hostname:red0501 pid:11964):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_call_data_server():2193 cond:(pdscr->status==IBV_WC_SUCCESS)
0:Terminate signal was sent, status=: 15
(rank:0 hostname:red0050 pid:9814):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigTermHandler():472 cond:0
I run the calculation on 10 nodes with 12 cores and 22 GB of memory each (i.e. > 1800 MB per core). I set the memory in the input file as "MEMORY stack 800 mb heap 100 mb global 900 mb" and ARMCI_DEFAULT_SHMMAX in the script via "export ARMCI_DEFAULT_SHMMAX=10800". The SHMMAX value is, as you suggested, equal to the number of cores per node times the global memory per core (12*900).
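To make the setup explicit, here is a sketch of the settings described above (the input directive is shown as a comment; the variable names in the shell snippet are just for the arithmetic, only ARMCI_DEFAULT_SHMMAX is real):

```shell
# NWChem input directive (per MPI process, in MB):
#   memory stack 800 mb heap 100 mb global 900 mb

# The per-node ARMCI shared-memory segment must cover the global
# memory of all processes on that node:
CORES_PER_NODE=12
GLOBAL_MB_PER_CORE=900
ARMCI_DEFAULT_SHMMAX=$((CORES_PER_NODE * GLOBAL_MB_PER_CORE))
export ARMCI_DEFAULT_SHMMAX
echo "$ARMCI_DEFAULT_SHMMAX"   # 10800
```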
So if we have declared enough memory for the global arrays via ARMCI_DEFAULT_SHMMAX, why am I still running into an ARMCI error? Does it simply mean the amount of memory is not sufficient, or would I get a different error message in that case?
Thanks for your (further) help (in advance),
Martijn