Further to the above, I have run into exactly the same problem in a calculation on a slightly larger system. Here I use 20 nodes, with 6 cores per node in use and the other 6 idle (i.e. I underpopulate the nodes). I set the memory in the input file to "MEMORY stack 1000 mb heap 100 mb global 2550 mb" and set ARMCI_DEFAULT_SHMMAX in the run file to 15300 (6 cores x 2550 MB). This time the EOM-CCSDT calculation stops in the ground-state CCSDT calculation, but with the same error message:
CCSDT iterations
--------------------------------------------------------
Iter Residuum Correlation Cpu Wall
--------------------------------------------------------
1 1.6034927654134 -1.1164830885139 290.1 287.2
84: error ival=4
(rank:84 hostname:red0627 pid:21926):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_call_data_server():2193 cond:(pdscr->status==IBV_WC_SUCCESS)
0: error ival=10
(rank:0 hostname:red0003 pid:28395):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_call_data_server():2193 cond:(pdscr->status==IBV_WC_SUCCESS)
Ironically, a calculation with ARMCI_DEFAULT_SHMMAX set to 4096 (i.e. with much less memory, and less than the number of cores x the requested global memory per core) continues well beyond this point, although I'm fairly sure it will eventually crash.
Is ARMCI_DEFAULT_SHMMAX restricted to particular values? I noticed that all the examples given on the forum are integer multiples of 1024.
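For reference, here is the arithmetic behind my settings as a small Python sketch. It assumes that ARMCI_DEFAULT_SHMMAX is given in MB and has to cover (processes per node) x (global memory per process). The helper names are made up for illustration, and whether ARMCI actually requires a multiple of 1024 is exactly what I'm asking, so the rounding is only there to show what the nearest such value would be.

```python
# Hypothetical helpers, for illustration only.
# Assumption: ARMCI_DEFAULT_SHMMAX is in MB and should cover
# (processes per node) x (NWChem "global" memory per process).

def required_shmmax_mb(procs_per_node: int, global_mb_per_proc: int) -> int:
    """Per-node shared memory (MB) needed for the global arrays."""
    return procs_per_node * global_mb_per_proc

def round_up_to_multiple(value: int, multiple: int = 1024) -> int:
    """Round value up to the next integer multiple of `multiple`."""
    return -(-value // multiple) * multiple

need = required_shmmax_mb(6, 2550)   # 6 busy cores per node, 2550 MB global each
print(need)                          # 15300
print(need % 1024 == 0)              # False: 15300 is not a multiple of 1024
print(round_up_to_multiple(need))    # 15360, the next multiple of 1024
print(4096 < need)                   # True: the 4096 MB run is below the requested total
```

So the run that crashes uses exactly the requested total (15300 MB, not a multiple of 1024), while the run that keeps going uses 4096 MB, well below it.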
Thanks,
Martijn