After running for 8 hours on 15 nodes, I got the following error:
t2 file size = 372112063
t2 file name = ./QCISD.t2
t2 file handle = -996
CCSD(T)
Using plain CCSD(T) code
41: error ival=11
75: error ival=11
(rank:41 hostname:g432 pid:19888):ARMCI DASSERT fail. ../../ga-5-3/armci/src/devices/openib/openib.c:armci_send_complete():459 cond:(pdscr->status==IBV_WC_SUCCESS)
35: error ival=11
(rank:35 hostname:g431 pid:15507):ARMCI DASSERT fail. ../../ga-5-3/armci/src/devices/openib/openib.c:armci_send_complete():459 cond:(pdscr->status==IBV_WC_SUCCESS)
49: error ival=11
45: error ival=11
(rank:49 hostname:g433 pid:5427):ARMCI DASSERT fail. ../../ga-5-3/armci/src/devices/openib/openib.c:armci_send_complete():459 cond:(pdscr->status==IBV_WC_SUCCESS)
This error is usually associated with running out of memory, but it could also be due to the design of ARMCI for InfiniBand. Errors like this are why I use ARMCI-MPI (https://wiki.mpich.org/armci-mpi/index.php/Main_Page) instead. You should report this issue to the GA team (hpctools@emsl.pnl.gov); if they fail to address it, you may consider trying ARMCI-MPI (which I co-wrote, so I am biased about its utility).
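When reporting a failure like this, it may help to summarize which ranks and hosts hit the assertion. Below is a minimal sketch that tallies the "ARMCI DASSERT fail" lines by hostname, assuming the log format matches the lines quoted above. The function name failing_ranks_by_host and the regular expression are illustrative assumptions, not part of NWChem, GA, or ARMCI.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the log quoted in this thread:
# (rank:41 hostname:g432 pid:19888):ARMCI DASSERT fail. ...
DASSERT_RE = re.compile(
    r"\(rank:(\d+) hostname:(\S+) pid:(\d+)\):ARMCI DASSERT fail"
)

def failing_ranks_by_host(log_text):
    """Return {hostname: sorted list of ranks} for every DASSERT failure line."""
    hosts = defaultdict(list)
    for line in log_text.splitlines():
        match = DASSERT_RE.search(line)
        if match:
            rank, hostname = int(match.group(1)), match.group(2)
            hosts[hostname].append(rank)
    return {host: sorted(ranks) for host, ranks in hosts.items()}

# Sample input: a shortened copy of the failure lines from the post above.
sample = """\
41: error ival=11
(rank:41 hostname:g432 pid:19888):ARMCI DASSERT fail. openib.c:armci_send_complete():459
35: error ival=11
(rank:35 hostname:g431 pid:15507):ARMCI DASSERT fail. openib.c:armci_send_complete():459
49: error ival=11
(rank:49 hostname:g433 pid:5427):ARMCI DASSERT fail. openib.c:armci_send_complete():459
"""

for host, ranks in sorted(failing_ranks_by_host(sample).items()):
    print(host, ranks)
```

A summary like this only shows where the assertion fired; it does not by itself distinguish an out-of-memory condition from an InfiniBand transport problem.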