Runtime error on multinode runs


Hello: I am trying to get NWChem to run correctly on multiple nodes. My installation appears to
run correctly on a single node (8 cores). Whenever I request more than one node (say, 16 cores), the job aborts
with the following error before it completes (note that it finishes the SCF and CCSDt steps correctly and then fails during
the EOM-CCSDt step). My input is one of the examples from the test suite (nwchem-6.1/QA/tests/tce_active_ccsdt/tce_active_ccsdt.nw). Maybe someone in this forum has run into the same problem and
found a solution.

Thanks, Ajith Perera

mpirun: killing job...



mpirun noticed that process rank 0 with PID 2125 on node r11a-s20.ufhpc exited on signal 0 (Unknown signal 0).


0:Terminate signal was sent, status=: 15
(rank:0 hostname:r11a-s20.ufhpc pid:2125):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigTermHandler():472 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)

Hi Ajith,
Would you please send me the whole output (karol.kowalski@pnnl.gov)? The failure seems to be linked to the large number of initial vectors used in the first iteration of the EOMCCSDt method.

Best,
Karol

