Run time error on multinode runs.

Just Got Here

10:33:41 AM PDT - Fri, Jul 13th 2012
Hello,

I am trying to get NWChem to run correctly on multiple nodes. It seems my installation runs correctly on a single node (8 cores). Whenever I request more than one node (say, 16 cores), the job aborts with the error below before it completes. Note that the SCF and CCSDt steps finish correctly and the failure occurs during the EOM-CCSDt step. My input is one of the examples from the test suite (nwchem-6.1/QA/tests/tce_active_ccsdt/tce_active_ccsdt.nw). Maybe someone in this forum has run into the same problem and found a solution.

Thanks,
Ajith Perera

mpirun: killing job...
mpirun noticed that process rank 0 with PID 2125 on node r11a-s20.ufhpc exited on signal 0 (Unknown signal 0).
0:Terminate signal was sent, status=: 15
(rank:0 hostname:r11a-s20.ufhpc pid:2125):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigTermHandler():472 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)

Clicked A Few Times

4:33:27 PM PDT - Mon, Jul 16th 2012
Hi Ajith, would you please send me the whole output (karol.kowalski@pnnl.gov)? The failure seems to be linked to a large number of initial vectors used in the first iteration of the EOMCCSDt method. Best, Karol