Compiling for MPI.


The user specified 16 GB in the input deck, and the memory value given there is per processor.
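To make the consequence of a per-processor setting concrete, here is a rough sketch of the arithmetic, assuming the 16 GB request applies to each of the 12 processes per node mentioned elsewhere in this thread. The variable names are made up for illustration.

```shell
# Assumed values taken from this thread: 16 GB requested per process,
# 12 processes running on each node.
mem_per_process_gb=16
processes_per_node=12

# Total memory the job would request on a single node.
total_gb=$((mem_per_process_gb * processes_per_node))
echo "Requested memory per node: ${total_gb} GB"
```

If the node has less physical memory than this total, the per-processor value in the input deck likely needs to be lowered.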

Bert



Quote:Bert Dec 28th 9:08 pm
Could you send me the complete input and output files at bert.dejong@pnnl.gov? Also, can you tell me how much memory you have per node (each of which has 12 processors, I see)?

Bert


Quote:Davis68 Dec 26th 3:49 pm
OK, so I have reverted to specifying ARMCI_NETWORK, as per Bert's advice, with the following environment variables:
export ARMCI_NETWORK=OPENIB
export ARMCI_DEFAULT_SHMMAX=256
export IB_HOME=/usr
export IB_INCLUDE=$IB_HOME/include
export IB_LIB=$IB_HOME/lib64
export IB_LIB_NAME="-libverbs -libumad -lpthread"
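Before rebuilding, it may be worth confirming that the directories these variables point to actually exist on the build host. The following is only a sketch: `check_dir` is a hypothetical helper, the defaults simply mirror the values above, and the `infiniband/verbs.h` location assumes a standard libibverbs development install.

```shell
# Fall back to the values from this post if the variables are not set.
IB_HOME=${IB_HOME:-/usr}
IB_INCLUDE=${IB_INCLUDE:-$IB_HOME/include}
IB_LIB=${IB_LIB:-$IB_HOME/lib64}

# Hypothetical helper: report whether directory $2 (labelled $1) exists.
check_dir() {
    if [ -d "$2" ]; then
        echo "OK      $1=$2"
    else
        echo "MISSING $1=$2"
        return 1
    fi
}

check_dir IB_INCLUDE "$IB_INCLUDE" || true
check_dir IB_LIB "$IB_LIB" || true

# Assumption: the verbs header lives at infiniband/verbs.h under the include directory.
if [ -f "$IB_INCLUDE/infiniband/verbs.h" ]; then
    echo "found $IB_INCLUDE/infiniband/verbs.h"
else
    echo "verbs.h not found under $IB_INCLUDE"
fi
```

A MISSING line would suggest that the InfiniBand development packages are installed somewhere other than /usr, in which case IB_HOME should be adjusted.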


This is mostly successful. Execution on two nodes yields the following output.
ARMCI configured for 2 cluster nodes. Network protocol is 'OpenIB Verbs API'.
 argument  1 = lda-147.nw



============================== echo of input deck ==============================
...
normal output for initial processing
...


NWChem correctly detects that there are 24 processors (2 nodes x 12), so the program is getting the MPI support information from the OS (great!). Then it crashes on an ARMCI DASSERT failure. The errors that appear follow (in order, I think, but stderr is interleaved from each node). The crash happens immediately as a pspw geometry optimization starts.
          *               NWPW PSPW Calculation              *
...
     >>>  JOB STARTED       AT Fri Dec 23 14:18:03 2011  <<<
          ================ input data ========================
 Pack_init:error pushing stack        0
 ------------------------------------------------------------------------
 ------------------------------------------------------------------------
  current input line : 
     0: 
...
Last System Error Message from Task X:: No such file or directory
(rank:X hostname:taub510 pid:18040):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
application called MPI_Abort(MPI_COMM_WORLD, 0) - process X
0:Terminate signal was sent, status=: 15

To make sure that this wasn't my NW file's fault, I also ran the pspw example for C2H6 and got the same results. What would you suggest to get past this impasse? Thanks.