Memory allocation


Just Got Here
Hi all!
I have recently compiled NWChem with OpenIB (ifort 12.0.2, openmpi-1.4.2, OFED-1.5.2) and tried to run a CCSD calculation. The memory limit in my input file is 'memory total 1800 mb'. This input runs well with 8 MPI processes on a single node with 16 GB of RAM, consuming about 1.1 GB of memory per process at the CCSD stage.

The same input fails to run when 16 MPI processes are distributed over 2 nodes. The error is:

(rank:0 hostname:n306 pid:14660):ARMCI DASSERT fail. openib.c:armci_server_register_region():971 cond:(memhdl->memhndl!=((void *)0))


And a message in stderr:
7: WARNING:armci_set_mem_offset: offset changed 4096 to 9859072
Last System Error Message from Task 0:: Cannot allocate memory


This input runs well with 32 MPI processes.
The problem is that I cannot run production CCSD jobs (those are quite big).
Thanks for your kind support!
Roman Zubatyuk.

Forum Vet
Try running with the following environment variable set:

ARMCI_DEFAULT_SHMMAX 2048

or 4096.

Bert
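
One way to make sure every rank sees the variable is to pass it through the launcher. Below is a minimal Python sketch, assuming Open MPI's mpirun (which forwards a named variable with -x) and assuming the value is read in megabytes. The executable name nwchem, the input file ccsd.nw, and the helper build_launch are placeholders for illustration, not taken from this thread.

```python
import os

def build_launch(nprocs, shmmax_mb, input_file="ccsd.nw", exe="nwchem"):
    """Return (command, environment) for an Open MPI launch with ARMCI_DEFAULT_SHMMAX set.

    input_file and exe are hypothetical placeholders; adjust to your setup.
    """
    env = dict(os.environ)
    env["ARMCI_DEFAULT_SHMMAX"] = str(shmmax_mb)
    # -x asks Open MPI's mpirun to export the named variable to all ranks.
    cmd = ["mpirun", "-np", str(nprocs), "-x", "ARMCI_DEFAULT_SHMMAX", exe, input_file]
    return cmd, env

if __name__ == "__main__":
    cmd, env = build_launch(16, 2048)
    print("ARMCI_DEFAULT_SHMMAX=" + env["ARMCI_DEFAULT_SHMMAX"], " ".join(cmd))
    # To actually start the job: subprocess.run(cmd, env=env, check=True)
```

The sketch only builds the command, so it is safe to run on a login node to check what would be launched.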



Quote:Romaniz Mar 14th 6:16 pm

Just Got Here
Unfortunately, that doesn't help. Setting this variable to 1024, 2048, or 4096 makes the calculation crash immediately with the same error (the output is completely empty). Setting it to values from 8192 up to 32768 results in a crash at the start of the CCSD iterations.

Just Got Here
I failed to post my build script to the forum. It is here. Can you see anything wrong with it? BTW, the QA tests passed.

Forum Vet
Please try:

1. Running with 1500 mb for memory.

2. Fewer cores per node.

Bert
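
Both suggestions reduce how much memory is requested on each node. A rough Python sketch of that arithmetic follows, assuming the memory directive applies per process and that the 16-process run places 8 processes on each 16 GB node. The helper name and the 4-processes-per-node case are illustrative only.

```python
def per_node_request_mb(memory_total_mb, procs_per_node):
    """Memory requested on one node if every process may use memory_total_mb."""
    return memory_total_mb * procs_per_node

NODE_RAM_MB = 16 * 1024  # 16 GB node, as described in the first post

# (memory total in mb, processes per node): original setting, then the two suggestions
for total_mb, ppn in [(1800, 8), (1500, 8), (1800, 4)]:
    requested = per_node_request_mb(total_mb, ppn)
    share = requested / NODE_RAM_MB
    print(f"{total_mb} mb x {ppn} procs = {requested} MB ({share:.0%} of node RAM)")
```

With 1800 mb and 8 processes per node the request is 14400 MB out of 16384 MB, which leaves little room for the OS, MPI buffers, and registered InfiniBand memory.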

Quote:Romaniz Mar 19th 6:31 pm

Just Got Here
Tried both options. Same result. Will try to recompile with mvapich2 or impi.

Forum Vet
What does your failing input deck look like (memory allocation wise)?
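
For reference, when only 'memory total' is given, NWChem divides it between heap, stack, and global memory. Here is a small Python sketch of that split, assuming the documented default of 25% heap, 25% stack, and 50% global. The helper default_split_mb is hypothetical, and the printed line only illustrates what an explicit directive could look like.

```python
def default_split_mb(total_mb):
    """Split a 'memory total' value using the assumed 25/25/50 default."""
    heap = total_mb // 4
    stack = total_mb // 4
    global_mem = total_mb - heap - stack  # remainder goes to global memory
    return {"heap": heap, "stack": stack, "global": global_mem}

split = default_split_mb(1800)
# Print an explicit directive equivalent to 'memory total 1800 mb' under this assumption.
print("memory heap {heap} mb stack {stack} mb global {global} mb".format(**split))
```

The global portion is the part shared across processes, so it is the number most relevant to the registration failure reported above.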

Quote:Romaniz Mar 31st 7:03 pm

