CreateSharedRegion: kr_malloc Numerical result out of range


For one, you are requesting 22 Gbyte per processor. Assuming you are running on 8 cores per node, you are (potentially) asking for 8 x 22 = 176 Gbyte of memory per node for the calculation.

Remember, the memory keyword sets the amount per processor (i.e., per process), not for the whole calculation! You should keep your memory allocation per process below the memory available on a node divided by the number of processes running on that node.
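
As a rough illustration of that rule, here is a minimal Python sketch of the arithmetic. The function names, the 16 Gbyte node size, and the optional reserve for the operating system are assumptions made for the example; they are not taken from the original post or from NWChem itself.

```python
def total_requested_gb(memory_per_process_gb, processes_per_node):
    """Memory one node must supply if every process uses its full allocation."""
    return memory_per_process_gb * processes_per_node


def max_memory_per_process_gb(node_memory_gb, processes_per_node, reserve_gb=0.0):
    """Upper bound for the per-process memory setting.

    reserve_gb is a hypothetical amount held back for the OS and other
    programs; it is an assumption of this sketch, not an NWChem option.
    """
    if processes_per_node < 1:
        raise ValueError("processes_per_node must be at least 1")
    if reserve_gb < 0 or reserve_gb > node_memory_gb:
        raise ValueError("reserve_gb must be between 0 and node_memory_gb")
    return (node_memory_gb - reserve_gb) / processes_per_node


# The request discussed above: 22 Gbyte per process on 8 cores.
requested = total_requested_gb(22, 8)

# Hypothetical node with 16 Gbyte of RAM (the real machine's size is not stated).
limit = max_memory_per_process_gb(16, 8)

print(f"requested per node: {requested} Gbyte")
print(f"per-process limit on a 16 Gbyte node: {limit} Gbyte")
```

With these example numbers, the request adds up to 176 Gbyte per node, while a 16 Gbyte node would allow at most 2 Gbyte per process.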

Given the size of the calculation I don't believe you need that much memory.

Bert



Quote: DBauer, May 21st 2:57 pm
Someone in the lab that I work in has been trying to run some calculations with NWChem and I've been trying to help him get started running it with MPI. It has been a bumpy ride, however. At first, we were getting problems about not being able to allocate a shared block of memory. SHMMAX was already plenty high (as large as all of the physical memory), so I created a swap file, started calculations again, and waited.

Now they are crashing again, but this time with a very different error message. From what I can tell, it is another problem with allocating shared memory, but it seems like NWChem is passing an invalid (i.e., negative) number to the allocation function.

Here is the relevant error message:
 0:CreateSharedRegion:kr_malloc failed KB=: -772361
(rank:0 hostname:vivaldi.chem.utk.edu pid:3128):ARMCI DASSERT fail. ../../ga-5-1/armci/src/memory/shmem.c:Create_Shared_Region():1188 cond:0
Last System Error Message from Task 0:: Numerical result out of range
application called MPI_Abort(comm=0x84000007, -772361) - process 0
rank 0 in job 2 vivaldi.chem.utk.edu_35229 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

And just in case, here is a link to the full output file:
http://web.eecs.utk.edu/~dbauer3/nwchem/macrofe_full631f.out

Running under Ubuntu 11.04 with MPICH2.