CreateSharedRegion: kr malloc Numerical result out of range


I'll look into the setenv ARMCI_DEFAULT_SHMMAX 4096 suggestion in the morning...
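
Before trying that setting, it may be worth checking whether the requested size even fits under the kernel's shared-memory segment limit. The following is only a rough sketch in POSIX sh, under two assumptions: that ARMCI_DEFAULT_SHMMAX is given in megabytes, and that the node exposes its limit in bytes at /proc/sys/kernel/shmmax. The helper names mb_to_bytes and fits_in_shmmax are made up for illustration and are not part of ARMCI or NWChem.

```shell
#!/bin/sh
# Sketch: compare a requested ARMCI_DEFAULT_SHMMAX value (ASSUMED to be in MB)
# against the kernel's per-segment shared-memory limit (in bytes).

# Convert megabytes to bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Succeed when a request of $1 MB fits within a limit of $2 bytes.
# awk does the comparison because kernel.shmmax can exceed the
# signed 64-bit range that shell integer tests handle.
fits_in_shmmax() {
    awk -v req="$(mb_to_bytes "$1")" -v lim="$2" 'BEGIN { exit !(req <= lim) }'
}

requested_mb=4096   # the value suggested in this thread

if [ -r /proc/sys/kernel/shmmax ]; then
    limit=$(cat /proc/sys/kernel/shmmax)
    if fits_in_shmmax "$requested_mb" "$limit"; then
        echo "ok: ${requested_mb} MB fits within kernel.shmmax (${limit} bytes)"
    else
        echo "too large: ${requested_mb} MB exceeds kernel.shmmax (${limit} bytes)"
    fi
else
    echo "cannot read /proc/sys/kernel/shmmax on this system"
fi
```

If the request does fit, the csh form of the setting is the setenv line quoted above; in sh or bash the equivalent would be export ARMCI_DEFAULT_SHMMAX=4096.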

The cluster I'm currently running on has 48-core nodes with ~94 GB of RAM per node. Running on a single node with 24 cores gives me half a node's cores (with the full RAM available).

The error output does indeed seem to point to ARMCI memory usage...

(Run1 Error)
force: Command not found.
2: WARNING:armci_set_mem_offset: offset changed -204456148992 to -204454051840
Last System Error Message from Task 0:: Inappropriate ioctl for device
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 4 DUP FROM 0
with errorcode -1977.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 23323 on
node chi16 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
(/run1 error)

***************************
***********Run2***********
***************************


(run2 error)
force: Command not found.
23: WARNING:armci_set_mem_offset: offset changed 440439992320 to 440442089472
Last System Error Message from Task 0:: Inappropriate ioctl for device
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 4 DUP FROM 0
with errorcode -1977.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 14312 on
node chi14 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
(/run2 error)


Thank you for your assistance!
Karl