Setting ARMCI_DEFAULT_SHMMAX properly


Hi,

I'm running coupled cluster calculations on a large cluster whose nodes each have 12 cores and 22 GB of usable memory. Occasionally I run into ARMCI DASSERT fail type errors, and I think there might be something wrong with my shared memory settings.

Based on previous discussions on the forum I always set the following in my script files:

ulimit -s unlimited

export ARMCI_DEFAULT_SHMMAX=2096

unset MA_USE_ARMCI_MEM

But I'm unsure how to set the ARMCI_DEFAULT_SHMMAX value properly and how it relates to the memory per node, the number of cores I actually use on each node (the degree of underpopulation), the amount of shared memory I request in my input file, and the kernel shmmax value (found using cat /proc/sys/kernel/shmmax), which is 68719476736 in my case. For example, I found in a previous post somewhere that ARMCI_DEFAULT_SHMMAX should be larger than shmmax, which would suggest I need to set ARMCI_DEFAULT_SHMMAX to 65536 (which in itself feels very large).
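
The 65536 figure comes from converting the kernel value from bytes to megabytes, on the assumption that ARMCI_DEFAULT_SHMMAX is given in MB. A minimal sketch of that conversion, where bytes_to_mib is a hypothetical helper name and not part of ARMCI or NWChem:

```shell
#!/bin/sh
# Convert a byte count to whole mebibytes (1 MiB = 1048576 bytes).
# bytes_to_mib is an illustrative helper, not an ARMCI or NWChem tool.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Kernel shmmax value quoted in the post, in bytes.
# On the node itself this could be replaced by: $(cat /proc/sys/kernel/shmmax)
shmmax_bytes=68719476736

echo "kernel shmmax: $(bytes_to_mib "$shmmax_bytes") MiB"
# prints: kernel shmmax: 65536 MiB
```

This only reproduces the arithmetic; it does not settle whether ARMCI_DEFAULT_SHMMAX should match, exceed, or stay below the kernel limit.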

It would be great if someone could tell me how to set ARMCI_DEFAULT_SHMMAX properly and how it relates to the properties mentioned above.

Thanks in advance,

Martijn