"shm create" problem running on parallel nodes


Clicked A Few Times
Dear all,

I compiled NWChem on the head node of an Intel cluster, and the binary was built successfully. The program runs normally on the head node. However, when I try to run some simple tests on the compute nodes, it fails with the following error:

======================================================================================
_shm_create: shm_unlink: Permission denied
[0] Received an Error in Communication: (-1) _shm_create: shm_unlink
application called MPI_Abort(comm=0x84000000, -1) - process 0
forrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
nwchem 00000000048EBC65 Unknown Unknown Unknown
======================================================================================

In principle, the head and compute nodes have the same software setup and differ only in hardware. The head node has the following configuration:

OS: CentOS release 6.5 (Final)
Compilers: intel 16.0.2
MPI: intel mpi 5.1.3
BLAS/LAPACK: mkl 11.3

CPU = Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

I have used the following instructions for my compilation:




#!/bin/sh

module purge

module load compilers/intel/16.0
module load libraries/ipmi/5.1
module load libraries/mkl/16.0

export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_LONG_PATHS=Y

export USE_MPI=Y
export USE_MPIF=Y
export USE_MPIF4=Y
export MPI_LOC=/opt/intel/impi/5.1.3.181/
export MPI_INCLUDE="-I/opt/intel/impi/5.1.3.181/intel64/include"
export MPI_LIB="/opt/intel/impi/5.1.3.181/intel64/lib/release -L/opt/intel/impi/5.1.3.181/intel64/lib/"
export LIBMPI="-lmpifort -lmpi -lmpigi -ldl -lrt -lpthread"

export NWCHEM_MODULES="all python"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
# export LIB_DEFINES=-DDFLT_TOT_MEM=16777216

export NWCHEM_MPIF_WRAP="mpiifort"
export NWCHEM_MPIC_WRAP="mpiicc"
export NWCHEM_MPICXX_WRAP="mpiicpc"

export CCSD=yes
export CCSDT=yes
export CCSDTQ=yes
export IPCCSD=yes
export EACCSD=yes
export MRCC_METHODS=yes

export PYTHONHOME=/usr
export PYTHONVERSION=2.6
export USE_PYTHON64=y
export PYTHONLIBTYPE=so

sed -i 's/libpython$(PYTHONVERSION).a/libpython$(PYTHONVERSION).$(PYTHONLIBTYPE)/g' config/makefile.h

export HAS_BLAS=yes
export USE_SCALAPACK=y
export MKLLIB=/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
export MKLINC=/opt/intel/compilers_and_libraries_2016/linux/mkl/include
export BLASOPT="-L$MKLLIB -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm"
export LAPACK_LIBS="-L$MKLLIB -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm"
# export LAPACK_CPPFLAGS="-DMKL_ILP64 -I$MKLINC"
export SCALAPACK="-L$MKLLIB -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lmkl_blacs_intelmpi_ilp64 -lpthread -lm"
# export SCALAPACK_CPPFLAGS="-DMKL_ILP64 -I$MKLINC"

export SCALAPACK_SIZE=8
export BLAS_SIZE=8
export LAPACK_SIZE=8
export HAS_BLAS=yes
export USE_64TO32=y
export MSG_COMMS=MPI
export ARMCI_NETWORK=MPI-PR # working flag
export USE_OPENMP="n"

export FC=ifort
export CC=icc
export AR=xiar

echo "cd $NWCHEM_TOP/src"
cd $NWCHEM_TOP/src

echo "BEGIN --- make realclean "
make realclean
echo "END --- make realclean "

echo "BEGIN --- make nwchem_config "
make nwchem_config
echo "END --- make nwchem_config "

echo "BEGIN --- make"
make CC=icc FC=ifort FOPTIMIZE="-O3"
echo "END --- make "

cd $NWCHEM_TOP/src/util
make CC=icc FC=ifort FOPTIMIZE="-O3" version
make CC=icc FC=ifort FOPTIMIZE="-O3"
cd $NWCHEM_TOP/src
make CC=icc FC=ifort FOPTIMIZE="-O3" link




Has anyone experienced a similar problem? Could you suggest how to fix this issue?
I would be really grateful for any help!

Thanks & Cheers

Forum Vet
/dev/shm/
Could you please send me the output of the following commands

sudo ls -lrta /dev | grep shm

sudo ls -lrta /dev/shm/

sudo ls -lrta /dev/shm/cmx*

If you do not have sudo access, please try the following

ls -lrta /dev | grep shm

ls -lrta /dev/shm/

ls -lrta /dev/shm/cmx*
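
If it helps, the same check can be wrapped in a small helper. This is only a sketch: the function names are made up for illustration, and the idea that files owned by a different user are what trigger "shm_unlink: Permission denied" is an assumption to test, not a confirmed diagnosis. /dev/shm normally has the sticky bit set, so a job can only unlink files owned by the user running it.

```shell
# List leftover Global Arrays shared-memory files (names containing "cmx")
# in a directory, defaulting to /dev/shm. Function names are illustrative.
list_cmx_segments() {
    dir="${1:-/dev/shm}"
    # -maxdepth 1: look only at the directory itself, not subdirectories.
    find "$dir" -maxdepth 1 -type f -name '*cmx*' -exec ls -ld {} +
}

# Count how many of those files are NOT owned by the current user.
count_foreign_cmx_segments() {
    dir="${1:-/dev/shm}"
    find "$dir" -maxdepth 1 -type f -name '*cmx*' ! -uid "$(id -u)" | wc -l | tr -d ' '
}
```

Calling list_cmx_segments with no argument inspects /dev/shm on the node where it runs. A non-zero result from count_foreign_cmx_segments would point at files left behind by another account.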

Clicked A Few Times
Yes, sure. Here are the results of each command run as root:

1. ls -lrta /dev | grep shm

drwxrwxrwt 2 root root 660 Aug 16 20:04 shm

2. ls -lrta /dev/shm/

total 132512
-rw------- 1 maxjr maxjr 16 Aug 11 16:51 cmx000002600000000
-rw------- 1 maxjr maxjr 32 Aug 11 17:26 sem.cmx000000000000003
-rw------- 1 maxjr maxjr 32 Aug 11 17:26 sem.cmx000000000000002
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 cmx000000100000002
-rw------- 1 maxjr maxjr 4 Aug 11 17:26 cmx000000300000002
-rw------- 1 maxjr maxjr 56 Aug 11 17:26 cmx000000200000002
-rw------- 1 maxjr maxjr 4 Aug 11 17:26 cmx000000200000001
-rw------- 1 maxjr maxjr 56 Aug 11 17:26 cmx000000100000001
-rw------- 1 maxjr maxjr 9309928 Aug 11 17:26 cmx000001000000002
-rw------- 1 maxjr maxjr 9610248 Aug 11 17:26 cmx000000900000001
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 cmx000001100000002
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 cmx000001000000001
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 cmx000001400000002
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 cmx000001300000001
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 cmx000001500000002
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 cmx000001400000001
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 cmx000002600000002
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 cmx000002500000001
-rw------- 1 maxjr maxjr 16 Aug 11 17:26 cmx000002500000000
drwxr-xr-x 18 root root 3800 Aug 16 09:28 ..
-rw------- 1 maxjr maxjr 32 Aug 16 09:37 sem.cmx000000000000001
-rw------- 1 maxjr maxjr 32 Aug 16 09:37 sem.cmx000000000000000
-rw------- 1 maxjr maxjr 8 Aug 16 09:37 cmx000000100000000
-rw------- 1 maxjr maxjr 4 Aug 16 09:37 cmx000000300000000
-rw------- 1 maxjr maxjr 56 Aug 16 09:37 cmx000000200000000
-rw------- 1 maxjr maxjr 19356488 Aug 16 09:38 cmx000001000000000
-rw------- 1 maxjr maxjr 19468808 Aug 16 09:38 cmx000001100000000
-rw------- 1 maxjr maxjr 19468808 Aug 16 09:38 cmx000001500000000
-rw------- 1 maxjr maxjr 19468808 Aug 16 09:38 cmx000001400000000
-rw------- 1 maxjr maxjr 19468808 Aug 16 09:38 cmx000001800000000
-rw------- 1 maxjr maxjr 16 Aug 16 09:38 cmx000002100000000
-rw------- 1 maxjr maxjr 19468808 Aug 16 09:38 cmx000002000000000
drwxrwxrwt 2 root root 660 Aug 16 20:05 .

3. ls -lrta /dev/shm/cmx*

-rw------- 1 maxjr maxjr 16 Aug 11 16:51 /dev/shm/cmx000002600000000
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 /dev/shm/cmx000000100000002
-rw------- 1 maxjr maxjr 4 Aug 11 17:26 /dev/shm/cmx000000300000002
-rw------- 1 maxjr maxjr 56 Aug 11 17:26 /dev/shm/cmx000000200000002
-rw------- 1 maxjr maxjr 4 Aug 11 17:26 /dev/shm/cmx000000200000001
-rw------- 1 maxjr maxjr 56 Aug 11 17:26 /dev/shm/cmx000000100000001
-rw------- 1 maxjr maxjr 9309928 Aug 11 17:26 /dev/shm/cmx000001000000002
-rw------- 1 maxjr maxjr 9610248 Aug 11 17:26 /dev/shm/cmx000000900000001
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 /dev/shm/cmx000001100000002
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 /dev/shm/cmx000001000000001
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 /dev/shm/cmx000001400000002
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 /dev/shm/cmx000001300000001
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 /dev/shm/cmx000001500000002
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 /dev/shm/cmx000001400000001
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 /dev/shm/cmx000002600000002
-rw------- 1 maxjr maxjr 8 Aug 11 17:26 /dev/shm/cmx000002500000001
-rw------- 1 maxjr maxjr 16 Aug 11 17:26 /dev/shm/cmx000002500000000
-rw------- 1 maxjr maxjr 8 Aug 16 09:37 /dev/shm/cmx000000100000000
-rw------- 1 maxjr maxjr 4 Aug 16 09:37 /dev/shm/cmx000000300000000
-rw------- 1 maxjr maxjr 56 Aug 16 09:37 /dev/shm/cmx000000200000000
-rw------- 1 maxjr maxjr 19356488 Aug 16 09:38 /dev/shm/cmx000001000000000
-rw------- 1 maxjr maxjr 19468808 Aug 16 09:38 /dev/shm/cmx000001100000000
-rw------- 1 maxjr maxjr 19468808 Aug 16 09:38 /dev/shm/cmx000001500000000
-rw------- 1 maxjr maxjr 19468808 Aug 16 09:38 /dev/shm/cmx000001400000000
-rw------- 1 maxjr maxjr 19468808 Aug 16 09:38 /dev/shm/cmx000001800000000
-rw------- 1 maxjr maxjr 16 Aug 16 09:38 /dev/shm/cmx000002100000000
-rw------- 1 maxjr maxjr 19468808 Aug 16 09:38 /dev/shm/cmx000002000000000

If you need more information, please, let me know. Thanks.

Quote:Edoapra Aug 16th 2:37 pm
Could you please send me the output of the following commands

sudo ls -lrta /dev | grep shm

sudo ls -lrta /dev/shm/

sudo ls -lrta /dev/shm/cmx*

If you do not have sudo access, please try the following

ls -lrta /dev | grep shm

ls -lrta /dev/shm/

ls -lrta /dev/shm/cmx*

Forum Vet
What is your userid?
Is it maxjr?
Have you executed the commands on the head node or on the compute nodes of the cluster?

Clicked A Few Times
Hi,

Yes, my user id is maxjr, and I executed the commands on the head node only. Should I do the same on the compute nodes?

Forum Vet
Quote:Maxjr Aug 16th 4:01 pm
Hi,

Yes, my user id is maxjr, and I executed the commands on the head node only. Should I do the same on the compute nodes?

Yes, since I believe the error has occurred on the compute nodes, right?

Clicked A Few Times
Oh, yes, that's true! I am sorry for this misunderstanding.

OK, running the same commands on one of the compute nodes, also logged in as root, I obtained the following output:

1. ls -lrta /dev | grep shm

drwxrwxrwt 2 root root 2340 Aug 16 23:21 shm

2. ls -lrta /dev/shm/

total 76164
drwxr-xr-x 18 root root 3680 Aug 11 13:33 ..
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000039
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000038
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000037
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000036
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000035
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000034
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000033
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000032
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000031
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000030
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000029
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000028
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000027
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000026
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000025
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000024
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000023
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000022
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000021
-rw------- 1 maxjr maxjr 32 Aug 15 22:53 sem.cmx000000000000020
-rw------- 1 maxjr maxjr 8 Aug 15 22:53 cmx000000100000038
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000200000038
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000037
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000036
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000035
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000034
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000032
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000031
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000030
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000029
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000028
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000027
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000026
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000025
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000024
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000023
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000022
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000021
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000020
-rw------- 1 maxjr maxjr 4 Aug 15 22:53 cmx000000200000033
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 cmx000000100000033
-rw------- 1 maxjr maxjr 32 Aug 15 23:06 sem.cmx000000000000008
-rw------- 1 maxjr maxjr 32 Aug 15 23:06 sem.cmx000000000000007
-rw------- 1 maxjr maxjr 32 Aug 15 23:06 sem.cmx000000000000006
-rw------- 1 maxjr maxjr 32 Aug 15 23:06 sem.cmx000000000000005
-rw------- 1 maxjr maxjr 32 Aug 15 23:06 sem.cmx000000000000004
-rw------- 1 maxjr maxjr 32 Aug 15 23:06 sem.cmx000000000000002
-rw------- 1 maxjr maxjr 32 Aug 15 23:06 sem.cmx000000000000009
-rw------- 1 maxjr maxjr 32 Aug 15 23:06 sem.cmx000000000000003
-rw------- 1 maxjr maxjr 32 Aug 15 23:06 sem.cmx000000000000001
-rw------- 1 maxjr maxjr 32 Aug 15 23:06 sem.cmx000000000000000
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 cmx000000100000008
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 cmx000000300000008
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 cmx000000200000008
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 cmx000000200000007
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 cmx000000200000006
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 cmx000000200000005
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 cmx000000200000004
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 cmx000000200000003
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 cmx000000200000002
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 cmx000000200000001
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 cmx000000200000000
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 cmx000000100000007
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 cmx000000100000006
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 cmx000000100000005
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 cmx000000100000004
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 cmx000000100000003
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 cmx000000100000002
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 cmx000000100000001
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 cmx000000100000000
-rw------- 1 maxjr maxjr 1995848 Aug 15 23:06 cmx000001000000008
-rw------- 1 maxjr maxjr 2027528 Aug 15 23:06 cmx000000900000007
-rw------- 1 maxjr maxjr 2154248 Aug 15 23:06 cmx000000900000006
-rw------- 1 maxjr maxjr 2064392 Aug 15 23:06 cmx000000900000005
-rw------- 1 maxjr maxjr 2097160 Aug 15 23:06 cmx000000900000004
-rw------- 1 maxjr maxjr 2228232 Aug 15 23:06 cmx000000900000003
-rw------- 1 maxjr maxjr 2193416 Aug 15 23:06 cmx000000900000002
-rw------- 1 maxjr maxjr 2228232 Aug 15 23:06 cmx000000900000001
-rw------- 1 maxjr maxjr 2367496 Aug 15 23:06 cmx000000900000000
-rw------- 1 maxjr maxjr 2108216 Aug 15 23:06 cmx000001000000007
-rw------- 1 maxjr maxjr 2136272 Aug 15 23:06 cmx000001000000006
-rw------- 1 maxjr maxjr 2108216 Aug 15 23:06 cmx000001000000005
-rw------- 1 maxjr maxjr 2242872 Aug 15 23:06 cmx000001000000003
-rw------- 1 maxjr maxjr 2136272 Aug 15 23:06 cmx000001000000002
-rw------- 1 maxjr maxjr 2242872 Aug 15 23:06 cmx000001000000001
-rw------- 1 maxjr maxjr 2272720 Aug 15 23:06 cmx000001000000000
-rw------- 1 maxjr maxjr 2213416 Aug 15 23:06 cmx000001000000004
-rw------- 1 maxjr maxjr 2008016 Aug 15 23:06 cmx000001100000008
-rw------- 1 maxjr maxjr 2008016 Aug 15 23:06 cmx000001500000008
-rw------- 1 maxjr maxjr 2108216 Aug 15 23:06 cmx000001400000007
-rw------- 1 maxjr maxjr 2136272 Aug 15 23:06 cmx000001400000006
-rw------- 1 maxjr maxjr 2108216 Aug 15 23:06 cmx000001400000005
-rw------- 1 maxjr maxjr 2213416 Aug 15 23:06 cmx000001400000004
-rw------- 1 maxjr maxjr 2242872 Aug 15 23:06 cmx000001400000003
-rw------- 1 maxjr maxjr 2136272 Aug 15 23:06 cmx000001400000002
-rw------- 1 maxjr maxjr 2242872 Aug 15 23:06 cmx000001400000001
-rw------- 1 maxjr maxjr 2272720 Aug 15 23:06 cmx000001400000000
-rw------- 1 maxjr maxjr 2008016 Aug 15 23:06 cmx000001400000008
-rw------- 1 maxjr maxjr 2108216 Aug 15 23:06 cmx000001300000007
-rw------- 1 maxjr maxjr 2136272 Aug 15 23:06 cmx000001300000006
-rw------- 1 maxjr maxjr 2108216 Aug 15 23:06 cmx000001300000005
-rw------- 1 maxjr maxjr 2213416 Aug 15 23:06 cmx000001300000004
-rw------- 1 maxjr maxjr 2242872 Aug 15 23:06 cmx000001300000003
-rw------- 1 maxjr maxjr 2136272 Aug 15 23:06 cmx000001300000002
-rw------- 1 maxjr maxjr 2242872 Aug 15 23:06 cmx000001300000001
-rw------- 1 maxjr maxjr 2272720 Aug 15 23:06 cmx000001300000000
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 cmx000002600000008
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 cmx000002500000007
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 cmx000002500000006
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 cmx000002500000005
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 cmx000002500000004
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 cmx000002500000003
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 cmx000002500000002
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 cmx000002500000001
-rw------- 1 maxjr maxjr 16 Aug 15 23:06 cmx000002500000000
drwxrwxrwt 2 root root 2340 Aug 16 23:25 .

3. ls -lrta /dev/shm/cmx*

-rw------- 1 maxjr maxjr 8 Aug 15 22:53 /dev/shm/cmx000000100000038
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000200000038
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000037
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000036
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000035
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000034
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000032
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000031
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000030
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000029
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000028
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000027
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000026
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000025
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000024
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000023
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000022
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000021
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000020
-rw------- 1 maxjr maxjr 4 Aug 15 22:53 /dev/shm/cmx000000200000033
-rw------- 1 maxjr maxjr 56 Aug 15 22:53 /dev/shm/cmx000000100000033
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 /dev/shm/cmx000000100000008
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 /dev/shm/cmx000000300000008
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 /dev/shm/cmx000000200000008
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 /dev/shm/cmx000000200000007
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 /dev/shm/cmx000000200000006
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 /dev/shm/cmx000000200000005
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 /dev/shm/cmx000000200000004
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 /dev/shm/cmx000000200000003
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 /dev/shm/cmx000000200000002
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 /dev/shm/cmx000000200000001
-rw------- 1 maxjr maxjr 4 Aug 15 23:06 /dev/shm/cmx000000200000000
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 /dev/shm/cmx000000100000007
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 /dev/shm/cmx000000100000006
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 /dev/shm/cmx000000100000005
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 /dev/shm/cmx000000100000004
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 /dev/shm/cmx000000100000003
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 /dev/shm/cmx000000100000002
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 /dev/shm/cmx000000100000001
-rw------- 1 maxjr maxjr 56 Aug 15 23:06 /dev/shm/cmx000000100000000
-rw------- 1 maxjr maxjr 1995848 Aug 15 23:06 /dev/shm/cmx000001000000008
-rw------- 1 maxjr maxjr 2027528 Aug 15 23:06 /dev/shm/cmx000000900000007
-rw------- 1 maxjr maxjr 2154248 Aug 15 23:06 /dev/shm/cmx000000900000006
-rw------- 1 maxjr maxjr 2064392 Aug 15 23:06 /dev/shm/cmx000000900000005
-rw------- 1 maxjr maxjr 2097160 Aug 15 23:06 /dev/shm/cmx000000900000004
-rw------- 1 maxjr maxjr 2228232 Aug 15 23:06 /dev/shm/cmx000000900000003
-rw------- 1 maxjr maxjr 2193416 Aug 15 23:06 /dev/shm/cmx000000900000002
-rw------- 1 maxjr maxjr 2228232 Aug 15 23:06 /dev/shm/cmx000000900000001
-rw------- 1 maxjr maxjr 2367496 Aug 15 23:06 /dev/shm/cmx000000900000000
-rw------- 1 maxjr maxjr 2108216 Aug 15 23:06 /dev/shm/cmx000001000000007
-rw------- 1 maxjr maxjr 2136272 Aug 15 23:06 /dev/shm/cmx000001000000006
-rw------- 1 maxjr maxjr 2108216 Aug 15 23:06 /dev/shm/cmx000001000000005
-rw------- 1 maxjr maxjr 2242872 Aug 15 23:06 /dev/shm/cmx000001000000003
-rw------- 1 maxjr maxjr 2136272 Aug 15 23:06 /dev/shm/cmx000001000000002
-rw------- 1 maxjr maxjr 2242872 Aug 15 23:06 /dev/shm/cmx000001000000001
-rw------- 1 maxjr maxjr 2272720 Aug 15 23:06 /dev/shm/cmx000001000000000
-rw------- 1 maxjr maxjr 2213416 Aug 15 23:06 /dev/shm/cmx000001000000004
-rw------- 1 maxjr maxjr 2008016 Aug 15 23:06 /dev/shm/cmx000001100000008
-rw------- 1 maxjr maxjr 2008016 Aug 15 23:06 /dev/shm/cmx000001500000008
-rw------- 1 maxjr maxjr 2108216 Aug 15 23:06 /dev/shm/cmx000001400000007
-rw------- 1 maxjr maxjr 2136272 Aug 15 23:06 /dev/shm/cmx000001400000006
-rw------- 1 maxjr maxjr 2108216 Aug 15 23:06 /dev/shm/cmx000001400000005
-rw------- 1 maxjr maxjr 2213416 Aug 15 23:06 /dev/shm/cmx000001400000004
-rw------- 1 maxjr maxjr 2242872 Aug 15 23:06 /dev/shm/cmx000001400000003
-rw------- 1 maxjr maxjr 2136272 Aug 15 23:06 /dev/shm/cmx000001400000002
-rw------- 1 maxjr maxjr 2242872 Aug 15 23:06 /dev/shm/cmx000001400000001
-rw------- 1 maxjr maxjr 2272720 Aug 15 23:06 /dev/shm/cmx000001400000000
-rw------- 1 maxjr maxjr 2008016 Aug 15 23:06 /dev/shm/cmx000001400000008
-rw------- 1 maxjr maxjr 2108216 Aug 15 23:06 /dev/shm/cmx000001300000007
-rw------- 1 maxjr maxjr 2136272 Aug 15 23:06 /dev/shm/cmx000001300000006
-rw------- 1 maxjr maxjr 2108216 Aug 15 23:06 /dev/shm/cmx000001300000005
-rw------- 1 maxjr maxjr 2213416 Aug 15 23:06 /dev/shm/cmx000001300000004
-rw------- 1 maxjr maxjr 2242872 Aug 15 23:06 /dev/shm/cmx000001300000003
-rw------- 1 maxjr maxjr 2136272 Aug 15 23:06 /dev/shm/cmx000001300000002
-rw------- 1 maxjr maxjr 2242872 Aug 15 23:06 /dev/shm/cmx000001300000001
-rw------- 1 maxjr maxjr 2272720 Aug 15 23:06 /dev/shm/cmx000001300000000
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 /dev/shm/cmx000002600000008
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 /dev/shm/cmx000002500000007
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 /dev/shm/cmx000002500000006
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 /dev/shm/cmx000002500000005
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 /dev/shm/cmx000002500000004
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 /dev/shm/cmx000002500000003
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 /dev/shm/cmx000002500000002
-rw------- 1 maxjr maxjr 8 Aug 15 23:06 /dev/shm/cmx000002500000001
-rw------- 1 maxjr maxjr 16 Aug 15 23:06 /dev/shm/cmx000002500000000

The permissions seem to be the same as on the head node. Is "shm" somehow related to the Global Arrays installation?

Thanks!

Forum Vet
disk space
Yes, those shm files are used by Global Arrays and are the cause of your failures.
Could you check whether there is enough disk space available by typing
df -h /dev/shm
?
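
For a job script, the same check could be done programmatically before launching NWChem. A rough sketch, with illustrative function names and a threshold you would have to choose yourself:

```shell
# Print the available space, in kilobytes, on the filesystem holding a
# directory (default /dev/shm). Function names are illustrative.
shm_avail_kb() {
    dir="${1:-/dev/shm}"
    # -P forces the POSIX one-line-per-filesystem format; -k uses 1024-byte units.
    df -Pk "$dir" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least the requested number of kilobytes is free.
shm_has_room() {
    need_kb="$1"
    dir="${2:-/dev/shm}"
    [ "$(shm_avail_kb "$dir")" -ge "$need_kb" ]
}
```

For example, `shm_has_room 1048576 || echo "less than 1GB free in /dev/shm"` would warn before the run starts. The 1GB figure is just a placeholder, not a recommendation.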

Forum Vet
concurrent NWChem jobs
By any chance, have you been running multiple NWChem jobs on the same node at the same time?

Clicked A Few Times
Hi,

I have checked the disk space on the compute nodes, and there is 64GB available on a free node with no calculations running:

Filesystem Size Used Avail Use% Mounted on
tmpfs 64G 112K 64G 1% /dev/shm

I recompiled NWChem, changing ARMCI_NETWORK to "MPI-MT", and it seems to be working now. I am running a big MP2 calculation to test the memory usage by GA, and everything seems to be OK so far. However, I think the MPI-MT option is not very efficient on my systems, so it would be great to have the "MPI-PR" version working for comparison.

I don't think multiple jobs are the problem, because I have always used the node-exclusive option in the submission script. Do you think 64GB of disk space is enough, at least for small calculations? I don't quite understand why the version with "MPI-MT" works and the other one doesn't. Do you have any idea what could explain that?

Thanks & Have a nice day

Forum Vet
Max
Could you please try the following to get MPI-PR going on your cluster:
1) Please remove any leftover files from previous NWChem MPI-PR runs on "all" the nodes of your cluster by executing the command
rm -f /dev/shm/*cmx*

2) Recompile the tools in NWChem with ARMCI_NETWORK=MPI-PR after applying the following patch
http://nwchemgit.github.io/download.php?f=Ga_5.mpipr_shmop.patch.gz

You can apply it by following these steps
i) cd $NWCHEM_TOP/src/tools/ga-5-4
ii) wget "http://nwchemgit.github.io/download.php?f=Ga_5.mpipr_shmop.patch.gz" -O Ga_5.mpipr_shmop.patch.gz
iii) gzip -d Ga_5.mpipr_shmop.patch.gz
iv) patch -p0 < Ga_5.mpipr_shmop.patch
v) cd ..
vi) rm -rf build install
vii) make FC=ifort ARMCI_NETWORK=MPI-PR
viii) cd ..
ix) make FC=ifort link
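
For the cleanup in step 1, a slightly more cautious variant of the rm command might look like the sketch below. The function name and the DRY_RUN variable are made up for illustration. It only touches files owned by the user running it, and by default it prints what it would delete instead of deleting it.

```shell
# Remove leftover cmx shared-memory files owned by the current user.
# With DRY_RUN=1 (the default here) the matching files are only printed.
clean_cmx_segments() {
    dir="${1:-/dev/shm}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        find "$dir" -maxdepth 1 -type f -name '*cmx*' -uid "$(id -u)" -print
    else
        find "$dir" -maxdepth 1 -type f -name '*cmx*' -uid "$(id -u)" -exec rm -f {} +
    fi
}
```

Run `clean_cmx_segments` first to review the list, then `DRY_RUN=0 clean_cmx_segments` to actually delete. It has to be run on every compute node, and only while none of your NWChem jobs are running there. How you reach each node (ssh loop, scheduler prolog, etc.) depends on your cluster and is not shown here.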

