Running with OpenMPI on multiple nodes failing


Clicked A Few Times
Hi;

I've built NWChem with OpenMPI 1.6.5 support according to the docs. I run using mpirun. The cluster has 16-core nodes. However, once PBS schedules the jobs, all the processes (try to) run on the first node, rather than 16 per node. So even though I have, say, two nodes assigned by the scheduler, all 32 NWChem instances run on the first node. Or try to; the job runs out of memory. AFAIK I compiled NWChem properly, and have installed it in a network-accessible location.

What, likely simple, thing have I overlooked? Thanks,

Steve

Excerpt from PBS submit file:

NPROCS=`wc -l < $PBS_NODEFILE`
module load nwchem
module load openmpi
mpirun --hostfile $PBS_NODEFILE -np $NPROCS nwchem input.dat > output.dat
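
As a quick placement check (a sketch using the same hostfile and modules; hostname simply stands in for nwchem), the rank-to-node map can be printed with:

# diagnostic only: --display-map shows which node each rank is assigned to
mpirun --hostfile $PBS_NODEFILE -np $NPROCS --display-map hostname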


The script used to build NWChem is:

#!/bin/csh
setenv NWCHEM_MODULES all

setenv NWCHEM_TOP /home/admin/root/src/nwchem-6.3.revision2-src.2013-10-17
setenv NWCHEM_TARGET LINUX64

setenv LARGE_FILES TRUE
setenv LIB_DEFINES -DDFLT_TOT_MEM=134217728
setenv USE_NOFSCHECK TRUE
setenv TCGRSH /usr/bin/ssh
setenv FC ifort
setenv CC icc

setenv USE_MPI y
setenv USE_MPIF y
setenv USE_MPIF4 y
#setenv LIBMPI "-L/usr/lib64 -lmca_common_sm -lmpi_f77 -lmpi -lopen-pal -lopen-trace-format -lvt-hyb -lvt-mpi-unify -lvt -lmpi_cxx -lmpi_f90 -lompitrace -lopen-rte -lotfaux -lvt-mpi -lvt-mt"
setenv LIBMPI "-L/usr/lib64 -lmca_common_sm -lmpi_f77 -lmpi -lmpi_cxx -lmpi_f90 -lompitrace -lopen-rte -lotfaux -ldl -Wl,--export-dynamic -lnsl -lutil"

setenv MPI_BASEDIR /usr/local/openmpi/openmpi-1.6.5/intel-14.0.1
setenv MPI_INCLUDE $MPI_BASEDIR/include
setenv MPI_LIB $MPI_BASEDIR/lib

setenv IB_HOME "/usr"
setenv IB_INCLUDE "$IB_HOME/include"
setenv IB_LIB "$IB_HOME/lib64"
setenv IB_LIB_NAME "-libumad -lpthread"
setenv ARMCI_NETWORK OPENIB

module load intel/14.0.1
module load openmpi/1.6.5/intel/14.0.1

#setenv BLASOPT "-L/zhome/Apps/intel/composerxe/mkl/lib/intel64/ -lmkl_blas95_ilp64 -lmkl_blas95_lp64 -lmkl_lapack95_lp64 -lbmkl_lapack95_ilp64"

echo "build time, here we go."
printenv

cd $NWCHEM_TOP/src
#make realclean
make >& make.log2
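
After the build, a quick sanity check that the binary picked up the intended OpenMPI (the path below assumes the standard NWChem build layout under $NWCHEM_TOP):

# confirm the freshly built binary resolves the OpenMPI 1.6.5 libraries
ldd $NWCHEM_TOP/bin/LINUX64/nwchem | grep -i libmpi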

Forum Vet
Could you send us the following for a given run (ideally with the commands executed inside the PBS script)?
1) The output of the command
mpirun -V
2) The output of the command
ldd nwchem
3) The content of the file $PBS_NODEFILE
4) The output of the command
grep hostname output.dat
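
For example, all four can be collected from inside the job script along these lines (the `which nwchem` just resolves whatever the nwchem module puts on PATH):

mpirun -V                       # 1) OpenMPI version
ldd `which nwchem`              # 2) libraries the nwchem binary resolves
cat $PBS_NODEFILE               # 3) nodes PBS handed to the job
mpirun --hostfile $PBS_NODEFILE -np $NPROCS nwchem input.dat > output.dat
grep hostname output.dat        # 4) which hosts the ranks actually ran on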

Clicked A Few Times
Sure, thank you. This is copied directly from the resulting out/err files:

1
mpirun (Open MPI) 1.6.5

Report bugs to http://www.open-mpi.org/community/help/

2
linux-vdso.so.1 => (0x00007fff2ebff000)
libmca_common_sm.so.3 => /usr/local/openmpi/openmpi-1.6.5/intel-14.0.1/lib/libmca_common_sm.so.3 (0x00007f9ad8085000)
libmpi_f77.so.1 => /usr/local/openmpi/openmpi-1.6.5/intel-14.0.1/lib/libmpi_f77.so.1 (0x00007f9ad7e4e000)
libmpi.so.1 => /usr/local/openmpi/openmpi-1.6.5/intel-14.0.1/lib/libmpi.so.1 (0x00007f9ad7a4e000)
libmpi_cxx.so.1 => /usr/local/openmpi/openmpi-1.6.5/intel-14.0.1/lib/libmpi_cxx.so.1 (0x00007f9ad7832000)
libmpi_f90.so.1 => /usr/local/openmpi/openmpi-1.6.5/intel-14.0.1/lib/libmpi_f90.so.1 (0x00007f9ad762f000)
libompitrace.so.0 => /usr/local/openmpi/openmpi-1.6.5/intel-14.0.1/lib/libompitrace.so.0 (0x00007f9ad742b000)
libopen-rte.so.4 => /usr/local/openmpi/openmpi-1.6.5/intel-14.0.1/lib/libopen-rte.so.4 (0x00007f9ad70f1000)
libotfaux.so.0 => /usr/local/openmpi/openmpi-1.6.5/intel-14.0.1/lib/libotfaux.so.0 (0x00007f9ad6ee4000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003fc6c00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003fd6c00000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000003fd7c00000)
libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x0000003976200000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003fc6800000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000003975e00000)
libm.so.6 => /lib64/libm.so.6 (0x0000003fc7000000)
libc.so.6 => /lib64/libc.so.6 (0x0000003fc6400000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003fcd000000)
librt.so.1 => /lib64/librt.so.1 (0x0000003fc7400000)
libimf.so => /zhome/Apps/intel/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libimf.so (0x00007f9ad69fc000)
libsvml.so => /zhome/Apps/intel/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libsvml.so (0x00007f9ad5e05000)
libirng.so => /zhome/Apps/intel/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libirng.so (0x00007f9ad5bfe000)
libintlc.so.5 => /zhome/Apps/intel/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libintlc.so.5 (0x00007f9ad59a7000)
libcilkrts.so.5 => /zhome/Apps/intel/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libcilkrts.so.5 (0x00007f9ad5769000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003fcd800000)
libifport.so.5 => /zhome/Apps/intel/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libifport.so.5 (0x00007f9ad5539000)
libifcore.so.5 => /zhome/Apps/intel/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libifcore.so.5 (0x00007f9ad51f8000)
libifcoremt.so.5 => /zhome/Apps/intel/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libifcoremt.so.5 (0x00007f9ad4e8a000)
/lib64/ld-linux-x86-64.so.2 (0x0000003fc6000000)

3
hbar5
hbar5
hbar5
hbar5
hbar5
hbar5
hbar5
hbar5
hbar5
hbar5
hbar5
hbar5
hbar5
hbar5
hbar5
hbar5
hbar4
hbar4
hbar4
hbar4
hbar4
hbar4
hbar4
hbar4
hbar4
hbar4
hbar4
hbar4
hbar4
hbar4
hbar4
hbar4

4) Unfortunately I broke the previous install with a subsequent build using a new value of LIBMPI, so I am starting a new one .... I have no output.dat file anymore.

Clicked A Few Times
So I now have a clue as to what's going on: a lack of InfiniBand memory. Here are the relevant error messages, although I am unsure how to resolve the problem:

1) This is from the PBS standard output:

(rank:0 hostname:hbar11 pid:14250):ARMCI DASSERT fail. ../../ga-5-2/armci/src/devices/openib/openib.c:armci_pin_contig_hndl():1142 cond:(memhdl->memhndl!=((void *)0))
(rank:16 hostname:hbar9 pid:1254):ARMCI DASSERT fail. ../../ga-5-2/armci/src/devices/openib/openib.c:armci_pin_contig_hndl():1142 cond:(memhdl->memhndl!=((void *)0))


2) This is from the PBS standard error. Note that my attempt to increase locked memory using "ulimit -l 800000", which works in .bashrc in an interactive session, failed from within the PBS job:

/home/lusol/.bashrc: line 14: ulimit: max locked memory: cannot modify limit: Operation not permitted
mpirun (Open MPI) 1.6.5

Report bugs to http://www.open-mpi.org/community/help/
/var/spool/pbs/mom_priv/jobs/1823.hbar1.cc.lehigh.edu.SC: line 42: ulimit: max locked memory: cannot modify limit: Operation not permitted
Last System Error Message from Task 0:: Cannot allocate memory
Last System Error Message from Task 16:: Cannot allocate memory


MPI_ABORT was invoked on rank 16 in communicator MPI COMMUNICATOR 4 DUP FROM 0
with errorcode 1.

...
...
...

Stack trace terminated abnormally.
[hbar11:14249] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[hbar11:14249] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages


3) Finally, this is from the NWChem output file:

more nwOutput.dat 
argument 1 = nwInput.dat
(rank:0 hostname:hbar11 pid:14250):ARMCI DASSERT fail. ../../ga-5-2/armci/src/devices/openib/openib.c:armci_pin_contig_hndl():1142 cond:(memhdl->memhndl!=((void *)0))
(rank:16 hostname:hbar9 pid:1254):ARMCI DASSERT fail. ../../ga-5-2/armci/src/devices/openib/openib.c:armci_pin_contig_hndl():1142 cond:(memhdl->memhndl!=((void *)0))
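
A quick way to see what limit the ranks actually inherit (a sketch; the echo goes in the submit script just before the mpirun line):

echo "memlock limit inside the job: `ulimit -l`"    # prints a number in kB, or "unlimited"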


Thanks for any insight.

Clicked A Few Times
Based on further searching, this is the cluster's kernel.shmmax:

sysctl kernel.shmmax
kernel.shmmax = 68719476736


Adding this to the PBS submit file and my .bashrc resulted in no change:

export ARMCI_DEFAULT_SHMMAX=6553
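
One note, in case it applies here: with OpenMPI, a variable set only in the submit script or .bashrc is not necessarily forwarded to the ranks on the other nodes; it can be passed explicitly with -x, roughly:

# same value as above; -x tells mpirun to export the variable to every remote rank
export ARMCI_DEFAULT_SHMMAX=6553
mpirun -x ARMCI_DEFAULT_SHMMAX --hostfile $PBS_NODEFILE -np $NPROCS nwchem input.dat > output.dat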

Forum Vet
It is likely that you are hitting a limitation on the maximum amount of registered memory.
The link below, from the OpenMPI FAQ, shows how to address this problem.

http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem

Clicked A Few Times
Son of a gun! After days of thrashing around, it turned out to be the stupid default PBS startup script, which has this in it:

[xyzzy]# grep ulimit /etc/init.d/*
/etc/init.d/pbs: ulimit -l 262144


This is a private cluster, so I just set it to unlimited, pushed the new startup file to all the nodes, and NWChem is now running fine on two 16-core nodes.
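
Roughly how the change was rolled out (a sketch; the node-list file is site-specific, and pbs_mom has to be restarted so that newly started jobs inherit the raised limit):

sed -i 's/ulimit -l 262144/ulimit -l unlimited/' /etc/init.d/pbs
for node in `cat /root/compute_nodes`; do    # hypothetical file: one compute-node hostname per line
    scp /etc/init.d/pbs $node:/etc/init.d/pbs
    ssh $node /etc/init.d/pbs restart        # or: service pbs restart
done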

Is this a standard PBS value for locked memory, or did the vendor who shipped this cluster set that?

Many thanks for your help,
Steve

Forum Vet
Quote:Pabugeater

Is this a standard PBS value for locked memory, or did the vendor who shipped this cluster set that?



The following item on the OpenMPI FAQ page might be of help:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-user
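
One common system-wide fix (a sketch, assuming Red Hat-style nodes where PAM applies limits.conf to ssh sessions) is to raise the memlock limit for all users; jobs started through a resource-manager daemon instead inherit the daemon's own limit, which is exactly what the ulimit line in /etc/init.d/pbs above was capping:

# on each compute node, as root: allow unlimited locked memory for all users
cat >> /etc/security/limits.conf << 'EOF'
*    soft    memlock    unlimited
*    hard    memlock    unlimited
EOF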

Clicked A Few Times
Why is the CPU utilization more than 100% while NWChem is running?
Hi,
I am a new user of NWChem.
I compiled NWChem 6.3 with the Intel compiler, the MKL math library (composer_xe_2013_sp1.2.144), and openmpi-1.6.5.
But while NWChem is running, the CPU utilization is more than 100% (some processes reach 300%), which nearly brings the node down.
Why?
Please help me!
Thank you!

gvtheen

