"inconsistency processing clusterinfo" error when trying to use multiple cluster nodes...


Any thoughts would be welcome. Here's my situation:
1) Heterogeneous commodity cluster with gigabit ethernet interconnects.
2) NWChem version Nwchem-6.5.revision26243-src.2014-09-10, with patches applied (maybe one did not apply correctly?)
3) Compilation environment:
OS - Ubuntu 14.04.2 LTS (Debian-based)

export PATH=$PATH:/opt/intel/bin
export NWCHEM_TOP=/shared/nwchem/Nwchem-6.5.revision26243-src.2014-09-10
export NWCHEM_TARGET=LINUX64
export LARGE_FILES=TRUE
export ENABLE_COMPONENT=yes
export NWCHEM_MODULES="all python"
export NWCHEM_MPIF_WRAP=/usr/bin/mpif90
export NWCHEM_MPIC_WRAP=/usr/bin/mpicc
export NWCHEM_MPICXX_WRAP=/usr/bin/mpicxx
export USE_NOFSCHECK=Y
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_INCLUDE="-Wl,-Bsymbolic-functions -Wl,-z,relro -I/usr/include/mpich -I/usr/include/mpich"
export MPI_LIB="-L/usr/lib/x86_64-linux-gnu"
export LIBMPI="-lmpichf90 -lmpich -lopa -lmpl -lrt -lcr -lpthread"
export FC=gfortran
export CC=gcc
export CXX=g++
export ARMCI_NETWORK=SOCKETS
# export MSG_COMMS=MPI
export BLASOPT=" "
export PYTHON_EXE=/usr/bin/python
export PYTHONVERSION=2.7
export USE_PYTHON64=yes
export PYTHONCONFIGDIR=config-x86_64-linux-gnu
export PYTHONPATH=/usr/lib/python2.7/dist-packages
export PYTHONHOME=/usr
export PYTHONLIBTYPE=so
export CCSDTQ=y
export CCSDTLR=y
export IPCCSD=y
export EACCSD=y

4) Error when launching across multiple nodes (2 in this case). The same simple geometry optimization of formaldehyde works on a single node with the same total number of cores.

0:inconsistency processing clusterinfo: 1
(rank:0 hostname:<some non printable characters>pid:25050):ARMCI DASSERT fail. ../../ga-5-3/armci/src/common/clusterinfo.c:process_hostlist():203 cond:0
Last System Error Message from Task 0:: Connection refused
0:aborting

I notice that the node name is mangled (it should be either node15 or node2). Is there something in the ga-5-3 code that is messing up offsets when reading strings?
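To narrow down where the name gets mangled, one option is to check the hostname strings before they ever reach ARMCI. The following is only a minimal sketch: `is_clean_hostname` is a hypothetical helper (not part of NWChem or GA), and it assumes valid node names contain only letters, digits, dots, and hyphens.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given string is non-empty and
# made up solely of letters, digits, dots, and hyphens (an assumption about
# what a sane node name looks like on this cluster).
is_clean_hostname() {
    case "$1" in
        "") return 1 ;;                    # empty name is suspicious
        *[!A-Za-z0-9.-]*) return 1 ;;      # any other byte, e.g. a control character
        *) return 0 ;;
    esac
}

# Check the two node names mentioned in this post.
for name in node15 node2; do
    if is_clean_hostname "$name"; then
        echo "$name: ok"
    else
        echo "$name: suspicious"
    fi
done

# On the real cluster, the same check could be applied to what each node
# reports, for example: is_clean_hostname "$(hostname)"
```

If every node reports a clean name here, the corruption is more likely happening inside the runtime than in the system's hostname configuration.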

I'd welcome any ideas.

Would I be better off with one of the development releases?

Thanks,
Jonathan