NWChem 6.6 for BlueGene/Q


Hi,

I recently attempted to upgrade to NWChem 6.6 on a BG/Q system and to compare the performance of various ARMCI, MPICH and BLAS implementations.

There are no major problems if a 64-bit BLAS is used (with the standard MPI-TS on MPICH2, the unsupported MPI-TS on MPICH3, or even ARMCI-MPI3 on MPICH3).

However, I ran into several problems when building against ESSL (which requires 64_to_32) that I had not seen with previous versions on the same machine.

(1) Some header files included by src/util/util_getppn.c are not found under /bgsys/drivers/ppcfloor/comm/xl/include, e.g.
#if defined(__bgq__)
#include <process.h>
#include <location.h>
#include <personality.h>
...

so MPI_INCLUDE has to be modified as follows to pick up these header files:

================================================
export MPI_INCLUDE="-I/bgsys/drivers/ppcfloor \
                    -I/bgsys/drivers/ppcfloor/firmware/include \
                    -I/bgsys/drivers/ppcfloor/spi/include \
                    -I/bgsys/drivers/ppcfloor/spi/include/kernel \
                    -I/bgsys/drivers/ppcfloor/spi/include/kernel/cnk \
                    -I/bgsys/drivers/ppcfloor/comm/xl/include "
================================================
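
A quick way to see which of these directories actually provides each header is a small search loop. This is only a sketch: find_header is a hypothetical helper name, and the directory list simply mirrors the MPI_INCLUDE above, so it may need adjusting for a different driver installation.

```shell
# Print the first directory that contains the given header, or fail.
# find_header is a hypothetical helper, not part of NWChem or the BG/Q driver.
find_header() {
    header=$1; shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Assumption: same directories as in MPI_INCLUDE above.
floor=/bgsys/drivers/ppcfloor
for h in process.h location.h personality.h; do
    if d=$(find_header "$h" "$floor" "$floor/firmware/include" \
            "$floor/spi/include" "$floor/spi/include/kernel" \
            "$floor/spi/include/kernel/cnk" "$floor/comm/xl/include"); then
        echo "$h: found in $d"
    else
        echo "$h: not found"
    fi
done
```

On a machine without the BG/Q driver tree this just reports every header as not found.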

(2) In src/util/util_getppn.c, the code starting at line 35 assigns *ppn_out on BG/Q, but after preprocessing it ends up with an extra "}":
================================================
#if defined(__bgq__)
*ppn_out = Kernel_ProcessCount();
#elif MPI_VERSION >= 3
...
#endif
 errlab:
GA_Error(" util_getppn failure", 0);
return;
}
}
================================================
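
One way to confirm an unbalanced brace is to count the opening and closing braces in the preprocessed text. The sketch below uses a hypothetical helper, brace_balance, and feeds it a made-up stand-in for the function body rather than the real source; it also counts braces inside strings or comments, so it is only a rough check.

```shell
# Print (number of '{') minus (number of '}') for the text read on stdin.
# brace_balance is a hypothetical helper name used for illustration.
brace_balance() {
    text=$(cat)
    opens=$(printf '%s' "$text" | tr -cd '{' | wc -c)
    closes=$(printf '%s' "$text" | tr -cd '}' | wc -c)
    echo $(( opens - closes ))
}

# Made-up stand-in for a preprocessed function with one "}" too many.
printf '%s\n' 'void f(int *ppn_out) {' '  *ppn_out = 1;' '  return;' '}' '}' \
    | brace_balance   # prints -1
```

On the real system the input would be the output of the compiler's preprocess-only mode for util_getppn.c, with __bgq__ defined and the include paths from MPI_INCLUDE.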

(3) Following the standard MPI-TS procedure, I ran make 64_to_32 and set USE_64TO32=y, but then I got an MA error:

================================================
MA fatal error: MA_sizeof: invalid datatype: 4350801871857
================================================
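
Splitting the reported value into its upper and lower 32-bit words gives a hint about what went wrong. The arithmetic below is exact; the interpretation is an assumption: both words are small numbers just above 1000, which would be consistent with two 4-byte integers being read as a single 8-byte integer (MA's datatype codes start around 1000, as far as I recall from macdecls.h).

```shell
# Split the reported datatype value into 32-bit halves.
# Needs a shell with 64-bit arithmetic (e.g. bash on a 64-bit system).
value=4350801871857
high=$(( value >> 32 ))          # upper 32 bits
low=$(( value & 0xFFFFFFFF ))    # lower 32 bits
echo "high word: $high"   # 1013
echo "low word:  $low"    # 1009
```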

The error appears to be caused by some of the Fortran code in GA (version 5.4) being built with -qintsize=8. So I edited src/tools/GNUmakefile to force --enable-i4 to be passed explicitly when configuring GA. With that change the program starts, but it crashes right after detecting the SYMMETRY directive:
================================================
                               NWChem Input Module
-------------------



Scaling coordinates for geometry "geometry" by  1.889725989
(inverse scale = 0.529177249)

Turning off AUTOSYM since
SYMMETRY directive was detected!

2016-03-09 15:51:34.170 (WARN ) [0xfff78848b10] 80299:ibm.runjob.client.Job: terminated by signal 11
2016-03-09 15:51:34.171 (WARN ) [0xfff78848b10] 80299:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 0
================================================
I have no clue what causes this error. Can anyone suggest how to fix it? Thanks!


These are the settings I used for the MPI-TS build:
================================================
export NWCHEM_TOP=/scratch/home/chiensh/nwchem/nwchem-6.6
export NWCHEM_TARGET=BGQ
export ARMCI_NETWORK=MPI-TS
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export USE_XLF=y
export USE_I4FLAGS=y
export USE_64TO32=y
export FFLAG_INT="-qintsize=4"
export DISABLE_GAMIRROR=y
export NWCHEM_MODULES="all "
#export MRCC_METHODS=TRUE
#export NWCHEM_MODULES=smallqm

export MPI_INCLUDE="-I/bgsys/drivers/ppcfloor \
                    -I/bgsys/drivers/ppcfloor/firmware/include \
                    -I/bgsys/drivers/ppcfloor/spi/include \
                    -I/bgsys/drivers/ppcfloor/spi/include/kernel \
                    -I/bgsys/drivers/ppcfloor/spi/include/kernel/cnk \
                    -I/bgsys/drivers/ppcfloor/comm/xl/include "

export BLAS_SIZE=4
export BLASOPT=" /bgsys/ibm_essl/prod/opt/ibmmath/essl/5.1/lib64/libesslbg.a \
                 /apps/libs/lapack/essl/lapack_BGQ_XL_ESSL_INT4_.a \
                 /apps/libs/blas/xl/blas_BGQ_XL_INT4_.a -Wl,-zmuldefs"
export BLAS_LIB=" /bgsys/ibm_essl/prod/opt/ibmmath/essl/5.1/lib64/libesslbg.a \
                  /apps/libs/lapack/essl/lapack_BGQ_XL_ESSL_INT4_.a \
                  /apps/libs/blas/xl/blas_BGQ_XL_INT4_.a -zmuldefs "
================================================
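
Since several of these variables have to agree on the integer size, a small sanity check before building may help. This is only a sketch: check_int_settings is a hypothetical helper, not an NWChem tool, and it assumes that a 4-byte BLAS always goes together with USE_64TO32=y and that BLAS_SIZE defaults to 8 when unset.

```shell
# Warn if the integer-size settings disagree with each other.
# Assumptions (not an official NWChem check): BLAS_SIZE=4 and USE_64TO32=y
# belong together, and an unset BLAS_SIZE is treated as 8.
check_int_settings() {
    if [ "${BLAS_SIZE:-8}" = "4" ] && [ "${USE_64TO32:-n}" != "y" ]; then
        echo "BLAS_SIZE=4 but USE_64TO32 is not 'y'"
        return 1
    fi
    if [ "${USE_64TO32:-n}" = "y" ] && [ "${BLAS_SIZE:-8}" != "4" ]; then
        echo "USE_64TO32=y but BLAS_SIZE is not 4"
        return 1
    fi
    echo "integer-size settings look consistent"
}

# Run against the current environment; never aborts the calling script.
check_int_settings || true
```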

Regards,
Dominic Chien