Linux workstation, MPI and QA tests


Hello!

Following the instructions in the "How-to: Linux workstation platforms", I compiled a "serial" (i.e. non-MPI) binary and an MPI binary of NWChem 6.1; see the end of this message for the exact options and procedure.

The serial/tcgmsg-parallel binary, i.e. the one compiled without setting any MPI environment variables, mostly passes the tests. Other people have reported testsuite failures in other forum posts and I see similar ones, especially where iterative diagonalizers are used (e.g. the pw module, and dft_semidirect, which "creates" electrons), and in the two ZORA tests.

The MPI-parallel binary mostly passes doqmtests on 2 to 8 cores, but with doqmtests.mpi I see a failure in the "testtask" test in/after "Checking 3-Dimensional Arrays", with the usual uninformative output:

 Checking single precisions 
 
 ga_create ........................ OK
 ga_fill_patch .................... OK
 ga_copy_patch .................... OK
 ga_copy_patch (transpose) ........ OK
 ga_scale_patch ................... OK
 ga_add_patch ..................... OK
 ga_sdot_patch .................... OK
 ga_destory ....................... OK
 
 Commencing NGA Test
 -------------------
 
 Checking 3-Dimensional Arrays

Last System Error Message from Task 0:: Bad address
Last System Error Message from Task 1:: Bad address
Last System Error Message from Task 2:: Bad address
Last System Error Message from Task 3:: Bad address
1:Bus error, status=: 7
(rank:1 hostname:neuro24a pid:25580):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigBusHandler():222 cond:0
2:Bus error, status=: 7
(rank:2 hostname:neuro24a pid:25581):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigBusHandler():222 cond:0
3:Bus error, status=: 7
(rank:3 hostname:neuro24a pid:25582):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigBusHandler():222 cond:0
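When comparing runs with different process counts, it can help to pull the failing ranks out of the output automatically. The following is a minimal sketch of a hypothetical helper (not part of the NWChem QA suite) that matches the ARMCI "DASSERT fail" lines in the format shown above and reports rank, host, PID and source location; the regular expression is an assumption based only on these lines.

```python
import re

# Pattern for ARMCI assertion lines in the format shown in the log above, e.g.
# (rank:1 hostname:neuro24a pid:25580):ARMCI DASSERT fail. <file>:<func>():<line> cond:0
DASSERT_RE = re.compile(
    r"\(rank:(?P<rank>\d+) hostname:(?P<host>\S+) pid:(?P<pid>\d+)\)"
    r":ARMCI DASSERT fail\. (?P<file>[^:]+):(?P<func>\w+)\(\):(?P<line>\d+)"
)


def armci_failures(log_text):
    """Return one dict per ARMCI DASSERT line found in log_text."""
    failures = []
    for match in DASSERT_RE.finditer(log_text):
        failures.append({
            "rank": int(match.group("rank")),
            "host": match.group("host"),
            "pid": int(match.group("pid")),
            # Source location reported by ARMCI (file:function():line)
            "where": f"{match.group('file')}:{match.group('func')}():{match.group('line')}",
        })
    return failures


if __name__ == "__main__":
    sample = (
        "1:Bus error, status=: 7\n"
        "(rank:1 hostname:neuro24a pid:25580):ARMCI DASSERT fail. "
        "../../ga-5-1/armci/src/common/signaltrap.c:SigBusHandler():222 cond:0\n"
    )
    for f in armci_failures(sample):
        print(f["rank"], f["host"], f["where"])
```

Run against the full testtask output, this would list every rank that hit the SIGBUS handler, which makes it easy to see whether all ranks fail at the same source location.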


Later computational tests in doqmtests.mpi, e.g. h2o_opt, auh2o, h2mp2, rimp2_ne, scf_feco5, small_intchk, u_sodft, ... are OK!


So, my questions are:

1. Should the "Linux workstation platforms" compilation instructions produce a binary that can be used with mpirun, or is only tcgmsg parallel execution supported? We will eventually buy 8-socket SMP machines, so this matters to us. Note that ARMCI_NETWORK was not set during compilation.
2. What is the expected result of the doqmtests.mpi "testtask" test on this platform? I am confused by the fact that the computational tests appear to work while the internal GA/communication tests appear to fail catastrophically.
3. For ARMCI_NETWORK, only high-speed interconnects are documented as supported. What about Ethernet in its 1 GbE or 10 GbE variants? Can NWChem be compiled to work across nodes in that environment and, if so, what would be the correct ARMCI_NETWORK setting?
4. A comment on the testsuite failures of the serial binary would also be appreciated.

Thank you in advance for any help!


Best Regards

Christof


Detailed build sequence:

The compiler is Intel version 12.1.2.273 (Build 20111128); MPI is Open MPI 1.4.5rc2, built with that compiler. The Open MPI build appears to be fine, as other programs (VASP, CPMD, PWscf/Quantum ESPRESSO) have worked without problems so far.

In a new shell with a clean environment:

module add intel-12
setenv NWCHEM_TOP /local/neuro24/nwchem-6.1
setenv NWCHEM_TARGET LINUX64
setenv NWCHEM_MODULES "all"
setenv LARGE_FILES TRUE
setenv USE_NOFSCHECK TRUE
setenv LIB_DEFINES "-DDFLT_TOT_MEM=195853376"
setenv TCGRSH /usr/bin/ssh
setenv BLASOPT "-L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lpthread -lm -I$MKLROOT/include"
cd $NWCHEM_TOP/src
make nwchem_config
rm make.out; make FC="ifort -I$MKLROOT/include -DDFLT_TOT_MEM=195853376 -openmp -O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" CC="icc -fPIC -DMKL_ILP64 -DDFLT_TOT_MEM=195853376 -I$MKLROOT/include -openmp -O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" FOPTIMIZE="-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" COPTIMIZE="-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" |tee make.out

-> At this point I save the serial/tcgmsg-parallel nwchem binaries and run doqmtests.

make realclean
module add openmpi-1.4.5_intel
setenv USE_MPI y
setenv USE_MPIF y
setenv USE_MPIF4 y
setenv MPI_LOC /usr/local/stow/openmpi-1.4.5_intel
setenv MPI_LIB $MPI_LOC/lib
setenv MPI_INCLUDE $MPI_LOC/include
setenv LIBMPI "-L/usr/local/stow/openmpi-1.4.5_intel/lib -lmpi_f90 -lmpi_f77 -lmpi -lpthread"
make nwchem_config
rm make.out ; make FC="ifort -fPIC -I$MKLROOT/include -DDFLT_TOT_MEM=195853376 -openmp -O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" CC="icc -fPIC -DMKL_ILP64 -DDFLT_TOT_MEM=195853376 -I$MKLROOT/include -openmp -O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" FOPTIMIZE="-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" COPTIMIZE="-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" | tee make.out

This gives an MPI-parallel binary. For testing I then set:
setenv MPIRUN_NPOPT "-np"
setenv MPIRUN_PATH "/usr/local/stow/openmpi-1.4.5_intel/bin/mpirun --prefix /usr/local/stow/openmpi-1.4.5_intel/"
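To make it clearer how these two variables are meant to interact, here is a minimal sketch of how a launch line could be assembled from them. It assumes (not verified against the QA scripts) that the test driver places MPIRUN_PATH first, then MPIRUN_NPOPT, then the process count, the nwchem binary and the input file; the function name and the input file name are hypothetical.

```python
import os
import shlex


def mpirun_command(nproc, nwchem_binary, input_file, env=None):
    """Assemble a launch command from MPIRUN_PATH and MPIRUN_NPOPT.

    Assumption: launcher, process-count option, count, binary, input file,
    in that order. This is an illustration, not the QA scripts' own code.
    """
    env = os.environ if env is None else env
    # MPIRUN_PATH may carry extra options (here: --prefix), so split it shell-style.
    launcher = shlex.split(env["MPIRUN_PATH"])
    npopt = env.get("MPIRUN_NPOPT", "-np")
    return launcher + [npopt, str(nproc), nwchem_binary, input_file]


if __name__ == "__main__":
    demo_env = {
        "MPIRUN_PATH": "/usr/local/stow/openmpi-1.4.5_intel/bin/mpirun "
                       "--prefix /usr/local/stow/openmpi-1.4.5_intel/",
        "MPIRUN_NPOPT": "-np",
    }
    # "testtask.nw" is a hypothetical input file name used only for illustration
    cmd = mpirun_command(4, "nwchem", "testtask.nw", env=demo_env)
    print(shlex.join(cmd))
```

Splitting MPIRUN_PATH shell-style is what allows the extra "--prefix" option to ride along in the same variable as the mpirun path.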

Testing was done with doqmtests (after substituting runtests.mpi.unix everywhere) and with doqmtests.mpi without modifications.
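For reference, the -DDFLT_TOT_MEM=195853376 value used in both builds sets NWChem's default total memory. Assuming, as the NWChem build notes describe, that this value counts 8-byte double-precision words, the following sketch converts it to bytes. NWChem memory settings apply per process, so an 8-process run would request this amount eight times.

```python
DOUBLE_BYTES = 8  # assumption: DFLT_TOT_MEM counts 8-byte double-precision words


def dflt_tot_mem_bytes(words):
    """Convert a DFLT_TOT_MEM value (in doubles) to bytes."""
    return words * DOUBLE_BYTES


if __name__ == "__main__":
    words = 195853376  # value passed via LIB_DEFINES, FC and CC above
    nbytes = dflt_tot_mem_bytes(words)
    print(f"{nbytes} bytes = {nbytes / 2**20:.1f} MiB = {nbytes / 2**30:.2f} GiB")
```

Under that assumption the setting corresponds to about 1.46 GiB per process.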