Linux workstation, MPI and QA tests


Just Got Here
Hello!

Following the instructions in the "How-to: Linux workstation platforms" page, I compiled both a "serial" (i.e. non-MPI) and an MPI binary of NWChem 6.1; see the end of this message for the exact options and procedure.

The serial/tcgmsg-parallel binary, i.e. the one compiled without setting any MPI environment variables, tests mostly OK. Other forum posts have noted testsuite failures, and I see similar ones, especially where iterative diagonalizers are used (e.g. the pw module, and dft_semidirect, which "creates electrons") and in the two ZORA tests.

The MPI-parallel binary mostly passes doqmtests with 2 to 8 cores, but with doqmtests.mpi the "testtask" test fails in/after "Checking 3-Dimensional Arrays" with the usual uninformative output:

 Checking single precisions 
 
 ga_create ........................ OK
 ga_fill_patch .................... OK
 ga_copy_patch .................... OK
 ga_copy_patch (transpose) ........ OK
 ga_scale_patch ................... OK
 ga_add_patch ..................... OK
 ga_sdot_patch .................... OK
 ga_destory ....................... OK
 
 Commencing NGA Test
 -------------------
 
 Checking 3-Dimensional Arrays

Last System Error Message from Task 0:: Bad address
Last System Error Message from Task 1:: Bad address
Last System Error Message from Task 2:: Bad address
Last System Error Message from Task 3:: Bad address
1:Bus error, status=: 7
(rank:1 hostname:neuro24a pid:25580):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigBusHandler():222 cond:0
2:Bus error, status=: 7
(rank:2 hostname:neuro24a pid:25581):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigBusHandler():222 cond:0
3:Bus error, status=: 7
(rank:3 hostname:neuro24a pid:25582):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigBusHandler():222 cond:0


Later computational tests in doqmtests.mpi, e.g. h2o_opt, auh2o, h2mp2, rimp2_ne, scf_feco5, small_intchk, u_sodft ..., are OK!


So, my questions are:

1. Should the "Linux workstation platform" compilation instructions result in a binary to be used with mpirun, or is only tcgmsg parallelism supported? We will eventually buy 8-socket SMP machines, so this is of interest to us. Please note that ARMCI_NETWORK was not set during compilation.
2. What is the expected result of the doqmtests.mpi "testtask" test on this platform? I am confused by the fact that the computational tests appear to work while the internal GA/communication tests fail catastrophically.
3. For ARMCI_NETWORK, only high-speed interconnects are documented as supported. What about Ethernet in its 1 GbE or 10 GbE variants? Can NWChem be compiled to work in that environment across nodes and, if so, what would be the correct setting for ARMCI_NETWORK?
4. A comment on the testsuite failures of the serial binary would also be appreciated.

Thank you in advance for any help!


Best Regards

Christof


Detailed build sequence:

The compiler is Intel Version 12.1.2.273 Build 20111128; MPI is OpenMPI 1.4.5rc2 compiled with that compiler. The OpenMPI build appears to be OK, since other programs (VASP, CPMD, PWscf/Quantum ESPRESSO) have worked without problems so far.

In a new shell with a clean environment:

module add intel-12
setenv NWCHEM_TOP /local/neuro24/nwchem-6.1
setenv NWCHEM_TARGET LINUX64
setenv NWCHEM_MODULES "all"
setenv LARGE_FILES TRUE
setenv USE_NOFSCHECK TRUE
setenv LIB_DEFINES "-DDFLT_TOT_MEM=195853376"
setenv TCGRSH /usr/bin/ssh
setenv BLASOPT "-L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lpthread -lm -I$MKLROOT/include"
cd $NWCHEM_TOP/src
make nwchem_config
rm make.out; make FC="ifort -I$MKLROOT/include -DDFLT_TOT_MEM=195853376 -openmp -O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" CC="icc -fPIC -DMKL_ILP64 -DDFLT_TOT_MEM=195853376 -I$MKLROOT/include -openmp -O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" FOPTIMIZE="-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" COPTIMIZE="-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" |tee make.out

-> at this point I save the serial/tcgmsg parallel nwchem binaries and run doqmtests
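For reference, a minimal sketch of this step, assuming the standard build output location under $NWCHEM_TOP/bin/$NWCHEM_TARGET (the backup name and the log file name are arbitrary):

cp $NWCHEM_TOP/bin/$NWCHEM_TARGET/nwchem $NWCHEM_TOP/nwchem.serial
cd $NWCHEM_TOP/QA
./doqmtests |& tee doqmtests.serial.out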

make realclean
module add openmpi-1.4.5_intel
setenv USE_MPI y
setenv USE_MPIF y
setenv USE_MPIF4 y
setenv MPI_LOC /usr/local/stow/openmpi-1.4.5_intel
setenv MPI_LIB $MPI_LOC/lib
setenv MPI_INCLUDE $MPI_LOC/include
setenv LIBMPI "-L/usr/local/stow/openmpi-1.4.5_intel/lib -lmpi_f90 -lmpi_f77 -lmpi -lpthread"
make nwchem_config
rm make.out ; make FC="ifort -fPIC -I$MKLROOT/include -DDFLT_TOT_MEM=195853376 -openmp -O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" CC="icc -fPIC -DMKL_ILP64 -DDFLT_TOT_MEM=195853376 -I$MKLROOT/include -openmp -O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" FOPTIMIZE="-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" COPTIMIZE="-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" | tee make.out

This gives an MPI-parallel binary; for the MPI test runs I additionally set:
setenv MPIRUN_NPOPT "-np"
setenv MPIRUN_PATH "/usr/local/stow/openmpi-1.4.5_intel/bin/mpirun --prefix /usr/local/stow/openmpi-1.4.5_intel/"
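As a quick smoke test outside the QA scripts, the MPI binary can also be launched directly; a sketch using the variables above (the input file name is just a placeholder):

$MPI_LOC/bin/mpirun --prefix $MPI_LOC -np 4 $NWCHEM_TOP/bin/LINUX64/nwchem h2o.nw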

Testing was done with doqmtests after substituting runtests.mpi.unix everywhere, and with doqmtests.mpi without modifications.
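For anyone repeating this, the substitution in doqmtests can be scripted; a minimal sketch using GNU sed (keeps a .bak copy of the original script):

cd $NWCHEM_TOP/QA
sed -i.bak 's/runtests\.unix/runtests.mpi.unix/g' doqmtests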

Forum Vet
Please set FC=ifort
Christof,
I see from your compilation settings that you have supplied a long list of compiler options in the variable FC. I see plenty of potential problems with this, since 1) the -openmp option is likely to cause runtime conflicts with the Global Arrays parallelization, and 2) the makefile structure will be confused by such a long value for FC. If you really want to change the compiler options, the recommended way would be (for example)

make FC=ifort FOPTIMIZE="-O2 -xHost"

I will try to answer your questions next:
1. The "Linux workstation platform" compilation instructions result in a binary to be used with mpirun, since USE_MPI is set equal to y.
2. testtask should not fail for the 3-D Global Arrays test
3. If no high-speed network is present, you can leave ARMCI_NETWORK undefined. Another option, recently introduced in GA/ARMCI and not yet thoroughly tested by us, is MPI-MT. I would suggest first trying to get the vanilla ARMCI compilation to work, and then you might try the ARMCI_NETWORK=MPI-MT setting (see the sketch after this list).
4. I am not sure how to answer this one, since I can see conflicting details in your question. If you compile NWChem with USE_MPI=y, the tests have to be run with doqmtests.mpi, since only doqmtests.mpi uses the needed mpirun.
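Regarding the sketch mentioned in point 3, the fallback setting is a single environment variable at build time (note that MPI-MT requires an MPI library built with MPI_THREAD_MULTIPLE support):

setenv ARMCI_NETWORK MPI-MT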

Edo

Just Got Here
Dear Edo,

Thank you for the reply.

Quote:Edoapra Feb 8th 5:55 pm
Christof,
I see from your compilation settings that you have supplied a long list of compiler options in the variable FC. I see plenty of potential problems with this, since 1) the -openmp option is likely to cause runtime conflicts with the Global Arrays parallelization, and 2) the makefile structure will be confused by such a long value for FC. If you really want to change the compiler options, the recommended way would be (for example)

I agree with you that too much optimization is potentially a problem. The large number of options resulted from indiscriminately adding "-fp-model precise -fp-model source" to rein in the compiler when the build problems first showed up.

I now recompiled with
setenv BLASOPT "-L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm -I$MKLROOT/include" 
make FC="ifort -I$MKLROOT/include" CC="icc -DMKL_ILP64 -I$MKLROOT/include" FOPTIMIZE="-O2 -fp-model precise" COPTIMIZE="-O2 -fp-model precise"


Please note the change to MKL: since I can no longer use -openmp, I now have to link against the sequential MKL. The other options in FC and CC conform to the Intel MKL Link Line Advisor output.

With these changes to the environment, it now fails like this:
  Checking single precisions 
 
 ga_create ........................ OK
 ga_fill_patch .................... OK
 ga_copy_patch .................... OK
ERROR (proc=1): a [1,19,0] =3736.000000,  b [1,19,0] =10001.000000
ERROR (proc=2): a [20,0,0] =36.000000,  b [20,0,0] =101.000000
ERROR (proc=3): a [1,0,0] =0.000000,  b [1,0,0] =1.000000
 ga_copy_patch (transpose) ........ OK
 ga_scale_patch ................... OK
 ga_add_patch ..................... OK
 ga_sdot_patch .................... OK
 ga_destory ....................... OK
 
 Commencing NGA Test
 -------------------
 
 Checking 3-Dimensional Arrays
 
 ga_fill .......................... OK
ERROR (proc=0): a [20,19,0] =10036.000000,  b [20,19,0] =10101.000000
Last System Error Message from Task 0:: No such file or directory
Last System Error Message from Task 1:: No such file or directory
Last System Error Message from Task 2:: No such file or directory
Last System Error Message from Task 3:: No such file or directory
3:3:bye:: 0
(rank:3 hostname:neuro24a pid:32464):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0
1:1:bye:: 0
(rank:1 hostname:neuro24a pid:32462):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0
2:2:bye:: 0
(rank:2 hostname:neuro24a pid:32463):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0
0:0:bye:: 0
(rank:0 hostname:neuro24a pid:32461):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI COMMUNICATOR 4 DUP FROM 0 
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on



Some of this output appears to be out of order, as expected from a parallel crash.

I would like to add that MPI complains:
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

but as far as my googling goes, this should be mostly harmless.
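For the record, Open MPI provides an MCA parameter that silences this particular warning; it only suppresses the message and does not make the underlying fork() any safer (the input file name is again just a placeholder):

$MPI_LOC/bin/mpirun --mca mpi_warn_on_fork 0 -np 4 $NWCHEM_TOP/bin/LINUX64/nwchem h2o.nw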

I will try again from a clean tar.gz tomorrow; maybe there is some other cruft.

Quote:Edoapra Feb 8th 5:55 pm

I will try to answer your questions next
1. The "Linux workstation platform" compilation instructions result in a binary to be used with mpirun, since USE_MPI is set equal to y

OK.
Quote:Edoapra Feb 8th 5:55 pm

2. testtask should not fail for the 3-D Global Arrays test

Thought so :-)
Quote:Edoapra Feb 8th 5:55 pm

3. If no high-speed network is present, you can leave ARMCI_NETWORK undefined. Another option, recently introduced in GA/ARMCI and not yet thoroughly tested by us, is MPI-MT. I would suggest first trying to get the vanilla ARMCI compilation to work, and then you might try the ARMCI_NETWORK=MPI-MT setting.

I fully agree with that. Also, I did not enable MPI_THREAD_MULTIPLE when compiling OpenMPI, so testing that would take more effort. However, does your answer imply that MPI across nodes over Ethernet should work with ARMCI_NETWORK unset? Or is ARMCI_NETWORK=MPI-MT the only chance, if any?
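For reference, enabling MPI_THREAD_MULTIPLE in the Open MPI 1.4 series means reconfiguring and rebuilding Open MPI itself, roughly like this (the flag name is as in the 1.4 documentation; later series renamed it to --enable-mpi-thread-multiple):

./configure --prefix=/usr/local/stow/openmpi-1.4.5_intel --enable-mpi-threads CC=icc CXX=icpc F77=ifort FC=ifort
make all install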
Quote:Edoapra Feb 8th 5:55 pm

4. I am not sure how to answer this one, since I can see conflicting details in your question. If you compile NWChem with USE_MPI=y, the tests have to be run with doqmtests.mpi, since only doqmtests.mpi uses the needed mpirun.

So "QA/HOW-TO-RUN-TESTS" as distributed with the release tar.gz
   
c) Run the doqmtests and runtest.md scripts as described above, but first
      edit those files to substitute "runtests.mpi.unix" for "runtests.unix"
      and "runtest.unix"


is outdated? Could a link to up-to-date instructions be added right on the compilation how-to page? Some people might run the binary without testing ...

The testsuite failures I referred to were with the "serial" binary, not the MPI one. Of course, such failures are nearly impossible to diagnose/debug/assess over the internet. I hope the earlier forum thread on testsuite failures with the supplied binaries gathers a few more helpful comments.


Best Regards

Christof

Forum Vet
Please set FC=ifort
Christof,
1) FC could be set equal to ifort, gfortran, or pgf90; I am not quite sure of the outcome of any other setting.
2) Same story for CC. My suggestion is to avoid redefining CC, since a) we don't really have C source that is a computational bottleneck, and b) we never test any C flavor other than the default gcc.
3)
In other words, my recommendation is to compile with
make FC=ifort
Once you have compiled a baseline, you might want to try fancier settings.

Just Got Here
Dear Edo,

Thank you very much for your help!

I see; the old-timers' "golden rule" of using matching Fortran and C compilers where possible, to avoid underscoring/ABI/runtime problems, is no longer valid. Of course, underscoring was an issue between Fortran frontends, but the OpenMPI that GA links against was built using icc and not gcc ...
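As a quick check of which name-mangling convention a library actually uses, its exported symbols can be inspected; a sketch against my install (the symbol choice is just an example):

nm -D $MPI_LOC/lib/libmpi_f77.so | grep -i mpi_init

Trailing underscores in the output (mpi_init_ vs mpi_init__) reveal the Fortran convention the library was built with.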

You are completely right: with gcc and ifort, "testtask" works after a fresh untarring of the source.

Still, seeing a mess like this with compilers that claim to be standard-conforming C compilers leaves me with a bad feeling.

Could the documentation be updated with a hint about this? I would expect many people to try a "matched team" of compilers as I did, especially after reading www.alcf.anl.gov/resource-guides/nwchem; note their
# Intel compilers
COMPILER=intel
CC=/soft/intel/11.1.059/bin/intel64/icc
FC=/soft/intel/11.1.059/bin/intel64/ifort


Best Regards

Christof

Build Environment:
tar -zxvf ~/src/NWCHEM/Nwchem-6.1.tar.gz
module add intel-12
setenv NWCHEM_TOP /local/neuro24/nwchem-6.1
setenv NWCHEM_TARGET LINUX64
setenv NWCHEM_MODULES "all"
setenv LARGE_FILES TRUE
setenv USE_NOFSCHECK TRUE
setenv LIB_DEFINES "-DDFLT_TOT_MEM=195853376"
setenv TCGRSH /usr/bin/ssh
cd $NWCHEM_TOP/src
module add openmpi-1.4.5_intel
setenv USE_MPI y
setenv USE_MPIF y
setenv USE_MPIF4 y
setenv MPI_LOC /usr/local/stow/openmpi-1.4.5_intel
setenv MPI_LIB $MPI_LOC/lib
setenv MPI_INCLUDE $MPI_LOC/include
setenv LIBMPI "-L/usr/local/stow/openmpi-1.4.5_intel/lib -lmpi_f90 -lmpi_f77 -lmpi -lpthread"
setenv BLASOPT "-L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm -I$MKLROOT/include"
make nwchem_config
make FC="ifort"

