9:48:48 AM PST - Wed, Feb 8th 2012
Hello !
Following the instructions in "How-to: Linux workstation platforms", I compiled a "serial" (i.e. non-MPI) and an MPI binary of nwchem 6.1; see the end of this message for the exact options and procedure.
The serial/tcgmsg parallel binary, i.e. the one compiled without setting any MPI environment variables, mostly passes the tests. Some people have reported testsuite failures in other forum posts and I see similar ones, especially when iterative diagonalizers are used (e.g. pw module, dft_semidirect creates electrons) and in the two ZORA tests.
The MPI parallel binary mostly passes doqmtests using 2 to 8 cores, but with doqmtests.mpi I see a failure in the "testtask" test in/after "Checking 3-Dimensional Arrays", i.e. the usual uninformative output:
Checking single precisions
ga_create ........................ OK
ga_fill_patch .................... OK
ga_copy_patch .................... OK
ga_copy_patch (transpose) ........ OK
ga_scale_patch ................... OK
ga_add_patch ..................... OK
ga_sdot_patch .................... OK
ga_destory ....................... OK
Commencing NGA Test
-------------------
Checking 3-Dimensional Arrays
Last System Error Message from Task 0:: Bad address
Last System Error Message from Task 1:: Bad address
Last System Error Message from Task 2:: Bad address
Last System Error Message from Task 3:: Bad address
1:Bus error, status=: 7
(rank:1 hostname:neuro24a pid:25580):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigBusHandler():222 cond:0
2:Bus error, status=: 7
(rank:2 hostname:neuro24a pid:25581):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigBusHandler():222 cond:0
3:Bus error, status=: 7
(rank:3 hostname:neuro24a pid:25582):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigBusHandler():222 cond:0
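If the crash output is saved to a file, the affected ranks can be pulled out with a short filter. This is only a minimal sketch in POSIX sh (not the csh used for the build further down); the function name `failed_ranks` is hypothetical, and the pattern assumes exactly the `(rank:N hostname:H pid:P):ARMCI DASSERT fail.` line format shown above.

```sh
# Hypothetical helper: print one line per rank that hit an ARMCI assertion.
# Assumes lines of the form "(rank:N hostname:H pid:P):ARMCI DASSERT fail. ..."
failed_ranks() {
    # -n suppresses default output; the trailing "p" prints only the
    # lines on which the substitution matched.
    sed -n 's/^(rank:\([0-9][0-9]*\) hostname:\([^ ]*\) pid:\([0-9][0-9]*\)):ARMCI DASSERT fail.*/rank \1 on \2 (pid \3)/p' "$1"
}
```

Called as `failed_ranks testtask.out` (the file name is an assumption), this would list ranks 1, 2 and 3 on neuro24a for the output above; rank 0 only reports "Bad address" and has no DASSERT line.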
Later computational tests in doqmtests.mpi, e.g. h2o_opt, auh2o, h2mp2, rimp2_ne, scf_feco5, small_intchk, u_sodft ..., are ok!
So, my questions are:
1. Should the "Linux workstation platform" compilation instructions result in a binary to be used with mpirun, or is only tcgmsg parallel supported? We will eventually buy 8-socket SMP machines, so this is interesting for us. Please note that ARMCI_NETWORK was not set during compilation.
2. What is the expected result of the doqmtests.mpi "testtask" test on this platform? I am confused by the fact that the computational tests appear to work while the internal GA/communication tests appear to fail catastrophically.
3. For ARMCI_NETWORK, only high-speed interconnects are documented as supported. What about Ethernet in its 1 GbE or 10 GbE variants? Can nwchem be compiled to work in that environment across nodes and, if yes, what would be the correct setting for ARMCI_NETWORK?
4. A comment on the testsuite failures of the serial binary would also be appreciated.
Thank you in advance for any help!
Best Regards
Christof
Detailed build sequence:
The compiler is Intel version 12.1.2.273 Build 20111128; MPI is openmpi 1.4.5rc2 compiled with that compiler. The openmpi build appears to be ok, as other programs (vasp, cpmd, pwscf/quantum espresso) have worked without problems so far.
In a new shell with a clean environment:
module add intel-12
setenv NWCHEM_TOP /local/neuro24/nwchem-6.1
setenv NWCHEM_TARGET LINUX64
setenv NWCHEM_MODULES "all"
setenv LARGE_FILES TRUE
setenv USE_NOFSCHECK TRUE
setenv LIB_DEFINES "-DDFLT_TOT_MEM=195853376"
setenv TCGRSH /usr/bin/ssh
setenv BLASOPT "-L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lpthread -lm -I$MKLROOT/include"
cd $NWCHEM_TOP/src
make nwchem_config
rm make.out; make FC="ifort -I$MKLROOT/include -DDFLT_TOT_MEM=195853376 -openmp -O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" CC="icc -fPIC -DMKL_ILP64 -DDFLT_TOT_MEM=195853376 -I$MKLROOT/include -openmp -O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" FOPTIMIZE="-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" COPTIMIZE="-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" |tee make.out
-> at this point I save the serial/tcgmsg parallel nwchem binaries and run doqmtests
make realclean
module add openmpi-1.4.5_intel
setenv USE_MPI y
setenv USE_MPIF y
setenv USE_MPIF4 y
setenv MPI_LOC /usr/local/stow/openmpi-1.4.5_intel
setenv MPI_LIB $MPI_LOC/lib
setenv MPI_INCLUDE $MPI_LOC/include
setenv LIBMPI "-L/usr/local/stow/openmpi-1.4.5_intel/lib -lmpi_f90 -lmpi_f77 -lmpi -lpthread"
make nwchem_config
rm make.out ; make FC="ifort -fPIC -I$MKLROOT/include -DDFLT_TOT_MEM=195853376 -openmp -O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" CC="icc -fPIC -DMKL_ILP64 -DDFLT_TOT_MEM=195853376 -I$MKLROOT/include -openmp -O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" FOPTIMIZE="-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" COPTIMIZE="-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source" | tee make.out
This gives an MPI parallel binary.
setenv MPIRUN_NPOPT "-np"
setenv MPIRUN_PATH "/usr/local/stow/openmpi-1.4.5_intel/bin/mpirun --prefix /usr/local/stow/openmpi-1.4.5_intel/"
Testing was done with doqmtests after substituting runtests.mpi.unix everywhere, and with doqmtests.mpi without modifications.
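The two make invocations above each repeat the same optimization flags four times, which makes it easy for the serial and MPI builds to drift apart. One way to keep them in sync is to build the strings from a single definition. The sketch below is POSIX sh rather than the csh used above, the function names are hypothetical, and the flags are copied from the MPI build line (the serial line differs only in lacking -fPIC for ifort).

```sh
# Hypothetical helpers; flags copied verbatim from the MPI build line above.
nwchem_opt() {
    echo "-O2 -xHost -ip -funroll-loops -unroll-aggressive -fp-model precise -fp-model source"
}

nwchem_fc() {
    # MKLROOT is expected to be set in the environment (e.g. by the Intel module).
    echo "ifort -fPIC -I$MKLROOT/include -DDFLT_TOT_MEM=195853376 -openmp $(nwchem_opt)"
}

nwchem_cc() {
    echo "icc -fPIC -DMKL_ILP64 -DDFLT_TOT_MEM=195853376 -I$MKLROOT/include -openmp $(nwchem_opt)"
}
```

With these defined, the build could be started as `make FC="$(nwchem_fc)" CC="$(nwchem_cc)" FOPTIMIZE="$(nwchem_opt)" COPTIMIZE="$(nwchem_opt)" | tee make.out`, so a change to the optimization flags only has to be made in one place.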