NWChem computation: multi-CPU run never converges


Forum Vet

9:50:50 AM PDT - Mon, Jul 23rd 2012
I do not believe this is a compile issue but rather a run issue. How much memory do you have per processor (not per node, but per core)? The memory keyword in NWChem is per core. Could you try running on multiple processors with the memory keyword set to at most about 1000 mb to see whether this works? What hardware are you running on that forces you to use MPI-SPAWN?

Thanks,
Bert

Quote: Jeronimo, Jul 20th 2:14 pm

Dear all,

I'm trying to resolve a problem with our user's computations. I've prepared an NWChem 6.0 installation (64-bit systems, Debian 6.0) for him, compiled with the GNU compilers (version 4.3.2) with MPI support (OpenMPI 1.4.3), using the following options:

    export NWCHEM_TOP=/software/NWChem-6.0/source/nwchem-6.0-src/
    export NWCHEM_TARGET=LINUX64
    export NWCHEM_MODULES="pnnl"
    export PYTHONHOME=$NWCHEM_TOP/../python-2.6.2/
    export PYTHONVERSION=2.6
    export USE_PYTHON64=y
    export ARMCI_NETWORK=MPI-SPAWN
    export USE_MPI=y
    export USE_MPIF=y
    export USE_MPIF4=y
    export LIBMPI="-lmpi_f90 -lmpi_f77 -lmpi -ldl -Wl,--export-dynamic -lnsl -lutil"
    export MPI_LIB=/software/openmpi-1.4.3/lib/
    export MPI_INCLUDE=/software/openmpi-1.4.3/include/
    export LARGE_FILES=TRUE
    export USE_NOFSCHECK=TRUE
    export FC=gfortran

NWChem compiled successfully: when running a single-CPU version of the following computation, it finishes successfully. BUT when running a multi-CPU version, the computation never converges (within the CCSD iterations): it crashes with "maxiter exceeded" and subsequent "ARMCI DASSERT fail (260)" messages. The single-CPU version converges within 20 iterations, while the multi-CPU version does not converge even after running for 1000 iterations.
The computation:

    start test
    memory 7000 mb
    geometry units angstroms
      S 0.000000 0.000000 0.000000
      S 0.000000 0.000000 1.912540
      Cl 1.961192 0.000000 2.612737
    end
    basis
      S library aug-cc-pVDZ
      Cl library aug-cc-pVDZ
    end
    scf
      doublet
      rohf
    end
    tce
      scf
      ccsd
      io replicated
      tilesize 15
      freeze core atomic
      nroots 2
      targetsym a'
      symmetry
      dipole
      thresh 1.0d-5
    end
    task tce energy

I've tried various compilers (GNU, Intel) and various MPI libraries (OpenMPI 1.4.3, MPICH2 1.4.1p1). Although most other computations finish successfully (whether run on a single processor or on several), the above computation behaves equally badly in every case.

Could anybody give me a hint on how to resolve the issue? I don't have any other ideas... ;-(

Thanks a lot in advance!
Tom Rebok, Czech Republic.

PS: The build log is available here: make.log
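
A rough sketch of the arithmetic behind the per-core memory point in the reply: because the memory keyword applies to each process, the total requested on a node grows with the number of processes started there. The process count (8) and node RAM (16 GB) below are hypothetical, since the thread does not state the actual hardware, and the helper names are made up for illustration only.

```python
# Sketch: how much memory a run requests on one node when the
# NWChem "memory" value is applied per process (per core).
# The node size and process count used below are hypothetical.


def total_memory_mb(per_process_mb: int, processes_per_node: int) -> int:
    """Aggregate memory requested on one node, in MB."""
    return per_process_mb * processes_per_node


def fits_on_node(per_process_mb: int, processes_per_node: int, node_ram_mb: int) -> bool:
    """True if the aggregate request does not exceed the node's RAM."""
    return total_memory_mb(per_process_mb, processes_per_node) <= node_ram_mb


def max_per_process_mb(processes_per_node: int, node_ram_mb: int) -> int:
    """Upper bound for the per-process value (ignores OS and MPI overhead)."""
    return node_ram_mb // processes_per_node


if __name__ == "__main__":
    node_ram_mb = 16 * 1024  # hypothetical 16 GB node
    processes = 8            # hypothetical number of MPI processes per node

    for per_process in (7000, 1000):  # value from the input vs. the suggested value
        total = total_memory_mb(per_process, processes)
        ok = fits_on_node(per_process, processes, node_ram_mb)
        print(f"memory {per_process} mb x {processes} processes = {total} MB, fits: {ok}")

    print("upper bound per process:", max_per_process_mb(processes, node_ram_mb), "MB")
```

Under these assumed numbers, the 7000 mb setting from the input would ask for far more than the node holds once several processes run, while the suggested 1000 mb would fit. Whether that is what actually happens on the poster's machines depends on their real core count and RAM.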