NWChem computation: multi-CPU run never converges


Clicked A Few Times
Dear all,

I'm trying to resolve a problem with one of our users' computations. I've prepared an NWChem 6.0 installation (64-bit, Debian 6.0) for him, compiled with the GNU compilers (version 4.3.2) and MPI support (OpenMPI 1.4.3), using the following options:


export NWCHEM_TOP=/software/NWChem-6.0/source/nwchem-6.0-src/
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="pnnl"

export PYTHONHOME=$NWCHEM_TOP/../python-2.6.2/
export PYTHONVERSION=2.6
export USE_PYTHON64=y

export ARMCI_NETWORK=MPI-SPAWN
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export LIBMPI="-lmpi_f90 -lmpi_f77 -lmpi -ldl -Wl,--export-dynamic -lnsl -lutil"
export MPI_LIB=/software/openmpi-1.4.3/lib/
export MPI_INCLUDE=/software/openmpi-1.4.3/include/

export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE

export FC=gfortran
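
(For completeness, the build itself was done with the standard NWChem make sequence -- sketched below from memory rather than copied from the build log, so the exact invocation may differ slightly:)

cd $NWCHEM_TOP/src
make nwchem_config NWCHEM_MODULES="$NWCHEM_MODULES"
make FC=$FC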


NWChem compiled successfully: when the following computation is run on a single CPU, it finishes without problems. BUT when it is run on multiple CPUs, it never converges within the CCSD iterations -- the computation crashes with "maxiter exceeded" and subsequent "ARMCI DASSERT fail (260)" messages (the single-CPU version converges within 20 iterations, whereas the multi-CPU version does not converge even after 1000 iterations).
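
For reference, this is roughly how the two runs are launched (a sketch only -- the binary path and process count are illustrative, and the real jobs go through our scheduling system):

# single-CPU run (finishes fine)
$NWCHEM_TOP/bin/LINUX64/nwchem test.nw > run-single.out

# multi-CPU run (never converges)
mpirun -np 4 $NWCHEM_TOP/bin/LINUX64/nwchem test.nw > run-multi.out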

The computation:

start test

memory 7000 mb

geometry units angstroms
S 0.000000 0.000000 0.000000
S 0.000000 0.000000 1.912540
Cl 1.961192 0.000000 2.612737
end

basis
 S library aug-cc-pVDZ
Cl library aug-cc-pVDZ
end

scf
doublet
rohf
end

tce
scf
ccsd
io replicated
tilesize 15
freeze core atomic
nroots 2
targetsym a'
symmetry
dipole
thresh 1.0d-5
end

task tce energy


I've tried various compilers (GNU, Intel) and various MPI libraries (OpenMPI 1.4.3, MPICH2 1.4.1p1) -- most other computations finish successfully (no matter whether they are run on a single processor or on multiple processors), yet the computation above behaves equally badly in every configuration.

Could somebody please give me a hint on how to resolve this issue? I have run out of ideas... ;-(

Thanks a lot in advance!
Tom Rebok, Czech Republic.

PS: The build log is available here: make.log

Forum Vet
I do not believe this is a compile issue but rather a run issue. How much memory do you have per processor (not per node, but per core)? The memory keyword in NWChem is per core. Could you try running on multiple processors with the memory keyword set to at most 1000 mb to see if this works.
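
To illustrate the per-core accounting (the 16-core node below is just an example, not your actual reservation):

# "memory 7000 mb" is allocated per MPI rank:
#   16 ranks x 7000 MB = ~112 GB requested on one node
# "memory 1000 mb" keeps the same node at 16 x 1000 MB = 16 GB
memory 1000 mb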

What is the hardware you are running on that forces you to use MPI-SPAWN?

Thanks,

Bert


Clicked A Few Times
Dear Bert,

first of all, sorry for my late response (I've been on vacation).

Quote:Bert Jul 23rd 9:50 am
I do not believe this is a compile issue but rather a run issue. How much memory do you have per processor (not per node, but per core)? The memory keyword in NWChem is per core. Could you try running on multiple processors with the memory keyword set to at most 1000 mb to see if this works.


I run the computation on various nodes of our computing infrastructure -- ranging from smaller nodes (Dual-Core AMD Opteron 885, 16 cores and 64 GB of memory) to large SMP nodes (Intel Xeon E7 4860, 80 cores and 512 GB of memory). On each of these clusters I had 4 cores and 50 GB of memory reserved on a single node -- no matter which machine I use, all the computations fail in the same fashion (as described initially). (I've also tried specifying just 1 GB of memory, as you suggested; however, this resulted in the same error.)

To illustrate, here is a run on the SMP node (80 cores, 512 GB of memory), with 4 CPUs and 50 GB of memory reserved for the computation by our scheduling system: the single-CPU (successful) run is in run-single.out and the multi-CPU (failing) run is in run-multi.out (the failing convergence is visible starting at line 1031).

Quote:Bert Jul 23rd 9:50 am
What is the hardware you are running on that forces you to use MPI-SPAWN?


Since our infrastructure is quite heterogeneous, some clusters have an InfiniBand interconnect and some do not. I therefore decided to use MPI-SPAWN so that a single build would be runnable on all the machines we operate. I hope this reasoning is correct...
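
(If per-cluster builds turn out to be necessary after all, my understanding is that the interconnect-specific setting would look roughly like this -- a sketch based on the documented ARMCI_NETWORK values, not something I have tested here:)

# InfiniBand clusters (the IB include/lib paths also have to point at the local OFED installation)
export ARMCI_NETWORK=OPENIB

# Ethernet-only clusters
export ARMCI_NETWORK=SOCKETS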

Thanks very much for any advice.
--best
Tom Rebok, MetaCentrum NGI, Czech Republic.

PS: Could anybody else try running the computation above in parallel and let me know whether it converges?

PS2: Maybe it is somehow related to the problem described here...

Forum Vet
Just ran it with 16 processors. I would remove the "io replicated" from the input (that's what I did).
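
For clarity, the tce block from your input then becomes (everything else unchanged):

tce
scf
ccsd
tilesize 15
freeze core atomic
nroots 2
targetsym a'
symmetry
dipole
thresh 1.0d-5
end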

Also, I seriously doubt you can build one binary that will work on all platforms, and we have never tested running MPI-SPAWN between heterogeneous nodes on different clusters.

Bert


Clicked A Few Times
Dear Bert,

Quote:Bert Aug 3rd 9:23 am
Just ran it with 16 processors. I would remove the "io replicated" from the input (that's what I did).


thanks a lot! That works!!! :-)

May I ask for a short explanation of what the "io replicated" option influences? Why does it break this type of computation, and when is it (or a different I/O scheme) suitable to use?

Quote:Bert Aug 3rd 9:23 am
Also, I seriously doubt you can build one binary that will work on all platforms. Also, we have never tested running MPI-SPAWN between heterogeneous nodes on different clusters.


I see. But our clusters are not different platforms -- they are all of the "x86_64" architecture (with the OS/software environment kept as uniform as possible), so I think such a binary should work for us. And if not, I'll deal with it once such a problem appears...

Once again: THANK YOU VERY MUCH FOR YOUR HELP AND THE TIME YOU SPENT.

--best
Tom.

