6.1.1 MPI build runs great, but only on 1 node


Gets Around
The following fails when linking BLAS for me. Are the steps in the right order?

export NWCHEM_TOP=/.../nwchem-6.1.1-src
export NWCHEM_TARGET=LINUX64
export CC=gcc
export FC=gfortran
export LD_LIBRARY_PATH=/usr/lib64/openmpi/1.4-gcc/lib
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPIEXEC=/usr/lib64/openmpi/1.4-gcc/bin/mpiexec
export MPI_LIB=/usr/lib64/openmpi/1.4-gcc/lib
export MPI_INCLUDE=/usr/lib64/openmpi/1.4-gcc/include
export LIBMPI='-L/usr/lib64/openmpi/1.4-gcc/lib -lmpi -lmpi_f90 -lmpi_f77'
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export TCGRSH=ssh
export PYTHONHOME=/usr
export PYTHONVERSION=2.4
export PYTHONLIBTYPE=a
export USE_PYTHON64=y
export HAS_BLAS=yes
export BLASOPT="-L/usr/lib64 -lblas -llapack"
make nwchem_config NWCHEM_MODULES="all python" 2>&1 | tee make_nwchem_config.log
make 64_to_32 2>&1 | tee make_64_to_32.log
make USE_64TO32=y 2>&1 | tee make.log
cd $NWCHEM_TOP/src/tools
make FC=gfortran GA_DIR=ga-4-3 OLD_GA=y clean 2>&1 | tee ../make_ga_clean.log
make FC=gfortran GA_DIR=ga-4-3 OLD_GA=y 2>&1 | tee ../make_ga.log
cd ..
make FC=gfortran link 2>&1 | tee make_link.log

with:
/.../nwchem-6.1.1-src/src/task/task_bsse.F:1778: undefined reference to `ycopy_'

Please fix the problems with posting to the forum: im wasting about 5 min per post trying
to figure out, line-by-line, which characters are allowed and which are not.
This time i figured out that single quote is not allowed in im.

Forum Vet
check the existence of $NWCHEM_TOP/lib/LINUX64/lib64to32.a
Please do the following
1) check the existence of $NWCHEM_TOP/lib/LINUX64/lib64to32.a
2) if the library is there, check the existence of the ycopy_ symbol

nm $NWCHEM_TOP/lib/LINUX64/lib64to32.a | grep ycopy

3) if the symbol is not there (or the full library is missing),

cd $NWCHEM_TOP/src/64to32blas
make clean
make FC=gfortran
relink
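
Steps 1) and 2) above can be sketched as a small shell function. This is only an illustration: check_64to32 is a hypothetical helper name, not part of NWChem, and it assumes nm and grep are available on the build host.

```shell
# Hypothetical helper (not part of NWChem): inspect the 64-to-32 BLAS wrapper library.
# Returns 0 if the library exists and nm lists a ycopy_ symbol,
#         1 if the library file is missing,
#         2 if the file exists but no ycopy_ symbol is listed.
check_64to32() {
    lib="$1"
    if [ ! -f "$lib" ]; then
        echo "missing library: $lib"
        return 1
    fi
    if ! nm "$lib" 2>/dev/null | grep -q 'ycopy_'; then
        echo "ycopy_ not found in $lib"
        return 2
    fi
    echo "ycopy_ found in $lib"
    return 0
}
```

Called as check_64to32 "$NWCHEM_TOP/lib/LINUX64/lib64to32.a", a non-zero return would mean the rebuild in step 3) is needed before relinking.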

Gets Around
The following links (i'm not sure if properly to ga-4-3), but still fails across the nodes.

make nwchem_config NWCHEM_MODULES="all python" 2>&1 | tee make_nwchem_config.log
make 64_to_32 2>&1 | tee make_64_to_32.log
export MAKEOPTS="USE_64TO32=y"
make ${MAKEOPTS} 2>&1 | tee make.log
cd $NWCHEM_TOP/src/tools
make ${MAKEOPTS} FC=gfortran GA_DIR=ga-4-3 OLD_GA=y clean 2>&1 | tee ../make_ga_clean.log
make ${MAKEOPTS} FC=gfortran GA_DIR=ga-4-3 OLD_GA=y 2>&1 | tee ../make_ga.log
cd ..
make ${MAKEOPTS} FC=gfortran link 2>&1 | tee make_link.log

Gets Around
To make the debugging easier, please check the log of the latest build (for CentOS 5 x86_64) at:
https://build.opensuse.org/package/show?package=nwchem&project=home%3Adtufys

This is the "official" build place for our institute RPM/deb packages, so once the problem with crashes across the nodes
is finally resolved, please consider announcing that "unofficial binaries" of nwchem are available at:
https://wiki.fysik.dtu.dk/ase/download.html#installation-with-package-manager-on-linux

Note that the RPMS should be fine for running on single node already now.
Note that i don't build deb packages as the nwchem version 6.0 is already available on debian/ubuntu.

Forum Vet
Quote:Marcindulak Aug 25th 2:42 am
The following links (i'm not sure if properly to ga-4-3), but still fails across the nodes.

Does the stderr/stdout from these failures (using ga-4-3) have the same aspect as the one with ga-5-1?

Forum Vet
Quote:Marcindulak Aug 25th 5:21 am
To make the debugging easier, please check the log of the latest build (for CentOS 5 x86_64) at:
https://build.opensuse.org/package/show?package=nwchem&project=home%3Adtufys


I still see ga-5-1 being used at

https://build.opensuse.org/package/rawlog?arch=x86_64&package=nwchem&project=home%...

and at

https://build.opensuse.org/package/rawlog?arch=x86_64&package=nwchem&project=home%...

You can find it by searching for

../ga-5-1/configure

The correct behavior for ga-4-3 should not show any "../ga-5-1/configure" entry.
Once you type

cd $NWCHEM_TOP/src/tools;make FC=gfortran

no configure business should show up (ga-4-3 was not using autoconf and related tools)
and the compilation should start immediately.
By any chance, have you modified the file $NWCHEM_TOP/src/tools/GNUmakefile?
Could you please post the $NWCHEM_TOP/src/tools/GNUmakefile that you are using?
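
The log search described above can be sketched as a one-line check. The function name log_uses_ga51 is a hypothetical helper, and the log file name in the usage note is the one produced by the tee commands earlier in the thread.

```shell
# Hypothetical helper: exit status 0 if the given build log mentions the
# ga-5-1 configure script, i.e. the autoconf-based GA was built instead of ga-4-3.
log_uses_ga51() {
    grep -q -F 'ga-5-1/configure' "$1"
}
```

For example, running log_uses_ga51 make_ga.log && echo "ga-5-1 was configured" in $NWCHEM_TOP/src would flag a build that did not pick up GA_DIR=ga-4-3.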

Gets Around
Probably i just invoke make in the wrong order and/or with the wrong options. Can you provide the full sequence?
I don't make any changes to the makefiles, and every time i run the kompile.sh script in a clean environment.
The contents of kompile.sh are now listed in the logs:
https://build.opensuse.org/package/live_build_log?arch=i586&package=nwchem&project...
What changes need to be done to kompile.sh?

Forum Vet
link stage wrong in kompile.sh
The link stage is missing the ga-4-3 definition, therefore it is linking with the ga-5-1 libraries that -- unfortunately --
are present since you have previously compiled them. The link line should be changed from

/usr/bin/make ${MAKEOPTS} FC=gfortran link 2>&1 | tee make_link.log

to

/usr/bin/make ${MAKEOPTS} FC=gfortran GA_DIR=ga-4-3 OLD_GA=y link 2>&1 | tee make_link.log
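
One way to avoid dropping the GA options from a single stage is to route the tools and link stages through one wrapper. This is a sketch under assumptions: GA_OPTS, DRY_RUN and nw_make are names introduced here for illustration and do not appear in kompile.sh.

```shell
# Options taken from the thread; the variable name GA_OPTS is an assumption.
MAKEOPTS="USE_64TO32=y"
GA_OPTS="GA_DIR=ga-4-3 OLD_GA=y"

# Hypothetical wrapper: every call gets the same compiler and GA settings.
# With DRY_RUN=1 the command is only printed, not executed.
nw_make() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo make ${MAKEOPTS} FC=gfortran ${GA_OPTS} "$@"
    else
        make ${MAKEOPTS} FC=gfortran ${GA_OPTS} "$@"
    fi
}
```

The tools clean, tools build and link stages would then become nw_make clean, nw_make and nw_make link, each piped through tee as before.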

Gets Around
Thanks, this solves the problem and NWChem runs now in parallel across the nodes.

Forum Vet
Quote:Marcindulak Aug 29th 2:50 am
Thanks, this solves the problem and NWChem runs now in parallel across the nodes.


Very good

