Compiling NWChem on an IBM PPC64 running Linux


Clicked A Few Times
Does anyone have any experience compiling NWChem 6.1.1 on an IBM SP6 running Linux (not AIX) with the xlf compilers, in user mode?

Could you please post the environment settings?

The main difficulties I'm having are:
How to use ESSL properly?
Parallel environment: I've found the ppe.poe MPI libraries, but I have no idea how to link them properly, and I don't know which hardware flavour to specify.

If by chance anyone is a user there, the target machine is Huygens, the national supercomputer of the Netherlands.

Best,
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Dr. O. Baris Malcioglu,
University of Liege,
Bât. B5 Physique de la matière condensée
allée du 6 Août 17
4000 Liège 1
Belgique

Forum Regular
Hi Baris,

Your question is a slightly tricky one, mainly because I don't have an account on that machine and so cannot test whether what I suggest actually works. However, I think we can take a first stab at this and see how we go.

Given that we know relatively little about the machine, I would like to suggest using the build script. This script is still experimental and hence not widely advertised, but it seems to work on a variety of platforms. The only thing we need to be concerned with is that you want to use the ESSL library. Assuming that you can load ESSL by passing the -lessl flag to the linker, you should set the environment variable BLASOPT=-lessl (exactly how depends on whether you are using sh or bash or the (t)csh family of shells).
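
As a minimal sketch of that setting: it assumes, as above, that -lessl on its own is enough for the linker to find ESSL on your machine. If the library sits in a non-default directory an extra -L path would be needed, which I cannot guess from here.

```shell
# sh / bash / ksh family: set and export in one step
export BLASOPT=-lessl

# (t)csh family equivalent, shown as a comment because it is not sh syntax:
#   setenv BLASOPT -lessl

# Confirm the value that child processes such as the build script will inherit
echo "BLASOPT=${BLASOPT}"
```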

Next go into the nwchem-6.1.1 directory and run

    ./contrib/distro-tools/build_nwchem | tee build_nwchem.log

and see what happens. The output has a section entitled "Building NWChem" which provides details on what the script has detected on the machine. If everything works as it should the code should just compile. If not please post whatever the script said between "Building NWChem" and "configure GA", and any error messages. I can then scratch my head about why you got the results you got.
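
To pull out just the part of the log that is worth posting, something along these lines may help. It is a sketch that assumes the two headings appear in the log exactly as "Building NWChem" and "configure GA"; the function name extract_build_section is made up for illustration.

```shell
# Print the part of a build log from the first line containing
# "Building NWChem" up to the next line containing "configure GA"
# (both lines included).
extract_build_section() {
    sed -n '/Building NWChem/,/configure GA/p' "$1"
}

# Example:
#   extract_build_section build_nwchem.log > to_post.txt
```

Note that "| tee" on its own only captures standard output; putting 2>&1 before the pipe also sends the error messages into the log file.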

One comment: the script tries to guess all sorts of things. However, it was also written in the knowledge that, sitting in my office, I cannot possibly guess every possible setup of every machine on the planet. So if you have local knowledge of a given machine, you can set environment variables that override anything the script might think (even if your settings make no sense). This allows you to fully control what the script does, but it also often forces the script into a blind alley with no way out. So initially it is best to set nothing and let the script do its thing. If that approach is not successful, environment settings can be judiciously adjusted to help the script out. So far that approach has had more success than setting a lot of stuff up front.

I hope this helps, Huub

Just Got Here
Building and running on IBM Power755 Linux
At least I can *build* in a similar setup - I built NWChem 6.1.1 on our IBM Power755 running Linux (SUSE).

I can run a serial job, but parallel jobs via POE crash. My build is perhaps not good enough, but I figured it is still worth sharing (and I would like to ask for help too).

The environment variables I have are:


export NWCHEM_TOP=/hpc/home/seb56/pkg/nwchem-6.1.1-p7linux/
export NWCHEM_BASIS_LIBRARY=/usr/local/pkg/nwchem/nwchem-6.1.1/data/libraries/
export MP_HOSTFILE=/hpc/home/seb56/host.list
export CC="mpcc"
export F77="mpfort"
export MPICC="mpcc"
export MPIF77="mpfort"
export MPIEXEC=poe
export ARMCI_NETWORK=OPENIB
export IB_INCLUDE=/usr/include/infiniband
export MSG_COMMS=MPI
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export LARGE_FILES=TRUE
export MPI_LIB=/opt/ibmhpc/ppe.poe/lib64/
export MPI_INCLUDE='/opt/ibmhpc/ppe.poe/include/ibmmpi/ -I/opt/ibmhpc/ppe.poe/include/ibmmpi/thread64'
export LIBMPI="-L${MPI_LIB} -lmpi_ibm"
export NWCHEM_TARGET=LINUX64
export NWCHEM_TARGET_CPU=ppc64
export NWCHEM_MODULES=all
export CFLAGS='-qtune=pwr7 -qarch=pwr7 -q64 -qhot'
export FFLAGS="$CFLAGS"
export CFLAGS_FORGA="$CFLAGS"
export FFLAGS_FORGA="$CFLAGS"
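
A quick sanity check of settings like these can save a failed build later on. The following is only a sketch, not part of the build described here: the function name check_build_env is hypothetical, and it only tests that the wrappers named in CC, F77 and MPIEXEC are on the PATH and that the directories in NWCHEM_TOP, MPI_LIB and IB_INCLUDE exist.

```shell
# Report anything from the settings above that does not exist on this machine.
# Prints one "missing ..." line per problem; returns non-zero if any were found.
check_build_env() {
    problems=0
    # Compiler wrappers and the launcher must be reachable through PATH
    for tool in "$CC" "$F77" "$MPIEXEC"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool"
            problems=1
        fi
    done
    # Directories referenced by the build must exist
    for dir in "$NWCHEM_TOP" "$MPI_LIB" "$IB_INCLUDE"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir"
            problems=1
        fi
    done
    return $problems
}

# Example: check_build_env || echo "fix the settings before running make"
```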


I then took the following steps.

0. make nwchem_config

1. Edit $NWCHEM_TOP/src/config/makefile.h such that

(line 1924) CC=xlc
(line 1940) FOPTIMIZE= -O2 -qstrict -qarch=auto -qtune=auto -qcache=auto -qfloat=fltint (-O3 hangs at some stage)

2. Open $NWCHEM_TOP/src/nwpw/nwpwlib/Parallel/Parallel-tcgmsg.F and replace /* ... */ with **

(line 825) *        *determine psr - should be made w/o using tmp array! */
(line 964) * *determine psr - should be made w/o using tmp array! */



3. Back to $NWCHEM_TOP/src and enter "make FC=xlf CC=xlc"

Sometimes it will get stuck at:

checking for fork... yes

If so, open another terminal and run "ps -aux | grep conftest".

Kill the process listed as ./conftest (not the one listed as poe ./conftest) with "kill -KILL pid"; this wakes up the build process.
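
Picking the right PID by eye is easy to get wrong, so here is a small sketch of a filter for it. It assumes the stuck process shows up with a command line that starts with ./conftest, as in the ps listing described above; the function name stuck_conftest_pids is made up for illustration.

```shell
# Read `ps -eo pid,args` output on stdin and print the PIDs whose command
# is ./conftest itself; the "poe ./conftest" wrapper is left alone.
stuck_conftest_pids() {
    awk '$2 == "./conftest" { print $1 }'
}

# Example (inspect the PIDs before killing anything):
#   ps -eo pid,args | stuck_conftest_pids
#   kill -KILL <pid>
```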


If it fails complaining about *.fh files, do the following:

cp $NWCHEM_TOP/src/util/*.fh $NWCHEM_TOP/src/include


4. Back to $NWCHEM_TOP/src and enter "make FC=xlf CC=xlc" to carry on the build.

5. Towards the end of the build process, it links the "nwchem" executable with

xlf -q64 -qextname -qfixed  -NQ40000 -NT80000 -qmaxmem=8192 -qxlf77=leadzero -qintsize=8 -O2 -g   -L/hpc/home/seb56/pkg/nwchem-6.1.1-p7linux//lib/LINUX64_ppc64 -L/hpc/home/seb56/pkg/nwchem-6.1.1-p7linux//src/tools/install/lib  -o /hpc/home/seb56/pkg/nwchem-6.1.1-p7linux//bin/LINUX64_ppc64/nwchem nwchem.o stubs.o -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -lpspw -ltce -lbq -lcons -lperfm -ldntmc -lccca -lnwcutil -lga -lpeigs -lperfm -lcons -lbq -lnwcutil -llapack  -lblas   -L/opt/ibmhpc/ppe.poe/lib64/   -libverbs   

which fails because of unresolved references to MPI calls. I replaced "xlf" in the line above with "mpfort" and re-ran the command:

mpfort -q64 -qextname -qfixed  -NQ40000 -NT80000 -qmaxmem=8192 -qxlf77=leadzero -qintsize=8 -O2 -g   -L/hpc/home/seb56/pkg/nwchem-6.1.1-p7linux//lib/LINUX64_ppc64 -L/hpc/home/seb56/pkg/nwchem-6.1.1-p7linux//src/tools/install/lib  -o /hpc/home/seb56/pkg/nwchem-6.1.1-p7linux//bin/LINUX64_ppc64/nwchem nwchem.o stubs.o -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -lpspw -ltce -lbq -lcons -lperfm -ldntmc -lccca -lnwcutil -lga -lpeigs -lperfm -lcons -lbq -lnwcutil -llapack  -lblas   -L/opt/ibmhpc/ppe.poe/lib64/   -libverbs   

This time it linked correctly.
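
Rather than retyping that long command, the substitution can be scripted. This is a sketch only: it assumes the failing link command has been saved on a single line, and both the function name swap_linker and the file name link_cmd.txt are hypothetical.

```shell
# Read a link command on stdin and swap a leading "xlf" for "mpfort",
# leaving every other argument untouched.
swap_linker() {
    sed 's/^xlf /mpfort /'
}

# Example, run from the directory where the failing link command was issued:
#   swap_linker < link_cmd.txt | sh
```

Only the first word is replaced, so library names or flags that happen to contain "xlf" are not affected.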

Now, I have nwchem built in
$NWCHEM_TOP/bin/LINUX64_ppc64/nwchem

I then followed the "General site installation" instructions. The output of ldd indicates that the executable is correctly linked against libmpi_ibm, libpoe, libibverbs, etc.
$ ldd /usr/local/bin/nwchem
linux-vdso64.so.1 => (0x0000040000040000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00000400000a0000)
libmpi_ibm.so => /usr/lib64/libmpi_ibm.so (0x00000400000d0000)
libpoe.so => /usr/lib64/libpoe.so (0x0000040000380000)
liblapi.so => /usr/lib64/liblapi.so (0x00000400003e0000)
libxlf90_r.so.1 => /opt/ibmcmp/lib64/libxlf90_r.so.1 (0x0000040000620000)
libxlomp_ser.so.1 => /opt/ibmcmp/lib64/libxlomp_ser.so.1 (0x0000040000d50000)
libxlfmath.so.1 => /opt/ibmcmp/lib64/libxlfmath.so.1 (0x0000040000d70000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000040000d90000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000040000dc0000)
librt.so.1 => /lib64/power7/librt.so.1 (0x0000040000de0000)
libpthread.so.0 => /lib64/power7/libpthread.so.0 (0x0000040000e00000)
libm.so.6 => /lib64/power7/libm.so.6 (0x0000040000e40000)
libc.so.6 => /lib64/power7/libc.so.6 (0x0000040000f10000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000400010f0000)
/lib64/ld64.so.1 (0x0000040000000000)
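
The same check can be automated. The sketch below reads ldd output and lists any required library that is absent or reported as "not found"; the function name missing_libs is made up, and the list of libraries to require is an assumption you would adapt to your own machine.

```shell
# Read `ldd` output on stdin and print each library named as an argument
# that does not appear in it, or that ldd reports as "not found".
missing_libs() {
    ldd_out=$(cat)
    for lib in "$@"; do
        # Resolved means: at least one line mentions the library
        # and does not say "not found".
        if ! printf '%s\n' "$ldd_out" | grep -F "$lib" | grep -qv 'not found'; then
            echo "$lib"
        fi
    done
}

# Example:
#   ldd /usr/local/bin/nwchem | missing_libs libmpi_ibm.so libpoe.so libibverbs.so.1
```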


If I run a test script

$ poe nwchem test.nw

This works as expected and computes the test correctly.

If I run 2 processes

$ poe nwchem test.nw -procs 2

It instantly fails as shown below.

$ poe nwchem test.nw -procs 2
argument 1 = test.nw
0:Segmentation Violation error, status=: 11
(rank:0 hostname:p1n12-c pid:116227):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
nwchem[0x12472618]
nwchem[0x124c83e8]
[0x40000040418]
nwchem[0x124861c4]
nwchem[0x1252ec6c]
nwchem[0x124c4c18]
nwchem[0x12481174]
nwchem[0x12481f88]
nwchem[0x1247f2f0]
nwchem[0x124a27f0]
nwchem[0x124735bc]
nwchem[0x1247f230]
nwchem[0x124f9de8]
nwchem[0x123a0f00]
nwchem[0x10008790]
/lib64/power7/libc.so.6(+0x4f05c)[0x40000f5f05c]
/lib64/power7/libc.so.6(__libc_start_main-0x16ea7c)[0x40000f5f27c]
Last System Error Message from Task 0:: No such file or directory
ERROR: 0031-250 task 0: Terminated

(a while later...)

ERROR: 0032-171 Communication subsystem error:  2660-413 Communication timeout has occurred. in MPI_Recv, task 1
ERROR: 0032-171 Communication subsystem error: 2660-413 Communication timeout has occurred. in routine unknown, task 1


The backtrace lines of the form nwchem[0x.......] were enabled by editing src/tools/ga-5-1/armci/src/common/armci.c

(line 29) #define  PRINT_BT

(line 986) 
#if defined(PRINT_BT)
void *bt[100];
backtrace_symbols_fd(bt, backtrace(bt, 100), 2);
#endif
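
Since the executable was linked with -g, the raw addresses in such a backtrace can possibly be mapped back to function names with addr2line from GNU binutils. This is a sketch under two assumptions: that addr2line is installed on the machine and that the binary has not been stripped. The function name bt_addresses and the file name crash.log are hypothetical.

```shell
# Read a backtrace on stdin and print the hex address from each frame
# of the form nwchem[0x...]; frames from other objects are skipped.
bt_addresses() {
    sed -n 's/^nwchem\[\(0x[0-9a-fA-F]*\)].*/\1/p'
}

# Example (crash.log holds the crash output shown above):
#   bt_addresses < crash.log | addr2line -f -e "$NWCHEM_TOP/bin/LINUX64_ppc64/nwchem"
```

With no addresses on its command line, addr2line reads them from standard input, one per line, which is why the filter prints one address per line.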

Can anyone help please?

Thanks
Sung

--
Sung Eun Bae, Ph.D
Supercomputing Services and Support Consultant
BlueFern
University of Canterbury
Private Bag 4800
Christchurch 8140
New Zealand


http://www.bluefern.canterbury.ac.nz
Tel: +64 3 364 2987 ext 43070
Mobile: +64 21 238 1420
Fax: +64 3 364 3002


