Compiling/Running NWChem 6.8.1 over Intel Omni-Path with NVIDIA P100 offload


Dear Colleague,

I am trying to compile NWChem 6.8.1 for a cluster in which each node is equipped with dual Xeon Gold CPUs and 4x NVIDIA P100 GPUs, with the nodes interconnected by Intel Omni-Path. I plan to run CCSD(T) with offload to the GPUs. At first I wanted to use ARMCI_NETWORK=ARMCI according to Hammond's instructions (https://github.com/jeffhammond/HPCInfo/blob/master/ofi/NWChem-OPA.md), but I failed at the stage of compiling Casper, so I decided to try ARMCI_NETWORK=MPI-PR first. Here is the compilation setup:
Currently Loaded Modulefiles:
  1) intel/2018_u1
  2) mvapich2/gcc/64/2.2rc1
  3) hwloc/1.11.6
  4) cuda/8.0.61

export NWCHEM_TOP=$HOME/src/nwchem-6.8.1.MPI-PR
export FC=ifort
export CC=icc
export CXX=icpc
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export NWCHEM_TARGET=LINUX64
export USE_PYTHONCONFIG=y
export PYTHONVERSION=2.7
export PYTHONHOME=/usr
export BLASOPT="-L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm"
export USE_SCALAPACK=y
export SCALAPACK="-L$MKLROOT/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lmkl_blacs_intelmpi_ilp64 -lpthread -lm"
export NWCHEM_MODULES="all python"
export MRCC_METHODS=TRUE
export USE_OPENMP=1
export CUDA=nvcc
export TCE_CUDA=Y
export CUDA_LIBS="-L/pkg/cuda/8.0.61/lib64 -lcublas -lcudart"
export CUDA_FLAGS="-arch sm_60 "
export CUDA_INCLUDE="-I. -I/pkg/cuda/8.0.61/include"
export ARMCI_NETWORK=MPI-PR
export USE_MPI=y
export MPI_LOC=/usr/mpi/intel/mvapich2-2.2-hfi
export MPI_LIB=/usr/mpi/intel/mvapich2-2.2-hfi/lib
export MPI_INCLUDE=/usr/mpi/intel/mvapich2-2.2-hfi/include
export LIBMPI="-lmpichf90 -lmpich -lopa -lmpl -lpthread -libverbs -libumad -ldl -lrt"
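
A quick way to cross-check the LIBMPI list against what the MPI wrappers actually link (just a sanity-check sketch; the exact flags vary between installs):

# Compare the reported link lines against the LIBMPI setting above
$MPI_LOC/bin/mpif90 -show
$MPI_LOC/bin/mpicc -show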


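For completeness, the build itself followed the standard NWChem sequence, roughly (a sketch assuming the environment above is already exported; the -j value is arbitrary):

cd $NWCHEM_TOP/src
make nwchem_config NWCHEM_MODULES="all python"
make FC=ifort -j 8
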
These make commands completed successfully and I obtained the nwchem binary. However, I noticed in the build log that, although I had set CUDA_FLAGS to "-arch sm_60", nvcc still used -arch=sm_35 during compilation, for example:

nvcc -c -O3 -std=c++11 -DNOHTIME -Xptxas --warn-on-spills -arch=sm_35 -I. -I/pkg/cuda/8.0.61/include -I/home/molpro/src/nwchem-6.8.1.MPI-PR/src/tce/ttlg/includes -o memory.o memory.cu

and some warnings also showed up, such as:

nvcc -c -O3 -std=c++11 -DNOHTIME -Xptxas --warn-on-spills -arch=sm_35 -I. -I/pkg/cuda/8.0.61/include -I/home/molpro/src/nwchem-6.8.1.MPI-PR/src/tce/ttlg/includes -o sd_t_total_ttlg.o sd_t_total_ttlg.cu
Compiling ccsd_t_gpu.F...
./sd_top.fh(5): warning #7734: DEC$ ATTRIBUTES OFFLOAD is deprecated. [OMP_GET_WTIME]
cdir$ ATTRIBUTES OFFLOAD : mic :: omp_get_wtime

^
./sd_top.fh(5): warning #7734: DEC$ ATTRIBUTES OFFLOAD is deprecated. [OMP_GET_WTIME]
cdir$ ATTRIBUTES OFFLOAD : mic :: omp_get_wtime

^
... and so on.

Then I tried to run QA/tests/tce_cuda/tce_cuda.nw:

./runtests.mpi.unix procs 2 tce_cuda
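
(If I understand ARMCI_NETWORK=MPI-PR correctly, one MPI rank per node is reserved as a communication progress rank, so with "procs 2" only one rank actually computes. A roughly equivalent direct invocation would be the sketch below; the binary path assumes the default NWCHEM_TARGET=LINUX64 layout.)

cd $NWCHEM_TOP/QA/tests/tce_cuda
# 2 ranks on one node: 1 compute rank + 1 MPI-PR progress rank
mpirun -np 2 $NWCHEM_TOP/bin/LINUX64/nwchem tce_cuda.nw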

However, nwchem aborted almost immediately during the SCF stage, as shown in the attached output. May I ask:

1. Does the error come from an incorrect compilation on my part, from the wrong nvcc -arch flag, or do I need to adjust tce_cuda.nw to fit my running environment?
2. How can I set CUDA_FLAGS so that nvcc actually uses -arch=sm_60 for the P100 GPUs? The make command did not seem to honor my CUDA_FLAGS setting during compilation (a sketch of how I plan to track down where sm_35 comes from follows this list).
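
For question 2, the sketch below is only my guess at how to locate the hard-coded flag; I have not confirmed which makefile sets it (pointing at the TTLG directory is an assumption based on the include path in the compile line above):

# Find where the CUDA arch is actually set in the TCE sources
grep -rn "sm_35" $NWCHEM_TOP/src/tce/
# If it is hard-coded in a makefile (e.g. somewhere under src/tce/ttlg/),
# it could presumably be changed to sm_60 there and the tce directory rebuilt.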

Thanks a lot for your kind help.

Kenny

The errors from the attached tce_cuda.out follow:

argument  1 = /home/chem/src/nwchem-6.8.1.MPI-PR/QA/tests/tce_cuda/tce_cuda.nw
NWChem w/ OpenMP: maximum threads = 1

...

!! The overlap matrix has   2 vectors deemed linearly dependent with
eigenvalues:
0.00D+00 0.00D+00


Superposition of Atomic Density Guess
-------------------------------------

Sum of atomic energies: -75.76222910
------------------------------------------------------------------------
ga_orthog: hard zero 1
------------------------------------------------------------------------
------------------------------------------------------------------------
current input line :
41: task tce energy
------------------------------------------------------------------------
------------------------------------------------------------------------
This error has not yet been assigned to a category
------------------------------------------------------------------------
For more information see the NWChem manual at
http://nwchemgit.github.io/index.php/NWChem_Documentation


For further details see manual section: 
No section for this category



[0] Received an Error in Communication: (1) 0:ga_orthog: hard zero:
[cli_0]: aborting job:
application called MPI_Abort(comm=0x84000004, 1) - process 0

=======================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 151045 RUNNING AT glogin1
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=======================================================================

