Enabling TCE CUDA


Just Got Here
Hi,

we have a setup comprising several blades, each equipped with two 8-way CPUs (Intel Xeon E5-2620 v2) and two Tesla K40 GPUs. In the environment
Currently Loaded Modules:
  1) StdEnv    2) intel/13.1.0    3) mkl/11.0.2    4) intelmpi/4.1.0    5) cuda/5.5

I compiled the version
    source          = /wrk/runeberg/taito_wrkdir/gpu/nwchem-src-2014-01-28
    nwchem branch   = Development
    nwchem revision = 25178
    ga revision     = 10467

using the setup
export NWCHEM_TOP=$PWD/nwchem-src-2014-01-28
export NWCHEM_TARGET=LINUX64
export USE_MPI=y
export USE_MPIF=y

export NWCHEM_MODULES=all
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE

export BLAS_LIB="-L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm"
export BLASOPT="$BLAS_LIB"
export BLAS_SIZE=8
export SCALAPACK_SIZE=8
export SCALAPACK="-L$MKLROOT/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm"
export SCALAPACK_LIB="$SCALAPACK"
export USE_SCALAPACK=y

export TCE_CUDA=y
export CUDA_LIBS="-L$CUDA_PATH/lib64 -lcudart -lcublas"
export CUDA_FLAGS="-arch sm_35"
export CUDA_INCLUDE="-I. -I$CUDA_PATH/include"
export CUDA=nvcc
export PATH=$PATH:$CUDA_PATH/bin
cd $NWCHEM_TOP
./contrib/distro-tools/build_nwchem | tee build_nwchem.log


The build went smoothly, but for some reason I can't get the binary to cooperate with
our setup and queuing system (Slurm 2.6.7): I can only launch one MPI process per GPU.
I can launch several MPI processes, even across several blades (though the
performance is quite bad), but can only engage one GPU per MPI process. Any ideas?

Cheers,
             --Nino

Forum Regular
Hi Nino,

At present we have only implemented the logic to control one GPU card per node. We simply set aside a single MPI process to control the GPU, and the other MPI processes on the same node should work the same way as in the code without GPU support. There are various potential options for extending this to multiple GPUs per node, but we haven't established a general approach to doing this yet.
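
As a rough illustration of that arrangement (this is not the actual NWChem code, just a sketch of the idea), the snippet below takes a list of "rank hostname" pairs and marks one rank per node as the GPU controller, leaving the rest as ordinary CPU ranks. The helper name pick_gpu_ranks, the choice of the lowest rank on each node, and the node names are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: reads "rank hostname" pairs on
# stdin and labels the lowest rank on each node "gpu", all other ranks "cpu".
pick_gpu_ranks() {
    sort -n | awk '
        {
            if (!($2 in seen)) {   # first (lowest) rank seen on this node
                seen[$2] = 1
                role = "gpu"
            } else {               # node already has its GPU controller
                role = "cpu"
            }
            print $1, $2, role
        }'
}

# Example: four ranks spread over two made-up nodes.
printf '0 node1\n1 node1\n2 node2\n3 node2\n' | pick_gpu_ranks
```

With two GPUs per blade, as in your setup, this kind of scheme leaves the second card on each node idle, which matches what you are seeing.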

Huub

