Hi,
we have a setup comprising of several blades, each equipped with two 8-way CPUs (Intel Xeon E5-2620-V2) and two Tesla K40 GPUs. In the environment
Currently Loaded Modules:
1) StdEnv 2) intel/13.1.0 3) mkl/11.0.2 4) intelmpi/4.1.0 5) cuda/5.5
I compiled the version
source = /wrk/runeberg/taito_wrkdir/gpu/nwchem-src-2014-01-28
nwchem branch = Development
nwchem revision = 25178
ga revision = 10467
using the setup
export NWCHEM_TOP=$PWD/nwchem-src-2014-01-28
export NWCHEM_TARGET=LINUX64
export USE_MPI=y
export USE_MPIF=y
export NWCHEM_MODULES=all
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export BLAS_LIB="-L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm"
export BLASOPT="$BLAS_LIB"
export BLAS_SIZE=8
export SCALAPACK_SIZE=8
export SCALAPACK="-L$MKLROOT/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm"
export SCALAPACK_LIB="$SCALAPACK"
export USE_SCALAPACK=y
export TCE_CUDA=y
export CUDA_LIBS="-L$CUDA_PATH/lib64 -lcudart -lcublas"
export CUDA_FLAGS="-arch sm_35"
export CUDA_INCLUDE="-I. -I$CUDA_PATH/include"
export CUDA=nvcc
export PATH=$PATH:$CUDA_PATH/bin
[runeberg@taito-login3 gpu]$ cd $NWCHEM_TOP
./contrib/distro-tools/build_nwchem | tee build_nwchem.log
The build went smoothly but for some reason I can't get the binary to cooperate with
our setup/queuing system, slurm 2.6.7 - I can only launch one mpi process per gpu.
I can launch several mpi processes even over several blades (though the
performance is quite bad) but only engage one gpu per mpi-core. Any ideas?
Cheers,
--Nino
|