Multiple errors running NWChem 6.5 on Intel Xeon


Clicked A Few Times
Hello,
I installed NWChem 6.5 on the Helios system with Intel Xeon Phi. Here is the input file. The same input works fine with NWChem 6.3 on a Cray.
These are the errors I get after launching nwchem:
1) 0:Floating Point Exception error, status=: 8
2) (rank:0 hostname:helios775 pid:31625):ARMCI DASSERT fail. ../../ga-5-3/armci/src/common/signaltrap.c:SigFpeHandler():249 cond:0
3) Last System Error Message from Task 0:: No such file or directory
Each error is repeated 16 times. Output

Compilation parameters:
Quote:

NWCHEM_TOP /csc/home2/$usr/NWChem-6.5
NWCHEM_TARGET LINUX64
ARMCI_NETWORK OPENIB

- MPI settings
USE_MPI y
USE_MPIF y
USE_MPIF4 y

MPI_LOC /opt/mpi/bullxmpi/1.2.8.2
MPI_LIB $MPI_LOC/lib
MPI_INCLUDE $MPI_LOC/include
LIBMPI "-lmpi_f90 -lmpi_f77 -lmpi -ldl -lm -lnuma -Wl,--export-dynamic -lrt -lnsl -lutil -lm -ldl"

- Modules
NWCHEM_MODULES all

- Additional environmental variables
LARGE_FILES TRUE
USE_NOFSCHECK TRUE
MRCC_THEORY TRUE

- Intel Xeon Phi
USE_OPENMP 1
USE_OFFLOAD 1

- Compilation
cd $NWCHEM_TOP/src
make nwchem_config
make FC=ifort CC=icc >& make.log
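
For readers who want to reproduce this build, the settings listed above could be exported from a small script along the following lines. This is only a sketch: the default NWCHEM_TOP path is a placeholder, USE_MPIF is written with the spelling the NWChem build instructions use, and the build_nwchem helper is a hypothetical name introduced here. It is defined but not called, so the settings can be inspected first.

```shell
#!/bin/bash
# Sketch: export the build settings quoted in the post.
# Placeholder default path; point NWCHEM_TOP at your own source tree.
export NWCHEM_TOP="${NWCHEM_TOP:-$HOME/NWChem-6.5}"
export NWCHEM_TARGET=LINUX64
export ARMCI_NETWORK=OPENIB

# MPI settings
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/opt/mpi/bullxmpi/1.2.8.2
export MPI_LIB="$MPI_LOC/lib"
export MPI_INCLUDE="$MPI_LOC/include"
export LIBMPI="-lmpi_f90 -lmpi_f77 -lmpi -ldl -lm -lnuma -Wl,--export-dynamic -lrt -lnsl -lutil -lm -ldl"

# Modules and additional variables, copied as given in the post
export NWCHEM_MODULES=all
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export MRCC_THEORY=TRUE

# Intel Xeon Phi
export USE_OPENMP=1
export USE_OFFLOAD=1

# Hypothetical helper wrapping the three build commands from the post.
build_nwchem() {
    cd "$NWCHEM_TOP/src" || return 1
    make nwchem_config || return 1
    make FC=ifort CC=icc > make.log 2>&1
}
```

Running build_nwchem after sourcing the script would start the build and write the compiler output to make.log.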


Here is the script that starts NWChem:
Quote:
#!/bin/bash
#SBATCH -J water_test # jobname
#SBATCH -A NAME # project name
#SBATCH -N 1 # number of nodes
#SBATCH -n 16 # number of tasks
#SBATCH -o %j.out # stdout filename (%j is jobid)
#SBATCH -e %j.err # stderr filename (%j is jobid)
#SBATCH -t 12:00:00 # execute time
# mpirun -np
export OMP_NUM_THREADS=4
export MIC_USE_2MB_BUFFER=16K
export ARMCI_OPENIB_DEVICE=mlx4_0
module load intel bullxmpi srun

mpirun -np 16 /csc/scratch2/kotomin/nwchem nwchem.nw

• MPI - bullxmpi/1.2.8.2

Thanks in advance!

Forum Vet
Could you try running with the following environment variable set? It disables Xeon Phi offloading.

NWC_RANKS_PER_DEVICE=0
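
In the job script from the first post, this could be set just before the mpirun line, roughly as sketched here. The -x flag is an assumption: it is how Open MPI's mpirun forwards a variable to all ranks, and bullxmpi is assumed to behave the same way. The launch line is printed rather than executed so it can be checked first.

```shell
# Disable Xeon Phi offloading for this run.
export NWC_RANKS_PER_DEVICE=0

# Assumption: mpirun accepts Open MPI's -x option to forward the variable.
# Binary path and input file name are the ones from the original job script.
launch_cmd="mpirun -np 16 -x NWC_RANKS_PER_DEVICE /csc/scratch2/kotomin/nwchem nwchem.nw"

# Print the command for inspection instead of running it.
echo "$launch_cmd"
```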

PS Firefox 35 refuses to access the dropbox URL you posted. Could you post the full output here?

Clicked A Few Times
It works! Thank you!

Now I only get a warning in the output:
Quote:output
1: WARNING:armci_set_mem_offset: offset changed -837935153152 to -837930958848

It does not stop the calculation, though. Can it cause problems later, or is it OK?

Forum Vet
This might cause problems.
Please define
ARMCI_DEFAULT_SHMMAX=8192
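
A minimal sketch of how this might look in the job script, together with a quick look at the kernel's shared-memory limit. It assumes the ARMCI_DEFAULT_SHMMAX value is given in megabytes and that the kernel limit (reported in bytes) is readable at /proc/sys/kernel/shmmax on the compute node. Both are assumptions to verify on your system.

```shell
# Value suggested in the reply; assumed here to be in megabytes.
export ARMCI_DEFAULT_SHMMAX=8192

# Convert the request to bytes so it can be compared by eye with the kernel limit.
requested_bytes=$((ARMCI_DEFAULT_SHMMAX * 1024 * 1024))
echo "ARMCI_DEFAULT_SHMMAX requests $requested_bytes bytes"

# Show the kernel's per-segment limit, if this node exposes it.
if [ -r /proc/sys/kernel/shmmax ]; then
    kernel_shmmax=$(cat /proc/sys/kernel/shmmax)
    echo "kernel.shmmax is $kernel_shmmax bytes"
fi
```

If the kernel limit turns out to be smaller than the requested size, that would be worth raising with the system administrators.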


