ARMCI DASSERT fail


Hi all,

I'm trying to run a CCSD(T) calculation, but I keep running into the following problem:

(rank:0 hostname:an-24 pid:71069):ARMCI DASSERT fail. ../../ga-5.6.3/armci/src/devices/openib/openib.c:armci_call_data_server():2209 cond:(pdscr->status==IBV_WC_SUCCESS)

I'm using NWChem-6.8 (https://github.com/nwchemgit/nwchem/archive/v6.8-release.tar.gz).

Here is my job script:
#!/bin/bash
#SBATCH -N 8
#SBATCH --mem 400000
#SBATCH --ntasks-per-node=40
#SBATCH -t 0-8:00 # time (D-HH:MM)

module load OpenMPI
export NWCHEM_TOP=${HOME}/nwchem-6.8-release
export NWCHEM_TARGET=LINUX64
export ARMCI_NETWORK=OPENIB
export ARMCI_DEFAULT_SHMMAX_UBOUND=368640
export ARMCI_DEFAULT_SHMMAX=368640
export USE_MPI=y
export NWCHEM_MODULES=all
export USE_MPIF=y
export USE_MPIF4=y

export USE_INTERNALBLAS=y
export SCALAPACK_SIZE=8
export BLAS_SIZE=8
export LAPACK_SIZE=8

export PYTHONHOME=/usr
export PYTHONVERSION=2.7
export USE_PYTHONCONFIG=Y
export USE_CPPRESERVE=y
export USE_NOFSCHECK=y

export TCE_CUDA=y
export CUDA_INCLUDE="-I${CUDA_ROOT}/include"
export CUDA_LIBS="${CUDA_ROOT}/lib64/libcublas.so ${CUDA_ROOT}/lib64/libcudart.so"
export GPU_ARCH="sm_70"

export IPCCSD=y
export EACCSD=y
export MRCC_METHODS=y

export USE_OPENMP=y

unset MPI_INCLUDE
unset MPI_LIB
unset LIBMPI
unset GA_DEV

export OMP_NUM_THREADS=1
export KMP_AFFINITY=scatter

mpirun -n 320 -npernode 40 ${NWCHEM_TOP}/bin/LINUX64/nwchem ${NWCHEM_TOP}/run/input.nw


And here is the (partial) output:
argument  1 = /nwchem-6.8-release/run/input.nw
NWChem w/ OpenMP: maximum threads =  1



============================== echo of input deck ==============================
title "uracil-6-31-Gs"
echo
start uracil-6-31-Gs

memory stack 2500 mb heap 300 mb global 5000 mb noverify

basis cartesian
* library 6-31G*
end

scf
thresh 1.0e-10
tol2e 1.0e-10
singlet
rhf
end

tce
freeze atomic
ccsd(t)
tilesize 24
2eorb
2emet 13
attilesize 40
thresh 1.0d-1
cuda 6
end

task tce energy
================================================================================
           Job information
           ---------------

    hostname        = an-24
    program         = /nwchem-6.8-release/bin/LINUX64/nwchem
    date            = Tue Jun 26 10:48:33 2018

    compiled        = Mon_Jun_25_21:44:59_2018
    source          = /nwchem-6.8-release
    nwchem branch   = 6.8
    nwchem revision = N/A
    ga revision     = ga-5.6.3
    use scalapack   = F
    input           = /nwchem-6.8-release/run/input.nw
    prefix          = uracil-6-31-Gs.
    data base       = ./uracil-6-31-Gs.db
    status          = startup
    nproc           =      320
    time left       =     -1s



           Memory information
           ------------------

    heap     =   39321596 doubles =    300.0 Mbytes
    stack    =  327680001 doubles =   2500.0 Mbytes
    global   =  655360000 doubles =   5000.0 Mbytes (distinct from heap & stack)
    total    = 1022361597 doubles =   7800.0 Mbytes
    verify   = no 
    hardfail = no 


           Directory information
           ---------------------

  0 permanent = .
  0 scratch   = .




                                NWChem Input Module
                                -------------------


                                  uracil-6-31-Gs
                                  --------------

 Scaling coordinates for geometry "geometry" by  1.889725989
 (inverse scale =  0.529177249)

 Turning off AUTOSYM since
 SYMMETRY directive was detected!


          ------
          auto-z
          ------
  autoz: The atoms group into disjoint clusters
 cluster   1:    1    2    3    4    5    6    7    8    9   10   11
                12
 cluster   2:   13   14   15   16   17   18   19   20   21   22   23
                24
 cluster   3:   25   26   27   28   29   30   31   32   33   34   35
                36
     1 autoz failed with cvr_scaling = 1.2 changing to 1.3
  autoz: The atoms group into disjoint clusters
 cluster   1:    1    2    3    4    5    6    7    8    9   10   11
                12
 cluster   2:   13   14   15   16   17   18   19   20   21   22   23
                24
 cluster   3:   25   26   27   28   29   30   31   32   33   34   35
                36
     2 autoz failed with cvr_scaling = 1.3 changing to 1.4
  autoz: The atoms group into disjoint clusters
 cluster   1:    1    2    3    4    5    6    7    8    9   10   11
                12
 cluster   2:   13   14   15   16   17   18   19   20   21   22   23
                24
 cluster   3:   25   26   27   28   29   30   31   32   33   34   35
                36
     3 autoz failed with cvr_scaling = 1.4 changing to 1.5
  autoz: The atoms group into disjoint clusters
 cluster   1:    1    2    3    4    5    6    7    8    9   10   11
                12
 cluster   2:   13   14   15   16   17   18   19   20   21   22   23
                24
 cluster   3:   25   26   27   28   29   30   31   32   33   34   35
                36
     4 autoz failed with cvr_scaling = 1.5 changing to 1.6
  autoz: The atoms group into disjoint clusters
 cluster   1:    1    2    3    4    5    6    7    8    9   10   11
                12
 cluster   2:   13   14   15   16   17   18   19   20   21   22   23
                24
 cluster   3:   25   26   27   28   29   30   31   32   33   34   35
                36
     5 autoz failed with cvr_scaling = 1.6 changing to 1.7
      warning. autoz generated    7 bonds for atom    1
      warning. autoz generated    7 bonds for atom   13
      warning. autoz generated    7 bonds for atom   25
  autoz: The atoms group into disjoint clusters
 cluster   1:    1    2    3    4    5    6    7    8    9   10   11
                12
 cluster   2:   13   14   15   16   17   18   19   20   21   22   23
                24
 cluster   3:   25   26   27   28   29   30   31   32   33   34   35
                36

 AUTOZ failed to generate good internal coordinates.
 Cartesian coordinates will be used in optimizations.


....
            General Information
            -------------------
      Number of processors :   320
         Wavefunction type : Restricted Hartree-Fock
          No. of electrons :   174
           Alpha electrons :    87
            Beta electrons :    87
           No. of orbitals :   768
            Alpha orbitals :   384
             Beta orbitals :   384
        Alpha frozen cores :    24
         Beta frozen cores :    24
     Alpha frozen virtuals :     0
      Beta frozen virtuals :     0
         Spin multiplicity : singlet 
    Number of AO functions :   384
       Number of AO shells :   168
        Use of symmetry is : off
      Symmetry adaption is : off
         Schwarz screening : 0.10D-09

          Correlation Information
          -----------------------
          Calculation type : Coupled-cluster singles & doubles w/ perturbation           
   Perturbative correction : (T)                                                         
            Max iterations :      100
        Residual threshold : 0.10D+00
     T(0) DIIS level shift : 0.00D+00
     L(0) DIIS level shift : 0.00D+00
     T(1) DIIS level shift : 0.00D+00
     L(1) DIIS level shift : 0.00D+00
     T(R) DIIS level shift : 0.00D+00
     T(I) DIIS level shift : 0.00D+00
   CC-T/L Amplitude update :  5-th order DIIS
                I/O scheme : Global Array Library
        L-threshold :  0.10D+00
        EOM-threshold :  0.10D+00
 no EOMCCSD initial starts read in
 TCE RESTART OPTIONS
 READ_INT:   F
 WRITE_INT:  F
 READ_TA:    F
 WRITE_TA:   F
 READ_XA:    F
 WRITE_XA:   F
 READ_IN3:   F
 WRITE_IN3:  F
 SLICE:      F
 D4D5:       F

            Memory Information
            ------------------
          Available GA space size is    ********** doubles
          Available MA space size is     366945284 doubles

 Maximum block size supplied by input
 Maximum block size        24 doubles

 tile_dim =     23

 Block   Spin    Irrep     Size     Offset   Alpha
 -------------------------------------------------
   1    alpha     a     21 doubles       0       1
   2    alpha     a     21 doubles      21       2
   3    alpha     a     21 doubles      42       3
   4    beta      a     21 doubles      63       1
   5    beta      a     21 doubles      84       2
   6    beta      a     21 doubles     105       3
   7    alpha     a     22 doubles     126       7
   8    alpha     a     23 doubles     148       8
   9    alpha     a     23 doubles     171       9
  10    alpha     a     23 doubles     194      10
  11    alpha     a     23 doubles     217      11
  12    alpha     a     23 doubles     240      12
  13    alpha     a     22 doubles     263      13
  14    alpha     a     23 doubles     285      14
  15    alpha     a     23 doubles     308      15
  16    alpha     a     23 doubles     331      16
  17    alpha     a     23 doubles     354      17
  18    alpha     a     23 doubles     377      18
  19    alpha     a     23 doubles     400      19
  20    beta      a     22 doubles     423       7
  21    beta      a     23 doubles     445       8
  22    beta      a     23 doubles     468       9
  23    beta      a     23 doubles     491      10
  24    beta      a     23 doubles     514      11
  25    beta      a     23 doubles     537      12
  26    beta      a     22 doubles     560      13
  27    beta      a     23 doubles     582      14
  28    beta      a     23 doubles     605      15
  29    beta      a     23 doubles     628      16
  30    beta      a     23 doubles     651      17
  31    beta      a     23 doubles     674      18
  32    beta      a     23 doubles     697      19

 Global array virtual files algorithm will be used

 Parallel file system coherency ......... OK

 Integral file          = ./uracil-6-31-Gs.aoints.000
 Record size in doubles =    65536    No. of integs per rec  =    32766
 Max. records in memory =      263    Max. records in file   =  7474008
 No. of bits per label  =       16    No. of bits per value  =       64


 #quartets = 9.792D+06 #integrals = 1.193D+08 #direct =  0.0% #cached =100.0%


File balance: exchanges=   443  moved=   598  time=   0.1


 Fock matrix recomputed
 1-e file size   =           129600
 1-e file name   = ./uracil-6-31-Gs.f1
 Cpu & wall time / sec            0.8            1.0
 4-electron integrals stored in orbital form

 v2    file size   =       2387981089
 4-index algorithm nr.  13 is used
 imaxsize =       40
 imaxsize ichop =        0
0: error ival=4
(rank:0 hostname:an-24 pid:71069):ARMCI DASSERT fail. ../../ga-5.6.3/armci/src/devices/openib/openib.c:armci_call_data_server():2209 cond:(pdscr->status==IBV_WC_SUCCESS)
80: error ival=10
(rank:80 hostname:an-26 pid:11116):ARMCI DASSERT fail. ../../ga-5.6.3/armci/src/devices/openib/openib.c:armci_call_data_server():2209 cond:(pdscr->status==IBV_WC_SUCCESS)
40: error ival=4
(rank:40 hostname:an-25 pid:67763):ARMCI DASSERT fail. ../../ga-5.6.3/armci/src/devices/openib/openib.c:armci_call_data_server():2209 cond:(pdscr->status==IBV_WC_SUCCESS)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 80 in communicator MPI COMMUNICATOR 4 DUP FROM 0 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
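
As a rough sanity check on the memory settings above, the per-node footprint implied by the "memory" line of the input can be compared with the Slurm request. This is only a sketch: it assumes the NWChem memory directive applies per process and that Slurm reads the bare number given to --mem as megabytes. It ignores ARMCI/InfiniBand buffers, GPU allocations and OS overhead, so it does not by itself explain the DASSERT failure. The variable names are illustrative only.

```shell
#!/bin/bash
# Values copied from the "memory" line of the input deck (MB per process).
heap_mb=300
stack_mb=2500
global_mb=5000

# Values copied from the job script.
procs_per_node=40      # --ntasks-per-node
slurm_mem_mb=400000    # --mem (assumed to be in megabytes)

# Memory requested by one NWChem process, then by all processes on a node.
per_process_mb=$((heap_mb + stack_mb + global_mb))
per_node_mb=$((per_process_mb * procs_per_node))

# What is left of the Slurm allocation for everything else on the node.
headroom_mb=$((slurm_mem_mb - per_node_mb))

echo "per process: ${per_process_mb} MB"
echo "per node:    ${per_node_mb} MB"
echo "headroom:    ${headroom_mb} MB"
```

The per-process figure matches the 7800.0 Mbytes total reported under "Memory information" in the output, so on paper the request fits within the node allocation.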


Has anyone encountered something similar, or does anyone know how to fix this?
Any help is greatly appreciated.

Pav