NWChem 6.8.1 MA Error


Dear NWChem Developers,

I recently compiled NWChem 6.8.1 on our cluster. I first ran some small jobs, and they all finished without problems. However, an unusual error message appeared when I ran a relatively large job. The input for that job is:

Quote:username
echo
# memory global 3200 mb stack 2000 mb heap 800 mb
memory 6000 mb
start c4h2o

title "Title"
charge 0

geometry units angstroms print xyz noautosym noautoz
H -0.023716 0.095026 0.015095
C -0.024184 0.026600 1.093963
H 0.910417 0.094036 1.633048
C -1.156356 -0.117410 1.747548
C -2.253150 -0.315204 2.380689
C -3.334422 0.019761 3.005146
O -4.337690 0.197190 3.584511
end

basis
 * library cc-pvdz
end

tce
 freeze core
end

task tce optimize


The last few lines of the output read:

Quote:username
tce_mo2e: fast2e=1
2-e integrals stored in memory

MA_verify_allocator_stuff: starting scan ...
stack block 'pqboff', handle 21, address 0xccc4d0e8:
       current checksum 3435451402 != stored checksum 0
MA_verify_allocator_stuff: scan completed
------------------------------------------------------------------------
tce_mo2e: MA problem 5
------------------------------------------------------------------------
------------------------------------------------------------------------
current input line :
27: task tce optimize
------------------------------------------------------------------------

The error message from SLURM reads:
Quote:username
MA error: MA_verify_allocator_stuff:
                               heap    stack
                               ----    -----
checksum errors                   0        1
left signature errors             0        0
right signature errors            0        0
total bad blocks                  0        1
total blocks                     17       27

0:tce_mo2e: MA problem:Received an Error in Communication
application called MPI_Abort(comm=0x84000000, 5) - process 0
In: PMI_Abort(5, application called MPI_Abort(comm=0x84000000, 5) - process 0)
srun: Job step aborted: Waiting up to 122 seconds for job step to finish.
slurmstepd: error: *** STEP 103469.0 ON node-0063 CANCELLED AT 2019-06-20T10:38:06 ***
srun: error: node-0065: task 2: Killed
srun: Terminating job step 103469.0
srun: error: node-0064: task 1: Killed
srun: error: node-0066: task 3: Killed
srun: error: node-0063: task 0: Exited with exit code 5


The failure seems to occur while NWChem is allocating memory for the calculation, although it is not a typical "not enough memory" error. Could you please shed some light on this problem?

Thank you!