CreateSharedRegion: kr malloc Numerical result out of range


What are the error or warning messages in the error file (not the output file)? Any armci_set_mem_offset messages?

One thing I set is "setenv ARMCI_DEFAULT_SHMMAX 4096"

I was also running with 4 cores/node on our 8 core/node system (i.e. half filled).

Looks like you are running 26 and 24 processes, respectively? I generally run "mpirun -np 32". We're running HPMPI, not OpenMPI, and we are definitely not running with threads.
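
As a rough sanity check on the numbers in the quoted post below, the heap, stack, and global values on each NWChem "memory" line can be added up to get the per-process request. The sketch below is only illustrative: the helper names (parse_memory_directive, total_mb) are made up for this example, and it assumes every value is written in "mb" as in the quoted inputs. Whether the scheduler's "-l mem=" limit applies per slot or per job is site-specific and is not checked here.

```python
import re

def parse_memory_directive(line):
    """Extract heap/stack/global sizes (in mb) from an NWChem memory line.

    Hypothetical helper for illustration; it only handles values given in mb.
    """
    pairs = re.findall(r"(heap|stack|global)\s+(\d+)\s+mb", line, flags=re.IGNORECASE)
    return {name.lower(): int(value) for name, value in pairs}

def total_mb(line):
    """Sum of heap + stack + global, i.e. the per-process request in mb."""
    return sum(parse_memory_directive(line).values())

# Memory lines copied from the two quoted inputs.
run1 = "memory heap 100 mb stack 1000 mb global 2400 mb"
run2 = "memory heap 117 mb stack 1171 mb global 2808 mb"

for label, line in (("run 1", run1), ("run 2", run2)):
    parts = parse_memory_directive(line)
    print(label, parts, "total:", total_mb(line), "mb")
```

The totals come to 3500 mb for run 1 and 4096 mb for run 2, which match the "-l mem=3500M" and "-l mem=4G" requests in the two job files, so each process is asking NWChem for essentially everything the scheduler line requests.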

Bert


Quote: KarlB, Jun 25th 7:16 pm
Hey Bert,

Thank you for your time and for trying to help me solve this problem.

I've tried it twice with your numbers, first in what I think is an idealized setup and then in an improved one, and it still doesn't work for me. How did you arrive at those numbers, and what might differ between my system and yours that's causing the problem?

Let me know if you need the full files for any reason or more info of some sort.

(Job file1)
#$ -N Title
#$ -pe threads 26
#$ -l mem=3500M
#$ -q medium_chi
#$ -l proc_vendor=AMD
#$ -cwd

force coredumpsize 1
module load nwchem/6.1
/data/apps/openmpi/1.4.3-gcc/bin/mpirun nwchem macrofe_vibded.nw > macrofe_vibded.out
(/Job file1)

(input1)
start
echo
title "macrovib"
scratch_dir /somedirectories/.scratch
memory heap 100 mb stack 1000 mb global 2400 mb

geometry noautoz
Tons and Tons of Geo
end

charge 2

basis
 * library 6-31g**
end

dft
 xc b3lyp
 iterations 1000
 noio
 direct
 grid nodisk
end

driver
 maxiter 1000
end

task dft freq
(/input1)

(result1)
...
     2   2 0 0   -215.909422 -13516.764569 -13516.764569  26817.619717
     2   1 1 0     -7.361729      9.715923      9.715923    -26.793575
     2   1 0 1      0.102928     -0.219199     -0.219199      0.541326
     2   0 2 0   -190.652630 -18119.815610 -18119.815610  36048.978590
     2   0 1 1      0.084968     -0.339414     -0.339414      0.763796
     2   0 0 2   -261.530067  -3482.457806  -3482.457806   6703.385545

0:0:ndai_get failed:: -1977
(rank:0 hostname:chi16 pid:23323):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0
(/result1)

************************************************************************
****************************** Run 2 ************************************
************************************************************************

(Job file2)
#$ -N macfe_ded
#$ -pe threads 24
#$ -l mem=4G
#$ -q medium_chi
#$ -l proc_vendor=AMD
#$ -cwd

force coredumpsize 1
module load nwchem/6.1
/data/apps/openmpi/1.4.3-gcc/bin/mpirun nwchem macrofe_vibded.nw > macrofe_vibded.out
(/Job file2)

(input2)
Identical save...
memory heap 117 mb stack 1171 mb global 2808 mb
(/input2)

(result2)
...
     2   2 0 0   -215.909422 -13516.764569 -13516.764569  26817.619717
     2   1 1 0     -7.361729      9.715923      9.715923    -26.793575
     2   1 0 1      0.102928     -0.219199     -0.219199      0.541326
     2   0 2 0   -190.652630 -18119.815610 -18119.815610  36048.978590
     2   0 1 1      0.084968     -0.339414     -0.339414      0.763796
     2   0 0 2   -261.530067  -3482.457806  -3482.457806   6703.385545

0:0:ndai_get failed:: -1977
(rank:0 hostname:chi14 pid:14312):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0
(/result2)