CreateSharedRegion: kr_malloc Numerical result out of range


Just Got Here
Someone in the lab I work in has been trying to run some calculations with NWChem, and I've been helping him get started running it with MPI. It has been a bumpy ride, however. At first we were getting errors about not being able to allocate a shared block of memory. SHMMAX was already plenty high (as large as all of the physical memory), so I created a swap file, restarted the calculations, and waited.
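
For reference, checking SHMMAX and setting up a swap file looks roughly like this; the size and path below are placeholders rather than the exact values we used:

sysctl kernel.shmmax                                  # kernel shared-memory segment limit, in bytes
sudo dd if=/dev/zero of=/swapfile bs=1M count=16384   # create a 16 GB swap file (placeholder size)
sudo mkswap /swapfile
sudo swapon /swapfile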

Now they are crashing again, but this time with a quite different error message. From what I can tell, it is another problem with allocating shared memory, but it looks like NWChem is passing an invalid (i.e., negative) size to the allocation function.

Here is the relevant error message:
 0:CreateSharedRegion:kr_malloc failed KB=: -772361
(rank:0 hostname:vivaldi.chem.utk.edu pid:3128):ARMCI DASSERT fail. ../../ga-5-1/armci/src/memory/shmem.c:Create_Shared_Region():1188 cond:0
Last System Error Message from Task 0:: Numerical result out of range
application called MPI_Abort(comm=0x84000007, -772361) - process 0
rank 0 in job 2 vivaldi.chem.utk.edu_35229 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

And just in case, here is a link to the full output file:
http://web.eecs.utk.edu/~dbauer3/nwchem/macrofe_full631f.out

Running under Ubuntu 11.04 with MPICH2.

Forum Vet
For one, you are requesting 22 Gbyte per processor. Assuming you are running on 8 cores per node, you are (potentially) asking for 176 Gbyte of memory per node for the calculation.

Remember, the memory keyword is per processor or process, not for the whole calculation! You should keep your memory allocation per process below the available memory per node divided by the number of processes per node.
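
As a purely illustrative example (not a recommendation for this particular job): on a node with 32 Gbyte of RAM and 8 processes per node, you would keep each process under 4 Gbyte with something like

memory total 3500 mb

which, with the default 25% heap / 25% stack / 50% global split, corresponds to roughly 875 mb heap, 875 mb stack and 1750 mb global per process.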

Given the size of the calculation I don't believe you need that much memory.

Bert




Just Got Here
Okay, that sounds like a problem of not reading the documentation quite closely enough. In any case, I'll work with the chemist running the code to pick more reasonable values and try running again. The system should have enough RAM + swap to get the job done.

Thanks for the quick answer.

EDIT:
I assume that if the calculation requires more memory than we allow it per process, then it'll swap out to the hard drive or something to free up space.

Clicked A Few Times
Continuing trouble with calculation.
Hello all,

I’m working with DBauer on the calculation he mentioned previously in this thread. We’ve also been working on a much larger cluster computer that has 96 gigs of ram per node in an effort to complete this calculation.

Yesterday I did a run with 16 cores with the full 96 gigs of RAM split between them (6 gigs per core). In NWChem I used the directive “memory total 6144 mb” to assign the memory to be used. This calculation failed at the same point as all the previous ones, with the error: Error Run 1

I then ran the same calculation in the same manner, this time removing the memory directive and letting NWChem assign the memory itself. This calculation also failed, with the error: Error Run 2

The cluster I was running on has a monitoring system that allows node performance to be reviewed. I went back and checked it and found that in both calculations NWChem never used more than ~4 gigs of RAM, which I find puzzling. The 4 gig threshold makes me suspicious of a 32-bit limit somewhere.
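
One quick sanity check for the 32-bit suspicion, assuming the nwchem binary is on our PATH, would be to inspect the executable itself:

file `which nwchem`     # a 64-bit build reports something like "ELF 64-bit LSB executable, x86-64, ..."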

Link to Images: Images
Link to Input 1: Input run 1
Link to Input 2: Input run 2
Outputs can be provided if more clarification is needed.

Notes:
1st calculation ran from ~1pm-8pm
2nd calculation ran from ~10:30pm-5:30am
The calculation running before till ~1pm was a Raman calculation I did with 48 cores and “memory 1800 mb”

Forum Vet
The default memory settings for NWChem are pretty small, 400 mb I think (unless it was changed at compile time). So, without the memory keyword the calculation simply does not have enough memory to proceed.

Note, you may want to check how much memory the OS is taking. Fully loading the memory with NWChem will create problems. Swapping memory is not going to work well or at all.

Let's take the 48 core, 1800 mb case. This means that each process is going to allocate 450 mb of local heap, 450 mb of local stack (these two are not the problem), and 900 mb of global shared memory. Now, on a single node the global memory is allocated in shared memory segments: 900 mb * 48 cores means the code will potentially try to allocate a single shared segment of over 43 Gbyte.
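
Spelled out with the same numbers (decimal megabytes, using the default 25/25/50 heap/stack/global split of a total allocation):

memory 1800 mb  ->  450 mb heap + 450 mb stack + 900 mb global      (per process)
900 mb global * 48 processes = 43200 mb  ->  one shared segment of over 43 Gbyte on the node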

Let me try and run the input. I'll try 16 cores with 3500 mb.

Bert



Clicked A Few Times
Any luck Bert?

Karl B.

Forum Vet
Still working on this case, running it among the other calculations in the long queues on our system.

It's not a small calculation. I generally would try and run this on 128 cores or so.

Bert


Quote:KarlB Jun 4th 6:03 pm
Any luck Bert?

Karl B.

Forum Vet
I was able to run this calculation on 32 processors with the memory keyword set as follows:


memory heap 100 mb stack 1000 mb global 2400 mb


Bert



Quote:Bert Jun 4th 6:43 pm
Still working on this case, running it among the other calculations in the long queues on our system.

It's not a small calculation. I generally would try and run this on 128 cores or so.

Bert


Quote:KarlB Jun 4th 6:03 pm
Any luck Bert?

Karl B.

Clicked A Few Times
Hey Bert,

Thank you for your time and trying to help me solve this problem.

I've tried it twice with your numbers, in what I think are idealized and then further improved setups, and it still doesn't seem to be working for me. How did you arrive at those numbers, and what might be different between my system and yours that's causing the problem?

Let me know if you need the full files for any reason or more info of some sort.

(Job file1)
#$ -N Title
#$ -pe threads 26
#$ -l mem=3500M
#$ -q medium_chi
#$ -l proc_vendor=AMD
#$ -cwd

force coredumpsize 1
module load nwchem/6.1
/data/apps/openmpi/1.4.3-gcc/bin/mpirun nwchem macrofe_vibded.nw > macrofe_vibded.out
(/Job file1)

(input1)
start
echo
title "macrovib"
scratch_dir /somedirectories/.scratch
memory heap 100 mb stack 1000 mb global 2400 mb

geometry noautoz
Tons and Tons of Geo
end

charge 2

basis
 * library 6-31g**
end

dft
 xc b3lyp
 iterations 1000
 noio
 direct
 grid nodisk
end

driver
 maxiter 1000
end

task dft freq
(/input1)

(result1)
...
     2   2 0 0   -215.909422 -13516.764569 -13516.764569  26817.619717
     2   1 1 0     -7.361729      9.715923      9.715923    -26.793575
     2   1 0 1      0.102928     -0.219199     -0.219199      0.541326
     2   0 2 0   -190.652630 -18119.815610 -18119.815610  36048.978590
     2   0 1 1      0.084968     -0.339414     -0.339414      0.763796
     2   0 0 2   -261.530067  -3482.457806  -3482.457806   6703.385545

0:0:ndai_get failed:: -1977
(rank:0 hostname:chi16 pid:23323):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0
(/result1)

************************************************************************
****************************** Run 2 ************************************
************************************************************************

(Job file2)
#$ -N macfe_ded
#$ -pe threads 24
#$ -l mem=4G
#$ -q medium_chi
#$ -l proc_vendor=AMD
#$ -cwd

force coredumpsize 1
module load nwchem/6.1
/data/apps/openmpi/1.4.3-gcc/bin/mpirun nwchem macrofe_vibded.nw > macrofe_vibded.out
(/job file2)

(input2)
Identical to input1, save for:
memory heap 117 mb stack 1171 mb global 2808 mb
(/input2)

(result2)
...
     2   2 0 0   -215.909422 -13516.764569 -13516.764569  26817.619717
     2   1 1 0     -7.361729      9.715923      9.715923    -26.793575
     2   1 0 1      0.102928     -0.219199     -0.219199      0.541326
     2   0 2 0   -190.652630 -18119.815610 -18119.815610  36048.978590
     2   0 1 1      0.084968     -0.339414     -0.339414      0.763796
     2   0 0 2   -261.530067  -3482.457806  -3482.457806   6703.385545

0:0:ndai_get failed:: -1977
(rank:0 hostname:chi14 pid:14312):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0
(/result2)

Forum Vet
What are the error or warning messages in the error file (not the output file)? Any armci_set_mem_offset messages?

One thing I set is "setenv ARMCI_DEFAULT_SHMMAX 4096"

I was also running with 4 cores/node on our 8 core/node system (i.e. half filled).

It looks like you are running 26 or 24 processes respectively? I generally run "mpirun -np 32". We're running HP-MPI, not Open MPI, and we are definitely not running with threads.
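
For comparison, a plain MPI launch under SGE would look more like the sketch below. The parallel environment name "mpi" is a placeholder (your site may call its MPI PE something else), and with a tightly integrated Open MPI the -np count should match the slots SGE grants:

#$ -pe mpi 32
#$ -l mem=3500M
module load nwchem/6.1
/data/apps/openmpi/1.4.3-gcc/bin/mpirun -np $NSLOTS nwchem macrofe_vibded.nw > macrofe_vibded.out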

Bert



Clicked A Few Times
Will look into the setenv ARMCI_DEFAULT_SHMMAX 4096 in the morning...

The cluster I'm currently running on has 48 cores and ~94 gigs of RAM per node, so running on a single node at 24 cores gives me a half-filled node as well (with the full RAM available).

It seems the error files do indeed have something about ARMCI memory usage...

(Run1 Error)
force: Command not found.
2: WARNING:armci_set_mem_offset: offset changed -204456148992 to -204454051840
Last System Error Message from Task 0:: Inappropriate ioctl for device
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 4 DUP FROM 0
with errorcode -1977.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 23323 on
node chi16 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
(/run1 error)

***************************
***********Run2***********
***************************


(run2 error)
force: Command not found.
23: WARNING:armci_set_mem_offset: offset changed 440439992320 to 440442089472
Last System Error Message from Task 0:: Inappropriate ioctl for device
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 4 DUP FROM 0
with errorcode -1977.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 14312 on
node chi14 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
(/run2 error)


Thank you for your assistance!
Karl

Clicked A Few Times
Hey Bert,

Please forgive my ignorance here, but where/how am I setting ARMCI_DEFAULT_SHMMAX? In the kernel directory like shmall/shmmax/shmmni?

Karl

Forum Vet
As an environment variable. Has nothing to do with the kernel, but rather with GA and NWChem.
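
Concretely, it goes in your shell or job script before mpirun is started, along these lines (which form you need depends on your shell, and with Open MPI you may also have to forward it to the remote ranks with -x):

setenv ARMCI_DEFAULT_SHMMAX 4096      # csh/tcsh
export ARMCI_DEFAULT_SHMMAX=4096      # bash/sh equivalent
mpirun -x ARMCI_DEFAULT_SHMMAX -np 32 nwchem macrofe_vibded.nw > macrofe_vibded.out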

Bert

Quote:KarlB Jun 26th 9:14 pm
Hey Bert,

 Please forgive my ignorance here, but where/how am I setting ARMCI_DEFAULT_SHMMAX? In the kernel directory like shmall/shmmax/shmmni?

Karl

Forum Vet
Karl,
You need to take care of both.
ARMCI_DEFAULT_SHMMAX (given in megabytes) has to be less than or equal to kernel.shmmax (given in bytes).
For example, if the value of kernel.shmmax is 4294967296 as in the example below,
ARMCI_DEFAULT_SHMMAX can be at most 4096 (4294967296=4096*1024*1024)

$ sysctl kernel.shmmax
kernel.shmmax = 4294967296
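
If kernel.shmmax turns out to be smaller than the value you want to hand to ARMCI, it can be raised first (example value only; add it to /etc/sysctl.conf to make it persistent across reboots):

sysctl -w kernel.shmmax=8589934592    # 8192*1024*1024 bytes, needs root
setenv ARMCI_DEFAULT_SHMMAX 8192      # then at most 8192 MB here (export ARMCI_DEFAULT_SHMMAX=8192 in bash)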

Cheers, Edo

Quote:KarlB Jun 26th 1:14 pm
Hey Bert,

Please forgive my ignorance here, but where/how am I setting ARMCI_DEFAULT_SHMMAX? In the kernel directory like shmall/shmmax/shmmni?

Karl

