2:56:30 PM PDT - Mon, Jun 25th 2012
What are the error or warning messages in the error file (not the output file)? Any armci_set_mem_offset messages?
One thing I set is "setenv ARMCI_DEFAULT_SHMMAX 4096"
I was also running with 4 cores/node on our 8 core/node system (i.e. half filled).
Looks like you are running 26 or 24 processes respectively? I generally run "mpirun -np 32". We're running HP-MPI, not Open MPI, and we are definitely not running with threads.
Bert
Quote: KarlB, Jun 25th 7:16 pm
Hey Bert,
Thank you for your time and trying to help me solve this problem.
I've tried it twice with your numbers, first in what I think is an idealized setup and then in an improved one, and it still isn't working for me. How did you arrive at those numbers, and what might be different between my system and yours that's causing the problem?
Let me know if you need the full files for any reason or more info of some sort.
(Job file1)
#$ -N Title
#$ -pe threads 26
#$ -l mem=3500M
#$ -q medium_chi
#$ -l proc_vendor=AMD
#$ -cwd
force coredumpsize 1
module load nwchem/6.1
/data/apps/openmpi/1.4.3-gcc/bin/mpirun nwchem macrofe_vibded.nw > macrofe_vibded.out
(/Job file1)
(input1)
start
echo
title "macrovib"
scratch_dir /somedirectories/.scratch
memory heap 100 mb stack 1000 mb global 2400 mb
geometry noautoz
Tons and Tons of Geo
end
charge 2
basis
* library 6-31g**
end
dft
xc b3lyp
iterations 1000
noio
direct
grid nodisk
end
driver
maxiter 1000
end
task dft freq
(/input1)
(result1)
...
2 2 0 0 -215.909422 -13516.764569 -13516.764569 26817.619717
2 1 1 0 -7.361729 9.715923 9.715923 -26.793575
2 1 0 1 0.102928 -0.219199 -0.219199 0.541326
2 0 2 0 -190.652630 -18119.815610 -18119.815610 36048.978590
2 0 1 1 0.084968 -0.339414 -0.339414 0.763796
2 0 0 2 -261.530067 -3482.457806 -3482.457806 6703.385545
0:0:ndai_get failed:: -1977
(rank:0 hostname:chi16 pid:23323):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0
(/result1)
************************************************************************
****************************** Run 2 ************************************
************************************************************************
(Job file2)
#$ -N macfe_ded
#$ -pe threads 24
#$ -l mem=4G
#$ -q medium_chi
#$ -l proc_vendor=AMD
#$ -cwd
force coredumpsize 1
module load nwchem/6.1
/data/apps/openmpi/1.4.3-gcc/bin/mpirun nwchem macrofe_vibded.nw > macrofe_vibded.out
(/Job file2)
(input2)
Identical save...
memory heap 117 mb stack 1171 mb global 2808 mb
(/input2)
(result2)
...
2 2 0 0 -215.909422 -13516.764569 -13516.764569 26817.619717
2 1 1 0 -7.361729 9.715923 9.715923 -26.793575
2 1 0 1 0.102928 -0.219199 -0.219199 0.541326
2 0 2 0 -190.652630 -18119.815610 -18119.815610 36048.978590
2 0 1 1 0.084968 -0.339414 -0.339414 0.763796
2 0 0 2 -261.530067 -3482.457806 -3482.457806 6703.385545
0:0:ndai_get failed:: -1977
(rank:0 hostname:chi14 pid:14312):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0
(/result2)
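
For reference, the memory directives from the two runs above can be totalled and compared with the scheduler requests. This is only a rough sketch in Python: it assumes the NWChem `memory` directive is per process, that `-l mem=` is granted per slot (this depends on the site's scheduler configuration), and that `4G` means 4096 MB. The helper names `parse_memory_mb` and `summarize` are made up for this illustration.

```python
import re


def parse_memory_mb(directive):
    """Extract heap/stack/global sizes (in MB) from an NWChem memory line."""
    pairs = re.findall(r"(heap|stack|global)\s+(\d+)\s+mb", directive,
                       flags=re.IGNORECASE)
    return {name.lower(): int(value) for name, value in pairs}


# Memory directives and scheduler requests copied from the two runs.
# Assumption: "-l mem=4G" is treated as 4096 MB.
runs = {
    "run 1": ("memory heap 100 mb stack 1000 mb global 2400 mb", 3500),
    "run 2": ("memory heap 117 mb stack 1171 mb global 2808 mb", 4096),
}


def summarize(runs):
    """Return (label, nwchem_total_mb, requested_mb, headroom_mb) per run."""
    rows = []
    for label, (directive, requested_mb) in runs.items():
        total_mb = sum(parse_memory_mb(directive).values())
        rows.append((label, total_mb, requested_mb, requested_mb - total_mb))
    return rows


for label, total_mb, requested_mb, headroom_mb in summarize(runs):
    print(f"{label}: NWChem asks for {total_mb} MB per process, "
          f"scheduler request is {requested_mb} MB, headroom {headroom_mb} MB")
```

Under these assumptions both runs ask NWChem for exactly the amount requested from the scheduler, which leaves no headroom for anything allocated outside the `memory` directive. Whether that is related to the `ndai_get failed` error is not something the output above establishes.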