memory problem in parallel running "ARMCI DASSERT fail"


Parallel efficiency of CC tasks
Hi Edoapra!

Thanks for your help. We found the bottleneck and fixed it, and everything works well now. The problem was caused by the limit on the amount of memory InfiniBand can register, which is 4 GB by default.

I also ran some CC calculations and found the parallel efficiency unsatisfactory. For the task below, on 16 cores (1 node) each iteration takes 759.6 s of CPU time, but on 128 cores (8 nodes) each iteration takes 1326 s. Is this normal?
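To put numbers on the slowdown, here is a minimal Python sketch of the relative speedup and parallel efficiency, using the 16-core run as the baseline. It assumes the per-iteration times quoted above are representative of elapsed time; they are CPU times, so wall time would be the better quantity to compare if it is available. The function names are only illustrative.

```python
def relative_speedup(t_base, t_scaled):
    """Speedup of the scaled run relative to the baseline run."""
    return t_base / t_scaled


def parallel_efficiency(t_base, cores_base, t_scaled, cores_scaled):
    """Speedup divided by the factor by which the core count grew."""
    return relative_speedup(t_base, t_scaled) / (cores_scaled / cores_base)


# Per-iteration times from the post (seconds of CPU time).
t_16_cores = 759.6    # 16 cores, 1 node
t_128_cores = 1326.0  # 128 cores, 8 nodes

speedup = relative_speedup(t_16_cores, t_128_cores)
efficiency = parallel_efficiency(t_16_cores, 16, t_128_cores, 128)

# Time per iteration that perfect scaling from 16 to 128 cores would give.
ideal_time = t_16_cores * 16 / 128

print(f"speedup vs 16 cores:    {speedup:.3f}")
print(f"parallel efficiency:    {efficiency:.3f}")
print(f"ideal time at 128 cores: {ideal_time:.2f} s")
```

A speedup below 1 means the 8-node run is slower per iteration than the single-node run, not merely less efficient.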

Thanks!

Jun Chen
2012/11/15

The input file is:
start fh2

permanent_dir .
scratch_dir ./tmp

memory heap 500 mb stack 500 mb global 9000 mb

geometry units au
    H       -0.466571969    0.000000000   -3.498280516
    H        0.624505061    0.000000000   -2.532671944
    F       -0.008378972    0.000000000    0.319965748
#   symmetry c1
end

basis noprint
 * library aug-cc-pvqz
end

SCF
  semidirect
  DOUBLET
  RHF
  THRESH 1.0d-8
  TOL2E  1.0d-8
END

TCE
  SCF
  CCSDT
  THRESH 1.0d-5
  FREEZE atomic
  DIIS 5
END

TASK TCE ENERGY





Quote: Edoapra, Nov 7th 2:40 pm
Psd,
Did you check if there are shared memory segments still allocated on the nodes of your cluster?
You can do it by running the command

ipcs -a

The ipcreset script can be used both to display and to clean up existing shared memory segments.
You can find it in

$NWCHEM_TOP/src/tools/ga-5-1/global/testing/ipcreset

Cheers, Edo
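As a rough illustration of the check suggested in the quote, the Python sketch below parses the shared memory table and lists segments that belong to a given user and have no attached processes. It is not the ipcreset script. It assumes the usual Linux `ipcs -m` column layout (key, shmid, owner, perms, bytes, nattch, status) and uses `ipcs -m` rather than `ipcs -a` so that only shared memory rows appear. The sample text and the owner name `jchen` are made up for the example.

```python
import subprocess


def read_ipcs_shm():
    """Return the text printed by `ipcs -m` on this node (not called below)."""
    result = subprocess.run(["ipcs", "-m"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def parse_ipcs_shm(text):
    """Parse `ipcs -m` output, assuming the usual Linux column layout."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a hexadecimal key such as 0x00000000.
        if len(fields) >= 6 and fields[0].startswith("0x"):
            segments.append({
                "shmid": int(fields[1]),
                "owner": fields[2],
                "bytes": int(fields[4]),
                "nattch": int(fields[5]),
            })
    return segments


def leftover_segments(segments, owner):
    """Segments owned by `owner` that no process is attached to."""
    return [s for s in segments if s["owner"] == owner and s["nattch"] == 0]


# Made-up sample output, for illustration only.
SAMPLE = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 32768      jchen      600        524288     0
0x00000000 65537      jchen      600        1048576    2          dest
0x00000000 98306      root       644        4096       0
"""

stale = leftover_segments(parse_ipcs_shm(SAMPLE), owner="jchen")
for seg in stale:
    print(seg["shmid"], seg["bytes"])
```

On a real node, `parse_ipcs_shm(read_ipcs_shm())` would replace the sample text, and each reported id could then be removed with `ipcrm -m <shmid>` once no job of yours is still running there.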