GA memory error, large (RI-)MP2 job


Just Got Here
Skipping to a particular point/concern: how strict is the ~2,800 basis function 'limit' alluded to in the manual? Second, if the RHF calculation converges and prints the vectors but breaks during the analysis of the solution (multipoles), can I still use the vectors file for the MP2 job, or will it be incomplete and therefore fail to be read correctly? I'd have tried it myself already, but it's an expensive job and we have only limited time on the cluster.

Any guidance would be appreciated.

Abbreviated input file. The direct algorithm is necessary. Total basis functions: 3,488. We can do DFT (BLYP and B3LYP) on a cluster this size using Orca; here I'm just experimenting with a correlated wavefunction method.

start job_name

charge -1

geometry
 *coordinates, 316 atoms*
end

basis spherical
  all_not_hydrogen library aug-cc-pvdz
  h library cc-pvdz
end

scf
  rhf
  thresh 1.0e-10
  direct
end

mp2
 tight
 freeze atomic
end

# We could just as well do RI-MP2 (task rimp2)
task direct_mp2 energy
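
For that RI-MP2 route, my understanding is that a fitting basis also has to be declared under the special name "ri-mp2 basis"; a sketch, with the fitting-set library names left as placeholders to be checked against the NWChem basis library:

basis "ri-mp2 basis" spherical
  all_not_hydrogen library *matching RI fitting set*
  h library *matching RI fitting set*
end

task rimp2 energy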



Abbreviated PBS submission script. Requesting 10 nodes, 12 ppn; each node provides 48 GB of memory.

# PBS header stuff

# Common memory adjustments; these have worked with everything I've ever run
ulimit -s unlimited
export ARMCI_DEFAULT_SHMMAX=4096
unset MA_USE_ARMCI_MEM
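
(For completeness, the elided header and launch line amount to something like the following; exact directives and the launcher invocation vary by site:)

#PBS -l nodes=10:ppn=12
mpiexec nwchem job_name.nw > job_name.out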


RHF converges successfully, then crashes during the multipole analysis, before the MP2 module begins.

*RHF data*
*Vectors*
*Mulliken analysis*

       Multipole analysis of the density wrt the origin
       ------------------------------------------------

     L   x y z        total         open         nuclear
     -   - - -        -----         ----         -------
     0   0 0 0     -1.000000      0.000000   1059.000000

     1   1 0 0    -30.877247      0.000000      0.000000
     1   0 1 0     13.037500      0.000000      0.000000
     1   0 0 1    -26.055544      0.000000      0.000000

     2   2 0 0   -432.118567      0.000000  68489.384441
     2   1 1 0     92.050461      0.000000    104.555336
     2   1 0 1    -13.323659      0.000000    -22.521644
     2   0 2 0   -718.291153      0.000000  71458.928432
     2   0 1 1     19.661159      0.000000     18.213106
     2   0 0 2   -574.530181      0.000000  56599.014560

(rank:72 hostname:n0477.ten.osc.edu pid:25477):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
(rank:36 hostname:n0480.ten.osc.edu pid:24452):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
(rank:60 hostname:n0478.ten.osc.edu pid:302):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
(rank:0 hostname:n0491.ten.osc.edu pid:30001):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
(rank:108 hostname:n0364.ten.osc.edu pid:22340):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
(rank:84 hostname:n0376.ten.osc.edu pid:4740):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
(rank:12 hostname:n0489.ten.osc.edu pid:2876):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
(rank:48 hostname:n0479.ten.osc.edu pid:10756):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
(rank:96 hostname:n0374.ten.osc.edu pid:9019):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
(rank:24 hostname:n0482.ten.osc.edu pid:4205):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))

Forum Vet
Memory settings
Yes, you should be able to use the vectors; they are written before the analysis is done.

Your input deck does not show a memory allocation keyword, which means the defaults are used. The memory that will be used is printed at the beginning of the output. What are the values shown there?

You should be able to use about 3.5 Gbyte per processor.
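
A restart deck along those lines would look roughly like this (a sketch only; it assumes the default job_name.movecs file written by your start directive, with the remaining settings picked up from the runtime database):

restart job_name

scf
  vectors input job_name.movecs
end

task direct_mp2 energy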

Bert

Just Got Here
Thanks for the information about the vector file. Here's the info you asked about.

           Memory information
           ------------------

    heap     =  125829121 doubles =    960.0 Mbytes
    stack    =  125829121 doubles =    960.0 Mbytes
    global   =  251658240 doubles =   1920.0 Mbytes (distinct from heap & stack)
    total    =  503316482 doubles =   3840.0 Mbytes
    verify   = yes
    hardfail = no

Forum Vet
You're definitely pushing the limits on memory allocation: 3840 MBytes per process times 12 processes per node is about 46 GB, essentially all of the 48 GB each node has, leaving little for the OS and communication buffers.

I would suggest you try using:

  memory heap 100 mb stack 1000 mb global 2000 mb

At the same time, you may want to try:

  export ARMCI_DEFAULT_SHMMAX=8192

If it continues to fail, what error messages do you see in the output file?

Bert

