When memory is not enough


Gets Around
All TCE calculations require a huge amount of RAM.
The 2eorb and 2emet options allow more economical use of memory, but 2EORB works only with CCSD/CCSD(T) and RHF/ROHF references, and 2EMET=3/idiskx=1 with a CCSD/CCSD(T) UHF reference. Neither approach is suitable for MR-CCSD or CCSDT.
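For reference, the two memory-saving setups mentioned above would look roughly like the fragments below. This is only a sketch: 2eorb, 2emet and idiskx are the keywords named in this post, but the value 13 for 2emet in the first fragment is my assumption and should be checked against the TCE documentation for your NWChem version. The two tce blocks are alternatives, not parts of one input.

```
# Alternative 1: CCSD(T) with an RHF/ROHF reference
# (2emet 13 is an assumed value, not taken from this thread)
tce
 ccsd(t)
 2eorb
 2emet 13
 freeze atomic
end

# Alternative 2: CCSD(T) with a UHF reference
tce
 ccsd(t)
 2emet 3
 idiskx 1
 freeze atomic
end
```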
I tried different I/O schemes (fortran, eaf). With a CCSDT calculation everything works fine, but with MR-CCSD these I/O schemes lead to an error (I think it happens when the code tries to use a GA array).
The same error occurs when the fortran or eaf I/O schemes are used with CCSD.

Surprisingly, everything works in the CCSDT case.

Thanks in advance to anyone who can explain the reasons for this behavior of the program.

Example that fails in the development snapshot of February 24, 2015 (Nwchem-dev.revision26871-src.2015-02-24):

title "N8-cubane MK-CCSD(2,2)/cc-pVDZ calculation"

scratch_dir /mnt/scratch

memory stack 100 mb heap 100 mb global 1000 mb

geometry
 symmetry C2v
 N                     1.05441608    -0.77535101    -0.01369174
 N                     1.05441608     0.77535101    -0.01369174
 N                     0.00000000    -0.98791056    -1.03990790
 N                     0.00000000     0.98791056    -1.03990790
 N                     0.00000000    -0.75710570     1.06729139
 N                     0.00000000     0.75710570     1.06729139
 N                    -1.05441608    -0.77535101    -0.01369174
 N                    -1.05441608     0.77535101    -0.01369174
end

scf
 direct
end

basis spherical
 N library cc-pVDZ
end

tce
 mkccsd
 2emet 1
 io fortran
 freeze atomic
end

mrccdata
 root 1
 nref 2
 22222222222222222222222222220
 22222222222222222222222222202
end

task tce energy



Error

MRCC tiling completed in             0.0            0.0
tce_ao1e_fock2e       13.13436       13.11165
F:     1 in bytes =                44864
tce_mo1e        0.01232        0.00842
eone,etwo,enrep,energy  -1350.825545840117    532.593254393754    383.331702590501   -434.900588855862
mrcc_uhf_energy        3.60661        3.55607
tce_ao1e_fock2e       13.84977       13.84197
F:     2 in bytes =                44864
tce_mo1e        0.01221        0.00837
eone,etwo,enrep,energy  -1350.544140578163    532.463964282863    383.331702590501   -434.748473704798
mrcc_uhf_energy        3.60278        3.55067
2-e(intermediate) /mnt/scratch/N8.v2i  in bytes=         2358272000
Ref.   1 Half 2-e         456.73         368.62
V 2-e /mnt/scratch/N8.v200 in bytes=          407298240
Ref.   1 2-e transform. completed
2-e(intermediate) /mnt/scratch/N8.v2i  in bytes=         2358272000
Ref.   2 Half 2-e         458.38         368.67
V 2-e /mnt/scratch/N8.v200 in bytes=          407298240
Ref.   2 2-e transform. completed

Integral replication completed.
Proceeding the allocation of the memory for intermediates.

T2:     1 in bytes =             10729792
T1:     1 in bytes =                 6976
T2:     2 in bytes =             10699136
T1:     2 in bytes =                 7040
Done.


Symmetry of references

Ref.   1 sym:a1  
Ref.   2 sym:a1  
MR MkCCSD, version 1.0

Heff
=============================================
    0    1 -818.23229145    0.05208125
    0    2    0.05208125 -818.08017630

Eigenvalues (real and imaginary)
=============================================
 -818.248414191633    0.00000000
 -818.064053550030    0.00000000

Left eigenvectors
=============================================
    1   -0.95527367   -0.29572320
    2    0.29572320   -0.95527367

Right eigenvectors
=============================================
VR    1   -0.95527367   -0.29572320
VR    2    0.29572320   -0.95527367
Target root:    1

MkCC iter. #   1      -818.2484141916329      -434.9167116011317      -818.2484141916329
0:ga_zero: ARRAY NOT ACTIVE:Received an Error in Communication

Gets Around
I think the reason is that the modules src/tce/ccsd, src/tce/ccsd_t and src/tce/mrcc use the ga_get subroutine (http://hpc.pnl.gov/globalarrays/api/f_op_api.html#GET), whereas the module src/tce/ccsdt does not.

I just found a commit that implements T1/X1 localization:
https://github.com/jeffhammond/nwchem/commit/6b8b2c1b6bac0cab33e2fb3d58cc4a7859b64222

However, reverting this commit does not give the correct result.

Gets Around
Related thread
http://nwchemgit.github.io/Special_AWCforum/st/id1360/Floating_Point_Exception_usi...

Gets Around
I created a patch: https://github.com/Konjkov/nwchem/commit/8ea6fc88b79c86049dcede2b2cc1bce2e61668ae. With it, MK-MRCCSD/ROHF works with the eaf I/O scheme.

