"ccsd t: MA error sgl"


Clicked A Few Times
Hi,

I am running a (H2O)6 CCSD(T)/aug-cc-pVTZ benchmark job (using TCE) on a small computer cluster with 17 nodes (24 cores and 128 GB of RAM per node), 408 cores in total.

Here is how I set up TCE in the input:



tce
ccsd(t)
2eorb
2emet 13
attilesize 40
tilesize 30
freeze atomic
thresh 1e-5
end

set tce:nts T

task tce energy



It successfully finishes the CCSD part, but fails immediately when it switches to the plain CCSD(T) code:

---------------------------------------------------------
Iter Residuum Correlation Cpu Wall
---------------------------------------------------------
NEW TASK SCHEDULING
CCSD_T1_NTS --- OK
CCSD_T2_NTS --- OK
1 0.3317148058827 -1.6443784745941 440.5 450.0
2 0.0738824894128 -1.6391115361329 413.8 423.1
3 0.0240445728371 -1.6633600842093 410.3 419.3
4 0.0096959381239 -1.6648269289426 411.1 419.9
5 0.0038460496840 -1.6665908531599 412.4 421.5
MICROCYCLE DIIS UPDATE: 5 5
6 0.0011462209806 -1.6671866836842 414.1 422.9
7 0.0004484560791 -1.6672193462235 409.4 418.4
8 0.0001857195246 -1.6672278256195 415.1 424.0
9 0.0001070545228 -1.6672297987410 414.5 424.2
10 0.0000508271267 -1.6672351367696 415.6 424.8
MICROCYCLE DIIS UPDATE: 10 5
11 0.0000182389060 -1.6672366743640 409.9 419.0
12 0.0000085843635 -1.6672383748573 416.6 425.8
-----------------------------------------------------------------
Iterations converged
CCSD correlation energy / hartree = -1.667238374857258
CCSD total energy / hartree = -458.072807714594376

Singles contributions

Doubles contributions
CCSD(T)
Using plain CCSD(T) code
ccsd_t: MA error sgl 337153536
------------------------------------------------------------------------
------------------------------------------------------------------------
current input line :
0:
------------------------------------------------------------------------
------------------------------------------------------------------------

What does the error message "ccsd_t: MA error sgl" mean? Is the memory too small? How can I fix it?

Thanks!

Regards,
Dominic Chien

Forum Regular
Yes, that is a memory error. CCSD(T) requires more local memory, so you need to increase the amount of memory that you allocate to the stack. See http://nwchemgit.github.io/index.php/Release66:TCE#Maximizing_performance
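
As a rough sanity check on the size involved, the number printed after "MA error sgl" can be converted to a memory size. The sketch below assumes that the number counts double-precision words of 8 bytes each, which this thread does not confirm; the function name is made up for illustration.

```python
BYTES_PER_DOUBLE = 8  # assumption: the reported size counts 8-byte words


def words_to_mib(n_words, bytes_per_word=BYTES_PER_DOUBLE):
    """Convert a word count to mebibytes (1 MiB = 1024**2 bytes)."""
    return n_words * bytes_per_word / 1024**2


# Number printed after "ccsd_t: MA error sgl" in the log above.
failed_request = 337153536
request_mib = words_to_mib(failed_request)
print(f"failed request: about {request_mib:.0f} MiB per process")
```

Under that assumption a single request is roughly 2.5 GiB per process, so the stack setting would have to be at least that large for the (T) step to start.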

Clicked A Few Times
Quote:Sean Aug 18th 6:34 am
Yes, that is a memory error. CCSD(T) requires more local memory so you need to increase the amount of memory that you allocate to stack. See, http://nwchemgit.github.io/index.php/Release66:TCE#Maximizing_performance


Thank you Sean! I will try it.

I also found a new setting, tce:xmem, described in the section "CCSD(T)/CR-EOMCCSD(T) calculations with large tiles" of the documentation. Will this directive help with the local memory limitation? How do I determine a suitable value for it?

This may be considered a new feature request: is it possible to define different memory allocations for different stages of the calculation, e.g. more GA (global) memory during the CCSD iterations and more local memory for the triples evaluation?

Thanks!

Rgds,
Dominic

Forum Regular
I am not familiar with that modification, but the documentation does seem to indicate that it is for reducing local memory requirements and that the size would depend on the amount of local memory that is available.

It is not possible to change the memory allocation during the calculation.

Clicked A Few Times
Quote:Sean Aug 23rd 5:13 am
I am not familiar with that modification, but the documentation does seem to indicate that it is for reducing local memory requirements and that the size would depend on the amount of local memory that is available.

It is not possible to change the memory allocation during the calculation.



Thank you very much,

Now, on my cluster (17 nodes, each with 24 cores and 128 GB of memory), after allocating more memory to the stack (memory stack 3000 mb heap 200 mb global 1700 mb) and adding "set tce:xmem 150", the job finishes without problems.
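
For anyone adapting these numbers, a quick check that the per-process settings fit on a node may help. This is only a sketch: it assumes one NWChem process per core (24 per node), that the memory directive applies per process, and that 128 GB means 128 * 1024 MB; the function name is hypothetical.

```python
def node_budget_mb(stack_mb, heap_mb, global_mb, procs_per_node):
    """Return (per-process total, per-node total) in MB."""
    per_process = stack_mb + heap_mb + global_mb
    return per_process, per_process * procs_per_node


# Values from the memory line above; 24 processes per node is an assumption.
per_proc, per_node = node_budget_mb(3000, 200, 1700, procs_per_node=24)
node_ram_mb = 128 * 1024  # assumption: 128 GB taken as 128 * 1024 MB

print(f"per process: {per_proc} MB")
print(f"per node:    {per_node} MB of {node_ram_mb} MB")
print(f"headroom:    {node_ram_mb - per_node} MB for the OS and everything else")
```

The remaining headroom has to cover the operating system, the MPI/GA runtime and any buffers outside the memory directive, so it is probably unwise to push the per-process total much higher on this hardware.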

I understand that it is currently not possible to change the memory allocation during the calculation, but could this feature be added in the future?

I am not sure whether this can be done, or how difficult it would be to implement, but the idea is to split a large CCSD(T) calculation into two stages within a single input file, a CCSD stage and a CCSD(T) stage, each with its own memory settings, e.g.

#STAGE 1 CCSD
memory stack 1700 mb heap 200 mb global 3000 mb

geometry
xxx
end

scf
xxx
end

tce
ccsd
freeze atomic
2eorb
2emet 13
thresh 1.0d-5
tilesize 30
attilesize 40
maxiter 25
end

set tce:nts T
set tce:tceiop 2048
set tce:xmem 150
set tce:writeint T
set tce:writet T

task tce energy

#STAGE 2
memory stack 3000 mb heap 200 mb global 1700 mb

tce
ccsd(t)
freeze atomic
2eorb
2emet 13
thresh 1.0d-5
tilesize 30
attilesize 40
maxiter 25
end

set tce:nts T
set tce:tceiop 2048
set tce:xmem 150
set tce:readint T
set tce:readt T

task tce energy

Forum Regular
It is unlikely that the ability to reallocate the memory during the calculation will be added given that this would most likely require all of the memory be purged and then reinitialized. At that point there is not much difference between reallocating the memory during the calculation and simply stringing two separate calculations back to back with different memory allocations.

TCE does have restart capabilities so it does seem possible to me that you would be able to split the CCSD(T) calculation into two calculations: first a CCSD calculation and then a CCSD(T) calculation. This would be beneficial not only from the memory standpoint that has already been mentioned, but also the (T) part of the calculation can scale to more cores than the CCSD part of the calculation. All that being said, I am not very familiar with the TCE part of the code and do not know if there are technical issues that would prevent this from being accomplished. But do note that this would have to be run as two separate calculations if you wanted different memory allocations (i.e. two input decks).

Gets Around
Quote:Chiensh Aug 24th 10:39 am

I understand that it is currently not possible to change the memory allocation during the calculation, but could this feature be added in the future?


NWChem uses the MA slack suballocator for a number of reasons, some of which are historical (malloc was slow in the past) and some of which pertain to how ARMCI supports RDMA (the MA slack can be registered with the network up-front, eliminating the need to use a page-registration cache).

It is possible to replace MA allocation in TCE with Fortran allocatable, which is usually a wrapper around malloc and thus has all of the positive and negative features of normal dynamic memory allocation. I have already started doing this for KNL MCDRAM, although it should work in other contexts. You can switch over the parts of TCE that already support it by setting USE_F90_ALLOCATABLE in your environment at compile time.

Only three parts of TCE support Fortran allocatable right now, but one of them is the part that you are seeing crashing (see src/tce/ccsd_t/ccsd_t.F line 105 in the svn trunk).

Note that using Fortran allocatable may have a negative impact on the performance of GA communication on some networks. As communication is not the bottleneck in the triples portion of CCSD(T), it should not be noticeable, but I mention it anyways.

Anyways, this Fortran allocatable stuff is a recent development and currently not targeted to be the default. I am only doing it because of KNL, hence you should not expect to see it implemented holistically unless PNNL makes a decision to move away from MA.

