CCSD(T) Calculation with Quadruple Zeta Basis Set -- Memory Issue


Clicked A Few Times
Hello NWChem developers,

I am trying to run some CCSD(T) energy calculations with a quadruple-zeta basis set on a 5-atom system, but the memory requirement for the 2-e (intermediate) file seems to be off the charts (> 100 GB). The input file reads:

Quote:username

echo
memory stack 1300 mb heap 200 mb global 1500 mb

start im

title "im"
charge 1

geometry units angstroms print xyz noautosym noautoz
C                  -2.23423902     0.59425408    -0.03224283
O                  -1.12129315     1.09129114    -0.09445519
O                  -3.30588587     0.19083810     0.02028232
Br                  1.41553615    -0.39477191     0.02227492
H                  -0.18608027     0.45084374    -0.04234683
end

basis
C  library aug-cc-pvqz
H library aug-cc-pvqz
O library aug-cc-pvqz
# BASIS SET: (15s,12p,13d,3f,2g) -> [7s,6p,5d,3f,2g]
Br S
 78967.5000000              0.0000280             -0.0000110
 11809.7000000              0.0002140             -0.0000860
  2687.1400000              0.0010560             -0.0004350
   760.0360000              0.0036880             -0.0014570
   241.8110000              0.0079340             -0.0033810
    38.4914000              0.1528680             -0.0576580
    24.0586000             -0.2786020              0.1123250
    14.3587000             -0.2188500              0.0756730
... (to keep it short)
end

ECP
Br nelec 10
Br ul
2 1.0000000 0.0000000
Br S
...
end

scf
 doublet
THRESH 1.0e-5
MAXITER 100
TOL2E 1e-7
end

tce
 ccsd(t)
FREEZE atomic
thresh 1e-6
maxiter 100
end

task tce


The error message reads
Quote:username

2-e (intermediate) file size =    106977507300
2-e (intermediate) file name = ./im.v2i
available GA memory 1572841816 bytes
available GA memory 1572841816 bytes
available GA memory 1572841816 bytes
available GA memory 1572841816 bytes
available GA memory 1572841816 bytes
available GA memory 1572841816 bytes
available GA memory 1572841816 bytes
available GA memory 1572841816 bytes
available GA memory 1572841816 bytes
available GA memory 1572841816 bytes
createfile: failed ga_create size/nproc bytes 5348875365
------------------------------------------------------------------------
------------------------------------------------------------------------


I could change the memory options at the beginning of the file, but it seems unrealistic to make the GA as large as 100 GB on the nodes I am using (two 10-core Intel Xeon E5-2680v2 “Ivy Bridge” 2.8 GHz processors per node, i.e. 20 cores total, with 128 GB of memory, about 6.8 GB per core).
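
As a rough sanity check (assuming, as I understand it, that the memory directive applies per MPI process), some bash arithmetic with the numbers above:

echo $(( 1300 + 200 + 1500 ))          # 3000 MB requested per process (stack + heap + global)
echo $(( 20 * (1300 + 200 + 1500) ))   # 60000 MB, i.e. ~60 GB on a 20-core node, well within 128 GB
echo $(( 20 * 1500 ))                  # 30000 MB, i.e. only ~30 GB of aggregate GA per node for an intermediate that apparently has to live in GA

so the bottleneck seems to be the global allocation rather than the total node memory.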

I have also tried different "IO" and "2emet" options, for example,
Quote:username

tce
 ccsd(t)
FREEZE atomic
thresh 1e-6
maxiter 100
2eorb
2emet 13
tilesize 10
attilesize 40
end
set tce:xmem 100

and
Quote:username
tce
 tilesize 2
io ga
2EORB
2EMET 15
idiskx 1
ccsd(t)
FREEZE atomic
thresh 1e-6
maxiter 100
end

but the job seems to hang after printing out "v2 file size =".

Any insight on this issue is greatly appreciated!

Thank you in advance,
Rui

Forum Vet
createfile: failed ga_create size/nproc bytes          5348875365

5348875365 bytes: 5348875365/1024/1024/1024 ≈ 4.98 GB
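
In other words, the v2i intermediate is created as a distributed global array, so each process must provide roughly file_size/nproc bytes of global memory. A quick check with bash arithmetic, assuming the 20 processes implied by the numbers in your error message:

echo $(( 106977507300 / 20 ))          # 5348875365 bytes per process, the number printed in the error
echo $(( 5348875365 / 1024 / 1024 ))   # ~5101 MB, i.e. ~4.98 GB of global memory needed per process

which is more than the 1500 mb of global memory currently allotted.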

Please change the memory line to


memory stack 1300 mb heap 200 mb global 6000 mb

Clicked A Few Times
Thank you for the prompt response, Edoapra.

I had to adjust the memory to
Quote:username
memory stack 1000 mb heap 100 mb global 5300 mb

so that the total does not exceed the available memory per core (6.8 GB),

but now I run into the following error:
Quote:username
slurmstepd: error: Step 3840722.0 exceeded memory limit (123363455 > 122880000), being killed
slurmstepd: error: Step 3840722.0 exceeded memory limit (123618673 > 122880000), being killed
slurmstepd: error: Step 3840722.0 exceeded memory limit (123451708 > 122880000), being killed
slurmstepd: error: *** STEP 3840722.0 ON prod2-0143 CANCELLED AT 2018-06-20T04:05:00 ***
slurmstepd: error: Exceeded job memory limit
slurmstepd: error: Exceeded job memory limit
slurmstepd: error: Exceeded job memory limit
srun: Job step aborted: Waiting up to 122 seconds for job step to finish.
srun: error: prod2-0148: tasks 100-119: Killed
srun: error: prod2-0150: tasks 140-159: Killed
srun: error: prod2-0149: tasks 120-139: Killed
slurmstepd: error: _get_pss: ferror() indicates error on file /proc/156552/smaps
slurmstepd: error: _get_pss: ferror() indicates error on file /proc/135960/smaps
srun: error: prod2-0145: tasks 41,43,45,47,49,51,53,55,57,59: Killed
srun: error: prod2-0146: tasks 63,65,69,71,75,77,79: Killed
slurmstepd: error: _get_pss: ferror() indicates error on file /proc/234980/smaps
srun: error: prod2-0145: tasks 40,42,44,46,48,50,52,54,56,58: Killed
srun: error: prod2-0146: tasks 61,67,73: Killed
srun: error: prod2-0143: tasks 0-19: Killed
slurmstepd: error: _get_pss: ferror() indicates error on file /proc/77821/smaps
srun: error: prod2-0146: tasks 60,62,64,66,68,70,72,74,76,78: Killed
srun: error: prod2-0144: tasks 20-39: Killed
slurmstepd: error: _get_pss: ferror() indicates error on file /proc/17624/smaps
srun: error: prod2-0147: tasks 80-99: Killed
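
As a rough sanity check with bash arithmetic (assuming the memory line applies per MPI process and that slurmstepd reports its limits in kB):

echo $(( 20 * (1000 + 100 + 5300) ))   # 128000 MB that the 20 processes on one node may grow to
echo $(( 122880000 / 1024 / 1024 ))    # ~117 GiB job memory limit reported by slurmstepd

so with 20 tasks per node the job can legitimately exceed the SLURM limit even though each process stays within its ~6.4 GB share.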


I have also tried another memory allocation
Quote:username
memory stack 400 mb heap 100 mb global 6000 mb


and it yielded a different error
Quote:username
2-e (intermediate) file size = 107432197225
2-e (intermediate) file name = ./vim.v2i
tce_ao2e: MA problem k_ijkl 18
------------------------------------------------------------------------
------------------------------------------------------------------------
current input line :
0:
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
For more information see the NWChem manual at
http://nwchemgit.github.io/index.php/NWChem_Documentation


For further details see manual section:


Currently I am using 160 cores -- do you think I should try to use more cores so that the GA allocation on each core is smaller?

Thank you very much,
Rui

Clicked A Few Times
More CPUs, still failed
In the hope of reducing the memory requirement on each core, I tested the job with 200 cores (up from 160). However, it seems the correct amount of global array (GA) memory could not be allocated. For example, the memory line reads:
Quote:username
memory stack 900 mb heap 200 mb global 4300 mb

but the error message shows:
Quote:username
tce_ao2e: fast2e=1
half-transformed integrals in memory

2-e (intermediate) file size =    107432197225
2-e (intermediate) file name = ./vim.v2i
Cpu & wall time / sec 214.7 266.1
available GA memory 211394680 bytes
------------------------------------------------------------------------
createfile: failed ga_create size/nproc bytes 3079838825
------------------------------------------------------------------------
------------------------------------------------------------------------
current input line :
129: task tce

even though the input file clearly requests 4300 mb of global memory per process.

Would you please let me know how to fix this?

Thank you,
Rui

Forum Vet
Please report the tce input block you are currently using and the number of processors.

Clicked A Few Times
The TCE input block reads:
Quote:username

tce
 ccsd(t)
FREEZE atomic
thresh 1e-6
maxiter 100
end


I am currently using 10 nodes with 20 cores per node. The memory per core is 6 GB. The job script reads:
Quote:username
#!/bin/bash
#SBATCH --job-name=vim
#SBATCH --partition=kill.q
#SBATCH --exclusive
#SBATCH --nodes=10
#SBATCH --tasks-per-node=20
#SBATCH --cpus-per-task=1
#SBATCH --error=%A.err
#SBATCH --time=0-10:59:59 ## time format is DD-HH:MM:SS
#SBATCH --output=%A.out

export I_MPI_FABRICS=shm:tmi
export I_MPI_PMI_LIBRARY=/opt/local/slurm/default/lib64/libpmi.so

source /global/opt/intel_2016/mkl/bin/mklvars.sh intel64

module load intel_2016/ics intel_2016/impi

export NWCHEM_TARGET=LINUX64
# CHANGE TO THE CORRECT PATH
export ARMCI_DEFAULT_SHMMAX=8096
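# (to my understanding, ARMCI_DEFAULT_SHMMAX is given in MB, so the line above allows shared-memory segments of up to ~8 GB)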
export MPIRUN_PATH="srun"
export MPIRUN_NPOPT="-n"
export INPUT="vim"

$MPIRUN_PATH $MPIRUN_NPOPT ${SLURM_NTASKS} $NWCHEM_EXECUTABLE $INPUT.nw


Thank you!

Forum Vet
Please try the following input
echo
permanent_dir /global/cscratch1/sd/apra/arar
memory stack 1300 mb heap 200 mb global 7000 mb

start im

title "im"
charge 1

geometry #units angstroms print xyz noautosym noautoz
 C                  -2.23423902     0.59425408    -0.03224283
 O                  -1.12129315     1.09129114    -0.09445519
 O                  -3.30588587     0.19083810     0.02028232
 Br                  1.41553615    -0.39477191     0.02227492
 H                  -0.18608027     0.45084374    -0.04234683
end

basis spherical
 C  library aug-cc-pvqz
 H  library aug-cc-pvqz
 O  library aug-cc-pvqz
 Br  library aug-cc-pvqz-pp
end

ECP
 Br  library aug-cc-pvqz-pp
end

scf
  doublet
  THRESH 1.0e-5
  MAXITER 100
  TOL2E 1e-12
end

tce
  ccsd(t)
  FREEZE atomic
  tilesize 8
  attilesize 12
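  # (my understanding: smaller tile sizes shrink each local tensor block in the TCE, trading some speed for lower per-process memory)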
  thresh 1e-6
  maxiter 100
end

task tce

Clicked A Few Times
Thank you, Edoapra.

Just want to make sure I understand this correctly.

I should try to use 200 cores and each core should allocate the following amount of memory?
Quote:username
memory stack 1300 mb heap 200 mb global 7000 mb

Forum Vet
Quote:Srhhh Jun 20th 6:41 pm
Thank you, Edoapra.

Just want to make sure I understand this correctly.

I should try to use 200 cores and each core should allocate the following amount of memory?
Quote:username
memory stack 1300 mb heap 200 mb global 7000 mb


You should use only 10 tasks per node, for a total of 100 cores, since you mentioned that you have 6 GB/core.
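
A rough sketch of the scheduler change, assuming the SLURM script you posted above (only the tasks-per-node line needs to change):

#SBATCH --nodes=10
#SBATCH --tasks-per-node=10    # 10 x (1300 + 200 + 7000) MB = 85000 MB, i.e. ~85 GB per node, within 128 GB
#SBATCH --cpus-per-task=1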

Clicked A Few Times
Thank you, Edoapra.

I managed to get more cores (20 nodes, 400 cores), so I was able to run without the memory allocation issue using the following memory line:
Quote:username
memory stack 1000 mb heap 200 mb global 4400 mb


Everything else in the input file is identical to your previous comment. It took about 5 minutes for the calculation to reach
Quote:username
tce_ao2e: fast2e=1
half-transformed integrals in memory

2-e (intermediate) file size = 105684005025
2-e (intermediate) file name = ./vim.v2i
Cpu & wall time / sec 144.8 184.8

tce_mo2e: fast2e=1
2-e integrals stored in memory

but the calculation has been hanging there for over eight hours -- nothing has been written to the folder or to the output file at all. I also noticed some vim.aoints.x files that do not seem to have been cleaned up properly. Is this behavior normal for a calculation of this size, or is this QZ calculation pushing the limits of NWChem?

Thanks again.

Clicked A Few Times
Unstable CCSD iterations
The test run in the previous comment actually reached the CCSD iterations (each iteration takes about 1 hour of wall time), but the iterations seem unstable. Please see below:

Quote:username
t2 file handle = -995

CCSD iterations
-----------------------------------------------------------------
Iter Residuum Correlation Cpu Wall V2*C2
-----------------------------------------------------------------
   1    0.3745619466040  -1.0830661992146  1975.7  3034.0   759.3
   2    0.3338130779425  -1.0377329617715  1992.9  3058.0   760.8
   3    7.2614902105214  -1.0607684520852  1991.8  3049.5   762.0
   4   60.1400573985661  -1.0597624893767  1986.2  3038.5   759.7
   5 1384.5956104600380  -1.0695691959406  1993.2  3050.9   765.8
MICROCYCLE DIIS UPDATE: 5 5


The geometry for this calculation was optimized at the CCSD(T)/aug-cc-pVTZ level, so this error should not come from a bad geometry.

Thank you!

Forum Vet
Quote:Srhhh Jun 21st 5:52 pm
The test run in the previous comment actually went to the CCSD iterations part (each iteration takes about 1 hour wall time) but the iterations seem unstable.

Thank you!


Did you use a spherical or cartesian basis?

Clicked A Few Times
I had been using a Cartesian basis and have now changed to spherical. I will let you know how this test goes.

Another problem just happened:

Quote:username
[25] Received an Error in Communication: (-991) 25:nga_get_common:cannot locate region: ./vim.r1.d1 [18591:18511 ,1:1 ]:
[212] Received an Error in Communication: (-991) 212:nga_get_common:cannot locate region: ./vim.r1.d1 [18526:18511 ,1:1 ]:
application called MPI_Abort(comm=0x84000000, -991) - process 212
[173] Received an Error in Communication: (-991) 173:nga_get_common:cannot locate region: ./vim.r1.d1 [18721:18511 ,1:1 ]:
application called MPI_Abort(comm=0x84000000, -991) - process 173
application called MPI_Abort(comm=0x84000000, -991) - process 25
[179] Received an Error in Communication: (-991) 179:nga_get_common:cannot locate region: ./vim.r1.d1 [18656:18511 ,1:1 ]:
application called MPI_Abort(comm=0x84000000, -991) - process 179
srun: error: prod2-0101: task 212: Exited with exit code 33
srun: error: prod2-0029: task 25: Exited with exit code 33
srun: error: prod2-0096: tasks 173,179: Exited with exit code 33


From what I can find online, this also seems to be memory-related (even though the MA test passed and the CCSD iterations started). Does DIIS require additional memory?

Thank you

Forum Vet
This calculation is very hardware-demanding. I have tried it with NWChem 6.8 on a Mac using aug-cc-pvdz.

...
Iterations converged
CCSD correlation energy / hartree = ...       
CCSD total energy / hartree = ...

Singles contributions

Doubles contributions
...
CCSD[T]  correction energy / hartree =       ...
CCSD[T] correlation energy / hartree = ...
CCSD(T) correction energy / hartree = ...
CCSD(T) correlation energy / hartree = ...
CCSD(T) total energy / hartree = ...


 ...
Forum Vet
I have also tried aug-cc-pvtz, which I think is adequate for many practical purposes, with "ROHF" and the other keywords added to the proper groups.
I am afraid that the original calculation with aug-cc-pvqz could only succeed on a high-performance supercomputer with an official NWChem installation, such as at a US national lab.

NWChem 6.8 on a Mac gave
...

Iterations converged
CCSD correlation energy / hartree =  ...     
CCSD total energy / hartree = ...

Singles contributions

 ...


