Help with large CCSD(T) Calculation


Hi all. I'm working on some CCSD(T) calculations of CO2 dimers using the aug-cc-pVQZ basis set. I realize that this is a very large job. I've run a few of these calculations previously using Molpro (on XSEDE Blacklight); I don't have my notes on me, but if I recall correctly each took about 20 hours on 16 cores and required ~256 GB of memory.

I would like to try running these jobs in NWChem instead, but I'm running into two problems: 1) I'm not sure how to tune the performance options, and 2) my jobs are dying due to a file-writing error.

First, here is my input file. I've left out the basis set specification, since it's a long copy/paste from BSEL.

title "co2 test"
#memory stack 9600 mb heap 800 mb global 4800 mb // tried this also, same error
memory stack 1500 mb heap 100 mb global 1400 mb
geometry
	symmetry c1
	C   2.12544   0.00000   0.00000
	O   1.82852  -0.93172  -0.62769
	O   2.42235   0.93172   0.62769
	C  -2.12544   0.00000   0.00000
	O  -1.20623  -0.32695  -0.63119
	O  -3.04465   0.32695   0.63119
end
basis
	## *snip*  
end

bsse
	mon firstmonomer 1 2 3
	mon secondmonomer 4 5 6
end
scf
	singlet
	rhf
end
tce
	ccsd(t)
	2eorb
	io ga
	tilesize 10 # also tried 15 and 20
end
task tce energy


I'm running the jobs on XSEDE Trestles, on 8 cores (mpirun_rsh) over 2 nodes (64 GB of memory per node), with the environment variable ARMCI_DEFAULT_SHMMAX=2048. I've also tried running without the variable set, with the same results.
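
As a quick sanity check on the two memory directives in the input, the per-process total and the resulting per-node footprint can be worked out as below. This is only a sketch under stated assumptions: that the `memory` directive applies per process, and that 8 processes run on each 64 GB node. The function names are made up for illustration.

```python
# Memory budget implied by the "memory" lines in the input file.
# Assumptions (not confirmed in the post): the directive is per process,
# and 8 processes share each 64 GB node.
MB_PER_GB = 1024
NODE_MEMORY_GB = 64
PROCS_PER_NODE = 8  # assumed; adjust to match the actual launch


def per_process_mb(stack_mb, heap_mb, global_mb):
    """Total memory requested by one process, in MB."""
    return stack_mb + heap_mb + global_mb


def per_node_gb(total_mb, procs_per_node):
    """Memory requested by all processes on one node, in GB."""
    return total_mb * procs_per_node / MB_PER_GB


# The active directive and the commented-out one from the input.
settings = {
    "stack 1500 / heap 100 / global 1400": per_process_mb(1500, 100, 1400),
    "stack 9600 / heap 800 / global 4800": per_process_mb(9600, 800, 4800),
}

for label, total_mb in settings.items():
    node_gb = per_node_gb(total_mb, PROCS_PER_NODE)
    verdict = "fits in" if node_gb <= NODE_MEMORY_GB else "exceeds"
    print(f"{label}: {total_mb} MB per process, "
          f"{node_gb:.1f} GB per node ({verdict} {NODE_MEMORY_GB} GB)")
```

If fewer processes run per node, change `PROCS_PER_NODE`; the larger setting only fits within 64 GB at 4 processes per node or fewer under these assumptions.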

So now the results. The job runs for a while, and generates ~150GB of temp files before dying. I've pasted the relevant output below.

*snip*
           General Information
            -------------------
      Number of processors :    16
         Wavefunction type : Restricted Hartree-Fock
          No. of electrons :    44
           Alpha electrons :    22
            Beta electrons :    22
           No. of orbitals :  1234
            Alpha orbitals :   617
             Beta orbitals :   617
        Alpha frozen cores :     0
         Beta frozen cores :     0
     Alpha frozen virtuals :     0
      Beta frozen virtuals :     0
         Spin multiplicity : singlet 
    Number of AO functions :   630
       Number of AO shells :   120
        Use of symmetry is : off
      Symmetry adaption is : off
         Schwarz screening : 0.10D-09

  !! WARNING !! The number of MO is less than the number of AO

          Correlation Information
          -----------------------
          Calculation type : Coupled-cluster singles & doubles w/ perturbation           
   Perturbative correction : (T)                                                         
            Max iterations :      100
        Residual threshold : 0.10D-06
          DIIS level shift : 0.00D+00
    CC-LR DIIS level shift : 0.00D+00
    CC-IR DIIS level shift : 0.00D+00
          Amplitude update :  5-th order DIIS
                I/O scheme : Global Array Library

            Memory Information
            ------------------
          Available GA space size is    ********** doubles
          Available MA space size is     681563897 doubles

 Maximum block size supplied by input
 Maximum block size        20 doubles

 tile_dim =     20

 Block   Spin    Irrep     Size     Offset   Alpha
 -------------------------------------------------
   1    alpha     a     11 doubles       0       1
   2    alpha     a     11 doubles      11       2
   3    beta      a     11 doubles      22       1
   4    beta      a     11 doubles      33       2
   5    alpha     a     19 doubles      44       5
   6    alpha     a     20 doubles      63       6
   7    alpha     a     20 doubles      83       7
   8    alpha     a     20 doubles     103       8
   9    alpha     a     20 doubles     123       9
  10    alpha     a     20 doubles     143      10
  11    alpha     a     19 doubles     163      11
  12    alpha     a     20 doubles     182      12
  13    alpha     a     20 doubles     202      13
  14    alpha     a     20 doubles     222      14
  15    alpha     a     20 doubles     242      15
  16    alpha     a     20 doubles     262      16
  17    alpha     a     19 doubles     282      17
  18    alpha     a     20 doubles     301      18
  19    alpha     a     20 doubles     321      19
  20    alpha     a     20 doubles     341      20
  21    alpha     a     20 doubles     361      21
  22    alpha     a     20 doubles     381      22
  23    alpha     a     19 doubles     401      23
  24    alpha     a     20 doubles     420      24
  25    alpha     a     20 doubles     440      25
  26    alpha     a     20 doubles     460      26
  27    alpha     a     20 doubles     480      27
  28    alpha     a     20 doubles     500      28
  29    alpha     a     19 doubles     520      29
  30    alpha     a     20 doubles     539      30
  31    alpha     a     20 doubles     559      31
  32    alpha     a     20 doubles     579      32
  33    alpha     a     20 doubles     599      33
  34    alpha     a     20 doubles     619      34
  35    beta      a     19 doubles     639       5
  36    beta      a     20 doubles     658       6
  37    beta      a     20 doubles     678       7
  38    beta      a     20 doubles     698       8
  39    beta      a     20 doubles     718       9
  40    beta      a     20 doubles     738      10
  41    beta      a     19 doubles     758      11
  42    beta      a     20 doubles     777      12
  43    beta      a     20 doubles     797      13
  44    beta      a     20 doubles     817      14
  45    beta      a     20 doubles     837      15
  46    beta      a     20 doubles     857      16
  47    beta      a     19 doubles     877      17
  48    beta      a     20 doubles     896      18
  49    beta      a     20 doubles     916      19
  50    beta      a     20 doubles     936      20
  51    beta      a     20 doubles     956      21
  52    beta      a     20 doubles     976      22
  53    beta      a     19 doubles     996      23
  54    beta      a     20 doubles    1015      24
  55    beta      a     20 doubles    1035      25
  56    beta      a     20 doubles    1055      26
  57    beta      a     20 doubles    1075      27
  58    beta      a     20 doubles    1095      28
  59    beta      a     19 doubles    1115      29
  60    beta      a     20 doubles    1134      30
  61    beta      a     20 doubles    1154      31
  62    beta      a     20 doubles    1174      32
  63    beta      a     20 doubles    1194      33
  64    beta      a     20 doubles    1214      34

 Global array virtual files algorithm will be used

 Parallel file system coherency ......... OK

 Integral file          = ./co2.aoints.00
 Record size in doubles =  65536        No. of integs per rec  =  32766
 Max. records in memory =      0        Max. records in file   = ******
 No. of bits per label  =     16        No. of bits per value  =     64


 #quartets = 1.807D+07 #integrals = 1.013D+10 #direct =  0.0% #cached =100.0%


File balance: exchanges=    63  moved=  7630  time=   5.1


 Fock matrix recomputed
 1-e file size   =           380689
 1-e file name   = ./co2.f1            
 Cpu & wall time / sec          137.1          183.2
 4-electron integrals stored in orbital form
  available GA memory                2516039248  bytes
    available GA memory    available GA memory  available GA memory   available GA memory                  2516039256               2516039256       available GA memory  available GA memory  available GA memory  available GA memory Last System Error Message from Task 10:: No such file or directory
Last System Error Message from Task 9:: No such file or directory
 available GA memory                2516039256                 2516039256 bytes 
  bytes   bytes              2516392056              2516392056  
               2516392056 bytes bytes
  available GA memory               2516039256

               2516039256 bytes   createfile: failed ga_create size=*********
 createfile: failed ga_create size=********* createfile: failed ga_create size=*********
 ------------------------------------------------------------------------


 bytes
 ------------------------------------------------------------------------

 ------------------------------------------------------------------------ ------------------------------------------------------------------------

 ------------------------------------------------------------------------
 ------------------------------------------------------------------------

  ------------------------------------------------------------------------ ------------------------------------------------------------------------ current input line : 
 ------------------------------------------------------------------------ ------------------------------------------------------------------------

 
 current input line :  ------------------------------------------------------------------------ ------------------------------------------------------------------------



       0:       0:  ------------------------------------------------------------------------
 ------------------------------------------------------------------------
Last System Error Message from Task 0:: No such file or directory
 createfile: failed ga_create size=********* createfile: failed ga_create size=********* createfile: failed ga_create size=********* ------------------------------------------------------------------------


 ------------------------------------------------------------------------  current input line :  ------------------------------------------------------------------------
  ------------------------------------------------------------------------


     0: 
 ------------------------------------------------------------------------     0:  ------------------------------------------------------------------------   289: task tce energy
 ------------------------------------------------------------------------

 ------------------------------------------------------------------------
 ------------------------------------------------------------------------
 ------------------------------------------------------------------------
 For more information see the NWChem manual at 
 http://www.emsl.pnl.gov/docs/nwchem/nwchem.html


 For further details see manual section: 
                                                                                                                                                                                                                                                                
0:0:createfile: failed ga_create size=:: 2137779302
(rank:0 hostname:trestles-2-32.local pid:25704):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
 ------------------------------------------------------------------------

 ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------------------------------------------------------------


 ------------------------------------------------------------------------ ------------------------------------------------------------------------

 ------------------------------------------------------------------------

 ------------------------------------------------------------------------ ------------------------------------------------------------------------     0:  ------------------------------------------------------------------------   ------------------------------------------------------------------------For more information see the NWChem manual at 



For more information see the NWChem manual at  

 ------------------------------------------------------------------------
 ------------------------------------------------------------------------For more information see the NWChem manual at   

 ------------------------------------------------------------------------
http://www.emsl.pnl.gov/docs/nwchem/nwchem ------------------------------------------------------------------------
http://www.emsl.pnl.gov/docs/nwchem/nwchem .html ------------------------------------------------------------------------ 
.html
 For more information see the NWChem manual at 

For more information see the NWChem manual at  http://www.emsl.pnl.gov/docs/nwchem/nwchem


For more information see the NWChem manual at .html

 ------------------------------------------------------------------------  

http://www.emsl.pnl.gov/docs/nwchem/nwchem
http://www.emsl.pnl.gov/docs/nwchem/nwchem 
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 12
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 10
.htmlhttp://www.emsl.pnl.gov/docs/nwchem/nwchem
 
For further details see manual section:   application called MPI_Abort(MPI_COMM_WORLD, 0) - process 13

application called MPI_Abort(MPI_COMM_WORLD, 0) - process 15
For more information see the NWChem manual at .htmlFor further details see manual section: 







   For further details see manual section:  For further details see manual section: .html                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               


  application called MPI_Abort(MPI_COMM_WORLD, 0) - process 9
For further details see manual section: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 11

http://www.emsl.pnl.gov/docs/nwchem/nwchem

                                                                                                                                                                                                                                                               
.html

  
                                                                                                                                                                                                                                                                For further details see manual section:                                                                                                                                                                                                                                                                



 
10:10:createfile: failed ga_create size=:: 2137779302
                                                                                                                                                                                                                                                                
For further details see manual section: 
(rank:10 hostname:trestles-2-4.local pid:10516):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
 9:9:createfile: failed ga_create size=:: 2137779302
                                                                                                                                                                                                                                                               13:13:createfile: failed ga_create size=:: 2137779302

(rank:9 hostname:trestles-2-4.local pid:10515):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
11:11:createfile: failed ga_create size=:: 2137779302
(rank:13 hostname:trestles-2-4.local pid:10519):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
(rank:11 hostname:trestles-2-4.local pid:10517):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
12:12:createfile: failed ga_create size=:: 2137779302
(rank:12 hostname:trestles-2-4.local pid:10518):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
15:15:createfile: failed ga_create size=:: 2137779302
(rank:15 hostname:trestles-2-4.local pid:10521):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
8:8:createfile: failed ga_create size=:: 2137779302
(rank:8 hostname:trestles-2-4.local pid:10514):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
 available GA memory                2516392056  bytes
 createfile: failed ga_create size=*********
 ------------------------------------------------------------------------
 ------------------------------------------------------------------------
  current input line : 
     0: 
 ------------------------------------------------------------------------
 ------------------------------------------------------------------------
 ------------------------------------------------------------------------
 For more information see the NWChem manual at 
 http://www.emsl.pnl.gov/docs/nwchem/nwchem.html


 For further details see manual section: 
                                                                                                                                                                                                                                                                
14:14:createfile: failed ga_create size=:: 2137779302
(rank:14 hostname:trestles-2-4.local pid:10520):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
Last System Error Message from Task 14:: No such file or directory
 ------------------------------------------------------------------------
 ------------------------------------------------------------------------  ------------------------------------------------------------------------
 current input line : 
 ------------------------------------------------------------------------Last System Error Message from Task 2:: No such file or directory

Last System Error Message from Task 3:: No such file or directory
 Last System Error Message from Task 1:: No such file or directory
 ------------------------------------------------------------------------Last System Error Message from Task 5:: No such file or directory
 ------------------------------------------------------------------------Last System Error Message from Task 7:: No such file or directory

Last System Error Message from Task 4:: No such file or directory

Last System Error Message from Task 6:: No such file or directory
 current input line :   ------------------------------------------------------------------------
 current input line :      0: 
 

  
 current input line :      0:  ------------------------------------------------------------------------ 

 current input line : 
 current input line :  current input line : 
 ------------------------------------------------------------------------
     0:      0: 


 ------------------------------------------------------------------------     0: 
     0:  ------------------------------------------------------------------------ ------------------------------------------------------------------------

     0: 
 ------------------------------------------------------------------------


 ------------------------------------------------------------------------ ------------------------------------------------------------------------
 ------------------------------------------------------------------------

 ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------------------------------------------------------------


 ------------------------------------------------------------------------
  ------------------------------------------------------------------------
  ------------------------------------------------------------------------ ------------------------------------------------------------------------For more information see the NWChem manual at For more information see the NWChem manual at  ------------------------------------------------------------------------




 ------------------------------------------------------------------------   ------------------------------------------------------------------------ 

http://www.emsl.pnl.gov/docs/nwchem/nwchem
 ------------------------------------------------------------------------http://www.emsl.pnl.gov/docs/nwchem/nwchem
 .html
 
 ------------------------------------------------------------------------

For more information see the NWChem manual at  For more information see the NWChem manual at  For further details see manual section: .html
For more information see the NWChem manual at 


application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2

  
                                                                                                                                                                                                                                                                http://www.emsl.pnl.gov/docs/nwchem/nwchem
.htmlapplication called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
http://www.emsl.pnl.gov/docs/nwchem/nwchem
  


For further details see manual section:  For further details see manual section: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 14
For more information see the NWChem manual at  
.htmlFor more information see the NWChem manual at 

 http://www.emsl.pnl.gov/docs/nwchem/nwchemapplication called MPI_Abort(MPI_COMM_WORLD, 0) - process 4

  
                                                                                                                                                                                                                                                               2:2:createfile: failed ga_create size=:: 2137779302
                                                                                                                                                                                                                                                               http://www.emsl.pnl.gov/docs/nwchem/nwchemapplication called MPI_Abort(MPI_COMM_WORLD, 0) - process 5

.html 
(rank:2 hostname:trestles-2-32.local pid:25706):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
http://www.emsl.pnl.gov/docs/nwchem/nwchem.html.html






 application called MPI_Abort(MPI_COMM_WORLD, 0) - process 6
 
For further details see manual section: 
For further details see manual section: 
3:3:createfile: failed ga_create size=:: 2137779302
  application called MPI_Abort(MPI_COMM_WORLD, 0) - process 7


For further details see manual section: For further details see manual section:   (rank:3 hostname:trestles-2-32.local pid:25707):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                               

 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
1:1:createfile: failed ga_create size=:: 2137779302

(rank:1 hostname:trestles-2-32.local pid:25705):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
7:7:createfile: failed ga_create size=:: 2137779302
5:5:createfile: failed ga_create size=:: 2137779302
(rank:7 hostname:trestles-2-32.local pid:25711):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
(rank:5 hostname:trestles-2-32.local pid:25709):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
6:6:createfile: failed ga_create size=:: 2137779302
4:4:createfile: failed ga_create size=:: 2137779302
(rank:6 hostname:trestles-2-32.local pid:25710):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
(rank:4 hostname:trestles-2-32.local pid:25708):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 8
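
One thing that can be checked against the output above is the basis-set size. The sketch below counts aug-cc-pVQZ functions for the six atoms in Cartesian and in spherical form, assuming the standard [6s5p4d3f2g] contraction for C and O; the helper names are hypothetical. The Cartesian count reproduces the 630 AO functions and 120 shells reported, which suggests (but does not prove) that the basis is being treated as Cartesian rather than spherical.

```python
# Count aug-cc-pVQZ basis functions for the CO2 dimer (6 first-row atoms).
# Assumption: each C and O atom carries the contraction [6s5p4d3f2g].
SHELLS_PER_ATOM = {0: 6, 1: 5, 2: 4, 3: 3, 4: 2}  # angular momentum l -> shells
N_ATOMS = 6


def n_cartesian(l):
    """Number of Cartesian components of a shell with angular momentum l."""
    return (l + 1) * (l + 2) // 2


def n_spherical(l):
    """Number of spherical-harmonic components of a shell with angular momentum l."""
    return 2 * l + 1


def functions_per_atom(components):
    """Sum the components over all shells on one atom."""
    return sum(n * components(l) for l, n in SHELLS_PER_ATOM.items())


shells = N_ATOMS * sum(SHELLS_PER_ATOM.values())
cartesian = N_ATOMS * functions_per_atom(n_cartesian)
spherical = N_ATOMS * functions_per_atom(n_spherical)

print(f"shells: {shells}")                  # shells: 120
print(f"Cartesian functions: {cartesian}")  # Cartesian functions: 630
print(f"spherical functions: {spherical}")  # spherical functions: 480
```

Under the same assumption, a spherical basis would give 480 functions, so the choice affects the size of every later step.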


Any help would be very much appreciated. Thanks.

Keith McLaughlin
University of South Florida