Hi all. I'm working on CCSD(T) calculations of CO2 dimers using the aug-cc-pVQZ basis set. I realize that this is a very large job. I've previously run a few of these calculations with Molpro (on XSEDE Blacklight); I don't have my notes on me, but if I recall correctly each took about 20 hours on 16 cores and required ~256 GB of memory.
I would like to try running these jobs with NWChem instead, but I'm having two problems: 1) tuning the performance options, and 2) my jobs are dying due to a file-writing error.
First, here is my input file. I've left out the basis set specification, as it's a long copy/paste from BSEL.
title "co2 test"
#memory stack 9600 mb heap 800 mb global 4800 mb // tried this also, same error
memory stack 1500 mb heap 100 mb global 1400 mb
geometry
symmetry c1
C 2.12544 0.00000 0.00000
O 1.82852 -0.93172 -0.62769
O 2.42235 0.93172 0.62769
C -2.12544 0.00000 0.00000
O -1.20623 -0.32695 -0.63119
O -3.04465 0.32695 0.63119
end
basis
## *snip*
end
bsse
mon firstmonomer 1 2 3
mon secondmonomer 4 5 6
end
scf
singlet
rhf
end
tce
ccsd(t)
2eorb
io ga
tilesize 10 # also tried 15 and 20
end
task tce energy
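Since NWChem's memory directive sets limits per process, the two memory lines in the input above can be compared against the node size with a little arithmetic. The sketch below is only a rough check: it assumes 8 processes per node and treats 64 GB as 64 x 1024 MB, and the function names are made up for illustration.

```python
# Per-process limits (in MB) copied from the two `memory` lines in the input.
SETTINGS = {
    "active": {"stack": 1500, "heap": 100, "global": 1400},
    "commented_out": {"stack": 9600, "heap": 800, "global": 4800},
}

PROCS_PER_NODE = 8           # assumption: 16 processes spread evenly over 2 nodes
NODE_MEMORY_MB = 64 * 1024   # assumption: "64GB" per node taken as 64 * 1024 MB


def per_process_mb(setting):
    """Sum of the stack, heap and global limits for one process."""
    return sum(setting.values())


def per_node_mb(setting, procs_per_node=PROCS_PER_NODE):
    """Worst-case request per node if every process uses its full limits."""
    return per_process_mb(setting) * procs_per_node


if __name__ == "__main__":
    for name, setting in SETTINGS.items():
        node_total = per_node_mb(setting)
        verdict = "fits within" if node_total <= NODE_MEMORY_MB else "exceeds"
        print(f"{name}: {per_process_mb(setting)} MB/process, "
              f"{node_total} MB/node ({verdict} {NODE_MEMORY_MB} MB)")
```

Under these assumptions the active line asks for 3000 MB per process (24000 MB per node), while the commented-out line would ask for 15200 MB per process, or 121600 MB per node, which is more than a 64 GB node holds.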
I'm running the jobs on XSEDE Trestles, on 8 cores per node (mpirun_rsh) over 2 nodes (64 GB memory per node), so 16 processes in total, with the environment variable ARMCI_DEFAULT_SHMMAX=2048 set. I've also tried running without the variable set, with the same results.
So now the results. The job runs for a while and generates ~150 GB of temp files before dying. I've pasted the relevant output below.
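The ~150 GB figure can be roughly cross-checked against the integral-file statistics printed in the output below (record size in doubles, integrals per record, and the #integrals count). This is a back-of-the-envelope sketch that assumes the reported integral count is the total over all processes and that every record is written at full size; the function name is hypothetical.

```python
import math

# Values copied from the integral-file section of the output.
N_INTEGRALS = 1.013e10         # "#integrals = 1.013D+10"
RECORD_DOUBLES = 65536         # "Record size in doubles = 65536"
INTEGRALS_PER_RECORD = 32766   # "No. of integs per rec = 32766"
BYTES_PER_DOUBLE = 8


def integral_file_bytes(n_integrals, record_doubles, integrals_per_record):
    """Estimate the total size on disk of the cached integral files.

    Assumes the integral count is a global total and that each record
    occupies its full size (values plus packed labels).
    """
    n_records = math.ceil(n_integrals / integrals_per_record)
    return n_records * record_doubles * BYTES_PER_DOUBLE


if __name__ == "__main__":
    total = integral_file_bytes(N_INTEGRALS, RECORD_DOUBLES, INTEGRALS_PER_RECORD)
    print(f"estimated integral files: {total / 2**30:.1f} GiB")
```

With the numbers above the estimate comes out at roughly 151 GiB, which is in line with the amount of scratch data observed, so most of the temp files are plausibly the cached AO integrals (100% cached, 0% direct).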
*snip*
General Information
-------------------
Number of processors : 16
Wavefunction type : Restricted Hartree-Fock
No. of electrons : 44
Alpha electrons : 22
Beta electrons : 22
No. of orbitals : 1234
Alpha orbitals : 617
Beta orbitals : 617
Alpha frozen cores : 0
Beta frozen cores : 0
Alpha frozen virtuals : 0
Beta frozen virtuals : 0
Spin multiplicity : singlet
Number of AO functions : 630
Number of AO shells : 120
Use of symmetry is : off
Symmetry adaption is : off
Schwarz screening : 0.10D-09
!! WARNING !! The number of MO is less than the number of AO
Correlation Information
-----------------------
Calculation type : Coupled-cluster singles & doubles w/ perturbation
Perturbative correction : (T)
Max iterations : 100
Residual threshold : 0.10D-06
DIIS level shift : 0.00D+00
CC-LR DIIS level shift : 0.00D+00
CC-IR DIIS level shift : 0.00D+00
Amplitude update : 5-th order DIIS
I/O scheme : Global Array Library
Memory Information
------------------
Available GA space size is ********** doubles
Available MA space size is 681563897 doubles
Maximum block size supplied by input
Maximum block size 20 doubles
tile_dim = 20
Block Spin Irrep Size Offset Alpha
-------------------------------------------------
1 alpha a 11 doubles 0 1
2 alpha a 11 doubles 11 2
3 beta a 11 doubles 22 1
4 beta a 11 doubles 33 2
5 alpha a 19 doubles 44 5
6 alpha a 20 doubles 63 6
7 alpha a 20 doubles 83 7
8 alpha a 20 doubles 103 8
9 alpha a 20 doubles 123 9
10 alpha a 20 doubles 143 10
11 alpha a 19 doubles 163 11
12 alpha a 20 doubles 182 12
13 alpha a 20 doubles 202 13
14 alpha a 20 doubles 222 14
15 alpha a 20 doubles 242 15
16 alpha a 20 doubles 262 16
17 alpha a 19 doubles 282 17
18 alpha a 20 doubles 301 18
19 alpha a 20 doubles 321 19
20 alpha a 20 doubles 341 20
21 alpha a 20 doubles 361 21
22 alpha a 20 doubles 381 22
23 alpha a 19 doubles 401 23
24 alpha a 20 doubles 420 24
25 alpha a 20 doubles 440 25
26 alpha a 20 doubles 460 26
27 alpha a 20 doubles 480 27
28 alpha a 20 doubles 500 28
29 alpha a 19 doubles 520 29
30 alpha a 20 doubles 539 30
31 alpha a 20 doubles 559 31
32 alpha a 20 doubles 579 32
33 alpha a 20 doubles 599 33
34 alpha a 20 doubles 619 34
35 beta a 19 doubles 639 5
36 beta a 20 doubles 658 6
37 beta a 20 doubles 678 7
38 beta a 20 doubles 698 8
39 beta a 20 doubles 718 9
40 beta a 20 doubles 738 10
41 beta a 19 doubles 758 11
42 beta a 20 doubles 777 12
43 beta a 20 doubles 797 13
44 beta a 20 doubles 817 14
45 beta a 20 doubles 837 15
46 beta a 20 doubles 857 16
47 beta a 19 doubles 877 17
48 beta a 20 doubles 896 18
49 beta a 20 doubles 916 19
50 beta a 20 doubles 936 20
51 beta a 20 doubles 956 21
52 beta a 20 doubles 976 22
53 beta a 19 doubles 996 23
54 beta a 20 doubles 1015 24
55 beta a 20 doubles 1035 25
56 beta a 20 doubles 1055 26
57 beta a 20 doubles 1075 27
58 beta a 20 doubles 1095 28
59 beta a 19 doubles 1115 29
60 beta a 20 doubles 1134 30
61 beta a 20 doubles 1154 31
62 beta a 20 doubles 1174 32
63 beta a 20 doubles 1194 33
64 beta a 20 doubles 1214 34
Global array virtual files algorithm will be used
Parallel file system coherency ......... OK
Integral file = ./co2.aoints.00
Record size in doubles = 65536 No. of integs per rec = 32766
Max. records in memory = 0 Max. records in file = ******
No. of bits per label = 16 No. of bits per value = 64
#quartets = 1.807D+07 #integrals = 1.013D+10 #direct = 0.0% #cached =100.0%
File balance: exchanges= 63 moved= 7630 time= 5.1
Fock matrix recomputed
1-e file size = 380689
1-e file name = ./co2.f1
Cpu & wall time / sec 137.1 183.2
4-electron integrals stored in orbital form
available GA memory 2516039248 bytes
available GA memory available GA memory available GA memory available GA memory 2516039256 2516039256 available GA memory available GA memory available GA memory available GA memory Last System Error Message from Task 10:: No such file or directory
Last System Error Message from Task 9:: No such file or directory
available GA memory 2516039256 2516039256 bytes
bytes bytes 2516392056 2516392056
2516392056 bytes bytes
available GA memory 2516039256
2516039256 bytes createfile: failed ga_create size=*********
createfile: failed ga_create size=********* createfile: failed ga_create size=*********
------------------------------------------------------------------------
bytes
------------------------------------------------------------------------
------------------------------------------------------------------------ ------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------ ------------------------------------------------------------------------ current input line :
------------------------------------------------------------------------ ------------------------------------------------------------------------
current input line : ------------------------------------------------------------------------ ------------------------------------------------------------------------
0: 0: ------------------------------------------------------------------------
------------------------------------------------------------------------
Last System Error Message from Task 0:: No such file or directory
createfile: failed ga_create size=********* createfile: failed ga_create size=********* createfile: failed ga_create size=********* ------------------------------------------------------------------------
------------------------------------------------------------------------ current input line : ------------------------------------------------------------------------
------------------------------------------------------------------------
0:
------------------------------------------------------------------------ 0: ------------------------------------------------------------------------ 289: task tce energy
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
For more information see the NWChem manual at
http://www.emsl.pnl.gov/docs/nwchem/nwchem.html
For further details see manual section:
0:0:createfile: failed ga_create size=:: 2137779302
(rank:0 hostname:trestles-2-32.local pid:25704):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
------------------------------------------------------------------------
------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------------------------------------------------------------
------------------------------------------------------------------------ ------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------ ------------------------------------------------------------------------ 0: ------------------------------------------------------------------------ ------------------------------------------------------------------------For more information see the NWChem manual at
For more information see the NWChem manual at
------------------------------------------------------------------------
------------------------------------------------------------------------For more information see the NWChem manual at
------------------------------------------------------------------------
http://www.emsl.pnl.gov/docs/nwchem/nwchem ------------------------------------------------------------------------
http://www.emsl.pnl.gov/docs/nwchem/nwchem .html ------------------------------------------------------------------------
.html
For more information see the NWChem manual at
For more information see the NWChem manual at http://www.emsl.pnl.gov/docs/nwchem/nwchem
For more information see the NWChem manual at .html
------------------------------------------------------------------------
http://www.emsl.pnl.gov/docs/nwchem/nwchem
http://www.emsl.pnl.gov/docs/nwchem/nwchem
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 12
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 10
.htmlhttp://www.emsl.pnl.gov/docs/nwchem/nwchem
For further details see manual section: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 13
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 15
For more information see the NWChem manual at .htmlFor further details see manual section:
For further details see manual section: For further details see manual section: .html
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 9
For further details see manual section: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 11
http://www.emsl.pnl.gov/docs/nwchem/nwchem
.html
For further details see manual section:
10:10:createfile: failed ga_create size=:: 2137779302
For further details see manual section:
(rank:10 hostname:trestles-2-4.local pid:10516):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
9:9:createfile: failed ga_create size=:: 2137779302
13:13:createfile: failed ga_create size=:: 2137779302
(rank:9 hostname:trestles-2-4.local pid:10515):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
11:11:createfile: failed ga_create size=:: 2137779302
(rank:13 hostname:trestles-2-4.local pid:10519):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
(rank:11 hostname:trestles-2-4.local pid:10517):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
12:12:createfile: failed ga_create size=:: 2137779302
(rank:12 hostname:trestles-2-4.local pid:10518):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
15:15:createfile: failed ga_create size=:: 2137779302
(rank:15 hostname:trestles-2-4.local pid:10521):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
8:8:createfile: failed ga_create size=:: 2137779302
(rank:8 hostname:trestles-2-4.local pid:10514):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
available GA memory 2516392056 bytes
createfile: failed ga_create size=*********
------------------------------------------------------------------------
------------------------------------------------------------------------
current input line :
0:
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
For more information see the NWChem manual at
http://www.emsl.pnl.gov/docs/nwchem/nwchem.html
For further details see manual section:
14:14:createfile: failed ga_create size=:: 2137779302
(rank:14 hostname:trestles-2-4.local pid:10520):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
Last System Error Message from Task 14:: No such file or directory
------------------------------------------------------------------------
------------------------------------------------------------------------ ------------------------------------------------------------------------
current input line :
------------------------------------------------------------------------Last System Error Message from Task 2:: No such file or directory
Last System Error Message from Task 3:: No such file or directory
Last System Error Message from Task 1:: No such file or directory
------------------------------------------------------------------------Last System Error Message from Task 5:: No such file or directory
------------------------------------------------------------------------Last System Error Message from Task 7:: No such file or directory
Last System Error Message from Task 4:: No such file or directory
Last System Error Message from Task 6:: No such file or directory
current input line : ------------------------------------------------------------------------
current input line : 0:
current input line : 0: ------------------------------------------------------------------------
current input line :
current input line : current input line :
------------------------------------------------------------------------
0: 0:
------------------------------------------------------------------------ 0:
0: ------------------------------------------------------------------------ ------------------------------------------------------------------------
0:
------------------------------------------------------------------------
------------------------------------------------------------------------ ------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------ ------------------------------------------------------------------------For more information see the NWChem manual at For more information see the NWChem manual at ------------------------------------------------------------------------
------------------------------------------------------------------------ ------------------------------------------------------------------------
http://www.emsl.pnl.gov/docs/nwchem/nwchem
------------------------------------------------------------------------http://www.emsl.pnl.gov/docs/nwchem/nwchem
.html
------------------------------------------------------------------------
For more information see the NWChem manual at For more information see the NWChem manual at For further details see manual section: .html
For more information see the NWChem manual at
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
http://www.emsl.pnl.gov/docs/nwchem/nwchem
.htmlapplication called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
http://www.emsl.pnl.gov/docs/nwchem/nwchem
For further details see manual section: For further details see manual section: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 14
For more information see the NWChem manual at
.htmlFor more information see the NWChem manual at
http://www.emsl.pnl.gov/docs/nwchem/nwchemapplication called MPI_Abort(MPI_COMM_WORLD, 0) - process 4
2:2:createfile: failed ga_create size=:: 2137779302
http://www.emsl.pnl.gov/docs/nwchem/nwchemapplication called MPI_Abort(MPI_COMM_WORLD, 0) - process 5
.html
(rank:2 hostname:trestles-2-32.local pid:25706):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
http://www.emsl.pnl.gov/docs/nwchem/nwchem.html.html
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 6
For further details see manual section:
For further details see manual section:
3:3:createfile: failed ga_create size=:: 2137779302
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 7
For further details see manual section: For further details see manual section: (rank:3 hostname:trestles-2-32.local pid:25707):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
1:1:createfile: failed ga_create size=:: 2137779302
(rank:1 hostname:trestles-2-32.local pid:25705):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
7:7:createfile: failed ga_create size=:: 2137779302
5:5:createfile: failed ga_create size=:: 2137779302
(rank:7 hostname:trestles-2-32.local pid:25711):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
(rank:5 hostname:trestles-2-32.local pid:25709):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
6:6:createfile: failed ga_create size=:: 2137779302
4:4:createfile: failed ga_create size=:: 2137779302
(rank:6 hostname:trestles-2-32.local pid:25710):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
(rank:4 hostname:trestles-2-32.local pid:25708):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 8
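To put the numbers in the error messages side by side, the failed ga_create request can be converted to bytes and compared with the reported "available GA memory". The log does not state the units of the failed size, so the sketch below assumes it is a count of 8-byte doubles, that the available figure is per process, and that the array would be spread evenly over the 16 processes. All three are assumptions, and the helper name is made up.

```python
FAILED_SIZE = 2137779302        # from "createfile: failed ga_create size=:: 2137779302"
AVAILABLE_BYTES = 2516039256    # from "available GA memory 2516039256 bytes"
N_PROCESSES = 16                # from "Number of processors : 16"
BYTES_PER_DOUBLE = 8            # assumption: the failed size counts doubles
GIB = 2**30


def request_summary(size_doubles, n_processes):
    """Return (total bytes, bytes per process) for an evenly distributed array."""
    total = size_doubles * BYTES_PER_DOUBLE
    return total, total / n_processes


if __name__ == "__main__":
    total, share = request_summary(FAILED_SIZE, N_PROCESSES)
    print(f"requested array:      {total / GIB:.2f} GiB in total")
    print(f"share per process:    {share / GIB:.2f} GiB")
    print(f"available per process: {AVAILABLE_BYTES / GIB:.2f} GiB")
```

Under these assumptions the request is about 16 GiB in total, or about 1 GiB per process, against roughly 2.3 GiB reported as available per process. If the assumptions hold, the reported GA memory alone would not account for the failure.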
Any help would be very much appreciated. Thanks.
Keith McLaughlin
University of South Florida