2:31:56 AM PDT - Wed, Aug 15th 2012
Dear Bert/all,
even though one of my problems has been resolved (multi-CPU convergence), another one has arisen. ;-(
A user of our infrastructure has reported that the following computation crashes with a "failed ga_create" error (see details below).
The computation:
start cis1
memory 10000 mb
geometry units bohr
S 0.000000000 0.073468702 -1.957631386
N 0.000000000 -1.017192265 1.304155528
O 0.000000000 0.582796470 2.868174306
H 0.000000000 2.547479474 -1.383337299
end
basis
S library aug-cc-pVDZ
N library aug-cc-pVDZ
O library aug-cc-pVDZ
H library aug-cc-pVDZ
end
scf
singlet
rhf
end
tce
scf
ccsdt
freeze core atomic
nroots 1
targetsym a'
symmetry
thresh 1.0d-5
dipole
end
task tce energy
The computation was run on a single node with 4 dedicated CPUs and 100 GB of memory. The CPU was an Intel Xeon E7-2860 @ 2.27 GHz. During the "EOM-CCSDT right-hand side iterations", the computation failed with the following error:
Iteration 5 using 5 trial vectors
available GA memory 299908168 bytes
createfile: failed ga_create size=162410843
------------------------------------------------------------------------
------------------------------------------------------------------------
current input line :
0:
------------------------------------------------------------------------
...
Last System Error Message from Task 0:: Illegal seek
Last System Error Message from Task 3:: Illegal seek
Last System Error Message from Task 1:: Illegal seek
Last System Error Message from Task 2:: Illegal seek
3:3:createfile: failed ga_create size=:: 162410843
(rank:3 hostname:zewura1.cerit-sc.cz pid:21865):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
1:1:createfile: failed ga_create size=:: 162410843
(rank:1 hostname:zewura1.cerit-sc.cz pid:21863):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
2:2:createfile: failed ga_create size=:: 162410843
(rank:2 hostname:zewura1.cerit-sc.cz pid:21864):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
0:0:createfile: failed ga_create size=:: 162410843
(rank:0 hostname:zewura1.cerit-sc.cz pid:21862):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
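For what it is worth, here is a quick back-of-the-envelope check of the numbers in the log. It is only a sketch under assumptions I have not verified: that the "size" printed by createfile counts 8-byte double-precision words, that "memory 10000 mb" is a per-process limit, that "100GB" means 100 * 1024 MB, and that the global array would be spread evenly over the 4 processes.

```python
# Back-of-the-envelope check of the numbers reported in the log above.
# ASSUMPTIONS (not confirmed by the log itself):
#  - the "size" printed by createfile counts 8-byte double-precision words;
#  - "memory 10000 mb" is a per-process limit;
#  - "100GB" means 100 * 1024 MB;
#  - the global array would be distributed evenly over all processes.

N_PROCS = 4                       # "4 CPUs" on a single node
MEMORY_PER_PROC_MB = 10_000       # from "memory 10000 mb"
NODE_MEMORY_MB = 100 * 1024       # from "100GB of memory"
BYTES_PER_DOUBLE = 8

REQUESTED_SIZE = 162_410_843      # from "failed ga_create size=162410843"
AVAILABLE_GA_BYTES = 299_908_168  # from "available GA memory 299908168 bytes"


def memory_check():
    """Return the derived memory figures as a dict."""
    total_mb = N_PROCS * MEMORY_PER_PROC_MB
    requested_bytes = REQUESTED_SIZE * BYTES_PER_DOUBLE
    # Even split of the array across processes (assumption, see above).
    per_proc_bytes = requested_bytes // N_PROCS
    return {
        "total_requested_mb": total_mb,
        "fits_on_node": total_mb <= NODE_MEMORY_MB,
        "requested_bytes": requested_bytes,
        "per_proc_bytes": per_proc_bytes,
        "per_proc_fits": per_proc_bytes <= AVAILABLE_GA_BYTES,
    }


if __name__ == "__main__":
    for key, value in memory_check().items():
        print(f"{key}: {value}")
```

Under these assumptions, the node has plenty of memory overall (40000 MB requested out of roughly 100 GB). However, the array being created would need about 325 MB per process, against roughly 300 MB of reported GA memory, which would be consistent with the ga_create failure. If the size is in bytes rather than words, this reasoning does not hold.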
The full computation output is available here...
Could anybody give me a hint on how to resolve this issue?
Thanks a lot for any advice...
--best
Tom, Czech Republic.
PS: Our compilation options are listed in this thread...