I have an Intel i7 system with 4 physical cores and 16 GB of memory. I am running 64-bit Ubuntu 14.04, and I built NWChem against OpenBLAS 0.28 with the 64-bit integer interface.
I am using the June 2014 NWChem snapshot: http://nwchemgit.github.io/download.php?f=Nwchem-dev.revision25716-src.2014-06-09.tar.gz
Specifically, that corresponds to NWChem revision 25716 and ga revision 10496.
I am trying to run a calculation like this:
start dccsinglet
echo
print high
memory stack 3000 mb heap 200 mb global 3600 mb
charge 0
geometry units angstroms
C -0.13183 0.72345 -0.07866
Cl -1.15973 -0.55669 -0.69209
Cl 1.24554 0.01838 0.74329
symmetry c1
end
basis spherical
* library aug-cc-pvtz
end
scf
singlet
uhf
end
tce
io sf
ccsd
end
task tce energy
I started out using "io ga" and slightly more generous memory settings, but even with only 2 cores my machine was swapping once it reached the CCSD part. I interrupted the job, lowered the memory settings, and switched the I/O scheme to shared file ("io sf"), as shown above. I understand that disk-based schemes will be slow, but they should still work if I am patient, and I can upgrade disk speed more easily than I can install more RAM. The disk I use for calculations has about 2 TB of free space, and none of my attempts ever wrote more than about 6 GB of files.
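For reference, here is a rough tally of what the memory line above asks for. It assumes the limits in the memory directive apply per process, so the total scales with the -np value. The function name and the parsing are only for illustration and handle just the "mb" form used in my input.

```python
import re

def requested_memory_mb(directive: str, nprocs: int) -> int:
    """Sum the stack/heap/global values (given in mb) and multiply by nprocs.

    Assumption: the memory directive is a per-process limit.
    """
    parts = dict(re.findall(r"(stack|heap|global)\s+(\d+)\s+mb", directive))
    per_process = sum(int(value) for value in parts.values())
    return per_process * nprocs

line = "memory stack 3000 mb heap 200 mb global 3600 mb"
print(requested_memory_mb(line, 1))  # 6800
print(requested_memory_mb(line, 2))  # 13600
```

Under that assumption, two processes can claim roughly 13.6 GB of my 16 GB before the OS and file cache are counted.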
The problem is that the job crashes with a floating point exception as soon as it reaches the CCSD portion if I use shared-file I/O and two processors, like this:
mpirun -np 2 nwchem dcc-singlet.nw | tee dcc-singlet.nwo
Output from two processor attempt: http://pastebin.com/fSs4X0Et
If I run with only one processor, the job lives longer but ultimately crashes with INVALID ARRAY HANDLE:
mpirun -np 1 nwchem dcc-singlet.nw | tee dcc-singlet.nwo
Output from quasi-serial attempt: http://pastebin.com/xp3ZwVak
I also tried decreasing the tile size, with no luck; I can provide that output too if it would help. I additionally tried the replicated, fortran, and eaf I/O options, each with one and two processors, and they all failed in a similar manner, though I didn't save all of their outputs. Are the disk-based I/O schemes currently unsupported? I did a find/xargs/grep search through the QA directory and didn't find a single TCE test using disk-based I/O.
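The QA search was along the following lines. This is a rough Python equivalent, not the exact command I ran. The scheme list only covers the disk-based options named in this post, and the function name and regular expression are my own choices for illustration. Note that the pattern matches any line beginning with "io", not only lines inside a tce block, so it can over-report.

```python
import os
import re

# Disk-based TCE "io" keywords mentioned in this post (assumed, not exhaustive).
DISK_SCHEMES = {"sf", "replicated", "fortran", "eaf"}
# Matches lines such as "io sf" or "  IO eaf" anywhere in the file.
IO_LINE = re.compile(r"^\s*io\s+(\w+)", re.IGNORECASE | re.MULTILINE)

def find_disk_io_inputs(root):
    """Return sorted (path, scheme) pairs for .nw files with a disk-based io line."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".nw"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as handle:
                text = handle.read()
            for scheme in IO_LINE.findall(text):
                if scheme.lower() in DISK_SCHEMES:
                    hits.append((path, scheme.lower()))
    return sorted(hits)

# Example call (the path is hypothetical):
# find_disk_io_inputs("/path/to/nwchem/QA")
```

An empty list from a search like this is what led me to ask whether these schemes are still exercised at all.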