5:15:27 PM PDT - Mon, Oct 21st 2013 |
Petri
I think there are two separate problems affecting your runs on the Cray XC30 at CSC:
1) You get plenty of warnings coming out of the libhugetlbfs library. I don't think these warnings are causing your run to fail (more later).
2) Your calculation fails (eventually calling MPI_Abort) since NWChem detects an error.
First, let's talk about point #2. The input for the c240 benchmark available from the NWChem website always causes NWChem to stop with an error. This happens because the input limits the number of SCF cycles to four (iterations 4), and the SCF will never converge in four iterations. To avoid this failure, I have checked in a new version of the input file that avoids generating a fatal error.
As far as the issue with libhugetlbfs is concerned, there are a few suggestions I can pass to you.
If you set the environment variable HUGETLB_VERBOSE to zero, all of the warning messages will vanish and, at the same time, the wall time of your jobs will decrease (most likely because each warning message takes quite a bit of time to write).
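For what it's worth, here is a minimal sketch of one way to set the variable from a small Python wrapper around the job launch. The helper names (quiet_hugetlb_env, run_quiet) are made up for illustration, and the command run at the bottom is only a stand-in that prints the variable; your real launcher and NWChem command line would go in its place. Exporting HUGETLB_VERBOSE=0 in the batch script before the launch line should have the same effect.

```python
import os
import subprocess
import sys


def quiet_hugetlb_env(base_env=None):
    """Return a copy of the environment with HUGETLB_VERBOSE set to "0".

    The original mapping is left untouched so the caller's environment
    is not modified as a side effect.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HUGETLB_VERBOSE"] = "0"  # silence the libhugetlbfs warnings
    return env


def run_quiet(command):
    """Run `command` (a list of strings) with the quiet environment."""
    return subprocess.run(command, env=quiet_hugetlb_env(), check=False)


if __name__ == "__main__":
    # Stand-in for the real launch command (an assumption for this sketch):
    # it just shows the value the child process sees.
    probe = [sys.executable, "-c",
             "import os; print(os.environ['HUGETLB_VERBOSE'])"]
    run_quiet(probe)
```

Building the environment as a copy keeps the setting local to the launched job, so other commands started from the same session are not affected.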
Another change that seems to have improved performance and stability for my Cray XC30 runs has been to switch from the GA source code we distribute with NWChem 6.3 (which resides in the $NWCHEM_TOP/src/tools directory) to the modified version contributed by the Cray folks (available at https://github.com/ryanolson/ga/archive/cray.zip).
Please let me know if you need any help installing this modified tools source code.
Cheers, Edo