QA tests failing


I can't tell you whether or not the differences you observe are large enough to be considered "out of spec". I can tell you that the qm test scripts are kind of a mess. They include a bunch of jobs that are not considered reliable enough to run nightly. The failure criteria are over-sensitive; you spend a lot of time wading through 6th-decimal-place differences to find the significant ones.

I wrote my own script to streamline the post-build QA process. It creates test-execution scripts for you, using only the subset of tests considered reliable enough for nightlies, then shows test results sorted by severity of deviation from the references. It also lets you cap the time cost of the tests you are willing to run, e.g. "no test more expensive than 5000 core-seconds according to the reference output." A "core-second" is a convenient mishmash of units: the wall-clock time in seconds the reference job took to complete, multiplied by the number of processors used. For example, a job whose reference run took 1000 seconds on 4 processors costs 4000 core-seconds.
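To make the cost metric concrete, here is a minimal Python sketch of how a core-second budget filter could work. This is not qacheck.py itself; the test names, wall times, and core counts are made up for illustration.

def core_seconds(wall_time_s, nproc):
    # Cost of a test: reference wall-clock seconds times processors used.
    return wall_time_s * nproc

# Hypothetical (test name, reference wall time in seconds, processors) entries.
reference_runs = [
    ("h2o_opt",        120.0, 4),
    ("dft_h2o_b3lyp",  600.0, 4),
    ("dft_feco5",      900.0, 8),
    ("ccsd_large",    2500.0, 8),
]

budget = 5000  # core-seconds, like passing --cost 5000
selected = [(name, core_seconds(t, n))
            for name, t, n in reference_runs
            if core_seconds(t, n) <= budget]

# Cheapest first, so a generated run script starts with the quick tests.
for name, cost in sorted(selected, key=lambda pair: pair[1]):
    print("%s: %.0f core-seconds" % (name, cost))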

Here's the code: https://github.com/mattbernst/nwflask/blob/master/chemnw/qacheck.py

Suppose you have downloaded qacheck.py to your home directory and built NWChem under /opt/science/nwchem/Nwchem-dev.revision25890-src.2014-07-18 on a LINUX64 platform.

Then you would do this to run and check all tests that are stable enough for nightly use, allowing for an execution cost of up to 10000 core-seconds per test:

cd ~
cp -r /opt/science/nwchem/Nwchem-dev.revision25890-src.2014-07-18/QA .
cd QA
python ~/qacheck.py --top /opt/science/nwchem/Nwchem-dev.revision25890-src.2014-07-18 --cost 10000 --target LINUX64 --test-root .


That will generate two scripts in the working directory, runmpi and runserial. I would try runmpi first and only drop back to runserial if there are unusual problems. I have never personally had serial execution work any better than parallel, at least for the jobs that make it into nightly QA. The generated scripts assume that your just-built nwchem can be found in your PATH. If that is not the case, edit the generated scripts and replace

setenv NWCHEM_EXECUTABLE `which nwchem`

with
setenv NWCHEM_EXECUTABLE /path/to/your/binary/nwchem


Here's how you would run the tests selected above with 4 cores and save the output to a log file:
./runmpi 4 | tee mpi.log


You'll wait a while. The tests are sorted from lowest to highest estimated cost, so they get slower as the script runs. You can look inside the script to see the estimated cost as a comment next to each test. Many tests will self-report failure, but most of those failures are trivial, as you'll see when you run the analysis phase. To analyze the generated mpi.log, do this:

python ~/qacheck.py -l mpi.log
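If you want a feel for what "sorted by severity of deviation" means, here is a minimal Python sketch of the idea. It is not the real analysis code; the test names, deviation values, and the 1e-4 flagging threshold are made-up assumptions. The point is just that 6th-decimal-place noise sinks to the bottom while the deviations worth investigating float to the top.

# Hypothetical per-test maximum deviations from the reference values.
parsed = {
    "h2o_opt":   3.0e-07,  # 6th-decimal-place noise, almost certainly fine
    "dft_feco5": 1.2e-06,
    "tddft_n2":  4.5e-02,  # large enough to warrant a closer look
}

# Report worst-first; the threshold for flagging is an arbitrary example.
for test, dev in sorted(parsed.items(), key=lambda kv: kv[1], reverse=True):
    flag = "CHECK" if dev > 1e-4 else "ok"
    print("%-12s max deviation %.2e  [%s]" % (test, dev, flag))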