"rtdb open old failed" on BG/Q when old rtdb file exists


Clicked A Few Times
Hello,

I have run into an issue recently when running NWChem 6.6 on the Blue Gene/Q platform. When I try to restart a calculation that previously timed out, I receive the following error:
 rtdb_seq_open: ./second.db does not exist, cannot open old
 start: rtdb_open old failed                    0

I can reproduce this error any time I use the restart keyword, even if the rtdb already exists. Here is a minimal working example that reproduces this error:

first/nwchem.nw:
 start first

 memory total 800 mb

 geometry
   H	0.0000	0.0000	-0.5000
   H	0.0000	0.0000	 0.5000
 end

 basis
   H library "6-311G**"
 end

 dft
   xc b3lyp
   grid fine
 end

 title "First calculation"
 task dft energy

This task completes successfully and produces the rtdb file first.db. I then copy this database, along with the .movecs file, into the directory containing second/nwchem.nw, renaming the files to second.db and second.movecs. The second input file, second/nwchem.nw, is:
 restart second

 memory total 800 mb

 task dft energy

This job fails with the above error, even though second.db exists in the working directory.

Clicked A Few Times
solution
I have found the cause of this problem, as well as a simple solution. rtdb_seq_open() appears to fail in "old" mode because it checks for the file with the access() call. On our BG/Q (I do not know whether this is universal), access() with the R_OK | W_OK flags always fails unless the file permission bits are set to 777 (or possibly 666; it occurred to me just now that I did not test this case). In any case, these are not the default permission bits for new files, so rtdb files produced by a calculation cannot be reused without first changing their permissions.

To get around this, I have modified src/rtdb/rtdb_seq.c, changing the code near line 313 in rtdb_seq_open():
   int exists = access(filename, R_OK | W_OK) == 0;

to
 #if defined(__bgq__)
   int exists = access(filename, F_OK) == 0;
 #else
   int exists = access(filename, R_OK | W_OK) == 0;
 #endif

This fixes the problem on BG/Q while leaving behavior on other platforms unchanged.
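
For anyone who wants to check whether their own system shows the same access() behavior, here is a minimal sketch of a standalone probe. It is not part of NWChem; the names probe_access and print_probe are hypothetical and exist only for this illustration. It simply calls access() with each mode separately and with R_OK | W_OK combined, so the results can be compared against the file's permission bits.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical diagnostic helper (not part of NWChem).
 * Returns a bitmask describing which access() checks succeed for path:
 *   1 = F_OK (file exists)
 *   2 = R_OK (readable)
 *   4 = W_OK (writable)
 *   8 = R_OK | W_OK requested together, as rtdb_seq_open() does */
int probe_access(const char *path)
{
    int mask = 0;

    if (access(path, F_OK) == 0)        mask |= 1;
    if (access(path, R_OK) == 0)        mask |= 2;
    if (access(path, W_OK) == 0)        mask |= 4;
    if (access(path, R_OK | W_OK) == 0) mask |= 8;

    return mask;
}

/* Print the probe result for one path in a readable form. */
void print_probe(const char *path)
{
    int mask = probe_access(path);

    printf("%s: F_OK=%d R_OK=%d W_OK=%d R_OK|W_OK=%d\n",
           path,
           (mask & 1) != 0,
           (mask & 2) != 0,
           (mask & 4) != 0,
           (mask & 8) != 0);
}
```

Calling print_probe("./second.db") from a small main() run on a compute node should show whether the file is reported as existing (F_OK) while the combined R_OK | W_OK check is rejected, which is the situation described above.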

Forum Vet
Thanks for the patch
I have just committed this change to the repository.

Cheers, Edo

