runtime error when running nwchem


Clicked A Few Times
Hi,

I'm now using the fep qmmm module to run jobs on a cluster. The code runs fine and some of the jobs finish. Others, however, abort partway through the calculation (although they can be resumed later). All of the jobs should take roughly the same amount of time to finish, and I allocated enough time per run, so I don't think they should be aborting prematurely. I looked into the err file and saw this error:

At line 1656 of file util_md.F
At line 1656 of file util_md.F
Fortran runtime error: End of record
At line 1656 of file util_md.F
Fortran runtime error: End of record
Fortran runtime error: End of record
At line 1656 of file util_md.F
Fortran runtime error: End of record
At line 1656 of file util_md.F
Fortran runtime error: End of record
At line 1656 of file util_md.F
Fortran runtime error: End of record

I then looked at the source file util_md.F. Line 1656 is part of the subroutine md_abort, shown here:

      subroutine md_abort(string, icode)
      implicit none
#include "global.fh"
#include "stdio.fh"
      character*(*) string
      character*255 card
      integer icode
      if(ga_nodeid().eq.0) then
        write(luout,1000) 0,string,icode
 1000   format(/,1x,10('*'),/,' * ',i3,': ',a,i5,/,1x,10('*'))
        card=' '
      else
c       the write below is line 1656
        write(card,1001) ga_nodeid(),string,icode
 1001   format(' * ',i3,': ',a,i5,' * ')
      endif
      call ga_error(card,icode)
      return
      end

Any idea what might have gone wrong here?

Best,
Tee

Forum Vet
Tee,
You might want to replace the code with this simpler version:

      subroutine md_abort(string, icode)
      implicit none
#include "global.fh"
#include "stdio.fh"
      character*(*) string
      character*255 card
      integer icode
      write(luout,1000) ga_nodeid(),string,icode
 1000 format(/,1x,10('*'),/,' * ',i3,': ',a,i5,/,1x,10('*'))
      card=' '
      call ga_error(card,icode)
      return
      end
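
One possible explanation for the 'End of record' message, offered as an assumption rather than a confirmed diagnosis: an `a` edit descriptor without a width writes the full length of `string`, and for a `character*(*)` dummy argument that is the declared length of whatever the caller passed. If the caller passes a long, blank-padded variable, the formatted text no longer fits in the 255-character `card`. The standalone sketch below reproduces that situation and shows that trimming the string avoids it. The program name `card_overflow` and the message text are hypothetical, and nothing here is taken from NWChem beyond the format statement.

```fortran
      program card_overflow
      implicit none
c     card mirrors the 255-character buffer used in md_abort
      character*255 card
c     string stands in for a long, blank-padded caller argument
      character*255 string
      integer ios
      string = 'hypothetical abort message'
c     untrimmed: 3+3+2+255+5+3 characters do not fit in card
      write(card,1001,iostat=ios) 7, string, 42
      print *, 'untrimmed iostat =', ios
c     trimmed: only the non-blank part of string is written
      write(card,1001,iostat=ios) 7, trim(string), 42
      print *, 'trimmed iostat   =', ios
      print *, trim(card)
 1001 format(' * ',i3,': ',a,i5,' * ')
      end
```

With gfortran, the first write should report a nonzero iostat and the second should succeed; without `iostat=`, the first write would abort the program instead. Whether md_abort is actually called with such a long string in the failing runs would need to be checked in its callers.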

Clicked A Few Times
Hi Edo,

I tried your suggestion: I edited util_md.F and recompiled the code. That seems to fix the previous error, but I now get the following error instead:

solvent molecule 464 moving to non-neighbor 0 from 35 1.05 1.36 0.03 1.05 1.33 -0.06 0.99 1.30 0.08

This error is the same as the one I experienced on cascade and reported to MSCF consulting before. With this error, I cannot restart the job on cascade from where it aborted. On the other cluster I'm using, the job can be restarted with the original code (before util_md.F was edited as suggested), but it sometimes fails with the 'End of record' error from my original post. So whether or not I edit util_md.F, I still have a problem.

Tee

Quote: Edoapra, Jan 23rd 4:52 pm
Tee
You might want to replace the code with this simpler one

      subroutine md_abort(string, icode)
      implicit none
#include "global.fh"
#include "stdio.fh"
      character*(*) string
      character*255 card
      integer icode
      write(luout,1000) ga_nodeid(),string,icode
 1000 format(/,1x,10('*'),/,' * ',i3,': ',a,i5,/,1x,10('*'))
      card=' '
      call ga_error(card,icode)
      return
      end

