11:29:00 AM PDT - Fri, Jul 18th 2014 |
Alfredo
I think I have run into similar problems myself on InfiniBand networks.
In order to better understand what is going on, we need to have a closer look at the problem.
1) Did you see any other error/warning message (either in the error or output file), for example one related to memory?
2) Your kernel setting is correct; however, something on the openib side might be preventing the value from being set correctly.
Therefore, we need to look at what value of SHMMAX is actually used during your NWChem runs.
To do this, I suggest you recompile the tools after applying a patch.
Here is what you should do:
1) cd $NWCHEM_TOP/src/
2) wget http://nwchemgit.github.io/images/Reportshmmax.patch.gz
3) gzip -d Reportshmmax.patch.gz
4) patch -p0 < Reportshmmax.patch
5) cd tools/build
6) make install
7) cd ../..
8) make link
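If you prefer to run the steps in one go, here is a rough sketch that wraps them in a shell function and stops at the first command that fails. The function name apply_shmmax_patch is just something I made up for this post, and it assumes NWCHEM_TOP is already set as for a normal build.

```shell
#!/bin/sh
# Sketch only: steps 1-8 above chained with &&, so a failed step
# stops the sequence instead of continuing with a half-patched tree.
# apply_shmmax_patch is a hypothetical helper name, not part of NWChem.
# The body runs in a subshell, so your current directory is left unchanged.
apply_shmmax_patch() (
    if [ -z "${NWCHEM_TOP:-}" ]; then
        echo "NWCHEM_TOP is not set" >&2
        exit 1
    fi
    cd "$NWCHEM_TOP/src" &&
    wget http://nwchemgit.github.io/images/Reportshmmax.patch.gz &&
    gzip -d Reportshmmax.patch.gz &&
    patch -p0 < Reportshmmax.patch &&
    cd tools/build &&
    make install &&
    cd ../.. &&
    make link
)
```

After pasting the function into your shell, type apply_shmmax_patch to run it. A non-zero exit status means one of the steps failed and the later ones were skipped.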
If you now run NWChem, the initial part of the output file should contain a line that reports the value of SHMMAX.
Once we are sure which value of SHMMAX is being used, we might have to look at memory registration problems in openib.
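For comparison with the value NWChem reports, the kernel-side limit on Linux can be read from /proc/sys/kernel/shmmax (the same number that sysctl kernel.shmmax shows). A minimal sketch follows; the helper name kernel_shmmax and its optional file argument are my own additions, for illustration only.

```shell
#!/bin/sh
# Sketch only: print the kernel's SHMMAX in bytes.
# kernel_shmmax is a hypothetical helper name. With no argument it reads
# the standard Linux location; an alternative file can be passed instead.
kernel_shmmax() {
    cat "${1:-/proc/sys/kernel/shmmax}"
}

# Usage on a compute node:
#   kernel_shmmax
```

If the number printed on the compute nodes differs from the one in the NWChem output, that would point to something between the kernel and NWChem changing the value.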