I'm trying to run an MP2 geometry optimization with NWChem on the BlueGene/Q machine (Fermi) at CINECA. I'd like to report my experience with the code on this architecture, in the hope of getting some guidance on how to improve performance, or of understanding what I'm doing wrong. In short, I find that scaling is good only if 1 core per node is used, and crashes occur if more than one I/O node (1024 cores) is requested.

I've put everything in this Google doc:


(I used a Google doc because of a "The specified URL cannot be found" error that I wasn't able to resolve.)
Any comments or suggestions are most welcome!