Compiling nwchem-6.3 in a contemporary HPC with Xeon PHI


Obm,

I am using the 64-bit integer interface to OpenBLAS, and I compiled OpenBLAS without threading (USE_THREAD=0) for NWChem.

Among the BLAS-linked programs I use, some prefer the 32-bit integer interface, some prefer the 64-bit one, and some prefer threading on or off. I actually have all four combinations of OpenBLAS threading and integer size installed, and I use wrapper scripts that set LD_LIBRARY_PATH so that each program finds its preferred OpenBLAS variant.
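
To make the wrapper idea concrete, here is a minimal sketch in Python (any scripting language would do). The install directories under /opt/openblas and the names OPENBLAS_VARIANTS, env_for_variant and run_with_variant are hypothetical, invented for illustration; only LD_LIBRARY_PATH itself comes from the setup described above.

```python
import os

# Hypothetical install locations, one per OpenBLAS build variant
# (integer size x threading). Adjust to wherever each build was installed.
OPENBLAS_VARIANTS = {
    ("int32", "serial"): "/opt/openblas/int32-serial/lib",
    ("int32", "threaded"): "/opt/openblas/int32-threaded/lib",
    ("int64", "serial"): "/opt/openblas/int64-serial/lib",
    ("int64", "threaded"): "/opt/openblas/int64-threaded/lib",
}


def env_for_variant(int_size, threading, base_env=None):
    """Return a copy of the environment with the chosen variant's
    library directory placed first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = OPENBLAS_VARIANTS[(int_size, threading)]
    existing = env.get("LD_LIBRARY_PATH", "")
    # Prepend so the dynamic linker searches this directory before any others.
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    return env


def run_with_variant(int_size, threading, argv):
    """Replace the current process with argv, using the chosen variant."""
    os.execvpe(argv[0], argv, env_for_variant(int_size, threading))
```

A wrapper for NWChem would then end with something like run_with_variant("int64", "serial", ["nwchem", "input.nw"]), where the input file name is a placeholder. Prepending rather than overwriting LD_LIBRARY_PATH keeps any other library directories already set in the environment.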

I am using the default openmpi 1.6.5 that is packaged for Ubuntu 14.04.

I am using the bundled Global Arrays that came with the June 2014 NWChem snapshot: http://nwchemgit.github.io/download.php?f=Nwchem-dev.revision25716-src.2014-06-09.tar.gz

The bundled GA is svn revision 10496 of Global Arrays, but I don't know what official version number that corresponds to because I can't view the svn repository for Global Arrays. My ARMCI_NETWORK=SOCKETS.

I only have 4 physical cores -- I run NWChem on a laptop -- and this was the timing I saw for your test input above running on 4 cores:

Total times cpu: 3428.8s wall: 3442.3s

I don't know what sort of scaling to expect for this job on 40 cores, but your speedup seems to be substantial, unlike what you observed for tiny QA calculations.
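
For a rough yardstick, this small sketch computes the wall time that perfect linear scaling from my 4-core run would give on 40 cores, plus a helper for parallel efficiency. It assumes equal per-core performance on both machines, which is certainly not true for a laptop versus an HPC node, so treat the result as an order-of-magnitude guide only. The function names are my own.

```python
def ideal_wall_time(ref_wall_s, ref_cores, target_cores):
    """Wall time expected on target_cores under perfect linear scaling."""
    return ref_wall_s * ref_cores / target_cores


def parallel_efficiency(ref_wall_s, ref_cores, wall_s, cores):
    """Observed speedup divided by the ideal speedup (1.0 means perfect scaling)."""
    speedup = ref_wall_s / wall_s
    return speedup / (cores / ref_cores)


# Reference point: 3442.3 s wall time on 4 cores (the laptop run above).
ideal_40 = ideal_wall_time(3442.3, 4, 40)
print(f"ideal 40-core wall time: {ideal_40:.1f} s")
```

Passing your measured 40-core wall time to parallel_efficiency(3442.3, 4, your_wall_s, 40) then gives a single number to compare between the large job and the small QA jobs.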

Maybe you are only having speed problems with small jobs? I have seen mysterious slowdown for single-node execution on my laptop depending on the ARMCI_NETWORK setting. See for example the later posts in this thread: http://nwchemgit.github.io/Special_AWCforum/st/id1303/compile_nwchem-6-3_with_open...