Progress/Plan for optimizing NWChem to Xeon Phi Knights Landing?


Quote:Chiensh Mar 20th 7:29 am
Thank you, Jeff.
Quote:Jhammond Mar 15th 9:53 pm

All the work I know of pertains to NWPW and CC modules.

That is also what I expected. I understand that these two methods are important for NWChem and have the highest priority for porting to this architecture. However, HF and DFT SCF are a fundamental part of almost all calculations, so I hope hybrid OpenMP/MPI parallelization will be available as soon as possible.

We understand this. However, SCF calculations bottleneck in the atomic integrals. The NWChem atomic integral library is fast on standard server hardware (e.g. Xeon), but it is neither vectorized nor threaded. It is not even thread-safe, so we need to either rewrite most of the code or refactor it to use another atomic integral library. Neither of these efforts is easy.
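
To illustrate why a routine that is not thread-safe cannot simply be called from multiple threads, here is a toy Python sketch. It is not NWChem code: the kernels, the shared scratch buffer, and the helper `run_pair` are all hypothetical, and the thread does not say which specific hazard affects the integral library. Shared scratch state is only one common cause in legacy numerical code. The barrier forces both threads to write before either reads, so the failure is reproducible.

```python
import threading

# Hypothetical module-level scratch space, standing in for shared static state.
_shared_scratch = [0.0]


def unsafe_kernel(x, barrier):
    """Doubles x, but stages the input in scratch space shared by all threads."""
    _shared_scratch[0] = x
    barrier.wait()  # both threads have written before either one reads
    return 2.0 * _shared_scratch[0]


def safe_kernel(x, barrier):
    """Doubles x using only thread-private (local) storage."""
    local = x
    barrier.wait()
    return 2.0 * local


def run_pair(kernel):
    """Run kernel(1.0) and kernel(2.0) on two threads; return both results."""
    barrier = threading.Barrier(2)
    out = [None, None]

    def worker(i, x):
        out[i] = kernel(x, barrier)

    threads = [
        threading.Thread(target=worker, args=(i, x))
        for i, x in enumerate((1.0, 2.0))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


unsafe = run_pair(unsafe_kernel)  # both entries equal: one thread's input was overwritten
safe = run_pair(safe_kernel)      # each thread gets its own correct result
```

In the unsafe case both threads read whichever value was written last, so at least one result is wrong. Fixing this in real code means giving every thread its own scratch storage, which is why retrofitting threading onto a large library is a substantial refactoring.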

Quote:Chiensh Mar 20th 7:29 am

Quote:Jhammond Mar 15th 9:53 pm

C240 isn't big enough for that many MPI ranks. Try running 32 ranks per node on 1-16 nodes.
In general, it is imprudent to start with full-machine jobs. Run on one node and scale up slowly.

I see. I had already finished the smaller calculations, and I expected it would be inefficient to run on such a large number of MPI ranks, but I did not expect it to simply hang...


It's possible that it was just running ridiculously slowly. In any case, if you scaled up slowly, you know where the optimal number of nodes is already.
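
One way to read the results of such a scaling study is to compute the parallel efficiency at each node count and take the largest count that stays above some cutoff. The sketch below is only an illustration: the function names, the 50% cutoff, and the wall times are hypothetical placeholders, not measurements from this thread.

```python
def parallel_efficiency(timings):
    """Map node count -> efficiency relative to the smallest run.

    timings: dict of node count -> wall time in seconds.
    Efficiency is (t_base * n_base) / (t_n * n); 1.0 means ideal scaling.
    """
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    return {
        n: (base_time * base_nodes) / (t * n)
        for n, t in sorted(timings.items())
    }


def largest_efficient_node_count(timings, threshold=0.5):
    """Largest node count whose efficiency is at least `threshold`."""
    eff = parallel_efficiency(timings)
    return max(n for n, e in eff.items() if e >= threshold)


# Placeholder wall times (seconds) for 1-16 nodes; replace with real measurements.
example_timings = {1: 1000.0, 2: 520.0, 4: 290.0, 8: 200.0, 16: 190.0}

efficiency = parallel_efficiency(example_timings)
best = largest_efficient_node_count(example_timings)
```

With these placeholder numbers the efficiency drops below 50% between 8 and 16 nodes, so 8 nodes would be the largest reasonable job size. A run that appears to hang at a much larger scale may simply be far past this point.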

Quote:Chiensh Mar 20th 7:29 am

I also noticed that the current compilation documentation for KNL is confusing (http://nwchemgit.github.io/index.php/Compiling_NWChem#How-to:_Intel_Xeon_Phi):
...
This section describes both the newer KNL and older KNC hardware, in reverse chronological order.
Compiling NWChem on self-hosted Intel Xeon Phi Knights Landing processors
NWChem 6.6 (and later versions) support OpenMP threading, which is essential to obtaining good performance with NWChem on Intel Xeon Phi many-core processors.
As of November 2016, the development version of NWChem contains threading support in the TCE coupled-cluster codes (primarily non-iterative triples in e.g. CCSD(T)), semi-direct CCSD(T), and plane-wave DFT (i.e. NWPW).
...


Our documentation is not always perfect. What do you want to see changed here? I will fix it.

Quote:Chiensh Mar 20th 7:29 am

However, enabling the USE_F90_ALLOCATABLE flag for the stable version of NWChem 6.6 causes a compilation error:
...
because l_a and l_t are not defined when USE_F90_ALLOCATABLE is enabled.


This is just a bug. It does not exist in the latest version of the code. Can you download the trunk version instead?