Progress/plan for optimizing NWChem for Xeon Phi Knights Landing?


Hi All,

Is there any update on NWChem development for KNL (the socket-based version)?

We have a 144-node KNL system, and I managed to compile a copy of NWChem with ARMCI-MPI using the Intel compilers and Intel MPI (impi). OpenMP is enabled, and MIC-AVX512 was added to the compiler flags.

When I run a C240 DFT benchmark job across the full system:

(1) With 144 MPI tasks (1 per node) and OMP_NUM_THREADS=64, the job runs but is not impressively fast. I noticed that no more than 1 core on each node (or socket) is ever used. (I suppose OpenMP/hybrid parallelization has not been implemented for DFT yet; please correct me if I am wrong.)

(2) With 9216 MPI tasks instead (1 task per core), the same job just hangs after printing the basis set information:
Summary of "ao basis" -> "ao basis" (cartesian)
------------------------------------------------------------------------------
Tag              Description                    Shells Functions and Types
---------------- ------------------------------ ------ ---------------------
C                user specified                      6 15 3s2p1d
(hangs here)
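
To make the two layouts above concrete, here is a minimal Python sketch of the arithmetic. It assumes 64 cores per node, which is inferred from 9216 tasks / 144 nodes and is not a stated hardware spec; the class and variable names are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """One way of spreading MPI ranks and OpenMP threads over the machine."""
    name: str
    nodes: int
    ranks_per_node: int
    threads_per_rank: int

    @property
    def total_ranks(self) -> int:
        # Total number of MPI tasks launched across the whole system.
        return self.nodes * self.ranks_per_node

    @property
    def cores_per_node(self) -> int:
        # Cores requested on each node (ranks times threads per rank).
        return self.ranks_per_node * self.threads_per_rank


NODES = 144
CORES_PER_NODE = 64  # assumption: inferred from 9216 tasks / 144 nodes

# Case (1): one rank per node, OMP_NUM_THREADS=64.
hybrid = Layout("hybrid MPI+OpenMP", NODES, 1, CORES_PER_NODE)
# Case (2): one rank per core, no threading.
pure_mpi = Layout("pure MPI", NODES, CORES_PER_NODE, 1)

for layout in (hybrid, pure_mpi):
    print(f"{layout.name}: {layout.total_ranks} ranks, "
          f"{layout.threads_per_rank} thread(s) per rank, "
          f"{layout.cores_per_node} cores requested per node")
```

Both layouts request the same 64 cores per node; they differ only in whether those cores are filled by MPI ranks or by OpenMP threads, which is why case (1) depends on the DFT code actually being threaded.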


Can anyone suggest the best way to build NWChem on a KNL system connected with EDR InfiniBand? Specifically: which ARMCI_NETWORK to choose, which combination of MKL, LAPACK and ScaLAPACK to use, and how to use the MIC-AVX512 instructions. Please also suggest the best way to run NWChem on this system (i.e. pure MPI or hybrid MPI-OpenMP?).

Thanks a lot!

PS: I just found this section of the compilation instructions, but any further hints are welcome. Thanks!
http://nwchemgit.github.io/index.php/Compiling_NWChem#How-to:_Intel_Xeon_Phi


~ Dominic Chien