Progress/Plan for optimizing NWChem for Xeon Phi Knights Landing?



4:29:41 AM PDT - Wed, Mar 15th 2017
Hi all,

Is there any update on NWChem development for KNL (the socket-based version)? We have a 144-node KNL system, and I managed to compile a copy of NWChem with ARMCI-MPI using the Intel compiler and Intel MPI (impi). OpenMP is enabled, and MIC-AVX512 was added to the compiler flags.

When I run a C240 DFT benchmark job across the full system:

(1) With 144 MPI tasks (1 per node) and OMP_NUM_THREADS=64, the job works but is not impressively fast. I noticed that no more than 1 core in each node (or socket) is ever used. (I suppose OMP/hybrid parallelization has not been implemented for DFT yet; please correct me if I am wrong.)

(2) With 9216 MPI tasks instead (1 task per core), the same job just hangs after printing the basis set information:

    Summary of "ao basis" -> "ao basis" (cartesian)
    ------------------------------------------------------------------------------
    Tag              Description                    Shells Functions and Types
    ---------------- ------------------------------ ------ ---------------------
    C                user specified                 6      15 3s2p1d
    (hang here)

Can anyone suggest the best way to build NWChem on a KNL system connected with EDR InfiniBand? Specifically: which ARMCI_NETWORK to choose, which combination of MKL, LAPACK and ScaLAPACK to use, and how to make use of the MIC-AVX512 instructions.

In addition, please suggest the best way to run NWChem on this system (i.e. pure MPI or hybrid MPI-OMP?).

Thanks a lot!

PS: I just found this section of the compilation instructions, but further hints are welcome. Thanks!
http://nwchemgit.github.io/index.php/Compiling_NWChem#How-to:_Intel_Xeon_Phi

~ Dominic Chien
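The two launch configurations described in the post (144 ranks with 64 threads each, and 9216 single-threaded ranks) are the two ends of a range of possible hybrid layouts. As a minimal sketch, assuming 64 usable cores per KNL node (inferred from 9216 / 144, not stated outright in the post), the intermediate rank/thread splits can be enumerated as follows. The function name `hybrid_layouts` is hypothetical and for illustration only; whether any of these layouts actually helps depends on which NWChem modules are threaded.

```python
def hybrid_layouts(nodes, cores_per_node):
    """List (ranks_per_node, threads_per_rank, total_ranks) splits
    that assign every core on a node to exactly one thread."""
    layouts = []
    for ranks_per_node in range(1, cores_per_node + 1):
        # Keep only splits that divide the cores evenly among ranks.
        if cores_per_node % ranks_per_node == 0:
            threads_per_rank = cores_per_node // ranks_per_node
            layouts.append(
                (ranks_per_node, threads_per_rank, nodes * ranks_per_node)
            )
    return layouts


if __name__ == "__main__":
    # 144 nodes is from the post; 64 cores per node is an assumption.
    for rpn, threads, total in hybrid_layouts(144, 64):
        print(f"{rpn:3d} ranks/node x {threads:2d} threads/rank "
              f"= {total:5d} MPI ranks")
```

Each row corresponds to one candidate job setup: the last column is the total MPI task count to request, and the thread count is the value that would go into OMP_NUM_THREADS.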