Compiling nwchem-6.3 in a contemporary HPC with Xeon PHI



8:42:56 AM PDT - Thu, Jun 5th 2014
Comparing armci, comex, threading and PHI
After the QA tests (which I recommend comparing by hand, since sometimes the "Failure" is a false alarm) I've done a small and straightforward benchmark using only 1 node, and I find the results surprising.

Average timings for the "armci", "armci+openmp" and "comex" libraries, using exactly the same job script: an SCF + DFT energy job on a medium-sized system. 1 node (40 processors, 64 GB RAM), 10 processors used, OMP_NUM_THREADS=4 when applicable.

Total times:

Library                      cpu (s)     wall (s)
armci                         6401.5       6414.5
armci + openmp                6649.3       6382.4
comex                        31301.5      31351.1
armci + openmp with phi       6610.3       6352.4

1. Nothing is offloaded to the phi card.
2. Sampling the process for 15 minutes showed no threading. At least dgemm should have threaded within this time period.
3. The COMEX binary on a single node is extremely slow (~5x).

I really don't know what went wrong. The binary seems to contain the correct references, but are they called? Is it normal for COMEX to be this slow?

Environment settings for Intel MIC automatic offload:

OFFLOAD_REPORT=2
MKL_MIC_ENABLE=1
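To put numbers on the comparison, here is a minimal sketch in Python that takes the four timings quoted above and computes each build's wall time relative to plain armci, plus the cpu/wall ratio. The variable names are mine. Reading a cpu/wall ratio close to 1 as a sign of little or no threading is an assumption: it depends on how NWChem accumulates cpu time across threads.

```python
# Timings copied from the post, in seconds: (cpu, wall) for each build.
timings = {
    "armci": (6401.5, 6414.5),
    "armci + openmp": (6649.3, 6382.4),
    "comex": (31301.5, 31351.1),
    "armci + openmp with phi": (6610.3, 6352.4),
}

# Use the plain armci wall time as the reference point.
baseline_wall = timings["armci"][1]

# Wall time of each build relative to armci (1.0 means "same speed").
slowdown = {name: wall / baseline_wall for name, (cpu, wall) in timings.items()}

# cpu/wall per build. Assumption: a value near 1 hints that the process
# was effectively single-threaded for most of the run.
cpu_per_wall = {name: cpu / wall for name, (cpu, wall) in timings.items()}

for name in timings:
    print(f"{name:<24} wall/armci = {slowdown[name]:.2f}   "
          f"cpu/wall = {cpu_per_wall[name]:.2f}")
```

The comex run comes out at roughly 4.9 times the armci wall time, which matches the "~5x" above, while the two openmp builds stay within about 1% of armci.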

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Dr. O. Baris Malcioglu,
University of Liege,
Bât. B5 Physique de la matière condensée
allée du 6 Août 17
4000 Liège 1
Belgique