Difficulties offloading the ccsd calculation with Xeon Phi


Hello,

I'm having some difficulties getting the Xeon Phi offload working with NWChem 6.6.

I compiled NWChem without errors, and the vectorization output during compilation showed that the offload sections were properly compiled. I do not have any problems executing the application, and NWChem identifies the Xeon Phi devices (verified with OFFLOAD_REPORT=2).

NWC_RANKS_PER_DEVICE=1 mpirun -np 2 ./nwchem ./input.nw
 argument  1 = ./input.nw

...


                     0  ppn                      2
                     0  offload_master 
                     1  offload_master 
[Offload] [MIC 0] [File]            util_mic_support.c
[Offload] [MIC 0] [Line]            198
[Offload] [MIC 0] [Tag]             Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        3.734172(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   0 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        0.000408(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   4 (bytes)

00: micdev=0 nprocs=59 rank_on_dev=0 ranks_per_device=1 affinity='KMP_PLACE_THREADS=59c,4t,0o' pos=27
[Offload] [MIC 0] [File]            util_mic_support.c
[Offload] [MIC 0] [Line]            216
[Offload] [MIC 0] [Tag]             Tag 1
[Offload] [HOST]  [Tag 1] [CPU Time]        0.004035(seconds)
[Offload] [MIC 0] [Tag 1] [CPU->MIC Data]   512 (bytes)
[Offload] [MIC 0] [Tag 1] [MIC Time]        0.003767(seconds)
[Offload] [MIC 0] [Tag 1] [MIC->CPU Data]   0 (bytes)

[Offload] [MIC 1] [File]            util_mic_support.c
[Offload] [MIC 1] [Line]            198
[Offload] [MIC 1] [Tag]             Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        3.839402(seconds)
[Offload] [MIC 1] [Tag 0] [CPU->MIC Data]   0 (bytes)
[Offload] [MIC 1] [Tag 0] [MIC Time]        0.000371(seconds)
[Offload] [MIC 1] [Tag 0] [MIC->CPU Data]   4 (bytes)

01: micdev=1 nprocs=59 rank_on_dev=0 ranks_per_device=1 affinity='KMP_PLACE_THREADS=59c,4t,0o' pos=27
[Offload] [MIC 1] [File]            util_mic_support.c
[Offload] [MIC 1] [Line]            216
[Offload] [MIC 1] [Tag]             Tag 1
[Offload] [HOST]  [Tag 1] [CPU Time]        0.004166(seconds)
[Offload] [MIC 1] [Tag 1] [CPU->MIC Data]   512 (bytes)
[Offload] [MIC 1] [Tag 1] [MIC Time]        0.003898(seconds)
[Offload] [MIC 1] [Tag 1] [MIC->CPU Data]   0 (bytes)
...


The problem is that this is the only OFFLOAD_REPORT output generated during the entire run. Nothing is reported during the ccsd calculation, which should also be offloaded. Since no further reports appear, I can only conclude that the computation is being handled by the CPU. I suspect the problem is with my input, so I'm hoping someone can help me come up with a working example.
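To make the "no further reports" check less manual, the captured output can be tallied per device, source file, and line. The following is only a minimal sketch: it assumes the report records keep the `[Offload] [MIC n] [File]` / `[Line]` layout shown above, and the names `count_offloads`, `offloads_outside`, and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Patterns for the [File] and [Line] records that open each report entry.
# Assumption: the layout matches the OFFLOAD_REPORT=2 output pasted above.
FILE_RE = re.compile(r"^\[Offload\]\s+\[MIC (\d+)\]\s+\[File\]\s+(\S+)")
LINE_RE = re.compile(r"^\[Offload\]\s+\[MIC (\d+)\]\s+\[Line\]\s+(\d+)")


def count_offloads(log_text):
    """Count report entries keyed by (device, source file, line number)."""
    counts = Counter()
    pending = {}  # device -> file name from the most recent [File] record
    for raw in log_text.splitlines():
        line = raw.strip()
        m = FILE_RE.match(line)
        if m:
            pending[int(m.group(1))] = m.group(2)
            continue
        m = LINE_RE.match(line)
        if m:
            dev = int(m.group(1))
            if dev in pending:
                counts[(dev, pending.pop(dev), int(m.group(2)))] += 1
    return counts


def offloads_outside(counts, filename):
    """Return the entries that did not originate from the given source file."""
    return {key: n for key, n in counts.items() if key[1] != filename}


# Abbreviated sample taken from the report lines above.
SAMPLE = """\
[Offload] [MIC 0] [File]            util_mic_support.c
[Offload] [MIC 0] [Line]            198
[Offload] [MIC 0] [Tag]             Tag 0
[Offload] [MIC 0] [File]            util_mic_support.c
[Offload] [MIC 0] [Line]            216
[Offload] [MIC 0] [Tag]             Tag 1
[Offload] [MIC 1] [File]            util_mic_support.c
[Offload] [MIC 1] [Line]            198
[Offload] [MIC 1] [Tag]             Tag 0
[Offload] [MIC 1] [File]            util_mic_support.c
[Offload] [MIC 1] [Line]            216
[Offload] [MIC 1] [Tag]             Tag 1
"""

counts = count_offloads(SAMPLE)
for (dev, src, lineno), n in sorted(counts.items()):
    print(f"MIC {dev}: {src}:{lineno} reported {n} time(s)")

# An empty result here means every report came from the device setup code.
print("outside util_mic_support.c:", offloads_outside(counts, "util_mic_support.c"))
```

If the ccsd kernels were being offloaded, I would expect `offloads_outside` to return entries from some file other than util_mic_support.c. For my runs it comes back empty.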

My input file is based on the ccsd sample, and it runs very well on the CPU.

start n2 

geometry
  symmetry d2h
  n 0 0 0.542
end

basis spherical
  n library cc-pvtz
end

mp2
  freeze core
end

task mp2 optimize

ccsd
  freeze core
end

set ccsd:use_trpdrv_omp T
set ccsd:use_trpdrv_offload T

task ccsd(t)
task ccsd
task ccsd+t(ccsd)


According to my sources (Intel and the ccsd documentation), this input file should work, and I do not understand why the calculation is not being offloaded. I've also tried all three variants of the ccsd task, and verified that NWC_RANKS_PER_DEVICE=0 runs exclusively on the CPU. I'm currently running the input file provided by the Intel source, but I may not have an answer until tomorrow because of the time each iteration takes.
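For anyone who wants to repeat the CPU-only versus offload comparison, it could be scripted roughly as below. This is a sketch under assumptions: it reuses the mpirun command line and the two environment variables from this post, and it assumes mpirun forwards the environment to the ranks. The helper names `build_run` and `run_and_count` are hypothetical.

```python
import os
import subprocess


def build_run(ranks_per_device, nprocs=2, input_file="./input.nw", report_level=2):
    """Build the command and environment for one NWChem run.

    ranks_per_device=0 is the CPU-only case described in this post;
    ranks_per_device=1 is the offload case.
    """
    env = dict(os.environ)
    env["NWC_RANKS_PER_DEVICE"] = str(ranks_per_device)
    env["OFFLOAD_REPORT"] = str(report_level)
    cmd = ["mpirun", "-np", str(nprocs), "./nwchem", input_file]
    return cmd, env


def run_and_count(ranks_per_device):
    """Run NWChem once and count the '[Offload]' report lines in its output."""
    cmd, env = build_run(ranks_per_device)
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    output = result.stdout + result.stderr
    return sum(1 for line in output.splitlines() if line.startswith("[Offload]"))


# Inspect the two configurations without launching anything.
for rpd in (0, 1):
    cmd, env = build_run(rpd)
    print(f"NWC_RANKS_PER_DEVICE={env['NWC_RANKS_PER_DEVICE']}", " ".join(cmd))
```

Calling `run_and_count(0)` and `run_and_count(1)` on the same input gives a quick way to see whether the offload run produces any more report lines than the setup entries shown earlier.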

To clarify my goals: I'm not using NWChem to solve any particular problem. I'm interested purely in using the Xeon Phi offload for performance testing and execution modeling. I just need an input file that runs NWChem in offload mode, with the offloads reported by OFFLOAD_REPORT.

All help is greatly appreciated, thank you.
Gary