NWChem 6.5 and Intel Xeon Phis with intel 14.0.3 and intelmpi 4.1.3


Hi again Edoardo, apologies for the delay in replying.

I've got the QA tests running on a single node with 2 Phis attached.

The next issue I have is when I try to run any calculation on more than one node.

e.g. the C2H4.nwQA test gives the following when run with nodes=2:ppn=2:

2: error ival=-5
(rank:2 hostname:pillowb5 pid:78877):ARMCI DASSERT fail. ../../ga-5-3/armci/src/devices/openib/openib.c:armci_send_complete():459 cond:(pdscr->status==IBV_WC_SUCCESS)
Last System Error Message from Task 2:: Bad address
3: error ival=5
(rank:3 hostname:pillowb5 pid:78877):ARMCI DASSERT fail. ../../ga-5-3/armci/src/devices/openib/openib.c:armci_send_complete():459 cond:(pdscr->status==IBV_WC_SUCCESS)
Last System Error Message from Task 3:: Bad address
application called MPI_Abort(comm=0x84000001, 1) - process 3
application called MPI_Abort(comm=0x84000001, 1) - process 2
rank = 3, revents = 8, state = 8
Assertion failed in file ../../socksm.c at line 2963: (it:plfd->revents & POLLERR) == 0
internal ABORT - proces 1
rank = 2, revents = 8, state = 8
Assertion failed in file ../../socksm.c at line 2963: (it:plfd->revents & POLLERR) == 0
internal ABORT - proces 0


I do not get this problem when I compile without the USE_OPENMP=1 and USE_OFFLOAD=1 variables.
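To make the comparison concrete, here is a minimal sketch of the two build environments, assuming that only these two variables differ between them. The rest of the build settings and the rebuild step itself are not shown, and the echo lines are only there for illustration.

```shell
# Build that fails on more than one node: both offload-related variables set.
export USE_OPENMP=1
export USE_OFFLOAD=1
echo "offload build: USE_OPENMP=${USE_OPENMP:-unset} USE_OFFLOAD=${USE_OFFLOAD:-unset}"

# Build that runs across nodes without the error: both variables removed.
# (Assumption: everything else in the build environment is left unchanged.)
unset USE_OPENMP USE_OFFLOAD
echo "plain build:   USE_OPENMP=${USE_OPENMP:-unset} USE_OFFLOAD=${USE_OFFLOAD:-unset}"
```

NWChem would need to be recompiled after switching between the two environments for the change to take effect.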