Number of MPI Tasks Per Node when ARMCI_NETWORK=OpenIB and MCDRAM


Clicked A Few Times
Hi

Can someone confirm whether only 1 MPI task is allowed per node when MCDRAM is used and ARMCI_NETWORK=OpenIB?

I compiled two copies of NWChem 6.8, one with FASTMEM=F (nwchem-cache-openib) and one with FASTMEM=T (nwchem-mcdram-openib). I ran both on a KNL cluster with MCDRAM in flat mode.

I managed to run nwchem-cache-openib with multiple tasks per node (e.g. 16 MPI tasks per node with 4 OpenMP threads each) on this cluster:
chiensh@r1i5n21:~/nwtest$ mpirun -perhost 16 -np 256  ./nwchem-cache-openib ./W8.nw
argument 1 = ./W8.nw
NWChem w/ OpenMP: maximum threads = 4

============================== echo of input deck ============================== echo
start rubbish.w8
#scratch_dir /dev/shm/chiensh
#permanent_dir /home/users/astar/ihpc/chiensh/nwtest
memory stack 1000 mb heap 200 mb global 10000 mb noverify
#print medium "task time" "ga stats" "ma stats" "version" "rtdbvalues"
geometry units angstrom noautoz noprint
#---------------
#Octamer *** D2d
#---------------

When I run nwchem-mcdram-openib with the same options, it fails:

chiensh@r1i5n21:~/nwtest$ mpirun -perhost 16 -np 256  ./nwchem-mcdram-openib ./W8.nw
argument 1 = ./W8.nw
ARMCI supports block process mapping only
0:Cannot run: improper task to host mapping!: 0
(rank:0 hostname:r1i5n21 pid:14352):ARMCI DASSERT fail. ../../ga-5.6.3/armci/src/common/clusterinfo.c:process_hostlist():189 cond:0
Last System Error Message from Task 0:: No such file or directory
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
nwchem-mcdram-ope 0000000005250BB4 Unknown Unknown Unknown
libpthread-2.17.s 00002AAAB08EA370 Unknown Unknown Unknown
libibverbs.so.1.0 00002AAAB0D0D4E9 ibv_destroy_cq Unknown Unknown
nwchem-mcdram-ope 0000000005200A80 Unknown Unknown Unknown
nwchem-mcdram-ope 00000000051E351E Unknown Unknown Unknown
nwchem-mcdram-ope 00000000051E61F5 Unknown Unknown Unknown
nwchem-mcdram-ope 00000000051E3FD7 Unknown Unknown Unknown
nwchem-mcdram-ope 00000000051CCF1B Unknown Unknown Unknown
nwchem-mcdram-ope 000000000040CA77 Unknown Unknown Unknown
nwchem-mcdram-ope 000000000040C89E Unknown Unknown Unknown
libc-2.17.so 00002AAAB169EB35 __libc_start_main Unknown Unknown
nwchem-mcdram-ope 000000000040C7A9 Unknown Unknown Unknown

Forum Vet
Same issue discussed in
http://nwchemgit.github.io/Special_AWCforum/st/id1433

The problem seems to be in your submission script,
more precisely in the mpirun/mpiexec options.
Your MPI hostfile may list the nodes in round-robin order, like this:

node0
node1
node0
node1

ARMCI will not work with this kind of layout. The hostfile should instead list each node's entries consecutively (block mapping):

node0
node0
node1
node1
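
If it helps, here is a minimal sketch of how one could check a host list for the block mapping that the "ARMCI supports block process mapping only" message asks for. The function names (is_block_mapped, to_block_order) are made up for illustration and are not part of ARMCI, GA, or NWChem; the input is assumed to be the list of hostnames in rank order, one entry per MPI task.

```python
from itertools import groupby


def is_block_mapped(hosts):
    """Return True if every hostname occupies one contiguous run of ranks."""
    # Collapse consecutive duplicates; a host that shows up in two separate
    # runs means ranks on that node are interleaved with another node.
    runs = [name for name, _ in groupby(hosts)]
    return len(runs) == len(set(runs))


def to_block_order(hosts):
    """Reorder hosts so identical names are adjacent, keeping first-seen order."""
    counts = {}
    for host in hosts:
        counts[host] = counts.get(host, 0) + 1
    # dicts preserve insertion order (Python 3.7+), so nodes stay in the
    # order in which they first appeared.
    return [host for host, n in counts.items() for _ in range(n)]


round_robin = ["node0", "node1", "node0", "node1"]
block = ["node0", "node0", "node1", "node1"]

print(is_block_mapped(round_robin))  # False
print(is_block_mapped(block))        # True
print(to_block_order(round_robin))   # ['node0', 'node0', 'node1', 'node1']
```

This only checks the ordering of the list. How the hostfile is actually produced and passed to mpirun depends on your MPI library and batch system.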

