Compiling NWChem with Intel 2016


Clicked A Few Times
Hi,

Has anyone successfully compiled NWChem 6.6 with the latest Intel compilers (Parallel Studio 2016) on Linux? For me it produces an executable that runs fine in serial but hangs, with no error message, when run with 2 or more processes. I had no problems compiling 6.6 with Intel 2015, so I suspect a compiler problem, but I am wondering whether others see the same.

For info here is my build script with paths set for Intel 2016 update 2:

export NWCHEM_TOP=/home/twk/dev/nwchem-6.6-ifort16-64to32
export NWCHEM_TARGET=LINUX64
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export LIBMPI="-Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /opt/intel/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt -Xlinker -rpath -Xlinker /opt/intel/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/5.1/intel64/lib/release_mt -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/5.1/intel64/lib -lmpifort -lmpi -lmpigi -ldl -lrt -lpthread"
export MPI_LIB=/opt/intel/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib
export MPI_INCLUDE=/opt/intel/compilers_and_libraries_2016.2.181/linux/mpi/intel64/include
export MKLDIR=/opt/intel/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64
# lp64 for the 32-bit integer interface. Leaving out ScaLAPACK for now
export BLASOPT="-L${MKLDIR} -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lm"
export USE_64TO32=y
export HAS_BLAS=yes
export USE_OPENMP=y
cd $NWCHEM_TOP/src

make FC=ifort nwchem_config NWCHEM_MODULES=all 
make 64_to_32 &> make.64_to_32.log
make FC=ifort &> make.log
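
For reference, the hang shows up with any small parallel job; something along these lines (the input file name here is only illustrative):

# example of the kind of run that hangs with 2 or more processes
mpirun -np 2 $NWCHEM_TOP/bin/LINUX64/nwchem h2o_opt.nw > h2o_opt.p2.out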

Forum Vet
Please unset MPI_LIB MPI_INCLUDE LIBMPI
Set PATH to the directory where mpif90 is installed, e.g.
export PATH=/opt/intel/compilers_and_libraries_2016.2.181/linux/mpi/intel64/bin:$PATH
unset MPI_LIB
unset MPI_INCLUDE
unset LIBMPI
If this still fails, please upload the log files to a public viewable website.
http://nwchemgit.github.io/index.php/FAQ#How_to_fix_configure:_error:_could_not_compile_simp...
http://nwchemgit.github.io/index.php/Compiling_NWChem#MPI_variables
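
As an optional sanity check once PATH is adjusted, confirm that the right wrapper is being picked up:

which mpif90
mpif90 -show    # prints the underlying compiler command and the Intel MPI link flags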

Clicked A Few Times
Hi Edo,

Unfortunately it still fails in the same way. I've uploaded the compile logs and a sample test output to show where it hangs:

ftp://ftp.dl.ac.uk/qcg/nwchem-forum/

(make.log, make.64_to_32.log, h2o_opt.p2.out)

Thanks,

Tom

Forum Vet
Could you attach a debugger to the running/stuck processes and provide a debug stack trace?
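
For example, attach gdb to each hung rank and print the backtrace (replace <pid> with the process id reported by pgrep):

pgrep nwchem        # list the PIDs of the hung nwchem ranks
gdb -p <pid>        # attach, then type "bt" at the (gdb) prompt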

Clicked A Few Times
Here is process 0:

                           NWChem Geometry Optimization
----------------------------


^C
Program received signal SIGINT, Interrupt.
MPID_nem_mpich_blocking_recv (cell=<optimised out>, in_fbox=<optimised out>,
completions=<optimised out>)
at ../../src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h:1148
1148 ../../src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h: No such file or directory.
(gdb) bt
#0 MPID_nem_mpich_blocking_recv (cell=<optimised out>,
in_fbox=<optimised out>, completions=<optimised out>)
at ../../src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h:1148
#1 PMPIDI_CH3I_Progress (progress_state=0x7fffffffaa88,
is_blocking=-227683904)
at ../../src/mpid/ch3/channels/nemesis/src/ch3_progress.c:455
#2 0x00007ffff362be39 in MPIC_Wait (request_ptr=0x7fffffffaa88)
at ../../src/mpi/coll/helper_fns.c:281
#3 0x00007ffff362c175 in MPIC_Send (buf=0x7fffffffaa88, count=-227683904,
datatype=1, dest=0, tag=201578176, comm=201578352, errflag=0x7fffffffad60)
at ../../src/mpi/coll/helper_fns.c:357
#4 0x00007ffff3506fd2 in MPIR_Bcast_binomial (buffer=0x7fffffffaa88,
count=-227683904, datatype=1, root=0, comm_ptr=0xc03d6c0,
errflag=0xc03d770) at ../../src/mpi/coll/bcast.c:294
#5 0x00007ffff350b279 in MPIR_Bcast_intra (buffer=0x7fffffffaa88,
count=-227683904, datatype=1, root=0, comm_ptr=0xc03d6c0,
errflag=0xc03d770) at ../../src/mpi/coll/bcast.c:1613
#6 0x00007ffff35092cf in I_MPIR_Bcast_intra (buffer=0x7fffffffaa88,
count=-227683904, datatype=1, root=0, comm_ptr=0xc03d6c0,
errflag=0xc03d770) at ../../src/mpi/coll/bcast.c:2031
#7 0x00007ffff350ca2b in MPIR_Bcast (buffer=<optimised out>,
count=<optimised out>, datatype=<optimised out>, root=<optimised out>,
comm_ptr=<optimised out>, errflag=<optimised out>)
at ../../src/mpi/coll/bcast.c:1854
#8 MPIR_Bcast_impl (buffer=0x7fffffffaa88, count=-227683904, datatype=1,
root=0, comm_ptr=0xc03d6c0, errflag=0xc03d770)
at ../../src/mpi/coll/bcast.c:1829
#9 0x00007ffff350c43e in PMPI_Bcast (buffer=0x7fffffffaa88, count=-227683904,
datatype=1, root=0, comm=201578176) at ../../src/mpi/coll/bcast.c:2438
#10 0x0000000002ea6131 in wnga_msg_brdcst ()
#11 0x0000000002e9ae0c in GA_Brdcst ()
#12 0x0000000000b75f78 in rtdb_broadcast ()
#13 0x0000000000b766a6 in rtdb_get ()
#14 0x0000000000b75802 in rtdb_get_ ()
#15 0x0000000000adb1ec in geom_rtdb_load (rtdb=140733193388464, geom=0,
name=<error reading variable: Cannot access memory at address 0x1>,
.tmp.NAME.len_V$ba1=0) at geom.F:599
#16 0x0000000000539ac0 in driver_initialize (rtdb=140733193388464, geom=0)
at opt_drv.F:1163
#17 0x0000000000527625 in driver (rtdb=140733193388464) at opt_drv.F:45
#18 0x000000000042189b in task_optimize (rtdb=140733193388464)
at task_optimize.F:146
#19 0x00000000004114a8 in task (rtdb=140733193388464) at task.F:384
#20 0x00000000004085e5 in nwchem () at nwchem.F:285
#21 0x00000000004080de in main ()
#22 0x00007ffff2916ec5 in __libc_start_main (main=0x4080b0 <main>, argc=2,
argv=0x7fffffffc498, init=<optimised out>, fini=<optimised out>,
rtld_fini=<optimised out>, stack_end=0x7fffffffc488) at libc-start.c:287
#23 0x0000000000407fdd in _start ()

And process 1:

^C
Program received signal SIGINT, Interrupt.
syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007ffff36d34f2 in process_vm_readv (pid=<optimised out>,
lvec=<optimised out>, liovcnt=<optimised out>, rvec=<optimised out>,
riovcnt=<optimised out>, flags=<optimised out>)
at ../../src/mpid/ch3/channels/nemesis/src/mpid_nem_lmt_dcp.c:27
#2 dcp_read (hndl=-22448, dst=<optimised out>, src=<optimised out>,
len=<optimised out>)
at ../../src/mpid/ch3/channels/nemesis/src/mpid_nem_lmt_dcp.c:119
#3 dcp_recv (vc=<optimised out>, req=<optimised out>,
dcp_cookie=<optimised out>, recv_done=<optimised out>)
at ../../src/mpid/ch3/channels/nemesis/src/mpid_nem_lmt_dcp.c:148
#4 MPID_nem_lmt_dcp_start_recv (vc=0x5bf9, rreq=0x7fffffffa860, s_cookie=...)
at ../../src/mpid/ch3/channels/nemesis/src/mpid_nem_lmt_dcp.c:287
#5 0x00007ffff36d23ad in do_cts (vc=<optimised out>, rreq=<optimised out>,
complete=<optimised out>)
at ../../src/mpid/ch3/channels/nemesis/src/mpid_nem_lmt.c:612
#6 pkt_RTS_handler (vc=0x5bf9, pkt=0x7fffffffa860, buflen=0x1,
rreqp=0xffffffffffffffff)
at ../../src/mpid/ch3/channels/nemesis/src/mpid_nem_lmt.c:282
#7 0x00007ffff3520dbe in PMPIDI_CH3I_Progress (progress_state=0x5bf9,
is_blocking=-22432)
at ../../src/mpid/ch3/channels/nemesis/src/ch3_progress.c:488
#8 0x00007ffff362be39 in MPIC_Wait (request_ptr=0x5bf9)
at ../../src/mpi/coll/helper_fns.c:281
#9 0x00007ffff362c47a in MPIC_Recv (buf=0x5bf9, count=-22432, datatype=1,
source=-1, tag=1, comm=0, status=0x7fffffffab00, errflag=0x7fffffffad60)
at ../../src/mpi/coll/helper_fns.c:407
#10 0x00007ffff350722b in MPIR_Bcast_binomial (buffer=0x5bf9, count=-22432,
datatype=1, root=-1, comm_ptr=0x1, errflag=0x0)
at ../../src/mpi/coll/bcast.c:244
#11 0x00007ffff350b279 in MPIR_Bcast_intra (buffer=0x5bf9, count=-22432,
datatype=1, root=-1, comm_ptr=0x1, errflag=0x0)
at ../../src/mpi/coll/bcast.c:1613
#12 0x00007ffff35092cf in I_MPIR_Bcast_intra (buffer=0x5bf9, count=-22432,
datatype=1, root=-1, comm_ptr=0x1, errflag=0x0)
at ../../src/mpi/coll/bcast.c:2031
#13 0x00007ffff350ca2b in MPIR_Bcast (buffer=<optimised out>,
count=<optimised out>, datatype=<optimised out>, root=<optimised out>,
comm_ptr=<optimised out>, errflag=<optimised out>)
at ../../src/mpi/coll/bcast.c:1854
#14 MPIR_Bcast_impl (buffer=0x5bf9, count=-22432, datatype=1, root=-1,
comm_ptr=0x1, errflag=0x0) at ../../src/mpi/coll/bcast.c:1829
#15 0x00007ffff350c43e in PMPI_Bcast (buffer=0x5bf9, count=-22432, datatype=1,
root=-1, comm=1) at ../../src/mpi/coll/bcast.c:2438
#16 0x0000000002ea6131 in wnga_msg_brdcst ()
#17 0x0000000002e9ae0c in GA_Brdcst ()
#18 0x0000000000b75f78 in rtdb_broadcast ()
#19 0x0000000000b766a6 in rtdb_get ()
#20 0x0000000000b75802 in rtdb_get_ ()
#21 0x0000000000adb1ec in geom_rtdb_load (
rtdb=<error reading variable: Cannot access memory at address 0x5bf9>,
geom=29052167,
name=<error reading variable: Cannot access memory at address 0x1>,
.tmp.NAME.len_V$ba1=-1) at geom.F:599
#22 0x0000000000539ac0 in driver_initialize (
rtdb=<error reading variable: Cannot access memory at address 0x5bf9>,
geom=29052167) at opt_drv.F:1163
#23 0x0000000000527625 in driver (
rtdb=<error reading variable: Cannot access memory at address 0x5bf9>)
at opt_drv.F:45
#24 0x000000000042189b in task_optimize (
rtdb=<error reading variable: Cannot access memory at address 0x5bf9>)
at task_optimize.F:146
#25 0x00000000004114a8 in task (
rtdb=<error reading variable: Cannot access memory at address 0x5bf9>)
at task.F:384
#26 0x00000000004085e5 in nwchem () at nwchem.F:285
#27 0x00000000004080de in main ()
#28 0x00007ffff2916ec5 in __libc_start_main (main=0x4080b0 <main>, argc=2,
argv=0x7fffffffc498, init=<optimised out>, fini=<optimised out>,
rtld_fini=<optimised out>, stack_end=0x7fffffffc488) at libc-start.c:287
#29 0x0000000000407fdd in _start ()


Tom

Forum Vet
BLAS_SIZE=4
Tom
please set
export BLAS_SIZE=4

Then recompile (from scratch, i.e. make clean) the tools and 64to32blas directories.
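
A minimal sketch of that rebuild, assuming the environment variables from the original build script are still exported:

export BLAS_SIZE=4
cd $NWCHEM_TOP/src/tools && make clean && make FC=ifort
cd $NWCHEM_TOP/src/64to32blas && make clean && make FC=ifort
cd $NWCHEM_TOP/src && make FC=ifort    # relink the executable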

Clicked A Few Times
After scouring the Intel forums I think I've found the bug. It relates to this:

https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/610561

When I use the suggested workaround of setting I_MPI_SHM_LMT=shm, both the pingpong test and NWChem run to completion.

Apparently this problem is specific to Ubuntu. As it is a known issue, hopefully it will be fixed in a future release of the Intel MPI library (I am currently using 5.1.3).
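
Applying the workaround is just a matter of setting the variable before launching, e.g. with the same illustrative input as above:

export I_MPI_SHM_LMT=shm
mpirun -np 2 $NWCHEM_TOP/bin/LINUX64/nwchem h2o_opt.nw > h2o_opt.p2.out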

Tom

