5:45:43 PM PDT - Thu, Mar 23rd 2017
CentOS 7.3 NWChem 6.6 segmentation fault
I compiled NWC 6.6.revision.-src.2015-10-20 on an Intel system with 64 GB of memory, following the standard procedure that has worked for me on other CentOS 7 systems. An nwchem binary was created in /bin/, which I usually take as a sign of a successful compilation. However, when I run a test job, a segmentation fault is printed to the console. Any ideas what went wrong?
Here are the details:
OS and installed programs:
OS: CentOS-7.3-x86_64 7-3.1611.el7
openmpi.x86_64 1.10.0-10.el7
openmpi-devel.x86_64 1.10.0-10.el7
make.x86_64 3.82-21.el7
python.x86_64 2.7.5-39.el7_2
python-devel.x86_64 2.7.5-39.el7_2
gcc.x86_64 4.8.5-4.el7
gcc-c++.x86_64 4.8.5-4.el7
gcc-gfortran.x86_64 4.8.5-4.el7
perl.x86_64 4:5.16.3-286.el7
perl-libs.x86_64 4:5.16.3-286.el7
tcsh.x86_64 4:5.16.3-286.el7
openssh.x86_64 6.6.1p1-25.el7_2
openssh-clients.x86_64 6.6.1p1-25.el7_2
openblas.x86_64 0.2.19-3.el7
openblas-devel.x86_64 0.2.19-3.el7
openblas-openmp.x86_64 0.2.19-3.el7
openblas-openmp64.x86_64 0.2.19-3.el7
openblas-openmp64_.x86_64 0.2.19-3.el7
openblas-serial64.x86_64 0.2.19-3.el7
openblas-serial64_.x86_64 0.2.19-3.el7
openblas-threads.x86_64 0.2.19-3.el7
openblas-threads64.x86_64 0.2.19-3.el7
openblas-threads64_.x86_64 0.2.19-3.el7
scalapack-openmpi-devel.x86_64 2.0.2-15.el7
scalapack-common.x86_64 2.0.2-15.el7
blas.x86_64 3.4.2-5.el7
blas-devel.x86_64 3.4.2-5.el7
environment-modules.x86_64 3.2.10-10.el7
hwloc-libs.x86_64 1.7-5.el7
infinipath-psm.x86_64 3.3-0.g6f42cdb1bb8.2.el7
lapack.x86_64 3.4.2-5.el7
lapack-devel.x86_64 3.4.2-5.el7
libfabric.x86_64 1.1.0-2.el7
libibumad.x86_64 1.3.10.2-1.el7
libpsm2.x86_64 0.7-4.el7
opensm-libs.x86_64 3.3.19-1.el7
elpa-openmpi.x86_64 2015.02.002-4.el7
elpa-openmpi-devel.x86_64 2015.02.002-4.el7
atlas.x86_64 3.10.1-10.el7
blacs-common.x86_64 2.0.2-15.el7
blacs-openmpi.x86_64 2.0.2-15.el7
compat-openmpi16.x86_64 1.6.4-10.el7
elpa-common.noarch 2015.02.002-4.el7
elpa-devel.noarch 2015.02.002-4.el7
libesmtp.x86_64 1.0.6-7.el7
The following patches were applied:
Tddft_mxvec20.patch
Config_libs66.patch
Cosmo_meminit.patch
Sym_abelian.patch
Xccvs98.patch
Dplot_tolrho.patch
Driver_smalleig.patch
Ga_argv.patch
Ga_defs.patch
Zgesvd.patch
Cosmo_dftprint.patch
Util_gnumakefile.patch
Util_getppn.patch
Notdir_fc.patch
Xatom_vdw.patch
The following environment variables were set:
export USE_MPI=y
export NWCHEM_TARGET=LINUX64
export USE_INTERNALBLAS=y
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
export PATH=/usr/lib64/openmpi/bin/:$PATH
export NWCHEM_MODULES="all"
export NWCHEM_TOP=/usr/local/nwchem-6.6
export BLAS_SIZE=4
export SCALAPACK_SIZE=4
export USE_64TO32=y
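One combination in these settings that may be worth double-checking is USE_INTERNALBLAS=y together with BLAS_SIZE=4 and USE_64TO32=y. The sketch below only flags that combination. The function name `check_blas_settings` is hypothetical and not part of NWChem, and the premise that the internal BLAS is built with 8-byte integers on LINUX64 is an assumption that has not been confirmed here.

```shell
#!/bin/sh
# Settings copied from the list above.
USE_INTERNALBLAS=y
BLAS_SIZE=4
SCALAPACK_SIZE=4
USE_64TO32=y

# Hypothetical helper (not part of NWChem): warn when the internal BLAS is
# requested together with a 4-byte-integer BLAS interface.
# Assumption: the internal BLAS uses 8-byte integers on LINUX64.
check_blas_settings() {
    if [ "${USE_INTERNALBLAS:-n}" = "y" ] && [ "${BLAS_SIZE:-8}" = "4" ]; then
        echo "warning: USE_INTERNALBLAS=y with BLAS_SIZE=4 may mix integer sizes"
        return 1
    fi
    echo "ok: no obvious BLAS integer-size conflict"
    return 0
}

# Print the verdict without aborting the calling script.
check_blas_settings || true
```

If the assumption holds, a mismatch of this kind would be one possible explanation for a crash inside a BLAS routine, but that is a guess rather than a diagnosis.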
The console command and reply were as follows:
mpirun -np 2 /usr/local/nwchem/bin/nwchem n2.in > n2-4.out
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
- 0 0x7F8D4D4BE467
- 1 0x7F8D4D4BEAAE
- 2 0x7F8D4C7A924F
- 0 0x7F35D21D7467
- 1 0x7F35D21D7AAE
- 2 0x7F35D14C224F
- 3 0x2BCE6C0 in dcopy_
- 3 0x2BCE6C0 in dcopy_
- 4 0x2B310B3 in ycopy_
- 4 0x2B310B3 in ycopy_
- 5 0x9B6999 in pstat_init_ at pstat_init.F:32
- 5 0x9B6999 in pstat_init_ at pstat_init.F:32
- 6 0x406960 in MAIN__ at nwchem.F:204
- 6 0x406960 in MAIN__ at nwchem.F:204
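Because both MPI ranks write to the same console, the two backtraces above are interleaved. A minimal sketch for collapsing the symbolized frames into one call chain is shown here. The function name `unique_frames` is made up for illustration, and the frame lines are copied from the console output above.

```shell
#!/bin/sh
# Frames copied verbatim from the console output above; the symbolized
# ones appear twice, once per MPI rank.
backtrace='- 2 0x7F8D4C7A924F
- 2 0x7F35D14C224F
- 3 0x2BCE6C0 in dcopy_
- 3 0x2BCE6C0 in dcopy_
- 4 0x2B310B3 in ycopy_
- 4 0x2B310B3 in ycopy_
- 5 0x9B6999 in pstat_init_ at pstat_init.F:32
- 5 0x9B6999 in pstat_init_ at pstat_init.F:32
- 6 0x406960 in MAIN__ at nwchem.F:204
- 6 0x406960 in MAIN__ at nwchem.F:204'

# Hypothetical helper: keep only frames that name a routine and drop
# repeated lines while preserving the original order.
unique_frames() {
    printf '%s\n' "$backtrace" | grep ' in ' | awk '!seen[$0]++'
}

unique_frames
```

Read from bottom to top, the four remaining lines give the call chain: MAIN__ (nwchem.F:204), pstat_init_ (pstat_init.F:32), ycopy_, dcopy_.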
Contents of the output file:
corsair3.cns.uaf.edu.1197hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
corsair3.cns.uaf.edu.1198hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
argument 1 = n2.in
mpirun noticed that process rank 0 with PID 1197 on node corsair3 exited on signal 11 (Segmentation fault).
Thanks for any suggestions,
John Keller