Segmentation Fault


Thanks Edoapra,

Sorry for the late response. It took me a while to get an allocation of 640 processors to test it. I was able to overcome the previous error by using DIIS and setting ARMCI_DEFAULT_SHMMAX = 48000 (I have 24 cores per node with 128 GB, which works out to about 5.3 GB per core).
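For reference, the per-core figure above is just the node memory divided by the core count. A minimal sketch of that arithmetic follows; the constant and function names are my own and purely illustrative, and it assumes all 24 cores share the 128 GB evenly.

```python
# Per-core memory on one node, using the figures quoted in this post.
# Assumption: memory is shared evenly across all cores of the node.
CORES_PER_NODE = 24
NODE_MEMORY_GB = 128


def memory_per_core_gb(node_memory_gb: float, cores: int) -> float:
    """Return the memory available to each core if the node is split evenly."""
    return node_memory_gb / cores


per_core = memory_per_core_gb(NODE_MEMORY_GB, CORES_PER_NODE)
print(f"{per_core:.1f} GB per core")
```

This is only an upper bound per core, since the operating system and the shared-memory segments also have to fit in the same 128 GB.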

Now I have moved to a larger cluster (280 silicon atoms).

Here is my input script.

start Si_3x3x3_pc3_QC_scratch
title "Si 3x3x3 QC scratch in pc-3 basis set"

memory total 4000 mb
geometry units au
Si -15.39 -15.39 -15.39
Si -5.13 -15.39 -15.39
Si 5.13 -15.39 -15.39
Si 15.39 -15.39 -15.39
Si -10.26 -10.26 -15.39
Si 0 -10.26 -15.39
Si 10.26 -10.26 -15.39
Si -15.39 -5.13 -15.39
Si -5.13 -5.13 -15.39
Si 5.13 -5.13 -15.39
Si 15.39 -5.13 -15.39
Si -10.26 0 -15.39
Si 0 0 -15.39
Si 10.26 0 -15.39
Si -15.39 5.13 -15.39
Si -5.13 5.13 -15.39
Si 5.13 5.13 -15.39
Si 15.39 5.13 -15.39
Si -10.26 10.26 -15.39
Si 0 10.26 -15.39
Si 10.26 10.26 -15.39
Si -15.39 15.39 -15.39
Si -5.13 15.39 -15.39
Si 5.13 15.39 -15.39
Si 15.39 15.39 -15.39
Si -10.26 -15.39 -10.26
Si 0 -15.39 -10.26
Si 10.26 -15.39 -10.26
Si -15.39 -10.26 -10.26
Si -5.13 -10.26 -10.26
Si 5.13 -10.26 -10.26
Si 15.39 -10.26 -10.26
Si -10.26 -5.13 -10.26
Si 0 -5.13 -10.26
Si 10.26 -5.13 -10.26
Si -15.39 0 -10.26
Si -5.13 0 -10.26
Si 5.13 0 -10.26
Si 15.39 0 -10.26
Si -10.26 5.13 -10.26
Si 0 5.13 -10.26
Si 10.26 5.13 -10.26
Si -15.39 10.26 -10.26
Si -5.13 10.26 -10.26
Si 5.13 10.26 -10.26
Si 15.39 10.26 -10.26
Si -10.26 15.39 -10.26
Si 0 15.39 -10.26
Si 10.26 15.39 -10.26
Si -15.39 -15.39 -5.13
Si -5.13 -15.39 -5.13
Si 5.13 -15.39 -5.13
Si 15.39 -15.39 -5.13
Si -10.26 -10.26 -5.13
Si 0 -10.26 -5.13
Si 10.26 -10.26 -5.13
Si -15.39 -5.13 -5.13
Si -5.13 -5.13 -5.13
Si 5.13 -5.13 -5.13
Si 15.39 -5.13 -5.13
Si -10.26 0 -5.13
Si 0 0 -5.13
Si 10.26 0 -5.13
Si -15.39 5.13 -5.13
Si -5.13 5.13 -5.13
Si 5.13 5.13 -5.13
Si 15.39 5.13 -5.13
Si -10.26 10.26 -5.13
Si 0 10.26 -5.13
Si 10.26 10.26 -5.13
Si -15.39 15.39 -5.13
Si -5.13 15.39 -5.13
Si 5.13 15.39 -5.13
Si 15.39 15.39 -5.13
Si -10.26 -15.39 0
Si 0 -15.39 0
Si 10.26 -15.39 0
Si -15.39 -10.26 0
Si -5.13 -10.26 0
Si 5.13 -10.26 0
Si 15.39 -10.26 0
Si -10.26 -5.13 0
Si 0 -5.13 0
Si 10.26 -5.13 0
Si -15.39 0 0
Si -5.13 0 0
Si 5.13 0 0
Si 15.39 0 0
Si -10.26 5.13 0
Si 0 5.13 0
Si 10.26 5.13 0
Si -15.39 10.26 0
Si -5.13 10.26 0
Si 5.13 10.26 0
Si 15.39 10.26 0
Si -10.26 15.39 0
Si 0 15.39 0
Si 10.26 15.39 0
Si -15.39 -15.39 5.13
Si -5.13 -15.39 5.13
Si 5.13 -15.39 5.13
Si 15.39 -15.39 5.13
Si -10.26 -10.26 5.13
Si 0 -10.26 5.13
Si 10.26 -10.26 5.13
Si -15.39 -5.13 5.13
Si -5.13 -5.13 5.13
Si 5.13 -5.13 5.13
Si 15.39 -5.13 5.13
Si -10.26 0 5.13
Si 0 0 5.13
Si 10.26 0 5.13
Si -15.39 5.13 5.13
Si -5.13 5.13 5.13
Si 5.13 5.13 5.13
Si 15.39 5.13 5.13
Si -10.26 10.26 5.13
Si 0 10.26 5.13
Si 10.26 10.26 5.13
Si -15.39 15.39 5.13
Si -5.13 15.39 5.13
Si 5.13 15.39 5.13
Si 15.39 15.39 5.13
Si -10.26 -15.39 10.26
Si 0 -15.39 10.26
Si 10.26 -15.39 10.26
Si -15.39 -10.26 10.26
Si -5.13 -10.26 10.26
Si 5.13 -10.26 10.26
Si 15.39 -10.26 10.26
Si -10.26 -5.13 10.26
Si 0 -5.13 10.26
Si 10.26 -5.13 10.26
Si -15.39 0 10.26
Si -5.13 0 10.26
Si 5.13 0 10.26
Si 15.39 0 10.26
Si -10.26 5.13 10.26
Si 0 5.13 10.26
Si 10.26 5.13 10.26
Si -15.39 10.26 10.26
Si -5.13 10.26 10.26
Si 5.13 10.26 10.26
Si 15.39 10.26 10.26
Si -10.26 15.39 10.26
Si 0 15.39 10.26
Si 10.26 15.39 10.26
Si -15.39 -15.39 15.39
Si -5.13 -15.39 15.39
Si 5.13 -15.39 15.39
Si 15.39 -15.39 15.39
Si -10.26 -10.26 15.39
Si 0 -10.26 15.39
Si 10.26 -10.26 15.39
Si -15.39 -5.13 15.39
Si -5.13 -5.13 15.39
Si 5.13 -5.13 15.39
Si 15.39 -5.13 15.39
Si -10.26 0 15.39
Si 0 0 15.39
Si 10.26 0 15.39
Si -15.39 5.13 15.39
Si -5.13 5.13 15.39
Si 5.13 5.13 15.39
Si 15.39 5.13 15.39
Si -10.26 10.26 15.39
Si 0 10.26 15.39
Si 10.26 10.26 15.39
Si -15.39 15.39 15.39
Si -5.13 15.39 15.39
Si 5.13 15.39 15.39
Si 15.39 15.39 15.39
Si -7.695 -7.695 -7.695
Si -7.695 -7.695 2.565
Si -7.695 -7.695 12.825
Si -7.695 2.565 -7.695
Si -7.695 2.565 2.565
Si -7.695 2.565 12.825
Si -7.695 12.825 -7.695
Si -7.695 12.825 2.565
Si -7.695 12.825 12.825
Si 2.565 -7.695 -7.695
Si 2.565 -7.695 2.565
Si 2.565 -7.695 12.825
Si 2.565 2.565 -7.695
Si 2.565 2.565 2.565
Si 2.565 2.565 12.825
Si 2.565 12.825 -7.695
Si 2.565 12.825 2.565
Si 2.565 12.825 12.825
Si 12.825 -7.695 -7.695
Si 12.825 -7.695 2.565
Si 12.825 -7.695 12.825
Si 12.825 2.565 -7.695
Si 12.825 2.565 2.565
Si 12.825 2.565 12.825
Si 12.825 12.825 -7.695
Si 12.825 12.825 2.565
Si 12.825 12.825 12.825
Si -7.695 -12.825 -12.825
Si -7.695 -12.825 -2.565
Si -7.695 -12.825 7.695
Si -7.695 -2.565 -12.825
Si -7.695 -2.565 -2.565
Si -7.695 -2.565 7.695
Si -7.695 7.695 -12.825
Si -7.695 7.695 -2.565
Si -7.695 7.695 7.695
Si 2.565 -12.825 -12.825
Si 2.565 -12.825 -2.565
Si 2.565 -12.825 7.695
Si 2.565 -2.565 -12.825
Si 2.565 -2.565 -2.565
Si 2.565 -2.565 7.695
Si 2.565 7.695 -12.825
Si 2.565 7.695 -2.565
Si 2.565 7.695 7.695
Si 12.825 -12.825 -12.825
Si 12.825 -12.825 -2.565
Si 12.825 -12.825 7.695
Si 12.825 -2.565 -12.825
Si 12.825 -2.565 -2.565
Si 12.825 -2.565 7.695
Si 12.825 7.695 -12.825
Si 12.825 7.695 -2.565
Si 12.825 7.695 7.695
Si -12.825 -7.695 -12.825
Si -12.825 -7.695 -2.565
Si -12.825 -7.695 7.695
Si -12.825 2.565 -12.825
Si -12.825 2.565 -2.565
Si -12.825 2.565 7.695
Si -12.825 12.825 -12.825
Si -12.825 12.825 -2.565
Si -12.825 12.825 7.695
Si -2.565 -7.695 -12.825
Si -2.565 -7.695 -2.565
Si -2.565 -7.695 7.695
Si -2.565 2.565 -12.825
Si -2.565 2.565 -2.565
Si -2.565 2.565 7.695
Si -2.565 12.825 -12.825
Si -2.565 12.825 -2.565
Si -2.565 12.825 7.695
Si 7.695 -7.695 -12.825
Si 7.695 -7.695 -2.565
Si 7.695 -7.695 7.695
Si 7.695 2.565 -12.825
Si 7.695 2.565 -2.565
Si 7.695 2.565 7.695
Si 7.695 12.825 -12.825
Si 7.695 12.825 -2.565
Si 7.695 12.825 7.695
Si -12.825 -12.825 -7.695
Si -12.825 -12.825 2.565
Si -12.825 -12.825 12.825
Si -12.825 -2.565 -7.695
Si -12.825 -2.565 2.565
Si -12.825 -2.565 12.825
Si -12.825 7.695 -7.695
Si -12.825 7.695 2.565
Si -12.825 7.695 12.825
Si -2.565 -12.825 -7.695
Si -2.565 -12.825 2.565
Si -2.565 -12.825 12.825
Si -2.565 -2.565 -7.695
Si -2.565 -2.565 2.565
Si -2.565 -2.565 12.825
Si -2.565 7.695 -7.695
Si -2.565 7.695 2.565
Si -2.565 7.695 12.825
Si 7.695 -12.825 -7.695
Si 7.695 -12.825 2.565
Si 7.695 -12.825 12.825
Si 7.695 -2.565 -7.695
Si 7.695 -2.565 2.565
Si 7.695 -2.565 12.825
Si 7.695 7.695 -7.695
Si 7.695 7.695 2.565
Si 7.695 7.695 12.825
end

basis spherical
Si library pc-3 file /opt/nwchem/data/libraries/
end

dft
xc slater 1.0 perdew81 1.0
iterations 100
direct
convergence density 1.0e-6
mult 1
decomp
smear 0.002
print "convergence" "final vectors analysis"
end

task dft energy
print "total time" "ma stats"

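As a rough check of how the "memory total 4000 mb" line in the input above relates to the node size, here is a small sketch. It assumes that the memory directive applies per process, that one process runs on each of the 24 cores, and that 128 GB means 128,000 MB; the variable names are illustrative only.

```python
# Rough per-node memory budget implied by the input above.
# Assumptions (not confirmed by the job output):
#   - "memory total 4000 mb" is a per-process limit
#   - one process runs on each core of a 24-core node
#   - the 128 GB node memory is counted in decimal units (1 GB = 1000 MB)
MEMORY_PER_PROCESS_MB = 4000
CORES_PER_NODE = 24
NODE_MEMORY_MB = 128 * 1000

# Memory requested by all processes on one node.
requested_per_node_mb = MEMORY_PER_PROCESS_MB * CORES_PER_NODE

# Fraction of the node's physical memory covered by that request.
fraction_of_node = requested_per_node_mb / NODE_MEMORY_MB

print(f"requested per node: {requested_per_node_mb} MB")
print(f"fraction of node memory: {fraction_of_node:.0%}")
```

Under these assumptions the input asks for about three quarters of each node's memory, before counting anything reserved through ARMCI_DEFAULT_SHMMAX.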

When I run it on 960 processors, I get the following error:


1: WARNING:armci_set_mem_offset: offset changed 24435412992 to 24558231552
697: WARNING:armci_set_mem_offset: offset changed -734643781632 to -734525075456
361: WARNING:armci_set_mem_offset: offset changed -65125773312 to -65007108096
745: WARNING:armci_set_mem_offset: offset changed -405684514816 to -405565816832
291: WARNING:armci_set_mem_offset: offset changed 413424697344 to 413543346176
319: WARNING:armci_set_mem_offset: offset changed -141358018560 to -141239353344
241: WARNING:armci_set_mem_offset: offset changed 19755319296 to 19873976320
529: WARNING:armci_set_mem_offset: offset changed -387992182784 to -387873492992
459: WARNING:armci_set_mem_offset: offset changed 548066770944 to 548185444352
769: WARNING:armci_set_mem_offset: offset changed 17979842560 to 18098483200
771: WARNING:armci_set_mem_offset: offset changed -77830660096 to -77712019456

------------------------------------------------------------------------
movecs_write: ma failed 17920
------------------------------------------------------------------------
------------------------------------------------------------------------
current input line :
303: task dft energy
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------


Last System Error Message from Task 0:: Inappropriate ioctl for device
[cli_0]: aborting job:
application called MPI_Abort(comm=0x84000007, 17920) - process 0
srun: First task exited 30s ago
srun: tasks 1-39: running
srun: task 0: exited
srun: Terminating job step 1655055.5
[mpiexec@comet-11-04.sdsc.edu] control_cb (pm/pmiserv/pmiserv_cb.c:200): assert (!closed) failed
[mpiexec@comet-11-04.sdsc.edu] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@comet-11-04.sdsc.edu] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec@comet-11-04.sdsc.edu] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion


Any idea what might be causing it?