Segmentation Violation error for Cr2 PBE aug-cc-pvqz


Gets Around
Hi,

I'm getting the following error with NWChem 6.0 and with http://nwchemgit.github.io/images/Nwchem-src-2011-Oct-25.tar.gz
(installed from RPMS http://nwchemgit.github.io/Special_AWCforum/st/id262/RPMS_of_NWchem.html on CentOS 5 x86_64):
....
127:Segmentation Violation error, status=: 11
(rank:127 hostname:XXX pid:12303):ARMCI DASSERT fail. signaltrap.c:SigSegvHandler():301 cond:0
....
Problem looks similar to:
http://nwchemgit.github.io/Special_AWCforum/st/id43/#post_533
I ran the input below (structure from doi:10.1063/1.2162161) on up to 128 cores (2GB memory per core).

charge 0.0
geometry noautoz noautosym
Cr 8.0 8.0 8.0
Cr 8.0 8.0 9.679
end
basis spherical
* library aug-cc-pvqz
end

dft
mult 1
xc xpbe96 cpbe96
iterations 300
convergence gradient 0.0005
convergence energy 1e-06
convergence density 1e-05
convergence nolevelshifting
grid coarse nodisk
smear 0.0
tolerances tight
direct
noio
end

property
dipole
end

memory total 1500 Mb noverify

task dft energy

aug-cc-pvtz runs fine on 4 cores.

None of the following fixes the problem with aug-cc-pvqz (tried with NWChem 6.0 on 4 cores):
0. running http://nwchemgit.github.io/images/Nwchem-6.0-binary-redhat-5-5-gcc-4-1-2.tgz in serial on a system with 24GB memory with "memory total 22000 Mb noverify"
1. using plain geometry (instead of geometry noautoz noautosym)
2. grid medium + removed direct and noio keywords
3. basis cartesian

Forum Vet
I'll need more info than the last line of the output. Can you please provide the last part of the output so I can see where it fails?

Bert
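
A minimal sketch of one way to collect that part of the output, assuming grep supports the -B and -m options (GNU and BSD grep do). The helper name show_crash_context and the file name cr2.out are illustrative assumptions, not something from this thread; the search string is the error text quoted in the first post.

```shell
#!/bin/sh
# Print the lines leading up to the first "Segmentation Violation" message
# in an NWChem output file, followed by the message line itself.
# $1: output file (hypothetical name, e.g. cr2.out)
# $2: number of context lines to show before the crash (default: 40)
show_crash_context() {
    out_file="$1"
    lines_before="${2:-40}"
    # -B N : also print N lines before the match
    # -m 1 : stop after the first matching line
    grep -B "$lines_before" -m 1 "Segmentation Violation" "$out_file"
}
```

For example, "show_crash_context cr2.out 40" would print up to 40 lines preceding the first crash message, which is usually enough to see which module was running.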


Gets Around
Hi,

it fails at:

Superposition of Atomic Density Guess



Sum of atomic energies: -2762.66188442
0:Segmentation Violation error, status=: 11
....

Forum Vet
Could you try reducing the memory per core, e.g. "memory total 1000 mb"?

Bert



Gets Around
Hi,

it fails with "memory total 500 Mb noverify" on 16 cores, 3GB per core.
The aug-cc-pvtz job runs in serial on a single node with 24GB memory with "memory total 500 Mb noverify",
while top reports a resident size ("RES") of about 100MB.
Is it really a memory problem? See point 0 in my first post.

Gets Around
What helps (a weird experiment) is to replace the first S function of the cc-pvqz type (the one starting with exponent 11016640.0000000) with the one of the cc-pvtz type (starting with exponent 61771940.0000000), keeping the rest of the (aug-)cc-pvqz basis untouched. For the Cr atom this changes the Summary of "ao basis" from "32 140 9s8p6d4f3g2h" (aug-cc-pvqz) to "31 139 8s8p6d4f3g2h" (the created hybrid).

Is it a problem with the cc-pvqz basis set family, or with how NWChem handles it? I see similar crashes for almost all 3d-element dimers, also with the cc-pv5z set. It is even possible to trim the cc-pvqz basis to a much smaller one (as long as it contains the first s-type function), so one could investigate the crash in a debugger.

Forum Vet
patch
Marcin
Please apply the patch below to the directory $NWCHEM_TOP/src/NWints/texas
http://nwchemgit.github.io/images/Ab_prime2.patch.gz

In other words, please do the following
cd $NWCHEM_TOP/src/NWints/texas
wget http://nwchemgit.github.io/images/Ab_prime2.patch.gz
gzip -d Ab_prime2.patch.gz
patch -p0 < Ab_prime2.patch

and recompile.

Please let me know if this works for you, too.

Cheers, Edo

  • Guest -
Thanks. The 6.0 patched version with aug-cc-pvqz passes the GUESS.

The patched development version http://nwchemgit.github.io/images/Nwchem-src-2011-Oct-25.tar.gz, however,
never reaches the GUESS (last line printed: "Schwarz screening/accCoul: 1.00D-08") and fails with:

2:2:ga_matmul:ga_matmul_irreg:xerbla:double: lapack error:: 911
(rank:2 hostname:XXX pid:18049):ARMCI DASSERT fail. ../../ga-5-0/armci/src/armci.c:ARMCI_Error():279 cond:0
** On entry to DGEMM parameter number 8 had an illegal value
xerbla:double: lapack error 911

Problem looks similar to http://www.emsl.pnl.gov/docs/nwchem/nwchem-support/2009/09/0010.Re:_NWCHEM_NWChem_OpenMPI:...

I compiled both patched versions with rpmbuild on my personal CentOS 5 x86_64 (haven't used build.opensuse.org),
which links to the CentOS 5 blas/lapack. Since the build is partly manual, this could be due to a mistake on my side.
Interestingly my non-patched Nwchem-src-2011-Oct-25 fails at GUESS, similarly to the non-patched 6.0:

0:Segmentation Violation error, status=: 11
(rank:0 hostname:XXX pid:18468):ARMCI DASSERT fail. ../../ga-5-0/armci/src/signaltrap.c:SigSegvHandler():312 cond:0

Could these be 32/64-bit issues?
Does your installation (I assume you used a development source) work after patching on x86_64?

Forum Vet
Yes
Your ga_matmul error seems to be a 32-bit vs 64-bit integer issue.
My development computer is an x86_64 box where I compile using the LINUX64 target.
When I use an optimized LAPACK/BLAS that uses 32-bit integers (a.k.a. integer*4 in Fortran parlance), I use the 64_to_32 steps described in the $NWCHEM_TOP/install file:

1) cd $NWCHEM_TOP/src
2) make clean
3) make 64_to_32
4) make USE_64TO32=y HAS_BLAS=yes BLASOPT=" optimized BLAS"
e.g. for IBM64: make USE_64TO32=y HAS_BLAS=yes BLASOPT="-lessl -lmass"

If this still does not work, you might need to recompile the tools directory from scratch, after patching the tools GNUmakefile ($NWCHEM_TOP/src/tools/GNUmakefile) with the following patch
http://nwchemgit.github.io/images/GNUmakefile.toolsoct25.patch.gz
and defining the env. variables BLAS_LIB (same value as BLASOPT) and BLAS_SIZE=4.
In other words (the following instructions are for csh/tcsh):
1) cd $NWCHEM_TOP/src/tools
2) wget http://nwchemgit.github.io/images/GNUmakefile.toolsoct25.patch.gz
3) gzip -d GNUmakefile.toolsoct25.patch.gz
4) patch -p0 < GNUmakefile.toolsoct25.patch
5) rm -rf build install
6) setenv BLAS_LIB "location of optimized blas/lapack"
7) setenv BLAS_SIZE 4
8) recompile
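
For sh/bash users, steps 1 to 7 above might look like the sketch below. It only translates the csh syntax (setenv becomes export); the function names are illustrative, NWCHEM_TOP is assumed to be set already, and the BLAS_LIB value is whatever you pass as BLASOPT. Step 8 (recompile) is left out because the exact make invocation depends on your build.

```shell
#!/bin/sh
# Steps 1-5: patch the tools GNUmakefile and remove the old tools build.
# The body runs in a subshell, so the caller's working directory is unchanged.
patch_tools_makefile() (
    cd "$NWCHEM_TOP/src/tools" &&
    wget http://nwchemgit.github.io/images/GNUmakefile.toolsoct25.patch.gz &&
    gzip -d GNUmakefile.toolsoct25.patch.gz &&
    patch -p0 < GNUmakefile.toolsoct25.patch &&
    rm -rf build install
)

# Steps 6-7: sh/bash equivalent of the two setenv lines.
# $1: location of the optimized blas/lapack (same value as BLASOPT)
set_blas_env() {
    export BLAS_LIB="$1"
    export BLAS_SIZE=4   # the BLAS library uses 32-bit (4-byte) integers
}
```

A possible call sequence is "patch_tools_makefile" followed by "set_blas_env" with your BLASOPT string as its argument, then the recompile from step 8 in the same shell so the exported variables are seen by make.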

  • Guest -
Thanks again. For now I see one effect of the patch, with aug-cc-pv5z on Co2:
ab_prim_2: increased dimmx to:: 729
So some variables still need to be tweaked. Please take that into account for the 6.1.0 release, if possible.
I will have a look at the 32/64 bit problems at a later time.

Forum Vet
Thanks for the feedback.

Gets Around
With nwchem-6.1 my 64-bit compilation problems seem resolved: http://nwchemgit.github.io/Special_AWCforum/st/id349/#post_1149

