ARMCI ONESIDED SIZEOF IREQ


Just Got Here
Hello,

I'm not sure whether this is a compile-time or a run-time problem. We compiled NWChem on a Cray XE6m-200 following the instructions in the documentation. When we try to run it, we get an error:

ARMCI configured for 3 cluster nodes. Network protocol is 'Cray Onesided'.
ARMCI_ONESIDED_SIZEOF_IREQ is not sized correctly.
ARMCI_ONESIDED_SIZEOF_IREQ = 21016
sizeof(armci_ireq_t) = 22040
Application 828140 exit codes: 134

Does anyone know what the problem might be, and whether there is a workaround?

Thanks,
Alex.

Forum Vet
Please edit the following file
$NWCHEM_TOP/src/tools/ga-5-3/armci/src-gemini/armci.h
and change the following line from
#define ARMCI_ONESIDED_SIZEOF_IREQ 21016
to
#define ARMCI_ONESIDED_SIZEOF_IREQ 22040

Then recompile and relink by typing:


cd $NWCHEM_TOP/src/tools/build
make FC=ftn install
cd ../..
make FC=ftn link
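
The header edit described above can also be scripted. The following is only a sketch: `set_ireq_size` is a hypothetical helper name, not part of NWChem or GA, and it assumes the `#define` sits on a single line in `armci.h` as quoted in this post. It keeps a copy of the original header as `armci.h.orig`.

```shell
# Hypothetical helper (not part of NWChem/GA): rewrite the value of
# ARMCI_ONESIDED_SIZEOF_IREQ in a header and keep a backup as <header>.orig.
# usage: set_ireq_size <header> <new_size>
set_ireq_size() {
    hdr=$1
    size=$2
    # Keep the untouched header so the change can be reverted.
    cp "$hdr" "$hdr.orig" || return 1
    # Replace only the number following the macro name; tolerate tabs or
    # multiple spaces between the tokens.
    sed -E "s/^(#define[[:space:]]+ARMCI_ONESIDED_SIZEOF_IREQ[[:space:]]+)[0-9]+/\1$size/" \
        "$hdr.orig" > "$hdr"
}

# Example with the path and value quoted in this thread (NWCHEM_TOP must be set):
# set_ireq_size "$NWCHEM_TOP/src/tools/ga-5-3/armci/src-gemini/armci.h" 22040
```

The new value should be whatever `sizeof(armci_ireq_t)` the run-time error message reports (22040 in this thread); after the edit, rebuild with the `make` commands above.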

Just Got Here
Thank you! Now it starts, but I get:
nwchem: ../../ga-5-3/armci/src-gemini/buffers.c:625: _armci_buf_get: Assertion `ar->req.active == 0' failed.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x357A4CD in _gfortrani_backtrace at backtrace.c:258
#1  0x3562C20 in _gfortrani_backtrace_handler at compile_options.c:129
#2  0x342F87F in system
#3  0x3759C7C in __libc_fork at fork.c:188
#4  0xFFFFFFFFFFFFFFFF


This is my input file:
start caca

title "caca"
charge 0

memory total 1 gb

geometry units ang  print xyz noautoz
 C                    -0.69678829     1.73570628    -0.00142243
 C                    -0.01620581     1.54099739     1.21131057
 C                     1.32902041     1.15171656     1.19964001
 C                     2.00018884     0.95527676    -0.01379517
 C                     1.31155734     1.15232665    -1.22005238
 C                    -0.03685172     1.54253226    -1.20631979
 H                    -0.52673449     1.71951829     2.15748547
 H                     1.85815361     1.03470211     2.14251901
 H                     3.05396468     0.68571506    -0.02008303
 H                     1.82534835     1.04031559    -2.17156523
 H                    -0.57103646     1.73534382    -2.13423958
 O                    -2.02300658     2.11936078    -0.05844134
 H                    -2.31951336     2.43887107     0.81230826
end

basis spherical
#  * library cc-pVTZ
  * library cc-pVDZ
end


scf
 rhf
 singlet
 maxiter 80
end



mp2
 freeze core atomic
end


task direct_mp2



What could be the problem?

Thanks,
Alex.

Forum Vet
I'm not quite sure what the problem could be.
I would try the following:
1) Decrease the memory requirement in the input file.
2) Use only half of the cores on each node.
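
For suggestion 2, a rough sketch of how the launch line could be derived with Cray's `aprun`, where `-n` is the total number of processes and `-N` the number per node. The node count of 3 comes from the ARMCI message in the first post; `cores_per_node=32` and the input file name `caca.nw` are assumptions and should be replaced with the actual values for your machine and job. The script only prints the command, it does not run it.

```shell
nodes=3             # from "ARMCI configured for 3 cluster nodes" above
cores_per_node=32   # assumption: check the actual core count of your XE6m nodes
input=caca.nw       # assumption: name of the input file shown above

# Place processes on only half of the cores of each node.
per_node=$((cores_per_node / 2))
total=$((nodes * per_node))

# Print the resulting launch command for inspection.
launch_cmd="aprun -n $total -N $per_node nwchem $input"
echo "$launch_cmd"
```

For suggestion 1, the corresponding change is in the input file itself: lower the value on the `memory total 1 gb` line.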

Just Got Here
I tried adjusting the memory per core, but it didn't help. I then compiled a version with DMAPP and the Cray version of GA (as recommended in the docs), and it looks like it is working. I guess it is not as good as the Gemini ARMCI network for the XE6? Do you know whether either of the two is preferred on the Gemini interconnect, and whether there is a performance difference between them?

Thanks,
Alex.

Forum Vet
Alex,
We are aware of the current issues on Cray machines (in both the DMAPP and Gemini ports).
We plan to release a fix for this within the next few months.
If you are interested in testing this new version before it is released, please send me a PM.

