Compiling with MPI under cygwin


Clicked A Few Times
Dear people,

I have some issues with compilation under Cygwin (Windows 7). I managed to compile without MPI, and nwchem seems to work. However, we have a machine with two processors and six cores each, so I want to compile with MPI (using MPICH2). I'm using the following environment variables:

setenv NWCHEM_TOP /home/don2/ivo/nwchem-6.1
setenv NWCHEM_TARGET CYGWIN
setenv NWCHEM_MODULES all
setenv FC gfortran
setenv USE_MPI y
setenv USE_MPIF y
setenv USE_MPIF4 y
setenv LIBMPI "-lmpich -lopa -lmpl -lrt -lpthread"
setenv MPI_LIB /home/don2/ivo/mpilndir/lib
setenv MPI_INCLUDE /home/don2/ivo/mpilndir/include

This fails, however. Here is the end of make.log:

gfortran -Wextra -ffast-math -march=pentium4 -mtune=pentium4 -Xlinker --export-dynamic -L/home/don2/ivo/nwchem-6.1/lib/CYGWIN -L/home/don2/ivo/nwchem-6.1/src/tools/install/lib -o /home/don2/ivo/nwchem-6.1/bin/CYGWIN/nwchem nwchem.o stubs.o -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -lpspw -ltce -lbq -lcons -lperfm -ldntmc -lccca -lnwcutil -lga -lpeigs -lperfm -lcons -lbq -lnwcutil -llapack -lblas -llapack -lblas -L/home/don2/ivo/mpilndir/lib -lmpich -lopa -lmpl -lrt -lpthread -lm
/usr/lib/gcc/i686-pc-cygwin/4.5.3/../../../../i686-pc-cygwin/bin/ld: warning: --export-dynamic is not supported for PE targets, did you mean --export-all-symbols?
/usr/lib/gcc/i686-pc-cygwin/4.5.3/../../../../i686-pc-cygwin/bin/ld: cannot find -lmpich
/usr/lib/gcc/i686-pc-cygwin/4.5.3/../../../../i686-pc-cygwin/bin/ld: cannot find -lopa
/usr/lib/gcc/i686-pc-cygwin/4.5.3/../../../../i686-pc-cygwin/bin/ld: cannot find -lmpl
collect2: ld returned 1 exit status
GNUmakefile:38: recipe for target `all' failed
make: *** [all] Error 1

This tells me that the linker cannot find the MPI libraries in the MPI_LIB directory, but I don't understand why.

Please help, thanks in advance.
Ivo

Forum Vet
I don't see why the linker would fail, other than that the libraries are not in /home/don2/ivo/mpilndir/lib.

What does "ls -ltr /home/don2/ivo/mpilndir/lib" give you?
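For a quicker check than reading the listing by eye, here is a minimal sketch (sh/bash, not csh) that tests whether each -l<name> flag has a matching lib<name>.a or lib<name>.dll.a in a given directory. The function name check_libs is made up for illustration. It only looks in the one directory you pass, so system libraries such as -lrt and -lpthread will show "no" even though the linker finds them elsewhere, and GNU ld on Windows targets may accept other file names that this sketch does not test.

```shell
#!/bin/sh
# Report, for each -l<name> flag, whether lib<name>.a or lib<name>.dll.a
# exists in the given directory. check_libs is a hypothetical helper name.
check_libs() {
    libdir=$1
    shift
    missing=0
    for flag in "$@"; do
        case $flag in
            -l*) name=${flag#-l} ;;   # strip the leading "-l"
            *)   continue ;;          # ignore anything that is not a -l flag
        esac
        found=no
        for ext in a dll.a; do
            if [ -e "$libdir/lib$name.$ext" ]; then
                found=yes
                break
            fi
        done
        echo "$flag: $found"
        [ "$found" = yes ] || missing=1
    done
    return $missing               # non-zero if any library was not found
}
```

It could be called as: check_libs /home/don2/ivo/mpilndir/lib -lmpich -lopa -lmpl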

Bert


Clicked A Few Times
Here it is. I include the include dir as well:

> ls -ltr /home/don2/ivo/mpilndir/lib
total 3152
-rwx------+ 1 SYSTEM SYSTEM 132360 Sep 1 2011 mpi.lib
-rwx------+ 1 SYSTEM SYSTEM 10430 Sep 1 2011 mpe.lib
-rwx------+ 1 SYSTEM SYSTEM 93844 Sep 1 2011 rlog.lib
-rwx------+ 1 SYSTEM SYSTEM 324322 Sep 1 2011 cxx.lib
-rwx------+ 1 SYSTEM SYSTEM 133560 Sep 1 2011 fmpich2.lib
-rwx------+ 1 SYSTEM SYSTEM 1936 Sep 1 2011 irlog2rlog.lib
-rwx------+ 1 SYSTEM SYSTEM 400390 Sep 1 2011 fmpich2g.lib
-rwx------+ 1 SYSTEM SYSTEM 4644 Sep 1 2011 TraceInput.lib
-rwx------+ 1 SYSTEM SYSTEM 474252 Sep 1 2011 libmpi.a
-rwx------+ 1 SYSTEM SYSTEM 1420022 Sep 1 2011 libfmpich2g.a
-rwx------+ 1 SYSTEM SYSTEM 210386 Sep 1 2011 libmpicxx.a
> ls -ltr /home/don2/ivo/mpilndir/include
total 385
-rwx------+ 1 SYSTEM SYSTEM 1833 Nov 2 2007 mpe_logf.h
-rwx------+ 1 SYSTEM SYSTEM 696 Nov 2 2007 clog_const.h
-rwx------+ 1 SYSTEM SYSTEM 1322 Nov 17 2009 mpe_misc.h
-rwx------+ 1 SYSTEM SYSTEM 11102 Nov 17 2009 mpe_log.h
-rwx------+ 1 SYSTEM SYSTEM 1353 Nov 17 2009 clog_uuid.h
-rwx------+ 1 SYSTEM SYSTEM 4857 Nov 17 2009 clog_commset.h
-rwx------+ 1 SYSTEM SYSTEM 355 Oct 21 2010 mpe.h
-rwx------+ 1 SYSTEM SYSTEM 16765 Sep 1 2011 mpio.h
-rwx------+ 1 SYSTEM SYSTEM 100243 Sep 1 2011 mpicxx.h
-rwx------+ 1 SYSTEM SYSTEM 57471 Sep 1 2011 mpi.h
-rwx------+ 1 SYSTEM SYSTEM 731 Sep 1 2011 clog_inttypes.h
-rwx------+ 1 SYSTEM SYSTEM 19119 Sep 1 2011 mpif.h
-rwx------+ 1 SYSTEM SYSTEM 8417 Sep 1 2011 mpi_sizeofs.mod
-rwx------+ 1 SYSTEM SYSTEM 36141 Sep 1 2011 mpi_constants.mod
-rwx------+ 1 SYSTEM SYSTEM 90551 Sep 1 2011 mpi_base.mod
-rwx------+ 1 SYSTEM SYSTEM 3702 Sep 1 2011 mpi.mod

mpilndir is a link to the MPI directory in the Windows directory structure. That directory resides within Program Files, so I decided to make a link to avoid the space in the name. It shouldn't be a problem that it is a link, right?
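For reference, this is roughly how such a link can be made and checked, written as a small sh/bash sketch. The helper name make_alias and the Windows-side path in the usage line are assumptions for illustration, not the actual install location.

```shell
#!/bin/sh
# Create a space-free symlink to a directory whose path contains a space,
# then confirm that the link resolves to a directory.
# make_alias is a hypothetical helper name.
make_alias() {
    target=$1   # real directory, may contain spaces
    link=$2     # space-free alias to create
    ln -s "$target" "$link" && [ -d "$link/" ]
}
```

A call might look like: make_alias "/cygdrive/c/Program Files/MPICH2" "$HOME/mpilndir" (path assumed). Cygwin tools such as gfortran and ld follow Cygwin symlinks, but native Windows programs generally do not.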

Forum Vet
I don't see libmpich.a, libopa.a, or libmpl.a in the lib directory you are showing.

Bert



Forum Vet
mpif90 -show
Ivo,
You need to use the output of
mpif90 -show
to set LIBMPI correctly.
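As a rough sketch of what that means in practice: the wrapper prints the full compiler command line, and the -l flags in it are what belongs in LIBMPI. The sh/bash helper below (libs_from_show is a made-up name) pulls them out of such a line. It splits on whitespace, so it assumes no paths with spaces appear in the output.

```shell
#!/bin/sh
# Extract the -l flags from a compiler-wrapper command line such as the
# one printed by `mpif90 -show`. libs_from_show is a hypothetical helper.
libs_from_show() {
    out=""
    for tok in $1; do                    # intentional word splitting
        case $tok in
            -l*) out="$out${out:+ }$tok" ;;   # keep only -l<name> tokens
        esac
    done
    echo "$out"
}
```

Usage would be along the lines of: libs_from_show "$(mpif90 -show)" and then copying the result into setenv LIBMPI "...".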

Clicked A Few Times
@Bert: No, they're clearly not there. I freshly installed MPICH2-1.4.1p1 for Windows, 64-bit version. Could this be a version issue?

@Edoapra: I don't have mpif90.

On another note, I am also not married to MPICH2. I could try another MPI implementation. Which one would be recommended?

Forum Vet
/home/don2/ivo/mpilndir/bin should have either mpif77 or mpif90.

I don't know why you specified

   setenv LIBMPI "-lmpich -lopa -lmpl -lrt -lpthread"

while the libraries are not there in the first place.

Given the libraries available, you may want to try

  setenv LIBMPI "-lmpi"

and go from there. There might be additional libraries needed, but the linker will indicate that.
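To see which flags are even possible with what is installed, a small sh/bash sketch like the following turns every lib<name>.a in a directory into the corresponding -l<name> flag. The name flags_for_dir is hypothetical, and the sketch only looks at lib*.a files, not at the .lib files in the same directory.

```shell
#!/bin/sh
# Print one -l<name> flag for every lib<name>.a found in a directory.
# flags_for_dir is a hypothetical helper name.
flags_for_dir() {
    for f in "$1"/lib*.a; do
        [ -e "$f" ] || continue      # no lib*.a files in this directory
        base=${f##*/}                # strip the directory part
        base=${base#lib}             # strip the "lib" prefix
        echo "-l${base%.a}"          # strip the ".a" suffix
    done
}
```

Run against the lib directory shown earlier in the thread, this should list -lfmpich2g, -lmpi and -lmpicxx as the candidates.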

Bert



Forum Vet
/home/don2/ivo/mpilndir/bin/mpicc
Quote:Ivo Apr 27th 4:39 am

@Edoapra: I don't have mpif90.


Did you check in

/home/don2/ivo/mpilndir/bin
?

You must have -- at least -- mpicc.

Edo

Clicked A Few Times
I think I managed with

setenv LIBMPI "-lmpi -lfmpich2g -lrt -lpthread"

At least the compilation finished without complaints.
I think the issue arose because I was more or less blindly following the compilation instructions from this website, and they were not completely appropriate for my case. I'm not very experienced in compiling programs, so that was very confusing for me. Your comments helped me to understand the issue and resolve it.

I still cannot run in parallel because I apparently have an issue with MPI, but I will try to resolve that elsewhere.

Clicked A Few Times
For future reference:
In the end I was unable to run nwchem in parallel using the binary MPI distribution for Windows.
Instead I had to download the MPI source code and compile it myself. This was fairly easy once I knew which Cygwin packages I needed for that.
Once that was in place and working, I was able to compile nwchem without any problems.
I'm running the QA tests right now, will post the results later.

Clicked A Few Times
QA
Quite a few of the testing jobs have failed:

prop_h2o		gives -0 instead of 0, not really problematic
dplot			pspw module crashes
oh2			should fail
sadsmall		y and z seem to be mirrored
pspw			pspw module crashes
pspw_SiC		pspw module crashes
pspw_md			pspw module crashes
paw			pspw module crashes
pspw_polarizability	pspw module crashes
pspw_stress		pspw module crashes
band			pspw module crashes
tce_cr_eom_t_ch_rohf	right numbers, but in different order
tce_cr_eom_t_ozone	crashes with segmentation violation
tce_active_ccsdt	crashes with segmentation violation
tce_lr_ccsd_tq		crashes with segmentation violation
tce_eomsd_eomsol1	crashes with segmentation violation
tce_eomsd_eomsol2	crashes with segmentation violation
tce_uracil_creomact	crashes with segmentation violation
i2_zora_so		no error, but different numbers
o2_zora_so		no error, but different numbers
lys_qmmm		no error, but different numbers
ethane_qmmm		no error, but slightly different numbers
qmmm_opt0		no error, but different numbers
prop_ch3f		z mirrored
ch3f_trans_cosmo	no error, but slightly different numbers
ch3f_trans_cam_nmr	no error, but slightly different numbers
acr_lcblyp		no error, but slightly different numbers
h2o-response		no error, but slightly different numbers
mep-test		no error, but slightly different numbers
k6h2o			crashes with unresolved atom types in fragment wat
mcscf_ch2		convergence failed
mcscf_ozone		no error, but different numbers
sif_sodft		no error, but slightly different numbers
tropt-ch3nh2		no error, but slightly different numbers
h3_dirdyvtst		crashes with child process terminated prematurely
geom_load_xyz		no such file or directory (xyz file not copied to scratch?)
h2o_hcons		no error, but slightly different numbers
etf_hcons		no error, but slightly different numbers
cnh5_m06-2x		test not there
bq_nio			convergence fails
ch3_m06-hf		test not there
dntmc_h2o_nh3		could not find verified output (???)
talc			pspw module crashes
aump2			crashes with segmentation violation
hess_nh3_dimer		no error, but slightly different numbers
pbo_nesc1e		no error, but slightly different numbers
oniom3			no error, but different numbers
cytosine_ccsd		crashes with segmentation violation
h2o_selci		no error, but slightly different numbers
tce_polar_ccsd_big	crashes with segmentation violation
hess_biph		crashes with segmentation violation


The pspw module causes a crash with no clear error message (something about MA_verify_allocator_stuff and D3dB_Vector_ISumAll popping stack).
The tce module often crashes with a segmentation violation.
I don't mind so much that these modules don't work, although I might be interested in the pspw module in the future.

Where I've written "slightly different numbers", I mean that the differences are so small that I don't think they matter.
A couple of jobs give significantly different numbers though, and that worries me a bit: i2_zora_so, o2_zora_so, oniom3 for example. Any idea what could be wrong?

Forum Vet
If you use the QA test scripts the comparisons will be made based on set tolerances. The following ones are fine:

prop_h2o gives -0 instead of 0, not really problematic
oh2 should fail
sadsmall y and z seem to be mirrored
tce_cr_eom_t_ch_rohf right numbers, but in different order
ethane_qmmm no error, but slightly different numbers
prop_ch3f z mirrored
ch3f_trans_cosmo no error, but slightly different numbers
ch3f_trans_cam_nmr no error, but slightly different numbers
acr_lcblyp no error, but slightly different numbers
h2o-response no error, but slightly different numbers
mep-test no error, but slightly different numbers
sif_sodft no error, but slightly different numbers
tropt-ch3nh2 no error, but slightly different numbers
h2o_hcons no error, but slightly different numbers
etf_hcons no error, but slightly different numbers
h2o_selci no error, but slightly different numbers
hess_nh3_dimer no error, but slightly different numbers
pbo_nesc1e no error, but slightly different numbers
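To illustrate what a tolerance-based comparison means, here is a minimal sh/bash sketch using awk. It is not the actual QA script; within_tol is a made-up helper and the tolerance values in the usage line are arbitrary examples.

```shell
#!/bin/sh
# Succeed (exit status 0) if two numbers differ by no more than an
# absolute tolerance. within_tol is a hypothetical helper, not part of
# the NWChem QA scripts.
within_tol() {
    awk -v a="$1" -v b="$2" -v tol="$3" 'BEGIN {
        d = a - b
        if (d < 0) d = -d          # absolute difference
        exit !(d <= tol + 0)       # exit 0 when within tolerance
    }'
}
```

For example, within_tol -0.000000 0.000000 1e-6 succeeds, which is why a result of -0 instead of 0 is not counted as a failure under this kind of comparison.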


I need more information on the crashes for the ones below. Please send it to bert.dejong@pnnl.gov:

dplot pspw module crashes
pspw pspw module crashes
pspw_SiC pspw module crashes
pspw_md pspw module crashes
paw pspw module crashes
pspw_polarizability pspw module crashes
pspw_stress pspw module crashes
band pspw module crashes
talc pspw module crashes

The ones below suggest they may be running out of memory. How much memory do you have on the system, and how much memory per core?

tce_cr_eom_t_ozone crashes with segmentation violation
tce_active_ccsdt crashes with segmentation violation
tce_lr_ccsd_tq crashes with segmentation violation
tce_eomsd_eomsol1 crashes with segmentation violation
tce_eomsd_eomsol2 crashes with segmentation violation
tce_uracil_creomact crashes with segmentation violation
cytosine_ccsd crashes with segmentation violation
tce_polar_ccsd_big crashes with segmentation violation
hess_biph crashes with segmentation violation
aump2 crashes with segmentation violation
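A quick way to get the memory-per-core figure is sketched below in sh/bash. The helper name mem_per_core_mb is made up, and the /proc paths in the comments are what Cygwin and Linux normally provide; check that they exist on your system.

```shell
#!/bin/sh
# Memory per core in MB, given total memory in kB and a core count.
# mem_per_core_mb is a hypothetical helper name.
mem_per_core_mb() {
    echo $(( $1 / $2 / 1024 ))
}

# On Cygwin or Linux the inputs can usually be read from /proc, e.g.:
#   total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   cores=$(grep -c '^processor' /proc/cpuinfo)
#   mem_per_core_mb "$total_kb" "$cores"
```

As an illustration only, a machine with 12 cores and an assumed 24 GB of RAM (25165824 kB) would give 2048 MB per core.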

I would need to see the output files you have for these to understand what is going on. Please send them to bert.dejong@pnnl.gov:

i2_zora_so no error, but different numbers
o2_zora_so no error, but different numbers
lys_qmmm no error, but different numbers
qmmm_opt0 no error, but different numbers
oniom3 no error, but different numbers
k6h2o crashes with unresolved atom types in fragment wat
mcscf_ch2 convergence failed
mcscf_ozone no error, but different numbers
h3_dirdyvtst crashes with child process terminated prematurely
bq_nio convergence fails

You found some errors in the naming of files in our test suite for the cases below. We'll have these fixed:

cnh5_m06-2x test not there
ch3_m06-hf test not there
dntmc_h2o_nh3 could not find verified output (???)
geom_load_xyz no such file or directory (xyz file not copied to scratch?)

Bert




