Problem with the ARMCI network protocol


Clicked A Few Times
Hello everyone.

I use NWChem 6.0 on one PC cluster, and now I'm trying to install it on a second cluster as well.

I set some environment variables in the .bashrc file in my home folder and compiled the program with the
[make nwchem_config] and [make FC=ifort >& make.log] commands.

When the build finished, I confirmed that the nwchem executable had been created in the $NWCHEM_TOP/bin/LINUX64/ folder.


Here is the problem. I tested the program on a simple input file with the [./nwchem test.nw] command and got the following error messages:


libibverbs: Fatal: couldn't read uverbs ABI version.
--------------------------------------------------------------------------
[0,0,0]: OpenIB on host pcs5 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
libibverbs: Fatal: couldn't read uverbs ABI version.
CMA: unable to open /dev/infiniband/rdma_cm
forrtl: error (69): process interrupted (SIGINT) # I typed [Ctrl+c] here
Image PC Routine Line Source
libpthread.so.0 0000003331E0C5B0 Unknown Unknown Unknown
libpthread.so.0 0000003331E0AF8B Unknown Unknown Unknown
forrtl: error (69): process interrupted (SIGINT) # I typed [Ctrl+c] here
Image PC Routine Line Source
nwchem 0000000002A1659D Unknown Unknown Unknown
nwchem 0000000002A150A5 Unknown Unknown Unknown
nwchem 00000000029AC319 Unknown Unknown Unknown
nwchem 000000000295744F Unknown Unknown Unknown
nwchem 000000000295BF32 Unknown Unknown Unknown
libpthread.so.0 0000003331E0C5B0 Unknown Unknown Unknown
libpthread.so.0 0000003331E0AF8B Unknown Unknown Unknown


I think this error is related to the ARMCI network setting (InfiniBand, Giganet, and so on), and I guess the program is looking for the OPENIB protocol.

The strange thing is that the cluster I'm using does not have InfiniBand, Giganet, or any similar network.
So I put [export ARMCI_NETWORK=SOCKETS] in my .bashrc file for the environment setup. I don't think anything went wrong during compilation (I checked the make.log file), but I don't know why this error appears.


Has anybody had the same or a similar problem before? I need your help...

Thanks in advance.

Yjlee

Forum Vet
Could you send me the make.log at bert.dejong@pnnl.gov so we can see what happened?

Bert

Clicked A Few Times
I just sent you the files
I just sent you the make.log file and my environment setup scripts.

Thank you.

Forum Vet
Hi Yongjin,

A couple of things after reviewing the info:

1. NWChem did not compile in any InfiniBand support. The messages appear to come from OpenMPI itself.

2. What is the network between the nodes in your cluster? There must be some network. The OpenMPI installation suggests it has IB. You may want to verify this.

3. If you want to compile over sockets, you should not specify any MPI variables when compiling. Instead of starting the job with mpirun, you will have to use the parallel.x command to run the code in parallel.

4. If you want to compile with MPI, you should not set ARMCI_NETWORK to SOCKETS. You should not set it at all (see the compile instructions on the NWChem web page).
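
Points 3 and 4 can be sketched as a small shell helper that picks exactly one of the two build modes. This is only an illustration: the function name configure_nwchem_build is hypothetical, and the MPI variable names (USE_MPI, MPI_LOC, MPI_LIB, MPI_INCLUDE, LIBMPI) are assumed from the usual NWChem build instructions, so check them against the compile instructions for your release.

```shell
#!/bin/sh
# Sketch only: select one build mode so the two are never mixed.
# configure_nwchem_build is a hypothetical helper, not part of NWChem.

configure_nwchem_build() {
    mode="$1"
    case "$mode" in
        sockets)
            # Point 3: a sockets build must not see any MPI variables.
            # (Variable names assumed from the NWChem build instructions.)
            unset USE_MPI MPI_LOC MPI_LIB MPI_INCLUDE LIBMPI
            # Same setting the original poster placed in .bashrc.
            export ARMCI_NETWORK=SOCKETS
            ;;
        mpi)
            # Point 4: an MPI build leaves ARMCI_NETWORK unset.
            unset ARMCI_NETWORK
            export USE_MPI=y
            ;;
        *)
            echo "usage: configure_nwchem_build sockets|mpi" >&2
            return 1
            ;;
    esac
}

configure_nwchem_build mpi
echo "USE_MPI=${USE_MPI:-unset} ARMCI_NETWORK=${ARMCI_NETWORK:-unset}"
```

After choosing a mode, rebuild from a clean tree so no objects from the other mode remain. A sockets build is then started with parallel.x rather than mpirun, as noted in point 3.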

Bert

Clicked A Few Times
Sorry for the late reply
Sorry for the late reply...

I discussed this problem with the administrator of the cluster system.

He checked the openmpi installation and found that your first comment was right:
openmpi itself was looking for InfiniBand.
(Actually, he hadn't known about it before. Other cluster users use mpich instead of openmpi.)

The problem is fixed and NWChem is running without any trouble. Well, now the remaining problem is
my own knowledge and understanding of computational chemistry and of the program itself.
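
For anyone who hits the same messages: the thread does not record what the administrator actually changed, so the following is only a sketch of one common way to check for this situation and work around it. It assumes Open MPI's ompi_info tool is installed and that excluding the openib transport through the OMPI_MCA_btl environment variable suits your site. The helper names has_ib_device and ompi_has_openib are hypothetical.

```shell
#!/bin/sh
# Sketch only: diagnose, and optionally silence, Open MPI's InfiniBand probing.

# True if the InfiniBand device directory exists. The log earlier in the
# thread shows "unable to open /dev/infiniband/rdma_cm", so a missing
# directory is consistent with a node that has no HCA.
has_ib_device() {
    [ -d "${1:-/dev/infiniband}" ]
}

# Returns 0 if Open MPI was built with the openib component, 1 if it was
# not, and 2 if ompi_info is not in PATH.
ompi_has_openib() {
    command -v ompi_info >/dev/null 2>&1 || return 2
    ompi_info | grep -q "btl: openib"
}

if has_ib_device; then
    echo "InfiniBand device directory present"
else
    echo "no /dev/infiniband directory on this node"
fi

rc=0
ompi_has_openib || rc=$?
case "$rc" in
    0) echo "Open MPI includes the openib component" ;;
    1) echo "Open MPI has no openib component" ;;
    *) echo "ompi_info not found in PATH" ;;
esac

# Assumed workaround (not confirmed by this thread): exclude the openib
# transport so Open MPI stops probing for HCAs on a cluster without them.
export OMPI_MCA_btl="^openib"
```

The same exclusion can be given for a single run on the mpirun command line with the option --mca btl ^openib, which avoids changing the environment for other users.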

Anyway, thank you again!
