Endless TCGMSG run


  • Guest -
Hi, I run NWChem 6.0 on Linux IA-32.
It runs on each of my two machines, both serially and in parallel with the local machine as the only host. But when I try to run it from one machine across both, I get no errors of any kind and NWChem starts fine (again). It shows up in the process tables on both machines, but it never finishes. The jobs I am trying are very fast, less than a minute on a single machine, yet in parallel they run forever. I am using TCGMSG. This is the output I get:

bash-3.1$ parallel nwchem etoh_time.nw
tmp = /home/user/pdir/nwchem.p
Creating: host=pc-008, user=user,
file=/home/user/nwchem-6.0/bin/LINUX/nwchem, port=51876
/home/user/nwchem-6.0/bin/LINUX/nwchem, len=38
etoh_time.nw, len=12
  -master, len=7
pc-008.initlab.org, len=18
    51876, len=5
2, len=1
4, len=1
0, len=1
0, len=1
Creating: host=lab100-pc012, user=user,
file=/home/user/nwchem-6.0/bin/LINUX/nwchem, port=44494
/home/user/nwchem-6.0/bin/LINUX/nwchem, len=38
etoh_time.nw, len=12
  -master, len=7
pc-008, len=6
44494, len=5
2, len=1
4, len=1
1, len=1
2, len=1
argument 1 = etoh_time.nw
argument 2 = -master
argument 3 = pc-008.initlab.org
argument 4 = 51876
argument 5 = 2
argument 6 = 4
argument 7 = 0
argument 8 = 0
ARMCI configured for 2 cluster nodes. Network protocol is 'TCP/IP Sockets'.

And it doesn't end;
it just keeps running forever, no matter how many times I try.

I can even log in with ssh from one machine to the other and then run it there, either serially or in parallel on that single machine (the one I logged in to), but that's it.

  • Guest -
I forgot: my machines are an E7600@2.8 and an E7400@2.8.
The compilation was done on the E7600, with ATLAS compiled with gcc 4.3.3 and gfortran 4.3.3, and NWChem compiled with icc 10.0.026 and ifort 10.0.026. Then the program was moved to the E7400, and again it runs fine there.

The parallel run starts as well; it just never actually gets to the calculation and freezes like that.

Forum Vet
Quote: Oct 19th 6:35 pm

1. You could try "parallel hello" to see whether a simple hello-world program works across both machines.

2. Did you create a nwchem.p file (see http://nwchemgit.github.io/index.php/Running)?
  
3. If this is truly a 32-bit architecture, did you compile it as such (convert from 64-bit to 32-bit; see the INSTALL file)?
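For step 2, here is a minimal sketch of an nwchem.p for the two machines in the log above. It assumes the usual TCGMSG procgroup layout of one line per host (user, host, number of processes on that host, executable path, working directory). The two processes per host and the /home/user/pdir working directory are guesses read off the log, not something confirmed in this thread, so adjust them to your own setup.

```shell
#!/bin/sh
# Sketch only: write a TCGMSG procgroup file for the two hosts seen in the log.
# Assumed field order on each line: user host nprocs executable workdir.
# The process counts (2) and the workdir (/home/user/pdir) are assumptions.
# Run this in the directory from which you start "parallel nwchem ...".
cat > nwchem.p <<'EOF'
user pc-008        2 /home/user/nwchem-6.0/bin/LINUX/nwchem /home/user/pdir
user lab100-pc012  2 /home/user/nwchem-6.0/bin/LINUX/nwchem /home/user/pdir
EOF

# Quick sanity check: every line should have exactly five fields.
awk 'NF != 5 { print "bad line " NR ": " $0; bad = 1 } END { exit bad }' nwchem.p
```

The executable and the working directory presumably need to exist at the same paths on both machines, since each host's line names them explicitly.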

Bert

