"inconsistency processing clusterinfo" error when trying to use multiple cluster nodes...


Tried with ARMCI = MPI_TS different error...
First, to answer the question above: running hostname on any node in my cluster returns 'node#', where # is the integer node number, so the nodes are named node1, node2, node3, and so on. The /etc/hosts file on every node also lists 'Node#' as an alternate name (alias) for each node.

I can run 'ssh node# <cmd>' from any node to any other, using either the lowercase or the capitalized form of the name.
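
For anyone who wants to double-check the same naming setup on their own cluster, here is a minimal sketch in Python. It only inspects the text of a hosts file; it does not touch MPI or NWChem. The function name `check_node_aliases`, the node count, and the sample addresses are all made up for illustration, and the loopback check is an assumption about a common pitfall, not a confirmed cause of the error in this thread.

```python
def check_node_aliases(hosts_text, node_count):
    """Return a list of problems found for node1..nodeN in hosts-file text.

    Hypothetical helper: assumes nodes are named 'node<N>' with an
    optional 'Node<N>' alias, as described in the post.
    """
    # Map each address to the set of names listed on its lines.
    names_by_ip = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        names_by_ip.setdefault(ip, set()).update(names)

    problems = []
    for n in range(1, node_count + 1):
        lower, upper = f"node{n}", f"Node{n}"
        ips_lower = {ip for ip, names in names_by_ip.items() if lower in names}
        ips_upper = {ip for ip, names in names_by_ip.items() if upper in names}
        if not ips_lower:
            problems.append(f"{lower}: no entry")
        elif len(ips_lower) > 1:
            problems.append(f"{lower}: listed under several addresses")
        if ips_upper and ips_upper != ips_lower:
            problems.append(f"{upper}: alias does not match {lower}")
        # Assumption: a node name bound to a loopback address can confuse
        # multi-node launches, so flag it for a manual look.
        if any(ip.startswith("127.") for ip in ips_lower):
            problems.append(f"{lower}: bound to a loopback address")
    return problems


if __name__ == "__main__":
    with open("/etc/hosts") as fh:
        for problem in check_node_aliases(fh.read(), node_count=3):
            print(problem)
```

Running it on each node and comparing the printed lines is a quick way to see whether every node agrees on the names; an empty output means the checks above found nothing to flag.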

Here's the error I got after compiling with MPI_TS. This build did run properly on a single node; the failure below only shows up when I use more than one node.


nwchem: ../../ga-5-3/comex/src-mpi/comex.c:1359: comex_init: Assertion `0 == status' failed.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
nwchem: ../../ga-5-3/comex/src-mpi/comex.c:1359: comex_init: Assertion `0 == status' failed.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
nwchem: ../../ga-5-3/comex/src-mpi/comex.c:197: _mq_test: Assertion `0 == rc' failed.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
nwchem: ../../ga-5-3/comex/src-mpi/comex.c:197: _mq_test: Assertion `0 == rc' failed.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
nwchem: ../../ga-5-3/comex/src-mpi/comex.c:197: _mq_test: Assertion `0 == rc' failed.
nwchem: ../../ga-5-3/comex/src-mpi/comex.c:197: _mq_test: Assertion `0 == rc' failed.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0 0x7FF702970777
#0 0x7FF1EA30D777
#1 0x7FF1EA30DD7E
#1 0x7FF702970D7E
#2 0x7FF1E9C5FD3F
#2 0x7FF7022C2D3F
#3 0x7FF7022C2CC9
#3 0x7FF1E9C5FCC9
#4 0x7FF7022C60D7
#4 0x7FF1E9C630D7
#5 0x7FF7022BBB85
#5 0x7FF1E9C58B85
#6 0x7FF7022BBC31
#6 0x7FF1E9C58C31
#0 0x7FC24BA6B777
#1 0x7FC24BA6BD7E
#2 0x7FC24B3BDD3F
#3 0x7FC24B3BDCC9
#4 0x7FC24B3C10D7
#5 0x7FC24B3B6B85
#6 0x7FC24B3B6C31
#0 0x7F77D791F777
#1 0x7F77D791FD7E
#2 0x7F77D7271D3F
#3 0x7F77D7271CC9
#4 0x7F77D72750D7
#5 0x7F77D726AB85
#6 0x7F77D726AC31
#0 0x7F829398B777
#1 0x7F829398BD7E
#2 0x7F82932DDD3F
#3 0x7F82932DDCC9
#4 0x7F82932E10D7
#5 0x7F82932D6B85
#6 0x7F82932D6C31
#0 0x7FEB98228777
#1 0x7FEB98228D7E
#2 0x7FEB97B7AD3F
#3 0x7FEB97B7ACC9
#4 0x7FEB97B7E0D7
#5 0x7FEB97B73B85
#6 0x7FEB97B73C31
#7 0x4B71A07 in _mq_test at comex.c:197
#8 0x4B73154 in comex_barrier at comex.c:1208
#9 0x4B735CF in comex_init at comex.c:1395
#10 0x4B7369F in comex_init_args at comex.c:1411
#11 0x4B6E7E5 in PARMCI_Init_args at armci.c:178
#12 0x4B3A42A in install_nxtval
#13 0x4B3A1CD in tcgi_alt_pbegin
#14 0x4B3A235 in tcgi_pbegin
#15 0x4B38F1B in pbeginf_
#16 0x54551D in nwchem at nwchem.F:84
#7 0x4B73622 in comex_init at comex.c:1359 (discriminator 1)
#8 0x4B7369F in comex_init_args at comex.c:1411
#9 0x4B6E7E5 in PARMCI_Init_args at armci.c:178
#7 0x4B71A07 in _mq_test at comex.c:197
#8 0x4B73154 in comex_barrier at comex.c:1208
#9 0x4B735CF in comex_init at comex.c:1395
#10 0x4B7369F in comex_init_args at comex.c:1411
#11 0x4B6E7E5 in PARMCI_Init_args at armci.c:178
#10 0x4B3A42A in install_nxtval
#11 0x4B3A1CD in tcgi_alt_pbegin
#12 0x4B3A42A in install_nxtval
#12 0x4B3A235 in tcgi_pbegin
#13 0x4B3A1CD in tcgi_alt_pbegin
#13 0x4B38F1B in pbeginf_
#14 0x4B3A235 in tcgi_pbegin
#14 0x54551D in nwchem at nwchem.F:84
#15 0x4B38F1B in pbeginf_
#16 0x54551D in nwchem at nwchem.F:84
#7 0x4B73622 in comex_init at comex.c:1359 (discriminator 1)
#7 0x4B71A07 in _mq_test at comex.c:197
#7 0x4B71A07 in _mq_test at comex.c:197
#8 0x4B73154 in comex_barrier at comex.c:1208
#8 0x4B7369F in comex_init_args at comex.c:1411
#9 0x4B735CF in comex_init at comex.c:1395
#8 0x4B73154 in comex_barrier at comex.c:1208
#10 0x4B7369F in comex_init_args at comex.c:1411
#9 0x4B735CF in comex_init at comex.c:1395
#10 0x4B7369F in comex_init_args at comex.c:1411
#11 0x4B6E7E5 in PARMCI_Init_args at armci.c:178
#9 0x4B6E7E5 in PARMCI_Init_args at armci.c:178
#11 0x4B6E7E5 in PARMCI_Init_args at armci.c:178
#12 0x4B3A42A in install_nxtval
#12 0x4B3A42A in install_nxtval
#10 0x4B3A42A in install_nxtval
#13 0x4B3A1CD in tcgi_alt_pbegin
#13 0x4B3A1CD in tcgi_alt_pbegin
#11 0x4B3A1CD in tcgi_alt_pbegin
#14 0x4B3A235 in tcgi_pbegin
#12 0x4B3A235 in tcgi_pbegin
#14 0x4B3A235 in tcgi_pbegin
#13 0x4B38F1B in pbeginf_
#15 0x4B38F1B in pbeginf_
#15 0x4B38F1B in pbeginf_
#14 0x54551D in nwchem at nwchem.F:84
#16 0x54551D in nwchem at nwchem.F:84
#16 0x54551D in nwchem at nwchem.F:84
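
Because every process writes to the same terminal, the assertion messages and backtraces above are interleaved and hard to read. As a rough aid, the sketch below tallies the "Assertion ... failed" lines by function, line number, and failed expression. The name `tally_assertions` and the regular expression are my own and only assume the message format seen in this post; this is not an NWChem or Global Arrays tool.

```python
import re
from collections import Counter

# Matches lines such as:
# nwchem: ../../ga-5-3/comex/src-mpi/comex.c:1359: comex_init: Assertion `0 == status' failed.
ASSERT_RE = re.compile(
    r"^nwchem: (?P<file>\S+):(?P<line>\d+): (?P<func>\w+): "
    r"Assertion `(?P<expr>.*)' failed\.$"
)


def tally_assertions(log_text):
    """Count assertion failures per (function, line, expression)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = ASSERT_RE.match(line.strip())
        if m:
            counts[(m["func"], int(m["line"]), m["expr"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage sketch: save the job's output to a file and pass its path.
    with open(sys.argv[1]) as fh:
        for (func, line, expr), n in tally_assertions(fh.read()).most_common():
            print(f"{n} x {func} (line {line}): {expr}")
```

Applied to the output pasted in this post, the tally is two failures in comex_init at line 1359 and four in _mq_test at line 197, six processes in total.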