NWChem over Gigabit Ethernet

5:35:33 AM PDT - Fri, Jul 15th 2011
Hello and thanks for your reply. I run 6 processes on node 1 and 4 processes on node 2. Both nodes use DDR3 memory with the same bandwidth. I never observed any swapping. Furthermore, I use the "direct" directive in all runs, so I think disk speed cannot be the limiting factor. As mentioned, scaling during SCF is very good. I ran one job on node 1 alone and the same job across both nodes. With both nodes the wall clock time decreased by a factor of 1.6. Since both nodes together have a (theoretical) aggregate of 34.66 GHz, which is 1.65 times that of node 1 alone, this is almost perfect. However, during an optimization the total wall clock time decreases only by a factor of 1.3. I assume that this is due to the gradient evaluation and the drop in CPU usage that occurs during it. The problem also exists if I run only 1 process per node, so maybe latency is the limiting factor. A ping from one node to the other usually takes about 0.08 ms. Do you have any experience as to whether this is too long, or how else I might check the latency? Or do you have any ideas on how to decrease the impact of high latency? Thanks.
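
To put numbers on the scaling described above, here is a small Python sketch of the arithmetic. The function name `parallel_efficiency` is just illustrative, and the calculation assumes that the ratio of aggregate clock rates (1.65) is a fair stand-in for the ideal speedup, which ignores any architectural differences between the two nodes.

```python
def parallel_efficiency(observed_speedup, ideal_speedup):
    """Fraction of the ideal speedup that was actually achieved."""
    return observed_speedup / ideal_speedup


# Ideal speedup: aggregate clock rate of both nodes relative to node 1 alone.
# This value is taken from the post; treating it as the ideal is an assumption.
IDEAL_SPEEDUP = 1.65

scf_eff = parallel_efficiency(1.6, IDEAL_SPEEDUP)  # single-point SCF run
opt_eff = parallel_efficiency(1.3, IDEAL_SPEEDUP)  # geometry optimization

print(f"SCF efficiency:          {scf_eff:.0%}")
print(f"Optimization efficiency: {opt_eff:.0%}")
```

Under that assumption the SCF step reaches roughly 97% of the ideal, while the optimization reaches only about 79%, which is the gap I am trying to explain.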