2:42:09 PM PDT - Wed, Jun 29th 2011
I run NWChem on two nodes that are connected to each other over four bonded gigabit Ethernet ports. Bandwidth tests with iperf showed a usable bandwidth of 2.33 Gbit/s. I have now tried running a job distributed across both nodes. During SCF the scaling is very good. During gradient evaluation, however, CPU usage drops to 30-50% on the second node and to 80-90% on the first node. Bandwidth usage never exceeds approx. 15% of the available 2.33 Gbit/s. I would like to know whether it is possible to improve bandwidth usage and scaling performance.
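To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. It only uses the figures quoted in this post; the variable names and the nominal 1 Gbit/s per-port rate are my own assumptions for illustration, not measurements.

```python
# Figures quoted in the post; names are illustrative only.
measured_gbit_s = 2.33      # usable bandwidth reported by iperf over the bonded ports
peak_fraction = 0.15        # approximate peak usage seen during gradient evaluation
single_link_gbit_s = 1.0    # assumed nominal rate of one gigabit Ethernet port


def used_bandwidth_gbit_s(available, fraction):
    """Return the absolute bandwidth corresponding to a usage fraction."""
    return available * fraction


used = used_bandwidth_gbit_s(measured_gbit_s, peak_fraction)
used_mbyte_s = used * 1000 / 8  # Gbit/s -> MB/s (decimal units)

print(f"peak traffic: {used:.2f} Gbit/s ({used_mbyte_s:.0f} MB/s)")
print(f"share of one gigabit link: {used / single_link_gbit_s:.0%}")
```

So the peak traffic is well below what even a single port could carry, which suggests (though it does not prove) that raw bandwidth is not the limiting factor during the gradient step.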
Hardware:
node 1: AMD Phenom II 1090T (6 x 3.51 GHz), 8 GB RAM
node 2: AMD Phenom II 965 (4 x 3.4 GHz), 4 GB RAM
Software:
Linux 2.6.37 on openSUSE 11.4
NWChem Apr 15 2011
/proc/sys/net/ipv4/tcp_low_latency set to 1
MTU set to 7200, which is the network driver's maximum
Open MPI 1.4.3
Thanks