NWChem over Gigabit Ethernet

2:42:09 PM PDT - Wed, Jun 29th 2011
I run NWChem on two nodes that are connected to each other over 4 bonded Gigabit Ethernet ports. Bandwidth tests with iperf showed a usable bandwidth of 2.33 Gbit/s.

I have now tried to run a job distributed across both nodes. During SCF the scaling is very good. During gradient evaluation, however, CPU usage drops to 30 to 50 % on the second node and to 80 to 90 % on the first node. Bandwidth usage never exceeds approx. 15 % of the available 2.33 Gbit/s.

I would like to know whether it is possible to improve bandwidth usage and scaling performance.

Hardware:
node 1: AMD Phenom II 1090T (6 x 3.51 GHz), 8 GB RAM
node 2: AMD Phenom II 965 (4 x 3.4 GHz), 4 GB RAM

Software:
openSUSE 11.4 running Linux 2.6.37
NWChem Apr 15 2011
OpenMPI 1.4.3
/proc/sys/net/ipv4/tcp_low_latency set to 1
MTU set to 7200, which is the network driver's maximum

Thanks
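
For scale, the 15 % figure can be turned into an absolute rate with a quick back-of-the-envelope calculation. This is only a sketch built from the two numbers quoted above; the variable names are made up for illustration, and decimal units (1 Gbit = 1000 Mbit) are assumed.

```python
# Figures quoted in the post; everything else is derived from them.
measured_gbit_s = 2.33   # usable bandwidth reported by iperf over the bonded ports
peak_fraction = 0.15     # approximate peak usage seen during gradient evaluation

# Absolute rate actually used at peak
used_gbit_s = measured_gbit_s * peak_fraction

# Convert to megabytes per second (assumes decimal units, 8 bits per byte)
used_mbyte_s = used_gbit_s * 1000 / 8

print(f"{used_gbit_s:.2f} Gbit/s, about {used_mbyte_s:.0f} MB/s")
```

That works out to roughly 0.35 Gbit/s, which is less than the nominal 1 Gbit/s of a single port, so raw link capacity does not look like the limiting factor here.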