5:09:02 PM PDT - Wed, Jul 6th 2011
I have not seen anything on scaling performance, i.e. faster runs with more processors.
Are the nodes fully packed with processes? The nodes have different speeds, which will affect memory access and network access. They are probably different boards; do they even have the same bandwidth to memory? The nodes also have different amounts of memory. Are you taxing node 2 more (some swapping)?
If there are bandwidth differences between the two nodes, then one of them could be waiting more than the other.
NWChem uses disk too; are the disk systems the same?
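To check the swapping point above, here is a quick sketch you could run on each node during a job and compare (it only reads standard Linux /proc files; nothing NWChem-specific):

```python
# Sketch: print memory and swap headroom on this node; run it on both
# nodes while a job is active and compare the numbers.
def mem_summary(path="/proc/meminfo"):
    wanted = ("MemTotal", "MemFree", "SwapTotal", "SwapFree")
    summary = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in wanted:
                summary[key] = rest.strip()
    return summary

if __name__ == "__main__":
    # SwapFree well below SwapTotal on node 2 would support the
    # swapping explanation.
    for key, value in mem_summary().items():
        print(f"{key}: {value}")
```

If node 2 shows SwapFree noticeably below SwapTotal while the job runs, it is swapping and will stall the other node.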
As to your questions:
Improving bandwidth usage: it may be that your molecule and data distribution simply don't need more bandwidth. Much of the communication is actually driven by latency, as the messages are not too large.
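To see why latency rather than bandwidth dominates for small messages, here is a minimal round-trip timer. It is a self-contained sketch over localhost; in practice you would run the echo side on one node and the timing side on the other. Host, port, and message size are illustrative placeholders, not values from your setup:

```python
# Sketch: time small-message TCP round trips. A 64-byte message pays
# almost entirely latency; the 2.33 Gbit/s link capacity barely matters.
import socket
import threading
import time

def echo_once(server_sock):
    # Accept a single connection and echo everything back.
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)

def round_trip_us(host, port, msg_size=64, iters=200):
    # Average round-trip time for one message, in microseconds.
    with socket.create_connection((host, port)) as c:
        c.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        payload = b"x" * msg_size
        start = time.perf_counter()
        for _ in range(iters):
            c.sendall(payload)
            received = 0
            while received < msg_size:
                received += len(c.recv(4096))
        return (time.perf_counter() - start) / iters * 1e6

if __name__ == "__main__":
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # placeholder: use the cluster interface
    server.listen(1)
    threading.Thread(target=echo_once, args=(server,), daemon=True).start()
    port = server.getsockname()[1]
    print(f"~{round_trip_us('127.0.0.1', port):.0f} us per 64-byte round trip")
```

Run between the two nodes, this gives the per-message cost that bonding more gigabit ports does nothing to reduce.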
Scaling performance: I don't see any data, so I can't comment.
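One more thing to check, since the nodes have 6 and 4 cores: make sure the process placement matches the core counts. A sketch of an OpenMPI hostfile (the hostnames here are placeholders, not from your setup):

```
# hostfile: one slot per physical core on each node
node1 slots=6
node2 slots=4
```

and then launch with something like `mpirun --hostfile hostfile -np 10 ...` so the faster node carries proportionally more ranks.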
Bert
Quote: Jun 29th 9:42 pm
I use NWChem on two nodes, which are connected to each other over 4 bonded gigabit Ethernet ports. Bandwidth tests with iperf showed a usable bandwidth of 2.33 Gbit/s. Now I tried to start a job distributed over both nodes. During SCF the scaling is very good. However, during the gradient evaluation the CPU usage drops to 30 to 50% on the second node and to 80 to 90% on the first node. The bandwidth usage never exceeds approximately 15% of the available 2.33 Gbit/s. So I would like to know whether it is possible to improve the bandwidth usage and scaling performance.
Hardware:
node 1: AMD Phenom II 1090T (6 x 3.51 GHz), 8 GB RAM
node 2: AMD Phenom II 965 (4 x 3.4 GHz), 4 GB RAM
Software:
openSuSE 11.4 running Linux kernel 2.6.37
NWChem Apr 15 2011
/proc/sys/net/ipv4/tcp_low_latency set to 1
mtu set to 7200, which is the network driver's maximum
OpenMPI 1.4.3
Thanks