1:46:58 PM PST - Thu, Nov 5th 2015
Is it a sufficiently demanding calculation that you would expect it to make efficient use of the assigned hardware resources? Really small calculations will show poor or even negative scaling at 12 cores even when all cores are on the same motherboard.
Try using tcpdump to see how many bytes and packets are transferred between the nodes during your test run. If the traffic is mostly a large number of smallish messages, I think you are fundamentally limited by network latency rather than bandwidth.
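If it helps, here is a rough Python sketch for summarizing a tcpdump text capture. It assumes the default one-line-per-packet output (no -v or -e), where each TCP/UDP line carries a "length N" field. The function name summarize, the 512-byte cutoff for a "smallish" packet, and the sample lines are all made up for illustration, so adjust them to your setup.

```python
import re
from statistics import median

# Assumption: default tcpdump text output, where each TCP/UDP packet line
# contains a single "length N" field giving the payload size in bytes.
LENGTH_RE = re.compile(r"\blength (\d+)")


def summarize(lines, small_threshold=512):
    """Count packets and bytes, and report the share of small packets.

    small_threshold is an arbitrary cutoff (in bytes) chosen for this sketch.
    """
    sizes = []
    for line in lines:
        match = LENGTH_RE.search(line)
        if match:  # lines without a length field are skipped
            sizes.append(int(match.group(1)))

    if not sizes:
        return {"packets": 0, "bytes": 0, "median": 0, "small_fraction": 0.0}

    small = sum(1 for size in sizes if size < small_threshold)
    return {
        "packets": len(sizes),
        "bytes": sum(sizes),
        "median": median(sizes),
        "small_fraction": small / len(sizes),
    }


# Hypothetical sample lines, only to show the expected input shape.
sample = [
    "13:46:58.000001 IP 10.0.0.1.5000 > 10.0.0.2.5001: Flags [P.], seq 1:65, ack 1, win 229, length 64",
    "13:46:58.000090 IP 10.0.0.2.5001 > 10.0.0.1.5000: Flags [P.], seq 1:129, ack 65, win 229, length 128",
    "13:46:58.000200 IP 10.0.0.1.5000 > 10.0.0.2.5001: Flags [.], seq 65:1513, ack 129, win 229, length 1448",
]
print(summarize(sample))
```

A small_fraction close to 1 together with a low total byte count would be consistent with the latency-bound picture; a few large packets dominating the byte count would point more toward bandwidth.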
Apart from your scaling woes, you may wish to install version 6.6 from source. There have been a lot of bug fixes and enhancements since 6.3. The Ubuntu package won't be linked against a high-performance BLAS either.