5:20:26 PM PDT - Fri, Jun 6th 2014
Yury, I am using GNU compilers and the stock Open MPI package that ships with Ubuntu 14.04 rather than the Intel tools, but I too saw poor performance with MPI-TS. I normally build with ARMCI_NETWORK=SOCKETS, but I wanted to try MPI-TS in an attempt to fix problems similar to those reported here: http://nwchemgit.github.io/Special_AWCforum/st/id1298/armci_malloc%3Amalloc_1_fail...
I am using the most recent code snapshot in downloads: http://nwchemgit.github.io/download.php?f=Nwchem-6.3.revision25564-src.2014-05-03.tar.gz
Running serially, the QA test case tce_polar_ccsd_small took 314 seconds wall clock time.
With ARMCI_NETWORK=SOCKETS the same case ran in 139 seconds using 4 cores on a four-core i7 system, roughly a 2.3x speedup over serial. That scaling is OK rather than great, but on most non-direct jobs my scaling is limited by slow disk I/O. I also tried an "MPI serial" run of the same build using mpirun -np 1; that took 317 seconds wall clock, essentially unchanged from the plain serial run.
With ARMCI_NETWORK=MPI-TS on the same hardware and OS, the wall clock time for tce_polar_ccsd_small rose to 518 seconds on 4 cores. That is slower than serial, just as you experienced.
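For anyone wanting to reproduce the comparison, here is a sketch of the two build configurations and timing runs described above. The NWCHEM_TOP path and the input file name are placeholders; the variables follow the standard NWChem 6.3 build procedure, but the exact environment on my machine may differ in other details.

```shell
# Common build settings (NWCHEM_TOP is a placeholder path)
export NWCHEM_TOP=$HOME/nwchem-6.3
export NWCHEM_TARGET=LINUX64
export USE_MPI=y

# Build 1: sockets-based ARMCI (my usual configuration)
export ARMCI_NETWORK=SOCKETS
# (cd $NWCHEM_TOP/src && make nwchem_config && make)

# Build 2: MPI two-sided ARMCI (the slow case)
# export ARMCI_NETWORK=MPI-TS
# (cd $NWCHEM_TOP/src && make nwchem_config && make)

# Timing runs on the tce_polar_ccsd_small QA case:
# ./nwchem tce_polar_ccsd_small.nw               # plain serial
# mpirun -np 1 ./nwchem tce_polar_ccsd_small.nw  # "MPI serial"
# mpirun -np 4 ./nwchem tce_polar_ccsd_small.nw  # 4 cores
```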