Parallel execution takes longer than serial


There can be many reasons for this.

With parallel.x, the processes communicate via sockets, even on a single node. I would recommend compiling with MPI instead.

Another factor can be disk I/O. Depending on the disk bandwidth and latency, having multiple processes write to the same file system can slow down the calculation.
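
A rough way to check whether the file system is the bottleneck is to time the same amount of data written by one writer and by several at once. The sketch below is only an illustration and not part of NWChem: the function names, the 8 MiB-per-writer size, and the use of threads to stand in for separate processes are all assumptions, and the temporary directory should be replaced by wherever your scratch files actually go.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = b"\0" * (1024 * 1024)  # 1 MiB of zeros


def write_file(path, size_mb):
    """Write size_mb MiB to path, force it to disk, return MiB written."""
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())
    return size_mb


def aggregate_throughput(directory, writers, size_mb):
    """Aggregate MiB/s when `writers` threads each write size_mb MiB."""
    paths = [os.path.join(directory, f"probe_{i}.bin") for i in range(writers)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=writers) as pool:
        total_mb = sum(pool.map(write_file, paths, [size_mb] * writers))
    elapsed = time.perf_counter() - start
    for path in paths:
        os.remove(path)  # clean up the probe files
    return total_mb / elapsed


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the real scratch area.
    with tempfile.TemporaryDirectory() as scratch:
        for n in (1, 2, 4):
            rate = aggregate_throughput(scratch, n, size_mb=8)
            print(f"{n} writer(s): {rate:.1f} MiB/s aggregate")
```

If the aggregate rate stays flat or drops as writers are added, the disk is likely saturated and extra processes will mostly wait on I/O.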

Bert



 [QUOTE=Deburgess Apr 7th 12:44 pm]Under parallel execution, the CPU time drops to about one third, but the total wall time nearly doubles compared to serial execution:

Parallel Timing:

    >>>  JOB COMPLETED     AT Sat Apr  7 07:46:57 2012  <<<

Task  times  cpu:       48.1s     wall:      178.5s
Summary of allocated global arrays


 No active global arrays



                        GA Statistics for process    0
                        ------------------------------

      create   destroy   get      put      acc     scatter   gather  read&inc
calls:    0        0        0        0        0        0        0        0
number of processes/call 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
bytes total:             0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
bytes remote:            0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
Max memory consumed for GA by this process: 0 bytes
MA_summarize_allocated_blocks: starting scan ...
MA_summarize_allocated_blocks: scan completed: 0 heap blocks, 0 stack blocks
MA usage statistics:

allocation statistics:
                                    heap       stack
                                    ----       -----
current number of blocks               0           0
maximum number of blocks             292          19
current total bytes                    0           0
maximum total bytes            160737864    32171064
maximum total K-bytes             160738       32172
maximum total M-bytes                161          33


                               NWChem Input Module
                               -------------------

Total times cpu: 56.9s wall: 317.7s
Creating: host=dirac.asbury.edu, user=deburg0,
file=/scratch/deburg0/nwchem-6.1/bin/LINUX64/nwchem, port=49546




Serial Timing:
    >>>  JOB COMPLETED     AT Sat Apr  7 08:11:59 2012  <<<

Task  times  cpu:      151.1s     wall:      151.3s
Summary of allocated global arrays


 No active global arrays



                        GA Statistics for process    0
                        ------------------------------

      create   destroy   get      put      acc     scatter   gather  read&inc
calls:    0        0        0        0        0        0        0        0
number of processes/call 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
bytes total:             0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
bytes remote:            0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
Max memory consumed for GA by this process: 0 bytes
MA_summarize_allocated_blocks: starting scan ...
MA_summarize_allocated_blocks: scan completed: 0 heap blocks, 0 stack blocks
MA usage statistics:

allocation statistics:
                                    heap       stack
                                    ----       -----
current number of blocks               0           0
maximum number of blocks             292          19
current total bytes                    0           0
maximum total bytes            638357352   128682232
maximum total K-bytes             638358      128683
maximum total M-bytes                639         129


                               NWChem Input Module
                               -------------------
Total times cpu: 179.2s wall: 179.6s[/quote]
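
The two task timings above can be boiled down to a CPU-to-wall ratio, which gives a hint of how much of the run was spent waiting (on communication or disk) rather than computing. This is a minimal sketch that uses only the numbers posted in the thread; the function names are made up for illustration.

```python
# Task timings copied from the output above, in seconds.
TIMINGS = {
    "parallel": {"cpu": 48.1, "wall": 178.5},
    "serial": {"cpu": 151.1, "wall": 151.3},
}


def cpu_fraction(cpu, wall):
    """Fraction of the wall time that the process spent on the CPU."""
    return cpu / wall


def wall_speedup(serial_wall, parallel_wall):
    """Wall-clock speedup of the parallel run; below 1 means a slowdown."""
    return serial_wall / parallel_wall


if __name__ == "__main__":
    for name, t in TIMINGS.items():
        frac = cpu_fraction(t["cpu"], t["wall"])
        print(f"{name:8s} cpu/wall = {frac:.2f}, waiting = {1 - frac:.0%}")

    speedup = wall_speedup(TIMINGS["serial"]["wall"], TIMINGS["parallel"]["wall"])
    print(f"wall-clock speedup = {speedup:.2f}x")
```

With these numbers the parallel run comes out at a ratio of roughly 0.27 against roughly 1.00 for the serial run, and the wall-clock speedup is below 1, i.e. a slowdown. A ratio that far below 1 points to time spent waiting rather than computing, which is consistent with either socket communication or disk contention.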