There can be several reasons why the wall time goes up even though the CPU time drops.
With parallel.x, the processes communicate via sockets, even on a single node. I would recommend recompiling NWChem with MPI.
Another factor can be disk I/O. Depending on the disk bandwidth and latency, having multiple processes write to the same file system can slow down the calculation.
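One quick way to see how much of a run is spent waiting (on communication or disk) rather than computing is to compare CPU time to wall time. The following is only a minimal sketch: the function name `cpu_wall_ratios` and the regular expression are my own illustration, not part of NWChem, and the input strings are just the "Task times" and "Total times" lines from the timings quoted below.

```python
import re

# Matches NWChem summary lines such as "Task times cpu: 48.1s wall: 178.5s".
# The pattern is an assumption based on the quoted output, not an NWChem API.
TIMING = re.compile(r"(Task|Total)\s+times\s+cpu:\s*([\d.]+)s\s+wall:\s*([\d.]+)s")


def cpu_wall_ratios(log_text):
    """Return {label: (cpu_s, wall_s, cpu/wall)} for each timing line found."""
    out = {}
    for label, cpu, wall in TIMING.findall(log_text):
        cpu_s, wall_s = float(cpu), float(wall)
        out[label] = (cpu_s, wall_s, cpu_s / wall_s)
    return out


# Timing lines copied from the quoted parallel and serial runs.
parallel = "Task times cpu: 48.1s wall: 178.5s\nTotal times cpu: 56.9s wall: 317.7s"
serial = "Task times cpu: 151.1s wall: 151.3s\nTotal times cpu: 179.2s wall: 179.6s"

for name, log in (("parallel", parallel), ("serial", serial)):
    for label, (cpu, wall, ratio) in cpu_wall_ratios(log).items():
        print(f"{name:8s} {label:5s} cpu={cpu:6.1f}s wall={wall:6.1f}s cpu/wall={ratio:.2f}")
```

For the serial run the ratio is essentially 1, so the process is busy the whole time. For the parallel run process 0 is on the CPU for only about a quarter of the task wall time, which is consistent with time lost to communication or I/O rather than to the calculation itself.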
Bert
[QUOTE=Deburgess Apr 7th 12:44 pm]Under parallel execution, the CPU time drops to about one third, but the total wall time roughly doubles compared to serial execution:
Parallel Timing:
>>> JOB COMPLETED AT Sat Apr 7 07:46:57 2012 <<<
Task times cpu: 48.1s wall: 178.5s
Summary of allocated global arrays
No active global arrays
GA Statistics for process 0
------------------------------
create destroy get put acc scatter gather read&inc
calls: 0 0 0 0 0 0 0 0
number of processes/call 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
bytes total: 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
bytes remote: 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
Max memory consumed for GA by this process: 0 bytes
MA_summarize_allocated_blocks: starting scan ...
MA_summarize_allocated_blocks: scan completed: 0 heap blocks, 0 stack blocks
MA usage statistics:
allocation statistics:
heap stack
---- -----
current number of blocks 0 0
maximum number of blocks 292 19
current total bytes 0 0
maximum total bytes 160737864 32171064
maximum total K-bytes 160738 32172
maximum total M-bytes 161 33
NWChem Input Module
-------------------
Total times cpu: 56.9s wall: 317.7s
Creating: host=dirac.asbury.edu, user=deburg0,
file=/scratch/deburg0/nwchem-6.1/bin/LINUX64/nwchem, port=49546
Serial Timing:
>>> JOB COMPLETED AT Sat Apr 7 08:11:59 2012 <<<
Task times cpu: 151.1s wall: 151.3s
Summary of allocated global arrays
No active global arrays
GA Statistics for process 0
------------------------------
create destroy get put acc scatter gather read&inc
calls: 0 0 0 0 0 0 0 0
number of processes/call 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
bytes total: 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
bytes remote: 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
Max memory consumed for GA by this process: 0 bytes
MA_summarize_allocated_blocks: starting scan ...
MA_summarize_allocated_blocks: scan completed: 0 heap blocks, 0 stack blocks
MA usage statistics:
allocation statistics:
heap stack
---- -----
current number of blocks 0 0
maximum number of blocks 292 19
current total bytes 0 0
maximum total bytes 638357352 128682232
maximum total K-bytes 638358 128683
maximum total M-bytes 639 129
NWChem Input Module
-------------------
Total times cpu: 179.2s wall: 179.6s[/quote]