11:46:36 PM PDT - Sun, Jun 16th 2013
I tried adding
io ga
2eorb
2emet 13
to the TCE block.
In addition, I set
tilesize 20
attilesize 30
Neither change helped.
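Put together, the settings above would correspond to a TCE block roughly like the sketch below. The rest of the input (geometry, basis set, SCF, memory directive) is omitted, and the ccsd line is an assumption based on the CCSD iterations mentioned in this post; the actual block may contain other keywords.

```
tce
  ccsd            # assumed from the CCSD iterations mentioned in this post
  io ga
  2eorb
  2emet 13
  tilesize 20
  attilesize 30
end
```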
However, varying the number of nodes and cores revealed that only beyond 30 nodes does the program get past the crash point mentioned earlier and run the CCSD iterations.
I also noticed that the speed is highest when using 2 cores per node. Running on 30 nodes with only 2 cores per node doesn't seem like a sensible use of resources.
Any clue what could be going wrong when running on fewer than 30 nodes?
By the way, for the many-node runs I used nodes with less memory: 4 GB/core, with a max of 64 GB.