Wall time vs CPU time


Clicked A Few Times
Hello.

I have been trying to run a calculation on a relatively big structure (see the input file below), but when I checked the wall and CPU times, they differed by roughly a factor of two. The calculation was initially run as a semidirect DFT calculation. I thought that might be the problem, so I switched to direct, but that did not help either.

CPU : AMD FX 8150
RAM : 16GB

start m3
charge 0
memory heap 1000 mb stack 1000 mb global 2000 mb total 4000 mb
geometry units angstroms
C 0.26288924 2.05335770 -2.41437049
C 1.48317563 2.19349886 -1.73969757
C 1.51239146 2.74466222 -0.45156429
C 0.32132107 3.15568582 0.16189547
C -0.89896545 3.01554363 -0.51277700
C -0.92818149 2.46437857 -1.80090955
H 0.24058225 1.63253176 -3.39788862
H 2.39258444 1.87967433 -2.20808819
H 0.34362826 3.57651344 1.14541289
H -1.80837404 3.32936999 -0.04438717
H -1.85989756 2.35737504 -2.31603614
Si 3.20167023 2.93866190 0.48240646
C 3.10944263 4.46584022 1.67522683
H 4.04115824 4.57284005 2.19035503
H 2.91235898 5.35082409 1.10701773
H 2.32394288 4.31616622 2.38620351
C 3.55899947 1.33411171 1.51261736
H 3.60986727 0.49180202 0.85472158
H 4.49071508 1.44111153 2.02774555
H 2.77349972 1.18443771 2.22359403
C 4.62584734 3.21003345 -0.80665406
H 5.55756295 3.31703327 -0.29152586
H 4.67671514 2.36772376 -1.46454984
H 4.42876369 4.09501732 -1.37486317
O 8.00459177 3.10509648 2.99603334
H 7.68413719 4.01003061 2.99778807
Si 9.83459177 3.10509648 2.99603334
H 10.32459134 1.71916914 2.99361462
H 10.32459134 3.80015513 1.79699460
H 10.32459203 3.79596519 4.19749086
end

basis
H library 6-311+G**
C library 6-311+G**
O library 6-311+G**
Si library lanl2dz_ecp
end

ecp
Si library lanl2dz_ecp
end

driver
maxiter 100
end

dft
mult 1
xc b3lyp
disp vdw 3
maxiter 200
end

task dft optimize
task dft frequencies



Any ideas what the problem might be?

Thanks!

Forum Vet
How many cores have you been using?
I would not use more than 4 cores on an FX-8150, since it has only four floating-point units.

Clicked A Few Times
Hi Edo.

I have been using eight cores. Now I am going to try with four and see what happens.

Thanks for your reply!

Clicked A Few Times
Hi Edo.

It seems that it does not work. I tried with four cores instead of eight and the same thing happens. Another thing I had not noticed before is that the computer lags for short periods, and while NWChem is computing the atomic-orbital integrals each process uses much less than 100% of its core. When it moves on to the SCF (I think), the processes go back to normal (roughly 100% of a core each).

Thanks again!

Gets Around
It sounds like you don't have enough RAM allocated to integral storage to avoid disk access altogether. Regardless of how much RAM I give NWChem overall, its automatic memory allocation does not seem to give very much to integral storage, so I increase it manually when I can, and that gives good speedups.

Inside the DFT block try a directive like this:

semidirect memsize NUMBER_OF_WORDS filesize 0

You should be able to tell from your previous run log files how much space you actually need for the integrals.

EDIT: and with a fairly large basis set and system, plus only 16 GB of RAM, you may not be able to get "filesize 0" working. Some disk storage may be unavoidable, but that's what I would look at. You can still get some speedup by storing as much as possible in RAM via manually increasing memsize.
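To turn a RAM budget into a value for NUMBER_OF_WORDS, a small helper like the one below can do the arithmetic. This is only a sketch: it assumes NWChem counts memsize in 64-bit (8-byte) words and that a megabyte means 1024*1024 bytes, so check both against the documentation for your version. The function names are made up for illustration, and the 1000 MB figure is an example, not a recommendation for this system.

```python
BYTES_PER_WORD = 8          # assumption: memsize is given in 64-bit words
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def mb_to_words(megabytes: float) -> int:
    """Convert a memory budget in MB to a whole number of 8-byte words."""
    return int(megabytes * BYTES_PER_MB) // BYTES_PER_WORD


def semidirect_directive(cache_mb: float, filesize_words: int = 0) -> str:
    """Format the semidirect line for the dft block (hypothetical helper)."""
    return f"semidirect memsize {mb_to_words(cache_mb)} filesize {filesize_words}"


# Example: budget 1000 MB of RAM for cached integrals, no disk file.
print(semidirect_directive(1000))
```

Whatever number you choose has to fit alongside the heap, stack and global allocations already requested in the memory line of the input.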

Forum Vet
Quote:Frank.ramirez Dec 9th 11:47 am
Hi Edo.

It seems that it does not work. I tried with four cores instead of eight and the same thing happens. Another thing I had not noticed before is that the computer lags for short periods, and while NWChem is computing the atomic-orbital integrals each process uses much less than 100% of its core. When it moves on to the SCF (I think), the processes go back to normal (roughly 100% of a core each).

Thanks again!


Does the same behavior occur when you run direct?

Clicked A Few Times
Hi Edo.

That seems to solve the problem. Could you tell me why this happens, so I can avoid similar problems in the future?

Thanks a lot.

Forum Vet
Quote:Frank.ramirez Dec 9th 1:01 pm
Hi Edo.

That seems to solve the problem. Could you tell me why this happens, so I can avoid similar problems in the future?

Thanks a lot.


Four processes doing I/O (writing the integrals and then reading them back) are overwhelming the disk you are using.

Using conventional SCF and just a couple of processes might work.

Anyhow, most of the time the direct option (together with using as many cores as possible) is likely to give the fastest time to solution.

The alternative is storing the integrals in memory, if you have enough memory available.
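The gap between wall time and CPU time can be reproduced outside NWChem. The sketch below is not NWChem code; it uses time.sleep as a stand-in for a process blocked on disk I/O, and the function names are invented for the example. A compute-bound task shows wall and CPU times that are close, while the waiting task accumulates wall time with almost no CPU time, which is the pattern described in this thread.

```python
import time


def measure(work):
    """Run work() and return (wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time, excludes time spent sleeping
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def compute_bound():
    """Keep the CPU busy with arithmetic."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def waiting():
    """Stand-in for a process blocked on slow disk I/O."""
    time.sleep(0.2)


for name, work in (("compute", compute_bound), ("waiting", waiting)):
    wall, cpu = measure(work)
    print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

If the wall time reported at the end of an NWChem run is much larger than the CPU time, the processes spent a good part of the run waiting rather than computing, and disk traffic from stored integrals is the first thing to check.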

Clicked A Few Times
Thanks!

