CCSD(T) OpenMP Threads


Clicked A Few Times
I have been studying the performance of CCSD(T)/aug-cc-pvqz calculations on a (H2O)6 cluster (~1000 basis functions) using both the TCE and the conventional algorithms on a KNL cluster, with NWChem built from source obtained from the developer trunk.

OpenMP has been implemented and enabled ("NWChem w/ OpenMP: maximum threads = 4" is printed at the top of the output).

I run multiple MPI ranks with several OpenMP threads each on every node, but for the conventional CCSD algorithm I found that, no matter what OMP_NUM_THREADS is set to, only 1 OpenMP thread is used in the CCSD iterations ("Using 1 OpenMP thread(s) in CCSD" is printed in the output). "top" shows that only one MPI rank is running 4 threads and one rank is using 2 threads, while all other ranks are single-threaded. Is it expected to work like this, or is there a load-balancing problem?

Thanks!

 >echo $OMP_NUM_THREADS
4
>mpirun -perhost 11 -np 220 $EXE0 w6cage_ccsd.nw

 argument  1 = w6cage_ccsd.nw
NWChem w/ OpenMP: maximum threads = 4
============================== echo of input deck ============================== echo

start w6cage_ccsd

memory stack 8000 mb heap 100 mb global 10000 mb noverify
...
***** ccsd parameters *****
iprt = 0
convi = 0.100E-03
maxit = 20
mxvec = 5
memory 1060598348
Using 1 OpenMP thread(s) in CCSD
IO offset 20.0000000000000
IO error message >End of File
file_read_ga: failing reading from ./w6cage_ccsd.t2
Failed reading restart vector from ./w6cage_ccsd.t2
Using MP2 initial guess vector


-------------------------------------------------------------------------
 iter     correlation     delta       rms       T2     Non-T2      Main
             energy      energy      error      ampl     ampl      Block
                                                time     time      time
-------------------------------------------------------------------------
    1     -1.7186198644 -1.719D+00  5.469D-01  4530.96     0.11  4443.36
    2     -1.7539631587 -3.534D-02  2.744D-01  4524.76     0.11  4445.22

Top:
top - 10:53:41 up 2 days, 19:42,  1 user,  load average: 13.91, 13.81, 13.89
Tasks: 2169 total, 588 running, 1581 sleeping, 0 stopped, 0 zombie
%Cpu(s): 21.7 us, 0.8 sy, 0.0 ni, 77.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 19767712+total, 14988190+free, 41493892 used, 6301332 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 14941336+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21044 chiensh   20   0 5022244 2.483g 2.222g R 383.1  1.3 625:55.13 nwchem
21039 chiensh   20   0 12.614g 3.846g 239088 R 145.3  2.0 253:17.89 nwchem
21036 chiensh   20   0 12.626g 3.883g 270192 R 100.0  2.1 261:12.81 nwchem
21037 chiensh   20   0 12.626g 4.077g 269976 R 100.0  2.2 256:56.85 nwchem
21038 chiensh   20   0 12.611g 3.848g 239156 R 100.0  2.0 253:11.01 nwchem
21043 chiensh   20   0 12.611g 3.801g 181656 R 100.0  2.0 255:17.57 nwchem
21034 chiensh   20   0 12.644g 4.031g 302296 R  99.7  2.1 257:55.96 nwchem
21035 chiensh   20   0 12.625g 3.979g 272528 R  99.7  2.1 259:58.57 nwchem
21040 chiensh   20   0 12.614g 3.809g 239108 R  99.7  2.0 254:07.24 nwchem
21041 chiensh   20   0 12.610g 3.798g 238896 R  99.7  2.0 253:51.30 nwchem
21042 chiensh   20   0 12.611g 3.839g 190128 R  99.7  2.0 254:17.43 nwchem
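As a rough cross-check of the top output above, the following Python sketch compares the number of threads requested per node (11 ranks from "-perhost 11" times OMP_NUM_THREADS=4) with what top reports. It treats 100% CPU as roughly one busy thread, which is only a heuristic; the helper name effective_threads and the constants are illustrative, taken from the mpirun line and OMP_NUM_THREADS shown in the post.

```python
# Hypothetical helper: estimate how many threads each MPI rank keeps busy,
# using the %CPU column of `top` (100% ~ one fully busy core; a heuristic).

RANKS_PER_NODE = 11   # from `mpirun -perhost 11`
OMP_NUM_THREADS = 4   # from `echo $OMP_NUM_THREADS`

# Rows copied from the top output in the post (default top column layout).
TOP_ROWS = """\
21044 chiensh 20 0 5022244 2.483g 2.222g R 383.1 1.3 625:55.13 nwchem
21039 chiensh 20 0 12.614g 3.846g 239088 R 145.3 2.0 253:17.89 nwchem
21036 chiensh 20 0 12.626g 3.883g 270192 R 100.0 2.1 261:12.81 nwchem
21037 chiensh 20 0 12.626g 4.077g 269976 R 100.0 2.2 256:56.85 nwchem
21038 chiensh 20 0 12.611g 3.848g 239156 R 100.0 2.0 253:11.01 nwchem
21043 chiensh 20 0 12.611g 3.801g 181656 R 100.0 2.0 255:17.57 nwchem
21034 chiensh 20 0 12.644g 4.031g 302296 R 99.7 2.1 257:55.96 nwchem
21035 chiensh 20 0 12.625g 3.979g 272528 R 99.7 2.1 259:58.57 nwchem
21040 chiensh 20 0 12.614g 3.809g 239108 R 99.7 2.0 254:07.24 nwchem
21041 chiensh 20 0 12.610g 3.798g 238896 R 99.7 2.0 253:51.30 nwchem
21042 chiensh 20 0 12.611g 3.839g 190128 R 99.7 2.0 254:17.43 nwchem
"""


def effective_threads(top_text):
    """Map PID -> %CPU / 100, i.e. roughly the number of busy threads."""
    result = {}
    for line in top_text.strip().splitlines():
        fields = line.split()
        # Field 0 is PID, field 8 is %CPU in top's default layout.
        pid, cpu = int(fields[0]), float(fields[8])
        result[pid] = cpu / 100.0
    return result


threads = effective_threads(TOP_ROWS)
requested = RANKS_PER_NODE * OMP_NUM_THREADS
observed = sum(threads.values())

print(f"requested threads per node: {requested}")
print(f"observed busy cores (approx): {observed:.1f}")
for pid, t in sorted(threads.items(), key=lambda kv: -kv[1]):
    print(pid, round(t, 1))
```

With these numbers, only about 14 of the 44 requested threads are busy, which is consistent with the load average of about 13.9 in the top header and with most ranks running a single thread.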


Input
 echo
start w6cage_ccsd
memory stack 8000 mb heap 100 mb global 10000 mb noverify
geometry units angstrom noautoz noprint
...
end

basis "ao basis" spherical noprint
* library aug-cc-pvqz
end

scf
vectors input w6cage_ccsd.movecs
semidirect memsize 100000000 filesize 0
singlet
rhf
thresh 1e-7
tol2e 1e-14
end

ccsd
freeze atomic
NODISK
thresh 1e-4
end

task ccsd(t) energy

set ccsd:use_trpdrv_nb T
set ccsd:use_ccsd_omp T
set ccsd:use_trpdrv_omp T

Gets Around
It seems there is a bug in the logic. I'll see if I can fix it.

Gets Around
By the way, I get an email for every GitHub issue but not for posts on this forum, so if you think I am the right person to address your issue, please create a GitHub issue.

Gets Around
I do not see this issue.

Here is my test input:

echo

start w5_rccsdpt_cc-pvdz_energy

memory stack 4000 mb heap 100 mb global 4000 mb noverify

permanent_dir /tmp
scratch_dir /tmp

geometry units angstrom noautoz noprint
  O       2.289015       0.225784       0.175030
  H       1.837891      -0.638872       0.046444
  H       2.811304       0.122451       0.974687
  O       0.929887      -2.095904      -0.167528
  H      -0.037083      -1.936553      -0.084181
  H       1.034959      -2.583078      -0.988978
  O      -1.718101      -1.549268       0.073447
  H      -1.882083      -0.580570       0.056990
  H      -2.170566      -1.871677       0.857083
  O      -1.987637       1.157925      -0.077866
  H      -1.103971       1.590183      -0.076556
  H      -2.534625       1.699982       0.496152
  O       0.498426       2.249945      -0.063688
  H       1.178269       1.547359       0.044627
  H       0.773193       2.742924      -0.841426
end

basis "ao basis" spherical noprint
  * library cc-pvdz
end

scf
  singlet
  rhf
  thresh 1e-8
end

ccsd
  freeze atomic
  thresh 1e-9
  #nodisk
end

set ccsd:use_ccsd_omp T
task ccsd energy

set ccsd:use_ccsd_omp F
task ccsd energy


Here is the relevant portion of the output file:
 ****************************************************************************
              the segmented parallel ccsd program:    1 nodes
 ****************************************************************************




 level of theory    ccsd
 number of core         5
 number of occupied    20
 number of virtual     95
 number of deleted      0
 total functions      120
 number of shells      60
 basis label          566



   ***** ccsd parameters *****
   iprt   =     0
   convi  =  0.100E-08
   maxit  =    20
   mxvec  =     5
 memory             537375052
  Using 16 OpenMP thread(s) in CCSD
  IO offset    20.0000000000000     
  IO error message >End of File
  file_read_ga: failing reading from /tmp/w5_rccsdpt_cc-pvdz_energy.t2
  Failed reading restart vector from /tmp/w5_rccsdpt_cc-pvdz_energy.t2
  Using MP2 initial guess vector 


-------------------------------------------------------------------------
 iter     correlation     delta       rms       T2     Non-T2      Main
             energy      energy      error      ampl     ampl      Block
                                                time     time      time
-------------------------------------------------------------------------

Gets Around
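Your input file is incorrect. NWChem processes the input deck sequentially, so set directives placed after a task directive are not seen by that task. You must set the RTDB variables before invoking the task.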

Good:
 set ccsd:use_trpdrv_nb T 
 set ccsd:use_ccsd_omp T
 set ccsd:use_trpdrv_omp T
 task ccsd(t) energy


Bad:
 task ccsd(t) energy
 set ccsd:use_trpdrv_nb T 
 set ccsd:use_ccsd_omp T
 set ccsd:use_trpdrv_omp T
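Why the order matters can be illustrated with a toy model in Python. It is not NWChem code; it only assumes that the input deck is processed top to bottom, so each task sees only the set values read before it. The function name simulate, the lowercase-keyword parsing, and the warning text are hypothetical.

```python
# Toy model (not NWChem code): a `task` line only sees the `set` values
# that were read before it, because the deck is processed top to bottom.


def simulate(deck):
    """Return (list of (task line, settings it sees), sets after the last task)."""
    rtdb = {}      # stands in for the runtime database
    tasks = []     # (task line, snapshot of rtdb when the task runs)
    pending = []   # set directives not (yet) followed by any task
    for raw in deck.splitlines():
        line = raw.strip()
        if line.startswith("set "):
            _, key, value = line.split(maxsplit=2)
            rtdb[key] = value
            pending.append(line)
        elif line.startswith("task "):
            tasks.append((line, dict(rtdb)))
            pending = []
    return tasks, pending


GOOD = """
set ccsd:use_trpdrv_nb T
set ccsd:use_ccsd_omp T
set ccsd:use_trpdrv_omp T
task ccsd(t) energy
"""

BAD = """
task ccsd(t) energy
set ccsd:use_trpdrv_nb T
set ccsd:use_ccsd_omp T
set ccsd:use_trpdrv_omp T
"""

for name, deck in (("Good", GOOD), ("Bad", BAD)):
    tasks, ignored = simulate(deck)
    for task, seen in tasks:
        print(f"{name}: {task!r} sees {seen}")
    for line in ignored:
        print(f"{name}: warning, {line!r} comes after the last task and affects no task")
```

In this model the Good deck's task sees all three ccsd:* settings, while the Bad deck's task sees none of them and all three set lines are reported as coming after the last task.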

