CPU load drops very low


  • Guest -
Hi

Apologies if this has been discussed. I am running NWChem on a Linux cluster (12 cores/24 threads)
using the following command:

/opt/openmpi/bin/mpirun -np 24 nwchem bv6-esp-res.nw >& bv6-esp-res2.out &

The job starts normally on all threads with high CPU load (~100%), but after a few seconds the CPU load
drops very low, to ~2-15%. Also, the number of processes appears to decrease and fluctuate.

Is this behaviour normal?

Thanks in advance
George

Forum Regular
Hi,

Can you post or send me your input file?

Thanks.
-Niri
niri.govind@pnnl.gov

  • Guest -
George,

I strongly suggest running one process per core, not two; for one thing, it reduces memory and communication pressure. Also, I don't know how much data is written to disk, which could affect the CPU usage.
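
For example, something along these lines (keeping your file names) launches 12 processes, one per physical core:

/opt/openmpi/bin/mpirun -np 12 nwchem bv6-esp-res.nw >& bv6-esp-res2.out &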

The number of processes should be constant.

Bert


  • Guest -
Thanks for your reply. I am now trying to run NWChem on an Apple cluster. After a day or so of running,
the job aborts with the following message:

[xgrid-node05:95464] *** Process received signal ***
[xgrid-node05:95464] Signal: Segmentation fault (11)
[xgrid-node05:95464] Signal code: Address not mapped (1)
[xgrid-node05:95464] Failing at address: 0x17cd0a000
[xgrid-node05:95464] [ 0] 2 libSystem.B.dylib 0x00007fff8274f66a _sigtramp + 26
[xgrid-node05:95464] [ 1] 3  ??? 0x000000010de39d90 0x0 + 4527988112
[xgrid-node05:95464] [ 2] 4 nwchem 0x00000001025d4406 ga_dadd_ + 14
[xgrid-node05:95464] [ 3] 5 nwchem 0x00000001001a4959 diis_hamwgt_ + 345
[xgrid-node05:95464] [ 4] 6 nwchem 0x00000001001a3c99 diis_driver_ + 585
[xgrid-node05:95464] [ 5] 7 nwchem 0x000000010018a810 dft_scf_ + 16128
[xgrid-node05:95464] [ 6] 8 nwchem 0x0000000100186021 dft_main0d_ + 7728
[xgrid-node05:95464] [ 7] 9 nwchem 0x00000001002e7ad7 nwdft_ + 2936
[xgrid-node05:95464] [ 8] 10 nwchem 0x00000001002e7f81 dft_energy_ + 68
[xgrid-node05:95464] [ 9] 11 nwchem 0x0000000100008e92 task_energy_doit_ + 840
[xgrid-node05:95464] [10] 12 nwchem 0x000000010000a7d8 task_energy_ + 610
[xgrid-node05:95464] [11] 13 nwchem 0x0000000100014082 task_ + 3660
[xgrid-node05:95464] [12] 14 nwchem 0x000000010000312b MAIN__ + 1404
[xgrid-node05:95464] [13] 15 nwchem 0x000000010262586e main + 14
[xgrid-node05:95464] [14] 16 nwchem 0x0000000100001804 start + 52

Is this a memory problem? Should I recompile with export LARGE_FILES=TRUE?
George

  • Guest -
Same problem
I have the same problem running with MPI, but I noticed the drop starts before the SCF iterations and persists during them.
Also, this seems to happen only for systems (even S.P.E. runs) with more than roughly 380 basis functions.

Jonathan

Just Got Here

The following worked for me: adding the line

 semidirect memsize 200000000 filesize 0

which keeps the integrals in memory (up to the given size) instead of writing them to disk.
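
For reference, my understanding is that this directive goes inside the SCF (or DFT) block of the input and that the sizes are given in 64-bit words, so 200000000 corresponds to roughly 1.6 GB; something like:

scf
  semidirect memsize 200000000 filesize 0
end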

Gets Around
I guess the problem you see is NWChem's I/O: check the thread http://nwchemgit.github.io/Special_AWCforum/st/id271/junk_files%3A_what_are_they.h...

  • Guest -
Thanks for this. Indeed, it was an I/O problem. It occurred because I had the scratch directory located on the
head node while running the job on another node.
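
In case it helps anyone else: the relevant top-level directive is scratch_dir, which can be pointed at a node-local filesystem (the path below is just a placeholder for whatever is local on your compute nodes):

scratch_dir /local/scratch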

