Problem with ubuntu and NWChem


Hi,

I use Ubuntu 12.04 as the platform for NWChem.

When I ran a small job (a molecule with 10 atoms), the calculation ran to completion.

But for a larger molecule (37 atoms), the computer restarted during the calculation.

It seems that the computer restarts when the computation time is long.

The computer showed the following error:



Ubuntu 12.04.2 LTS ubuntu tty5

ubuntu login: [ 249.514018] [Hardware Error]: CPU:0 MC0_STATUS[-|UE|-|PCC|AddrV|CECC]: 0xb66d400000000135
[ 249.514018] [Hardware Error]: MC0_ADDR: 0x00000000748da370
[ 249.514018] [Hardware Error]: Data Cache Error: Data/Tag DRD error.
[ 249.514018] [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DRD
[ 249.514018] [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b66d400000000135
[ 249.514018] [Hardware Error]: TSC c8a41658fc ADDR 748da370
[ 249.514018] [Hardware Error]: PROCESSOR 2: 100f23 TIME 1373431851 SOCKET 0 APIC 0 microcode 1000095
[ 249.514018] [Hardware Error]: CPU:0 MC0_STATUS[-|UE|-|PCC|AddrV|CECC]: 0xb66d400000000135
[ 249.514018] [Hardware Error]: MC0_ADDR: 0x00000000748da370
[ 249.514018] [Hardware Error]: Data Cache Error: Data/Tag DRD error.
[ 249.514018] [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DRD
[ 249.514018] [Hardware Error]: Machine check: Invalid
[ 249.514018] Kernel panic - not syncing: Fatal machine check on current CPU
[ 249.514018] panic occurred, switching back to text console



The problem repeats every time I run NWChem.

However, the computer does not restart on its own, even when it is idle for a very long time.

Is Ubuntu not compatible with NWChem?

Thanks


hong420

Hardware error
Dear Hong420,
I think that the computer you are using has some hardware problems.
NWChem is perfectly compatible with Ubuntu (I have it on my laptop, where I do most of my development work, and I occasionally do some heavy runs on it).
I am tempted to believe that your computer has some hardware problem (possibly in the CPU cache) that heavy NWChem runs can uncover but small jobs do not.
Another explanation could be that the Linux kernel you are using is incorrectly diagnosing hardware errors that don't really occur.
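
For reference, the status word printed in the log can be decoded by hand. Below is a minimal Python sketch, assuming the standard x86 machine-check status layout (flag bits in the top byte, the error code in the low 16 bits) plus the AMD-specific CECC/UECC bits. The function name and the returned dictionary are only illustrative, not part of any tool.

```python
# Flag bit positions in an x86 MCi_STATUS register.
# CECC/UECC (bits 46/45) are AMD-specific; the rest are architectural.
FLAGS = {
    "VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
    "MISCV": 59, "ADDRV": 58, "PCC": 57,
    "CECC": 46, "UECC": 45,
}

# Field encodings for memory-hierarchy error codes (0000 0001 RRRR TTLL).
MEM_TX = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
TX_TYPE = ["INSN", "DATA", "GEN", "RESV"]
LEVEL = ["L0", "L1", "L2", "LG"]


def decode_mc_status(status):
    """Decode the flags and, for cache errors, the error-code fields."""
    flags = {name for name, bit in FLAGS.items() if (status >> bit) & 1}
    code = status & 0xFFFF
    info = {"flags": flags, "error_code": code}
    # Memory-hierarchy (cache) errors have the upper code byte equal to 0x01.
    if (code & 0xFF00) == 0x0100:
        rrrr = (code >> 4) & 0xF
        info["mem_tx"] = MEM_TX[rrrr] if rrrr < len(MEM_TX) else "unknown"
        info["tx"] = TX_TYPE[(code >> 2) & 0x3]
        info["level"] = LEVEL[code & 0x3]
    return info


if __name__ == "__main__":
    print(decode_mc_status(0xB66D400000000135))
```

For the value in the log this gives the UC and PCC flags (an uncorrected error with corrupted processor context, which is why the kernel panics) and an L1 data-cache read error, which agrees with the kernel's own "cache level: L1, tx: DATA, mem-tx: DRD" line.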

Regards, Edo

Quote:Edoapra Jul 15th 9:01 am
Dear Hong420,
I think that the computer you are using has some hardware problems.
NWChem is perfectly compatible with Ubuntu (I have it on my laptop, where I do most of my development work, and I occasionally do some heavy runs on it).
I am tempted to believe that your computer has some hardware problem (possibly in the CPU cache) that heavy NWChem runs can uncover but small jobs do not.
Another explanation could be that the Linux kernel you are using is incorrectly diagnosing hardware errors that don't really occur.

Regards, Edo



Thank you for your reply.

"Another explanation could be that the linux kernel you are using is incorrectly diagnosing HW errors that don't really occur."

If that is the case, how can I check whether the diagnosis is incorrect?

Thank you.

Regards,
Hong420

Hello Hong420,

It does seem like it might be an issue when the computer is under heavy load.

I would suggest running StressLinux; it will help identify which component, if any, is failing under high load.

Regards
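
If you want a quick check before booting a dedicated stress tool, the idea can be sketched in a few lines of Python: keep several cores busy with a deterministic computation and verify that every result is identical. This is only a rough sketch, not a replacement for a proper stress test. The names `hash_rounds` and `consistency_check` and the 1 MiB buffer size are my own choices for illustration. It relies on `hashlib` releasing the interpreter lock for large buffers, so the threads can run on separate cores.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Fixed 1 MiB input, so every hash must produce the same digest.
BUF = bytes(range(256)) * 4096


def hash_rounds(rounds):
    """Hash the fixed buffer repeatedly and collect the distinct digests seen."""
    digests = set()
    for _ in range(rounds):
        digests.add(hashlib.sha256(BUF).hexdigest())
    return digests


def consistency_check(workers, rounds):
    """Run hash_rounds in parallel; return how many workers saw a wrong digest."""
    reference = hashlib.sha256(BUF).hexdigest()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(hash_rounds, [rounds] * workers))
    return sum(1 for digests in results if digests != {reference})
```

Calling something like `consistency_check(4, 5000)` with one worker per core keeps the CPU loaded for a while; a healthy machine should return 0. If the machine panics with the same machine check under this load, with NWChem not running at all, that points at the hardware rather than at NWChem.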

Thank you for all of your replies.

I have replaced the CPU with another one. The computer no longer restarts, even when I run a job with a larger molecule.

Thank you.

Regards,
Hong420

