Floating Point Exception using shared file IO with TCE


bug in mutex patch?
I was running the QA tests in parallel mode after building the patched July snapshot and noticed that the li2h2_tce_ccsd.nw test was stuck. It had been running for 70 minutes with the CPUs pinned at 100% but was making no progress. The unpatched version completes the job in a few minutes in either serial or parallel execution. The patched version completes the job only in serial execution. Having written some deadlocking mutex code myself in the past, that's what this feels like to me. Here's the output from a 2-processor attempt up to the point where it stalls, though I didn't spot any great clues in it: http://www.sciencemadness.org/cc/li2h2_tce_ccsd.nwo.gz