Compiling nwchem-6.3 on a contemporary HPC system with Xeon Phi

The single-CPU QA tests still run 10 times faster than my fastest binary, and my binary is much faster than, for example, the binary supplied with Ubuntu. How do they achieve that performance?

Which QA tests? If they are tests with significant disk I/O, that I/O can dominate wall-clock time. It's possible that the single-processor QA outputs were generated on systems with fast scratch disk, like a RAID system, an SSD, or (even faster) something like Lustre running over a fast network.
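
One rough way to check whether I/O is the culprit is to compare CPU time with wall-clock time, both in your own output and in the reference QA output. Below is a minimal sketch in Python. It assumes the output contains a summary line of the form `Total times  cpu: ...s  wall: ...s`; check your own files and adjust the pattern if it differs. The helper names `total_times` and `non_cpu_fraction` are made up for illustration and are not part of NWChem.

```python
import re

# Assumed format of the timing summary line in an NWChem output file;
# adjust the pattern if your output looks different.
TOTAL_TIMES = re.compile(r"Total times\s+cpu:\s*([\d.]+)s\s+wall:\s*([\d.]+)s")


def total_times(text):
    """Return (cpu_seconds, wall_seconds) from the last timing line, or None."""
    matches = TOTAL_TIMES.findall(text)
    if not matches:
        return None
    cpu, wall = matches[-1]
    return float(cpu), float(wall)


def non_cpu_fraction(cpu, wall):
    """Fraction of wall-clock time not accounted for by CPU time."""
    if wall <= 0:
        return 0.0
    return max(0.0, (wall - cpu) / wall)


if __name__ == "__main__":
    # Hypothetical sample line, not taken from a real run.
    sample = " Total times  cpu:       12.5s     wall:       50.0s\n"
    cpu, wall = total_times(sample)
    print(f"cpu={cpu}s wall={wall}s non-CPU fraction={non_cpu_fraction(cpu, wall):.2f}")
```

For a single-process run, a large non-CPU fraction in your output alongside a small one in the reference output would be consistent with the scratch disk being the bottleneck. If the two fractions are similar, the difference more likely lies elsewhere, such as the compiler options or the math library.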