SCF Performance for Different ARMCI Network on Socket-based KNL Cluster

"mpirun -perhost 1 -genv CSP_NG 1" runs one MPI rank per node, does it not? I don't know how Casper even functions in that case. You need to launch N+G processes per node, where N is the number of application processes per node and G is the number of Casper ghost processes per node (CSP_NG=G).
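
To make the N+G arithmetic concrete, here is a minimal sketch that builds the launcher arguments from N and G. The helper name `casper_launch_args` and the value N=16 are made up for illustration; only the `-perhost` and `-genv CSP_NG` options come from the command quoted above, and the application binary and any other options are left out.

```python
def casper_launch_args(app_procs_per_node: int, ghosts_per_node: int) -> list[str]:
    """Build mpirun arguments for a Casper run (hypothetical helper).

    Casper takes G ghost processes per node out of the ranks that are
    launched, so each node must start N + G ranks, with CSP_NG set to G.
    """
    if app_procs_per_node < 1 or ghosts_per_node < 1:
        raise ValueError("need at least one application and one ghost process per node")
    per_node = app_procs_per_node + ghosts_per_node  # N + G ranks per node
    return [
        "mpirun",
        "-perhost", str(per_node),            # total ranks launched on each node
        "-genv", "CSP_NG", str(ghosts_per_node),  # ranks per node reserved as ghosts
    ]


# Illustrative values only: N = 16 application ranks, G = 1 ghost per node.
print(" ".join(casper_launch_args(16, 1)))
# prints: mpirun -perhost 17 -genv CSP_NG 1
```

With "-perhost 1" and CSP_NG=1, N would be 0, which is why the original command leaves no rank for the application.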