I've tried adjusting the memory per core, but it didn't help. I've compiled a version with DMAPP and the Cray version of GA (as recommended in the docs), and it looks like it is working. I guess it is not as good as the Gemini ARMCI network for the XE6? Do you know if there is any preference or performance difference between these two on the Gemini interconnect?