error ival=5


Update
- I am now seeing ival=12, rather than ival=5
- The failures are all occurring on a node other than the one hosting ranks 0 through n (for testing, I have n=7).
- stderr shows the last system-level error as "Bad address" (EFAULT). strace shows similar errors returned from sched_setaffinity.
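
In case it helps anyone trying to narrow this down: a minimal sketch for checking whether a node accepts a plain sched_setaffinity call outside the application. It assumes Linux and Python 3 on the compute nodes. The function name check_affinity is my own and hypothetical, not part of any package. It only re-applies the process's current CPU mask, so it may well succeed even when the application's own call fails with "Bad address".

```python
import errno
import os
import platform


def check_affinity(pid=0):
    """Re-apply the current CPU affinity mask and report any OS error.

    Returns (ok, message). pid=0 means the calling process.
    """
    # os.sched_getaffinity / os.sched_setaffinity exist only on some
    # platforms (Linux in particular), so guard against their absence.
    if not hasattr(os, "sched_setaffinity"):
        return False, "sched_setaffinity is not available on this platform"
    try:
        mask = os.sched_getaffinity(pid)
        # Setting the mask we just read should change nothing.
        os.sched_setaffinity(pid, mask)
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"{name}: {exc.strerror}"
    return True, f"affinity OK on CPUs {sorted(mask)}"


if __name__ == "__main__":
    ok, message = check_affinity()
    # Print the host name so output from several nodes can be told apart.
    print(platform.node(), message)
```

Running it once on the node hosting ranks 0 through n and once on a failing node would show whether the two nodes behave differently for the same call.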